A Tale of Three Rust Codebases
When is it a good time to start using Rust?
The founding team at Convex has had the privilege of leading development on some of the most heavily used Rust-based systems in the world:
- Magic Pocket, Dropbox's geo-distributed data storage system. This system has run on close to a million simultaneous storage nodes and directly manages exabytes of on-disk storage, all without a single major incident.
- Nucleus, the clean-slate rewrite of the Dropbox sync engine. This codebase runs on over half a billion devices, manages trillions of files, contends with all manner of bizarre end user filesystem configurations, and has an unblemished reliability track record.
- Convex, the zero-setup, infinitely scalable backend designed for the needs of reactive app developers. Ok, this isn't one of the most used codebases (yet). It's the company we're building right now. Come work with us.
We were actively using Rust back in 2014, before it hit 1.0! The language has come a long way, and there are now significantly lower barriers to adoption. Despite our extremely positive experiences with Rust, however, we wouldn't recommend it to every development team. In this post we hope to provide some color on the conditions under which Rust is a good fit for a project.
If you're reading this article you probably don't need too much convincing of the strengths of Rust. Rust has been an existentially important contributor to the success of some of the projects we've worked on.
A project like Magic Pocket runs on a seriously large (and expensive) fleet of servers. One of our goals in the project was to reduce cost overhead as much as possible, where "overhead" is generally defined as "anything that's not a hard drive." The switch to Rust, with predictable utilization from a lack of runtime and tight control over memory allocation, allowed us to significantly decrease the amount of CPU and RAM in our storage nodes without the constant threat of OOM issues we had in our early Go codebase. Ultimately, Rust gave us C++ levels of control and efficiency within a much safer and more ergonomic development environment.
Seamlessly migrating exabytes of data or the sync state for hundreds of millions of users is scary. Projects like these require an extremely high bar for correctness. Rust's strong type system and borrow checker are a tremendous out-of-the-box resource for developing correct multi-threaded code.
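To make that concrete, here is a minimal sketch of the kind of guarantee we mean. It is not code from Magic Pocket or Nucleus; the function name `parallel_count` and its parameters are made up for illustration. The shared counter has to be wrapped in `Arc<Mutex<...>>` before the compiler will let several threads touch it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical example: `workers` threads each add 1 to a shared
/// counter `per_worker` times, and the final total is returned.
pub fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    // Arc gives shared ownership across threads; Mutex guards mutation.
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The lock is released at the end of this statement.
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}
```

Swapping `Arc<Mutex<u64>>` for `Rc<RefCell<u64>>` here is a compile error rather than a latent data race, because `Rc` is not `Send` and so cannot be moved into `thread::spawn`.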
The Nucleus project heavily leveraged a testing framework codenamed Trinity that deterministically verified invariants on complex interleavings of execution threads. Rust's support for futures and the freedom to implement our own adversarial scheduler made this project possible, whereas it would have been far more difficult in another language, especially one with a language runtime.
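The general idea can be sketched in a few dozen lines of standard-library Rust. This is not Trinity's code, and every name below (`run_seeded`, `YieldNow`, `worker`, `interleaving`) is an assumption made for illustration. A toy executor picks which task to poll next using a seeded pseudo-random generator, so any interleaving that breaks an invariant can be replayed from its seed.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing: the scheduler below re-polls tasks itself.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Future that returns `Pending` once, handing control back to the scheduler.
struct YieldNow(bool);
impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Runs all tasks to completion on one thread. The next task to poll is
/// chosen by a simple linear congruential generator seeded with `seed`.
fn run_seeded(mut tasks: Vec<Task>, mut seed: u64) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        seed = seed
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let idx = ((seed >> 33) as usize) % tasks.len();
        if tasks[idx].as_mut().poll(&mut cx).is_ready() {
            drop(tasks.swap_remove(idx));
        }
    }
}

/// Hypothetical workload: logs three steps, yielding after each one.
fn worker(log: Rc<RefCell<Vec<String>>>, name: &'static str) -> Task {
    Box::pin(async move {
        for step in 0..3 {
            log.borrow_mut().push(format!("{name}{step}"));
            YieldNow(false).await;
        }
    })
}

/// Returns the order in which two workers' steps ran for a given seed.
pub fn interleaving(seed: u64) -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let tasks = vec![worker(log.clone(), "a"), worker(log.clone(), "b")];
    run_seeded(tasks, seed);
    let out = log.borrow().clone();
    out
}
```

A test harness built on this idea would loop over many seeds, check its invariants after each run, and report the failing seed. A real adversarial scheduler would choose interleavings far more cleverly than a plain random pick.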
We're continuing to advance the state of the art in testing distributed and concurrent systems at Convex.
While Rust does have a steep learning curve, nearly all the engineers we've worked with have experienced a productivity increase from using it. Freedom from type errors and data races is a huge productivity boost on complex projects.
Rust libraries also tend to be extremely high quality, and the build system provided by Cargo makes working on large monorepos a breeze. We confess that we've wasted countless hours dealing with build system anomalies in npm or wrestling with Bazel; that has not been the case at all with Cargo.
One potentially unexpected benefit of programming in Rust is that it's actually made us better prototypers! In the past we have prototyped distributed systems in languages like Python before rewriting in Go for production. Refactoring in Rust is so much easier and has allowed us to evolve prototypes into fully-fledged production systems without a rewrite. Refactoring is usually as easy as making a change and then chasing the tail of compiler errors until it's complete. Rust has the most helpful compiler suggestions of any language we've used.
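A small illustration of "chasing the tail of compiler errors", using a made-up `Backend` enum rather than anything from our codebases: because `match` must be exhaustive, adding a variant turns every place that needs updating into a compile error.

```rust
/// Hypothetical storage backends for a prototype.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Memory,
    Disk,
    // Adding a variant here (say, `Remote`) makes every `match` without
    // a wildcard arm fail to compile until it handles the new case.
}

/// Short label for a backend. There is deliberately no `_ =>` arm, so the
/// compiler will flag this function when `Backend` grows.
pub fn label(backend: Backend) -> &'static str {
    match backend {
        Backend::Memory => "mem",
        Backend::Disk => "disk",
    }
}
```

Leaving out the catch-all arm is the design choice that matters here: a `_ => ...` arm would silently absorb new variants and hide exactly the call sites a refactor needs to visit.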
The fact that Rust doesn't depend on a language runtime makes it an excellent choice for code that needs to be embedded in other systems. For example, Unicode normalization is a surprisingly complex and error-prone process. When we needed to implement identical normalization algorithms across Python, Go and Rust we contributed to the open-source Rust implementation and embedded the same code in all three languages.
The "best tool for the job"?
One of the debates that often occurs at larger companies is the desire for consistency vs. just letting engineers use the best tool for the job. It's easy to want to use the language that makes you personally more productive or to jump on board with the hot new thing. Unfortunately this often comes at an organizational cost. Local decisions can have broad implications in engineering, and typically your code lives on long after you're gone. Maintainability is also usually more important than the initial development effort on a large project. This tension has been a source of debate amongst our team on previous projects.
Rust is a hard language to learn. In our experience developers can usually jump between Python, Go, Java, etc. and pattern-match their way to success even if they're not fluent in the language. The first time a developer runs afoul of the borrow checker or has to reason about trait objects, however, they need some dedicated learning time.
A tempting workaround for the onboarding problem is to spin off a small sub-team of budding Rust experts to work on mission critical components that require Rust levels of performance or type safety. This was a mistake we made on the Magic Pocket project, where a handful of engineers rewrote the storage engine in Rust while the rest of the team continued building the rest of the system in Go. This split was necessary when the Rust components were a small skunkworks project but it persisted for years afterwards.
The Rust components were a victim of their own success: they worked well, didn't require much maintenance, and by the time the original authors left the team to work on Nucleus there weren't many engineers left who were experts in these very large systems. This slowed innovation on the Rust components for a period of time and left the remaining team with mixed feelings about the use of Rust to begin with.
When we started Convex we swore not to repeat this mistake and to ensure that all engineers work on Rust and develop mastery of the language.
Library support as organizational burden
With Nucleus, the next major project at Dropbox in Rust, we made sure that the entire team was fluent and actively contributing to the codebase. Within the Nucleus team, Rust was an unequivocal success. The desktop client is relatively isolated from the backend codebase and the development team were already used to interoperating with multiple languages.
On the boundaries of the project things were slightly more complicated. Large companies have hundreds of internal systems that talk to each other: monitoring, RPC frameworks, release process, authentication, etc. Engineers need to port these existing subsystems to every new supported language. Nucleus's prototype API server, Tomahawk, was originally written in Rust. The Nucleus team took on the burden of implementing support for internal systems within Tomahawk themselves. Eventually, it was just too much work to port the entire ecosystem, so we rewrote Tomahawk in Go, the de facto backend language at the company. Sometimes the barrier to migration is just too high.
At Convex we have the luxury of making things clean from the beginning: all backend code is in Rust; all frontend code is in TypeScript; operational tooling is primarily in Python. As a startup we need to focus on getting stuff done, not supporting a menagerie of languages.
More wood behind fewer arrows
Perhaps one of the biggest downsides of introducing new languages isn't that it requires more work on the new codebases but that it detracts from investment in existing codebases. We switched Dropbox's backend stack from Python to Go and later switched some high profile projects from Go to Rust. Each time we gained a lot in terms of performance, type safety and productivity. As senior engineers switched to these exciting new projects, however, there was less cutting-edge work done on the existing codebases and thus less continued library investment.
When one team has feature requests for another it's also very common just for them to jump in and make the change themselves. It's often far faster to coordinate changes via a simple code review rather than a series of Jira tasks, plus it's only fair that the team that wants the feature invests the time to make it happen. This is a major challenge when there's a (programming) language barrier between teams.
Startups can't afford to erect artificial barriers between teams. We've made a conscious effort at Convex to make the codebase as approachable as possible for all developers and to ensure that all developers have sufficient expertise to work wherever that work is required.
Getting Rusty without getting rusty
We're big Rust advocates and have benefitted a lot from the language, but introducing a new language into an organization has to be done carefully.
For new teams
- First decide if you actually want to use Rust. Are you going to benefit from type safety and performance? Awesome. Are you primarily a front-end shop with an IO-bound workload that will perform roughly the same regardless? Maybe you're better off with a garbage-collected language like Go, Java or Kotlin.
- Accept that there'll be ramp-up time. Regardless of what people tell you, a language like Rust takes considerable time to become productive in. Ensure you already have some experts on the team who can help ramp others up rather than slowing down the whole team simultaneously. We have seen projects go poorly when there wasn't an existing level of Rust expertise to serve as a guide.
- Decide how many languages you'll have and own the responsibility of supporting them for a long time. Be cognizant of the overhead in maintaining libraries and operational systems in multiple languages.
- Be willing to say no. Sometimes the right solution is just to make do with what you've already got.
For existing teams
- Decide if a rewrite or new language is merited. Would it be sufficient just to refactor your existing codebase? Have you identified the hotspots that actually matter and tried optimizing them? Are you confident in the benefit to customers and developers from the changes you want to make?
- Start on a smaller sub-project that is off the critical development path. This will allow you the flexibility to slip on timelines and adapt to unexpected interoperability issues.
- Once the smaller project is becoming successful, make sure the entire team is working on it — you don't want expertise siloed amongst a handful of engineers.
- Commit to propagating the language more widely throughout the company if things go well.
- Take on personal responsibility for building support from neighboring systems. It's probably not ok just to tell the Monitoring team to build a whole new set of client bindings for you — you may have to bootstrap this effort yourself.
- Ensure continual development on core systems. Core systems need a team of engineers who are ramped up and able to innovate or intervene when things go wrong. There are always improvements to be made, even when there aren't pressing feature requirements.
Get on board
We love Rust and hope you benefit as much as we have from the language. We also hope that if you jump on board you do so wholeheartedly, as a concerted effort within your project or company. If you're a Rust enthusiast who wants to do things "the right way" and have fun building mission critical systems, you know where to come.