Many years ago we wrote about our transition from a monolith to microservices, and how we enabled the scalability of Mixmax by developing and deploying a number of microservices for different areas of our product.
Almost one year ago, we wrote another article about how those same services have grown both in number and size over the last four years. Slowly but steadily, they transformed our local environment into something almost impossible to run.
In the wake of our growth, we once again found ourselves in need of a new solution. For our second post in our twelve-day blog Advent series, we'll discuss how we improved our local development environment and the alternatives we considered along the way.
Our Local Environment
Running Mixmax one hundred percent locally was neither an easy nor a lightweight task. Doing so involved running many different datastores (Mongo, Elasticsearch, Redis) and services that not only communicate with each other but also interact with a Chrome extension.
We also use Docker to run most of our third-party dependencies, so we don’t need to deal with different OS scenarios. And we use Supervisor to run all of our services, which ensures they are always up and running and provides a simple interface to start/stop each process as we need it. We also have hot-reload in place.
Mixmax services running
“I Can’t Open a New Tab in Chrome”
And that’s as far as we got before it became clear that we needed a change. At this point, our services had increased so greatly in number and size that most of the time, our CPU and memory usage would look something like this:
Saturated resources from an engineer’s laptop
We found that our laptops couldn’t handle basic daily tasks, like screen-sharing over a call or running npm install. And our engineers were impacted the most. Instead of using their time to add value to Mixmax, they were wasting hours of their focus and engineering superpowers on trivial issues.
So, what did we do over the course of two weeks to improve our situation (and the mood of our Slack messages)?
Spoiler alert: We went remote.
To address the growing concerns about the condition of our local environment, we opened a discussion within the entire Mixmax team and listened to everyone’s opinions. We also launched a survey asking the question, “What does a great development environment look like?” Using the input from the team, we set our goals for our improved environment to be:
- Easy to set up once, with low maintenance afterward
- Lightweight, without using significant resources from our machines
- As close to production as possible, and
- Fast to load changes (hot-reload)
Our four goals pointed us toward a remote dev environment. Following this lead, we came up with two possible solutions.
Solution #1: Telepresence
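Telepresence is an open-source tool that lets a service running on a local machine behave as if it were running inside a remote Kubernetes cluster. With it, we could deploy our whole development stack on a Kubernetes cluster (for example, in EKS). Then, we could run one specific service in our local environment, which would replace the corresponding Kubernetes service in the cluster.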
Telepresence proxy intercepting and routing traffic to the local destination
We’ve since run a few tests with dummy services, and this solution works really well. Using Telepresence would probably have been the most scalable and longest-lasting solution for us (and we’ll probably use it at some point in the future). But we had a couple of problems with it:
- We currently use Fargate to deploy and run our services. So, we’d have to face a significant Kubernetes learning curve to use Telepresence, even if only for our dev environment.
- We didn’t have a lot of time. This was a pressing matter, and having to modify every service (at least to some degree) to work on a Kubernetes cluster was something that would definitely take longer than the time we could spend.
Solution #2: dev-sync (or rsync++)
Our second (and winning) idea was to replicate the same setup we were already using, but on a cloud server. We liked this idea because we could implement it quickly and it would immediately relieve our engineers.
So we launched a new instance, installed every dependency, and prepared the environment. Then we modified our local /etc/hosts file to point every record to the IP of the new server, and voilà! Our solution was working. But, of course, it came with other things to consider, like hot-reloading the services when we updated the code locally, and installing and updating dependencies.
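The /etc/hosts step can be sketched in a few lines of Python. This is a minimal illustration, not our actual tooling: the hostnames, the IP address (taken from a documentation-only range), and the `# dev-remote` marker are all hypothetical. The marker is an assumption that makes it possible to find and replace the managed entries later.

```python
from typing import Iterable

# Hypothetical tag appended to every managed entry so it can be replaced later.
MARKER = "# dev-remote"


def point_hosts_at(hosts_text: str, ip: str, hostnames: Iterable[str]) -> str:
    """Return hosts-file content with `hostnames` mapped to `ip`.

    Entries previously written by this function (tagged with MARKER) are
    dropped first, so calling it again with a new IP replaces them.
    """
    kept = [
        line for line in hosts_text.splitlines()
        if not line.rstrip().endswith(MARKER)
    ]
    kept.extend(f"{ip}\t{name}\t{MARKER}" for name in hostnames)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    original = "127.0.0.1\tlocalhost\n"
    # Hypothetical development hostnames and server IP.
    print(point_hosts_at(original, "203.0.113.10", ["app.dev.example", "api.dev.example"]))
```

The function only transforms text. Writing the result back to /etc/hosts requires administrator privileges, so a real tool would need to handle that step separately.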
To handle these matters, we decided to build a tool. We called it “dev-sync” and based it on rsync, a well-known tool for synchronizing files between two paths. rsync also works with remote locations over SSH.
Today dev-sync watches our local files for modifications and triggers rsync when it sees one. We decided to ignore changes in node_modules to keep network traffic as low as possible. Instead, dev-sync detects changes in the package-lock.json files and runs npm install on the server when needed.
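The core of that logic can be sketched as follows. This is not the dev-sync source, only an illustration under stated assumptions: the function names are hypothetical, change detection is done by hashing lockfiles (the real tool may do it differently), and the rsync options shown are one reasonable choice rather than the ones we necessarily use.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_rsync_command(local_dir: str, remote: str, remote_dir: str) -> List[str]:
    """Build an rsync-over-SSH command that skips node_modules."""
    return [
        "rsync",
        "-az",                        # archive mode, compress in transit
        "--delete",                   # remove remote files that were deleted locally
        "--exclude", "node_modules/",
        "-e", "ssh",
        local_dir.rstrip("/") + "/",  # trailing slash: copy the contents, not the directory
        f"{remote}:{remote_dir}",
    ]


def lockfile_digests(root: Path) -> Dict[str, str]:
    """Map each package-lock.json under `root` (outside node_modules) to a content hash."""
    digests = {}
    for path in root.rglob("package-lock.json"):
        if "node_modules" in path.parts:
            continue
        digests[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def services_needing_install(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the directories whose lockfile is new or has changed."""
    return sorted(
        str(Path(name).parent)
        for name, digest in after.items()
        if before.get(name) != digest
    )
```

A watcher built on these helpers would take a snapshot with `lockfile_digests`, run the rsync command (for example with `subprocess.run(..., check=True)`) whenever a file changes, and then run npm install over SSH in each directory returned by `services_needing_install`.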
And that’s it! We now have our development environment running on an EC2 instance, and:
- It’s a “known” environment, very similar to the one we were using locally
- Each engineer has an isolated and private environment
- It lets us “share demos” by pointing /etc/hosts records to another engineer’s IP, so we can access each other’s dev environments
- It’s easy to replicate for new engineers joining the team (through an EC2 AMI)
- We can keep the environment up and running without saturating our local resources, and
- It works fast.
Status of the same laptop’s resources, mostly free
Room for Improvement
Even though we successfully implemented a new environment on a cloud server, we are (of course) far from a perfect solution. We’re already working on a few things to improve it, including:
- Cost optimization, by automatically shutting down the instance when not used
- Replacing our process manager with a more powerful one (like PM2), and
- Designing a next-level solution that would scale horizontally instead of vertically.
Overall, we’re in a much better position today than we were before. Most importantly, our engineers are happy with our solution. Instead of fighting with their local environment, they can now focus on what they do best—adding value to Mixmax—as we continually evolve to meet the next challenge.
Want to help us build our next-generation dev environment? Check out our open positions!