December 2, 2020
This is the second post of twelve for Mixmax Advent 2020.
As you may have picked up from previous blog posts, MongoDB is our primary data store and source of truth at Mixmax.
At its peak, our largest cluster had 8 shards and stored 6.9TB of uncompressed document data, and over 2.7TB of indexes. As you can imagine, running this cluster wasn’t cheap. To make matters worse, DB writes took over 200ms on average.
We noticed that all of our largest indexes were compound indexes that started with a
userIds field, followed by an
organizationId field. As the name suggests,
userIds was an array of user IDs used to denote the set of users who had access to a document, and
organizationId was the organization that the document belonged to. An implicit relationship was that all users referenced in the
userIds field were members of the organization referenced by organizationId.
As Mixmax has grown, so have our customers. What were once 10-20 user organizations were now 100-200. This meant that for a given index, a single document could be responsible for hundreds of index keys. Coupled with the fact that organizations with more users tend to have proportionately more documents overall, it was obvious that this architecture couldn’t scale at a reasonable cost.
Looking closer at the data, we saw that in more than 99% of all documents, the
userIds array contained more than 90% of the organization’s users. We realized then that we weren’t getting much value out of the leading
userIds field; it was marginally more selective than the
organizationId field at best, and equally selective at worst.
Our path forward was clear, if a bit tedious: we’d have to replace all 16 of our indexes that lead with the userIds field. To keep our total index size as small as possible throughout the process, we went through the indexes in ascending order of size, taking the following steps:
We knew that the leading
userIds field was responsible for a lot of our total index size, but weren’t sure how much. We were floored by the results: removing the
userIds field resulted in indexes anywhere from 80 to 95 percent smaller (the percentage decrease increased linearly with the size of the original index.)
If a field in a compound index isn’t used for sorting, and doesn’t significantly reduce the number of documents scanned in production queries, it might be taking up more RAM than it’s worth. Especially if it’s an array field.
Serially waiting for multi-day index builds to finish wasn’t all that exciting, but seeing folks’ reactions to the resulting changes in index size and costs sure was.
As your product evolves, so will your database usage. It’s great if the design decisions you made years ago are still the right call, and it’s worth finding out. What you don’t know could be costing you thousands of dollars a month.