MongoDB
Introduction to MongoDB indexes
Introduction
Much like an encyclopedia, a database is a bountiful store of accessible information. To answer find a specific piece of information in an encyclopedia would require sifting through every page until you come across what you are looking for. This inefficiency is why an encyclopedia has an index which points you to the exact page you need to turn to for the information you are looking for.
A database index similarly points you in the right place for information more efficiently. In MongoDB, an un-indexed query that "searches through the whole book" is called a collection scan.
Indexes can be thought of as shortcuts for accessing your data so that the entire database does not need to be scanned to find what you are looking for. In this article, we are going to introduce indexes in MongoDB, discuss when to use them, and how to manage them.
If you're using MongoDB, checkout Prisma's MongoDB connector! You can use the Prisma Client to manage production MongoDB databases with confidence.
To get started working with MongoDB and Prisma, checkout our getting started from scratch guide or how to add to an existing project.
When should you use an index
Keeping with our encyclopedia analogy, one might think to have an index line for every word in the book. If it is always faster to use an index, then this seems beneficial. However, as you can imagine, the more lines of words in an index, the larger the book becomes. At some point, the size of the book needed to accommodate indexing every word becomes ineffective. There are many words such as "the" or "because" that are less useful to search for than "hippopotamus".
This is similar to an index in MongoDB and databases in general. While yes, it is faster to have an index for any data a query might use, there simply is data that doesn't need indexing. Like the size of a book, adding too many indexes to a database also takes up space and has a detrimental effect on write operations on the database if not kept in check.
Indexes are an incredibly useful way to optimize access to specific data that is often used as selection criteria in queries. Knowing when to use them is important, so making sure that you are adding them on database fields that are often being queried will keep your reads efficient without negatively impacting database size and write efficiency.
How to create an index
Now that we have an understanding of what indexes are and when to use them, we can get into the method of creating one.
Once you have identified a field that could benefit from indexing, you use the createIndex()
method of MongoDB. The basic syntax is as follows:
db.COLLECTION_NAME.createIndex( { "FIELD_NAME": 1 } )
FIELD_NAME
is the name of the field that you want to create the index on, and 1
dictates an ascending order.
An example use of the method would look something like:
db.mycoll.createIndex( { "country": 1 } )
You can also create an index on multiple field with the createIndex()
method by creating a comma seperated list like the following:
db.COLLECTION_NAME.createIndex( { "FIELD_NAME_1": 1, "FIELD_NAME_2": -1 } )
How to show indexes
Once you have started creating indexes, you may want to check what indexes exist on your database instances. In MongoDB, you can use the getIndexes()
method to return descriptions of all the indexes in a collection.
The basic syntax for seeing all of your collection's indexes is:
db.COLLECTION_NAME.getIndexes()
Using the previous example from when we created an index, the following displays the method and what it would return.
db.mycoll.getIndexes()
with a return of:
[{"v" : 2,"key" : {"country" : 1},"name" : "country"}]
Index information includes the keys and options used to create the index.
If you're interested in using full text indexes with MongoDB, Prisma has the fullTextIndex
Preview feature which allows you to easily migrate full text indexes into a Prisma schema for type safety and protection against validation errors.
How to understand index performance
Now with the ability to create and check what indexes exist on your collection, you will want to see if your indexes are performing the way you expect.
To begin the example, we will work with the sample_mflix
database and the comments
collection which has ~50.3k documents. This is a sample collecton provided by MongoDB University simulating a datastore of movie and tv comments.
To understand the performance of an index, we will want to first run a query that does not have an index. The following query will return all 273 documents in the collection that are comments made by Ramsay Bolton
:
db.comments.find( { "name" : "Ramsay Bolton" } )
Now, if we append MongoDB's explain plan to the query, we will see the performance of this query.
db.comments.find( { "name" : "Ramsay Bolton" } ).explain("executionStats")
This results in the following:
{queryPlanner: {plannerVersion: 1,namespace: 'sample_mflix.comments',indexFilterSet: false,parsedQuery: { name: { '$eq': 'Ramsay Bolton' } },winningPlan: {stage: 'COLLSCAN',filter: { name: { '$eq': 'Ramsay Bolton' } },direction: 'forward'},rejectedPlans: []},executionStats: {executionSuccess: true,nReturned: 273,executionTimeMillis: 23,totalKeysExamined: 0,totalDocsExamined: 50303,executionStages: {stage: 'COLLSCAN',filter: { name: { '$eq': 'Ramsay Bolton' } },nReturned: 273,executionTimeMillisEstimate: 6,works: 50305,advanced: 273,needTime: 50031,needYield: 0,saveState: 50,restoreState: 50,isEOF: 1,direction: 'forward',docsExamined: 50303}}}
There are several key results to focus on in this output. First, we can see in the winningPlan
that the stage
for this query is a COLLSCAN
. This means that a collection scan occurred to complete this query with a totalDocsExamined
of 50,303 and executionTimeMillis
of 23 milliseconds. The query had to examine every document in the collection and took 23ms even though nReturned
was only 273 documents. While 23ms may not sound like much, this can get much longer for a collection housing one million documents.
If querying on name
is going to be a recurrent pattern for the application accessing this collection, we may want to create an index on this field. To do so we write the following:
db.comments.createIndex( {"name":1} )
If we take the same query with the explain plan from before:
db.comments.find( { "name" : "Ramsay Bolton" } ).explain("executionStats")
{queryPlanner: {plannerVersion: 1,namespace: 'sample_mflix.comments',indexFilterSet: false,parsedQuery: { name: { '$eq': 'Ramsay Bolton' } },winningPlan: {stage: 'FETCH',inputStage: {stage: 'IXSCAN',keyPattern: { name: 1 },indexName: 'name_1',isMultiKey: false,multiKeyPaths: { name: [] },isUnique: false,isSparse: false,isPartial: false,indexVersion: 2,direction: 'forward',indexBounds: { name: [ '["Ramsay Bolton", "Ramsay Bolton"]' ] }}},rejectedPlans: []},executionStats: {executionSuccess: true,nReturned: 273,executionTimeMillis: 0,totalKeysExamined: 273,totalDocsExamined: 273,executionStages: {stage: 'FETCH',nReturned: 273,executionTimeMillisEstimate: 0,works: 274,advanced: 273,needTime: 0,needYield: 0,saveState: 0,restoreState: 0,isEOF: 1,docsExamined: 273,alreadyHasObj: 0,inputStage: {stage: 'IXSCAN',nReturned: 273,executionTimeMillisEstimate: 0,works: 274,advanced: 273,needTime: 0,needYield: 0,saveState: 0,restoreState: 0,isEOF: 1,keyPattern: { name: 1 },indexName: 'name_1',isMultiKey: false,multiKeyPaths: { name: [] },isUnique: false,isSparse: false,isPartial: false,indexVersion: 2,direction: 'forward',indexBounds: { name: [ '["Ramsay Bolton", "Ramsay Bolton"]' ] },keysExamined: 273,seeks: 1,dupsTested: 0,dupsDropped: 0}}}}
Comparing to the un-indexed query, we now see that the winningPlan.inputstage
is now IXSCAN
which indicates an index was used.
Additionally, we see that totalDocsExamined
is now just the 273 documents where name
is "Ramsay Bolton"
and not the entire 50,303 documents. This increased efficiency can be see especially now with executionTimeMillis
totaling 0ms. Our new index on name
communicated to the query exactly where to look to find the data it was looking for.
Analyzing the explain plan on your most important queries will show you the performance of your index or highlight when one needs to be created to increase the efficiency of your application.
How to drop an index
While the explain plan may show the need for an index, it can also do the opposite. For example, if an index is no longer needed or not adding that much improved performance, it may be in your best interest to drop that index to reserve the space or gain some write performance.
To drop an index on your collection, the basic syntax is as follows using the dropIndexes()
method:
db.COLLECTION_NAME.dropIndex( { "FIELD_NAME": 1 } )
If we wanted to drop our country
index example from earlier, we would write the following:
db.mycoll.dropIndex( { "country":1 } )
Conclusion
In this guide, we discussed how efficiently querying your database leads to an improved user experience of your application. Additionally those using the data for analytics or other internal efforts will have faster performance and ease of working with the database. Knowing how to index and how indexes work is key to achieving query efficiency.
We covered the basics of creating, analyzing, and dropping indexes in MongoDB. Knowing these index foundations will provide the right base for continuing on to more advanced indexing with MongoDB.
If you're using MongoDB, checkout Prisma's MongoDB connector! You can use the Prisma Client to manage production MongoDB databases with confidence.
To get started working with MongoDB and Prisma, checkout our getting started from scratch guide or how to add to an existing project.
FAQ
Use a 2d
index for data stored as points on a two-dimensional plane. It is intended for legacy coordinate pairs on older MongoDB versions.
A 2d
index can reference two fields. The first must be the location field. A 2d
compound index constructs queries that select first on the location field, and then filters those results by the additional criteria.
Whether a small or large collection, you will still use the createIndex()
method.
If you are running into issues building an index on a larger collection, then you may want to consider horizontal scaling so that it is more manageable.
MongoDB also recommends a rolling index build approach.
In order to index and embedded object field in MongoDB, you can use dot notation.
For example, if you have an app for tracking books read, then there might be a collection for each user structured like the following:
db.users.insertOne({"first_name": "Alex","last_name": "Emerich","books": {"first_book": {"title": "Flights","author": "Olga Tokarczuk"},"second book": {"title": "The Master and Margarita","author": "Mikhail Bulgakov"},"total": 2}})
In order to create an index on the embedded total
field, write the following statement:
db.users.createIndex( {"books.total": 1 } )
A compound index is a single index structure that holds reference to multiple fields within a collection’s documents.
The basic syntax to create a compound index is as follows:
db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
A unique index ensures that no two rows of a table have duplicate values in the indexed column or columns. In the case of MongoDB, it is a duplicate value in a document’s field or fields.
A non-unique index does not impose this restriction.