MongoDB's Aggregation Framework

No matter how well designed your database schema is, there will always be cases where you need more flexibility. Even with MongoDB's powerful query engine, some complex scenarios demand more. Enter the Aggregation Framework.

The Aggregation Framework provides a structured way to manipulate and process data, allowing you to extract precisely the information you need.

Aggregation Pipelines

At the core of the Aggregation Framework are "pipelines" composed of "stages". While there are many distinct stages available, they can be categorized into four main actions: filtering, transforming, grouping, and sorting.

Each stage is named with a $ prefix followed by the stage type, for example $limit, $sort, or $match.

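The order of these stages matters: each stage receives the documents output by the stage before it, so the same stages in a different order can produce different results.
MongoDB's query optimizer can reorder or merge some stages when doing so does not change the result, but it is still good practice to place $match as early as possible to reduce the number of documents the later stages have to process.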

When constructing your pipeline, start by defining what you want in the result and break it down into stages.

For instance, the first stage might filter documents based on specific conditions, like matching a field to a certain string or checking if a field is greater than or equal to a given value.

Subsequent stages can sort the documents or control which fields are included in the final output.
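
As a rough sketch of the stages just described, the pipeline below filters first, then sorts, then trims the output fields. The cookbook collection is the one used later in this article, but the cuisine, servings, and title fields are hypothetical, made up purely for illustration. The aggregate() call only runs where a db object exists, such as in the mongo shell.

```javascript
// Hypothetical fields: cuisine, servings, and title are assumptions for illustration.
const filterSortTrim = [
  // Filter early so later stages handle fewer documents.
  { $match: { cuisine: "Italian", servings: { $gte: 4 } } },
  // Sort the remaining documents by title, A to Z.
  { $sort: { title: 1 } },
  // Keep only the fields wanted in the final output (_id is included by default).
  { $project: { title: 1, servings: 1 } }
];

// Runs only where `db` is defined (for example, in the mongo shell).
if (typeof db !== "undefined") {
  printjson(db.cookbook.aggregate(filterSortTrim).toArray());
}
```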

Building a Pipeline

Each stage in the aggregation pipeline is specified as an object in an array. You can use the same stage type multiple times if needed. Instead of querying with the find() method, you use the aggregate() method like this:

> db.collection.aggregate([stageOne, stageTwo])

Your pipeline can consist of just one stage, or as many as 1,000, with memory being the primary constraint.
By default, MongoDB limits each stage to 100 megabytes of RAM unless you permit the pipeline to write temporary data to disk (the allowDiskUse option), which is significantly slower.
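
If a stage is expected to exceed that limit, aggregate() accepts an options document as a second argument, and allowDiskUse is the option that permits writing temporary files. A minimal sketch, assuming the cookbook collection used later in this article and a hypothetical title field:

```javascript
// A sort over a large collection is a typical stage that can hit the memory limit.
// The title field is a hypothetical example.
const bigSort = [{ $sort: { title: 1 } }];

// Options document: allow stages to write temporary files to disk.
const aggregateOptions = { allowDiskUse: true };

// Runs only where `db` is defined (for example, in the mongo shell).
if (typeof db !== "undefined") {
  printjson(db.cookbook.aggregate(bigSort, aggregateOptions).toArray());
}
```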

Projecting Aggregated Fields

You may already know the concept of projection in MongoDB, which find() uses to control which fields are returned in query results.

With the Aggregation Framework, you can go beyond this and create new fields by aggregating or composing existing ones using the $project stage.

Consider a scenario where you have recipe documents containing a rating array:

rating: [4, 2, 3, 3, 4, 5, 1, 2]

To obtain the average rating for each recipe, you can combine the $project stage with the $avg operator, which calculates the average of the numbers in the rating array.

The result can be assigned to a new field, avgRating, using a query like this:

> db.cookbook.aggregate([
  {
    "$project": {
      "avgRating": { "$avg": "$rating" }
    }
  }
])

The string "$rating" is a field path: the leading $ tells MongoDB to read the value of each document's rating field rather than treat it as a literal string.

The result includes the _id and a new field, avgRating, which contains the average rating for each document. The results might look like this:

[
  { _id: ObjectId("636821387dd21c28fda4939f"), avgRating: 3.7142857142857144 },
  { _id: ObjectId("636aa92f7dd21c28fda493a0"), avgRating: 3.888888888888889 },
  { _id: ObjectId("636aa94c7dd21c28fda493a1"), avgRating: 4.777777777777778 },
  { _id: ObjectId("636aa9617dd21c28fda493a2"), avgRating: 3.888888888888889 },
  { _id: ObjectId("636aa9707dd21c28fda493a3"), avgRating: 5 },
  { _id: ObjectId("636aa9817dd21c28fda493a4"), avgRating: 4.357142857142857 },
  { _id: ObjectId("636ab56e956f91c56f02f049"), avgRating: null }
]

Note that the pipeline only computes this output; it does not modify the underlying documents.
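
To make the arithmetic concrete, here is a rough plain-JavaScript approximation of what $avg computes for a single document in this pipeline. It is not MongoDB's implementation; averageRating is a hypothetical helper name, and the sketch only assumes that non-numeric values are skipped and that a document with nothing to average yields null, as in the last result above.

```javascript
// Hypothetical helper approximating { $avg: "$rating" } for one document.
function averageRating(doc) {
  // Treat a single value like a one-element array.
  const values = Array.isArray(doc.rating) ? doc.rating : [doc.rating];
  // Non-numeric and missing values are skipped.
  const numbers = values.filter((v) => typeof v === "number");
  if (numbers.length === 0) return null; // nothing to average
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
}

console.log(averageRating({ rating: [4, 2, 3, 3, 4, 5, 1, 2] })); // 3
console.log(averageRating({})); // null
```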

You can then take things a step further by adding more stages (here, to sort the results by average rating, highest first):

> db.cookbook.aggregate([
  {
    "$project": {
      "avgRating": { "$avg": "$rating" }
    }
  },
  { 
    "$sort": { "avgRating": -1 }
  }
])
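
As one possible extension (not part of the original example), a $match stage can drop recipes whose average is null, and a $limit stage can keep only the top three:

```javascript
const topThree = [
  { $project: { avgRating: { $avg: "$rating" } } },
  // Drop documents where there was nothing to average.
  { $match: { avgRating: { $ne: null } } },
  // Highest average first.
  { $sort: { avgRating: -1 } },
  // Keep only the first three documents.
  { $limit: 3 }
];

// Runs only where `db` is defined (for example, in the mongo shell).
if (typeof db !== "undefined") {
  printjson(db.cookbook.aggregate(topThree).toArray());
}
```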

In the shell, quotes around field names are optional in most cases (a name containing a dot, such as a nested path, still needs them). Including them is nonetheless recommended: it keeps the pipeline valid JSON and makes query syntax validation more straightforward.
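
Since the shell evaluates JavaScript, the quoted and unquoted spellings of a key produce the same object, as this small check suggests. Values such as "$rating" are strings and always need their quotes; only the quotes around keys are optional.

```javascript
const quoted = [{ "$project": { "avgRating": { "$avg": "$rating" } } }];
const unquoted = [{ $project: { avgRating: { $avg: "$rating" } } }];

// Both spellings serialize to identical JSON.
console.log(JSON.stringify(quoted) === JSON.stringify(unquoted)); // true
```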

MongoDB's Aggregation Framework lets you manipulate and process data with precision. Once you understand how pipelines are assembled from stages, you'll be better equipped to use its capabilities in your applications.

So, go ahead and start to use the power of aggregation to extract the exact data you need from your MongoDB databases!

MongoDB Aggregation Framework Series

MongoDB Aggregation Framework: $match and $project Stages

Written by Justin Jenkins
