MongoDB Aggregation Framework: $match and $project Stages

In the introductory post we discussed the fundamentals of MongoDB's Aggregation Framework, and how it works by applying a number of stages (or steps) in a pipeline to achieve the desired result.

Unlike a single MongoDB query, an aggregation pipeline touches your data at several points, giving you multiple chances to transform it within a single operation.

Aggregation Framework queries run "server side" (that is, within your database). Intermediate results don't have to travel back and forth between the database and your client, which usually makes your operations faster.

In this post we will discuss arguably the two "workhorse" stages when it comes to filtering documents and their fields: the $match and $project stages.

These versatile stages allow you to cherry-pick documents (and parts of documents) that match specific criteria, so that later stages in your pipeline have less data to process.

Filtering Documents with the $match Stage

The $match stage is akin to a gatekeeper within the MongoDB aggregation pipeline. It lets you decide which documents are allowed to continue their journey through the pipeline and which are left behind. This filtering is done using standard MongoDB queries, making it an incredibly flexible and indispensable tool.

Typically you will want to run $match stages as early as possible in your pipelines to reduce the number of documents the later stages have to work with.
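To make the "filter early" advice concrete, here is a minimal plain-JavaScript analogy. It is not MongoDB code: the sample recipes and the reshape helper are made up for illustration, and the real server may optimize or reorder stages on its own. The point is only that work done after a filter touches fewer documents.

```javascript
// Plain-JavaScript analogy; not MongoDB code. The sample data is made up.
const recipes = [
  { title: "Toast", type: "Breakfast", rating: [5] },
  { title: "Chili", type: "Dinner", rating: [4, 5] },
  { title: "Salad", type: "Lunch", rating: [3] },
  { title: "Omelette", type: "Breakfast", rating: [4, 4, 5] },
];

let reshaped = 0; // counts how many documents the "reshape" step touches

// Hypothetical stand-in for a later, more expensive stage.
function reshape(doc) {
  reshaped += 1;
  return { title: doc.title, ratingCount: doc.rating.length };
}

// Filter first (like an early $match), then reshape only the survivors.
const breakfasts = recipes
  .filter((doc) => doc.type === "Breakfast")
  .map(reshape);

console.log(breakfasts); // two documents: Toast and Omelette
console.log(reshaped);   // 2, not 4: the filtered-out recipes were never reshaped
```

A $match at the very start of a pipeline can also use an index, much like a regular query, which is another reason to put it first.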

Let's start with a simple example. Suppose you have a collection of recipes called "cookbook". Here is an example of what one of those documents might look like:

{
  "_id": {
    "$oid": "636aa9707dd21c28fda493a3"
  },
  "title": "Toast",
  "calories_per_serving": 75,
  "prep_time": 1,
  "cook_time": 4,
  "ingredients": [
    {
      "name": "bread",
      "quantity": {
        "amount": 4,
        "unit": "slice"
      },
      "vegetarian": true
    },
    {
      "name": "butter",
      "quantity": {
        "amount": 2,
        "unit": "tablespoon"
      },
      "vegetarian": true
    }
  ],
  "directions": [
    "Toast bread.",
    "When both sides are an even golden brown, butter one side, care being taken to butter the edges.",
    "Melt butter.",
    "Serve hot."
  ],
  "rating": [
    5
  ],
  "rating_avg": 5,
  "servings": 4,
  "tags": [
    "bread",
    "quick",
    "vegetarian"
  ],
  "type": "Breakfast",
  "vegetarian_option": true
}

Now suppose you want to extract only those recipes that have a rating field. Here's how you can achieve this using the $match stage:

db.cookbook.aggregate([
  {
    "$match": {
      "rating": { "$exists": true }
    }
  }
])

In this example, the $exists operator keeps only recipes that have a rating field and filters out those that lack one.

The result is a cursor over the matching recipes; documents without a rating field are dropped, and the cookbook collection itself is not modified.
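One detail worth knowing is that $exists checks for the presence of the field, not for a useful value: a document whose rating is null still matches. The plain-JavaScript sketch below illustrates this for a top-level field. The fieldExists helper and the sample documents are hypothetical and only mimic the behavior; they are not part of MongoDB.

```javascript
// Plain-JavaScript sketch of { "rating": { "$exists": true } } for a
// top-level field. fieldExists is a hypothetical helper, not a MongoDB API.
function fieldExists(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field);
}

// Made-up sample documents.
const docs = [
  { title: "Toast", rating: [5] },
  { title: "Porridge", rating: null }, // field present, value null
  { title: "Granola" },                // field missing
];

const matched = docs.filter((doc) => fieldExists(doc, "rating"));
console.log(matched.map((doc) => doc.title)); // [ 'Toast', 'Porridge' ]
```

If null ratings should be excluded as well, a condition such as { "rating": { "$ne": null } } is one option, since it rejects both missing and null values.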

Going Deeper: Multiple Criteria Filtering

The $match stage isn't limited to a single criterion. You can combine multiple conditions to refine your document selection. For instance, suppose you want to find "Breakfast" recipes with a vegetarian option and a rating. Here's how you can construct your query:

db.cookbook.aggregate([
  {
    "$match": {
      "rating": { "$exists": true },
      "type": "Breakfast",
      "vegetarian_option": true
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "avgRating": {
        "$round": [{ "$avg": "$rating" }, 2]
      }
    }
  }
])

In this example, we filter recipes that meet three criteria: they must have a rating, belong to the "Breakfast" type, and offer a vegetarian option.
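Conditions listed side by side in one $match object are combined with a logical AND. As a small sketch, the same filter can also be written with an explicit $and operator; the variable names below are just for illustration, and the stage objects are built as plain JavaScript values so the two forms can be compared.

```javascript
// The implicit form used above: conditions listed in one object are ANDed.
const implicitMatch = {
  $match: {
    rating: { $exists: true },
    type: "Breakfast",
    vegetarian_option: true,
  },
};

// The same filter written with an explicit $and.
const explicitMatch = {
  $match: {
    $and: [
      { rating: { $exists: true } },
      { type: "Breakfast" },
      { vegetarian_option: true },
    ],
  },
};

// In mongosh, either stage could be passed as db.cookbook.aggregate([stage]).
console.log(JSON.stringify(explicitMatch, null, 2));
```

The implicit form is shorter and is usually enough. The explicit $and becomes useful mainly when several conditions need to apply to the same field or operator key, which a single object cannot express.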

The pipeline above also contains a second stage, $project, which reshapes the documents that pass the filter.

Shaping Results With the $project Stage

The MongoDB $project stage reshapes the documents passing through the pipeline, typically by specifying which fields to include in or exclude from the output documents and by computing new values from existing fields. The documents stored in the collection are not changed.

The $project stage in the pipeline above does three things:

  1. "_id": 0: This excludes the _id field from the output documents. MongoDB includes _id by default, so it has to be switched off explicitly.

  2. "title": 1: This includes the title field in the output documents. The 1 is only a flag meaning "keep this field"; the field's value is passed through unchanged.

  3. "avgRating": { "$round": [{ "$avg": "$rating" }, 2] }: This creates a new field called avgRating in the output documents.

The value of this field is calculated per document. First, the $avg operator averages the numbers in that document's rating array. Then the $round operator rounds the result to two decimal places. For the Toast recipe above, whose rating array is [5], avgRating is 5.

In short, this $project stage will produce output documents that do not include the _id field, include the title field, and include a new field avgRating that contains the average rating rounded to two decimal places based on the values in the rating field of the input documents.
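To see the arithmetic behind avgRating, here is a plain-JavaScript sketch of what this $project stage does to a single document. It is an approximation under stated assumptions, not MongoDB code: projectRecipe and roundTo are hypothetical helpers, rating is assumed to be an array, and the "Pancakes" document is made up.

```javascript
// Plain-JavaScript sketch of the $project stage above, applied to one document.
// projectRecipe and roundTo are hypothetical helpers, not MongoDB APIs.

// Round to a number of decimal places. Note: Math.round rounds exact halves
// upward, while MongoDB's $round rounds them to the nearest even digit.
function roundTo(value, places) {
  const factor = 10 ** places;
  return Math.round(value * factor) / factor;
}

function projectRecipe(doc) {
  // Assumption: rating is an array; only its numeric entries are averaged.
  const ratings = doc.rating.filter((r) => typeof r === "number");
  // Assumption: mirror $avg, which yields null when nothing numeric is present.
  const avg = ratings.length === 0
    ? null
    : ratings.reduce((sum, r) => sum + r, 0) / ratings.length;
  return {
    title: doc.title,                                 // "title": 1
    avgRating: avg === null ? null : roundTo(avg, 2), // $round of $avg
  };                                                  // _id is simply not copied
}

console.log(projectRecipe({ _id: 1, title: "Toast", rating: [5] }));
// { title: 'Toast', avgRating: 5 }
console.log(projectRecipe({ _id: 2, title: "Pancakes", rating: [5, 4, 4] }));
// { title: 'Pancakes', avgRating: 4.33 }
```

Because the average is computed from each document's own rating array, every output document gets its own avgRating; nothing is averaged across recipes.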

Conclusion

The $match and $project stages are invaluable tools in your MongoDB aggregation arsenal. Together they let you filter documents based on your specific criteria and keep only the fields you need, allowing you to focus on the data that matters most.

Whether you're extracting documents with certain fields, multiple conditions, or anything in between, the $match and $project stages streamline your data pipelines and enhance your MongoDB experience.

Written by Justin Jenkins