MongoDB Embedded vs Reference Model. Transactions. Aggregations.

Updated: at 04:12 PM

Embedded vs Reference Model

MongoDB lets you model your database schema around two data models: the Embedded Document Model and the Reference Document Model.

The model you pick depends on your use case. It is also possible to mix the two models, depending on your application requirements.

Embedded Document Model Example

Imagine an employees collection where we store employee-related data. The department and salary fields are 1:1 relationships, and the addresses are a 1:M relationship:

{
  "_id" : ObjectId(...),
  "name" : "John Smith", 
  "department" : "dev", 
  "salary" : 4000,
  "addresses" : [
    {
      "type" : "office", 
      "city" : "Sofia",
      "street" : "Tsarigradsko Shose 47"
    }, 
    {
      "type" : "home", 
      "city" : "Sofia", 
      "street" : "Boulevard Bulgaria 61"
    }, 
    {
      "type" : "home", 
      "city" : "Plovdiv", 
      "street" : "Hristo Botev 2"
    }
  ]
}

Another example would be an order, which contains an array of the products that a customer ordered.

Create an eshop database with an orders collection, and insert the data below into the orders collection.

[
  {
    "_id": "ORD_001",
    "customer_name": "Ivan Ivanov",
    "address": {
      "city": "Sofia",
      "street": "Kliment Ohridski 8"
    },
    "products": [
      {
        "name": "Lenovo Thinkpad T15 Gen4",
        "price": 1500,
        "qty": 1
      },
      {
        "name": "Samsung Galaxy S25 FE",
        "price": 600,
        "qty": 1
      }
    ]
  }, 
  {
    "_id": "ORD_002",
    "customer_name": "Mihaela Dimitrova",
    "address": {
      "city": "Sofia",
      "street": "Dragan Tsankov 22"
    },
    "products": [
      {
        "name": "Apple Macbook M4 Pro 14 inch",
        "price": 2000,
        "qty": 1
      },
      {
        "name": "Apple AirPods 4 Pro",
        "price": 250,
        "qty": 1
      }
    ]
  }
]


Let’s use the filter below to find the orders in which a “Macbook” was bought:

{ "products.name" : /macbook/i }

Because the product data is stored inside the order document itself, this query needs no join and stays simple.
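To see why the filter matches without a running server, here is a minimal plain-JavaScript sketch of its behavior. It assumes a trimmed in-memory copy of the two orders, and the helper name matchesProductName is hypothetical. It only mimics the rule that a dotted path into an array matches when any element matches; it is not how MongoDB evaluates queries internally.

```javascript
// Hypothetical in-memory copy of the two orders, trimmed to the fields the filter touches.
const orders = [
  {
    _id: "ORD_001",
    products: [{ name: "Lenovo Thinkpad T15 Gen4" }, { name: "Samsung Galaxy S25 FE" }],
  },
  {
    _id: "ORD_002",
    products: [{ name: "Apple Macbook M4 Pro 14 inch" }, { name: "Apple AirPods 4 Pro" }],
  },
];

// Mimics { "products.name": /macbook/i }:
// an order matches if ANY element of its products array has a matching name.
function matchesProductName(order, pattern) {
  return order.products.some((product) => pattern.test(product.name));
}

const matchedIds = orders
  .filter((order) => matchesProductName(order, /macbook/i))
  .map((order) => order._id);

console.log(matchedIds); // [ 'ORD_002' ]
```

Against a real server, the same filter would be passed to find on the orders collection.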

The Embedded Document Model is usually recommended for 1:1 and 1:M relationships, provided that the Many side of the relationship is limited in size. If that side can grow indefinitely (for example, comments in social media), the Reference model is preferred.

The benefits of the Embedded Document Model are:

We deliberately duplicate some of the data in the order: customer_name, address, and each product’s name and price. The products may have different prices tomorrow and the customer may delete their profile on the e-shop, but the e-shop owner still needs the order data as it was at the time of purchase, for legal requirements, statistics, and so on. Hence we use the Embedded model to bind together data that shares the same lifecycle.

Reference Document Model Example

In the Reference Document Model the data is stored in different collections, and documents in one collection refer to documents in other collections by _id, and vice versa. This model is suitable for M:N relationships.

Imagine a social network platform with posts, users, and comments.

{
  "_id" : ObjectId("69dbfcbdc2f9d8e3922a5f0e"),
  "user_id" : "user_id",
  "post_text" : "What is your favourite NoSQL DB?", 
  "comments" : [
    {
      "name" : "Ivan Petrov", 
      "text" : "It depends on the use case."
    }, 
    {
      "name" : "Dimitar Ivanov", 
      "text" : "I enjoy the flexibility of MongoDB."
    }
  ]
}

The comments are embedded within the post document, and so is each commenter's name. However, imagine that a user changes their name. What name would appear in that user's old comments?

This introduces a design decision: should old comments keep the name the user had when the comment was written, or show the user's current name?

In this case the better decision is consistency, i.e. showing the current name, so we use the Reference Document Model. Instead of storing the user’s name directly in the document, we refer to the user by id via user_id:

{
  "_id" : ObjectId(...),
  "user_id" : "user_id",
  "post_text" : "What is your favourite NoSQL DB?", 
  "comments" : [
    {
      "user_id" : ObjectId("69dbfb96c2f9d8e3922a5f08"), 
      "text" : "It depends on the use case."
    }, 
    {
      "user_id" : ObjectId("69dbfb96c2f9d8e3922a5f09"), 
      "text" : "I enjoy the flexibility of MongoDB."
    }
  ]
}

The drawback, however, is that we have to perform an SQL-like JOIN of the posts collection with the users collection in order to show each user’s name when displaying the posts and comments.

Let’s try the example.

Create a database named social with a users collection inside it, and insert the following documents:

[
  {
    "_id" : { "$oid" : "69dbfb96c2f9d8e3922a5f08"},
    "name" : "Ivan Petrov"
  }, 
  {
    "_id" : { "$oid" : "69dbfb96c2f9d8e3922a5f09" },
    "name" : "Dimitar Ivanov"
  }
]

Create a posts collection inside the social database and insert the following data:

{
  "user_id" : "user_id",
  "post_text" : "What is your favourite NoSQL DB?", 
  "comments" : [
    {
      "user_id" : {"$oid" :"69dbfb96c2f9d8e3922a5f08"}, 
      "text" : "It depends on the use case."
    }, 
    {
      "user_id" : {"$oid" : "69dbfb96c2f9d8e3922a5f09"}, 
      "text" : "I enjoy the flexibility of MongoDB."
    }
  ]
}

To perform the JOIN between the two collections, we need an aggregation pipeline with the $lookup stage:

db.posts.aggregate([
  // 1. Break the 'comments' array into individual documents. If a post has 2 comments, it becomes 2 separate documents.
  { $unwind: "$comments" },

  // 2. JOIN: Look up the user details from the 'users' collection based on the user_id inside each individual comment.
  {
    $lookup: {
      from: "users",
      localField: "comments.user_id",
      foreignField: "_id",
      as: "comment_user"
    }
  },

  // 3. Since $lookup returns an array, we "unwind" the comment_user to turn it into a single object.
  { $unwind: "$comment_user" },

  // 4. Inject the user's name from the joined data.
  { $addFields: { "comments.name": "$comment_user.name" } },

  // 5. Group the flattened documents back into the original post structure, pushing the updated comments into an array.
  {
    $group: {
      _id: "$_id",
      user_id: { $first: "$user_id" },
      post_text: { $first: "$post_text" },
      comments: { $push: "$comments" }
    }
  }
])
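To make the five stages concrete, the sketch below reproduces them step by step on in-memory arrays in plain JavaScript. It is only an illustration of what each stage does to the documents, not how the server executes the pipeline. The short string ids (u1, u2, p1) are assumptions standing in for the ObjectId values.

```javascript
// Hypothetical in-memory stand-ins for the two collections; string ids replace ObjectIds.
const users = [
  { _id: "u1", name: "Ivan Petrov" },
  { _id: "u2", name: "Dimitar Ivanov" },
];
const posts = [
  {
    _id: "p1",
    user_id: "user_id",
    post_text: "What is your favourite NoSQL DB?",
    comments: [
      { user_id: "u1", text: "It depends on the use case." },
      { user_id: "u2", text: "I enjoy the flexibility of MongoDB." },
    ],
  },
];

// 1. $unwind "$comments": one document per comment.
const unwound = posts.flatMap((post) =>
  post.comments.map((comment) => ({ ...post, comments: comment }))
);

// 2. $lookup: attach the array of users whose _id equals comments.user_id.
const joined = unwound.map((doc) => ({
  ...doc,
  comment_user: users.filter((user) => user._id === doc.comments.user_id),
}));

// 3. $unwind "$comment_user": one document per matched user
//    (a comment with no matching user is dropped, as $unwind does by default).
const flattened = joined.flatMap((doc) =>
  doc.comment_user.map((user) => ({ ...doc, comment_user: user }))
);

// 4. $addFields: copy the user's name into the comment.
const named = flattened.map((doc) => ({
  ...doc,
  comments: { ...doc.comments, name: doc.comment_user.name },
}));

// 5. $group: rebuild one document per post, pushing the comments back into an array.
const grouped = new Map();
for (const doc of named) {
  if (!grouped.has(doc._id)) {
    grouped.set(doc._id, {
      _id: doc._id,
      user_id: doc.user_id,
      post_text: doc.post_text,
      comments: [],
    });
  }
  grouped.get(doc._id).comments.push(doc.comments);
}
const result = [...grouped.values()];

console.log(JSON.stringify(result, null, 2));
```

The result is a single post whose two comments each carry user_id, text, and the looked-up name.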

Important! Aggregation pipelines with $lookup can be expensive on large datasets. Use them carefully, and make sure the field named in foreignField is indexed (the _id field used here always is).

Conclusion

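When to use Embedded Model: for 1:1 relationships and for 1:M relationships where the Many side is limited in size, and when the embedded data shares the lifecycle of its parent document (as in the orders example).

When to use Reference Model: for M:N relationships, when the Many side can grow indefinitely, and when readers should always see the current data (as with the user's name in comments).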

Common Mistakes:

  1. Embedding unbounded arrays (e.g. millions of comments)
  2. Over-normalizing (too many references → too many joins)
  3. Not considering data lifecycle
  4. Ignoring read vs write patterns

Transactions

A transaction in a database system is a sequence of operations that is treated as one logical unit. Either all operations complete, or all of them fail.

In MongoDB, an operation on a single document is atomic. The Embedded Document Model lets us put related data within the same document, which reduces the need for transactions. However, there are cases where atomic operations must span multiple documents and collections. This is where transactions come into play.

Fortunately, MongoDB supports ACID transactions. ACID stands for Atomicity, Consistency, Isolation and Durability.

When to use Transactions

Transaction example

Imagine the following classic example: we have a bank database with an accounts collection. We want to transfer money from one account to another, which takes two operations:

  1. Take an amount of money from the source account.
  2. Add that amount to the destination account.

var session = db.getMongo().startSession();
session.startTransaction();
// Use the collection handle obtained from the session, so the updates belong to the transaction.
var accounts = session.getDatabase("bank").accounts;

try {
  // 1. Take the money from the source account.
  accounts.updateOne({ username: "borislav" }, { $inc: { balance: -100 } });
  // 2. Add it to the destination account.
  accounts.updateOne({ username: "ivan" }, { $inc: { balance: 100 } });

  // Apply both updates as one unit.
  session.commitTransaction();
} catch (error) {
  print("An error occurred: " + error);
  // Discard every change made inside the transaction.
  session.abortTransaction();
} finally {
  session.endSession();
}
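One detail the transfer above does not cover: updateOne does not raise an error when no document matches the filter, so a mistyped username would debit one account without crediting the other. The sketch below wraps the same steps in a reusable function and aborts in that case. The function name transfer and its parameters are hypothetical; the session argument is assumed to be a mongosh session such as the one returned by db.getMongo().startSession().

```javascript
// Sketch only: transfer() and its parameter names are hypothetical, not part of MongoDB.
// `session` is assumed to be a mongosh session (db.getMongo().startSession()).
function transfer(session, dbName, fromUser, toUser, amount) {
  const accounts = session.getDatabase(dbName).accounts;
  session.startTransaction();
  try {
    const debit = accounts.updateOne(
      { username: fromUser },
      { $inc: { balance: -amount } }
    );
    const credit = accounts.updateOne(
      { username: toUser },
      { $inc: { balance: amount } }
    );

    // updateOne succeeds even when nothing matched, so check explicitly.
    if (debit.matchedCount !== 1 || credit.matchedCount !== 1) {
      throw new Error("Account not found");
    }

    session.commitTransaction();
    return true;
  } catch (error) {
    // Any failure, including the explicit check above, rolls back both updates.
    session.abortTransaction();
    return false;
  }
}
```

A call could look like transfer(db.getMongo().startSession(), "bank", "borislav", "ivan", 100). Ending the session is left to the caller in this sketch.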

Here are the main concepts regarding the MongoDB Transactions:

Note: The transaction operates over a snapshot of the data, taken at the beginning of the transaction. Hence, from the transaction's perspective, the data looks the same throughout its execution. When the transaction tries to modify a document (update/delete/insert), it checks whether that document has been modified by another transaction, or by a write outside a transaction, since the snapshot was taken. If so, an error occurs and the transaction is aborted.
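The rule in the note can be illustrated with a toy in-memory model in plain JavaScript. This is only a sketch of the idea (remember each document's version at the start, refuse to write if it has changed), not MongoDB's actual storage engine. All class and method names below are hypothetical.

```javascript
// Toy model only: NOT MongoDB's implementation, just the rule described in the note.
class WriteConflictError extends Error {}

class ToyStore {
  constructor(docs) {
    // Each document carries a version counter that grows on every applied write.
    this.docs = new Map(docs.map((d) => [d.username, { ...d, version: 0 }]));
  }
  // A write outside any transaction is applied immediately.
  writeOutsideTransaction(username, delta) {
    const doc = this.docs.get(username);
    doc.balance += delta;
    doc.version += 1;
  }
  startTransaction() {
    return new ToyTransaction(this);
  }
}

class ToyTransaction {
  constructor(store) {
    this.store = store;
    // Snapshot: remember every document's version as of the transaction start.
    this.snapshotVersions = new Map(
      [...store.docs].map(([username, doc]) => [username, doc.version])
    );
    this.pending = [];
  }
  inc(username, delta) {
    const doc = this.store.docs.get(username);
    // Conflict check: has anything changed this document since the snapshot?
    if (doc.version !== this.snapshotVersions.get(username)) {
      this.pending = []; // abort: discard the buffered writes
      throw new WriteConflictError("WriteConflict on " + username);
    }
    this.pending.push({ username, delta });
  }
  commit() {
    for (const { username, delta } of this.pending) {
      this.store.writeOutsideTransaction(username, delta);
    }
    this.pending = [];
  }
}

// Usage: debit borislav, then someone else changes ivan before the credit.
const store = new ToyStore([
  { username: "borislav", balance: 1000 },
  { username: "ivan", balance: 300 },
]);
const txn = store.startTransaction();
txn.inc("borislav", -100);
store.writeOutsideTransaction("ivan", 50); // concurrent change to ivan

let outcome = "committed";
try {
  txn.inc("ivan", 100);
  txn.commit();
} catch (error) {
  outcome = error.message;
}

console.log(outcome); // WriteConflict on ivan
console.log(store.docs.get("borislav").balance); // 1000 (the debit was never applied)
```

In this toy model the debit is only buffered, so aborting simply drops it. The point is the version check, which mirrors the conflict detection described above.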

Transaction Demo

To try transactions in MongoDB you need a MongoDB cluster (i.e. a replica set). By default, a standard MongoDB installation consists of a single standalone node, i.e. it is not a cluster/replica set.

Your teacher will give you access to a cloud-based MongoDB cluster where you can try transactions.

Once you are connected to the cluster, create a bank database and insert the following documents into its accounts collection:

[
  {
    "username": "borislav",
    "balance": 1000
  }, 
  {
    "username": "ivan",
    "balance": 300
  }
]

Now execute the statements below in chunks, one chunk at a time. Please change the database name in Chunk 1 according to the account suffix number provided by your teacher: for account student001 the database name is bank001, for account student002 it is bank002, and so on:

//Chunk 1:
var session = db.getMongo().startSession();
session.startTransaction();
var accounts = session.getDatabase("bankXXX").accounts;

//Chunk 2:
accounts.updateOne( { username: "borislav" }, { $inc: { balance: -100 } } );

//Chunk 3: 
accounts.updateOne( { username: "ivan" }, { $inc: { balance: 100 } } );

//Chunk 4:
session.commitTransaction();

Check the results after the transaction. The money transfer should be successful.

Now we will simulate a write conflict during the transaction. The idea is this: we start a transaction and deduct money from borislav. At this point we manually modify the balance of ivan via Compass. Then we continue with the “money transfer” and add the deducted money to ivan (Chunk 3). The steps are:

  1. Execute the operations from Chunks 1 and 2.
  2. Modify the balance of ivan via Compass, giving him some money.
  3. Continue execution with Chunk 3.

You should observe a WriteConflict error.

