Map-Reduce¶
Map-reduce operations can handle complex aggregation tasks. To perform
map-reduce operations, MongoDB provides the mapReduce
command and, in the mongo shell, the
db.collection.mapReduce() wrapper method.
For many simple aggregation tasks, see the aggregation framework.
Map-Reduce Examples¶
This section provides some map-reduce examples in the mongo
shell using the db.collection.mapReduce() method:
For more information on the parameters, see the
db.collection.mapReduce() reference page.
Consider the following map-reduce operations on a collection orders
that contains documents of the following prototype:
Return the Total Price Per Customer Id¶
Perform a map-reduce operation on the orders collection to group the
documents by cust_id and, for each cust_id, calculate the sum of the
price values:
1. Define the map function to process each input document:
   - In the function, this refers to the document that the map-reduce operation is processing.
   - The function maps the price to the cust_id for each document and emits the cust_id and price pair.
2. Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
   - valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
   - The function reduces the valuesPrices array to the sum of its elements.
3. Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function. This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation replaces its contents with the results of this map-reduce operation.
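The steps above can be sketched in plain JavaScript. The sample orders documents below are assumptions for illustration, and the small driver only mimics what mapReduce does internally: collect the emitted pairs, group them by key, and reduce each group.

```javascript
// Hypothetical sample documents standing in for the orders collection.
var orders = [
  { cust_id: "A123", price: 250 },
  { cust_id: "A123", price: 250 },
  { cust_id: "B212", price: 200 }
];

// Map: emit a (cust_id, price) pair for each document.
var mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Reduce: sum the prices grouped under one cust_id.
// (The mongo shell also offers Array.sum(valuesPrices) for this.)
var reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce(function (a, b) { return a + b; }, 0);
};

// Minimal driver mimicking mapReduce: collect emits, group by key, reduce.
var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}
orders.forEach(function (doc) { mapFunction1.call(doc); });

var results = {};
Object.keys(emitted).forEach(function (key) {
  var values = emitted[key];
  // As in MongoDB, reduce is not called for keys with a single value.
  results[key] = values.length === 1 ? values[0] : reduceFunction1(key, values);
});

console.log(results); // { A123: 500, B212: 200 }
```

In the mongo shell itself, the equivalent call is db.orders.mapReduce(mapFunction1, reduceFunction1, { out: "map_reduce_example" }).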
Calculate the Number of Orders, Total Quantity, and Average Quantity Per Item¶
In this example you will perform a map-reduce operation on the orders collection, for
all documents that have an ord_date value
greater than 01/01/2012. The operation groups by
the item.sku field, and for each sku calculates the number of orders and the
total quantity ordered. The operation concludes by calculating the average quantity per
order for each sku value:
1. Define the map function to process each input document:
   - In the function, this refers to the document that the map-reduce operation is processing.
   - For each item, the function associates the sku with a new object value that contains the count of 1 and the item qty for the order, and emits the sku and value pair.
2. Define the corresponding reduce function with two arguments keySKU and valuesCountObjects:
   - valuesCountObjects is an array whose elements are the objects mapped to the grouped keySKU values passed by the map function to the reducer function.
   - The function reduces the valuesCountObjects array to a single object reducedValue that contains the count and the qty fields.
   - In reducedValue, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
3. Define a finalize function with two arguments key and reducedValue. The function modifies the reducedValue object to add a computed field named average and returns the modified object.
4. Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions. This operation uses the query field to select only those documents with ord_date greater than new Date(01/01/2012). It then outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation merges the existing contents with the results of this map-reduce operation.
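A plain-JavaScript sketch of this example follows. The sample documents are assumptions; the query filter is simulated with Array.filter, and the driver again stands in for mapReduce itself.

```javascript
// Hypothetical sample orders; only the first two pass the date filter.
var orders = [
  { ord_date: new Date("2012-02-15"),
    items: [ { sku: "mmm", qty: 5 }, { sku: "nnn", qty: 10 } ] },
  { ord_date: new Date("2012-03-10"),
    items: [ { sku: "mmm", qty: 5 } ] },
  { ord_date: new Date("2011-12-01"),
    items: [ { sku: "nnn", qty: 20 } ] }
];

// Map: one (sku, { count, qty }) pair per order item.
var mapFunction2 = function () {
  for (var idx = 0; idx < this.items.length; idx++) {
    var key = this.items[idx].sku;
    var value = { count: 1, qty: this.items[idx].qty };
    emit(key, value);
  }
};

// Reduce: sum the count and qty fields across the grouped values.
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

// Finalize: derive the average quantity per order.
var finalizeFunction2 = function (key, reducedValue) {
  reducedValue.average = reducedValue.qty / reducedValue.count;
  return reducedValue;
};

// Driver mimicking mapReduce with a query filter on ord_date.
var emitted = {};
function emit(key, value) { (emitted[key] = emitted[key] || []).push(value); }
orders
  .filter(function (doc) { return doc.ord_date > new Date("2012-01-01"); })
  .forEach(function (doc) { mapFunction2.call(doc); });

var results = {};
Object.keys(emitted).forEach(function (key) {
  var values = emitted[key];
  var reduced = values.length === 1 ? values[0] : reduceFunction2(key, values);
  results[key] = finalizeFunction2(key, reduced);
});

console.log(results);
// mmm: { count: 2, qty: 10, average: 5 }, nnn: { count: 1, qty: 10, average: 10 }
```

In the shell, the call takes roughly this shape: db.orders.mapReduce(mapFunction2, reduceFunction2, { out: { merge: "map_reduce_example" }, query: { ord_date: { $gt: new Date('01/01/2012') } }, finalize: finalizeFunction2 }).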
Incremental Map-Reduce¶
If the map-reduce dataset is constantly growing, then rather than performing the map-reduce operation over the entire dataset each time you want to run map-reduce, you may want to perform an incremental map-reduce.
To perform incremental map-reduce:
- Run a map-reduce job over the current collection and output the result to a separate collection.
- When you have more data to process, run subsequent map-reduce jobs with:
  - the query parameter that specifies conditions matching only the new documents, and
  - the out parameter that specifies the reduce action to merge the new results into the existing output collection.
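In plain-JavaScript terms, the reduce output action amounts to re-running the reduce function over the pair of existing and new values for each key. The data and names below are assumptions for illustration.

```javascript
// Reduce function shared by the initial and the incremental job.
var reduceSum = function (key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};

// Hypothetical contents of the existing output collection after the first job.
var output = { A123: 500, B212: 200 };

// Hypothetical reduced results over only the new documents (selected by query).
var newResults = { A123: 50, C001: 75 };

// The reduce output action re-reduces [existing, new] per key;
// keys without an existing value are simply inserted.
Object.keys(newResults).forEach(function (key) {
  output[key] = key in output
    ? reduceSum(key, [ output[key], newResults[key] ])
    : newResults[key];
});

console.log(output); // { A123: 550, B212: 200, C001: 75 }
```

In the shell this corresponds to passing out: { reduce: "outputCollectionName" } along with a query that matches only the new documents.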
Consider the following example where you schedule a map-reduce
operation on a sessions collection to run at the end of each day.
Data Setup¶
The sessions collection contains documents that log users’ sessions
each day, for example:
Initial Map-Reduce of Current Collection¶
Run the first map-reduce operation as follows:
1. Define the map function that maps the userid to an object that contains the fields userid, total_time, count, and avg_time.
2. Define the corresponding reduce function with two arguments key and values to calculate the total time and the count. The key corresponds to the userid, and values is an array whose elements correspond to the individual objects mapped to the userid in the mapFunction.
3. Define the finalize function with two arguments key and reducedValue. The function modifies the reducedValue document to add another field average and returns the modified document.
4. Perform map-reduce on the sessions collection using the mapFunction, the reduceFunction, and the finalizeFunction functions. Output the results to a collection named session_stat. If the session_stat collection already exists, the operation replaces its contents.
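These steps can be sketched in plain JavaScript with assumed sample data, again using a small driver in place of the shell call.

```javascript
// Hypothetical session documents (userid and session length).
var sessions = [
  { userid: "a", ts: new Date("2011-11-03"), length: 95 },
  { userid: "a", ts: new Date("2011-11-04"), length: 110 },
  { userid: "b", ts: new Date("2011-11-03"), length: 120 }
];

// Map: emit an object carrying the running fields for each userid.
var mapFunction = function () {
  var key = this.userid;
  var value = { userid: this.userid, total_time: this.length, count: 1, avg_time: 0 };
  emit(key, value);
};

// Reduce: accumulate total_time and count for a userid.
var reduceFunction = function (key, values) {
  var reducedObject = { userid: key, total_time: 0, count: 0, avg_time: 0 };
  values.forEach(function (value) {
    reducedObject.total_time += value.total_time;
    reducedObject.count += value.count;
  });
  return reducedObject;
};

// Finalize: compute the average session time.
var finalizeFunction = function (key, reducedValue) {
  if (reducedValue.count > 0) {
    reducedValue.avg_time = reducedValue.total_time / reducedValue.count;
  }
  return reducedValue;
};

// Driver standing in for the shell's map-reduce into session_stat.
var emitted = {};
function emit(key, value) { (emitted[key] = emitted[key] || []).push(value); }
sessions.forEach(function (doc) { mapFunction.call(doc); });

var sessionStat = {};
Object.keys(emitted).forEach(function (key) {
  var values = emitted[key];
  var reduced = values.length === 1 ? values[0] : reduceFunction(key, values);
  sessionStat[key] = finalizeFunction(key, reduced);
});

console.log(sessionStat);
// a: avg_time 102.5 over 2 sessions; b: avg_time 120 over 1 session
```

In the mongo shell, the equivalent call is db.sessions.mapReduce(mapFunction, reduceFunction, { out: "session_stat", finalize: finalizeFunction }).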
Subsequent Incremental Map-Reduce¶
Later as the sessions collection grows, you can run additional
map-reduce operations. For example, add new documents to the
sessions collection:
At the end of the day, perform incremental map-reduce on the
sessions collection but use the query field to select only the
new documents. Output the results to the collection session_stat,
but reduce the contents with the results of the incremental
map-reduce:
Temporary Collection¶
The map-reduce operation uses a temporary collection during processing. At completion, the map-reduce operation renames the temporary collection. As a result, you can perform a map-reduce operation periodically with the same target collection name without affecting the intermediate states. Use this mode when generating statistical output collections on a regular basis.
Concurrency¶
The map-reduce operation is composed of many tasks, including:
- reads from the input collection,
- executions of the map function,
- executions of the reduce function,
- writes to the output collection.
These various tasks take the following locks:
The read phase takes a read lock. It yields every 100 documents.
The JavaScript code (i.e. the map, reduce, and finalize functions) is executed in a single thread, taking a JavaScript lock; however, most JavaScript tasks in map-reduce are very short and yield the lock frequently.
The insert into the temporary collection takes a write lock for a single write.
If the output collection does not exist, the creation of the output collection takes a write lock.
If the output collection exists, then the output actions (i.e. merge, replace, reduce) take a write lock.
Although single-threaded, the map-reduce tasks interleave and appear to run in parallel.
Note
The final write lock during post-processing makes the results appear
atomically. However, output actions merge and reduce may
take minutes to process. For the merge and reduce, the
nonAtomic flag is available. See the
db.collection.mapReduce() reference for more information.
Sharded Cluster¶
Sharded Input¶
When using a sharded collection as the input for a map-reduce operation,
mongos will automatically dispatch the map-reduce job to
each shard in parallel. There is no special option
required. mongos will wait for jobs on all shards to
finish.
Sharded Output¶
By default the output collection is not sharded. The process is:
- mongos dispatches a map-reduce finish job to the shard that will store the target collection.
- The target shard pulls results from all other shards, runs a final reduce/finalize operation, and writes to the output collection.
If using the sharded option to the out parameter, MongoDB shards the output using the _id field as the shard key.
Changed in version 2.2.
- If the output collection does not exist, MongoDB creates and shards the collection on the _id field. If the collection is empty, MongoDB creates chunks using the result of the first stage of the map-reduce operation.
- mongos dispatches, in parallel, a map-reduce finish job to every shard that owns a chunk.
- Each shard pulls the results it owns from all other shards, runs a final reduce/finalize, and writes to the output collection.
Note
- During later map-reduce jobs, MongoDB splits chunks as needed.
- Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.
In MongoDB 2.0:
- mongos retrieves the results from each shard, performs a merge sort to order the results, and performs a reduce/finalize as needed. mongos then writes the result to the output collection in sharded mode.
- This model requires only a small amount of memory, even for large datasets.
- Shard chunks are not automatically split during insertion. This requires manual intervention until the chunks are granular and balanced.
Warning
For best results, only use the sharded output options for
mapReduce in version 2.2 or later.
Troubleshooting Map-Reduce Operations¶
You can troubleshoot the map function and the reduce function
in the mongo shell.
Troubleshoot the Map Function¶
You can verify the key and value pairs emitted by the map
function by writing your own emit function.
Consider a collection orders that contains documents of the
following prototype:
1. Define the map function that maps the price to the cust_id for each document and emits the cust_id and price pair.
2. Define the emit function to print the key and value.
3. Invoke the map function with a single document from the orders collection. Verify the key and value pair is as you expected.
4. Invoke the map function with multiple documents from the orders collection. Verify the key and value pairs are as you expected.
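The emit-override technique can be sketched as follows. The sample documents are assumptions; in the shell you would use real documents (e.g. from db.orders.findOne()) and print with tojson, while plain JSON.stringify works in any JavaScript runtime.

```javascript
// Custom emit: record and print each (key, value) pair instead of reducing.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
  console.log("key: " + key + "  value: " + JSON.stringify(value));
}

// The map function under test.
var mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Invoke the map function with a single (hypothetical) document:
var doc = { cust_id: "abc123", ord_date: new Date("2012-10-04"), price: 25 };
mapFunction1.call(doc);

// Invoke it with multiple documents:
var docs = [
  { cust_id: "abc123", price: 250 },
  { cust_id: "xyz987", price: 150 }
];
docs.forEach(function (d) { mapFunction1.call(d); });
```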
Troubleshoot the Reduce Function¶
Confirm Output Type¶
You can test that the reduce function returns a value that is the
same type as the value emitted from the map function.
1. Define a reduceFunction1 function that takes the arguments keyCustId and valuesPrices. valuesPrices is an array of integers.
2. Define a sample array of integers.
3. Invoke reduceFunction1 with myTestValues. Verify that reduceFunction1 returned an integer.
4. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects. valuesCountObjects is an array of documents that contain the two fields count and qty.
5. Define a sample array of documents.
6. Invoke reduceFunction2 with myTestObjects. Verify that reduceFunction2 returned a document with exactly the count and the qty fields.
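A sketch of these checks, with assumed sample values (the mongo shell version of reduceFunction1 can use Array.sum(valuesPrices); plain reduce works everywhere):

```javascript
// reduceFunction1: sums an array of integers.
var reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce(function (a, b) { return a + b; }, 0);
};

var myTestValues = [5, 5, 10];
var result1 = reduceFunction1("myKey", myTestValues);
console.log(result1, typeof result1); // 20 number

// reduceFunction2: sums the count and qty fields of an array of documents.
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

var myTestObjects = [
  { count: 1, qty: 5 },
  { count: 2, qty: 10 },
  { count: 3, qty: 15 }
];
var result2 = reduceFunction2("myKey", myTestObjects);
console.log(result2); // { count: 6, qty: 30 }
```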
Ensure Insensitivity to the Order of Mapped Values¶
The reduce function takes a key and a values array as its
argument. You can test that the result of the reduce function does
not depend on the order of the elements in the values array.
1. Define a sample values1 array and a sample values2 array that differ only in the order of the array elements.
2. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects. valuesCountObjects is an array of documents that contain the two fields count and qty.
3. Invoke reduceFunction2 first with values1 and then with values2. Verify that reduceFunction2 returned the same result.
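Sketched in plain JavaScript (sample values assumed), the order-insensitivity check looks like this:

```javascript
// Two arrays with the same elements in different orders.
var values1 = [
  { count: 1, qty: 5 },
  { count: 2, qty: 10 },
  { count: 3, qty: 15 }
];
var values2 = [
  { count: 3, qty: 15 },
  { count: 1, qty: 5 },
  { count: 2, qty: 10 }
];

// Sums count and qty, so element order cannot affect the result.
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

var resultA = reduceFunction2("myKey", values1);
var resultB = reduceFunction2("myKey", values2);
console.log(resultA, resultB); // both { count: 6, qty: 30 }
```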
Ensure Reduce Function Idempotency¶
Because the map-reduce operation may call the reduce function multiple times
for the same key, the reduce function must return a value of the same type
as the values emitted from the map function. You can test that the reduce
function can process already “reduced” values without affecting the final
value.
1. Define a reduceFunction2 function that takes the arguments keySKU and valuesCountObjects. valuesCountObjects is an array of documents that contain the two fields count and qty.
2. Define a sample key.
3. Define a sample valuesIdempotent array that contains an element that is a call to the reduceFunction2 function.
4. Define a sample values1 array that combines the values passed to reduceFunction2.
5. Invoke reduceFunction2 first with myKey and valuesIdempotent and then with myKey and values1. Verify that reduceFunction2 returned the same result.
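The idempotency check can be sketched as follows (sample values assumed): one input array contains an element that has itself already been through reduce, and the result must match reducing the raw values.

```javascript
// Sums count and qty across the grouped values.
var reduceFunction2 = function (keySKU, valuesCountObjects) {
  var reducedValue = { count: 0, qty: 0 };
  for (var idx = 0; idx < valuesCountObjects.length; idx++) {
    reducedValue.count += valuesCountObjects[idx].count;
    reducedValue.qty += valuesCountObjects[idx].qty;
  }
  return reducedValue;
};

var myKey = "myKey";

// One element is itself the output of a previous reduce pass.
var valuesIdempotent = [
  { count: 1, qty: 5 },
  { count: 2, qty: 10 },
  reduceFunction2(myKey, [ { count: 3, qty: 15 } ])
];

// The same values, never pre-reduced.
var values1 = [
  { count: 1, qty: 5 },
  { count: 2, qty: 10 },
  { count: 3, qty: 15 }
];

var resultA = reduceFunction2(myKey, valuesIdempotent);
var resultB = reduceFunction2(myKey, values1);
console.log(resultA, resultB); // both { count: 6, qty: 30 }
```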