Aggregation Framework Reference
New in version 2.1.0.
The aggregation framework lets you project, process, and control the output of a query without using map-reduce. Aggregation uses a syntax that resembles the syntax and form of “regular” MongoDB queries.
These aggregation operations are all accessible through the aggregate() method. While all examples in this document use this method, aggregate() is merely a wrapper around the aggregate database command. The following prototype aggregation operations are equivalent:
db.people.aggregate( [<pipeline>] )
db.people.aggregate( <pipeline> )
db.runCommand( { aggregate: "people", pipeline: [<pipeline>] } )
These operations perform aggregation routines on the collection named people. <pipeline> is a placeholder for the aggregation pipeline definition. aggregate() accepts the stages of the pipeline (i.e. <pipeline>) either as an array or as separate arguments to the method.
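The equivalence of these calling forms can be modeled in plain JavaScript. The sketch below is a hypothetical illustration, not the mongo shell's source: toAggregateCommand is an invented name, and the two stages are arbitrary examples. It shows how an array of stages and a list of stage arguments both reduce to the same aggregate command document.

```javascript
// Hypothetical helper, not part of MongoDB or the mongo shell: it shows how
// both calling conventions of aggregate() reduce to one command document.
function toAggregateCommand(collectionName, ...args) {
  // aggregate([stageA, stageB]) passes a single array;
  // aggregate(stageA, stageB) passes each stage as its own argument.
  const pipeline =
    args.length === 1 && Array.isArray(args[0]) ? args[0] : args;
  return { aggregate: collectionName, pipeline: pipeline };
}

const stages = [{ $match: { author: "dave" } }, { $limit: 5 }];

const fromArray = toAggregateCommand("people", stages); // like db.people.aggregate(stages)
const fromArgs = toAggregateCommand("people", ...stages); // like db.people.aggregate(stageA, stageB)

// Both produce { aggregate: "people", pipeline: [ ...the two stages... ] },
// the document you would pass to db.runCommand() in the mongo shell.
console.log(JSON.stringify(fromArray) === JSON.stringify(fromArgs)); // true
```

Accepting both forms is a convenience of the method; the command itself always receives a single pipeline array.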
This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior.
See also
Aggregation Framework overview, the Aggregation Framework Documentation Index, and the Aggregation Framework Examples for more information on the aggregation functionality.
Aggregation Operators:
Pipeline
Warning
The pipeline cannot operate on values of the following types:
Binary, Symbol, MinKey, MaxKey, DBRef,
Code, and CodeWScope.
Pipeline operators appear in an array. Conceptually, documents pass through these operators in sequence. All examples in this section assume that the aggregation pipeline begins with a collection named article whose documents contain fields such as title, author, pageViews, a tags array, and an other sub-document with a foo field.
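To try the examples in this section, a document along the following lines can stand in for an article. The field names are the ones the examples use, and posted is an assumed date field; all values are illustrative assumptions rather than data from this reference.

```javascript
// Illustrative article document; every value here is an assumption.
// Field names match the ones used by the examples in this section, and
// posted is an assumed Date-valued field.
const sampleArticle = {
  title: "this is my title",
  author: "bob",
  posted: new Date(Date.UTC(2012, 0, 1)),
  pageViews: 5,
  tags: ["fun", "good", "fun"],
  other: { foo: 5 },
};

// In the mongo shell you could store it with: db.article.insert(sampleArticle)
```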
The current pipeline operators are:
-
$project
Reshapes a document stream by renaming, adding, or removing fields. Also use $project to create computed values or sub-documents. Use $project to:
- Include fields from the original document.
- Insert computed fields.
- Rename fields.
- Create and populate fields that hold sub-documents.
Use $project to quickly select the fields that you want to include or exclude from the response. For example, the following operation includes the title field and the author field in the documents that the aggregation pipeline returns:
db.article.aggregate( { $project : { title : 1 , author : 1 } } );
Note
The _id field is always included by default. You may explicitly exclude _id as follows:
db.article.aggregate( { $project : { _id : 0 , title : 1 , author : 1 } } );
Here, the projection excludes the _id field but includes the title and author fields.
Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the expression operators. Consider the following example:
db.article.aggregate( { $project : { title : 1 , doctoredPageViews : { $add : [ "$pageViews" , 10 ] } } } );
Here, the field doctoredPageViews represents the value of the pageViews field after adding 10 to the original field using the $add operator.
Note
You must enclose the expression that defines the computed field in braces, so that the expression is a valid object.
You may also use $project to rename fields. Consider the following example:
db.article.aggregate( { $project : { title : 1 , page_views : "$pageViews" , bar : "$other.foo" } } );
This operation renames the pageViews field to page_views, and renames the foo field in the other sub-document as the top-level field bar. The field references used for renaming fields are direct expressions and do not use an operator or surrounding braces. All aggregation field references can use dotted paths to refer to fields in nested documents.
Finally, you can use $project to create and populate new sub-documents. Consider the following example that creates a new object-valued field named stats that holds a number of values:
db.article.aggregate( { $project : { title : 1 , stats : { pv : "$pageViews" , foo : "$other.foo" , dpv : { $add : [ "$pageViews" , 10 ] } } } } );
This projection includes the title field and places $project into “inclusive” mode. Then, it creates the stats document with the following fields:
- pv, which includes and renames the pageViews field from the top level of the original documents.
- foo, which includes the value of other.foo from the original documents.
- dpv, which is a computed field that adds 10 to the value of the pageViews field in the original document using the $add aggregation expression.
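The five $project variants above can be written out as plain JavaScript objects, which is the form the mongo shell receives them in. The projectStats function is hypothetical: it only shows, for one assumed input document, the shape of output the stats projection describes, and it is not how the server evaluates projections.

```javascript
// The $project stages described above, written as plain JavaScript objects.
// In the mongo shell, pass any of them to db.article.aggregate(...).
const includeTitleAuthor = { $project: { title: 1, author: 1 } };
const excludeId = { $project: { _id: 0, title: 1, author: 1 } };
const computedField = {
  $project: { title: 1, doctoredPageViews: { $add: ["$pageViews", 10] } },
};
const renamedFields = {
  $project: { title: 1, page_views: "$pageViews", bar: "$other.foo" },
};
const subDocument = {
  $project: {
    title: 1,
    stats: {
      pv: "$pageViews", // include and rename a top-level field
      foo: "$other.foo", // dotted path into a sub-document
      dpv: { $add: ["$pageViews", 10] }, // computed field
    },
  },
};

// Hypothetical plain-JS function producing the same shape as subDocument
// for one input document; MongoDB does not evaluate projections this way.
function projectStats(doc) {
  return {
    _id: doc._id, // _id is included unless explicitly excluded
    title: doc.title,
    stats: { pv: doc.pageViews, foo: doc.other.foo, dpv: doc.pageViews + 10 },
  };
}

const out = projectStats({ _id: 1, title: "t", pageViews: 5, other: { foo: 7 } });
// out is { _id: 1, title: "t", stats: { pv: 5, foo: 7, dpv: 15 } }
```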
-
$match
Provides a query-like interface to filter documents out of the aggregation pipeline. $match drops documents that do not match the condition from the aggregation pipeline, and it passes documents that match along the pipeline unaltered.
The syntax passed to $match is identical to the query syntax. Consider the following prototype form:
db.article.aggregate( { $match : <match-predicate> } );
The following example performs a simple field equality test:
db.article.aggregate( { $match : { author : "dave" } } );
This operation only returns documents where the author field holds the value dave. Consider the following example, which performs a range test:
db.article.aggregate( { $match : { score : { $gt : 50 , $lte : 90 } } } );
Here, only documents whose score field holds a value greater than 50 and less than or equal to 90 pass through the pipeline.
Note
Place $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of later processing. If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne().
Warning
You cannot use $where or geospatial operations in $match queries as part of the aggregation pipeline.
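A minimal sketch of the $match examples as stage objects, with a hypothetical JavaScript predicate (inScoreRange, an invented name) that mirrors the range test so its boundary behavior is easy to check. The score values are assumptions for illustration.

```javascript
// $match stages from the examples above, as plain objects.
const byAuthor = { $match: { author: "dave" } };
const scoreRange = { $match: { score: { $gt: 50, $lte: 90 } } };

// $match placed first, so the query can use an index; in the mongo shell:
// db.article.aggregate(byAuthor, { $project: { title: 1, author: 1 } })
const pipeline = [byAuthor, { $project: { title: 1, author: 1 } }];

// Hypothetical JS predicate that mirrors scoreRange, for illustration only:
// $gt excludes the lower bound, $lte includes the upper bound.
const inScoreRange = (doc) => doc.score > 50 && doc.score <= 90;

const kept = [{ score: 50 }, { score: 51 }, { score: 90 }, { score: 91 }].filter(inScoreRange);
// kept is [{ score: 51 }, { score: 90 }]
```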
-
$limit
Restricts the number of documents that pass through the $limit stage in the pipeline. $limit takes a single numeric (positive whole number) value as a parameter. Once the specified number of documents have passed through the pipeline operator, no more will. Consider the following example:
db.article.aggregate( { $limit : 5 } );
This operation returns only the first 5 documents passed to it by the pipeline. $limit has no effect on the content of the documents it passes.
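The following sketch writes the $limit stage as an object and models its behavior with a hypothetical generator, limit, over an assumed in-memory list of documents. It illustrates the semantics described above and is not how MongoDB implements the stage.

```javascript
// The stage used above; in the mongo shell: db.article.aggregate({ $limit: 5 })
const limitStage = { $limit: 5 };

// Hypothetical generator modeling $limit on an in-memory stream: it yields
// at most n documents and does not modify them.
function* limit(docs, n) {
  let passed = 0;
  for (const doc of docs) {
    if (passed >= n) return; // stop once n documents have passed
    passed += 1;
    yield doc;
  }
}

const docs = [1, 2, 3, 4, 5, 6, 7].map((i) => ({ i }));
const firstFive = [...limit(docs, limitStage.$limit)];
// firstFive.map((d) => d.i) is [1, 2, 3, 4, 5]
```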
-
$skip
Skips over the specified number of documents that pass through the $skip stage in the pipeline before passing all of the remaining input. $skip takes a single numeric (positive whole number) value as a parameter. Once the operation has skipped the specified number of documents, it passes all the remaining documents along the pipeline without alteration. Consider the following example:
db.article.aggregate( { $skip : 5 } );
This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the content of the documents it passes along the pipeline.
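In the same spirit, a sketch of $skip as a stage object and a hypothetical generator, skip, over assumed in-memory documents. It is an illustration of the behavior described above, not MongoDB's implementation.

```javascript
// The stage used above; in the mongo shell: db.article.aggregate({ $skip: 5 })
const skipStage = { $skip: 5 };

// Hypothetical generator modeling $skip: drop the first n documents, then
// pass every remaining document through unchanged.
function* skip(docs, n) {
  let skipped = 0;
  for (const doc of docs) {
    if (skipped < n) {
      skipped += 1;
      continue;
    }
    yield doc;
  }
}

const docs = [1, 2, 3, 4, 5, 6, 7].map((i) => ({ i }));
const rest = [...skip(docs, skipStage.$skip)];
// rest.map((d) => d.i) is [6, 7]

// Stage order matters: $skip followed by $limit returns one "page" of results.
const secondPage = [{ $skip: 5 }, { $limit: 5 }];
```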
-
$unwind
Peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document. Take the following aggregation command:
db.article.aggregate( { $project : { author : 1 , title : 1 , tags : 1 } } , { $unwind : "$tags" } );
Note
The dollar sign (i.e. $) must precede the field specification handed to the $unwind operator.
In the above aggregation, $project selects (inclusively) the author, title, and tags fields, as well as the _id field implicitly. Then the pipeline passes the results of the projection to the $unwind operator, which unwinds the tags field. For a collection that contains one document holding a tags field with an array of 3 items, this operation returns 3 documents.
A single document becomes 3 documents: each document is identical except for the value of the tags field. Each value of tags is one of the values in the original “tags” array.
Note
$unwind has the following behaviors:
- $unwind is most useful in combination with $group.
- You may undo the effects of an unwind operation with the $group pipeline operator.
- If you specify a target field for $unwind that does not exist in an input document, the pipeline ignores the input document and generates no result documents.
- If you specify a target field for $unwind that is not an array, db.collection.aggregate() generates an error.
- If you specify a target field for $unwind that holds an empty array ([]) in an input document, the pipeline ignores the input document and generates no result documents.
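The $unwind example and the behaviors listed in the note can be modeled in plain JavaScript. The unwind function below is a hypothetical sketch that handles only top-level fields, and the three input documents are assumptions chosen to exercise the array, missing-field, and empty-array cases.

```javascript
// The aggregation from the $unwind example above, as plain objects.
const pipeline = [
  { $project: { author: 1, title: 1, tags: 1 } },
  { $unwind: "$tags" },
];

// Hypothetical plain-JS model of $unwind on a top-level field, following the
// behaviors listed above. It is not MongoDB's implementation.
function unwind(docs, fieldPath) {
  if (!fieldPath.startsWith("$")) {
    throw new Error("the field path must start with $");
  }
  const field = fieldPath.slice(1);
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // missing field: no output documents
    if (!Array.isArray(value)) {
      throw new Error(`$unwind: field ${field} is not an array`);
    }
    for (const element of value) {
      out.push({ ...doc, [field]: element }); // empty array: loop never runs
    }
  }
  return out;
}

const docs = [
  { _id: 1, author: "bob", title: "this is my title", tags: ["fun", "good", "fun"] },
  { _id: 2, author: "dave", title: "no tags field" },
  { _id: 3, author: "sam", title: "empty tags", tags: [] },
];
const unwound = unwind(docs, pipeline[1].$unwind);
// unwound.map((d) => d.tags) is ["fun", "good", "fun"], all with _id 1
```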
-
$group
Groups documents together for the purpose of calculating aggregate values based on a collection of documents. In practice, $group supports tasks such as computing the average page views for each page in a website on a daily basis.
The output of $group depends on how you define groups. Begin by specifying an identifier (i.e. an _id field) for the group you’re creating with this pipeline. You can specify a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. Aggregate keys may resemble the following document:
{ _id : { author : "$author" , pageViews : "$pageViews" } }
With the exception of the _id field, $group cannot output nested documents.
Important
The output of $group is not ordered.
Every group expression must specify an _id field. You may specify the _id field as a dotted field path reference, a document with multiple fields enclosed in braces (i.e. { and }), or a constant value.
Consider the following example:
db.article.aggregate( { $group : { _id : "$author" , docsPerAuthor : { $sum : 1 } , viewsPerAuthor : { $sum : "$pageViews" } } } );
This groups by the author field and computes two fields. The first, docsPerAuthor, is a counter field that adds one for each document with a given author field using the $sum function. The viewsPerAuthor field is the sum of all of the pageViews fields in the documents for each group.
Each field defined for $group must use one of the group aggregation functions listed below to generate its composite value.
Warning
The aggregation system currently stores $group operations in memory, which may cause problems when processing a large number of groups.
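A sketch of the $group example as a stage object, followed by a hypothetical in-memory equivalent (groupByAuthor, an invented name) applied to assumed sample documents. It is meant to show what docsPerAuthor and viewsPerAuthor compute, not how the server groups documents.

```javascript
// The $group stage from the example above, as a plain object.
const groupStage = {
  $group: {
    _id: "$author",
    docsPerAuthor: { $sum: 1 },
    viewsPerAuthor: { $sum: "$pageViews" },
  },
};

// Hypothetical in-memory model of groupStage using a Map keyed by author.
// Like $group (per the warning above), it keeps every group in memory at once.
function groupByAuthor(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.author)) {
      groups.set(doc.author, { _id: doc.author, docsPerAuthor: 0, viewsPerAuthor: 0 });
    }
    const group = groups.get(doc.author);
    group.docsPerAuthor += 1; // { $sum: 1 } counts documents
    group.viewsPerAuthor += doc.pageViews; // { $sum: "$pageViews" } adds values
  }
  // $group does not order its output, so callers should not rely on this order.
  return [...groups.values()];
}

const result = groupByAuthor([
  { author: "bob", pageViews: 5 },
  { author: "dave", pageViews: 7 },
  { author: "bob", pageViews: 3 },
]);
```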
-
$sort
The $sort pipeline operator sorts all input documents and returns them to the pipeline in sorted order. Consider the following prototype form:
db.<collection-name>.aggregate( { $sort : { <sort-key> } } );
This sorts the documents in the collection named <collection-name>, according to the key and specification in the { <sort-key> } document.
Specify the sort in a document with a field or fields that you want to sort by and a value of 1 or -1 to specify an ascending or descending sort respectively, as in the following example:
db.users.aggregate( { $sort : { age : -1 , posts : 1 } } );
This operation sorts the documents in the users collection in descending order by the age field and then in ascending order by the value in the posts field.
When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:
- MinKey (internal type)
- Null
- Numbers (ints, longs, doubles)
- Symbol, String
- Object
- Array
- BinData
- ObjectID
- Boolean
- Date, Timestamp
- Regular Expression
- MaxKey (internal type)
Note
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.
Note
The $sort operator cannot begin sorting documents until previous operators in the pipeline have returned all output. The $sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the following aggregation operators:
Warning
Unless the $sort operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents.
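The users example can be sketched as a stage object plus a hypothetical comparator built from the same sort specification. The user documents are assumptions, and the comparator handles only values of a single comparable type, so it ignores the cross-type ordering listed above.

```javascript
// The $sort stage from the example above; in the mongo shell:
// db.users.aggregate({ $sort: { age: -1, posts: 1 } })
const sortStage = { $sort: { age: -1, posts: 1 } };

// Hypothetical comparator built from a sort specification. It only handles
// values of one comparable type (numbers here); real $sort also orders mixed
// BSON types according to the list above.
function comparatorFor(spec) {
  const keys = Object.entries(spec); // fields in specification order
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction; // 1: ascending, -1: descending
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every sort key
  };
}

const users = [
  { name: "ann", age: 30, posts: 5 },
  { name: "ben", age: 40, posts: 2 },
  { name: "cai", age: 30, posts: 1 },
];
const sorted = [...users].sort(comparatorFor(sortStage.$sort));
// sorted.map((u) => u.name) is ["ben", "cai", "ann"]
```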
Expressions
These operators calculate values within the aggregation framework.
$group Operators
The $group pipeline stage provides the following
operations:
-
$addToSet
Returns an array of all the values found in the selected field among the documents in that group. Every unique value appears only once in the result set. There is no ordering guarantee for the elements of the output array.
-
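$first
Returns the first value it encounters for its group. Because this depends on the order of the input documents, $first is most useful when the $group follows a $sort.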
-
$last
Returns the last value it encounters for its group. As with $first, the result depends on the order of the input documents.
-
$max
Returns the highest value among all values of the field in all documents selected by this group.
-
$min
Returns the lowest value among all values of the field in all documents selected by this group.
-
$avg
Returns the average of all the values of the field in all documents selected by this group.
-
$push
Returns an array of all the values found in the selected field among the documents in that group. A value may appear more than once in the result set if more than one document in the group has that value.
-
$sum
Returns the sum of all the values for a specified field in the grouped documents, as in the viewsPerAuthor field of the $group example above.
Alternately, if you specify a value as an argument, $sum increments this field by the specified value for every document in the grouping. Typically, as in the docsPerAuthor field of that example, specify a value of 1 in order to count members of the group.
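The accumulators above can be combined in a single $group stage. The stage object below uses the article field names; the plain-JavaScript summary that follows is a hypothetical illustration of what each accumulator yields for one group's assumed pageViews values, taken in input order.

```javascript
// One $group stage using each accumulator listed above, as a plain object.
const accumulatorStage = {
  $group: {
    _id: "$author",
    titles: { $push: "$title" }, // every value, duplicates kept
    distinctTitles: { $addToSet: "$title" }, // each value once, no order promised
    firstViews: { $first: "$pageViews" },
    lastViews: { $last: "$pageViews" },
    maxViews: { $max: "$pageViews" },
    minViews: { $min: "$pageViews" },
    avgViews: { $avg: "$pageViews" },
    totalViews: { $sum: "$pageViews" },
    count: { $sum: 1 },
  },
};

// Hypothetical plain-JS equivalents, applied to the pageViews values of one
// group in input order.
const values = [5, 3, 5, 8];
const sum = values.reduce((total, v) => total + v, 0);
const summary = {
  push: [...values],
  addToSet: [...new Set(values)], // a JS Set keeps first-seen order; $addToSet need not
  first: values[0],
  last: values[values.length - 1],
  max: Math.max(...values),
  min: Math.min(...values),
  avg: sum / values.length,
  sum: sum,
  count: values.length,
};
// summary.avg is 5.25 and summary.addToSet holds 5, 3, and 8
```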
Boolean Operators
The three boolean operators accept Booleans as arguments and return Booleans as results.
Note
These operators convert non-booleans to Boolean values according to
the BSON standards. Here, null, undefined, and 0 values
become false, while non-zero numeric values, and all other types,
such as strings, dates, objects become true.
-
$and
Takes an array of one or more values and returns true if all of the values in the array are true. Otherwise $and returns false.
Note
$and uses short-circuit logic: the operation stops evaluation after encountering the first false expression.
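A sketch of $and inside a $project stage, followed by a hypothetical model of the Boolean conversion described in the note. toBool and and are invented names, and the model encodes only the stated rule, so treat it as an approximation of the server's behavior.

```javascript
// A $project stage that uses $and with two comparisons, as a plain object.
const stage = {
  $project: {
    title: 1,
    moderatelyPopular: {
      $and: [{ $gt: ["$pageViews", 10] }, { $lte: ["$pageViews", 100] }],
    },
  },
};

// Hypothetical model of the conversion rule in the note above: null,
// undefined, and 0 become false; every other value becomes true. Unlike
// JavaScript truthiness, an empty string therefore counts as true here.
function toBool(value) {
  if (value === null || value === undefined || value === 0) return false;
  if (typeof value === "boolean") return value;
  return true;
}

// Hypothetical short-circuit $and: stop at the first value that converts to false.
function and(values) {
  for (const value of values) {
    if (!toBool(value)) return false;
  }
  return true;
}
```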
Comparison Operators
These operators compare two values and, in most cases, return a Boolean that reflects the result of that comparison.
All comparison operators take an array with a pair of values. You may
compare numbers, strings, and dates. Except for $cmp,
all comparison operators return a Boolean value. $cmp
returns an integer.
-
$cmp
Takes two values in an array and returns an integer. The returned value is:
- A negative number if the first value is less than the second.
- A positive number if the first value is greater than the second.
- 0 if the two values are equal.
-
$eq
Takes two values in an array and returns a boolean. The returned value is:
- true when the values are equivalent.
- false when the values are not equivalent.
-
$gt
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is greater than the second value.
- false when the first value is less than or equal to the second value.
-
$gte
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is greater than or equal to the second value.
- false when the first value is less than the second value.
-
$lt
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is less than the second value.
- false when the first value is greater than or equal to the second value.
-
$lte
Takes two values in an array and returns a boolean. The returned value is:
- true when the first value is less than or equal to the second value.
- false when the first value is greater than the second value.
-
$ne
Takes two values in an array and returns a boolean. The returned value is:
- true when the values are not equivalent.
- false when the values are equivalent.
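A sketch of the comparison operators inside a $project stage, plus a hypothetical plain-JavaScript cmp for two numbers or two strings. The threshold of 100 is an arbitrary assumption; the point is that every Boolean comparison operator corresponds to a test on the sign of $cmp.

```javascript
// Comparison expressions inside a $project stage, as a plain object.
const stage = {
  $project: {
    title: 1,
    viewsVs100: { $cmp: ["$pageViews", 100] }, // negative, 0, or positive
    atLeast100: { $gte: ["$pageViews", 100] }, // true or false
  },
};

// Hypothetical plain-JS model for two numbers or two strings. This reference
// only specifies the sign of $cmp's result; the sketch returns -1, 0, or 1.
function cmp(a, b) {
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Each Boolean comparison operator corresponds to a test on cmp's sign.
const eq = (a, b) => cmp(a, b) === 0;
const ne = (a, b) => cmp(a, b) !== 0;
const gt = (a, b) => cmp(a, b) > 0;
const gte = (a, b) => cmp(a, b) >= 0;
const lt = (a, b) => cmp(a, b) < 0;
const lte = (a, b) => cmp(a, b) <= 0;
```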
Arithmetic Operators
These operators only support numbers.
-
$add
Takes an array of one or more numbers and adds them together, returning the sum.
-
$divide
Takes an array that contains a pair of numbers and returns the value of the first number divided by the second number.
-
$mod
Takes an array that contains a pair of numbers and returns the remainder of the first number divided by the second number.
See also
-
$multiply
Takes an array of one or more numbers and multiplies them, returning the resulting product.
-
$subtract
Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference.
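A sketch showing each arithmetic operator in a $project stage, with its operands passed as an array, and the same arithmetic evaluated in plain JavaScript for an assumed pageViews value of 5. The output field names are illustrative.

```javascript
// Arithmetic expressions inside a $project stage, as a plain object. Each
// operator takes its operands as an array.
const stage = {
  $project: {
    title: 1,
    plusTen: { $add: ["$pageViews", 10] },
    doubled: { $multiply: ["$pageViews", 2] },
    minusOne: { $subtract: ["$pageViews", 1] },
    halved: { $divide: ["$pageViews", 2] },
    remainder: { $mod: ["$pageViews", 2] },
  },
};

// The same arithmetic in plain JavaScript for a document with pageViews = 5.
const pageViews = 5;
const expected = {
  plusTen: pageViews + 10, // 15
  doubled: pageViews * 2, // 10
  minusOne: pageViews - 1, // 4
  halved: pageViews / 2, // 2.5
  remainder: pageViews % 2, // 1
};
```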
String Operators
These operators manipulate strings within projection expressions.
-
$strcasecmp
Takes in two strings and returns a number. $strcasecmp is positive if the first string is “greater than” the second and negative if the first string is “less than” the second. $strcasecmp returns 0 if the strings are identical apart from case.
Note
$strcasecmp may not make sense when applied to glyphs outside the Roman alphabet.
$strcasecmp internally capitalizes strings before comparing them to provide a case-insensitive comparison. Use $cmp for a case-sensitive comparison.
-
$substr
$substr takes a string and two numbers. The first number represents the number of bytes in the string to skip, and the second number specifies the number of bytes to return from the string.
Note
$substr is not encoding aware, and if used improperly it may produce a result string containing an invalid UTF-8 character sequence.
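A sketch of both string operators in a $project stage, with hypothetical JavaScript models. The strcasecmp model uses toUpperCase, which may treat characters outside the Roman alphabet differently from the server, and the substrBytes model (an invented name) uses a Node.js Buffer to make the byte-based offsets of $substr visible, including how a cut can split a multi-byte UTF-8 character.

```javascript
// String expressions inside a $project stage, as a plain object.
const stage = {
  $project: {
    isBob: { $strcasecmp: ["$author", "BOB"] }, // 0 for "bob", "Bob", ...
    titlePrefix: { $substr: ["$title", 0, 4] }, // first 4 bytes
  },
};

// Hypothetical model of $strcasecmp: uppercase both strings, then compare.
// toUpperCase is Unicode-aware, so results may differ from the server for
// text outside the Roman alphabet.
function strcasecmp(a, b) {
  const x = a.toUpperCase();
  const y = b.toUpperCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

// Hypothetical model of $substr: skip and count are measured in UTF-8 bytes.
function substrBytes(str, skip, count) {
  return Buffer.from(str, "utf8").subarray(skip, skip + count);
}

const prefix = substrBytes("this is my title", 0, 4).toString("utf8"); // "this"
// "é" is two bytes (0xc3 0xa9) in UTF-8, so a 1-byte substring cuts it in half:
const broken = substrBytes("é", 0, 1); // a lone 0xc3 byte, not valid UTF-8
```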
Date Operators
All date operators take a “Date” typed value as a single argument and return a number.
-
$dayOfYear
Takes a date and returns the day of the year as a number between 1 and 366.
-
$dayOfMonth
Takes a date and returns the day of the month as a number between 1 and 31.
-
$dayOfWeek
Takes a date and returns the day of the week as a number between 1 (Sunday) and 7 (Saturday).
-
$year
Takes a date and returns the full year.
-
$month
Takes a date and returns the month as a number between 1 and 12.
-
$week
Takes a date and returns the week of the year as a number between 0 and 53.
Weeks begin on Sundays, and week 1 begins with the first Sunday of the year. Days preceding the first Sunday of the year are in week 0. This behavior is the same as the “%U” conversion specifier of the strftime standard library function.
-
$hour
Takes a date and returns the hour between 0 and 23.
-
$minute
Takes a date and returns the minute between 0 and 59.
-
$second
Takes a date and returns the second between 0 and 59, but can be 60 to account for leap seconds.
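A sketch of the date operators in a $project stage, plus hypothetical JavaScript models of $dayOfWeek and $week. The posted field and the sample dates are assumptions; the week function implements the %U rule described above, using UTC because MongoDB stores dates in UTC.

```javascript
// Date expressions inside a $project stage, as a plain object. The posted
// field is an assumed Date-valued field of each article.
const stage = {
  $project: {
    year: { $year: "$posted" },
    month: { $month: "$posted" },
    dayOfMonth: { $dayOfMonth: "$posted" },
    dayOfWeek: { $dayOfWeek: "$posted" },
    week: { $week: "$posted" },
    hour: { $hour: "$posted" },
  },
};

// Hypothetical plain-JS models using the UTC getters of the Date object.
const dayOfWeek = (d) => d.getUTCDay() + 1; // 1 (Sunday) through 7 (Saturday)

// $week follows strftime's %U: week 1 starts on the first Sunday of the year,
// and earlier days fall in week 0.
function week(d) {
  const startOfYear = Date.UTC(d.getUTCFullYear(), 0, 1);
  const dayOfYear0 = Math.floor((d.getTime() - startOfYear) / 86400000); // 0-based
  return Math.floor((dayOfYear0 + 7 - d.getUTCDay()) / 7);
}

week(new Date(Date.UTC(2011, 0, 1))); // 0 (a Saturday, before the first Sunday)
week(new Date(Date.UTC(2011, 0, 2))); // 1 (the first Sunday of 2011)
week(new Date(Date.UTC(2012, 11, 31))); // 53
```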
Conditional Expressions
-
$cond
Use the $cond operator with the following syntax:
{ $cond : [ <boolean-expression> , <true-case> , <false-case> ] }
Takes an array with three expressions, where the first expression evaluates to a Boolean value. If the first expression evaluates to true, $cond returns the value of the second expression. If the first expression evaluates to false, $cond evaluates and returns the third expression.
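A sketch of $cond in a $project stage, with a hypothetical JavaScript model in which the branches are functions, so that in this sketch only the selected branch runs. The field name popularity and the threshold of 100 are assumptions for illustration.

```javascript
// A $cond expression inside a $project stage, as a plain object.
const stage = {
  $project: {
    title: 1,
    popularity: {
      $cond: [{ $gte: ["$pageViews", 100] }, "popular", "niche"],
    },
  },
};

// Hypothetical plain-JS model of $cond. The branches are functions so that
// only the selected one is evaluated.
function cond(test, ifTrue, ifFalse) {
  return test() ? ifTrue() : ifFalse();
}

const label = (doc) =>
  cond(() => doc.pageViews >= 100, () => "popular", () => "niche");
// label({ pageViews: 250 }) is "popular"; label({ pageViews: 5 }) is "niche"
```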