$bucketAuto (aggregation)

MongoDB Manual (Chinese edition)


On this page

Definition
Behavior
Examples

Definition

$bucketAuto

New in version 3.4.

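Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are determined automatically in an attempt to distribute the documents evenly across the specified number of buckets.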

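Each bucket is represented as a document in the output. The document for each bucket contains an _id field, whose value specifies the inclusive lower bound (_id.min) and the exclusive upper bound (_id.max) of the bucket, and a count field that contains the number of documents in the bucket. The upper bound is exclusive for every bucket except the last one, where it is inclusive. The count field is included by default when output is not specified.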
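Below is a minimal sketch of these bound semantics in plain JavaScript. `findBucket` is a hypothetical helper, not a MongoDB or driver API. It scans bucket documents shaped like `$bucketAuto` output and returns the index of the bucket that would hold a value. The prices are copied from the example output on this page as plain numbers rather than `NumberDecimal` values, which is a simplifying assumption.

```javascript
// Hypothetical helper: locate the bucket that contains `value`, given bucket
// documents shaped like $bucketAuto output: { _id: { min, max }, count }.
function findBucket(buckets, value) {
  for (let i = 0; i < buckets.length; i++) {
    const { min, max } = buckets[i]._id;
    const isLast = i === buckets.length - 1;
    // min is inclusive; max is exclusive except for the final bucket
    const belowMax = isLast ? value <= max : value < max;
    if (value >= min && belowMax) return i;
  }
  return -1; // the value lies outside every bucket
}

// Price buckets from the example output on this page, as plain numbers
const priceBuckets = [
  { _id: { min: 76.04, max: 159.0 }, count: 2 },
  { _id: { min: 159.0, max: 199.99 }, count: 2 },
  { _id: { min: 199.99, max: 385.0 }, count: 2 },
  { _id: { min: 385.0, max: 483.0 }, count: 2 },
];

console.log(findBucket(priceBuckets, 159.0)); // 1: a lower bound belongs to its own bucket
console.log(findBucket(priceBuckets, 483.0)); // 3: the last bucket includes its max
console.log(findBucket(priceBuckets, 500)); // -1
```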

The $bucketAuto stage has the following form:

{
  $bucketAuto: {
      groupBy: <expression>,
      buckets: <number>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
      },
      granularity: <string>
  }
}

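Field	Type	Description

groupBy	expression	An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

buckets	integer	A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.

output	document	Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify a field to include, you must use accumulator expressions: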

<outputfield1>: { <accumulator>: <expression1> },
...

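When output is specified, the default count field is not included in the output documents. To include it, explicitly specify a count expression as part of the output document: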

output: {
  <outputfield1>: { <accumulator>: <expression1> },
  ...
  count: { $sum: 1 }
}

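granularity	string	Optional. A string that specifies the preferred number series to use, so that the calculated bucket boundaries fall on preferred round numbers or their powers of 10.

Available only if all groupBy values are numeric and none of them are NaN.

The supported values for granularity are:

`"R5"`, `"R10"`, `"R20"`, `"R40"`, `"R80"`, `"1-2-5"`, `"E6"`, `"E12"`, `"E24"`, `"E48"`, `"E96"`, `"E192"`, `"POWERSOF2"`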
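The rules above can be expressed as a small check: a fixed set of series names, plus numeric, non-NaN groupBy values. `canUseGranularity` is a hypothetical client-side helper, not a MongoDB or driver API, and the server enforces these rules itself. JavaScript numbers stand in for BSON numeric types, so the sketch does not model `NumberDecimal` or other BSON wrappers.

```javascript
// Series names listed above as supported values for granularity
const SUPPORTED_GRANULARITIES = new Set([
  "R5", "R10", "R20", "R40", "R80", "1-2-5",
  "E6", "E12", "E24", "E48", "E96", "E192", "POWERSOF2",
]);

// Hypothetical pre-flight check mirroring the documented rules
function canUseGranularity(granularity, groupByValues) {
  if (!SUPPORTED_GRANULARITIES.has(granularity)) return false;
  // Every groupBy value must be a number, and none may be NaN
  return groupByValues.every((v) => typeof v === "number" && !Number.isNaN(v));
}

console.log(canUseGranularity("R20", [1, 2.5, 100])); // true
console.log(canUseGranularity("E12", [1, NaN])); // false
console.log(canUseGranularity("R3", [1, 2])); // false
```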

Behavior

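There might be fewer than the specified number of buckets if:

The number of input documents is less than the specified number of buckets.
The number of unique values of the groupBy expression is less than the specified number of buckets.
The granularity has fewer intervals than the specified number of buckets.
The granularity is not fine enough to distribute documents evenly into the specified number of buckets.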

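If the groupBy expression refers to an array or document, the values are arranged using the same ordering as $sort before the bucket boundaries are determined.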

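How evenly documents are distributed across buckets depends on the cardinality, or number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not distribute the results evenly across buckets.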
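A deliberately simplified model illustrates the effect of low cardinality. `bucketAutoModel` is hypothetical and was inferred from the outputs on this page rather than taken from MongoDB's source. It follows four steps:

- Sort the values.
- Give each bucket roughly n / buckets of them, rounded.
- Keep extending a bucket while the next value equals its last one.
- Let the final requested bucket take whatever remains.

The model ignores granularity.

```javascript
// Simplified model (an assumption, not MongoDB's implementation) that splits
// numeric groupBy values into buckets without any granularity.
function bucketAutoModel(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  // Target bucket size: n / buckets, rounded, but at least 1
  const size = Math.max(1, Math.round(sorted.length / buckets));
  const result = [];
  let i = 0;
  while (i < sorted.length) {
    const start = i;
    // The last requested bucket absorbs every remaining value
    const isLast = result.length === buckets - 1;
    i = isLast ? sorted.length : Math.min(start + size, sorted.length);
    // Never split a run of equal values across two buckets
    while (i < sorted.length && sorted[i] === sorted[i - 1]) i++;
    // max is the next bucket's min, or the largest value for the final bucket
    const max = i < sorted.length ? sorted[i] : sorted[i - 1];
    result.push({ _id: { min: sorted[start], max }, count: i - start });
  }
  return result;
}

// Low cardinality: four of the six values are equal
console.log(JSON.stringify(bucketAutoModel([1, 1, 1, 1, 2, 3], 3)));
// [{"_id":{"min":1,"max":2},"count":4},{"_id":{"min":2,"max":3},"count":2}]
```

Three buckets are requested, but only two come back, and they are uneven, because the run of four equal values cannot be split.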

Granularity

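$bucketAuto accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series gives more control over where bucket boundaries are placed within the range of values of the groupBy expression. A series can also help place bucket boundaries evenly on a logarithmic scale when the values of the groupBy expression span several orders of magnitude.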

Renard Series

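The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10, then including the various powers of that root that equate to values between 1.0 and 10.0 (10.3 in the case of R80).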

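Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values fall outside the 1.0 to 10.0 range (10.3 for R80).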
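Restricting boundaries to a series that is scaled by powers of 10 amounts to rounding a value up to the next series member. The sketch below makes two assumptions:

- It uses the commonly published rounded R20 values (1.00, 1.12, 1.25, and so on up to 9.00).
- The boundary must be strictly greater than the value.

Together these reproduce the R20 row of the comparison table later on this page. `roundUpToSeries` is a hypothetical name, not a MongoDB API.

```javascript
// Commonly published rounded R20 values between 1.0 and 10.0 (assumed)
const R20 = [1.0, 1.12, 1.25, 1.4, 1.6, 1.8, 2.0, 2.24, 2.5, 2.8,
             3.15, 3.55, 4.0, 4.5, 5.0, 5.6, 6.3, 7.1, 8.0, 9.0];

// Smallest value of the form mantissa * 10^exp that is strictly greater than x
function roundUpToSeries(x, mantissas) {
  if (!(x > 0)) throw new RangeError("x must be a positive number");
  // Decade of x; also scan one decade below and above to absorb
  // any floating-point error in Math.log10
  const k = Math.floor(Math.log10(x));
  for (let exp = k - 1; exp <= k + 1; exp++) {
    const scale = Math.pow(10, exp);
    for (const m of mantissas) {
      const candidate = m * scale;
      if (candidate > x) return candidate;
    }
  }
  throw new Error("no boundary found"); // not reached for x > 0
}

console.log(roundUpToSeries(59, R20)); // 63
console.log(roundUpToSeries(82, R20)); // 90
console.log(roundUpToSeries(99, R20)); // 100
```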

Example

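The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes the various powers of this root (rounded) up to 10. The R5 series is derived as follows: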

10^(0/5) = 1
10^(1/5) = 1.584... ≈ 1.6
10^(2/5) = 2.511... ≈ 2.5
10^(3/5) = 3.981... ≈ 4.0
10^(4/5) = 6.309... ≈ 6.3
10^(5/5) = 10

The same approach is applied to the other Renard series to provide finer granularity, that is, more intervals between 1.0 and 10.0 (10.3 for R80).
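The R5 derivation above can be reproduced numerically. `renardSeries` is a hypothetical helper that takes powers of the n-th root of 10 and rounds them to a given number of significant digits. Rounding to two significant digits happens to match the R5 values listed above. The standardized finer series are not plain roundings, though: R10, for instance, uses 1.25 and 3.15 where this rule would give 1.3 and 3.2. Treat the code as an illustration of the R5 case only.

```javascript
// Powers of the n-th root of 10, each rounded to `significantDigits`
function renardSeries(n, significantDigits) {
  const values = [];
  for (let i = 0; i <= n; i++) {
    const exact = Math.pow(10, i / n); // (10^(1/n))^i
    values.push(Number(exact.toPrecision(significantDigits)));
  }
  return values;
}

console.log(JSON.stringify(renardSeries(5, 2))); // [1,1.6,2.5,4,6.3,10]
```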

E Series

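The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of ten, with a particular relative error.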

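Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values fall outside the 1.0 to 10.0 range. To learn more about the E series and their respective relative errors, see Preferred number series.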
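An E series can be expanded decade by decade to show which boundaries it allows. The sketch below uses the E6 values 1.0, 1.5, 2.2, 3.3, 4.7, and 6.8, as published in IEC 60063. `expandSeries` is a hypothetical helper. Storing the mantissas as integers in tenths is a design choice that keeps the scaling arithmetic exact.

```javascript
// E6 values between 1.0 and 10.0, stored as integers in tenths (1.0 -> 10)
const E6_TENTHS = [10, 15, 22, 33, 47, 68];

// List every boundary from 1 up to `limit` by scaling the mantissas
// with successive powers of 10
function expandSeries(tenths, limit) {
  const out = [];
  for (let scale = 1; ; scale *= 10) {
    for (const t of tenths) {
      const value = (t * scale) / 10;
      if (value > limit) return out;
      out.push(value);
    }
  }
}

console.log(JSON.stringify(expandSeries(E6_TENTHS, 100)));
// [1,1.5,2.2,3.3,4.7,6.8,10,15,22,33,47,68,100]
```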

1-2-5 Series

The 1-2-5 series behaves like a three-value Renard series, if such a series existed.

Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the third root of 10, rounded to one significant digit.
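The statement above can be checked directly: powers of the cube root of 10, rounded to one significant digit, give the 1, 2, 5 pattern. This is an illustration in plain JavaScript, not MongoDB code.

```javascript
// (10^(1/3))^i for i = 0..3, rounded to one significant digit
const oneTwoFive = [0, 1, 2, 3].map((i) =>
  Number(Math.pow(10, i / 3).toPrecision(1))
);

// 10^(1/3) ≈ 2.154 -> 2 and 10^(2/3) ≈ 4.642 -> 5
console.log(JSON.stringify(oneTwoFive)); // [1,2,5,10]
```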

Example

The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on...

Powers of Two Series

Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.

Example

The following numbers adhere to the powers of two series:

2⁰ = 1
2¹ = 2
2² = 4
2³ = 8
2⁴ = 16
2⁵ = 32
and so on...

A common example is computer components such as memory, whose sizes often follow the POWERSOF2 set of preferred numbers:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on...
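Choosing a POWERSOF2 boundary for a value reduces to finding the next power of two. `nextPowerOfTwo` is a hypothetical helper. Returning a power strictly greater than the value is an assumption, consistent with how the E24 row of the comparison table below treats a value that already lies on the series. The sketch only handles values of at least 1; fractional powers of two are ignored for brevity.

```javascript
// Hypothetical helper: smallest power of two strictly greater than x (x >= 1)
function nextPowerOfTwo(x) {
  let p = 1;
  while (p <= x) p *= 2; // doubling stays exact for integers up to 2^53
  return p;
}

console.log(JSON.stringify([19, 51, 83].map(nextPowerOfTwo))); // [32,64,128]
```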

Comparing Different Granularities

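The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. A collection named things has _id values numbered from 0 to 99:

{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }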

Different values for granularity are substituted into the following operation:

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5,
      granularity: <granularity>
    }
  }
] )

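The following results demonstrate how different values for granularity yield different bucket boundaries:

No granularity:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 40 }, "count" : 20 }
{ "_id" : { "min" : 40, "max" : 60 }, "count" : 20 }
{ "_id" : { "min" : 60, "max" : 80 }, "count" : 20 }
{ "_id" : { "min" : 80, "max" : 99 }, "count" : 20 }

R20:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 40 }, "count" : 20 }
{ "_id" : { "min" : 40, "max" : 63 }, "count" : 23 }
{ "_id" : { "min" : 63, "max" : 90 }, "count" : 27 }
{ "_id" : { "min" : 90, "max" : 100 }, "count" : 10 }

E24:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 43 }, "count" : 23 }
{ "_id" : { "min" : 43, "max" : 68 }, "count" : 25 }
{ "_id" : { "min" : 68, "max" : 91 }, "count" : 23 }
{ "_id" : { "min" : 91, "max" : 100 }, "count" : 9 }

1-2-5 (the specified number of buckets exceeds the number of intervals in the series):
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 50 }, "count" : 30 }
{ "_id" : { "min" : 50, "max" : 100 }, "count" : 50 }

POWERSOF2 (the specified number of buckets exceeds the number of intervals in the series):
{ "_id" : { "min" : 0, "max" : 32 }, "count" : 32 }
{ "_id" : { "min" : 32, "max" : 64 }, "count" : 32 }
{ "_id" : { "min" : 64, "max" : 128 }, "count" : 36 }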
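These results can be approximated with a self-contained JavaScript model. Everything in it is an assumption inferred from the results above, not MongoDB's source:

- Each bucket first takes round(n / buckets) values.
- The boundary that closes a bucket is the smallest series value strictly greater than its last value.
- Every remaining value below that boundary joins the bucket.
- The final requested bucket takes everything left.

For brevity, the first bucket's min is left unrounded, which is harmless here because the sample data starts at 0. Only R20, E24, 1-2-5, and POWERSOF2 are modeled, with commonly published mantissa values.

```javascript
// Mantissas assumed for the modeled series (values in [1, 10))
const SERIES = {
  R20: [1.0, 1.12, 1.25, 1.4, 1.6, 1.8, 2.0, 2.24, 2.5, 2.8,
        3.15, 3.55, 4.0, 4.5, 5.0, 5.6, 6.3, 7.1, 8.0, 9.0],
  E24: [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
        3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1],
  "1-2-5": [1, 2, 5],
};

// Smallest boundary in the chosen series strictly greater than x
function roundUp(x, granularity) {
  if (!(x > 0)) throw new RangeError("this sketch only models positive values");
  if (granularity === "POWERSOF2") {
    let p = 1;
    while (p <= x) p *= 2;
    return p;
  }
  const k = Math.floor(Math.log10(x));
  for (let exp = k - 1; exp <= k + 1; exp++) {
    for (const m of SERIES[granularity]) {
      const candidate = m * Math.pow(10, exp);
      if (candidate > x) return candidate;
    }
  }
  throw new Error("no boundary found");
}

function bucketAutoModel(values, buckets, granularity) {
  const sorted = [...values].sort((a, b) => a - b);
  const size = Math.max(1, Math.round(sorted.length / buckets));
  const result = [];
  let i = 0;
  while (i < sorted.length) {
    const start = i;
    const isLast = result.length === buckets - 1;
    i = isLast ? sorted.length : Math.min(start + size, sorted.length);
    // Never split equal values across buckets
    while (i < sorted.length && sorted[i] === sorted[i - 1]) i++;
    let min = sorted[start];
    let max;
    if (granularity) {
      // Later buckets start exactly where the previous one ended
      if (result.length > 0) min = result[result.length - 1]._id.max;
      max = roundUp(sorted[i - 1], granularity);
      // Pull in every remaining value below the rounded boundary
      while (i < sorted.length && sorted[i] < max) i++;
    } else {
      max = i < sorted.length ? sorted[i] : sorted[i - 1];
    }
    result.push({ _id: { min, max }, count: i - start });
  }
  return result;
}

// _id values 0..99, as in the things collection
const ids = Array.from({ length: 100 }, (_, n) => n);
for (const g of [undefined, "R20", "E24", "1-2-5", "POWERSOF2"]) {
  const counts = bucketAutoModel(ids, 5, g).map((b) => b.count);
  console.log(g || "none", JSON.stringify(counts));
}
```

Running it prints the bucket counts for each setting, and they agree with the results above. Treat that agreement as a sign the model is a reasonable mental picture, not as proof that MongoDB follows the same steps.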

Examples

Consider a collection artwork with the following documents:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : NumberDecimal("199.99"),
    "dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : NumberDecimal("280.00"),
    "dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : NumberDecimal("76.04"),
    "dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : NumberDecimal("167.30"),
    "dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : NumberDecimal("483.00"),
    "dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : NumberDecimal("385.00"),
    "dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
    "price" : NumberDecimal("159.00"),
    "dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : NumberDecimal("118.42"),
    "dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }

Multi-Faceted Aggregation

The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.

The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucketAuto: {
            groupBy: "$price",
            buckets: 4
          }
        }
      ],
      "year": [
        {
          $bucketAuto: {
            groupBy: "$year",
            buckets: 3,
            output: {
              "count": { $sum: 1 },
              "years": { $push: "$year" }
            }
          }
        }
      ],
      "area": [
        {
          $bucketAuto: {
            groupBy: {
              $multiply: [ "$dimensions.height", "$dimensions.width" ]
            },
            buckets: 4,
            output: {
              "count": { $sum: 1 },
              "titles": { $push: "$title" }
            }
          }
        }
      ]
    }
  }
] )

The operation returns the following document:

{
  "area" : [
    {
      "_id" : { "min" : 432, "max" : 500 },
      "count" : 3,
      "titles" : [
        "The Scream",
        "The Persistence of Memory",
        "Blue Flower"
      ]
    },
    {
      "_id" : { "min" : 500, "max" : 864 },
      "count" : 2,
      "titles" : [
        "Dancer",
        "The Pillars of Society"
      ]
    },
    {
      "_id" : { "min" : 864, "max" : 1568 },
      "count" : 2,
      "titles" : [
        "The Great Wave off Kanagawa",
        "Composition VII"
      ]
    },
    {
      "_id" : { "min" : 1568, "max" : 1568 },
      "count" : 1,
      "titles" : [
        "Melancholy III"
      ]
    }
  ],
  "price" : [
    {
      "_id" : { "min" : NumberDecimal("76.04"), "max" : NumberDecimal("159.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("159.00"), "max" : NumberDecimal("199.99") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("199.99"), "max" : NumberDecimal("385.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("385.00"), "max" : NumberDecimal("483.00") },
      "count" : 2
    }
  ],
  "year" : [
    { "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
    { "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
    { "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
  ]
}
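Computing the areas directly shows why the first area bucket holds three documents. With eight documents and four buckets, each bucket targets two documents. However, The Persistence of Memory and Blue Flower share an area of 480, and the output above keeps them in the same bucket. The snippet below is plain JavaScript over the dimensions copied from the artwork documents. It mirrors the `$multiply` expression of the area facet but does not call MongoDB.

```javascript
// Dimensions copied from the artwork documents above
const artwork = [
  { title: "The Pillars of Society", height: 39, width: 21 },
  { title: "Melancholy III", height: 49, width: 32 },
  { title: "Dancer", height: 25, width: 20 },
  { title: "The Great Wave off Kanagawa", height: 24, width: 36 },
  { title: "The Persistence of Memory", height: 20, width: 24 },
  { title: "Composition VII", height: 30, width: 46 },
  { title: "The Scream", height: 24, width: 18 },
  { title: "Blue Flower", height: 24, width: 20 },
];

// height * width, like the $multiply expression, then an ascending sort
// (Array.prototype.sort is stable, so ties keep their input order)
const areas = artwork
  .map((a) => ({ title: a.title, area: a.height * a.width }))
  .sort((x, y) => x.area - y.area);

for (const { title, area } of areas) console.log(area, title);
```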
