$ substrBytes（聚合）_MonogDB 中文网

MongoDB 中文手册

参考 > 参考 > 经营者 > 聚合管道运营商 > $ substrBytes（聚合）

在本页面

定义
行为
例

定义¶

$substrBytes¶

3.4版的新功能。

返回字符串的子字符串。子字符串以字符串中指定的UTF-8字节索引（从零开始）处的字符开始，并持续指定的字节数。

$substrBytes具有以下运算符表达式语法：

复制

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

领域	类型	描述
`string expression`	串	从中提取子字符串的字符串。可以是任何有效的表达式，只要它可以解析为字符串即可。有关表达式的更多信息，请参见表达式。`string expression` 如果参数解析为`null`或引用缺少的字段，则`$substrBytes`返回空字符串。如果参数不解析为字符串或`null`也不引用缺少的字段，则`$substrBytes`返回错误。
`byte index`	数	指示子字符串的起点。只要可以解析为非负整数或可以表示为整数（例如2.0）的数字，就可以是任何有效表达式。`byte index` `byte index` 无法引用位于多字节UTF-8字符中间的起始索引。
`byte count`	数	只要可以解析为非负整数或可以表示为整数（例如2.0）的数字，就可以是任何有效表达式。 `byte count` 不能导致结尾索引位于UTF-8字符的中间。

行为¶

的$substrBytes操作者使用的UTF-8编码的字节，其中每个码点，或字符，可以在一个和四个字节之间使用编码索引。

例如，US-ASCII字符使用一个字节编码。带变音符号的字符和其他拉丁字母字符（即英语字母之外的拉丁字符）使用两个字节进行编码。中文，日文和韩文字符通常需要三个字节，而其他Unicode平面（表情符号，数学符号等）则需要四个字节。

请务必牢记其中的内容，因为在UTF-8字符的中间提供或会导致错误。string expressionbyte indexbyte count

$substrBytes区别$substrCP在于 $substrBytes计算每个字符的字节数，而 $substrCP计数代码点或字符，而不管字符使用多少字节。

例	结果
{ $substrBytes: [ "abcde", 1, 2 ] }	"bc"
{ $substrBytes: [ "Hello World!", 6, 5 ] }	"World"
{ $substrBytes: [ "cafétéria", 0, 5 ] }	"café"
{ $substrBytes: [ "cafétéria", 5, 4 ] }	"tér"
{ $substrBytes: [ "cafétéria", 7, 3 ] }	消息错误： `"Error: Invalid range, starting index is a UTF-8 continuation byte."`
{ $substrBytes: [ "cafétéria", 3, 1 ] }	消息错误： `"Error: Invalid range, ending index is in the middle of a UTF-8 character."`

例子¶

单字节字符集¶

考虑包含inventory以下文档的集合：

复制

{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }

以下操作使用$substrBytes运算符将quarter值（仅包含单字节US-ASCII字符）分隔为a yearSubstring和a quarterSubstring。该 quarterSubstring字段代表指定后的字符串的其余部分。通过使用减去字符串的长度来计算。byte indexyearSubstringbyte index$strLenBytes

复制

db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        quarterSubtring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)

该操作返回以下结果：

复制

{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubtring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubtring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubtring" : "Q2" }

单字节和多字节字符集¶

名为的集合food包含以下文档：

复制

{ "_id" : 1, "name" : "apple" }
{ "_id" : 2, "name" : "banana" }
{ "_id" : 3, "name" : "éclair" }
{ "_id" : 4, "name" : "hamburger" }
{ "_id" : 5, "name" : "jalapeño" }
{ "_id" : 6, "name" : "pizza" }
{ "_id" : 7, "name" : "tacos" }
{ "_id" : 8, "name" : "寿司sushi" }

以下操作使用$substrBytes运算符menuCode从该name值创建一个三个字节：

复制

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)

该操作返回以下结果：

复制

{ "_id" : 1, "name" : "apple", "menuCode" : "app" }
{ "_id" : 2, "name" : "banana", "menuCode" : "ban" }
{ "_id" : 3, "name" : "éclair", "menuCode" : "éc" }
{ "_id" : 4, "name" : "hamburger", "menuCode" : "ham" }
{ "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" }
{ "_id" : 6, "name" : "pizza", "menuCode" : "piz" }
{ "_id" : 7, "name" : "tacos", "menuCode" : "tac" }
{ "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿" }

也可以看看

$substrCP