13.MongoDB系列之分片简介

2022-11-21,,,,

1. 分片概念

分片是指跨机器拆分数据的过程,有时也会用术语分区。MongoDB既可以手工分片,也支持自动分片

2. 理解集群组件

分片的目标之一是由多个分片组成的集群对应用程序来说就像是一台服务器。为了此实现,需要在分片前面运行一个或多个称为mongos的路由进程。mongos维护着一个“目录”,指明了哪个分片包含哪些数据。

应用程序可以正常连接到此路由器并发出请求。路由服务器哪些数据在哪些分片上,可以将请求转发到适当的分片。如果有对请求的响应,路由服务器会收集他们,并在必要时进行合并,然后在发送给应用程序

3. 在单机集群上进行分片
# mongo --nodb --norc
MongoDB shell version v4.2.6
>st = ShardingTest({
name: "one-min-shards",
chunkSize:1,
shards: 2,
rs:{
nodes:3,
oplogSize:10
},
other: {
enableBalancer:true
}
})

当ShardingTest完成集群设置后,将启动并运行10个进程:两个副本集(各有3个节点)、一个配置服务器副本集(有3个节点)、一个mongos

在新终端窗口执行ps -ef|grep mongo可以看到如下信息:

00:00:09 /usr/bin/mongod --oplogSize 10 --port 20000 --replSet one-min-shards-rs0 --dbpath /data/db/one-min-shards-rs0-0 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 105 66 1 21:57 pts/0 00:00:10 /usr/bin/mongod --oplogSize 10 --port 20001 --replSet one-min-shards-rs0 --dbpath /data/db/one-min-shards-rs0-1 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 140 66 1 21:57 pts/0 00:00:10 /usr/bin/mongod --oplogSize 10 --port 20002 --replSet one-min-shards-rs0 --dbpath /data/db/one-min-shards-rs0-2 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 293 66 1 21:58 pts/0 00:00:09 /usr/bin/mongod --oplogSize 10 --port 20003 --replSet one-min-shards-rs1 --dbpath /data/db/one-min-shards-rs1-0 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 328 66 1 21:58 pts/0 00:00:09 /usr/bin/mongod --oplogSize 10 --port 20004 --replSet one-min-shards-rs1 --dbpath /data/db/one-min-shards-rs1-1 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 363 66 1 21:58 pts/0 00:00:09 /usr/bin/mongod --oplogSize 10 --port 20005 --replSet one-min-shards-rs1 --dbpath /data/db/one-min-shards-rs1-2 --shardsvr --setParameter migrationLockAcquisitionMaxWaitMS=30000 --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 525 66 2 21:58 pts/0 00:00:11 /usr/bin/mongod --oplogSize 40 --port 20006 --replSet one-min-shards-configRS --dbpath /data/db/one-min-shards-configRS-0 --journal --configsvr --storageEngine wiredTiger --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 567 66 1 21:58 pts/0 00:00:11 /usr/bin/mongod --oplogSize 40 --port 20007 --replSet one-min-shards-configRS --dbpath /data/db/one-min-shards-configRS-1 --journal --configsvr --storageEngine wiredTiger --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 609 66 1 21:58 pts/0 00:00:10 /usr/bin/mongod --oplogSize 40 --port 20008 --replSet one-min-shards-configRS --dbpath /data/db/one-min-shards-configRS-2 --journal --configsvr --storageEngine wiredTiger --setParameter writePeriodicNoops=false --setParameter numInitialSyncConnectAttempts=60 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --setParameter orphanCleanupDelaySecs=1
root 792 66 0 21:58 pts/0 00:00:00 /usr/bin/mongos -v --port 20009 --configdb one-min-shards-configRS/2bffe09ec303:20006,2bffe09ec303:20007,2bffe09ec303:20008 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true

整个集群会将日志存储到当前shell中,因此打开第二个终端窗口,并启动另一个mongo shell:

# mongo --port 20009
mongos> use accounts
switched to db accounts
mongos> for(var i=0; i<100000;i++){db.users.insert({'username': 'user'+i, 'created_at': new Date()})}
mongos> db.users.count()
100000
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("62d2c414601aeb7a88c169f7")
}
shards:
{ "_id" : "one-min-shards-rs0", "host" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002", "state" : 1 }
{ "_id" : "one-min-shards-rs1", "host" : "one-min-shards-rs1/2bffe09ec303:20003,2bffe09ec303:20004,2bffe09ec303:20005", "state" : 1 }
active mongoses:
"4.2.6" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "accounts", "primary" : "one-min-shards-rs1", "partitioned" : false, "version" : { "uuid" : UUID("ba658ebe-bbd1-44cd-b2c9-cb48d3810551"), "lastMod" : 1 } }
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
one-min-shards-rs0 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : one-min-shards-rs0 Timestamp(1, 0)

要对一个特定的集合进行分片,首先需要在集合的数据库上启用分片,如下:

mongos> sh.enableSharding("accounts")

在对集合进行分片时,需要选择一个片键。片键是MongoDB用来拆分数据的一个或几个字段。随着集合的增大,片键会成为集合中最重要的索引。只有创建了索引的字段才能够作为片键。

mongos> db.users.createIndex({'username':1})

现在可以通过username片键来对集合进行分片了

mongos> sh.shardCollection("accounts.users", {"username":1})
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("62d2c414601aeb7a88c169f7")
}
shards:
{ "_id" : "one-min-shards-rs0", "host" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002", "state" : 1 }
{ "_id" : "one-min-shards-rs1", "host" : "one-min-shards-rs1/2bffe09ec303:20003,2bffe09ec303:20004,2bffe09ec303:20005", "state" : 1 }
active mongoses:
"4.2.6" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
6 : Success
databases:
{ "_id" : "accounts", "primary" : "one-min-shards-rs1", "partitioned" : true, "version" : { "uuid" : UUID("ba658ebe-bbd1-44cd-b2c9-cb48d3810551"), "lastMod" : 1 } }
accounts.users
shard key: { "username" : 1 }
unique: false
balancing: true
chunks:
one-min-shards-rs0 6
one-min-shards-rs1 7
{ "username" : { "$minKey" : 1 } } -->> { "username" : "user17256" } on : one-min-shards-rs0 Timestamp(2, 0)
{ "username" : "user17256" } -->> { "username" : "user24515" } on : one-min-shards-rs0 Timestamp(3, 0)
{ "username" : "user24515" } -->> { "username" : "user31775" } on : one-min-shards-rs0 Timestamp(4, 0)
{ "username" : "user31775" } -->> { "username" : "user39034" } on : one-min-shards-rs0 Timestamp(5, 0)
{ "username" : "user39034" } -->> { "username" : "user46294" } on : one-min-shards-rs0 Timestamp(6, 0)
{ "username" : "user46294" } -->> { "username" : "user53553" } on : one-min-shards-rs0 Timestamp(7, 0)
{ "username" : "user53553" } -->> { "username" : "user60812" } on : one-min-shards-rs1 Timestamp(7, 1)
{ "username" : "user60812" } -->> { "username" : "user68072" } on : one-min-shards-rs1 Timestamp(1, 7)
{ "username" : "user68072" } -->> { "username" : "user75331" } on : one-min-shards-rs1 Timestamp(1, 8)
{ "username" : "user75331" } -->> { "username" : "user82591" } on : one-min-shards-rs1 Timestamp(1, 9)
{ "username" : "user82591" } -->> { "username" : "user89851" } on : one-min-shards-rs1 Timestamp(1, 10)
{ "username" : "user89851" } -->> { "username" : "user9711" } on : one-min-shards-rs1 Timestamp(1, 11)
{ "username" : "user9711" } -->> { "username" : { "$maxKey" : 1 } } on : one-min-shards-rs1 Timestamp(1, 12)
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
one-min-shards-rs0 1
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : one-min-shards-rs0 Timestamp(1, 0)

可以看到,这个集合被分为了13个块,其中6个块在分片one-min-shards-rs0上,7个块在分片one-min-shards-rs1上。

现在数据已经分布在分片上了,现在我们查询特定用户名:

mongos> db.users.find({username: "user12345"})
{ "_id" : ObjectId("62d2ca068dbf17063ba4d62f"), "username" : "user12345", "created_at" : ISODate("2022-07-16T14:24:06.891Z") }

通过explain可以观察到数据分布在分片one-min-shards-rs0上

mongos> db.users.find({username: "user12345"}).explain()
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SINGLE_SHARD",
"shards" : [
{
"shardName" : "one-min-shards-rs0",
"connectionString" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002",
"serverInfo" : {
"host" : "2bffe09ec303",
"port" : 20001,
"version" : "4.2.6",
"gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
},
"plannerVersion" : 1,
"namespace" : "accounts.users",
"indexFilterSet" : false,
"parsedQuery" : {
"username" : {
"$eq" : "user12345"
}
},
"queryHash" : "379E82C5",
"planCacheKey" : "965E0A67",
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"username" : 1
},
"indexName" : "username_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"username" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"username" : [
"[\"user12345\", \"user12345\"]"
]
}
}
}
},
"rejectedPlans" : [ ]
}
]
}
}
}

当查询全部数据时,可以发现查询必须方位两个分片才能找到所有数据。通常来说,如果查询中没有使用片键,mongos就不得不将查询发送到每个分片上。

mongos> db.users.find({}).explain()
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE",
"shards" : [
{
"shardName" : "one-min-shards-rs0",
"connectionString" : "one-min-shards-rs0/2bffe09ec303:20000,2bffe09ec303:20001,2bffe09ec303:20002",
"serverInfo" : {
"host" : "2bffe09ec303",
"port" : 20001,
"version" : "4.2.6",
"gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
},
"plannerVersion" : 1,
"namespace" : "accounts.users",
"indexFilterSet" : false,
"parsedQuery" : { },
"queryHash" : "8B3D4AB8",
"planCacheKey" : "8B3D4AB8",
"winningPlan" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "COLLSCAN",
"direction" : "forward"
}
},
"rejectedPlans" : [ ]
},
{
"shardName" : "one-min-shards-rs1",
"connectionString" : "one-min-shards-rs1/2bffe09ec303:20003,2bffe09ec303:20004,2bffe09ec303:20005",
"serverInfo" : {
"host" : "2bffe09ec303",
"port" : 20005,
"version" : "4.2.6",
"gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
},
"plannerVersion" : 1,
"namespace" : "accounts.users",
"indexFilterSet" : false,
"parsedQuery" : { },
"queryHash" : "8B3D4AB8",
"planCacheKey" : "8B3D4AB8",
"winningPlan" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "COLLSCAN",
"direction" : "forward"
}
},
"rejectedPlans" : [ ]
}
]
}
},
"serverInfo" : {
"host" : "2bffe09ec303",
"port" : 20009,
"version" : "4.2.6",
"gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8"
},
"ok" : 1,
"operationTime" : Timestamp(1658027749, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1658030075, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}

定向查询:包含片键并可以发送到单个分片或分片子集的查询

分散-收集查询:必须发送到所有分片的查询

现在可以在原来的shell,按几次Enter键返回命令行,运行sh.stop()干净的关闭所有服务器

欢迎关注公众号算法小生或沈健的技术博客

13.MongoDB系列之分片简介的相关教程结束。

《13.MongoDB系列之分片简介.doc》

下载本文的Word格式文档,以方便收藏与打印。