MDC - Data Migration | slideum.com


MongoDB

MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented
database written in C++.

Home: http://www.mongodb.org/
Support: http://www.10gen.com/
Production deployments: http://www.mongodb.org/display/DOCS/Production+Deployments
Agenda
• Getting Up to Speed with MongoDB (key features)
• Developing with MongoDB (start, shutdown, connect, query, DML)
• Advanced Usage (index, aggregation, GridFS)
• Administration (admin, replication, sharding)
• MISC (BSON; internals)
Getting Up to Speed with MongoDB

Key Features of MongoDB
1. document-oriented
2. schema-free

Design Philosophy

Different from relational databases
• easy and simple admin/dev: no transactions, no relations, no durability, no SQL

Different from key-value databases
• many functions: indexing, aggregation (MapReduce etc.), fixed-size collections, file storage, replication
Getting Up to Speed with MongoDB

Key Features of MongoDB
1. Document-oriented (multiple key/value pairs)
{ "_id" : ObjectId("4d9a2fa7640cde2b218c6f65"), "version" : 1300363306, "evenOrOdd" : 0, "siteId" : 0 }
{ "_id" : ObjectId("4d9cd50d2a98297726eeda5b"), "prefix" : "craft fl", "res" : { "sug" : [ "craft flowers" ], "categories" : [ [ 14339, "Crafts" ] ] } }
2. Instance --1:N-- database --1:N-- collection --1:N-- document
3. Schema-free
• use test
• db.test.insert({ "version" : 1300363306, "evenOrOdd" : 0, "siteId" : 0 })
• db.test.insert({ "prefix" : "craft fl", "res" : { "sug" : [ "craft flowers" ], "categories" : [ [ 14339, "Crafts" ] ] } })
• db.test.insert({ "name" : "binzhang" })
• db.test.ensureIndex({ "name" : 1 })
Getting Up to Speed with MongoDB

Design Philosophy
1. Databases are specializing: the "one size fits all" approach no longer applies.
MongoDB sits between in-memory key-value stores and persistent relational databases.
2. By reducing the transactional semantics the database provides, one can still solve an
interesting set of problems where performance is very important, and horizontal scaling
then becomes easier. The simpler, the faster.
3. The document data model (JSON/BSON) is easy to code to, easy to manage
(schemaless), and yields excellent performance by grouping relevant data together
internally, at the cost of some wasted space.
4. A non-relational approach is the best path to database solutions that scale
horizontally to many machines. Scaling out is easy for applications that are not complex.
5. While there is an opportunity to relax certain capabilities for better performance,
there is also a need for deeper functionality than that provided by pure key/value stores.
Getting Up to Speed with MongoDB

Different from relational databases
1. Easy and simple admin/dev
• kill -INT shuts down the instance.
2. No transactions
3. No relations
4. No durability
• kill -9 can corrupt the database, which then needs a repair at the next startup.
5. No SQL
• MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.
Getting Up to Speed with MongoDB

Different from key-value databases
1. Datatypes: null, boolean, 32-bit integer, 64-bit integer, 64-bit floating point number, string, object id, date, regular expression, code, binary data, undefined, array, embedded document, etc.
2. Indexing: unique index, compound index, geospatial indexing, etc.
3. Aggregation (MapReduce etc.): group, distinct, etc.
4. Fixed-size collections: capped collections are fixed in size and are useful for certain types of data, such as logs.
5. File storage: a protocol for storing large files that uses subcollections to store file metadata separately from content chunks.
6. Replication: master-slave mode and replica set mode.
7. Security: simple authorization.
Agenda
• Getting Up to Speed with MongoDB (summary: document-oriented and schema-free)
• Developing with MongoDB (start, shutdown, connect, query, DML)
• Advanced Usage (index, aggregation, GridFS)
• Administration (admin, replication, sharding)
• MISC
Developing with MongoDB

• Start MongoDB
• Connect
• Query
• DML (create, insert, update, delete, drop)
• Stop cleanly
Developing with MongoDB

Start MongoDB
mkdir -p /MONGO/data01 /MONGO/log01
/opt/mongo/bin/mongod --logpath /MONGO/log01/server_log.txt --logappend --fork --cpu --dbpath /MONGO/data01 --replSet autocomplete
Fri Apr 1 14:37:08 [initandlisten] MongoDB starting : pid=10799 port=27017 dbpath=/MONGO/data01 64-bit
Fri Apr 1 14:37:08 [initandlisten] db version v1.8.0, pdfile version 4.5
Fri Apr 1 14:37:08 [initandlisten] git version: 9c28b1d608df0ed6ebe791f63682370082da41c0
Fri Apr 1 14:37:08 [initandlisten] build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Fri Apr 1 14:37:08 [initandlisten] waiting for connections on port 27017
Fri Apr 1 14:37:08 [websvr] web admin interface listening on port 28017
Developing with MongoDB

Connect to mongod
$ /opt/mongo/bin/mongo
MongoDB shell version: 1.8.0
connecting to: test
autocomplete:PRIMARY> exit
Bye
usage: /opt/mongo/bin/mongo [options] [db address] [file names (ending in .js)]
Developing with MongoDB

MongoDB Shell
MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.

Query: find()
• db.c.find() returns everything in the collection c.
• db.users.find({"age" : 27}) returns the documents where the value for "age" is 27.
• db.users.find({}, {"username" : 1, "email" : 1}) returns only the "username" and "email" keys (plus "_id").
• db.users.find({}, {"fatal_weakness" : 0}) never returns the "fatal_weakness" key.
• db.users.find({}, {"username" : 1, "_id" : 0}) returns "username" without "_id".
• db.users.find({"age" : {"$gte" : 18, "$lte" : 30}}) matches ages from 18 to 30 inclusive.
• db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}}) matches any of the listed ticket numbers.
• db.c.find({"z" : {"$in" : [null], "$exists" : true}}) matches documents where the key "z" exists and its value is null.
• db.users.find({"name" : /joe/i}) matches names against a case-insensitive regular expression.
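To make the query operators above concrete, here is a minimal sketch of how a query document could be matched against plain JavaScript objects. It is not MongoDB's implementation; the helper names `matches` and `find` and the sample `users` array are made up for illustration, and only equality, regular expressions, "$gte", "$lte" and "$in" are covered.

```javascript
// Illustrative in-memory matcher; NOT how the server evaluates queries.
function matches(doc, query) {
  return Object.keys(query).every(function (key) {
    var cond = query[key];
    var value = doc[key];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator document such as {"$gte" : 18, "$lte" : 30}.
      return Object.keys(cond).every(function (op) {
        if (op === "$gte") return value >= cond[op];
        if (op === "$lte") return value <= cond[op];
        if (op === "$in") return cond[op].indexOf(value) !== -1;
        throw new Error("operator not covered by this sketch: " + op);
      });
    }
    return value === cond; // plain equality, e.g. {"age" : 27}
  });
}

// Hypothetical stand-in for db.users.find(query).
function find(collection, query) {
  return collection.filter(function (doc) {
    return matches(doc, query);
  });
}

var users = [
  { name: "Joe", age: 27 },
  { name: "joey", age: 17 },
  { name: "Ann", age: 30 }
];

var adults = find(users, { age: { $gte: 18, $lte: 30 } });
var joes = find(users, { name: /joe/i });
var picked = find(users, { age: { $in: [17, 30] } });
console.log(adults.length, joes.length, picked.length);
```

Every condition in the query document must hold for a document to match, which is why both helpers are built on `every`.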
Developing with MongoDB

Behind find(): Cursor
• The database returns results from find using a cursor.
• The client-side implementations of cursors generally allow you to control a great deal about the eventual output of a query.

> for(i=0; i<100; i++) {
... db.c.insert({x : i});
... }
> var cursor = db.collection.find();
> while (cursor.hasNext()) {
... obj = cursor.next();
... // do stuff
... }

> var cursor = db.people.find();
> cursor.forEach(function(x) {
... print(x.name);
... });
adam
matt
zak
Developing with MongoDB

Behind find(): Cursor (continued)

Getting consistent results
A fairly common way of processing data is to pull it out of
MongoDB, change it in some way, and then save it again:
cursor = db.foo.find();
while (cursor.hasNext()) {
    var doc = cursor.next();
    doc = process(doc);
    db.foo.save(doc);
}
If a saved document grows and has to be moved, the cursor may return it a second time later on.
Taking a snapshot of the query avoids this:
var cursor = db.myCollection.find({country:'uk'}).snapshot();
Developing with MongoDB

Create collection (created implicitly by the first insert)
db.foo.insert({"bar" : "baz"})

Insert
db.foo.insert({"bar" : "baz"})

Update
db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")},
... {"$set" : {"favorite book" : "war and peace"}})

Delete
db.users.remove()                            // removes every document in users
db.mailing.list.remove({"opt-out" : true})   // removes only the matching documents

Drop collection
db.foo.drop();
Developing with MongoDB

DML (continued): Safe Operation
1. MongoDB does not wait for a response by default when writing to
the database. Use the getLastError command to ensure that
operations have succeeded.
2. The getLastError command can be invoked automatically by many
of the drivers when saving and updating in "safe" mode (some
drivers call this "set write concern").

Equivalent ways to call it from the shell:
db.$cmd.findOne({getlasterror:1})
db.runCommand("getlasterror")
db.getLastError()
Developing with MongoDB

Stop MongoDB
• kill -2 10014 (SIGINT) or kill 10014 (SIGTERM). The server will then:
1. wait for any currently running operations or file preallocations to finish (this
could take a moment)
2. close all open connections
3. flush all data to disk
4. halt.
• Or use the shutdown command:
> use admin
switched to db admin
> db.shutdownServer();
server should be down...
Agenda
• Getting Up to Speed with MongoDB (summary: document-oriented and schema-free)
• Developing with MongoDB (summary: find())
• Advanced Usage (index, aggregation, GridFS)
• Administration (admin, replication, sharding)
• MISC
MongoDB Advanced Usage

Advanced Usage
• Index
• Aggregation
• MapReduce
• Database commands
• Capped Collections
• GridFS: Storing Files
MongoDB Advanced Usage

Index
• MongoDB's indexes work almost identically to typical relational database indexes.
• Index optimization for MySQL/Oracle/SQLite will apply equally well to MongoDB.
• If an index has N keys, it will make queries on any prefix of those keys fast.

Example
db.people.find({"username" : "mark"})
db.people.ensureIndex({"username" : 1})
db.people.find({"date" : date1}).sort({"date" : 1, "username" : 1})
db.people.ensureIndex({"date" : 1, "username" : 1})
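The prefix rule can be sketched as a small check. This is a simplified model rather than the query optimizer's real logic: the function name `usesIndexPrefix` is hypothetical, and it only asks whether the queried keys are exactly the first k keys of the index, in any order.

```javascript
// Simplified model of the "any prefix of the index keys" rule.
// indexKeys: ordered key names of the index; queryKeys: distinct keys in the query.
function usesIndexPrefix(indexKeys, queryKeys) {
  if (queryKeys.length === 0 || queryKeys.length > indexKeys.length) {
    return false;
  }
  var prefix = indexKeys.slice(0, queryKeys.length);
  return queryKeys.every(function (key) {
    return prefix.indexOf(key) !== -1;
  });
}

var dateUserIndex = ["date", "username"]; // keys of {"date" : 1, "username" : 1}
console.log(usesIndexPrefix(dateUserIndex, ["date"]));             // true
console.log(usesIndexPrefix(dateUserIndex, ["username", "date"])); // true
console.log(usesIndexPrefix(dateUserIndex, ["username"]));         // false
```

Under this model a query on "username" alone is not helped by the {"date" : 1, "username" : 1} index, which is why the separate {"username" : 1} index in the example is still useful.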
MongoDB Advanced Usage

Index (continued)
• Indexes can be created on keys in embedded documents in the same
way that they are created on normal keys.
• Indexing for sorts: indexing the sort key allows MongoDB to pull the
sorted data in order, allowing you to sort any amount of data without
running out of memory.
• Index naming rule: keyname1_dir1_keyname2_dir2_..._keynameN_dirN, where
keynameX is the index's key and dirX is the index's direction (1 or -1).

db.blog.ensureIndex({"comments.date" : 1})
db.people.ensureIndex({"username" : 1}, {"unique" : true})
db.people.ensureIndex({"username" : 1}, {"unique" : true, "dropDups" : true})
autocomplete:PRIMARY> db.system.indexes.find()
{ "name" : "_id_", "ns" : "test.fs.files", "key" : { "_id" : 1 }, "v" : 0 }
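The naming rule is mechanical enough to sketch in a few lines. The function `defaultIndexName` below is a hypothetical helper, not a shell API; it simply joins each key with its direction. Note that the automatic index on "_id" is an exception and is named "_id_", as the system.indexes output shows.

```javascript
// Builds keyname1_dir1_..._keynameN_dirN from an index key document.
// Hypothetical helper for illustration; key order follows the object's insertion order.
function defaultIndexName(keys) {
  return Object.keys(keys)
    .map(function (name) {
      return name + "_" + keys[name];
    })
    .join("_");
}

console.log(defaultIndexName({ username: 1 }));           // username_1
console.log(defaultIndexName({ date: 1, username: -1 })); // date_1_username_-1
console.log(defaultIndexName({ "comments.date": 1 }));    // comments.date_1
```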
MongoDB Advanced Usage

Index continue : explain()

explain will return information about the indexes used for the query (if any)
and stats about timing and the number of documents scanned.
autocomplete:PRIMARY> db.users.find({"name":"user0"}).explain()
{
"cursor" : "BtreeCursor name_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"name" : [
[
"user0",
"user0"
]
MongoDB Advanced Usage

Index continue : hint()

If you find that Mongo is using different indexes than you want it
to for a query, you can force it to use a certain index by using
hint.
db.c.find({"age" : 14, "username" : /.*/}).hint({"username" : 1, "age" : 1})

Index continue : change index

db.runCommand({"dropIndexes" : "foo", "index" : "alphabet"})

db.people.ensureIndex({"username" : 1}, {"background" : true})

Using the {"background" : true} option builds the index in the
background, while handling incoming requests. If you do not include
MongoDB Advanced Usage

Advanced Usage

Index

Aggregation
db.foo.count()
db.foo.count({"x" : 1})
db.runCommand({"distinct" : "people", "key" : "age"})

Group
{"day" : "2010/10/03", "time" : "10/3/2010 03:57:01 GMT-400", "price" : 4.23}
{"day" : "2010/10/04", "time" : "10/4/2010 11:28:39 GMT-400", "price" : 4.27}
{"day" : "2010/10/03", "time" : "10/3/2010 05:00:23 GMT-400", "price" : 4.10}
{"day" : "2010/10/06", "time" : "10/6/2010 05:27:58 GMT-400", "price" : 4.30}
{"day" : "2010/10/04", "time" : "10/4/2010 08:34:50 GMT-400", "price" : 4.01}
db.runCommand({"group" : {
... "ns" : "stocks",
... "key" : "day",
... "initial" : {"time" : 0},
... "$reduce" : function(doc, prev) {
... if (doc.time > prev.time) {
... prev.price = doc.price;
... prev.time = doc.time;
MongoDB Advanced Usage

MapReduce
MapReduce is a method of aggregation that can be easily parallelized
across multiple servers. It splits up a problem, sends chunks of
it to different machines, and lets each machine solve its part of
the problem. When all of the machines are finished, they merge
all of the pieces of the solution back into a full solution.

Example: Finding All Keys in a Collection
> map = function() {
... for (var key in this) {
... emit(key, {count : 1});
... }};
> reduce = function(key, emits) {
... total = 0;
... for (var i in emits) {
... total += emits[i].count; }
... return {"count" : total};
... }
> mr = db.runCommand({"mapreduce" : "foo", "map" : map, "reduce" : reduce})
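To see what the map and reduce functions do without a server, the flow can be imitated on a single machine. This is only a sketch: the `mapReduce` driver below is made up, it passes `emit` to the map function as an argument (in the shell, `emit` is a global provided by the server), and nothing here is parallelized.

```javascript
// Single-process imitation of the map -> group by key -> reduce flow.
function mapReduce(docs, map, reduce) {
  var emitted = new Map(); // key -> array of emitted values
  function emit(key, value) {
    if (!emitted.has(key)) {
      emitted.set(key, []);
    }
    emitted.get(key).push(value);
  }
  docs.forEach(function (doc) {
    map.call(doc, emit); // `this` is the current document, as in MongoDB
  });
  var result = {};
  emitted.forEach(function (values, key) {
    result[key] = reduce(key, values);
  });
  return result;
}

// Same logic as the slide's example, with emit passed in explicitly.
var map = function (emit) {
  for (var key in this) {
    emit(key, { count: 1 });
  }
};
var reduce = function (key, emits) {
  var total = 0;
  for (var i = 0; i < emits.length; i++) {
    total += emits[i].count;
  }
  return { count: total };
};

var sampleDocs = [{ _id: 1, x: 1 }, { _id: 2, x: 2, y: 3 }];
var keyCounts = mapReduce(sampleDocs, map, reduce);
console.log(keyCounts); // { _id: { count: 2 }, x: { count: 2 }, y: { count: 1 } }
```

Because the reduce output has the same shape as the emitted values, partial results from different machines could be reduced again, which is what makes the real feature easy to parallelize.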
MongoDB Advanced Usage

Database commands
Commands implement all of the functionality that doesn't fit
neatly into "create, read, update, delete."

Example:
> db.runCommand({"drop" : "test"});
{ "errmsg" : "ns not found", "ok" : false }

This is equivalent to querying the internal $cmd collection:
> db.$cmd.findOne({"drop" : "test"});

Show all commands:
> db.listCommands()
MongoDB Advanced Usage

Capped Collections
1. Capped collections automatically age out the oldest documents as new documents are inserted.
2. Documents cannot be removed or deleted (aside from the automatic age-out described earlier), and updates that would cause documents to move (in general, updates that cause documents to grow in size) are disallowed.
3. Inserts into a capped collection are extremely fast.
4. By default, any find performed on a capped collection will always return results in insertion order.
5. Ideal for use cases like logging.
6. Replication uses a capped collection as its oplog.
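The age-out behaviour can be pictured as a fixed-size queue. The sketch below is an assumption-laden toy: the `CappedCollection` class is hypothetical, and it caps by document count, whereas a real capped collection is created with a size in bytes.

```javascript
// Toy model of a capped collection: fixed capacity, oldest document aged out first.
// Assumption: capacity is a document count (real capped collections are sized in bytes).
function CappedCollection(maxDocs) {
  this.maxDocs = maxDocs;
  this.docs = [];
}

CappedCollection.prototype.insert = function (doc) {
  this.docs.push(doc);
  if (this.docs.length > this.maxDocs) {
    this.docs.shift(); // automatic age-out of the oldest document
  }
};

CappedCollection.prototype.find = function () {
  return this.docs.slice(); // results come back in insertion order
};

var log = new CappedCollection(3);
for (var i = 1; i <= 5; i++) {
  log.insert({ msg: "event " + i });
}
var remaining = log.find().map(function (d) { return d.msg; });
console.log(remaining); // [ 'event 3', 'event 4', 'event 5' ]
```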
MongoDB Advanced Usage

GridFS: Storing Files
GridFS is a mechanism for storing large binary files in MongoDB.

Why use GridFS
• Using GridFS can simplify your stack. If you're already using MongoDB, GridFS obviates the need for a separate file storage architecture.
• GridFS will leverage any existing replication or autosharding that you've set up for MongoDB, so getting failover and scale-out for file storage is easy.
• GridFS can alleviate some of the issues that certain filesystems can exhibit when being used to store user uploads. For example, GridFS does not have issues with storing large numbers of files in the same directory.
• You can get great disk locality with GridFS, because MongoDB allocates data files in 2GB chunks.
MongoDB Advanced Usage

GridFS: example
$ echo "Hello, world" > foo.txt
$ ./mongofiles put foo.txt
connected to: 127.0.0.1
added file: { _id: ObjectId('4c0d2a6c3052c25545139b88'),
filename: "foo.txt", length: 13, chunkSize: 262144,
uploadDate: new Date(1275931244818),
md5: "a7966bf58e23583c9a5a4059383ff850" }
done!
$ ./mongofiles list
connected to: 127.0.0.1
foo.txt 13
$ rm foo.txt
$ ./mongofiles get foo.txt
connected to: 127.0.0.1
MongoDB Advanced Usage

GridFS: internal
The basic idea behind GridFS is that we can store large files by splitting them up into chunks and storing each
chunk as a separate document.
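A minimal sketch of that splitting step, using Node.js Buffers. The function `toGridFsDocs`, the ids, and the sample content are assumptions made for illustration; a real driver also records fields such as uploadDate and md5, which are omitted here. The 262144-byte chunk size is the one reported by mongofiles above.

```javascript
// Split file content into one metadata document plus one document per chunk.
// Hypothetical helper; field names follow the fs.files / fs.chunks layout.
function toGridFsDocs(filesId, filename, content, chunkSize) {
  var chunks = [];
  for (var offset = 0, n = 0; offset < content.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: filesId, // points back at the metadata document
      n: n,              // position of this chunk within the file
      data: content.subarray(offset, offset + chunkSize).toString("base64")
    });
  }
  return {
    file: { _id: filesId, filename: filename, chunkSize: chunkSize, length: content.length },
    chunks: chunks
  };
}

var content = Buffer.from("Hello,mongo\n"); // assumed sample content, 12 bytes
var oneChunk = toGridFsDocs("file-1", "file1.txt", content, 262144);
var manyChunks = toGridFsDocs("file-2", "file1.txt", content, 5);

// Reading a file back means concatenating its chunks in order of n.
var rebuilt = Buffer.concat(
  manyChunks.chunks.map(function (c) { return Buffer.from(c.data, "base64"); })
).toString();

console.log(oneChunk.chunks.length, manyChunks.chunks.length, rebuilt === "Hello,mongo\n");
```

A small file fits in a single chunk with n = 0; only files larger than the chunk size produce several chunk documents.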
autocomplete:PRIMARY> show collections
fs.chunks
fs.files
system.indexes
autocomplete:PRIMARY> db.fs.chunks.find()
{ "_id" : ObjectId("4db258ae05a23484714d58ad"), "files_id" : ObjectId("4db258ae39ae206d1114d6e4"), "n" : 0, "data" : BinData(0,"SGVsbG8sbW9uZ28K") }
{ "_id" : ObjectId("4db258d305a23484714d58ae"), "files_id" : ObjectId("4db258d37858d8bb53489eea"), "n" : 0, "data" : BinData(0,"SGVsbG8sbW9uZ28K") }
{ "_id" : ObjectId("4db2596d05a23484714d58af"), "files_id" : ObjectId("4db2596d4fefdd07525ef166"), "n" : 0, "data" : BinData(0,"SGVsbG8sbW9uZ28K") }
autocomplete:PRIMARY> db.fs.files.find()
{ "_id" : ObjectId("4db258ae39ae206d1114d6e4"), "filename" : "file1.txt", "chunkSize" : 262144, "uploadDate" : ISODate("2011-04-23T04:42:22.546Z"), "md5" : "c002dec1a1086442b2aa49c2b6e48884", "length" : 12 }
MongoDB Advanced Usage

Advanced Usage review
• Index: almost the same as Oracle
• Aggregation
• MapReduce: built-in
• Database commands: db.listCommands()
• Capped Collections: suitable for logs
• GridFS (storing files): built-in, document-oriented

Others
• Geospatial Indexing
• Database References
• Server-Side Scripting
DBA on MongoDB

Administration (admin, replication, sharding)
• Monitoring
• Security and Authentication
• Backup and Repair
• Master-Slave Replication
• Replica set
• Sharding
DBA on MongoDB

Easy Monitoring
• Using the Admin Interface
• db.runCommand({"serverStatus" : 1})
• mongostat
• Third-Party Plug-Ins
Using the Admin Interface

db.runCommand({"serverStatus" : 1})
{
    "version" : "1.5.3",
    "uptime" : 166,
    "localTime" : "Thu Jun 10 2010 15:47:40 GMT-0400 (EDT)",
    "globalLock" : {
        "totalTime" : 165984675,
        "lockTime" : 91471425,
        "ratio" : 0.551083556358441
    },
    "mem" : {
        "bits" : 64,
        "resident" : 101,
        "virtual" : 2824,
        "supported" : true,
        "mapped" : 336
    },
    "connections" : {
        "current" : 141,
        "available" : 19859
    },
    "extra_info" : {
        "note" : "fields vary by platform"
    },
    "indexCounters" : {
        "btree" : {

db.runCommand({"serverStatus" : 1}) (continued)
    "backgroundFlushing" : {
        "flushes" : 2,
        "total_ms" : 44,
        "average_ms" : 22,
        "last_ms" : 36,
        "last_finished" : "Thu Jun 10 2010 15:46:54 GMT-0400 (EDT)"
    },
    "opcounters" : {
        "insert" : 38195,
        "query" : 8874,
        "update" : 4058,
        "delete" : 389,
        "getmore" : 888,
        "command" : 17731
    },
    "asserts" : {
        "regular" : 0,
        "warning" : 0,
        "msg" : 0,
        "user" : 5054,
        "rollovers" : 0
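A few of the numbers in the serverStatus output are derived from others, which is handy when building a simple monitoring check. The sketch below copies values from the sample output into a plain object; the helper names `lockRatio` and `connectionUsage` are made up, and no server is contacted.

```javascript
// Values copied from the sample serverStatus output; nothing is fetched from a server.
var statusSample = {
  globalLock: { totalTime: 165984675, lockTime: 91471425 },
  connections: { current: 141, available: 19859 }
};

// Fraction of time spent holding the global lock: lockTime / totalTime.
function lockRatio(status) {
  return status.globalLock.lockTime / status.globalLock.totalTime;
}

// Fraction of the connection limit currently in use.
function connectionUsage(status) {
  var c = status.connections;
  return c.current / (c.current + c.available);
}

console.log(lockRatio(statusSample).toFixed(4)); // 0.5511
console.log(connectionUsage(statusSample));
```

The first result reproduces the "ratio" field reported by the server, so a lock ratio this high (about 55%) would point at a write-heavy workload.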
mongostat

Fields
inserts  - # of inserts per second
query    - # of queries per second
update   - # of updates per second
delete   - # of deletes per second
getmore  - # of get mores (cursor batch) per second
command  - # of commands per second
flushes  - # of fsync flushes per second
mapped   - amount of data mmapped (total data size), in megabytes
vsize    - virtual size of process in megabytes
res      - resident size of process in megabytes
faults   - # of page faults per sec (linux only)
locked   - percent of time in global write lock
idx miss - percent of btree page misses (sampled)
qr|qw    - queue lengths for clients waiting (read|write)
ar|aw    - active clients (read|write)
netIn    - network traffic in - bits
netOut   - network traffic out - bits
conn     - number of open connections
DBA on MongoDB

Security and Authentication
1. Each database in a MongoDB instance can have any number of users.
2. Only authenticated users of a database are able to perform read or write operations on it.
3. A user in the admin database can be thought of as a superuser.
4. MongoDB must be started with the "--auth" option to enable authentication.
Backup on MongoDB
1. Data file cold backup: kill -INT mongod, then copy the --dbpath directory
2. mongodump (like Oracle exp) and mongorestore (like Oracle imp)
3. fsync and lock
4. Slave backup

fsync and lock example:
> use admin
switched to db admin
> db.runCommand({"fsync" : 1, "lock" : 1});
{
"info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
"ok" : 1
}
(run mongodump here)
> db.$cmd.sys.unlock.findOne();
{ "ok" : 1, "info" : "unlock requested" }
Repair MongoDB
1. Databases need to be repaired after an unclean shutdown (kill -9):
**************
old lock file: /data/db/mongod.lock. probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
2. All of the documents in the database are exported and then immediately imported,
ignoring any that are invalid. The indexes are then rebuilt.
3. Repair takes a long time when the data set is humongous.
DBA on MongoDB

Replication
• Master-Slave Replication
• Replica set
Master-Slave Replication
$ mkdir -p ~/dbs/master
$ ./mongod --dbpath ~/dbs/master --port 10000 --master
$ mkdir -p ~/dbs/slave
$ ./mongod --dbpath ~/dbs/slave --port 10001 --slave --source localhost:10000

Uses
1. Scale reads
2. Backup on the slave
3. Process data on the slave
4. DR (disaster recovery)
Master-Slave Replication

How does it work? The oplog
oplog.$main is a capped collection in the local database. Each entry has these keys:
• ts: Timestamp for the operation. The timestamp type is an internal type
used to track when operations are performed. It is composed of a 4-byte
timestamp and a 4-byte incrementing counter.
• op: Type of operation performed as a 1-byte code (e.g., "i" for an insert).
• ns: Namespace (collection name) where the operation was performed.
• o: Document further specifying the operation to perform. For an insert,
this would be the document to insert.

1. When a slave first starts up, it does a full sync of the data on the master node.
2. After the initial sync is complete, the slave begins querying the
master's oplog and applying operations in order to stay up to date.
Replication is therefore asynchronous.
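The second step, replaying oplog entries in timestamp order, might look roughly like this. It is a toy model and not the replication code: `applyOplog` is a hypothetical function, the `ts` value is represented as an object with seconds `t` and counter `i` purely for illustration, and only inserts ("i") and deletes ("d") are handled.

```javascript
// Replay oplog-style entries against an in-memory store: { namespace: { _id: doc } }.
function applyOplog(store, entries) {
  var ordered = entries.slice().sort(function (a, b) {
    // Order by timestamp seconds first, then by the incrementing counter.
    return a.ts.t - b.ts.t || a.ts.i - b.ts.i;
  });
  ordered.forEach(function (entry) {
    var collection = store[entry.ns] || (store[entry.ns] = {});
    if (entry.op === "i") {
      collection[entry.o._id] = entry.o; // insert the document
    } else if (entry.op === "d") {
      delete collection[entry.o._id];    // delete by _id
    } else {
      throw new Error("op not covered by this sketch: " + entry.op);
    }
  });
  return store;
}

var oplogSample = [
  { ts: { t: 100, i: 2 }, op: "i", ns: "test.foo", o: { _id: 2, x: "b" } },
  { ts: { t: 100, i: 1 }, op: "i", ns: "test.foo", o: { _id: 1, x: "a" } },
  { ts: { t: 101, i: 1 }, op: "d", ns: "test.foo", o: { _id: 1 } }
];

var slaveData = applyOplog({}, oplogSample);
console.log(slaveData); // { 'test.foo': { '2': { _id: 2, x: 'b' } } }
```

The counter in `ts` is what keeps two operations from the same second in a well-defined order.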
Replication on MongoDB

Replica set
1. A replica set is basically a master-slave cluster with automatic failover.
2. One primary (master), some secondaries (slaves)
3. The primary is elected by the cluster and may change to another node if the
current primary goes down.
Replication on MongoDB

Set up a replica set
1. The --replSet option gives the name of this replica set, followed by a seed host.
$ ./mongod --dbpath ~/dbs/node1 --port 10001 --replSet autocomplete/slcdbx1005:10002
We start up the other server in the same way:
$ ./mongod --dbpath ~/dbs/node2 --port 10002 --replSet autocomplete/slcdbx1006:10001
If we wanted to add a third server, we could do so with either of these commands:
$ ./mongod --dbpath ~/dbs/node3 --port 10003 --replSet autocomplete/slcdbx1005:10001
$ ./mongod --dbpath ~/dbs/node3 --port 10003 --replSet autocomplete/slcdbx1005:10001,slcdbx1006:10002
Replica set failover
• standard: keeps a full copy of the data, votes, and can become primary
• passive: keeps a full copy of the data and votes, but never becomes primary
• arbiter: votes only; no data is replicated to it
MongoDB Auto-sharding
Sharding: splitting data up and storing different portions of the
data on different machines.
1. Manual sharding: the application code manages storing different
data on different servers and querying against the appropriate server to
get data back.
2. Auto sharding: the cluster handles splitting up data and rebalancing
automatically.
Auto sharding
When to shard?
1. You've run out of disk space on your current machine.
2. You want to write data faster than a single mongod can handle.
3. You want to keep a larger proportion of data in memory to improve performance.
4. DR
5. Automatic failover
Auto sharding
Components of MongoDB sharding
• Config server
$ mkdir -p ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
• mongos (router)
$ ./mongos --port 30000 --configdb localhost:20000
• Shard (usually a replica set)
$ mkdir -p ~/dbs/shard1
$ ./mongod --dbpath ~/dbs/shard1 --port 10000
mongos> db.runCommand({addshard : "localhost:10000", allowLocal : true})
{
"added" : "localhost:10000",
"ok" : true
}
Pre-sharding a table

Determine a shard key
1. The shard key defines how we distribute data.
2. MongoDB's sharding is order-preserving; adjacent data by shard key tends to be on the same server.
3. The config database stores all the metadata indicating the location of data by range.
4. The shard key should be granular enough to ensure an even distribution of data.

Chunks
1. A chunk is a contiguous range of data from a particular collection.
2. Once a chunk has reached about 200MB in size, it splits into two new chunks. When a
particular shard has excess data, chunks will then migrate to other shards in the system.
3. The addition of a new shard will also influence the migration of chunks.
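Range-based routing and chunk splitting can be sketched with an ordered list of ranges. This is an illustrative model only: the chunk table, `findChunk`, and `splitChunk` are hypothetical, `-Infinity` and `Infinity` stand in for $minKey and $maxKey, each range includes its lower bound and excludes its upper bound, and the split point is passed in rather than derived from data size.

```javascript
// Ordered, non-overlapping ranges over a numeric shard key: [min, max).
var chunkTable = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard0" }
];

// Find the chunk (and therefore the shard) responsible for a shard key value.
function findChunk(chunks, key) {
  for (var i = 0; i < chunks.length; i++) {
    if (key >= chunks[i].min && key < chunks[i].max) {
      return chunks[i];
    }
  }
  throw new Error("no chunk covers key " + key);
}

// Split one chunk into two at splitKey; both halves stay on the same shard
// until the balancer decides to migrate one of them.
function splitChunk(chunks, chunk, splitKey) {
  var idx = chunks.indexOf(chunk);
  var left = { min: chunk.min, max: splitKey, shard: chunk.shard };
  var right = { min: splitKey, max: chunk.max, shard: chunk.shard };
  return chunks.slice(0, idx).concat([left, right], chunks.slice(idx + 1));
}

var target = findChunk(chunkTable, 150);
var afterSplit = splitChunk(chunkTable, target, 150);
console.log(target.shard, afterSplit.length, findChunk(afterSplit, 150).min);
```

Because the ranges are ordered, documents with adjacent shard key values land in the same chunk, which is the order-preserving property noted above.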
Sharding a table
Enable sharding on a database:
db.runCommand({"enablesharding" : "foo"})
Enable sharding on a collection:
db.runCommand({"shardcollection" : "foo.bar", "key" : {"_id" : 1}})
Show autosharding status:
> db.printShardingStatus()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "shard0", "host" : "localhost:10000" }
{ "_id" : "shard1", "host" : "localhost:10001" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
{ "_id" : "x", "partitioned" : false, "primary" : "shard0" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0", "sharded" : { "test.foo" : { "key" : { "x" : 1 }, "unique" : false } } }
test.foo chunks:
{ "x" : { $minKey : 1 } } -->> { "x" : { $maxKey : 1 } } on : shard0 { "t" : 1276636243000, "i" : 1 }
Query on sharding
Assume a shard key of { x : 1 }.

Sharding machine layout
Avoid a single point of failure.
Review
• Getting Up to Speed with MongoDB (document-oriented and schema-free)
• Developing with MongoDB (find())
• Advanced Usage (tons of features)
• Administration (easy to admin, replication, sharding)
• MISC (BSON; internals)
Misc
1. BSON
2. Datafiles layout
3. Memory-Mapped Storage Engine
Misc 1: BSON (Binary JSON)
• BSON is a lightweight binary format capable of representing any MongoDB document as a string of bytes.
• BSON is the format in which documents are saved to disk.
• When a driver is given a document to insert, use as a query, and so on, it will encode that document to BSON before sending it to the server.
• Goals: efficiency, traversability, performance
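To show what "a string of bytes" means in practice, here is a hand-rolled encoder for a tiny subset of BSON, written against Node.js Buffers. It is a sketch, not a driver: the function `encodeBson` is hypothetical and handles only UTF-8 strings (type 0x02) and 32-bit integers (type 0x10). The layout it follows is the published one: a little-endian int32 total length, the elements, then a terminating zero byte.

```javascript
// Null-terminated string, as used for element names.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// Little-endian 32-bit integer.
function int32(n) {
  var b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Encode a flat document containing only strings and 32-bit integers.
function encodeBson(doc) {
  var parts = [];
  Object.keys(doc).forEach(function (key) {
    var value = doc[key];
    if (typeof value === "string") {
      var bytes = cstring(value);
      // type 0x02, name, byte length including the trailing zero, bytes
      parts.push(Buffer.from([0x02]), cstring(key), int32(bytes.length), bytes);
    } else if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
      // type 0x10, name, value
      parts.push(Buffer.from([0x10]), cstring(key), int32(value));
    } else {
      throw new Error("type not covered by this sketch: " + key);
    }
  });
  var body = Buffer.concat(parts.concat([Buffer.from([0])]));
  return Buffer.concat([int32(body.length + 4), body]); // length counts itself
}

var helloBytes = encodeBson({ hello: "world" });
console.log(helloBytes.length, helloBytes.toString("hex"));
```

The length prefixes are what make the format traversable: a reader can skip a whole string or document without scanning it.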
Datafiles layout & Memory-Mapped Storage Engine

Datafiles layout
1. The numeric data files for a database double in size for each
new file, up to a maximum file size of 2GB.
2. MongoDB preallocates data files to ensure consistent performance.
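The doubling rule is easy to tabulate. In the sketch below the function `dataFileSizesMB` is hypothetical and the 64MB size of the first file is an assumption supplied by the caller; only the doubling and the 2GB cap come from the slide.

```javascript
// Sizes (in MB) of the first `fileCount` data files: each file doubles the
// previous one until the maximum is reached.
function dataFileSizesMB(fileCount, firstFileMB, maxFileMB) {
  var sizes = [];
  var size = firstFileMB;
  for (var i = 0; i < fileCount; i++) {
    sizes.push(size);
    size = Math.min(size * 2, maxFileMB);
  }
  return sizes;
}

// Assumption: the first file is 64MB; the cap is 2GB = 2048MB.
var fileSizes = dataFileSizesMB(7, 64, 2048);
console.log(fileSizes); // [ 64, 128, 256, 512, 1024, 2048, 2048 ]
```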
Memory-Mapped Storage Engine
1. When the server starts up, it memory maps all its data files.
2. The OS manages flushing data to disk and paging data in and out.
3. MongoDB cannot control the order in which data is written to disk, which
makes it impossible to use a write-ahead log to provide single-server
durability.
4. 32-bit MongoDB servers are limited to a total of about 2GB of data
per mongod. This is because all of the data must be addressable
using only 32 bits.

Q&A
• Getting Up to Speed with MongoDB (key features)
• Developing with MongoDB (start, shutdown, connect, query, DML)
• Advanced Usage (index, aggregation, GridFS)
• Administration (easy admin, replication, sharding)
• MISC (BSON; Memory-Mapped)
