Transcript NoSQL
.NET Database Technologies:
Using NoSQL databases
NoSQL – “Not only SQL”
• Alternatives to the ubiquitous relational database which
may be superior in specific application scenarios
• Object-oriented databases (ODBMS)
They came, they saw, they....
...didn’t conquer, but they are still around
• NoSQL databases
The new kids on the block
General term applied to a range of different non-relational
database systems
Largely emerging to meet the needs of large-scale Web 2.0
applications
Object-oriented databases
• ODBMSs use the same data model as object-oriented
programming languages
no object-relational impedance mismatch due to a uniform
model
• An object database combines the features of an object-
oriented language and a DBMS (language binding)
treat data as objects: object identity, attributes and methods,
relationships between objects
extensible type hierarchy: inheritance, overloading and overriding as
well as customised types
ODBMS history
• Object Database Manifesto
Paper published in 1989 (Atkinson et al.)
• Some ODBMS products
Early 1990s: Gemstone, Objectivity
Late 1990s: Versant, ObjectStore, Poet, Matisse
2000s: db4o, Caché
• ODMG (Object Data Management Group)
1993: ODMG 1.0 standard
1997: ODMG 2.0
1999: ODMG 3.0, then ODMG disbanded
2005: ODMG reformed, working towards new standard
ODMG
• Object Database Management Group (ODMG) founded in
1991
standardisation body including all major ODBMS vendors
• Define a standard to increase the portability across
different ODBMS products
• Mirroring the SQL standard for RDBMS
Object Model
Object Definition Language (ODL)
Object Query Language (OQL)
language bindings: C++, Smalltalk and Java
Characteristics of ODBMS
• Support complex data models with no mapping issues
• Tight integration with an object-oriented programming
language (persistent programming language)
• High performance in suitable application scenarios
• Different products scale from small-footprint embedded databases
(db4o) to large-scale, highly concurrent systems (e.g. Versant V/OD)
Persistence patterns and ODBMS
• Some of Fowler’s patterns are specific to the use of a
relational database, e.g.
Data Mapper
Foreign Key Mapping
Metadata Mapping
Single-table Inheritance, etc.
• Some are not specific to the data storage model and are
relevant when using an ODBMS, e.g.
Identity Map
Unit of Work
Repository
Lazy-Loading
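As an illustration of one of the storage-independent patterns listed above, here is a minimal Identity Map sketch in Python. It is language-neutral in spirit and not tied to any ODBMS product; `IdentityMap`, `Customer`, `STORE` and `load_from_store` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str


class IdentityMap:
    """Ensures each stored object is loaded only once per session."""

    def __init__(self, loader):
        self._loader = loader      # function that fetches an object by id
        self._loaded = {}          # id -> object already in memory
        self.loads = 0             # how many times the store was hit

    def get(self, object_id):
        if object_id not in self._loaded:
            self._loaded[object_id] = self._loader(object_id)
            self.loads += 1
        return self._loaded[object_id]


# Hypothetical backing store: a dict standing in for the database.
STORE = {1: "Ada", 2: "Grace"}


def load_from_store(object_id):
    return Customer(object_id, STORE[object_id])


session = IdentityMap(load_from_store)
first = session.get(1)
second = session.get(1)   # served from the map, not from the store
```

Both calls return the same in-memory object, so changes made through one reference are visible through the other, whatever storage model sits underneath.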
db4o
• Open-source object-database engine
Now owned by Versant
Complements their own V/OD product
• Can be used in embedded or client-server modes
Embed in application simply by including DLLs
• Native object database
Stores .NET (or Java) objects directly with no special
requirements on classes
Other ODBMSs (e.g. V/OD) require classes to be marked as
persistent through bytecode manipulation and also store class
definitions
Tight integration with application, but the trade-off is limited
ad-hoc querying and reporting
Can replicate data to relational database if required
IObjectContainer
• IObjectContainer interface is implemented by objects
which provide access to database
IObjectContainer is roughly equivalent to EF ObjectContext
Acts as a Unit of Work if transparent persistence is enabled (see
later)
• Can access DB in embedded mode (direct file access) or
client-server mode (local or remote)
IObjectServer instance required in client-server mode
• IObjectContainer instances created by factory classes, e.g.
Db4oEmbedded
• Queries on IObjectContainer return IObjectSet (except
LINQ queries)
Viewing data and ad-hoc querying
• ObjectManager Enterprise
Visual Studio plug-in
Browsing and drag-and-drop queries
• LINQPad
Need to include db4o DLLs and namespaces for stored classes
Executes LINQ queries and visualises results
db4o query APIs
• Query-by-example (QBE)
Very limited - no comparisons, ranges, etc.
• Simple Object Data Access (SODA)
Build query by navigating graph and adding constraints to
nodes
• Native Queries
Expressed completely in programming language
Type-safe
Optimised to SODA query at runtime if possible
• LINQ
.NET version, not in Java (obviously)
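The difference between query-by-example and native queries can be sketched without db4o at all. The Python below is an assumption-laden illustration, not the db4o API: `Pilot`, `query_by_example` and `native_query` are hypothetical names, and `None` plays the role of an unset template field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pilot:
    name: Optional[str] = None
    points: Optional[int] = None


PILOTS = [Pilot("Alice", 100), Pilot("Bob", 99), Pilot("Carol", 100)]


def query_by_example(template, stored):
    """Match objects whose fields equal every field set on the template."""
    wanted = {k: v for k, v in vars(template).items() if v is not None}
    return [obj for obj in stored
            if all(getattr(obj, k) == v for k, v in wanted.items())]


def native_query(predicate, stored):
    """Match objects for which an ordinary function returns True."""
    return [obj for obj in stored if predicate(obj)]


# QBE can only express equality with the template's values.
by_example = query_by_example(Pilot(points=100), PILOTS)

# A native query is plain code, so ranges and comparisons are available.
in_range = native_query(lambda p: 99 <= p.points < 100, PILOTS)
```

The template approach has no way to say "between 99 and 100", which is the limitation noted for QBE above; the predicate form has the full language available.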
Activation
• Objects are stored in DB as an object graph
• If db4o configured to cascade-on-activate (eager loading)
then retrieving one object could potentially load a large
number of related objects
• Fixed activation depth limits depth of traversal of graph
when retrieving objects
Default value is 5
• Can then explicitly activate related objects when needed
• Lazy loading can be configured with transparent
activation
• Classes need to be “instrumented” at build time by running
Db4oTool.exe
Code injected into assembly so that classes implement
IActivatable interface
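The effect of a fixed activation depth can be sketched as a depth-limited traversal of the object graph. This Python is only a model of the idea: `Node` and `activate` are hypothetical, and the exact way db4o counts levels is an assumption here.

```python
class Node:
    """A persistent object; 'activated' means its fields have been loaded."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.activated = False


def activate(obj, depth):
    """Load obj and its related objects down to a fixed depth."""
    if depth <= 0:
        return
    obj.activated = True
    for child in obj.children:
        activate(child, depth - 1)


# A chain of seven objects: n0 -> n1 -> ... -> n6
chain = [Node(f"n{i}") for i in range(7)]
for parent, child in zip(chain, chain[1:]):
    parent.children.append(child)

activate(chain[0], depth=5)          # 5 is the default mentioned above
loaded = [n.name for n in chain if n.activated]

# Objects beyond the depth stay inactive until activated explicitly.
activate(chain[5], depth=1)
```

Retrieving `n0` loads only the first five objects in the chain; `n5` has to be activated explicitly, and `n6` remains unloaded.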
Update depth
• Similar considerations apply to updates
• Storing an updated object could cause unnecessary
updates to related objects
• Fixed update depth limits depth of traversal of graph
when storing objects
Default value is 1
• Can configure transparent persistence which allows
changes to be tracked
Only changed objects are updated in database
Behaves like change tracking in, for example, Entity
Framework
Unit of Work
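Change tracking of this kind can be sketched as objects that report their own modifications to a session, which then writes only the changed ones. This is a Python model under stated assumptions, not db4o or Entity Framework code; `Session` and `Tracked` are hypothetical names, and the `written` list stands in for database writes.

```python
class Session:
    """A minimal unit of work: remembers changed objects, writes them on commit."""

    def __init__(self):
        self.dirty = set()
        self.written = []            # stands in for database writes

    def commit(self):
        for obj in self.dirty:
            self.written.append(obj.name)
        self.dirty.clear()


class Tracked:
    def __init__(self, session, name, balance):
        # Bypass tracking while the object is first populated.
        object.__setattr__(self, "_session", session)
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "balance", balance)

    def __setattr__(self, field, value):
        # Every later assignment marks the object as changed.
        object.__setattr__(self, field, value)
        self._session.dirty.add(self)


session = Session()
a = Tracked(session, "a", 10)
b = Tracked(session, "b", 20)
a.balance = 15                       # only 'a' is modified
session.commit()
```

Only `a` is written on commit; `b` was never modified, so it is not touched.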
Persistence ignorance (PI)?
• Stores POCOs without any need for mapping, so yes
• Transparent Activation requires that classes implement a
specific interface
• But this is done at build time so domain classes don’t need
any specific code
• Has parallels with dynamic proxies in ORMs:
Objects are instances of the domain classes, which have been
modified ‘under the hood’ at build time
Compare with dynamic proxy classes, which derive from the domain
classes and are created ‘under the hood’ at run time
Further reading
• www.odbms.org
Resource portal
• Db4o Tutorial
included in product download
• The Definitive Guide to db4o (Apress)
NoSQL databases
• New breed of databases that are appearing largely in
response to the limitations of existing relational databases
• Typically:
Support massive data storage (petabyte+)
Distribute storage and processing across multiple servers
• Contrast in architecture and priorities compared to
relational databases
• Hence term NoSQL
• “Not only SQL” – absence of SQL is not a requirement
NoSQL features
• Wide variety of implementations, but some features are
common to many of them:
• Schema-less
• Shared-nothing architecture
• Elasticity
• Sharding and asynchronous replication
• BASE, not ACID
Basically Available
Soft state
Eventually consistent
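Asynchronous replication and eventual consistency can be illustrated with a toy store in which writes are acknowledged before they reach the replica. The Python below is a sketch under assumptions: `ReplicatedStore` is a hypothetical class, and calling `replicate()` by hand stands in for a background process.

```python
from collections import deque


class ReplicatedStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()      # changes not yet shipped to the replica

    def write(self, key, value):
        self.primary[key] = value    # acknowledged immediately
        self._pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        """Apply queued changes; in a real system this runs in the background."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


store = ReplicatedStore()
store.write("user:1", "Fernando")
stale = store.read_from_replica("user:1")    # replica has not caught up yet
store.replicate()
fresh = store.read_from_replica("user:1")    # replica is now consistent
```

The read before replication returns nothing, the read after it returns the written value: the replica is eventually, not immediately, consistent.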
MapReduce
• Algorithm for dividing a work load into units suitable for
parallel processing
• Useful for queries against large sets of data: the query can
be distributed to hundreds or thousands of nodes, each of which
works on a subset of the target data
• The results are then merged together, ultimately yielding
a single “answer” to the original query
• Example: get total word count of a large number of
documents
Map: calculate word count of each document; each node works on a
subset of the overall data set and emits its results to
intermediate storage
Reduce: calculate total of intermediate results
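The word-count example can be sketched in a few lines of Python. This runs in a single process, with list partitions standing in for nodes; the document texts, `split_into_partitions` and the other names are made up for illustration.

```python
from functools import reduce

DOCUMENTS = {
    "doc1": "the quick brown fox",
    "doc2": "jumps over the lazy dog",
    "doc3": "the end",
}


def split_into_partitions(items, node_count):
    """Give each (simulated) node a subset of the data set."""
    return [items[i::node_count] for i in range(node_count)]


def map_word_count(text):
    """Map step: word count of a single document."""
    return len(text.split())


def reduce_total(running_total, count):
    """Reduce step: combine intermediate results into one answer."""
    return running_total + count


partitions = split_into_partitions(list(DOCUMENTS.values()), node_count=2)

# Each node maps its own partition and emits intermediate results.
intermediate = [map_word_count(text) for part in partitions for text in part]

total = reduce(reduce_total, intermediate, 0)
```

Because addition is associative, the intermediate counts can be combined in any order, which is what allows the map work to be spread across many nodes.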
Brewer’s CAP theorem
• Can optimize for only two of three priorities in a
distributed database:
• Consistency
All clients have same view of the data
Requires atomicity, transaction isolation
• Availability
Every request received by a non-failing node must result in a
response
• Partition Tolerance
Partitions happen if certain nodes can’t communicate
No set of failures less than total network failure is allowed to
cause the system to respond incorrectly
Implications of CAP theorem
• Any two properties can be achieved
• CP
If messages between nodes are lost then system waits
Possible that no response returned at all
No inconsistent data returned to client
• CA
No partitions, system will always respond and data is
consistent
• AP
Response always returned even if some messages between
nodes are lost
Different nodes may have different views of the data
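The CP and AP trade-offs can be made concrete with a toy two-node store. The Python below is a simplified model, not how any real database behaves: `TwoNodeStore` and `Unavailable` are hypothetical, and the CP case is modelled as an error rather than an indefinite wait.

```python
class Unavailable(Exception):
    """Raised by a CP system that refuses to answer during a partition."""


class TwoNodeStore:
    def __init__(self, mode):
        self.mode = mode             # "CP" or "AP"
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot reach the other node")
            self.nodes[node][key] = value       # AP: accept locally
        else:
            for replica in self.nodes:          # healthy: update both nodes
                replica[key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)


ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
ap_views = (ap.read(0, "x"), ap.read(1, "x"))   # nodes now disagree

cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    refused = False
except Unavailable:
    refused = True
cp_views = (cp.read(0, "x"), cp.read(1, "x"))   # still consistent
```

The AP store keeps answering but its two nodes hold different values for `x`; the CP store rejects the write, so both nodes still agree.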
Implications of CAP theorem
• Choose a database whose priorities match the application
http://blog.nahurst.com/visual-guide-to-nosql-systems
Using a NoSQL database in a .NET application
• Application typically makes connection to remote cluster
• Some (but not many) NoSQL databases are supported by
native .NET clients
Handle “mapping” from .NET objects to data model
• Many NoSQL databases are accessed through a REST
interface
Application must construct request and handle response
format, e.g. JSON
Application can be written in any suitable language
• Azure Table Storage is Microsoft’s NoSQL storage for
cloud-based applications
• However the data is accessed, you need to understand the
data model, which will be significantly different from a
typical relational database or object model
NoSQL database types and examples
• Key/value Databases
These manage a simple value or row, indexed by a key
e.g. Voldemort, Vertica
• Big table Databases
“a sparse, distributed, persistent multidimensional sorted map”
e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB
• Document Databases
Multi-field documents (or objects) with JSON access
e.g. MongoDB, RavenDB (.NET specific), CouchDB
• Graph Databases
Manage nodes, edges, and properties
e.g. Neo4j, sones
MongoDB
• Scalable, high-performance, open source, document-
oriented database
• Stores JSON-style (actually BSON) documents with
dynamic schema
• Replication, high-availability and auto-sharding
• Supports document-based queries and map/reduce
• Command-line tools:
mongod – starts server as a service or daemon
mongo – client shell: documents to store are defined as JSON, and
documents retrieved from a query are displayed as JSON
MongoDB and HTTP
• Admin console at http://<server name>:28017
• REST interface on http://<server name>:28017
Enabled by starting server with mongod --rest
Server responds to RESTful HTTP requests, e.g.
http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando
Response is in JSON format
Could be consumed by client-side code in Ajax application
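A client of a REST interface like this has two jobs: build the request URL and decode the JSON response. The Python sketch below does both without touching the network. `build_query_url` is a hypothetical helper, and the shape of `sample_response` is an assumption made for illustration; the real response fields may differ.

```python
import json
from urllib.parse import urlencode

BASE = "http://127.0.0.1:28017"


def build_query_url(database, collection, **filters):
    """Build a URL of the form shown above, one filter_<field> per criterion."""
    query = urlencode({f"filter_{field}": value
                       for field, value in filters.items()})
    return f"{BASE}/{database}/{collection}/?{query}"


url = build_query_url("company", "Employee", Name="Fernando")

# Hypothetical response body; the real field names may differ.
sample_response = '{"rows": [{"Name": "Fernando", "Age": 30}], "total_rows": 1}'

employees = json.loads(sample_response)["rows"]
names = [employee["Name"] for employee in employees]
```

The same two steps apply in any language with an HTTP client and a JSON parser, which is why the REST route does not depend on a native driver.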
MongoDB .NET driver
• Can access documents as instances of Document class
• Represents document as key-value pairs
• Or, can serialize POCOs to database format (BSON)
• Deserialize database documents to POCOs
• Supports LINQ queries
• MapReduce queries can be expressed as LINQ queries
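The round trip between typed objects and documents can be sketched in Python with dataclasses standing in for POCOs. This is not the MongoDB .NET driver API; `Employee`, `Address`, `to_document` and `from_document` are hypothetical names, and JSON text stands in for the database's storage format.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class Address:
    city: str


@dataclass
class Employee:
    name: str
    age: int
    addresses: List[Address] = field(default_factory=list)


def to_document(employee):
    """Serialize an object graph to a document of key-value pairs."""
    return asdict(employee)


def from_document(doc):
    """Rebuild the typed object from a document."""
    return Employee(
        name=doc["name"],
        age=doc["age"],
        addresses=[Address(**a) for a in doc.get("addresses", [])],
    )


original = Employee("Fernando", 30, [Address("Glasgow")])
document = to_document(original)
as_json = json.dumps(document)               # what would be sent to the server
restored = from_document(json.loads(as_json))
```

The intermediate `document` is the key-value view of the data; `restored` is the typed view, equal to the object that was saved.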
MongoDB schema design
• Collections are essentially named groupings of documents
Roughly equivalent to relational database tables
• Less "normalization" than a relational schema because there
are no server-side joins
• Generally, you will want one database collection for each of
your top level objects
Don’t want a collection for every "class" - instead, embed objects
[Figure: relational schema compared with document schema]
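The contrast between the two schema styles can be shown with plain Python data structures. The blog post and comment data below are invented for illustration, and lists of dicts stand in for tables and collections.

```python
# Relational style: one "table" per class, linked by foreign keys.
posts_table = [{"id": 1, "title": "Hello NoSQL"}]
comments_table = [
    {"id": 10, "post_id": 1, "text": "Nice"},
    {"id": 11, "post_id": 1, "text": "Thanks"},
]


def comments_for(post_id):
    """The join that has to be performed to reassemble a post."""
    return [c["text"] for c in comments_table if c["post_id"] == post_id]


# Document style: one collection for the top-level object, children embedded.
posts_collection = [
    {
        "_id": 1,
        "title": "Hello NoSQL",
        "comments": [{"text": "Nice"}, {"text": "Thanks"}],
    },
]

joined = comments_for(1)
embedded = [c["text"] for c in posts_collection[0]["comments"]]
```

Both forms hold the same information, but the embedded document is read in one lookup with no join, which is why comments do not get a collection of their own here.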
Document example
• Save:
• Query:
http://www.10gen.com/video/mongosv2010/schemadesign
MongoDB in C# applications - PI?
• Up to a point
• Classes stored in a collection need an Id property of a specific
type (MongoDB.Oid)
• Object model needs to be designed with document schema
in mind
Further reading
• http://nosql-database.org/
• http://www.nosqlpedia.com/
• http://www.mongodb.org/
• http://www.codeproject.com/KB/database/MongoDBCS.aspx
Nice code example for C# and MongoDB