NoSQL and Big Data - GlobalsDB



Big Data, NoSQL … So What?

Iran Hutchinson

Me

• I work for InterSystems, which:
  – Drives the http://globalsdb.org NoSQL project
  – Has 20+ years of NoSQL production deployments
  – Has 20+ years of Big Data production deployments
  – Built a ~250 million Euro business on the above
• Email: [email protected]
• Twitter: #iranic

Big Data is …

• Important data, in varying formats and volumes, that is generated across all areas affecting your business and is generally not centrally correlated or managed.
• Examples include:
  – Word files, PowerPoint, PDFs
  – Emails, instant messaging, texts
  – Blogs and social media
  – Automated data from machine activities
  – Stream data from financial stock markets

Some Big Data Numbers

• Source: McKinsey Global Institute
• 5 billion mobile phones in use in 2010
• 30 billion pieces of information shared on Facebook each month
• 40% projected growth in global data generated
• 235 terabytes of data collected by the US Library of Congress by April 2011
  – 15 out of 17 sectors in the US have more data stored per company than this

Some Big Data Numbers …

• Source: McKinsey Global Institute
• $300 billion in potential value in the US healthcare system
• €250 billion in Europe’s public sector administration
• $600 billion in annual consumer surplus from using location data
• 60% potential increase in retailers’ operating margins
• 140,000 to 190,000 analytical talent positions needed in the US
• 1.5 million data-savvy managers needed in the US

Case Study: Credit Suisse

• Key challenges:
  – Revamp order routing architecture
  – Revamp order management architecture
  – Serve current demand and scale to new levels
  – Address downtime challenges

Case Study: Credit Suisse …

• Big Data in the form of high transaction volumes
• Leveraged Caché’s:
  – In-memory architecture for performance
  – On-disk resiliency for availability
  – Distributed architecture for data coherency
• Can easily process 1,000,000,000 transactions during business hours
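A back-of-envelope calculation puts the Credit Suisse figure in perspective. This sketch assumes an 8-hour business day, because the slide does not state the time window. It computes only the average rate.

```java
// Back-of-envelope average throughput for the Credit Suisse figure.
// ASSUMPTION: "business hours" = an 8-hour day; the slide does not give the window.
public class BusinessHoursThroughput {

    /** Average transactions per second needed to process {@code total} transactions in {@code hours}. */
    static double averageTps(long total, double hours) {
        return total / (hours * 3600.0);
    }

    public static void main(String[] args) {
        long transactions = 1_000_000_000L; // figure from the slide
        double hours = 8.0;                 // assumed length of the business day
        System.out.printf("%.0f transactions per second on average%n",
                averageTps(transactions, hours));
    }
}
```

Under this assumption the system must sustain roughly 34,700 transactions per second on average. Order flow is bursty, so peak rates would be higher.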

Case Study: European Space Agency (ESA)

• Key challenges:
  – Make the largest, most precise 3-D map of our Galaxy
  – Monitor 1,000,000,000 stars over 5 years, precisely charting position, movement, and brightness
  – Along the way, discover hundreds of thousands of new celestial objects

Case Study: ESA Continued …

• Challenge calculation:
  – Capture data for 1 billion celestial objects
  – http://www.intersystems.com/cache/whitepapers/pdf/Charting_the_Galaxy.pdf

1,000,000,000 objects × 100 observations per object × 600 bytes per observation = 60,000,000,000,000 bytes (60 TB)

Solution: Caché/XEP, delivering 100,000+ sustained inserts per second per server, stored as real objects with SQL access
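The slide's arithmetic can be reproduced with a short calculation, which also gives a rough loading time. This sketch is not taken from the white paper. It assumes one insert per observation and a single server at the quoted 100,000 inserts per second, and it uses decimal terabytes (10^12 bytes).

```java
// Reproduces the ESA storage estimate and adds a rough ingest-time estimate.
// ASSUMPTION: each observation is stored with one insert; the slide does not say how
// observations map to inserts.
public class GaiaStorageEstimate {

    /** Total raw bytes = objects x observations per object x bytes per observation. */
    static long totalBytes(long objects, long observationsPerObject, long bytesPerObservation) {
        return objects * observationsPerObject * bytesPerObservation;
    }

    /** Days needed to perform {@code inserts} at a sustained rate across {@code servers} servers. */
    static double ingestDays(long inserts, long insertsPerSecondPerServer, int servers) {
        double seconds = (double) inserts / ((double) insertsPerSecondPerServer * servers);
        return seconds / 86_400.0;
    }

    public static void main(String[] args) {
        long objects = 1_000_000_000L;
        long observationsPerObject = 100;
        long bytesPerObservation = 600;

        long bytes = totalBytes(objects, observationsPerObject, bytesPerObservation);
        System.out.println(bytes + " bytes = " + bytes / 1_000_000_000_000L + " TB"); // decimal TB

        long inserts = objects * observationsPerObject; // one insert per observation (assumed)
        System.out.printf("%.1f days at 100,000 inserts/s on one server%n",
                ingestDays(inserts, 100_000L, 1));
    }
}
```

The storage figure matches the slide at 60 TB. Under the one-insert-per-observation assumption, loading the 100 billion observations on a single server at the quoted rate would take about 11.6 days. That is short compared with the 5-year mission.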

Enabling Technology

• Focus on Caché
• A quick look at the architecture

Enabling Technology …

• Java and the C database kernel run in the same process

Enabling Technology …

• ECP (Enterprise Cache Protocol) for distributed computing

Enabling Technology …

• Multiple simultaneous writers of data to disk

Who is this Guy?

• Edgar Frank “Ted” Codd
• Known for his 12 rules (numbered 0 to 12) for relational database systems

NoSQL … Breaking the Rules

• Rule 1: The information rule
  – All information is represented in one and only one way, namely by values in column positions within rows of tables.
• Rule 12: The nonsubversion rule
  – If the system provides a low-level (record-at-a-time) interface, that interface cannot be used to subvert the system, for example by bypassing relational security or integrity constraints.

Why NoSQL?

• No to ACID transactions
• No to the impedance mismatch with SQL
• Dealing with Big Data and web scale
• High prices from RDBMS vendors
• Use commodity hardware
• Flexible data models
• It’s a cool movement …

Is NoSQL a new Concept?

• No
• Remember MUMPS?
  – SET ^Car("Door","Color")="BLUE"
• Remember Multi-value/PICK?
  – MATWRITE array.variable ON file.variable,id …
• Ever heard of the NoSQL RDB?
  – Carlo Strozzi
  – http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page

CAP Theorem

• Consistency – A consistent service operates fully or not at all: every node sees the same data, so a read returns the latest write.
• Availability – The service is available: every request receives a response.
• Partition tolerance – Data is managed on multiple nodes, and the service keeps working even when network failures split those nodes into groups that cannot communicate.
• Significant because a distributed system can guarantee only 2 of these 3 properties at the same time …
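The trade-off can be sketched with a toy two-replica store. This is a hypothetical illustration and does not describe how Caché or any particular NoSQL product behaves. When the link between the replicas is partitioned, a CP store rejects writes it cannot replicate. An AP store accepts those writes locally and lets the replicas diverge.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the CAP trade-off with two replicas (hypothetical, for illustration only).
public class CapToy {
    enum Mode { CP, AP }

    final Mode mode;
    final Map<String, String> replicaA = new HashMap<>();
    final Map<String, String> replicaB = new HashMap<>();
    boolean partitioned = false; // true = replicas cannot reach each other

    CapToy(Mode mode) {
        this.mode = mode;
    }

    /** Writes through replica A. Returns false if the write was rejected. */
    boolean write(String key, String value) {
        if (!partitioned) {          // link is up: replicate synchronously to both
            replicaA.put(key, value);
            replicaB.put(key, value);
            return true;
        }
        if (mode == Mode.CP) {
            return false;            // give up availability to stay consistent
        }
        replicaA.put(key, value);    // give up consistency to stay available
        return true;
    }

    /** Reads from replica B, which can be stale in AP mode during a partition. */
    String readFromB(String key) {
        return replicaB.get(key);
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            CapToy db = new CapToy(mode);
            db.write("color", "BLUE");
            db.partitioned = true;
            boolean accepted = db.write("color", "RED");
            System.out.println(mode + ": write accepted=" + accepted
                    + ", replica B reads " + db.readFromB("color"));
        }
    }
}
```

Both modes read BLUE from replica B during the partition, but for different reasons. The CP store refused the update. The AP store accepted it on replica A only, so the replicas now disagree until the partition heals and they are reconciled.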

CAP Theorem …

• Arguments and links:
  – http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  – http://ksat.me/a-plain-english-introduction-to-cap-theorem/
  – http://voltdb.com/company/blog/clarifications-cap-theorem-and-data-related-errors

CAP Theorem …: Consistency

[Diagram: seven peer databases, DB1 to DB7]

CAP Theorem …: Consistency

[Diagram: hub-and-spoke topology, with a hub connected to spoke databases DB1 to DB4]

CAP Theorem …: Consistency

[Diagram: three databases, DB1, DB2, and DB3]

Distributed Computing

• Fallacies of distributed computing (Peter Deutsch):
  – The network is reliable
  – Latency is zero
  – Bandwidth is infinite
  – The network is secure
  – Topology doesn’t change
  – There is one administrator
  – Transport cost is zero
  – The network is homogeneous
• Remember Jini? (See the Apache River project.)
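A sketch of a client call that avoids the first two fallacies follows. It does not assume the network is reliable, because it retries. It does not assume latency is zero, because it bounds each attempt with a timeout. This is a hypothetical helper written for illustration. The method name and the backoff policy are assumptions and do not come from any product in this talk.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: bounded retries with a per-attempt timeout and exponential backoff.
public class RetryWithTimeout {

    static <T> T callWithRetry(Supplier<T> remoteCall, int maxAttempts,
                               long timeoutMs, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Bound each call: a slow reply is treated like a failed one.
                return CompletableFuture.supplyAsync(remoteCall)
                        .get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                last = e; // remember the failure and try again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the call", e);
            }
            if (attempt < maxAttempts) {
                sleep(backoffMs << (attempt - 1)); // exponential backoff: 1x, 2x, 4x, ...
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated remote call that fails twice, then succeeds.
        Supplier<String> flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated network error");
            }
            return "ok";
        };
        String result = callWithRetry(flaky, 5, 500, 10);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

This helper does not cancel a timed-out attempt, and the remote side may still complete that attempt. Before a call like this is retried in practice, the remote operation should be made idempotent.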

NoSQL: Which Model to Use?

[Diagram: data models around Data: Key-Value, Graph, Document, Column]

NoSQL: Which project?

• http://nosql-database.org/ lists 122 projects at the time of this talk.
• The choice depends on your data model.
• Most likely, choose a well-known project.
• Don’t forget about shared risk!

NoSQL: Querying

• Some solutions have no querying at all
• Where querying is available, the query languages differ
• Lack of general ad hoc querying: “no” SQL
• Have you heard of UnQL?
  – http://www.unqlspec.org/display/UnQL/Home
• NOTE: Toad for Cloud

NoSQL: How to Succeed?

• Know your application
• Don’t forget past lessons
• Consider a hybrid approach
• Fight the desire to roll your own DB
• Start small but significant

NoSQL: Hybrid Approach 1

• Two systems:
  – NoSQL system
  – SQL/RDBMS

[Diagram: NoSQL system and SQL/RDBMS connected through a Data Mapper / Translator]
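In approach 1, the mapper can be pictured as a function that walks subscripted NoSQL nodes and emits relational rows. This is a hypothetical sketch. The product table, its column names, and the rule of skipping incomplete paths are all assumptions. The node layout borrows the product[item, type, os, processor] example from the GlobalsDB slides in this deck.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "Data Mapper / Translator" for the two-system hybrid approach.
// It flattens subscripted NoSQL nodes into rows for a relational table.
// The table name, column names, and 4-subscript layout are assumptions.
public class NoSqlToSqlMapper {

    static final String INSERT_SQL =
            "INSERT INTO product (item, type, os, processor, quantity) VALUES (?, ?, ?, ?, ?)";

    /** Each key is a subscript path such as [computer, laptop, Mac, i7]; each value is a quantity. */
    static List<List<Object>> toRows(Map<List<String>, Integer> nodes) {
        List<List<Object>> rows = new ArrayList<>();
        for (Map.Entry<List<String>, Integer> node : nodes.entrySet()) {
            List<String> path = node.getKey();
            if (path.size() != 4) {
                continue; // only complete item/type/os/processor paths map to a row
            }
            List<Object> row = new ArrayList<>(path); // item, type, os, processor
            row.add(node.getValue());                 // quantity
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> nodes = new LinkedHashMap<>();
        nodes.put(List.of("computer", "laptop", "Mac", "i7"), 3); // node from the slides
        nodes.put(List.of("computer", "laptop"), 3);               // hypothetical partial node, skipped
        for (List<Object> row : toRows(nodes)) {
            // With JDBC, these values would be bound to a PreparedStatement built from INSERT_SQL.
            System.out.println(row);
        }
    }
}
```

The mapper produces parameter rows rather than SQL text. Binding those values through a PreparedStatement keeps the translator safe from SQL injection.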

NoSQL: Hybrid Approach 2

• One system does both NoSQL and SQL

[Diagram: Data at the center, with Relational (?), Graph, Key-Value, Document, and Column access]

GlobalsDB.org Project

• Name comes from the underlying data structure:
  – Multi-dimensional array
  – Basis for the commercial Caché data system
• Free for development and production deployment
• NoSQL DB with Java and Node.js APIs
• Code base is the same as the commercial product
• APIs are open sourced or being open sourced
• Database kernel is not open source

A “Global” Definition

• A Global is a persistent, sparse, multi-dimensional array consisting of one or more storage elements, or "nodes". Each node is identified by a node reference (essentially its logical address):
  – simple == "some data"
  – complex["subscript-1", "subscript-2"] == "some data"
• Example:
  – product[item, type, os, processor] == quantity
  – product["computer", "laptop", "Mac", "i7"] == 3
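The definition maps naturally onto a sorted map keyed by subscript paths. This is a hypothetical, in-memory sketch of a sparse multi-dimensional array with a subscript-prefix scan. It is not the GlobalsDB Java or Node.js API, and it persists nothing. It also orders subscripts as plain strings, whereas a real database may collate them differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal in-memory sketch of a "global": a sparse, multi-dimensional array whose
// nodes are addressed by subscript paths. NOT the GlobalsDB API; names are hypothetical,
// nothing is persisted, and subscripts are compared as plain strings.
public class GlobalSketch {

    // Order paths subscript by subscript; a path sorts before its own children.
    private static final Comparator<List<String>> PATH_ORDER = (a, b) -> {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // Only nodes that have been set take up space, which is what makes the array sparse.
    private final TreeMap<List<String>, Object> nodes = new TreeMap<>(PATH_ORDER);

    /** Roughly SET ^global(s1, ..., sn) = value; no subscripts gives a "simple" node. */
    void set(Object value, String... subscripts) {
        nodes.put(List.of(subscripts), value);
    }

    /** Returns the node's value, or null if that node was never set. */
    Object get(String... subscripts) {
        return nodes.get(List.of(subscripts));
    }

    /** All nodes at or below the given subscript prefix, in subscript order. */
    SortedMap<List<String>, Object> under(String... prefix) {
        List<String> from = List.of(prefix);
        SortedMap<List<String>, Object> result = new TreeMap<>(PATH_ORDER);
        // Paths sharing a prefix are contiguous in PATH_ORDER, so stop at the first non-match.
        for (Map.Entry<List<String>, Object> e : nodes.tailMap(from, true).entrySet()) {
            List<String> path = e.getKey();
            if (path.size() < from.size() || !path.subList(0, from.size()).equals(from)) {
                break;
            }
            result.put(path, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        GlobalSketch product = new GlobalSketch();
        product.set(3, "computer", "laptop", "Mac", "i7");     // the slide's example node
        product.set(5, "computer", "laptop", "Windows", "i5"); // hypothetical extra node

        GlobalSketch car = new GlobalSketch();
        car.set("BLUE", "Door", "Color"); // like SET ^Car("Door","Color")="BLUE"

        System.out.println(product.get("computer", "laptop", "Mac", "i7")); // 3
        System.out.println(product.under("computer", "laptop").keySet());
        System.out.println(car.get("Door", "Color"));                       // BLUE
    }
}
```

Unset nodes cost nothing, so product holds only the entries that were set, even though its four dimensions could describe many more combinations. Keeping paths sorted makes prefix scans such as under("computer", "laptop") a contiguous walk.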

GlobalsDB Architecture

• Current architecture

GlobalsDB, NoSQL, Big Data

• http://nosql.mypopescu.com/
• http://highscalability.com/
• http://nosqltapes.com/
• http://globalsdb.wordpress.com