Cross-Media Publishing with NoSQL


Cross-Media Publishing with NoSQL Techniques:
Possibilities, Use Cases, Assessment
Christian Kohl, De Gruyter
23 June 2015
Cross-Media-Forum, Munich
https://www.flickr.com/photos/jdhancock/5307754233; License: CC BY 2.0
1) Brief introduction to NoSQL + XML
2) NoSQL + XML in cross-media publishing at De Gruyter
Source: http://www.flickr.com/photos/ravescuritiba/773032554/
A very very short history of DB technology
1960s Hierarchical Era
Application- and hardware-specific data storage
e.g. IBM mainframes
1970s+ Relational Era
Granular access to highly structured data
Tables: columns/rows
IBM, MS, Oracle, …
+ SQL
2000s+ Any Structure Era
Schema-agnostic, massive scale, query and search, heterogeneous data, unstructured, faster time-to-results
Amazon, Google, Facebook, LinkedIn, MarkLogic, …
+ XQuery, SPARQL, Gremlin, …
Image Source: https://www.flickr.com/photos/infocux/8450190120; License: CC BY-NC 2.0
Data landscape today
[Figure labels: data volume; linking; RDBMS; search engine; distributed, horizontally scalable architectures]
Information Continuum
[Figure labels: volume of information; structured data (relational, structured metadata); semi- or unstructured data (XML, geospatial, graph, sparse, emails, documents, hierarchical, semi-structured content, time-varying, free text)]
Source: Frank Föge, MarkLogic Corporation, 2014.
RDBMS Performance
[Figure: performance over data complexity/heterogeneity, showing the relational DB against the application's requirements. Labeled example applications: payroll, the majority of web applications, social network, Semantic Trading?]
Source: Sam Bisbee, http://www.ibmbigdatahub.com/blog/exploring-nosql-family-tree.
A (Too) Simple NoSQL Taxonomy
Key/Value
• Riak, Dynamo, Voldemort, …
Column Oriented
• Cassandra, HBase, BigTable, …
Document Store
• MarkLogic, CouchDB, MongoDB, …
Graph
• Neo4j, InfiniteGraph, …
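To make the four families more tangible, here is a minimal Python sketch that holds the same book record in each of the four data shapes, using only built-in structures. It is a simplification for illustration and does not reflect the API of any product listed above; all names (`kv_store`, `column_store`, `document_store`, `graph_edges`, and the helper functions) are hypothetical.

```python
import json

# Key/value: the store only sees an opaque value behind a key.
kv_store = {"book:1": json.dumps({"title": "I Love Penguins", "author": "S. Lion"})}

# Column-oriented (simplified): values are grouped per column and
# addressed by row key.
column_store = {
    "title": {"book:1": "I Love Penguins"},
    "author": {"book:1": "S. Lion"},
}

# Document store: a nested, self-describing document that can be
# queried by its internal structure.
document_store = {
    "book:1": {
        "info": {"title": "I Love Penguins", "author": "S. Lion"},
        "chapters": [{"paragraphs": ["I love penguins because…"]}],
    }
}

# Graph: nodes connected by labeled edges (source, label, target).
graph_edges = [("S. Lion", "wrote", "book:1")]


def kv_get(key):
    """Key/value lookup: fetch and decode the whole value at once."""
    return json.loads(kv_store[key])


def column_get(column, row_key):
    """Read a single column value without touching the other columns."""
    return column_store[column][row_key]


def doc_titles_by_author(author):
    """Query documents by a field nested inside them."""
    return [
        doc["info"]["title"]
        for doc in document_store.values()
        if doc["info"]["author"] == author
    ]


def graph_neighbors(node, label):
    """Follow all edges with the given label that start at the node."""
    return [dst for src, lbl, dst in graph_edges if src == node and lbl == label]
```

The same facts are reachable in all four shapes; what differs is which kind of question each shape answers directly (lookup by key, read of one column, query by nested structure, traversal of relationships).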
Image Source: http://h5inc.files.wordpress.com/2011/04/warning-brain-explosion-zone.png
NoSQL enables …
• Faster app development
• Heterogeneous data types
• Rapid deployment
• Strong horizontal scalability with respect to
  • Size
  • Complexity
Image Source: https://steenschledermann.files.wordpress.com/2014/05/no-thanks-were-too-busy1.jpg?w=611
Source: http://media.gamemanx.com/flv/sf4-ehonda-sagat.jpg
Developer Journey == Agile Process
Iterate:
• Load data sources “as-is” (XML, JSON, binary)
• Search, transform, combine data
• Define indexes for analytics
• Data access: web application, user interface
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
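The first steps of this journey (load "as-is", index, search) can be sketched in a few lines of Python. This is a toy in-memory illustration under stated assumptions, not how MarkLogic or any other product works: the `store` and `index` structures, the function names, the URIs, and the second sample document are all made up for the example, and binary sources are left out.

```python
import json
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

store = {}                # uri -> parsed document, kept in its original shape
index = defaultdict(set)  # lowercase word -> set of uris containing it


def texts_of(doc):
    """Yield every text value found in a parsed XML element or JSON value."""
    if isinstance(doc, ET.Element):
        yield from doc.itertext()
    elif isinstance(doc, dict):
        for value in doc.values():
            yield from texts_of(value)
    elif isinstance(doc, list):
        for value in doc:
            yield from texts_of(value)
    else:
        yield str(doc)


def load(uri, raw):
    """Load a source as-is: no target schema, just parse it and index its words."""
    doc = ET.fromstring(raw) if raw.lstrip().startswith("<") else json.loads(raw)
    store[uri] = doc
    for text in texts_of(doc):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(uri)


def search(word):
    """Return the URIs of all documents containing the word, in sorted order."""
    return sorted(index.get(word.lower(), set()))


# Two differently structured sources go into the same store unchanged.
load("/books/1.xml", "<book><title>I Love Penguins</title><author>S. Lion</author></book>")
load("/books/2.json", '{"title": "Penguins and food", "pages": 12}')
print(search("penguins"))
```

The point of the sketch is the order of work: data is loaded first and a structure-independent index makes it searchable immediately, while transformation and further indexes can follow in later iterations.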
DOXMLDBs
Image Source: http://www.flickr.com/photos/rs-foto/1242024959/
A book table looks like this…??
[Figure, document side: a Book with Info (Title = “I Love Penguins”, Author = “S. Lion”), Sections, Chapters, Pages, and Paragraphs (“I love penguins because…”, “On the subject of food…”)]
[Figure, relational side: a table with the columns title, author, section, … and the row “I Love Penguins”, “S. Lion”, …]
DB schema mapping:
• Shredding
• Foreign keys & joins
• Performance overhead
• Issues with sections? How many columns?
• Maintenance overhead
Option: Modeling hierarchies with relations (foreign key)
is not efficient.
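A small sketch of what shredding means in practice, using Python's built-in `sqlite3` module. The table layout (`book`, `chapter`, `paragraph`) and the assignment of the two paragraphs to two chapters are assumptions made for illustration; the slide does not specify a schema. The sketch shows the mechanics only and is not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE chapter (
    id INTEGER PRIMARY KEY,
    book_id INTEGER REFERENCES book(id),
    pos INTEGER
);
CREATE TABLE paragraph (
    id INTEGER PRIMARY KEY,
    chapter_id INTEGER REFERENCES chapter(id),
    pos INTEGER,
    body TEXT
);
""")

# Shredding: one book is split into rows across three tables.
conn.execute("INSERT INTO book VALUES (1, 'I Love Penguins', 'S. Lion')")
conn.executemany("INSERT INTO chapter VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])
conn.executemany(
    "INSERT INTO paragraph VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "I love penguins because…"), (2, 2, 1, "On the subject of food…")],
)

# Reassembling the book needs one join per level of the hierarchy.
rows = conn.execute("""
    SELECT b.title, c.pos, p.pos, p.body
    FROM book b
    JOIN chapter c ON c.book_id = b.id
    JOIN paragraph p ON p.chapter_id = c.id
    WHERE b.id = 1
    ORDER BY c.pos, p.pos
""").fetchall()

# The same book kept as one nested document: no foreign keys, no joins.
book_doc = {
    "title": "I Love Penguins",
    "author": "S. Lion",
    "chapters": [
        {"paragraphs": ["I love penguins because…"]},
        {"paragraphs": ["On the subject of food…"]},
    ],
}
paragraphs = [p for chapter in book_doc["chapters"] for p in chapter["paragraphs"]]
```

Every additional level (sections, pages) would add another table, another foreign key, and another join to the relational variant, while the document variant only gains one more level of nesting.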
<meta>
<URI> http://thewobbitaparody.blogspot.de</URI>
<title> The Superfriends Of The Ring</title>
<author>Paul Erickson</author>
</meta>
<body> (…)
<section nr="11" title="Promo's Afterparty">
<paragraph>Promo came in soon afterwards. He glanced about the condo and
then quietly asked "Is Uncle Bulbo gone yet?" "Yes, at last," said Pantsoff. "I
thought he'd never leave. Oh, he left something for you." He handed Promo the
inter-office envelope. "Don't bother unwinding the string. Inside is his will, his trust
documents, and his tax records. I think he left you his ring, too." "Oh, great," said
Promo. "How long do I have to keep that stuff? Five years? Seven years? Forever? I
hate filing." He stopped complaining for a moment. "You said his magic ring is in
there too? Cool! I'll never have to pay a cover charge to enter a nightclub again!"
"Promo, you've inherited Bulbo's fortune, so stop thinking small for a change.
Actually, don't think about the ring at all. Just put it away. Keep it secret, and keep
it safe!"</paragraph>
(…)
</body>
Document as an information container
Metadata, data, relationships, and content
<SAR>
<title>Suspicious vehicle…near airport </title>
<date> 2012-11-12Z </date>
<type> observation/surveillance</type>
<threat>
<type> suspicious activity </type>
<category> suspicious vehicle </category>
</threat>
<location>
<lat> 37.497075 </lat>
<long> -122.363319 </long>
</location>
<description>A blue van…with license plate ABC 123 was observed parked behind the airport sign…
<triple><subject> IRIID </subject> <predicate> isa</predicate><object> license-plate</object> </triple>
<triple><subject> IRIID </subject> <predicate> value</predicate> <object> ABC 123 </object> </triple>
</description>
</SAR>
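To illustrate how a single document can carry metadata, data, relationships, and content at once, the following Python sketch reads all four out of the `<SAR>` example with the standard library's `xml.etree.ElementTree`. The XML is a tidied copy of the document above with the elisions kept; the variable names are hypothetical, and this is plain XML parsing rather than the query API of any particular database.

```python
import xml.etree.ElementTree as ET

# Tidied copy of the SAR document from the slide (elisions kept as-is).
SAR_XML = """
<SAR>
  <title>Suspicious vehicle…near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van…with license plate ABC 123 was observed parked behind the airport sign…
    <triple><subject>IRIID</subject><predicate>isa</predicate><object>license-plate</object></triple>
    <triple><subject>IRIID</subject><predicate>value</predicate><object>ABC 123</object></triple>
  </description>
</SAR>
""".strip()

sar = ET.fromstring(SAR_XML)

# Metadata: simple fields addressed by their path in the document.
metadata = {
    "title": sar.findtext("title"),
    "date": sar.findtext("date"),
    "type": sar.findtext("type"),
    "threat_category": sar.findtext("threat/category"),
}

# Data: the geospatial position as a pair of floats.
location = (float(sar.findtext("location/lat")), float(sar.findtext("location/long")))

# Relationships: the embedded subject/predicate/object triples.
triples = [
    (t.findtext("subject"), t.findtext("predicate"), t.findtext("object"))
    for t in sar.iter("triple")
]

# Content: the free text that precedes the triples inside <description>.
free_text = sar.find("description").text.strip()
```

Each kind of information could feed a different index (range, geospatial, triple, full text) without the document having to be split apart first.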
Document as an information container
[Figure: the <SAR> document above drawn as a tree, with <title>, <date>, <type>, <threat> (<type>, <category>), <location> (<lat>, <long>), and <description> as nodes, and the two <triple> nodes (IRIID isa license-plate; IRIID value ABC 123) with their <subject>, <predicate>, and <object> children]
XML is for publishers
NoSQL at DG
• De Gruyter Online
• De Gruyter CMS
• Maybe asset management?
• Maybe data warehouse?
Source: http://www.flickr.com/photos/scotthudson/3448785931/
De Gruyter Online
Metadata
Assets
Documents
Entitlements
Strong growth
Highly diverse data
De Gruyter CMS
Metadata
Assets
Documents
Triples
Frequent re-arrangement of the data:
changes to structure and linking
Semantics