Strategies for All Your Data


Session id: 40236
Strategies for All Your Data
Sandeepan Banerjee
Vishu Krishnamurthy
Oracle Corporation
Where are you spending your money?
[Chart labels: Data Management; Labor; Software Integration; Hardware and System Integration]
Too much information in too many places
Kinds of data: Relational, Documents, Multimedia, Specialized …, Location, Messages, XML
Specialty Servers For Different Kinds Of Data
• Data Isolation
• High Systems Admin And Management Costs
• Scalability Problems
• High Training Costs
• Complex Support Problems
One Management System for All Your Data
• Relational: Characters, Numbers and Dates
• XML DB: Integrated Native XML Database
• Oracle Text & Ultra Search: Text management and search
• Oracle Locator & Spatial: Location and Proximity Searching
• Oracle interMedia: Multimedia management
• Oracle Collaboration Suite: Unified Messaging and Files
• Extensibility Framework: Chemical, Genetic, Engineering, …
Complete, Integrated, Robust, Scalable, Secure, Available on all platforms
What is Oracle XML DB?
• Database support for the XML data model
  – XMLType, XMLSchema, DOM Fidelity, XPath, …
• Hierarchical organization of the data
  – WebDAV compliant with indexing for fast access
• Transparent storage optimizations
• Query Language: SQLX and XQuery
Classes of XML DB Applications
• Exchanging Structured Documents
  – Well-formed, templated business documents, e.g. Purchase Orders, Phone Bills, …
• Managing Unstructured Documents
  – Documents, Messages, Instructions
• Integrating and normalizing data from diverse sources
Structured Document Exchange
• Relational storage remains the “right” way to store highly structured data
• As an XML programmer, you do not want to think about “tables”
  – A hierarchical data model is what you want to manipulate
• XML DB’s XMLType is about preserving the XML paradigm while getting the benefits of relational performance and scalability
Structured Document Exchange with Oracle XML DB
• XML data model and APIs familiar to XML programmers
  – XML Schema, Schema Validation, DOM Fidelity
  – JNDI, DOM, XPath, SQLX, XQuery
• Enterprise Class Performance & Scalability
  – Piecewise updates
  – Schema caching
  – Lazy materialization
  – Server-based XSL transformations
Structured Data: Temenos
• GLOBUS Banking platform: #1 selling platform, major banks worldwide
• Contract-based system, deeply nested data model, user-customizable
• 80+ major subsystems, 6000 Tables, 100s of GB
“Using Oracle XML DB, we successfully benchmarked 22 million banking transactions per day, which translated to 2500 database transactions per second, for Temenos' GLOBUS banking platform. Oracle XML DB’s performance assured us that powerful XML innovations can be operationalized and deployed without sacrificing enterprise-class scalability.”
- TEMENOS
Managing Unstructured Data
• More and more content is being produced as XML (Microsoft Word, Corel XMetal, Arbortext Epic, …)
  – Markup improves search, processing, organization, …
• XML DB’s Repository enables XML document content to be stored as ‘files’ in ‘folders’ without losing strong management, queryability, unbreakable security, etc.
• XML is doing for unstructured data what Relational did for structured: create a standard way to store, query and manage unstructured data
Managing Unstructured Data with Oracle XML DB
• XML data model and APIs familiar to Content Developers
• Integrated Repository
  – WebDAV compliant
  – XPath index for fast traversal of foldering hierarchies
  – SQL Queryable
• Integrated Text Processing
  – Optimizations such as “tag aware” search
Reed Elsevier
• Large technical publishing conglomerate
• More than 1700 scientific, technical & medical peer-reviewed journals
• Over 59 million abstracts
• Over two million full-text scientific journal articles, another one million full-text articles via CrossRef (http://www.crossref.org/) to other publishers' platforms
• XML DB chosen as Repository Database
10g: What’s new in XML DB
• Broad Performance Improvements
  – SQLX query rewrites
  – XSLT optimizations
  – Repository Access and Query optimizations
  – Direct loader support, loading large XML documents
  – Storage optimizations
• I18N: support for differing character sets on client and server
• Schema Evolution
  – Transparently achieves data load/reload
• Unified XML API between XDK and XML DB
  – Unified C interfaces
XML-based Integration: XQuery
• Why XQuery?
  – Declarative way to query XML documents
• Why Java?
  – Run in mid-tier or database
  – Future server implementation in C
• Why XML Database?
  – Native XML storage
  – XML data management
  – Performance optimizations
  – SQL/XML or XQuery depending on data
• Status
  – OTN downloads (pending W3C standard finalization in ’04)
[Diagram: XQuery Engine in the iAS J2EE™ Platform; XQuery Engine in the Server JVM of XML DB]
XQuery Example
Assume a document, emp.xml:
<empset>
  <emp empno="21" ename="SCOTT" salary="120000"/>
  <emp empno="22" ename="JONES" salary="344000"/>
</empset>
To get the names of employees with salary > 200000:
for $i in document('emp.xml')/empset/emp
let $j := 200000
where $i/@salary > $j
return $i/@ename
Result (attribute node)
JONES
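The same filter can be sketched outside an XQuery engine. The snippet below is a minimal illustration, not Oracle's implementation: it uses Python's standard xml.etree.ElementTree, and the helper name high_earners and the inline EMP_XML string are assumptions made for this example (the slide reads the document from a file).

```python
import xml.etree.ElementTree as ET

# Inline copy of the emp.xml document shown on the slide.
EMP_XML = """
<empset>
  <emp empno="21" ename="SCOTT" salary="120000"/>
  <emp empno="22" ename="JONES" salary="344000"/>
</empset>
"""

def high_earners(xml_text, threshold=200000):
    """Return ename values of <emp> elements whose salary exceeds threshold."""
    root = ET.fromstring(xml_text)                 # root is <empset>
    return [
        emp.get("ename")                           # return $i/@ename
        for emp in root.findall("emp")             # for $i in .../empset/emp
        if int(emp.get("salary")) > threshold      # where $i/@salary > $j
    ]

if __name__ == "__main__":
    print(high_earners(EMP_XML))
```

Each clause of the FLWOR expression maps onto one part of the list comprehension, which is why the loop has to range over the emp elements rather than over empset itself.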
Differences from SQL
• Navigation-oriented (using XPath expressions)
• Different type system (XMLSchema based simple types)
• Identity-based (XML Node identities and document order)
• Namespace aware name-resolution (functions, variables, element creation)
• Row based versus Item based
• Results are heterogeneous sequences
• Does not have all SQL extensions (e.g., OLAP, FullText, …)
Oracle XQuery API
• JXQI – Java API (ongoing standards discussions)
import java.io.FileReader;
import java.io.Reader;
import oracle.xquery.*;

// Create a context and prepare a statement from the file's contents
XQueryContext ctx = new XQueryContext();
Reader strm = new FileReader("exmpl1.xml");
XQueryPreparedStatement xq = ctx.prepareStatement(strm);
// Execute, then print each node in the result set
XQueryResultSet rset = xq.executeQuery();
while (rset.next())
    rset.getNode().print(System.out);
• XQLPlus tool! (like SQLPlus)
Datasources
• Enables arbitrary input sources
  – files, cache, JCA datasources
• xmldatasrc – Oracle language addition
• Datasource API
  – initialize
  – describe
  – execute
  – fetch
• Bind (an existing DOM)
Rewrite to SQL
• XQuery over Oracle databases – Rewrite!
for $i in view("scott.emp")/ROW
where $i/SALARY > 200000
return $i/ENAME
-- is translated to --
select "$i".ename
from scott.emp "$i"
where "$i".salary > 200000;
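To see why such a rewrite is safe, it helps to check that the navigational form and the relational form return the same rows. The sketch below does this in Python with the standard sqlite3 and xml.etree.ElementTree modules. It is an illustration under stated assumptions, not Oracle's rewrite engine: the in-memory emp table stands in for scott.emp, the sample rows are invented, and the function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_db():
    """Create an in-memory stand-in for scott.emp (sample rows are illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (ename TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)",
                     [("SCOTT", 120000), ("JONES", 344000)])
    return conn

def names_via_sql(conn, threshold=200000):
    """Run the rewritten, purely relational form of the query."""
    sql = 'SELECT "$i".ename FROM emp AS "$i" WHERE "$i".salary > ?'
    return [row[0] for row in conn.execute(sql, (threshold,))]

def names_via_xml_view(conn, threshold=200000):
    """Materialize each row as a <ROW> element, then filter by navigation."""
    root = ET.Element("ROWSET")
    for ename, salary in conn.execute("SELECT ename, salary FROM emp"):
        row = ET.SubElement(root, "ROW")
        ET.SubElement(row, "ENAME").text = ename
        ET.SubElement(row, "SALARY").text = str(salary)
    return [r.findtext("ENAME") for r in root.findall("ROW")
            if int(r.findtext("SALARY")) > threshold]

if __name__ == "__main__":
    conn = make_db()
    print(names_via_sql(conn), names_via_xml_view(conn))
```

The XML-view path builds every row as a tree before filtering, while the SQL path lets the database apply the predicate directly, which is the cost the rewrite avoids.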
More SQL rewrite
for $i in view('purchaseOrder')/ROW/PurchaseOrder
where $i/ShipAddr/City = 'San Francisco'
return <PO ponum="{$i/@Poid}">{$i/ShipAddr}</PO>
-- is translated to --
select xmlelement("PO",
  XMLAttributes(extractvalue("$i", '/PurchaseOrder/@Poid') as "ponum"),
  extract("$i", '/PurchaseOrder/ShipAddr'))
from scott.purchaseorder "$i"
where extractvalue("$i", '/PurchaseOrder/ShipAddr/City') = 'San Francisco'
DEMONSTRATION: XQuery
Oracle Text
• Rich Full-Text Capabilities built into the Oracle database
• Integrated Search support for Applications
  – OCS, Portal, E-Business Suite
• Catalog Search
• Document Archives and Warehouses
• Infrastructure for Intranet and Extranet Search (via Ultra Search)
Oracle Text: Rich Full-Text
10g: What’s new in Oracle Text?
• Supervised Classification – Rule-based and SVM
• Unsupervised Classification (Clustering) – KMeans and Hierarchical
• Query-Log Analysis
• Query-Templating for Progressive-Relaxation, Query-rewriting, Alternative scoring, etc.
• Index creation improvements – Real-time synchronization
• Better Partitioning: Create local-partitioned indexes in parallel
• Filtering enhancements
  – Filter and index RFC-822 email messages
• Language Enhancements
  – Japanese stemming, Customization of Japanese & Chinese Lexicons
• Information Visualization – Stretch viewer
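Progressive relaxation, listed above under query templating, can be illustrated without Oracle Text: run the strictest form of a query first and fall back to looser forms only until enough hits are collected. The sketch below is plain Python over an in-memory list. The three relaxation steps (exact phrase, all words, any word), the function name progressive_search, and the sample documents are assumptions for illustration; it does not reproduce Oracle Text's query template syntax or scoring.

```python
def progressive_search(docs, query, wanted=3):
    """Return doc indexes, strictest matches first, relaxing until `wanted` hits."""
    phrase = query.lower()
    words = phrase.split()
    steps = [
        lambda text: phrase in text,                          # 1. exact phrase
        lambda text: all(w in text.split() for w in words),   # 2. all words
        lambda text: any(w in text.split() for w in words),   # 3. any word
    ]
    hits = []
    for matches in steps:
        for i, doc in enumerate(docs):
            if i not in hits and matches(doc.lower()):
                hits.append(i)          # earlier (stricter) hits rank first
        if len(hits) >= wanted:         # enough results: skip looser steps
            break
    return hits[:wanted]

# Illustrative documents, invented for this example.
DOCS = [
    "Oracle Text adds full text search",
    "search the full archive of text",
    "full moon tonight",
    "unrelated note",
]

if __name__ == "__main__":
    print(progressive_search(DOCS, "full text", wanted=2))
```

Looser steps only run when the stricter ones come up short, so a precise query never pays for the broad scan.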
Oracle Ultra Search
• Out-of-the-box heterogeneous search-and-locate capabilities
  – DB, Web Servers, Files, E-Mail, Apps
• High performance threaded Java crawlers
• Web-style interface
• Extensible, customizable (Java API)
  – Customizable metadata search
  – Custom crawling
  – Custom rendering
• Integrated administration
• Fully multilingual and globalized
• Integrated with Oracle Portal (repository, portlet) and Oracle Collaboration Suite
10g: What’s new in Ultra Search?
• Enhanced Security
  – Secure Crawling (HTTPS support)
  – Better Authentication
      • HTTP Digest and Forms
  – ACL-secured search hitlist
      • Role-based ACLs per datasource
      • Or custom ACLs stamped by crawler
• Federated Search
  – JCA-compliant Searchlet API
• Unified Search
  – Secure Crawler API
• OID Integration
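The ACL-secured hitlist above amounts to filtering search results by the caller's permissions before they are shown. A minimal Python sketch follows, assuming a hypothetical data shape in which each hit carries the set of roles allowed to see it (whether that set came from a per-datasource role ACL or was stamped by a crawler). None of these names are Ultra Search APIs, and the URLs are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Hit:
    url: str
    allowed_roles: set = field(default_factory=set)  # empty set: visible to no one

def secure_hitlist(hits, user_roles):
    """Keep only hits whose ACL shares at least one role with the user."""
    roles = set(user_roles)
    return [h for h in hits if h.allowed_roles & roles]

# Illustrative hits, invented for this example.
HITS = [
    Hit("http://intranet.example/hr/salaries", {"hr"}),
    Hit("http://intranet.example/wiki/howto", {"hr", "engineering"}),
    Hit("http://intranet.example/board/minutes", {"executive"}),
]

if __name__ == "__main__":
    for hit in secure_hitlist(HITS, ["engineering"]):
        print(hit.url)
```

Filtering at hitlist time means a user never sees even the title of a document they cannot open.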
DEMONSTRATION: Information Visualization
The Media-enabled Oracle Platform
• Oracle Database 10g
  – Storage, management, & retrieval of image, audio, video data
  – Native format understanding, metadata extraction, methods for image processing
  – Support for leading streaming media servers
• Oracle Application Server 10g
  – JSP, servlet and PL/SQL application development support
  – Media Adaptation Services for Wireless
  – JDeveloper (BC4J/UIX) and Portal integration
• Oracle Collaboration Suite
  – Metadata extraction for OCS Files
New Oracle10g Multimedia Features
• Standards Support – SQL/MM Still Image
• New version of Java Advanced Imaging (JAI 1.1.1_01) and additional image processing operators
• Support for additional media formats
  – Microsoft ASF, MPEG2 & MPEG4
• Microsoft Windows Media Server Plugin
• Real Server Plugin for Helix Server
• XML DB integration
How Oracle’s Multimedia capabilities are better
Only Oracle10g:
• Supports media content natively
  – No manual initiation of separate processes to enable database tablespace to accept media data
  – No need for DBAs to initiate these processes for each table where they wish to store media data
• Stores all media and its metadata in the same table as the associated relational data
  – No triggers on each and every media object created to update the separate “administration” tables that contain media objects and metadata
  – No added processing and I/O overhead for access and retrieval
• Provides Java class libraries and JSP Tag libraries for application development and media access
Oracle is the Leading Spatial Database
“In repeated surveys, IDC has found that Oracle is used in an 80%-90% share of Spatial Information Management oriented database installations.”
IDC, December 2002
• Oracle 10g Locator feature: Beginning with Oracle9i, LOCATION capabilities have been part of EVERY database at NO ADDITIONAL COST
  – Enables business, web and LBS applications
• Oracle Spatial 10g: Enterprise Edition Option
  – Supports advanced Land Management, GIS, Transportation, Energy / Utilities, Remote Sensing, Defense and Intelligence applications
Oracle10g Location Features
Locator
• Points, lines, polygons
• 2D, 3D, 4D data
• Spatial Operators
  – Distance
  – Relationships
• Coordinate Systems
• Long Transactions
• Table Partitioning*
• Object Replication**
• Parallel Query* – NEW!
• Deferred Spatial Indexes – NEW!
Spatial (Enterprise Option)
• All Locator features
• Spatial functions
  – area/length calculation
  – buffer, centroid, intersection, union, etc.
• Linear Referencing
• Spatial Aggregates
• Coordinate Transforms
• GeoRaster – NEW!
• Topology Data Model – NEW!
• Network Data Model – NEW!
• GeoCoder – NEW!
• Spatial Data Analysis & Mining – NEW!
* Requires Enterprise Edition with Partitioning Option
** Some replication features on Enterprise Ed. only
Location features in the Oracle “Stack”
[Diagram labels: Any device; CRM & ERP Applications; TCA schema; Web Services; e-Business Suite; Application Server; iAS MapViewer / JDeveloper; B2B, B2E, B2C; iAS LBS Components; Oracle Application Server 10g; SOAP, WSDL; Data Server; Spatial; Locator; Oracle Database 10g; Oracle Location Technology; Online Service; Oracle core technologies]
Oracle’s Extensibility Framework
• Open API to plug in new data types and access methods
• Specialty Data Types
  – Chemical
  – Genetic
  – Engineering
  – Biometric
  – Multimedia
Driven by specialized-domain ISVs: MDL, NetGene, Informax, Protegrity, …
Extensibility: In Silico Chemistry
• Chemistry searching requires special techniques
  – Chemical name is not unique (“Viagra®”, “sildenafil citrate”)
  – Chemists think graphically
• The solution:
  – A graphical search engine
  – Specialized operators such as substructure search (“sss”) = a chemical “contains”
[Figure: chemical structure diagram]
Oracle Collaboration Suite
• Consolidate management of unstructured data (email, shared documents and other collaborative content)
• Before grid computing, resources such as storage and CPUs had to be managed separately for each component of the suite (e.g. email vs. files vs. web conferencing)
• OCS 10g takes advantage of grid infrastructure for greater efficiency, reduced cost and easier management
Extended Data Management
Oracle Collaboration Suite, Oracle Portal, and E-Business Suite provide solutions
Ultra Search crawls and (where desirable) federates non-Oracle or legacy sources, and brings these within the ambit of uniform access
• Search, Interchange, Visualization
• Analytics and Mining
Oracle provides the most robust open and extensible platform
and the important services for all your data
• Storage and Management
• Search, Interchange, Visualization
• Analytics and Mining
• Structured data will stay Relational
• Documents & Messages will move to XML
• Multimedia will be in BLOBs, with metadata annotated in XML
QUESTIONS
ANSWERS