Michael Rys Principal Lead Program Manager Microsoft Corp. Session Code: DAT321 Beyond Relational Data Pain Points Dealing with relational and non-relational data platforms Growth in application complexity and duplicated.

Download Report

Transcript Michael Rys Principal Lead Program Manager Microsoft Corp. Session Code: DAT321 Beyond Relational Data Pain Points Dealing with relational and non-relational data platforms Growth in application complexity and duplicated.

Michael Rys
Principal Lead Program Manager
Microsoft Corp.
Session Code: DAT321
Beyond Relational Data
Pain
Points
Dealing with relational and non-relational
data platforms
Growth in application complexity and
duplicated functionality
Compensating for unavailable services
Goals
Reduce the cost of managing all types of data
Simplify the development of applications which use
relational and non-relational data
Extend services currently available for relational data
to non-relational data
Provide non-relational services to relational data
Customer Quote
“The non-relational features available in SQL Server 2008
allowed us to manage our large datasets composed of
millions of images and associated metadata as single
entities and eliminated our need to maintain CLR functions
for geospatial calculations.
Switching to SQL Server 2008 reduced our operational
costs, increased our team productivity, and allowed us to
focus on our core business of collecting and processing
geospatial data.”
– Telmo Sampaio, CTO – Blue Dasher Technologies
Session Objectives and Takeaways
SQL Server 2008 has added improved
support for a variety of non-relational data
management scenarios
You will understand what SQL Server 2008
will offer in addition to SQL Server 2005’s
functionality to
Store and manage unstructured data
Store and manage semi-structured data
You will better understand when to use which of
the presented technologies
Beyond Relational Feature Overview
SQL Server 2005
Full Text Indexing
Documents &
Multimedia
SQL Server 2008
Remote BLOB Store API
Filestream
Integrated FTS
Fully supported
Geometry and
Geography data types
and Functions
Spatial
XML Data Type and
Functions
XML Upgrades
User Defined Types
Large UDTs
Sparse Columns
Wide Tables/Column Set
Filtered Indices
HierarchyID
XML
Relational
BR Support
Street-level, location-based imagery
Telmo Sampaio
Chief Technology Officer
Blue Dasher Technologies
Blue Dasher Application Overview
Capture and provide street-level imagery for all public
roads in the 50 top metropolitan areas in the U.S.
Manage data with SQL Server 2008:
Metadata: 70GB of data
Image data: 50TB of data
Metadata: Uses spatial GEOGRAPHY Type to pinpoint
the location of the images, Spatial Index defined
Image data: 3 sizes
small (7 Kb) in file system
medium (60 Kb) and large (1.3 Mb) stored using filestream
Medium used for navigation
Blue Dasher-SQL Server benefits
Geography data type and functions
No need to manage .NET assemblies - use of STIntersects(),
STDistance(), STBuffer()
Ability to save different shapes in single GEOGRAPHY
column
Manageability
Single backup/restore operations for metadata and images
Replication support
Performance
Streaming files to client application faster using Win32 API
Documents and Multimedia
Documents &
Multimedia
Use File Servers
Dedicated BLOB
Store
Store BLOBs in
Database
Application
Application
Application
BLOBs
BLOBs
DB
DB
BLOBs
DB
Advantages
Low cost per GB
Streaming Performance
Lower cost/GB at scale
Scalability & Expandability
Integrated management
Data-level consistency
Challenges
Complex application
development and
deployment
Integration with
structured data
Windows File Servers
NetApp NetFiler
Complex application
development and
deployment
Separate management
Enterprise-scales only
EMC Centera
Fujitsu Nearline
Poor data streaming
support
File size limitations
Highest cost per GB
Example
SQL Server
VARBINARY(MAX)
SQL Server 2008 BLOBs
Documents &
Multimedia
Use File Servers
Application
BLOB
Store BLOBs in
Database
DB
Application
Dedicated BLOB
Store
Store BLOBs in
DB + File System
Application
BLOB
BLOB
Application
BLOB
DB
DB
DB
Remote BLOB Storage
SQL BLOB
FILESTREAM Storage
Filestream
Documents &
Multimedia
Storage Attribute on VARBINARY(MAX)
Store BLOBs in
DB + File System
Application
BLOB
Works with integrated FTS
Unstructured data stored directly in
the file system (requires NTFS)
Dual Programming Model
TSQL (Same as SQL BLOB)
Win32 Streaming APIs with
T-SQL transactional semantics
Data Consistency
Integrated Manageability
DB
Back Up/Restore
Administration
Size limit is the file system volume size
SQL Server Security Stack
FILESTREAM API
// New TSQL Function:
// Get_filestream_transaction_context()
//
SELECT Get_filestream_transaction_context()
// New TSQL Function :
// PathName()
//
SELECT ClaimImage.PathName() FROM
Insurancedb..Claims
Managed SqlFileStream: READ
// New SqlFileStream Class in VS08 SP1
//
SqlFileStream sfs = new SqlFileStream(path, txnId,
System.IO.FileAccess.Read);
// output file to read into
//
System.IO.FileStream fs = new System.IO.FileStream (
"c:\\output2.jpg", System.IO.FileMode.Create);
{
byte[] buffer = new byte[512 * 1024];
int cbBytesRead = buffer.Length;
while (cbBytesRead == buffer.Length)
{
cbBytesRead = sfs.Read(buffer, 0, buffer.Length);
fs.Write(buffer, 0, cbBytesRead);
}
}
Managed SqlFileStream: WRITE
sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Write,
0);
using (System.IO.Stream res =
Pictures.GetResourceStream(HealthCare.MRI.JoeSmith)) //input
{
byte[] buffer = new byte[512 * 1024]; // 512KB
int cbBytesRead = buffer.Length;
while (cbBytesRead == buffer.Length)
{
cbBytesRead = res.Read(buffer, 0, buffer.Length);
sfs.Write(buffer, 0, cbBytesRead);
}
}
// commit SQL transaction and close SQL connection.
//
FileStream
Write Performance (Remote)
Insert
600
Filestream
Win32
(Filesystem)
Access
Filestream TSQL
500
Throughput (Mbps)
400
300
Varbinary
200
100
0
240 KB
-100
-200
480 KB
1 MB
2 MB
4 MB
8 MB
Filesystem
Win32
Access Gain
(%)
Read Performance (Remote)
900
800
Filestream Win32
(Filesystem)
Access
Throughput (Mbps)
700
600
Filestream T-SQL
500
Varbinary
400
300
Filesystem Win32
Access Gain (%)
200
100
0
240 KB
480 KB
1 MB
2 MB
4 MB
8 MB
Updates Using FILESTREAM File
I/O Access
“In place partial updates” not supported
Update through Win32 handle (Write access)
Done by creating a new file (0-bytes)
New content replaces old (committed) content on
Transaction Commit (example: high resolution pictures)
For the new file to contain the committed BLOB
content first use
FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT
Useful for LOB append
Some Tips…
Reading bigger buffers gives better performance
Volumes hosting FILESTREAM data should have 8.3 name
generation and LastAccessTime disabled
FILESTREAM containers to reside on dedicated volumes
Have one volume per FILESTREAM container
enables space management at volume level
“Magic” SMB buffer size = ~60KB
Another “good” value is 480KB
ROWGUID unique index for aligned partitioning
AntiVirus programs should be configured not to delete
infected files but to quarantine them
If using compressed volumes, use cluster size 4 KB
Migration SQL BLOBs to FILESTREAM
Conceptual walkthrough
Create one (or more) FILESTREAM (filegroup)
Alter table to add FILESTREAM column
For each existing row, update data in new row with
empty FILESTREAM value
Write LOB data (File I/O or T-SQL access)
Optional: drop Varbinary(MAX) column
BLOB migration utilities support (intended for
SQL Server 2008 R2)
Remote Blob Store Architecture
Documents &
Multimedia
SQL DB
Customer Application
SQL RBS API
NetApp lib
IBM lib
NetApp
IBM
Centera lib
Centera
Provider API
RBS
Services
• Create
• Fetch
• GC
• Delete
Customer applications can transparently support different BLOB stores
Each Remote Blob Store vendor responsible for delivering their own providers
RBS Workflow
Machine Boundary
1 Write BLOB(Photo)
Application
2
Return Blob ID
3
Write Blob ID to PhotoRef
field
2
RBS Client Library
BLOB Store
Provider Library
1
3
BLOB Store
SQL Server
ClaimID
ClaimDate
PhotoRef
4390
6/5/2007
<Binary(20)>
RBS Fundamentals
Most useful in environments where
interoperability is required
No restrictions on back-end store
Back-end can change with no app change
Looser (link level) consistency guarantees
SQL Server handles link consistency and
garbage collection
Unstructured Storage In SQL08
File Stores /
External Blob
Stores (CAS)
Streaming
Performance
Depends on
external store
Link Level
Consistency
Data Level
Consistency
Integrated
Management
Non-local
Windows File
Servers
n/a
External Blob
Stores
n/a
SQL BLOBs
Remote Blob
API
Depends on
external store
FILESTREAM
Full Text Search Challenges
Indexes stored outside SQL Server lead to
manageability challenges
Mixed query performance suffers
from having to pull over complete
full-text result set
Scaling issues on big boxes
See also DAT303 – Answering the Queries your users really want to
ask, with Full-Text Search in SQL Server 2008
Documents &
Multimedia
Full Text Search Improvements
Full-Text Engine and Indexes fully integrated
Catalog, index and stopword lists now inside the database
Better performance in many common scenarios
Make mixed queries perform and scale
Optimizer has knowledge about FT index
SELECT * FROM candidates
WHERE CONTAINS(resume,’”SQL Server”’)
AND ZipCode = ‘98052’
X= selectivity (%) of rows that match the relational condition
IFTS FTQuery time : (SQL 2005 FTQuery time) x (X/100)
Exposing the FT Index content and any
given WB behaviours
New word breakers (WB)
Better supportability tools
From 23 to over 40 word breakers/locales
Documents &
Multimedia
IFTS Query Architecture
Documents &
Multimedia
SQL SERVER process
WB client
Bind
Doc_table
id
FTS Algeb.
SQL/FTS integrated
doc_type
query tree document
QE
Shared Memory
Parse
SQL Algeb.
QUERY
Results
FTS Algeb.
Algebrizer
T-SQL
Parser
FDHOST process
Execution
Plan
QO
Language
Module
STOPLIST
Ranking Func.
Integration
THESAURUS
FTLogicalOperator
FTLogicalOperator
FTExecutionOperator
FTExecutionOperator
Stemmer
iFilters
CREATE FULLTEXT INDEX ft_idx
ON Doc_table(document
Cardinality
TYPE COLUMN doc_type)
FTLogicalOperator
KEY INDEX doc_id_idx;
FTExecutionOperator
WB
Full-Text Index
Indexing Performance
Documents &
Multimedia
2 min
1 min
Populating an index of 20 million rows of 1K data on
identical hardware (time in minutes)
Spatial Data Overview
Spatial
Storage and retrieval of spatial data using standard SQL syntax
New Spatial Data Types (geometry, geography)
New Spatial Methods (intersects, buffer, etc.)
New Spatial Indexes
Offers full set of Open Geospatial Consortium components
(OGC/SQL MM, ISO 19125)
Spatial Builder Interface
SSMS Visualization
Integration with Virtual Earth
See also:
DAT324 WiE: Building Location-Aware Services with Microsoft SQL Server
DAT03-HOL Integrating Microsoft SQL Server 2008 Spatial Support with Microsoft
Virtual Earth
XML Improvements
Improved XML Schema Validation
Support for storing and validating
Office 12 Document formats
Support for lax validation
Full xs:dateTime support
Support for values without timezone
timezone preservation
Improved support for lists and union types
Added support for let-clause in XQuery
Added fn:upper-case()/fn:lower-case()
Added support for
insert sql:variable(“@xml”) into /a/b
XML
XML Improvements
Semi-Structured Data
Relational
BR Support
Scenarios
Property bag scenarios
Distinct customized property sets associated with data
Large number of unique properties, user annotations
Examples
Document management systems (SharePoint)
Media stores
Databases with heterogeneous record types in a table
Type specific properties, inherited properties
in a type hierarchy
Examples
Product catalogs (Commerce Server), Location/business specific
properties(VE), etc…
Semi-Structured Data
Characteristics
Large number of sparsely
populated properties
Distinct property sets
Heterogeneous structures
Sets, nested structures
Relational
BR Support
Semi-Structured Data
Relational
BR Support
Requirements
Efficient storage for sparse properties
Efficient relational access to sparse properties:
Query, DML, Indexing
Ability to get/set sparse property
sets generically
Index subsets of relevant properties for
a property set
Retrieve and analyze hierarchical data
Semi-Structured Data
Relational
BR Support
SQL Server 2008 features
Sparse Columns: Optimized storage for sparse columns
Column Sets/Wide Tables: Support thousands of
sparse columns
Filtered Indexes: Ability to index a subset of
rows in a table
Hierarchy ID: System CLR type for hierarchical
organization of data
XML: For fast dynamic evolution (open schema),
lists and tree objects
Hierarchical Data
Scenarios
Forum and mailing list threads
Business organization charts
Content management categories
Product categories
Files/folders management
SQL Server 2005
Parent/Child column
XML datatype
New in SQL Server 2008
HierarchyID type
Relational
BR Support
HierarchyID
Relational
BR Support
Key properties
A system data type with
variable length
CLR UDT
Microsoft.SqlServer.Types
Varbinary encoding ( < 900 bytes)
To represent position in a hierarchy
Logically encodes the path from the
root of the tree to a node
Rich built-in methods for
manipulating hierarchies
Simplifies storage and querying of
hierarchical data
Comparison a<b is in depth-first order
Support for arbitrary insertions
and deletions
/
/1/
/1/1/
/2/
/1/2/
/3/
/3/1/
/3/2/
/1/1/1/
/1/1/2/
See also:
DAT04-INT Using the HIERARCHYID
Datatype in Microsoft SQL Server 2008
to Maintain and Query Hierarchies
Sparse Columns
“Sparse” as a storage attribute on a column
Storage Optimization: 0 bytes stored
for a NULL value
Co-location of data: Performance benefits
NULL Compression in the TDS layer
No change in Query/DML behavior
Relational
BR Support
Sparse Columns
Relational
BR Support
Wide Tables/Column Sets
Large number of sparse columns allowed in a table
(30,000 Columns, 1000 indexes, 30,000 statistics)
Requires defining a “Sparse Column Set”
An un-typed XML column, with a published format
Logical grouping for all sparse columns in a table
Select * returns all non-sparse-columns,
sparse column set (XML)
Allows generic retrieval/update of all sparse columns
as a set
Sparse Columns and Filtered Indexes
Sparse Column Storage
Relational
BR Support
0 Bytes stored for NULL Values
~20% CPU overhead for non-null value access
Additional 2- 4 bytes for non-null values
Sparse columns are beneficial when space savings >40%
Sparse storage
Non-sparse storage
PK
c1
sc
1
1
A
1
2
B
3
C
4
D
5
E
6
F
7
G
8
H
9
I
sc
2
sc
3
sc
4
sc
5
sc
6
sc
7
sc
8
sc
9
9
2
4
6
1
7
5
4
8
3
9
5
7
2
8
3
6
P
K
Csc
11
sc
2
1
A
(sc1,sc9)(1,9)
2
B
(sc2,sc4)(2,4)
3
C
(sc6,sc7)(6.7)
4
D
(sc1,sc5)(1,5)
5
E
(sc4,sc8)(4,8)
6
F
(sc3,sc9)(3,9)
7
G
(sc5,sc7)(5,7)
8
H
(sc2,sc8)(2,8)
9
I
(sc3,sc6)(3,6)
sc
3
sc
4
sc
5
sc
6
sc
7
sc
8
sc
9
Filtered Indexes
Filtered Indexes and Statistics
Indexing a portion of the data in a table
Filtered/co-related statistics creation and usage
Query/DML Optimization to use filtered indexes and statistics
Restrictions
Simple limited grammar for the predicate
Only on non-clustered indexes
Benefits
Lower storage and maintenance costs
for large number of indexes
Query/DML performance benefits: IO only for qualifying rows
Relational
BR Support
Summary
SQL Server 2008 will make it easier to create
information-centric applications that require
Unstructured documents
XML
Semi-structured information
Combine the above with relational data
by:
Reducing the cost of managing all types of data
Simplifying the development of applications which
use relational and non-relational data
Extending services currently available for relational data to
non-relational data
Future of Beyond Relational
Rich unstructured data
Enable existing BR apps to store data in SQL Server
Win 32 File I/O API compatibility
Better integration of FileStream and
RBS programming models
Better scalability of FileStream
Property Search and promotion
iFTS improvements in functionality and
scale/performance
Deep Spatial
More functionality
Across BI components
Resources
WebLog
http://sqlblog.com/blogs/michael_rys/
Whitepapers & Videos
FileStream/RBS Whitepapers:
http://msdn.microsoft.com/en-us/library/cc949109.aspx
http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-manageunstructured.aspx
What’s new for XML in SQL Server 2008:
http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-newxml.aspx
iFTS: http://msdn.microsoft.com/en-us/library/cc721269.aspx
Semistructured data: To be published soon (see blog for announcement)
PDC Presentation on Beyond Relational:
http://channel9.msdn.com/pdc2008/BB16/
And of course Books Online!
SQL Server 2008 Business Value Calculator:
http://www.moresqlserver.com
SQL Server Community Resources
The Professional Association for SQL Server (PASS) is an independent,
not-for-profit association, dedicated to supporting, educating, and
promoting the Microsoft SQL Server community.
• Connect: Local Chapters, Special Interest Groups, Online Community
• Share: PASSPort Social Networking, Community Connection Event
• Learn: PASS Summit Annual Conference, Technical Articles, Webcasts
• More about the PASS organization www.sqlpass.org/
Become a FREE PASS Member: www.sqlpass.org/RegisterforSQLPASS.aspx
Learn more about the PASS organization www.sqlpass.org/
Additional Community Resources
SQL Server Community Center www.microsoft.com/sqlserver/2008/en/us/community-center.aspx
TechNet Community for IT Professionals
http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx
Developer Center
http://msdn.microsoft.com/en-us/sqlserver/bb671064.aspx
SQL Server 2008 Learning Portal
http://www.microsoft.com/learning/sql/2008/default.mspx
SQL Server Word of the Day
Monday, May 11
POLICY-BASED
MANAGEMENT
*Game cards may be picked up at the SQL Server booths in the TLC
[email protected]
Related Content
DAT324 WiE: Building Location-Aware Services with Microsoft SQL Server,
Ed Katibah, Olivier Meyer
Fri 5/15 | 1:00 PM-2:15 PM | Room 402
DAT320 Optimizing Microsoft SQL Server 2008 Applications Using Table Valued
Parameters, XML, and MERGE, Tobias Ternstrom
Fri 5/15 | 9:00 AM-10:15 AM | Room 153
DAT303 Answering the Queries Your Users Really Want to Ask, with iFTS in
Microsoft SQL Server 2008, Greg Low
Tue 5/12 | 4:30 PM-5:45 PM | Room 404
DAT403 What's New in Microsoft SQL Server 2008, Stephen Forte
Mon 5/11 | 1:00 PM-2:15 PM | Room 151
Related Content
DAT04-INT Using the HIERARCHYID Datatype in Microsoft SQL Server 2008 to Maintain
and Query Hierarchies, Itzik Ben-Gan
Mon 5/11 | 1:00 PM-2:15 PM | Blue Thr 1
PAN56 SQL Server Server Programmability Q & A
Tue 5/12 | 10:00AM | 501C
DAT03-HOL Integrating Microsoft SQL Server 2008 Spatial Support
with Microsoft Virtual Earth
Resources
www.microsoft.com/teched
www.microsoft.com/learning
Sessions On-Demand & Community
Microsoft Certification & Training Resources
http://microsoft.com/technet
http://microsoft.com/msdn
Resources for IT Professionals
Resources for Developers
Complete an
evaluation on
CommNet and
enter to win!
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.