Transcript Document

BLAZING-FAST PERFORMANCE:
EXPLORE FILETABLES, FULL-TEXT SEARCH AND SEMANTIC
SEARCH FOR FASTER, MORE MEANINGFUL RESULTS
ACROSS UNSTRUCTURED DATA
Michael Rys,
Program Manager SQL Server
@SQLServerMike
This document has been prepared for limited distribution within Microsoft. This document
contains materials and information that Microsoft considers confidential, proprietary, and
significant for the protection of its business. The distribution of this document is limited to
those solely involved with the program described within.
Confidential and Proprietary © 2011 Microsoft
Last Updated: Thursday, July 16, 2015
RICH UNSTRUCTURED DATA IN SQL SERVER 2012
Address important customer requests for Capabilities and
Rich Services for Rich Unstructured Data (RUDS)
• Scale Up storage and search to 100M to 500M documents
• Easy use/access to unstructured data from all applications
• Rich insight into unstructured data to make better
decisions
INTEGRATED MANAGEMENT OF
DOCUMENTS IN SQL SERVER 2012
DEMO
FILETABLE DETAILS
• DB Level option for
Non-transactional Access to File data
•
Full concurrency control
•
• Off by Default
T-Sql access, Win32 access and between the
two
• Applies only to Win32 access
• T-SQL access is fully transactional
• CREATE TABLE DocumentStore AS FileTable WITH
(FileTable_Directory= N'Document‘);
• a Table of Files/Directories
• Table with a fixed schema
• Contains FILESTREAM and File Attributes
• Each row represents a File or a Directory
• System defined constraints maintain the
tree integrity
• Table level permissions for files/directories
in a FileTable (SQL security)
• Supported by AlwaysOn
•
Exposes File/Directory hierarchy through a
Windows Share
•
•
•
•
Supports Win32 APIs for File/Directory
Management
DB Storage is Transparent to Win32
applications
SMB level of application compatibility
Virtual network name (VNN) path support for
transparent Win32 application failover
FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012
• Improved Performance and Scale:
•
•
•
•
Scale-up to 350M documents for storage and search
iFTS query perf 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times < 3 sec for corpus
Be similar or better than main database search competitors
• New Functionality:
• Property Search
• customizable NEAR
• Innovation in Search:
• Semantic similarity Search
SEMANTIC SIMILARITY
• Input: Text such as varchar, Office, PDF, HTML, email…
Output: Rowset functions with standard SQL queries
Illustrating example:
Source Table
Title
Document
D1
Annual Budget
…
D2
Corporate
Earnings
…
D3
Marketing Reports
…
…
…
…
1
Key
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Keyphrases
KeyphraseDocuments
ID Keyword
ID
DocID
T1 revenue
T1 (revenue)
D1 (Annual Budget)
T2 growth
T2 (growth)
D2 (Corporate Earnings)
T3 Windows
T3 (Windows)
D3 (Marketing Reports)
T4 Azure
…
…
…
T1 (revenue)
D7 (Finance Report)
…
…
T3 (Windows)
D11 (Azure Strategy)
T4 (Azure)
D11 (Azure Strategy)
…
Full-Text and Semantic Processing
quarter,
record,
revenue…
3
2a
DocumentSimilarity
Keyword Index (Full-Text)
DocID
MatchedDocID
ID
Keyword Colid …
compDocid CompOc
CompPid
D1 (Annual Budget)
D2 (Corporate Earnings)
K1
revenue
1
…
10,23,123
(1,4),(5,8),(1,34)
2,5,6,8,4,3
D1 (Annual Budget)
D7 (Finance Report)
K2
growth
1
…
10,23,123
(1,5),(5,9),(1,34)
2,5,6,8,5,4
D3 (Marketing Reports)
D11 (Azure Strategy)
…
…
…
…
…
…
…
…
SEMANTIC EXTRACTION AND RELATIONSHIPS
FULLTEXT SEARCH IN SQL SERVER 2012
DEMO
KEY TAKEAWAYS
• SQL Server’s unstructured data support is targeting non-traditional
database workloads that are growing rapidly in the enterprise.
• Example: Content and Collaboration apps
• SQL Server’s unstructured data support is targeting key ISV asks in fast
growing markets:
• Examples: eDiscovery, Healthcare, Document management etc.
• Beyond Relational is key strategy to enable you to build complex data
applications that go beyond relational data!
RELATED CONTENT
FURTHER. FORWARD. FASTER.
•Learn more
Visit the Microsoft Virtual Academy
Free technical SQL Server training courses
•Get more
Download SQL Server 2012 Trial Software
180-day evaluation of SQL Server 2012
•Do more
Participate in virtual launch activities and collect
points for prizes
The more points you earn, the bigger your prize could be
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
APPENDIX
MANAGING FILETABLE – TROUBLE SHOOTING
• DMV to show all open non-transactional file handles
sys.dm_filestream_non_transact_handles
• Stored Procedure to terminate open file handles
sp_kill_filestream_non_transacted_handles
• X-events/Perf counters for trouble shooting
FULL-TEXT PROPERTY SEARCH
--needed once per database instance to load the office filters
exec sp_fulltext_service 'load_os_resources',1
go
exec sp_fulltext_service 'restart_all_fdhosts'
go
-- use property search
CREATE SEARCH PROPERTY LIST p1;
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab9108002b27b3d9',
PROPERTY_INT_ID = 4,
PROPERTY_DESCRIPTION = N'System.Author');
ALTER FULLTEXT INDEX ON fttable SET SEARCH PROPERTY LIST = p1
Go
-- Querying DOCUMENT HAVING “John Doe” AS AUTHOR ONLY
select * from fttable
where contains(property(ftcol, 'System.Author'), '"John Doe"')
FULL-TEXT CUSTOMIZABLE NEAR
--OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near
Space')
--NEW NEAR USAGES:
--SPECIFY DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 5,false)')
--REDUCE DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 2,false)')
--ORDER OF WORDS IS SPECIFIED AS IMPORTANT
select * from fttable
where contains(*, 'near((test, Space), 5,true)')