Transcript Document
BLAZING-FAST PERFORMANCE: EXPLORE FILETABLES, FULL-TEXT SEARCH AND SEMANTIC SEARCH FOR FASTER, MORE MEANINGFUL RESULTS ACROSS UNSTRUCTURED DATA Michael Rys, Program Manager SQL Server @SQLServerMike This document has been prepared for limited distribution within Microsoft. This document contains materials and information that Microsoft considers confidential, proprietary, and significant for the protection of its business. The distribution of this document is limited to those solely involved with the program described within. Confidential and Proprietary © 2011 Microsoft Last Updated: Thursday, July 16, 2015 RICH UNSTRUCTURED DATA IN SQL SERVER 2012 Address important customer requests for Capabilities and Rich Services for Rich Unstructured Data (RUDS) • Scale Up storage and search to 100M to 500M documents • Easy use/access to unstructured data from all applications • Rich insight into unstructured data to make better decisions INTEGRATED MANAGEMENT OF DOCUMENTS IN SQL SERVER 2012 DEMO FILETABLE DETAILS • DB Level option for Non-transactional Access to File data • Full concurrency control • • Off by Default T-Sql access, Win32 access and between the two • Applies only to Win32 access • T-SQL access is fully transactional • CREATE TABLE DocumentStore AS FileTable WITH (FileTable_Directory= N'Document‘); • a Table of Files/Directories • Table with a fixed schema • Contains FILESTREAM and File Attributes • Each row represents a File or a Directory • System defined constraints maintain the tree integrity • Table level permissions for files/directories in a FileTable (SQL security) • Supported by AlwaysOn • Exposes File/Directory hierarchy through a Windows Share • • • • Supports Win32 APIs for File/Directory Management DB Storage is Transparent to Win32 applications SMB level of application compatibility Virtual network name (VNN) path support for transparent Win32 application failover FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012 • Improved Performance and Scale: • • • • Scale-up to 350M documents for storage and search iFTS query perf 7-10 times faster than in SQL Server 2008 Worst-case iFTS query response times < 3 sec for corpus Be similar or better than main database search competitors • New Functionality: • Property Search • customizable NEAR • Innovation in Search: • Semantic similarity Search SEMANTIC SIMILARITY • Input: Text such as varchar, Office, PDF, HTML, email… Output: Rowset functions with standard SQL queries Illustrating example: Source Table Title Document D1 Annual Budget … D2 Corporate Earnings … D3 Marketing Reports … … … … 1 Key ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Keyphrases KeyphraseDocuments ID Keyword ID DocID T1 revenue T1 (revenue) D1 (Annual Budget) T2 growth T2 (growth) D2 (Corporate Earnings) T3 Windows T3 (Windows) D3 (Marketing Reports) T4 Azure … … … T1 (revenue) D7 (Finance Report) … … T3 (Windows) D11 (Azure Strategy) T4 (Azure) D11 (Azure Strategy) … Full-Text and Semantic Processing quarter, record, revenue… 3 2a DocumentSimilarity Keyword Index (Full-Text) DocID MatchedDocID ID Keyword Colid … compDocid CompOc CompPid D1 (Annual Budget) D2 (Corporate Earnings) K1 revenue 1 … 10,23,123 (1,4),(5,8),(1,34) 2,5,6,8,4,3 D1 (Annual Budget) D7 (Finance Report) K2 growth 1 … 10,23,123 (1,5),(5,9),(1,34) 2,5,6,8,5,4 D3 (Marketing Reports) D11 (Azure Strategy) … … … … … … … … SEMANTIC EXTRACTION AND RELATIONSHIPS FULLTEXT SEARCH IN SQL SERVER 2012 DEMO KEY TAKEAWAYS • SQL Server’s unstructured data support is targeting non-traditional database workloads that are growing rapidly in the enterprise. • Example: Content and Collaboration apps • SQL Server’s unstructured data support is targeting key ISV asks in fast growing markets: • Examples: eDiscovery, Healthcare, Document management etc. • Beyond Relational is key strategy to enable you to build complex data applications that go beyond relational data! RELATED CONTENT FURTHER. FORWARD. FASTER. •Learn more Visit the Microsoft Virtual Academy Free technical SQL Server training courses •Get more Download SQL Server 2012 Trial Software 180-day evaluation of SQL Server 2012 •Do more Participate in virtual launch activities and collect points for prizes The more points you earn, the bigger your prize could be © 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. APPENDIX MANAGING FILETABLE – TROUBLE SHOOTING • DMV to show all open non-transactional file handles sys.dm_filestream_non_transact_handles • Stored Procedure to terminate open file handles sp_kill_filestream_non_transacted_handles • X-events/Perf counters for trouble shooting FULL-TEXT PROPERTY SEARCH --needed once per database instance to load the office filters exec sp_fulltext_service 'load_os_resources',1 go exec sp_fulltext_service 'restart_all_fdhosts' go -- use property search CREATE SEARCH PROPERTY LIST p1; ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH (PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab9108002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author'); ALTER FULLTEXT INDEX ON fttable SET SEARCH PROPERTY LIST = p1 Go -- Querying DOCUMENT HAVING “John Doe” AS AUTHOR ONLY select * from fttable where contains(property(ftcol, 'System.Author'), '"John Doe"') FULL-TEXT CUSTOMIZABLE NEAR --OLD NEAR SYNTAX select * from fttable where contains(*, 'test near Space') --NEW NEAR USAGES: --SPECIFY DISTANCE select * from fttable where contains(*, 'near((test, Space), 5,false)') --REDUCE DISTANCE select * from fttable where contains(*, 'near((test, Space), 2,false)') --ORDER OF WORDS IS SPECIFIED AS IMPORTANT select * from fttable where contains(*, 'near((test, Space), 5,true)')