Leveraging SharePoint 2010 Search Technologies

With: Ivan Neganov
Sponsors
Agenda
Open Discussion
Topic of the day
Q&A
Leveraging SharePoint 2010 Search Technologies
Mississauga SharePoint User Group,
October 19, 2010
About the Speaker
Ivan Neganov
Founder of SoftForte, Inc. 11 years of experience in
developing WCM solutions based on ASP.NET and
SharePoint platforms. Focusing on SharePoint since
2007. Blog: neganov.blogspot.com
the Science of Quality
Agenda
• Enterprise Search defined
• Common search concepts and terms
• Search architecture
• SharePoint search technologies
What is Enterprise Search
• Why not use the Google Search Appliance, aka “Google Box”?
• Why not use an open source engine like Lucene?
• Why isn’t SharePoint search enough?
• Do I need taxonomy & faceted search?
• Can users just go ahead and tag everything?
Enterprise is not just a large Intranet
• Large volumes of data
• Usually there exists a “right” or highly relevant
document
• Security is critical
• Taxonomies and vocabularies are important
• Dates are important
• Corporate data does have structure
• Search is convenient for surfacing content
• Search is promising for future BI applications
Search Scenarios
• Two types of scenarios in an enterprise:
o Productivity search
• Intranet/team collaboration search
• People search/Social computing
• Site search
o Search applications
• Parts search (fuzzy search requirement)
• Intelligence & Investigation (heavy use of entity extraction)
• IP protection
• Compliance/Records management
• E-commerce
• Knowledge management & Support
• BI applications
Microsoft Search Technologies
• Desktop search, successor of Index Server
• SQL Server Search – Full Text Search (FTS)
• Exchange Search – uses same iFilters as SharePoint
• Bing (formerly Live Search)
o Bing + Yahoo = 9.5%
• SharePoint & FAST Search
SharePoint 2010 Search Technologies
• Microsoft SharePoint Foundation (Free)
o Single site collection, 10 million items
o No external search
o Automatic configuration
• Microsoft Search Server 2010 Express (Free)
o Enterprise-level search, 10 million items but single search server only
o No people search
• Microsoft Search Server 2010
o Enterprise-level, redundancy support, 100 million items
o No people search
• Microsoft SharePoint Server 2010
o 100 million items, added people search, tagging
• Microsoft FAST Search Server for SharePoint
o Over 200 million items
o Improved and flexible relevancy
o Entity extraction
• Microsoft FAST ESP Server
o Advanced entity extraction
o Standalone product
Relevancy
• Google: PageRank algorithm
• Same approach is used in FAST and SharePoint 2010
• FAST provides ability to dynamically boost rank
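The PageRank idea named on this slide can be illustrated with a small power-iteration sketch. This is the textbook link-analysis formulation only; the function name, the damping factor of 0.85, and the toy link graph are assumptions made for illustration, and nothing here describes how FAST or SharePoint 2010 actually compute rank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page intranet: both "a" and "b" link to "c".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph the page with the most incoming links ("c") ends up with the highest rank, and the page nobody links to ("b") with the lowest.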
Index
Linguistics
• Word stemming
• Word lemmatization
• Word morphology
o Collapsing indices
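The difference between stemming and lemmatization can be shown with a toy sketch. The suffix list and the lemma dictionary below are hypothetical and far smaller than anything a real linguistics component would use; the point is only that a stemmer strips suffixes by rule, while a lemmatizer looks up the dictionary form.

```python
# Hypothetical, deliberately tiny rule set and dictionary.
SUFFIXES = ("ing", "ed", "es", "s")
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse"}


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Return the dictionary form if known, otherwise the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


print(stem("searching"), stem("searched"), stem("searches"))
print(stem("running"), lemmatize("running"))
print(stem("mice"), lemmatize("mice"))
```

Rule-based stemming collapses regular variants cheaply but leaves irregular forms such as "mice" untouched, while the dictionary lookup handles them at the cost of maintaining the dictionary.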
Other Common Search Concepts
• Crawling
• Querying
• Crawled & Managed Properties
• Best Bets
• Refiners aka Facets
• Linguistics: Stemma & Lemma
• Entity Extraction
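Two of the concepts above, crawled versus managed properties and refiners, can be sketched in a few lines. The property names, the mapping table, and the sample documents are all hypothetical; this is a conceptual illustration, not the SharePoint object model.

```python
from collections import Counter

# Hypothetical mapping: several crawled properties (as found in the
# source content) feed one managed property (as exposed to queries).
PROPERTY_MAP = {
    "doc_author": "Author",
    "mail_from": "Author",
    "file_extension": "FileType",
}


def to_managed(crawled_doc):
    """Map a document's crawled properties to managed properties."""
    managed = {}
    for name, value in crawled_doc.items():
        target = PROPERTY_MAP.get(name)
        if target is not None and target not in managed:
            managed[target] = value
    return managed


def refiners(results, managed_property):
    """Count results per value of a managed property (a facet)."""
    return Counter(
        doc[managed_property] for doc in results if managed_property in doc
    )


crawled = [
    {"doc_author": "Ivan", "file_extension": "docx"},
    {"mail_from": "Ivan", "file_extension": "msg"},
    {"doc_author": "Maria", "file_extension": "docx"},
]
results = [to_managed(doc) for doc in crawled]
print(refiners(results, "FileType"))
```

The refiner counts are what a results page would show next to each facet value, for example two "docx" hits and one "msg" hit for the sample data.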
High Level Search Architecture
Demo: Search Experience
FAST Search Server 2010 for SharePoint
• Advanced scalability & performance
• Advanced content processing
• Extensibility
FAST Content Processing Pipeline:
FAST ESP
• Essentially re-packaged FAST ESP 5.3
• Planned two SKUs (according to SPC 2009)
o FAST Search Server for Internet Sites
o FAST Search Server for Internal Applications
• Updates?
Planning Enterprise Search
• Search is redundant and scalable
Planning FAST Search
Which Search Technology Is Appropriate?
• FAST Search Server requires enterprise CALs
Estimating Costs
40 million documents, medium dedicated search farm
• SharePoint Enterprise Search
o 4 – 6 query and index servers
o 1 – 2 database clusters (share)
• FAST Search Server for SharePoint
o 4 – 6 query and index servers, 0 – 2 content distributor & web analyzer servers
o 1 – 2 database clusters (share)
Search UI
• Search Web Parts
• Search Center
• Thick clients
Extending Search
• Federation - OpenSearch
• Query Object Model
• BCS Connectors
• RANK & XRANK
• Tapping into the Document Processing Pipeline
Federation
[Diagram: federated locations in Madrid, Los Angeles, Hong Kong, South Africa]
Demo: Search Federation
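Federation over OpenSearch relies on a URL template that the federating side fills in with the user's query. The sketch below shows that substitution step only, assuming an OpenSearch 1.1 style template; the host name and the helper function are hypothetical, and it makes no network call.

```python
import re
from urllib.parse import quote


def fill_template(template, **params):
    """Fill an OpenSearch-style URL template.

    {name} is a required parameter; {name?} is optional and is
    replaced with an empty string when no value is supplied.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required template parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


# Hypothetical federated location.
template = "http://search.example.com/results?q={searchTerms}&start={startIndex?}"
print(fill_template(template, searchTerms="sharepoint search"))
```

The remote location is then expected to answer with a feed (for example RSS or Atom) that the search results page can render alongside local results.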
Connector Framework
• Leverage tooling (SPD, VS2010)
Entity Extraction in FAST
• Automatically create crawled properties for a given vocabulary
• Useful for advanced scenarios, for example:
1. Extract a property at crawl time
2. Enrich the property
3. Index the enriched property
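The extract, enrich, index sequence above can be sketched as three small pipeline stages. The vocabulary, the region lookup, and all function names are hypothetical; this illustrates the idea of dictionary-based entity extraction in general and is not the FAST pipeline API.

```python
import re

# Hypothetical vocabulary (surface form -> entity) and enrichment table.
VOCABULARY = {"toronto": "Toronto", "mississauga": "Mississauga"}
REGIONS = {"Toronto": "Ontario", "Mississauga": "Ontario"}


def extract_locations(doc):
    """Step 1: extract a 'location' property from the body text."""
    words = re.findall(r"[a-z]+", doc["body"].lower())
    found = sorted({VOCABULARY[w] for w in words if w in VOCABULARY})
    return {**doc, "location": found}


def enrich_with_region(doc):
    """Step 2: enrich the extracted property with a derived 'region'."""
    regions = sorted({REGIONS[c] for c in doc["location"] if c in REGIONS})
    return {**doc, "region": regions}


def index_property(doc, index, prop):
    """Step 3: index the enriched property as (property, value) -> doc ids."""
    for value in doc[prop]:
        index.setdefault((prop, value), set()).add(doc["id"])


def process(doc, index):
    """Run one document through all three stages."""
    doc = enrich_with_region(extract_locations(doc))
    index_property(doc, index, "region")
    return doc


index = {}
process({"id": 1, "body": "The user group meets in Mississauga."}, index)
print(index)
```

Because the region is indexed as its own property, a query or refiner on "Ontario" finds the document even though that word never appears in its text.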
Search in the Enterprise: Future
• Amount of content will continue to grow
• Search will integrate with Business Intelligence applications
• Entity, Sentiment and Fact extraction
• Search as navigation
• Search visualization
• Search as a service
• Many more custom applications leveraging search
Resources
• Microsoft Technet, MSDN
• Professional Microsoft Search 2010
Questions