Paul Learning
Sr. Consultant
Microsoft
Heartland District
Joel Oleson
Architect, Evangelist
Quest Software
@joeloleson
http://SharePointJoel.com
Audience Poll
New to SharePoint?
1-3 Years Experience? (SharePoint)
4-8 Years Experience? (SharePoint)
Large-scale Implementation (~5TB) experience?
Scalability or performance issues in
SharePoint deployments?
How many SQL Admins are freaking out because
of the number of SharePoint databases?
Session Overview
How were these “considerations” derived?
SQL Server 2008 with SharePoint
SharePoint Database Overview (Demo)
Architectural Design Considerations
Real-world scenarios
Business Requirements
Logical and Physical Architecture
Architectural Design Statistical Results
Large-scale Case Study and Whitepaper
Storage Architecture Whitepaper
Appendix: DB Sizes, Content Distribution…
Considerations
Information based on real-world, large-scale
SharePoint Implementations.
Large software company (Microsoft)
Intranet Portal for 120K users
Global Enterprise Collaboration Solution (~20TB)
Scalable Hosting Solution (SharePoint Online)
Large automotive manufacturer
Loan Origination Application / Document Repository
~50 Million content items (~6 TB)
Large pharmaceutical company
Document Repository
~75 Million content items (~40 TB)
SharePoint Containment Hierarchy
Farm
Servers
Web Front End, APP, SQL
Web Applications
Central Admin, SSP Admin, Content
Databases
Content, Config, SSP, Search
Site Collections
Internet, Intranet Portal, Wikis, Blogs, Team, Doc, Mtg
Sites
Wikis, Blogs, Team, Doc, Mtg
Lists
Doc Lib, Pages, Events, Discussions, Surveys, etc…
Items
Files, calendar items, contacts, customers, images, custom
SharePoint Databases
Overview
Understanding SharePoint Databases
Farm
• Config
• Servers
• Web Apps
• Solutions
• Global Config
Web App
• Content 1..2
• Site Collections
• Sites
• Lists
• Pages
• Documents
• DWPs
SSP
• Search
• Properties
• SSP
• My site host config
• Profiles
• BDC config
• Excel Calc
Understanding Configuration DB
Config Database: Sites, Servers, VServers
Understanding Content DB
Content Database: Sites, Webs, Doc Stream
Understanding SSP DB - Search
Search Database: Search, Properties
Understanding SSP DB - SSP
SSP Database: MySite Host Config, Profiles, BDC Config, Excel Calc
SharePoint Databases Overview
SharePoint and SQL Server
MOSS DB Overview
Configuration Database
Content Databases
SSP Databases
SharePoint Database Performance
Database Disk I/O Demand
Most Demand
Medium Demand
Low Demand
Search
Config
*Content..
Temp
+SSP
Model
Tlogs
Master
* Except during backup and Indexing + Except during Profile Import
Top Performance Killers
1. Indexing/Crawling
2. Backup (SQL & Tape)
3. Profile Import
4. Misc Timer Jobs: User Sync for large numbers of Users
5. STSADM Backup/Restore
6. Large List Operations
7. Heavy User Operation List Import/Write
Architectural Design Considerations
Database Volumes
Separate database volumes into unique LUNs
consisting of unique physical disk spindles.
Prioritize data among faster disks with ranking:
SQL TempDB data files
Database transaction log files
Search database
Content databases
In a heavily read-oriented portal site, prioritize data
over logs.
Separate out Search database transaction log from
content database transaction logs.
Architectural Design Considerations
SQL TempDB Data Files
Recommended practice is that the number of data
files allocated for TempDB should equal the
number of CPU cores in the computer running SQL Server.
TempDB data file sizes should be consistent across
all data files.
TempDB data files should be spread across unique
LUNs and separated from Content DB, Search DB,
etc…
Place the TempDB log file on its own unique LUN.
Architectural Design Considerations
SQL TempDB Data Files - Continued
Optimal TempDB data file sizes can be calculated
using the following formula:
[MAX DB SIZE (KB)] X [.25] / [# CORES] = [DATA FILE SIZE (KB)]
The combined starting size of all TempDB data files should
then be roughly equal to 25% of the largest content or search DB.
Use RAID 10; separate LUN from other database
objects (content, search, etc…).
Set the “Autogrow” feature to a fixed amount; if autogrow
occurs, permanently increase the TempDB size.
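The sizing formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a sizing tool: the function name is made up for this example, and the inputs (a 539 GB largest database and 16 cores) are illustrative values chosen to match figures quoted elsewhere in this deck.

```python
KB_PER_GB = 1024 * 1024


def tempdb_data_file_size_kb(max_db_size_kb, cores, fraction=0.25):
    """Starting size of EACH TempDB data file, per the slide formula.

    max_db_size_kb: size of the largest content or search database, in KB.
    cores: number of CPU cores on the SQL Server (one data file per core).
    fraction: share of the largest database to reserve (0.25 on the slide).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return max_db_size_kb * fraction / cores


# Illustrative inputs (assumptions): 539 GB largest database, 16 cores.
per_file_kb = tempdb_data_file_size_kb(539 * KB_PER_GB, cores=16)
per_file_gb = per_file_kb / KB_PER_GB
total_gb = per_file_gb * 16

print(f"per data file: {per_file_gb:.2f} GB, all files: {total_gb:.2f} GB")
```

Each file is sized identically, which keeps the data files consistent as recommended, and the files together add up to the 25% target.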
Architectural Design Considerations
Content Databases
100 content databases per Web application
100GB per content database
CAUTION: Major DB locking issues reported in
collaborative DM scenarios above 100GB
Need to ensure that you understand the issues based
on number of users, usage profiles, etc…
Service Level Agreement (SLA) requirements for backup
and restore will also have an impact on this decision.
Completed lab testing demonstrated that SharePoint
performance was NOT impacted by larger DB
sizes; tests included content DB sizes of 100GB,
150GB, 200GB, 250GB, 300GB and 350GB.
See Appendix for test results!
Architectural Design Considerations
Content Databases - Continued
Pre-construct and pre-size
Script generation of empty database objects
“Autogrow” feature on
Use RAID 5 or RAID 10 logical units
RAID 10 is the best choice when cost is not a concern.
RAID 5 will be sufficient and will save on costs, since content
databases tend to be more read intensive than write intensive.
Multi-core computer running SQL Server
Primary file group could consist of a data file for each
CPU core present in SQL Server.
Move each data file to separate logical units consisting of
unique physical disk spindles.
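One way to script the pre-construction step is to generate the CREATE DATABASE statement and review it before running it against SQL Server. The Python sketch below only builds the T-SQL text. The database name, directory paths, sizes, and growth increment are hypothetical placeholders, and the collation clause reflects the collation SharePoint expects for its databases; verify all of these against your own environment.

```python
def create_database_sql(name, data_dirs, log_dir, data_file_gb, log_file_gb,
                        growth_mb=512):
    """Return a CREATE DATABASE statement with one pre-sized data file per
    directory (for example, one directory per LUN and one file per core)."""
    data_files = []
    for i, directory in enumerate(data_dirs, start=1):
        ext = "mdf" if i == 1 else "ndf"  # first file is the primary data file
        data_files.append(
            f"  (NAME = N'{name}_data{i}', "
            f"FILENAME = N'{directory}\\{name}_{i}.{ext}', "
            f"SIZE = {data_file_gb}GB, FILEGROWTH = {growth_mb}MB)"
        )
    log_file = (
        f"  (NAME = N'{name}_log', "
        f"FILENAME = N'{log_dir}\\{name}_log.ldf', "
        f"SIZE = {log_file_gb}GB, FILEGROWTH = {growth_mb}MB)"
    )
    return (
        f"CREATE DATABASE [{name}]\nON PRIMARY\n"
        + ",\n".join(data_files)
        + f"\nLOG ON\n{log_file}\n"
        + "COLLATE Latin1_General_CI_AS_KS_WS;"
    )


# Hypothetical example: a 4-core SQL Server with four data LUNs and a log LUN.
sql = create_database_sql(
    "WSS_Content_Example",
    [r"H:\Data", r"I:\Data", r"J:\Data", r"K:\Data"],
    r"P:\Logs",
    data_file_gb=25,
    log_file_gb=10,
)
print(sql)
```

The empty, pre-sized database can then be attached to the Web application so that SharePoint populates it instead of creating and growing its own.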
Architectural Design Considerations
Search Database
Pre-construct and pre-size
Script generation of empty database objects
“Autogrow” feature on
Use RAID 10 logical units
Should be a requirement for large-scale systems
Search database is extremely read/write intensive
Multi-core computer running SQL Server
Primary file group could consist of a data file for each
CPU core present in SQL Server.
Move each data file to separate logical units consisting of
unique physical disk spindles.
Architectural Design Considerations
Search Database
Search database is VERY read/write intensive!
Do not place any other database data files on any
logical unit where search database files reside.
If possible, try to ensure that the RAID 10 logical
units for the search database data files do not
share their physical spindles with other databases.
Place the search database log files on an
independent logical unit.
Architectural Design Considerations
Storage Architecture
Database storage for content items will be between
1.2 and 1.5 times the raw file size.
Use the following formula to calculate how much
disk space is required for the search database.
[GB DISK SPACE RQD] = [TOTAL CONTENT SIZE(GB)] X [FILE SIZE MODIFIER] X 4
Where [FILE SIZE MODIFIER] is a number in the following
range, based on the average size of the files in your
corpus:
1.0: Content consists of small files (avg=1KB)
0.12: Content consists of moderate files (avg=10KB)
0.05: Content consists of large files (avg>=100KB)
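As a rough illustration, the two storage estimates above can be expressed as small Python helpers. The function names and the 1,000 GB example corpus are assumptions made for this sketch; the multipliers come from the slide.

```python
# File size modifiers from the slide, keyed by average file size in the corpus.
FILE_SIZE_MODIFIER = {
    "small": 1.0,      # avg = 1KB
    "moderate": 0.12,  # avg = 10KB
    "large": 0.05,     # avg >= 100KB
}


def search_db_disk_space_gb(total_content_gb, profile):
    """Disk space required for the search database, in GB."""
    return total_content_gb * FILE_SIZE_MODIFIER[profile] * 4


def content_db_storage_range_gb(raw_file_gb):
    """Low and high estimates of content database storage, in GB."""
    return raw_file_gb * 1.2, raw_file_gb * 1.5


# Hypothetical corpus: 1,000 GB of large files.
search_gb = search_db_disk_space_gb(1000, "large")
low_gb, high_gb = content_db_storage_range_gb(1000)
print(f"search DB: about {search_gb:.0f} GB")
print(f"content DBs: about {low_gb:.0f} to {high_gb:.0f} GB")
```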
Architectural Design Considerations
Database Maintenance
SQL Server 2005 SP2 is needed if using the DB
maintenance wizard (KB930887).
Physical Volume File Fragmentation:
Defragment your physical volumes on a regular schedule for
increased performance!
LUNs need to be 20-50% larger than the data stored on them to allow
for effective defragmentation of the data files.
Performance Monitor Counters to watch:
Average Disk Queue Length
Single Digit values are optimal.
Occasional double-digit values aren’t a large concern.
Sustained triple-digit values require attention.
Architectural Design Considerations
Topology
A single list should not have more than 2,000 items
per list view.
A view or container represents the root of the list, as well
as any folders within the list; a folder is a container
because other list items are stored within it.
Whitepaper: Working with large lists in Office SharePoint
Server 2007 (Steve Peschka)
http://go.microsoft.com/fwlink/?LinkId=95450
Disk Drive Speed
15K RPM recommended.
IIS Application Pools
Ensure the “Max Used Memory” setting utilizes all the
available RAM in your WFEs.
Architectural Design Considerations
Managed Paths
The easiest way to segregate content into multiple
site collections and unique databases is to use the
“Define Managed Paths” and “CreateSiteInNewDB”
features in MOSS.
Architectural Design Considerations
STSADM Command-line Tool and
CreateSiteInNewDB Operation.
The Create Site Collection option in Central
Administration does not allow creating and
assigning a specific database to a site collection.
However, the STSADM command-line tool can be
used in conjunction with the CreateSiteInNewDB
operation to provision site collections with their
own content repositories.
Enables scripting of the entire site collection
process and running it as a batch file!
Architectural Design Considerations
STSADM Command-line Tool and
CreateSiteInNewDB Operation - Continued.
The Office SharePoint Server 2007 STSADM
command-line tool is located in the following
directory on SharePoint farm servers:
%COMMONPROGRAMFILES%\Microsoft shared\Web server extensions\12\Bin
Script Sample
STSADM -o CreateSiteInNewDB
@REM Path to the STSADM tool on a SharePoint farm server.
@SET STSADM="%COMMONPROGRAMFILES%\Microsoft shared\Web server extensions\12\Bin\stsadm"
@REM Create the site collection in its own content database (^ continues the line).
%STSADM% -o createsiteinnewdb ^
  -url http://farm.mpsc.int/divisions/div01 ^
  -owneremail [email protected] ^
  -ownerlogin mpsc\admin ^
  -lcid 1033 ^
  -title "Loan Originations - Division 01" ^
  -description "Division 01 Loan Originations Center." ^
  -databaseuser mpscadmin ^
  -databasepassword <password> ^
  -databaseserver sap_win ^
  -databasename WSS_Content_MPSC1_Div01
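Because the operation is scriptable, the same command can be generated for many site collections at once. The Python sketch below builds one argument list per division from the parameters shown in the sample. The helper name and the owner e-mail address are hypothetical, and the database user and password switches are left out on the assumption that Windows authentication is used; add them back if your farm uses SQL authentication.

```python
import os

# Location of the STSADM tool as given above; the fallback path is an assumption.
STSADM = os.path.join(
    os.environ.get("COMMONPROGRAMFILES", r"C:\Program Files\Common Files"),
    "Microsoft shared", "Web server extensions", "12", "Bin", "stsadm.exe",
)


def createsiteinnewdb_args(division, owner_email, owner_login, db_server):
    """Build the argument list for one divisional site collection."""
    tag = f"{division:02d}"
    return [
        STSADM, "-o", "createsiteinnewdb",
        "-url", f"http://farm.mpsc.int/divisions/div{tag}",
        "-owneremail", owner_email,
        "-ownerlogin", owner_login,
        "-lcid", "1033",
        "-title", f"Loan Originations - Division {tag}",
        "-databaseserver", db_server,
        "-databasename", f"WSS_Content_MPSC1_Div{tag}",
    ]


# One command per division, Div01 through Div17.
commands = [
    createsiteinnewdb_args(n, "owner@example.com", r"mpsc\admin", "sap_win")
    for n in range(1, 18)
]

for args in commands[:2]:
    print(" ".join(args))
```

On a farm server, each list could be passed to `subprocess.run(args, check=True)` so that a failed provisioning step stops the batch.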
Creating Secondary Files Using SQL Filegroups
Creating Secondary Files
Issues and concerns using Filegroups:
Back-up and Restore.
OOB Search restore is unaware of Filegroups.
It will place the Filegroup onto the same drive it was on when the backup was run.
Ensure all machines that you restore to have a drive with
the same drive letter.
Future upgrades, Services Packs, Hot Fixes
Potential to modify index moved into Filegroup, or add a
new index to one of the tables in Filegroup.
SQL2005 and Greater
Script to move indexes uses new feature in SQL2005;
cannot use on SQL2000 or earlier.
SQL Filegroups and Search:
http://blogs.msdn.com/enterprisesearch/archive/2008/09/16/sql-file-groups-and-search.aspx
Real-World
Huge Scenarios
Real-world Scenarios
Automotive Mfgr. Business Requirements (Phase I)
Loan Origination Application built on Office SharePoint
Server 2007
1. Ability to house 10.5 million images.
2. System performance with a “normal” input load defined as
receipt of 27,000 images per day.
One day = 10 hours.
This would include indexing of current document metadata for
search availability within 1 hour of receipt.
3. Simulate user load to represent 200 users simultaneously
accessing the system to perform the following tasks.
Use search to find elements of document metadata.
View a document (represents a scanned TIFF image).
Update elements of document metadata.
4. Combine items 2 and 3 above to function simultaneously.
5. Double the volume for each of the scenarios.
Real-world Scenarios
Data Load Process (Phase I)
Partner (KnowledgeLake) developed a custom loader utility
application that implements the KnowledgeLake Document
Release Engine. This engine:
Is capable of releasing 9.17 documents/second per server into any
combination of the SharePoint Document Library repositories.
Is capable of releasing documents of any type to SharePoint.
Is capable of setting metadata (Site Column) value during the upload
process.
Employs a high-volume, storage-based folder architecture within
SharePoint to ensure UI responsiveness.
The loader utility was deployed and executed on 4 servers.
Using this application, we were able to achieve:
An average document load throughput of 36.6 documents per
second!
An average daily input of 3.17 million documents!
During the intense load process of 10.5 million documents,
the average SharePoint Web Front-End (WFE) utilization was
only 28%!
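The throughput figures above follow from simple arithmetic, sketched here in Python. The function names are made up for this example, and the daily figure assumes the loaders run around the clock (a 24-hour day), which is what the quoted daily input implies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # assumes a continuous, 24-hour load


def aggregate_rate(per_server_docs_per_sec, servers):
    """Combined release rate when every loader server runs at full speed."""
    return per_server_docs_per_sec * servers


def docs_per_day(docs_per_sec, seconds=SECONDS_PER_DAY):
    """Documents loaded in one day at a sustained rate."""
    return docs_per_sec * seconds


rate = aggregate_rate(9.17, servers=4)  # per-server rate and server count from above
daily = docs_per_day(rate)

print(f"{rate:.2f} docs/second, about {daily / 1e6:.2f} million docs/day")
```

The result, roughly 36.7 documents per second and 3.17 million documents per day, lines up with the averages reported for the four loader servers.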
Real-world Scenarios
Base Scale / Performance requirements (Phase II)
Collaboration Portal built on Office SharePoint Server
2007 (SP1 applied)
1. Ability to manage 50+ million content items.
Mix of images from Phase I effort, with additional
scanned images and Office documents consisting of
Word, Excel, PowerPoint, as well as Adobe PDF
documents.
2. Simulate user load to represent maximum number of
users simultaneously accessing the system to perform
the following tasks.
Use search to find elements of content (full-text) and content
metadata.
View a document.
Update elements of document metadata.
Browse site structure.
Load additional content.
Real-world Scenarios
Data Load Process (Phase II)
TIFF Load:
10 million documents (Phase I) + 25 million documents (Phase II) = 35 million
documents total!
Four Web Front-Ends were used for the load process.
Peak Load Rate:
121.4 docs per second/10.49 million documents per day.
Average Load Rate:
~5 million documents per day.
Load Time:
5 days.
Office Load:
15 million documents consisting of Word (.docx), Excel (.xlsx), PowerPoint (.pptx)
and Adobe PDF.
Five Web Front-Ends were used for the load process.
Peak Load Rate:
24.3 docs per second/2.1 million documents per day.
Average Load Rate:
~1.9 million documents per day.
Load Time:
8 days.
NOTE: Load rates were heavily suppressed in the Office load due to the
automation process that created the PDF files.
What does the logical architecture look like?!
What does the physical architecture look like?!
Let's talk scalability…
Scale OUT…
Scale UP…
What does the site topology look like?!
Phase I: 17 Divisional Site Collections / DBs
Phase II: 10 Departmental Site Collections / DBs
What does the storage architecture look like?
Architectural Design Statistical Results
Phase I
Designed Once / Built Once
No architecture OR configuration changes were
required after the initial build was completed.
10.5+ million documents loaded into the system in
approximately 60 hours!
Full Crawl indexed documents and associated metadata
in roughly 32 hours!
10,623,012 items.
Average content database size for divisional breakouts
was 59.84GB!
Average time for a scanned item to be included in the
index (Incremental crawl) was under 5 minutes!
Total index size was 6.5GB!
Architectural Design Statistical Results
Phase II
40+ million documents loaded into the system
in approximately 13 days!
Additional 25 million images.
15 million Office documents (Word, Excel,
PowerPoint and Adobe PDF).
Dynamic metadata injection upon load.
Full crawl indexed content and associated
metadata in approximately 35 days.
Indexer was a 4 processor machine with 8GB of
RAM.
50,396,079 items indexed!
Lesson Learned: Scale up the Index server
hardware for crawling a large corpus!
Architectural Design Statistical Results
Phase II
Search database size was 539GB.
Lesson Learned: Large search database caused disk
I/O contention; break this out into multiple data file
allocations matching the number of processor
cores on SQL Server, and spread them over
unique LUNs.
Total Index size was 162GB!
Average Content database size for Divisional
breakouts was 200.65GB!
Average Content database size for
Departmental breakouts was 137.60GB!
Scalability of Architecture
Additional phase was added in order to reduce the
overall crawl time in the previous phase of the effort.
Index server scaled up to improve crawl performance.
8-processor machine hyper-threaded (16 cores) with 32GB
RAM!
Search DB was partitioned to use 15 secondary data
files, for a total of 16 files, to match the number of
processor cores (SQL).
Search DB and secondary data files were spread across
16 unique LUNs to reduce (or eliminate) disk I/O
contention problems experienced in the previous phase.
Crawler Impact Rules defined and implemented to
increase crawl threads.
Request 64 documents at a time.
Real-world Scenarios
Pharmaceutical Business Requirements
Collaboration Portal built on Office SharePoint
Server 2007
Initial implementation required ~40TB of content storage.
Content items consist of Word, Excel, PowerPoint,
HTML, Text, and Adobe PDF documents.
Identify performance characteristics and provide
guidance around content database sizing to fulfill SLA
requirements.
FAST search integration.
How quickly can we serve search results?
Microsoft System Center Data Protection Manager (DPM)
2007 integration.
How quickly can we backup and restore content
databases based on size?
Real-world Scenarios
Data Load Process
71,524,357 documents loaded across two
SharePoint Farms in 10.92 days!
Content was spread across the farms into 165
unique content databases.
6,240 Site Collections, each containing 10 sub-sites for a
total of 62,400 sites.
Database sizes were pre-configured to vary in size from
100GB to 350GB to determine performance and/or SLA
impacts.
What does the logical architecture look like?!
What does the physical architecture look like?!
What does the site topology look like?!
165 Content DBs
6,240 Site Collections
10 Sub-Sites in each collection: 62,400 Sites!
What does the storage architecture look like?
Architectural Design Statistical Results
Conclusion
User Loads
Based on IIS log analysis, it was determined that the customer has an
average concurrent user count of 119 users.
EEC lab testing for the customer included significantly more load, using
ZERO THINK TIME in all stress tests completed.
Stress tests included 2,000 to 3,000 concurrent users.
Based on the 10% rule, testing completed equated to an environment
representing 300,000 users!
TWENTY-FIVE TIMES the average concurrent user load at customer!
RAW number of RPS during peak times is 1,469 at customer.
During EEC lab testing with 25 times the concurrent user load and NO “think
time”, we were able to obtain 773 RPS, which equates to 346.59 ACTUAL RPS!
Database Sizes
System stress testing showed no performance degradation resulting
from increased DB sizes.
Backup and restore functionality was accomplished in under 25 minutes
using either SQL 2005 or DPM 2007.
FAST Search Integration
Successfully integrated FAST search capabilities, indexed content corpus
and served search results as expected.
Joel's SQL Storage Resources
Whitepapers
Blogs
Articles
http://www.sharepointjoel.com/Lists/Posts/Post.aspx?ID=114
SharePoint Deployment Planning Services
SDPS
Microsoft funded!
Customers qualify based
on their software
assurance level
1, 3, 5, 10 and 15-Day
Offerings
Visit SDPS Partner Center
http://www.PartnerSDPS.com
Large-Scale Case Study Available
SharePoint Scalability and Performance
Whitepaper
Contains majority of content you will see here,
along with test results you won’t see here.
TechNet topic:
http://go.microsoft.com/fwlink/?LinkId=120901
Word 2007 format:
http://go.microsoft.com/fwlink/?LinkId=120881
Word 2000-2003 format:
http://go.microsoft.com/fwlink/?LinkId=120890
PDF format:
http://go.microsoft.com/fwlink/?LinkId=120891
SharePoint Scalability
Storage Architecture (KnowledgeLake)
Prescriptive Guidance
Storage subsystem for highly-scalable
implementations.
Techniques for monitoring scalability metrics.
Techniques for modifying existing storage
architecture to eliminate poor I/O performance.
PDF format:
http://go.microsoft.com/fwlink/?LinkId=119399
Today's Entertainment
Joel and Paul
@joeloleson
http://www.sharepointjoel.com
SQL Server Word of the Day
Tuesday, May 12
RESOURCE
GOVERNOR
*Game cards may be picked up at the SQL Server booths in the TLC
Additional Resources
Speaker Blog: http://www.SharePointJoel.com
Twitter: @joeloleson
External Resources
http://blogs.msdn.com/joelo/archive/2007/07/09/capacity-planning-key-links-and-info.aspx
http://www.sharepointjoel.com/Lists/Posts/Post.aspx?ID=114
SQL Server 2008 Business Value Calculator:
www.moresqlserver.com
SQL Server 2008 with SharePoint
Hardware and software requirements
http://msdn.microsoft.com/en-us/library/ms143506.aspx
To support SQL 2008, Windows SharePoint
Services 3.0 Service Pack 1 must be installed.
http://www.microsoft.com/downloads/details.aspx?FamilyID=875da47e-89d5-4621-a319-a1f5bfedf497&DisplayLang=en
Matrix of features available within each edition
of SQL Server 2008.
http://msdn.microsoft.com/en-us/library/cc645993.aspx
SQL Server 2008 with SharePoint
Manageability Enhancements
Policy-based administration
Back-up compression
http://technet.microsoft.com/en-us/library/cc645579.aspx
High Availability Enhancements
Database mirroring enhancements
http://technet.microsoft.com/en-us/library/cc645581.aspx
SQL Server 2008 with SharePoint
Data Security Enhancements
New encryption functions to enhance security
Transparent Data Encryption (TDE) does not require
SharePoint configuration changes.
http://technet.microsoft.com/en-us/library/cc645578.aspx
Scalability and Performance Enhancements
Filtered Indexes and Statistics
Query Performance and Processing
http://technet.microsoft.com/en-us/library/cc645580.aspx
SQL Server 2008 with Windows Server 2008
Transactional Replication with SQL Server 2008
Dramatically outperformed SQL 2005 on Win 2003.
Most substantial gains came with the Publisher/Subscriber
model both running on SQL 2008 and Win 2008.
Performance improvements
Increased resiliency
System failover benefits
See Geo-Replication Performance Gains at
http://msdn.microsoft.com/en-us/library/dd263442.aspx
Architectural Design Considerations
Search Availability
Content loading and indexing are required concurrently.
Separate search database transaction log files and content
database transaction log files onto independent logical units.
Prevent the disk array with the transaction logs from
becoming a bottleneck.
Help ensure content loading and indexing can occur at
the same time without contention.
The recommended practice is to separate the following
transaction log file volume types onto unique LUNs:
Content Database Log Files.
Search Database Log Files.
Architectural Design Considerations
Index Server RAM
At a minimum, amount of RAM on Index server
should be greater than or equal to about one-third
the size of the index.
Crawling, Indexing, and Computing Ranking
“Computing Ranking” in Office SharePoint 2007
refers to the process of computing the relevance of
the items that have been crawled.
In a nutshell, this state indicates that the search
system is processing anchors that it discovered
during the crawl and ensuring that the relevance
ranking for the item is current.
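The RAM guideline at the top of this slide reduces to a one-line check. The sketch below is only an illustration with made-up function names; the 162 GB index size and 8 GB of RAM are values quoted elsewhere in this deck and are used here purely as sample inputs.

```python
def min_index_server_ram_gb(index_size_gb):
    """Minimum RAM suggested for the index server: about one-third of the index."""
    return index_size_gb / 3


def ram_is_sufficient(ram_gb, index_size_gb):
    """True when installed RAM meets the one-third guideline."""
    return ram_gb >= min_index_server_ram_gb(index_size_gb)


needed_gb = min_index_server_ram_gb(162)
print(f"a 162 GB index suggests at least {needed_gb:.0f} GB of RAM")
print("8 GB is sufficient:", ram_is_sufficient(8, 162))
```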
Architectural Design Considerations
Crawler Performance
Crawling is an extremely disk intensive operation
on both the computer running SQL Server
(read/write) and the disk on which the index
catalog is stored.
Things to consider for increasing disk speed and
crawler performance:
Check the Current Disk Queue Length on the disks to see
if the queue depth is too high.
The Physical Disk/Current Disk Queue Length counter should be
less than the sum of the number of spindles in the disk array, plus
two.
It’s also possible that disk fragmentation is occurring as
the database grows (DEFRAG!).
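The queue-depth rule above (current disk queue length below the number of spindles plus two) can be checked with a small helper. This is a sketch with hypothetical function names; the counter value itself would come from Performance Monitor.

```python
def queue_length_limit(spindles):
    """Upper bound for Current Disk Queue Length: spindles in the array plus two."""
    return spindles + 2


def queue_depth_too_high(current_queue_length, spindles):
    """True when the observed queue length is not below the limit."""
    return current_queue_length >= queue_length_limit(spindles)


# Hypothetical example: an 8-spindle array.
print(queue_length_limit(8))
print(queue_depth_too_high(9, spindles=8))
print(queue_depth_too_high(12, spindles=8))
```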
Architectural Design Considerations
Crawler Performance - Continued
A simple way to know if SQL Server performance is
impacting the indexer is to look at two counters on
the index server.
\Office Server Search Archival Plugin(Portal_Content)\Total docs in first queue
\Office Server Search Archival Plugin(Portal_Content)\Total docs in second queue
If BOTH of these counters are at 500 for more than 15 seconds,
SQL Server is the bottleneck.
Increase the “performance level” setting of the
crawler. This raises the priority at which the service
runs at the Windows operating system level.
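The two-counter test can be automated once the counter values are sampled. The Python sketch below assumes the samples have already been collected (for example, exported from Performance Monitor) as timestamped pairs; the function name and the sample format are assumptions made for this illustration.

```python
def sql_is_bottleneck(samples, threshold=500, sustain_seconds=15):
    """Return True when both queue counters stay at the threshold for more
    than sustain_seconds.

    samples: iterable of (timestamp_seconds, first_queue, second_queue),
    ordered by time.
    """
    run_start = None  # timestamp at which the current saturated run began
    for timestamp, first_queue, second_queue in samples:
        if first_queue >= threshold and second_queue >= threshold:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start > sustain_seconds:
                return True
        else:
            run_start = None  # either counter dropped, so the run is broken
    return False


# Hypothetical samples taken every 5 seconds.
saturated = [(t, 500, 500) for t in range(0, 25, 5)]
recovering = [(0, 500, 500), (5, 500, 500), (10, 120, 500), (15, 500, 500), (20, 500, 500)]

print(sql_is_bottleneck(saturated))
print(sql_is_bottleneck(recovering))
```

Requiring both counters to stay saturated avoids flagging short bursts that clear on their own.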
Architectural Design Considerations
Crawler Performance - Continued
You can increase the number of threads used to crawl
content by defining Crawler Impact Rules in
Central Administration.
For content-intensive indexing with quick search
availability, recommend LARGEST possible computer
(processor, RAM) for the index server role.
With enough power on the index machine (and a capable
SQL back-end), you could create a Crawler Impact Rule to
“Request 64 documents at a time”.
CAUTION:
Closely monitor the performance impacts of these settings
for your specific environment prior to implementation!!!
Architectural Design Considerations
Index Server Performance
To reduce traffic on your Web Front End (WFE)
servers as well as network hops required, make
your Index Server your Target Server.
Dedicated Index with Target WFE:
Index > WFE > SQL
Dedicated Index AS Target WFE:
Index/WFE > SQL
At a minimum, use a dedicated Target Server that is
NOT part of the load-balanced front-end
Reduce traffic on your WFE’s during indexing process.
Database Sizes
Phase I
[Chart: sizes of the 17 divisional content databases, Div01 through Div17, on an axis from 0.00 to 80.00.]
Database Sizes
Phase II - Divisional Content
[Chart: Divisional DB Sizes for WSS_Content_MPSC1_Div01.mdf through WSS_Content_MPSC1_Div17.mdf, on an axis from 0.00 to 250.00.]
Database Sizes
Phase II - Departmental Content
[Chart: Departmental DB Sizes for WSS_Content_MPSC2_Dpt01.mdf through WSS_Content_MPSC2_Dpt10.mdf, on an axis from 136.40 to 138.00.]
Content Spread
Phase II - Departmental Content
[Chart: document counts by file type (docx, pptx, xlsx, pdf) for departmental databases 1 through 10, on an axis from 0 to 900,000.]
Database Sizes
MPSC Phase I
Type                DB Name                            Volume            Size (GB)
Search              MPSC_SharedServices_Search_DB.mdf  SearchDb_Vol(G:)      63.40
Divisional Content  WSS_Content_MPSC1_Div01.mdf        Content1_Vol(H:)      57.00
Divisional Content  WSS_Content_MPSC1_Div02.mdf        Content2_Vol(I:)      60.70
Divisional Content  WSS_Content_MPSC1_Div03.mdf        Content3_Vol(J:)      72.00
Divisional Content  WSS_Content_MPSC1_Div04.mdf        Content4_Vol(K:)      60.10
Divisional Content  WSS_Content_MPSC1_Div05.mdf        Content5_Vol(L:)      23.90
Divisional Content  WSS_Content_MPSC1_Div06.mdf        Content6_Vol(M:)      60.60
Divisional Content  WSS_Content_MPSC1_Div07.mdf        Content7_Vol(N:)      72.70
Divisional Content  WSS_Content_MPSC1_Div08.mdf        Content8_Vol(O:)      69.80
Divisional Content  WSS_Content_MPSC1_Div09.mdf        Content1_Vol(H:)      35.50
Divisional Content  WSS_Content_MPSC1_Div10.mdf        Content2_Vol(I:)      65.60
Divisional Content  WSS_Content_MPSC1_Div11.mdf        Content3_Vol(J:)      61.50
Divisional Content  WSS_Content_MPSC1_Div12.mdf        Content4_Vol(K:)      65.60
Divisional Content  WSS_Content_MPSC1_Div13.mdf        Content5_Vol(L:)      77.70
Divisional Content  WSS_Content_MPSC1_Div14.mdf        Content6_Vol(M:)      64.90
Divisional Content  WSS_Content_MPSC1_Div15.mdf        Content7_Vol(N:)      25.90
Divisional Content  WSS_Content_MPSC1_Div16.mdf        Content8_Vol(O:)      65.50
Divisional Content  WSS_Content_MPSC1_Div17.mdf        Content1_Vol(H:)      78.30
TOTAL STORAGE SIZE:                                                       1,080.70
TOTAL DIVISIONAL CONTENT DB SIZE:                                         1,017.30
AVERAGE DIVISIONAL CONTENT DB SIZE:                                          59.84
Database Sizes
MPSC Phase II

| DB Type | DB Name | Volume | Size (GB) |
|---|---|---|---|
| Search | MPSC_SharedServices_Search_DB.mdf | SearchDb_Vol(G:) | 539.00 |

| DB Type | DB Name | Volume | Size (GB) | # Items (TIFF) | # Folders (TIFF) |
|---|---|---|---|---|---|
| Divisional Content | WSS_Content_MPSC1_Div01.mdf | Content1_Vol(H:) | 191.00 | 2,256,639 | 13,250 |
| Divisional Content | WSS_Content_MPSC1_Div02.mdf | Content2_Vol(I:) | 198.00 | 2,206,762 | 13,244 |
| Divisional Content | WSS_Content_MPSC1_Div03.mdf | Content3_Vol(J:) | 201.00 | 1,877,409 | 12,295 |
| Divisional Content | WSS_Content_MPSC1_Div04.mdf | Content4_Vol(K:) | 197.00 | 2,205,250 | 13,243 |
| Divisional Content | WSS_Content_MPSC1_Div05.mdf | Content5_Vol(L:) | 201.00 | 1,847,333 | 13,086 |
| Divisional Content | WSS_Content_MPSC1_Div06.mdf | Content6_Vol(M:) | 199.00 | 2,205,712 | 13,241 |
| Divisional Content | WSS_Content_MPSC1_Div07.mdf | Content7_Vol(N:) | 199.00 | 1,851,289 | 12,097 |
| Divisional Content | WSS_Content_MPSC1_Div08.mdf | Content8_Vol(O:) | 201.00 | 1,938,231 | 12,642 |
| Divisional Content | WSS_Content_MPSC1_Div09.mdf | Content1_Vol(H:) | 188.00 | 1,787,785 | 13,216 |
| Divisional Content | WSS_Content_MPSC1_Div10.mdf | Content2_Vol(I:) | 199.00 | 2,179,837 | 11,828 |
| Divisional Content | WSS_Content_MPSC1_Div11.mdf | Content3_Vol(J:) | 200.00 | 2,293,663 | 12,015 |
| Divisional Content | WSS_Content_MPSC1_Div12.mdf | Content4_Vol(K:) | 203.00 | 2,216,798 | 12,016 |
| Divisional Content | WSS_Content_MPSC1_Div13.mdf | Content5_Vol(L:) | 200.00 | 1,858,731 | 11,254 |
| Divisional Content | WSS_Content_MPSC1_Div14.mdf | Content6_Vol(M:) | 202.00 | 2,224,727 | 12,016 |
| Divisional Content | WSS_Content_MPSC1_Div15.mdf | Content7_Vol(N:) | 227.00 | 1,974,025 | 12,491 |
| Divisional Content | WSS_Content_MPSC1_Div16.mdf | Content8_Vol(O:) | 203.00 | 2,214,368 | 12,016 |
| Divisional Content | WSS_Content_MPSC1_Div17.mdf | Content1_Vol(H:) | 202.00 | 1,867,572 | 11,446 |
| TOTAL DIVISIONAL CONTENT: | | | 3,411.00 | 35,006,131 | 211,396 |
| AVERAGE DIVISIONAL CONTENT: | | | 200.65 | 2,059,184 | 12,435 |

| DB Type | DB Name | Volume | Size (GB) | # Items (Office/PDF) | # Folders (Office/PDF) | docx | pptx | xlsx | pdf |
|---|---|---|---|---|---|---|---|---|---|
| Departmental Content | WSS_Content_MPSC2_Dpt01.mdf | Content1_Vol(H:) | 137.00 | 1,504,884 | 11,743 | 868,033 | 57,255 | 521,996 | 57,600 |
| Departmental Content | WSS_Content_MPSC2_Dpt02.mdf | Content2_Vol(I:) | 138.00 | 1,507,553 | 11,600 | 869,546 | 57,616 | 522,878 | 57,513 |
| Departmental Content | WSS_Content_MPSC2_Dpt03.mdf | Content3_Vol(J:) | 138.00 | 1,506,906 | 11,601 | 868,779 | 57,804 | 522,467 | 57,856 |
| Departmental Content | WSS_Content_MPSC2_Dpt04.mdf | Content4_Vol(K:) | 137.00 | 1,505,731 | 11,601 | 868,455 | 57,536 | 522,215 | 57,525 |
| Departmental Content | WSS_Content_MPSC2_Dpt05.mdf | Content5_Vol(L:) | 138.00 | 1,505,740 | 11,601 | 869,784 | 57,468 | 520,818 | 57,670 |
| Departmental Content | WSS_Content_MPSC2_Dpt06.mdf | Content6_Vol(M:) | 138.00 | 1,507,524 | 11,601 | 869,765 | 57,765 | 522,241 | 57,753 |
| Departmental Content | WSS_Content_MPSC2_Dpt07.mdf | Content7_Vol(N:) | 138.00 | 1,507,108 | 11,601 | 870,284 | 57,362 | 521,590 | 57,872 |
| Departmental Content | WSS_Content_MPSC2_Dpt08.mdf | Content8_Vol(O:) | 137.00 | 1,503,853 | 11,601 | 867,015 | 57,318 | 521,748 | 57,772 |
| Departmental Content | WSS_Content_MPSC2_Dpt09.mdf | Content5_Vol(L:) | 137.00 | 1,505,234 | 11,601 | 868,209 | 57,518 | 521,934 | 57,573 |
| Departmental Content | WSS_Content_MPSC2_Dpt10.mdf | Content7_Vol(N:) | 138.00 | 1,508,146 | 11,599 | 870,315 | 57,571 | 522,393 | 57,867 |
| TOTAL DEPARTMENTAL CONTENT: | | | 1,376.00 | 15,062,679 | 116,149 | 8,690,185 | 575,213 | 5,220,280 | 577,001 |
| AVERAGE DEPARTMENTAL CONTENT: | | | 137.60 | 1,506,268 | 11,615 | 869,019 | 57,521 | 522,028 | 57,700 |

| | Size (GB) | # Items | # Folders |
|---|---|---|---|
| GRAND TOTAL CONTENT: | 5,326.00 | 50,068,810 | 327,545 |
| GRAND TOTAL AVERAGE CONTENT: | 197.26 | 1,854,400 | 12,131 |
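The totals and averages reported for the MPSC Phase II databases can be re-derived from the per-database sizes. The following is a minimal Python sketch with the sizes transcribed from the figures above; the variable names are illustrative and not part of the original deck. The reported grand average (197.26 GB) matches the grand total, search database included, divided by the 27 content databases.

```python
from statistics import mean

# Sizes in GB, transcribed from the MPSC Phase II database size figures.
search_gb = 539.00
divisional_gb = [191, 198, 201, 197, 201, 199, 199, 201, 188,
                 199, 200, 203, 200, 202, 227, 203, 202]
departmental_gb = [137, 138, 138, 137, 138, 138, 138, 137, 137, 138]

divisional_total = sum(divisional_gb)
departmental_total = sum(departmental_gb)
grand_total = search_gb + divisional_total + departmental_total

# 17 divisional + 10 departmental content databases.
content_db_count = len(divisional_gb) + len(departmental_gb)
# Assumption: the grand average divides the grand total by the content DB count only.
grand_average = grand_total / content_db_count

print(f"Divisional:   total {divisional_total} GB, average {mean(divisional_gb):.2f} GB")
print(f"Departmental: total {departmental_total} GB, average {mean(departmental_gb):.2f} GB")
print(f"Grand total:  {grand_total:.2f} GB, average {grand_average:.2f} GB")
```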
Performance of Components Over Time
Load Averages
Search Load
Modify Load
Run # Cncnt. Users
Insert Load
Test %
View Image Load Test
Items/Min
Count
Minutes
Crawl Cycle
WFE CPU
Minutes
%
Index CPU Target CPU SQL CPU TempDb Q TranLog Q SearchDb Q
%
%
%
Length
Length
ContentDb Q
Length
Page Load Time
Length
Seconds
Image Load
Time
Seconds
1*
0
0
0
0
20
15
6.325
0.700
0.330 3.900
0.000
0.001
0.000
0.000
N/A
N/A
2
200
0
0
0
20
15
19.350
0.700
0.320 6.500
0.000
0.001
0.000
0.000
0.042
N/A
3
400
0
0
0
20
15
36.700
0.520
0.320 8.700
0.000
0.001
0.000
0.000
0.059
N/A
4*
0
0
0
0
20
5
13.950
0.690
0.310 8.100
0.000
0.002
0.000
0.000
N/A
N/A
5
200
0
0
0
20
5
27.025
0.700
0.300 11.100
0.000
0.002
0.000
0.000
0.045
N/A
6
200
35
0
0
20
5
32.825
0.640
0.470 13.700
0.000
0.005
0.001
0.001
0.080
N/A
7
200
34
38.17
0
20
5
32.325
1.400
1.100 14.000
0.010
0.009
0.020
0.006
0.074
N/A
8
200
32
38.19 13456
20
5
28.750
1.500
1.100 13.300
0.000
0.008
0.020
0.005
0.064
0.030
9
400
0
0
20
5
42.900
0.570
0.310 13.600
0.000
0.020
0.000
0.000
0.063
N/A
10
400
34
0
20
5
54.6
0.62
19.6
0.001
0.008
0
0
0.12
N/A
11
400
36
78.29
20
5
51.125
2.100
1.400 18.700
0.020
0.015
0.070
0.010
0.110
N/A
12
400
35
79.33 26224
20
5
44.750
2.200
1.600 17.700
0.001
0.014
0.070
0.010
0.090
0.030
13
0
0
37.95
0
20
5
13.225
1.400
1.000 8.200
0.000
0.005
0.060
0.010
N/A
N/A
14
0
0
78.1
0
20
5
13.375
2.000
1.300 8.500
0.000
0.009
0.150
0.020
N/A
N/A
* Denotes Baseline Load Test
0
0
0
0.49
How do we pull all this together?!
Pharmaceutical Content Database Distribution
Substitute “F1” with SQL Server number to generate unique DB’s
Farm 1: 2 SQL
Farm 2: 1 SQL
165 Content Databases!

| Database # | Content Database Name | Database Size (TB) | # of Site Collections | Site Collection Names | Subsites under each site collection |
|---|---|---|---|---|---|
| 1 | CDB_F1_350_1 | 0.35 | 30 | Site_F1_350_1_1 thru 30 | 10 |
| 2 | CDB_F1_350_2 | 0.35 | 60 | Site_F1_350_2_1 thru 60 | 10 |
| 3 | CDB_F1_350_3 | 0.35 | 60 | Site_F1_350_3_1 thru 60 | 10 |
| 4 | CDB_F1_350_4 | 0.35 | 60 | Site_F1_350_4_1 thru 60 | 10 |
| 5 | CDB_F1_350_5 | 0.35 | 60 | Site_F1_350_5_1 thru 60 | 10 |
| 6 | CDB_F1_350_6 | 0.35 | 60 | Site_F1_350_6_1 thru 60 | 10 |
| 7 | CDB_F1_350_7 | 0.35 | 60 | Site_F1_350_7_1 thru 60 | 10 |
| 8 | CDB_F1_300_1 | 0.3 | 30 | Site_F1_300_1_1 thru 30 | 10 |
| 9 | CDB_F1_300_2 | 0.3 | 60 | Site_F1_300_2_1 thru 60 | 10 |
| 10 | CDB_F1_300_3 | 0.3 | 60 | Site_F1_300_3_1 thru 60 | 10 |
| 11 | CDB_F1_300_4 | 0.3 | 60 | Site_F1_300_4_1 thru 60 | 10 |
| 12 | CDB_F1_300_5 | 0.3 | 60 | Site_F1_300_5_1 thru 60 | 10 |
| 13 | CDB_F1_300_6 | 0.3 | 60 | Site_F1_300_6_1 thru 60 | 10 |
| 14 | CDB_F1_300_7 | 0.3 | 60 | Site_F1_300_7_1 thru 60 | 10 |
| 15 | CDB_F1_300_8 | 0.3 | 60 | Site_F1_300_8_1 thru 60 | 10 |
| 16 | CDB_F1_250_1 | 0.25 | 30 | Site_F1_250_1_1 thru 30 | 10 |
| 17 | CDB_F1_250_2 | 0.25 | 60 | Site_F1_250_2_1 thru 60 | 10 |
| 18 | CDB_F1_250_3 | 0.25 | 60 | Site_F1_250_3_1 thru 60 | 10 |
| 19 | CDB_F1_250_4 | 0.25 | 60 | Site_F1_250_4_1 thru 60 | 10 |
| 20 | CDB_F1_250_5 | 0.25 | 60 | Site_F1_250_5_1 thru 60 | 10 |
| 21 | CDB_F1_250_6 | 0.25 | 60 | Site_F1_250_6_1 thru 60 | 10 |
| 22 | CDB_F1_250_7 | 0.25 | 60 | Site_F1_250_7_1 thru 60 | 10 |
| 23 | CDB_F1_250_8 | 0.25 | 60 | Site_F1_250_8_1 thru 60 | 10 |
| 24 | CDB_F1_250_9 | 0.25 | 60 | Site_F1_250_9_1 thru 60 | 10 |
| 25 | CDB_F1_250_10 | 0.25 | 60 | Site_F1_250_10_1 thru 60 | 10 |
| 26 | CDB_F1_250_11 | 0.25 | 60 | Site_F1_250_11_1 thru 60 | 10 |
| 27 | CDB_F1_250_12 | 0.25 | 60 | Site_F1_250_12_1 thru 60 | 10 |
| 28 | CDB_F1_250_13 | 0.25 | 60 | Site_F1_250_13_1 thru 60 | 10 |
| 29 | CDB_F1_250_14 | 0.25 | 60 | Site_F1_250_14_1 thru 60 | 10 |
| 30 | CDB_F1_250_15 | 0.25 | 60 | Site_F1_250_15_1 thru 60 | 10 |
| 31 | CDB_F1_250_16 | 0.25 | 60 | Site_F1_250_16_1 thru 60 | 10 |
| 32 | CDB_F1_250_17 | 0.25 | 60 | Site_F1_250_17_1 thru 60 | 10 |
| 33 | CDB_F1_250_18 | 0.25 | 60 | Site_F1_250_18_1 thru 60 | 10 |
| 34 | CDB_F1_200_1 | 0.2 | 30 | Site_F1_200_1_1 thru 30 | 10 |
| 35 | CDB_F1_200_2 | 0.2 | 60 | Site_F1_200_2_1 thru 60 | 10 |
| 36 | CDB_F1_200_3 | 0.2 | 60 | Site_F1_200_3_1 thru 60 | 10 |
| 37 | CDB_F1_200_4 | 0.2 | 60 | Site_F1_200_4_1 thru 60 | 10 |
| 38 | CDB_F1_200_5 | 0.2 | 60 | Site_F1_200_5_1 thru 60 | 10 |
| 39 | CDB_F1_200_6 | 0.2 | 60 | Site_F1_200_6_1 thru 60 | 10 |
| 40 | CDB_F1_200_7 | 0.2 | 60 | Site_F1_200_7_1 thru 60 | 10 |
| 41 | CDB_F1_200_8 | 0.2 | 60 | Site_F1_200_8_1 thru 60 | 10 |
| 42 | CDB_F1_200_9 | 0.2 | 60 | Site_F1_200_9_1 thru 60 | 10 |
| 43 | CDB_F1_200_10 | 0.2 | 60 | Site_F1_200_10_1 thru 60 | 10 |
| 44 | CDB_F1_200_11 | 0.2 | 60 | Site_F1_200_11_1 thru 60 | 10 |
| 45 | CDB_F1_200_12 | 0.2 | 60 | Site_F1_200_12_1 thru 60 | 10 |
| 46 | CDB_F1_200_13 | 0.2 | 60 | Site_F1_200_13_1 thru 60 | 10 |
| 47 | CDB_F1_200_14 | 0.2 | 60 | Site_F1_200_14_1 thru 60 | 10 |
| 48 | CDB_F1_200_15 | 0.2 | 60 | Site_F1_200_15_1 thru 60 | 10 |
| 49 | CDB_F1_200_16 | 0.2 | 60 | Site_F1_200_16_1 thru 60 | 10 |
| 50 | CDB_F1_200_17 | 0.2 | 60 | Site_F1_200_17_1 thru 60 | 10 |
| 51 | CDB_F1_150_1 | 0.15 | 30 | Site_F1_150_1_1 | 10 |
| 52 | CDB_F1_150_2 | 0.15 | 60 | Site_F1_150_1_1 thru 60 | 10 |
| 53 | CDB_F1_150_3 | 0.15 | 60 | Site_F1_150_2_1 thru 60 | 10 |
| 54 | CDB_F1_100_1 | 0.1 | 30 | Site_F1_100_1_1 | 10 |
| 55 | CDB_F1_100_2 | 0.1 | 60 | Site_F1_100_2_1 thru 60 | 10 |
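The naming convention above is regular enough to generate. The following is a minimal Python sketch, assuming the pattern `CDB_<server>_<size in GB>_<index>`, that the first database in each size tier holds 30 site collections and the rest hold 60, and that the per-server tags are `F1`, `F2`, `F3` (the deck only says to substitute the SQL Server number for “F1”, so the exact tag format is an assumption). The names `TIERS`, `ContentDb`, and `plan_for_server` are hypothetical.

```python
from dataclasses import dataclass

# (database size in GB, number of databases of that size) per SQL Server,
# as listed in the distribution above.
TIERS = [(350, 7), (300, 8), (250, 18), (200, 17), (150, 3), (100, 2)]


@dataclass(frozen=True)
class ContentDb:
    name: str
    size_gb: int
    site_collections: int


def plan_for_server(server_tag: str) -> list[ContentDb]:
    """Build the content database plan for one SQL Server."""
    plan = []
    for size_gb, count in TIERS:
        for index in range(1, count + 1):
            # Assumption: first database of each tier has 30 site collections.
            site_collections = 30 if index == 1 else 60
            name = f"CDB_{server_tag}_{size_gb}_{index}"
            plan.append(ContentDb(name, size_gb, site_collections))
    return plan


# Three SQL Servers in total (Farm 1: 2 SQL, Farm 2: 1 SQL); tags are assumed.
all_databases = [db for tag in ("F1", "F2", "F3") for db in plan_for_server(tag)]

per_server = plan_for_server("F1")
print(len(per_server), "databases per server")
print(sum(db.size_gb for db in per_server) / 1000, "TB planned per server")
print(len(all_databases), "content databases overall")
```

Under these assumptions each server carries 55 databases with 13.4 TB of planned capacity, which is consistent with the 165 content databases called out on the slide.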
How do we pull all this together?!
Pharmaceutical Data Load Statistics

Farm 1

| | Total Docs | Total TB |
|---|---|---|
| SQL 1-1 | 23,196,327.00 | 13.16 TB |
| SQL 1-2 | 23,110,478.00 | 13.12 TB |
| Total TB | | 26.29 TB |

| Type | Count | % |
|---|---|---|
| DOCX | 23,393,946.00 | 50.52% |
| XLSX | 9,364,053.00 | 20.22% |
| PPTX | 7,007,626.00 | 15.13% |
| HTML | 2,334,473.00 | 5.04% |
| TXT | 2,391,297.00 | 5.16% |
| PDF | 1,815,385.00 | 3.92% |
| Total Docs | 46,306,780.00 | 100% |
| Total Load Time | 10.92 Days | |

Farm 2

| | Total Docs | Total TB |
|---|---|---|
| SQL 1-1 | 25,217,577.00 | 13.10 TB |
| Total TB | | 13.10 TB |

| Type | Count | % |
|---|---|---|
| DOCX | 12,668,489.00 | 50.24% |
| XLSX | 5,071,017.00 | 20.11% |
| PPTX | 3,801,674.00 | 15.08% |
| HTML | 1,273,835.00 | 5.05% |
| TXT | 1,295,295.00 | 5.14% |
| PDF | 1,107,267.00 | 4.39% |
| Total Docs | 25,217,577.00 | 100% |
| Total Load Time | 7.96 Days | |
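The percentage column and an average load rate follow directly from the counts and load times above. The following is a small Python sketch with the figures transcribed from the Pharmaceutical Data Load Statistics; the names `FARMS`, `type_shares`, and `docs_per_day` are illustrative, and the per-day rate is a derived number that does not appear in the original deck.

```python
# Document counts by type and total load time, transcribed from the slide.
FARMS = {
    "Farm 1": {
        "counts": {"DOCX": 23_393_946, "XLSX": 9_364_053, "PPTX": 7_007_626,
                   "HTML": 2_334_473, "TXT": 2_391_297, "PDF": 1_815_385},
        "load_days": 10.92,
    },
    "Farm 2": {
        "counts": {"DOCX": 12_668_489, "XLSX": 5_071_017, "PPTX": 3_801_674,
                   "HTML": 1_273_835, "TXT": 1_295_295, "PDF": 1_107_267},
        "load_days": 7.96,
    },
}


def type_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each document type's share of the total, in percent."""
    total = sum(counts.values())
    return {doc_type: 100 * n / total for doc_type, n in counts.items()}


def docs_per_day(counts: dict[str, int], load_days: float) -> float:
    """Average documents loaded per day over the whole load."""
    return sum(counts.values()) / load_days


for farm, data in FARMS.items():
    shares = type_shares(data["counts"])
    rate = docs_per_day(data["counts"], data["load_days"])
    print(farm, {t: f"{s:.2f}%" for t, s in shares.items()}, f"{rate:,.0f} docs/day")
```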
Architectural Design Statistical Results
Testing Results – 300GB Content Databases

| RUN: | ALL.300.501 | ALL.300.501 | ALL.300.501 | ALL.300.501 | ALL.300.501 |
|---|---|---|---|---|---|
| USERS: | 200 | 500 | 1000 | 2000 | 5000 |
| Avg. Response Time (Sec) | 0.27 | 0.68 | 1.37 | 2.8 | 7.4 |
| Avg. First Byte Time (Sec) | 0.27 | 0.65 | 1.26 | 2.5 | 5.7 |
| Avg. Page Time (Sec) | 0.84 | 2.0 | 3.60 | 7.5 | 16.7 |
| Avg. Failed Requests | 9 | 16 | 12 | 17 | 22 |
| Avg. Reqs/Sec | 665 | 787 | 779 | 493 | 573 |
| Actual RPS * | 298 | 352 | 349 | 221 | 256 |
| Avg. WFE CPU Util | 53.28% | 57.3% | 58.18% | 32.67% | 40.6% |
| Proc's | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) |
| Memory | 32 | 32 | 32 | 32 | 32 |
| Think Time | 0 | 0 | 0 | 0 | 0 |
Architectural Design Statistical Results
Testing Results – 350GB Content Databases

| RUN: | 350.300.501 | 350.300.501 | 350.300.501 | 350.300.501 | 350.300.501 | 350.300.501 |
|---|---|---|---|---|---|---|
| USERS: | 200 | 500 | 1000 | 2000 | 3000 | 5000 |
| Avg. Response Time (Sec) | 0.34 | 0.51 | 1.17 | 2.4 | 2.5 | 5.04 |
| Avg. First Byte Time (Sec) | 0.33 | 0.50 | 1.15 | 2.3 | 2.9 | 4.06 |
| Avg. Page Time (Sec) | 1.02 | 1.50 | 3.59 | 7.4 | 8.6 | 12.2 |
| Avg. Failed Requests | 123 | 282 | 258 | 261 | 371 | 520 |
| Avg. Reqs/Sec | 537 | 913 | 795 | 617 | 773 | 842 |
| Actual RPS * | 241 | 224 | 356 | 277 | 347 | 378 |
| Avg. WFE CPU Util | 42.73% | 71.4% | 65.88% | 46.55% | 59.05% | 72.77% |
| Proc's | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) |
| Memory | 32 | 32 | 32 | 32 | 32 | 32 |
| Think Time | 0 | 0 | 0 | 0 | 0 | 0 |
Architectural Design Statistical Results
Testing Results – 250GB Content Databases

| RUN: | 250.300.501 | 250.300.501 | 250.300.501 | 250.300.501 | 250.300.501 | 250.300.501 |
|---|---|---|---|---|---|---|
| USERS: | 200 | 500 | 1000 | 2000 | 3000 | 5000 |
| Avg. Response Time (Sec) | 0.33 | 0.60 | 1.3 | 2.58 | 3.9 | 6.4 |
| Avg. First Byte Time (Sec) | 0.33 | 0.59 | 1.2 | 2.17 | 3.2 | 5.0 |
| Avg. Page Time (Sec) | 1.0 | 1.81 | 3.7 | 6.6 | 9.8 | 14.6 |
| Avg. Failed Requests | 0 | 3 | 1 | 2 | 2 | 3 |
| Avg. Reqs/Sec | 554 | 764 | 769 | 714 | 652 | 648 |
| Actual RPS * | 248 | 343 | 345 | 320 | 292 | 291 |
| Avg. WFE CPU Util | 44.15% | 59.35% | 58.22% | 51% | 48.25% | 46.6% |
| Proc's | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) |
| Memory | 32 | 32 | 32 | 32 | 32 | 32 |
| Think Time | 0 | 0 | 0 | 0 | 0 | 0 |
Architectural Design Statistical Results
Testing Results – 150GB Content Databases

| RUN: | 150.300.501 | 150.300.501 | 150.300.501 | 150.300.501 | 150.300.501 | 150.300.501 |
|---|---|---|---|---|---|---|
| USERS: | 200 | 500 | 1000 | 2000 | 3000 | 5000 |
| Avg. Response Time (Sec) | 0.27 | 0.60 | 1.3 | 1.94 | 2.5 | 4.3 |
| Avg. First Byte Time (Sec) | 0.27 | 0.59 | 1.2 | 1.84 | 2.4 | 3.7 |
| Avg. Page Time (Sec) | 0.81 | 1.78 | 3.8 | 5.46 | 7.3 | 11.0 |
| Avg. Failed Requests | 507 | 1079 | 875 | 939 | 1238 | 1616 |
| Avg. Reqs/Sec | 663 | 927 | 737 | 789 | 706 | 767 |
| Actual RPS * | 297 | 416 | 330 | 354 | 317 | 344 |
| Avg. WFE CPU Util | 53.55% | 77.45% | 77.95% | 67.45% | 57.8% | 70.0% |
| Proc's | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) | 16(8x2) |
| Memory | 32 | 32 | 32 | 32 | 32 | 32 |
| Think Time | 0 | 0 | 0 | 0 | 0 | 0 |
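To compare the four test series side by side, the average response times can be keyed by content database size and user load. The following is a minimal Python sketch with the values transcribed from the test results above; the helper names `response_by_size` and `fastest_size` are hypothetical, and the comparison covers response time only (failed requests, which differ widely between series, are not considered).

```python
# Avg. response time (sec) by content database size (GB) and concurrent users,
# transcribed from the 300GB, 350GB, 250GB and 150GB test results.
AVG_RESPONSE_SEC = {
    300: {200: 0.27, 500: 0.68, 1000: 1.37, 2000: 2.8, 5000: 7.4},
    350: {200: 0.34, 500: 0.51, 1000: 1.17, 2000: 2.4, 3000: 2.5, 5000: 5.04},
    250: {200: 0.33, 500: 0.60, 1000: 1.3, 2000: 2.58, 3000: 3.9, 5000: 6.4},
    150: {200: 0.27, 500: 0.60, 1000: 1.3, 2000: 1.94, 3000: 2.5, 5000: 4.3},
}


def response_by_size(users: int) -> dict[int, float]:
    """Response time per database size for one user load (sizes without that run are skipped)."""
    return {size: runs[users] for size, runs in AVG_RESPONSE_SEC.items() if users in runs}


def fastest_size(users: int) -> int:
    """Database size (GB) with the lowest average response time at this user load."""
    by_size = response_by_size(users)
    return min(by_size, key=by_size.get)


for users in (1000, 2000, 5000):
    print(users, "users:", response_by_size(users), "-> fastest:", fastest_size(users), "GB")
```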
Real-world Scenarios
Data Load Statistics

Farm 1:

| | Total | DOCX | XLSX | PPTX | HTML | TXT | PDF |
|---|---|---|---|---|---|---|---|
| Count | 46,306,780 | 23,393,946 | 9,364,053 | 7,007,626 | 2,334,473 | 2,391,297 | 1,815,385 |
| PCT | | 50.26% | 20.11% | 15.07% | 4.95% | 5.08% | 3.46% |

Farm 2:

| | Total | DOCX | XLSX | PPTX | HTML | TXT | PDF |
|---|---|---|---|---|---|---|---|
| Count | 25,217,577 | 12,668,489 | 5,071,017 | 3,801,674 | 1,273,835 | 1,295,295 | 1,107,267 |
| PCT | | 50.24% | 20.11% | 15.08% | 5.05% | 5.13% | 4.39% |
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.