Paul Learning
Sr. Consultant, Microsoft Heartland District

Joel Oleson
Architect, Evangelist, Quest Software
@joeloleson
http://SharePointJoel.com

Audience Poll
• New to SharePoint?
• 1-3 Years Experience? (SharePoint)
• 4-8 Years Experience? (SharePoint)
• Large-scale Implementation (~5TB) experience?
• Scalability or performance issues in SharePoint deployments?
• How many SQL Admins are freaking out because of the number of SharePoint databases?

Session Overview
• How were these “considerations” derived?
• SQL Server 2008 with SharePoint
• SharePoint Database Overview (Demo)
• Architectural Design Considerations
• Real-world scenarios
• Business Requirements
• Logical and Physical Architecture
• Architectural Design Statistical Results
• Large-scale Case Study and Whitepaper
• Storage Architecture Whitepaper
• Appendix: DB Sizes, Content Distribution…

Considerations
Information based on real-world, large-scale SharePoint implementations.
• Large software company (Microsoft)
  • Intranet Portal for 120K users
  • Global Enterprise Collaboration Solution (~20TB)
  • Scalable Hosting Solution (SharePoint Online)
• Large automotive manufacturer
  • Loan Origination Application / Document Repository
  • ~50 Million content items (~6 TB)
• Large pharmaceutical company
  • Document Repository
  • ~75 Million content items (~40 TB)

SharePoint Containment Hierarchy
• Farm
• Servers: Web Front End, APP, SQL
• Web Applications: Central Admin, SSP Admin, Content
• Databases: Content, Config, SSP, Search
• Site Collections: Internet, Intranet Portal, Wikis, Blogs, Team, Doc, Mtg
• Sites: Wikis, Blogs, Team, Doc, Mtg
• Lists: Doc Lib, Pages, Events, Discussions, Surveys, etc…
• Items: Files, calendar items, contacts, customers, images, custom

SharePoint Databases Overview

Understanding SharePoint Databases
• Farm
  • Config: Servers, Web Apps, Solutions, Global Config
• Web App
  • Content 1..2: Site Collections, Sites, Lists, Pages, Documents, DWPs
• SSP
  • Search: Properties
  • SSP: My site host config, Profiles, BDC config, Excel Calc

Understanding Configuration DB
• Config Database: Sites, Servers, VServers

Understanding Content DB
• Content Database: Sites, Webs, Doc Stream

Understanding SSP DB - Search
• Search Database: Search Properties

Understanding SSP DB - SSP
• SSP Database: MySite Host Config, Profiles, BDC Config, Excel Calc

SharePoint Databases Overview
SharePoint and SQL Server

MOSS DB Overview
• Configuration Database
• Content Databases
• SSP Databases

SharePoint Database Performance

Database Disk I/O Demand
• Demand levels: Most Demand, Medium Demand, Low Demand
• Databases: Search, Config, *Content, Temp, +SSP, Model, Tlogs, Master
• * Except during backup and indexing
• + Except during profile import

Top Performance Killers
1. Indexing/Crawling
2. Backup (SQL & Tape)
3. Profile Import
4. Misc Timer Jobs - User Sync for large #s of Users
5. STSADM Backup/Restore
6. Large List Operations
7. Heavy User Operation List Import/Write

Architectural Design Considerations
Database Volumes
• Separate database volumes into unique LUNs consisting of unique physical disk spindles.
• Prioritize data among faster disks with ranking:
  1. SQL TempDB data files
  2. Database transaction log files
  3. Search database
  4. Content databases
• In a heavily read-oriented portal site, prioritize data over logs.
• Separate out the Search database transaction log from the content database transaction logs.

Architectural Design Considerations
SQL TempDB Data Files
• Recommended practice is that the number of data files allocated for TempDB should be equal to the number of CPU cores in the SQL Server.
• TempDB data file sizes should be consistent across all data files.
• TempDB data files should be spread across unique LUNs and separated from Content DB, Search DB, etc…
• TempDB log file separated to a unique LUN.
Architectural Design Considerations
SQL TempDB Data Files - Continued
• Optimal TempDB data file sizes can be calculated using the following formula:
  [MAX DB SIZE (KB)] X [.25] / [# CORES] = DATA FILE SIZE (KB)
• The calculation result (starting size), summed across all data files, should be roughly equal to 25% of the largest content or search DB.
• Use RAID 10; separate LUN from other database objects (content, search, etc…).
• “Autogrow” feature set to a fixed amount; if autogrow occurs, permanently increase TempDB size.

Architectural Design Considerations
Content Databases
• 100 content databases per Web application
• 100GB per content database
  • CAUTION: Major DB locking issues reported in collaborative DM scenarios above 100GB
  • Need to ensure that you understand the issues based on number of users, usage profiles, etc…
  • Service Level Agreement (SLA) requirements for backup and restore will also have an impact on this decision.
• Lab testing demonstrated that SharePoint performance was NOT impacted by utilizing larger DB sizes; tests included content DB sizes of 100GB, 150GB, 200GB, 250GB, 300GB and 350GB.
• See Appendix for test results!

Architectural Design Considerations
Content Databases - Continued
• Pre-construct and pre-size
  • Script generation of empty database objects
  • “Autogrow” feature on
• Use RAID 5 or RAID 10 logical units
  • RAID 10 is the best choice when cost is not a concern.
  • RAID 5 will be sufficient and will save on costs, since content databases tend to be more read intensive than write intensive.
• Multi-core computer running SQL Server
  • Primary file group could consist of a data file for each CPU core present in SQL Server.
  • Move each data file to separate logical units consisting of unique physical disk spindles.
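The TempDB sizing formula from the TempDB data file slides can be worked through in a few lines. This is only a sketch of the arithmetic: the function name, the 100 GB largest-database figure and the 8-core server are assumptions chosen for illustration, not values from the case studies.

```python
KB_PER_GB = 1024 * 1024


def tempdb_data_file_size_kb(max_db_size_kb: float, cores: int) -> float:
    """Apply [MAX DB SIZE (KB)] x 0.25 / [# CORES] from the slide."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return max_db_size_kb * 0.25 / cores


# Assumed example inputs: largest database is 100 GB, SQL Server has 8 cores.
cores = 8
largest_db_kb = 100 * KB_PER_GB
per_file_kb = tempdb_data_file_size_kb(largest_db_kb, cores)

print(f"{cores} data files, each {per_file_kb:,.0f} KB "
      f"({per_file_kb / KB_PER_GB:.3f} GB)")
print(f"total TempDB starting size: {cores * per_file_kb / KB_PER_GB:.1f} GB")
```

With these assumed inputs each of the 8 files starts at 3,276,800 KB (3.125 GB), so the files together hold 25 GB, one quarter of the largest database.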
Architectural Design Considerations
Search Database
• Pre-construct and pre-size
  • Script generation of empty database objects
  • “Autogrow” feature on
• Use RAID 10 logical units
  • Should be a requirement for large-scale systems
  • Search database is extremely read/write intensive
• Multi-core computer running SQL Server
  • Primary file group could consist of a data file for each CPU core present in SQL Server.
  • Move each data file to separate logical units consisting of unique physical disk spindles.

Architectural Design Considerations
Search Database
• Search database is VERY read/write intensive!
• Do not place any other database data files on any logical unit where search database files reside.
• If possible, try to ensure that the RAID 10 logical units for the search database data files do not share their physical spindles with other databases.
• Place the search database log files on an independent logical unit.

Architectural Design Considerations
Storage Architecture
• Database storage for content items will be between 1.2 and 1.5 times the raw file size.
• Use the following formula to calculate how much disk space is required for the search database:
  [GB DISK SPACE RQD] = [TOTAL CONTENT SIZE(GB)] X [FILE SIZE MODIFIER] X 4
• Where [FILE SIZE MODIFIER] is a number in the following range, based on the average size of the files in your corpus:
  • 1.0: Content consists of small files (avg=1KB)
  • 0.12: Content consists of moderate files (avg=10KB)
  • 0.05: Content consists of large files (avg>=100KB)

Architectural Design Considerations
Database Maintenance
• SQL Server 2005 SP2 is needed if using the DB maintenance wizard (KB930887).
• Physical Volume File Fragmentation:
  • Defragment your physical volumes on a regular schedule for increased performance!
  • LUNs need to be 20-50% larger than the data stored on them to allow for effective defragmentation of the data files.
• Performance Monitor Counters to watch:
  • Average Disk Queue Length
    • Single-digit values are optimal.
    • Occasional double-digit values aren’t a large concern.
    • Sustained triple-digit values require attention.

Architectural Design Considerations
Topology
• A single list should not have more than 2,000 items per list view.
  • A view or container represents the root of the list, as well as any folders within the list; a folder is a container because other list items are stored within it.
  • Whitepaper: Working with large lists in Office SharePoint Server 2007 (Steve Peschka) http://go.microsoft.com/fwlink/?LinkId=95450
• Disk Drive Speed
  • 15K RPM recommended.
• IIS Application Pools
  • Ensure “Max Used Memory” setting utilizes all the available RAM in your WFEs.

Architectural Design Considerations
Managed Paths
• The easiest way to segregate content into multiple site collections and unique databases is to use the “Define Managed Paths” and “CreateSiteInNewDB” features in MOSS.

Architectural Design Considerations
STSADM Command-line Tool and CreateSiteInNewDB Operation
• The Create Site Collection option in Central Administration does not allow creating and assigning a specific database to a site collection.
• However, the STSADM command-line tool can be used in conjunction with the CreateSiteInNewDB operation to provision site collections with their own content repositories.
• Enables scripting of the entire site collection process and running it as a batch file!

Architectural Design Considerations
STSADM Command-line Tool and CreateSiteInNewDB Operation - Continued
• The Office SharePoint Server 2007 STSADM command-line tool is located in the following directory on SharePoint farm servers:
  %COMMONPROGRAMFILES%\Microsoft shared\Web server extensions\12\Bin

Script Sample
STSADM -o CreateSiteInNewDB
  @SET STSADM="%COMMONPROGRAMFILES%\Microsoft shared\Web server extensions\12\Bin\stsadm"
  %STSADM% -o createsiteinnewdb ^
    -url http://farm.mpsc.int/divisions/div01 ^
    -owneremail [email protected] ^
    -ownerlogin mpsc\admin ^
    -lcid 1033 ^
    -title "Loan Originations - Division 01" ^
    -description "Division 01 Loan Originations Center." ^
    -databaseuser mpscadmin ^
    -databasepassword <password> ^
    -databaseserver sap_win ^
    -databasename WSS_Content_MPSC1_Div01

Creating Secondary Files Using SQL Filegroups

Creating Secondary Files

Issues and concerns using Filegroups:
• Backup and Restore
  • OOB Search restore is unaware of Filegroups.
  • Will place Filegroup onto same drive as when backup was run.
  • Ensure all machines that you restore to have a drive with the same drive letter.
• Future upgrades, Service Packs, Hot Fixes
  • Potential to modify an index moved into the Filegroup, or add a new index to one of the tables in the Filegroup.
• SQL 2005 and Greater
  • Script to move indexes uses a new feature in SQL 2005; cannot use on SQL 2000 or earlier.
• SQL Filegroups and Search: http://blogs.msdn.com/enterprisesearch/archive/2008/09/16/sql-file-groups-and-search.aspx

Real-World Huge Scenarios

Real-world Scenarios
Automotive Mfgr. Business Requirements (Phase I)
Loan Origination Application built on Office SharePoint Server 2007
1. Ability to house 10.5 million images.
2. System performance with a “normal” input load defined as receipt of 27,000 images per day. One day = 10 hours. This would include indexing of current document metadata for search availability within 1 hour of receipt.
3. Simulate user load to represent 200 users simultaneously accessing the system to perform the following tasks.
   • Use search to find elements of document metadata.
   • View a document (represents a scanned TIFF image).
   • Update elements of document metadata.
4. Combine items 2 and 3 above to function simultaneously.
5. Double the volume for each of the scenarios.

Real-world Scenarios
Data Load Process (Phase I)
• Partner (KnowledgeLake) developed a custom loader utility application that implements the KnowledgeLake Document Release Engine. This engine:
  • Is capable of releasing 9.17 documents/second per server into any combination of the SharePoint Document Library repositories.
  • Is capable of releasing documents of any type to SharePoint.
  • Is capable of setting metadata (Site Column) value during the upload process.
  • Employs a high-volume, storage-based folder architecture within SharePoint to ensure UI responsiveness.
• The loader utility was deployed and executed on 4 servers. Using this application, we were able to achieve:
  • An average document load throughput of 36.6 documents per second!
  • An average daily input of 3.17 million documents!
• During the intense load process of 10.5 million documents, the average SharePoint Web Front-End (WFE) utilization was only 28%!

Real-world Scenarios
Base Scale / Performance requirements (Phase II)
Collaboration Portal built on Office SharePoint Server 2007 (SP1 applied)
1. Ability to manage 50+ million content items. Mix of images from Phase I effort, with additional scanned images and Office documents consisting of Word, Excel, PowerPoint, as well as Adobe PDF documents.
2. Simulate user load to represent maximum number of users simultaneously accessing the system to perform the following tasks.
   • Use search to find elements of content (full-text) and content metadata.
   • View a document.
   • Update elements of document metadata.
   • Browse site structure.
   • Load additional content.
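The Phase I load figures quoted earlier (9.17 documents/second per server, 4 loader servers) can be cross-checked with simple arithmetic. The sketch below assumes a 24-hour load day and linear scaling across servers; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_rate(docs_per_second_per_server: float, servers: int) -> float:
    """Combined release rate, assuming the servers scale linearly."""
    return docs_per_second_per_server * servers


def docs_per_day(docs_per_second: float) -> float:
    """Daily volume if the rate is sustained around the clock."""
    return docs_per_second * SECONDS_PER_DAY


rate = aggregate_rate(9.17, 4)   # per-server rate and server count from the slide
daily = docs_per_day(rate)

print(f"{rate:.2f} docs/second across 4 servers")
print(f"{daily / 1e6:.2f} million docs/day")
```

Under those assumptions the result is 36.68 documents per second and about 3.17 million documents per day, in line with the 36.6 documents per second and 3.17 million per day reported on the slide.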
Real-world Scenarios
Data Load Process (Phase II)
• TIFF Load: 10 million documents (Phase I) + 25 million documents (Phase II) = 35 million documents total!
  • Four Web Front-Ends were used for the load process.
  • Peak Load Rate: 121.4 docs per second / 10.49 million documents per day.
  • Average Load Rate: ~5 million documents per day.
  • Load Time: 5 days.
• Office Load: 15 million documents consisting of Word (.docx), Excel (.xlsx), PowerPoint (.pptx) and Adobe PDF.
  • Five Web Front-Ends were used for the load process.
  • Peak Load Rate: 24.3 docs per second / 2.1 million documents per day.
  • Average Load Rate: ~1.9 million documents per day.
  • Load Time: 8 days.
• NOTE: Load rates were heavily suppressed in the Office load due to the automation process that created the PDF files.

What does the logical architecture look like?!

What does the physical architecture look like?!

Let's talk scalability…
• Scale OUT…
• Scale UP…

What does the site topology look like?!
• Phase I: 17 Divisional Site Collections / DBs
• Phase II: 10 Departmental Site Collections / DBs

What does the storage architecture look like?

Architectural Design Statistical Results
Phase I
• Designed Once / Built Once
  • No architecture OR configuration changes were required after the initial build was completed.
• 10.5+ million documents loaded into the system in approximately 60 hours!
• Full Crawl indexed documents and associated metadata in roughly 32 hours!
  • 10,623,012 items.
• Average content database size for divisional breakouts was 59.84GB!
• Average time for a scanned item to be included in the index (incremental crawl) was under 5 minutes!
• Total index size was 6.5GB!

Architectural Design Statistical Results
Phase II
• 40+ million documents loaded into the system in approximately 13 days!
  • Additional 25 million images.
  • 15 million Office documents (Word, Excel, PowerPoint and Adobe PDF).
  • Dynamic metadata injection upon load.
• Full crawl indexed content and associated metadata in approximately 35 days.
  • Indexer was a 4 processor machine with 8GB of RAM.
  • 50,396,079 items indexed!
  • Lesson Learned: Scale up the Index server hardware for crawling a large corpus!

Architectural Design Statistical Results
Phase II
• Search database size was 539GB.
  • Lesson Learned: Large search database caused disk I/O contention; break this out into multiple data file allocations matching the number of processor cores on SQL Server, and spread them over unique LUNs.
• Total Index size was 162GB!
• Average Content database size for Divisional breakouts was 200.65GB!
• Average Content database size for Departmental breakouts was 137.60GB!

Scalability of Architecture
• An additional phase was added in order to reduce the overall crawl time in the previous phase of the effort.
• Index server scaled up to improve crawl performance.
  • 8-processor machine hyper-threaded (16 cores) with 32GB RAM!
• Search DB was partitioned to use 15 secondary data files, for a total of 16 files, to match the number of processor cores (SQL).
• Search DB and secondary data files were spread across 16 unique LUNs to reduce (or eliminate) disk I/O contention problems experienced in the previous phase.
• Crawler Impact Rules defined and implemented to increase crawl threads.
  • Request 64 documents at a time.
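The step of splitting the search database into 15 secondary data files on separate LUNs can be scripted. The following is a minimal sketch that only prints T-SQL and does not connect to a server. `ALTER DATABASE ... ADD FILE` is standard T-SQL, but the drive letters, folder name, logical file names and the size and growth values are all hypothetical placeholders, and the database name is inferred from the .mdf file name in the appendix tables. It is not the script used in the case study.

```python
def secondary_file_statements(db_name, lun_paths, size_mb, growth_mb):
    """Build one ALTER DATABASE ... ADD FILE statement per LUN path."""
    statements = []
    for number, path in enumerate(lun_paths, start=1):
        logical_name = f"{db_name}_{number:02d}"  # hypothetical naming scheme
        statements.append(
            f"ALTER DATABASE [{db_name}] ADD FILE ("
            f"NAME = N'{logical_name}', "
            f"FILENAME = N'{path}\\{logical_name}.ndf', "
            f"SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB);"
        )
    return statements


# Hypothetical layout: 15 LUNs mounted as drives H: through V:.
luns = [f"{letter}:\\SearchData" for letter in "HIJKLMNOPQRSTUV"]
statements = secondary_file_statements(
    "MPSC_SharedServices_Search_DB", luns, size_mb=32768, growth_mb=1024
)

for statement in statements:
    print(statement)
```

Together with the existing primary data file this yields 16 files, one per core in the scaled-up configuration described above. Review the generated statements before running them against a real server.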
Real-world Scenarios
Pharmaceutical Business Requirements
Collaboration Portal built on Office SharePoint Server 2007
• Initial implementation required ~40TB of content storage.
• Content items consist of Word, Excel, PowerPoint, HTML, Text, and Adobe PDF documents.
• Identify performance characteristics and provide guidance around content database sizing to fulfill SLA requirements.
• FAST search integration.
  • How quickly can we serve search results?
• Microsoft System Center Data Protection Manager (DPM) 2007 integration.
  • How quickly can we back up and restore content databases based on size?

Real-world Scenarios
Data Load Process
• 71,524,357 documents loaded across two SharePoint Farms in 10.92 days!
• Content was spread across the farms into 165 unique content databases.
• 6,240 Site Collections, each containing 10 sub-sites for a total of 62,400 sites.
• Database sizes were pre-configured to vary in size from 100GB to 350GB to determine performance and/or SLA impacts.

What does the logical architecture look like?!

What does the physical architecture look like?!

What does the site topology look like?!
• 165 Content DBs
• 6,240 Site Collections
• 10 Sub-Sites in each collection: 62,400 Sites!

What does the storage architecture look like?

Architectural Design Statistical Results
Conclusion
User Loads
• Based on IIS log analysis, it was determined that the customer has an average concurrent user count of 119 users.
• EEC lab testing for the customer included significantly more load, using a ZERO THINK TIME in all stress tests completed.
• Stress tests included 2 - 3,000 concurrent users.
  • Based on the 10% rule, testing completed equated to an environment representing 300,000 users!
  • TWENTY-FIVE TIMES the average concurrent user load at customer!
• RAW number of RPS during peak times is 1,469 at customer.
  • During EEC lab testing with 25 times the concurrent user load and NO “think time”, we were able to obtain 773 RPS, which equates to 346.59 ACTUAL RPS!
Database Sizes
• System stress testing showed no performance degradation resulting from increased DB sizes.
• Backup and restore functionality was accomplished in under 25 minutes using either SQL 2005 or DPM 2007.
FAST Search Integration
• Successfully integrated FAST search capabilities, indexed content corpus and served search results as expected.

Joel's SQL Storage Resources
• Whitepapers, Blogs, Articles: http://www.sharepointjoel.com/Lists/Posts/Post.aspx?ID=114

SharePoint Deployment Planning Services (SDPS)
• Microsoft funded!
• Customers qualify based on their Software Assurance level
• 1, 3, 5, 10 and 15-Day Offerings
• Visit SDPS Partner Center: http://www.PartnerSDPS.com

Large-Scale Case Study Available
SharePoint Scalability and Performance Whitepaper
• Contains the majority of the content you will see here, along with test results you won’t see here.
• TechNet topic: http://go.microsoft.com/fwlink/?LinkId=120901
• Word 2007 format: http://go.microsoft.com/fwlink/?LinkId=120881
• Word 2000-2003 format: http://go.microsoft.com/fwlink/?LinkId=120890
• PDF format: http://go.microsoft.com/fwlink/?LinkId=120891

SharePoint Scalability Storage Architecture (KnowledgeLake)
• Prescriptive Guidance
  • Storage subsystem for highly-scalable implementations.
  • Techniques for monitoring scalability metrics.
  • Techniques for modifying existing storage architecture to eliminate poor I/O performance.
• PDF format: http://go.microsoft.com/fwlink/?LinkId=119399

Additional Resources
• Speaker Blog: http://www.SharePointJoel.com
• Twitter: @joeloleson
• External Resources:
  • http://blogs.msdn.com/joelo/archive/2007/07/09/capacity-planning-key-links-and-info.aspx
  • http://www.sharepointjoel.com/Lists/Posts/Post.aspx?ID=114
SQL Server 2008 with SharePoint
• Hardware and software requirements: http://msdn.microsoft.com/en-us/library/ms143506.aspx
• To support SQL 2008, Windows SharePoint Services 3.0 Service Pack 1 must be installed. http://www.microsoft.com/downloads/details.aspx?FamilyID=875da47e-89d5-4621-a319-a1f5bfedf497&DisplayLang=en
• Matrix of features available within each edition of SQL Server 2008: http://msdn.microsoft.com/en-us/library/cc645993.aspx

SQL Server 2008 with SharePoint
Manageability Enhancements
• Policy-based administration
• Back-up compression
• http://technet.microsoft.com/en-us/library/cc645579.aspx
High Availability Enhancements
• Data mirroring enhancements
• http://technet.microsoft.com/en-us/library/cc645581.aspx

SQL Server 2008 with SharePoint
Data Security Enhancements
• New encryption functions to enhance security
• Transparent Data Encryption (TDE) does not require SharePoint configuration changes.
• http://technet.microsoft.com/en-us/library/cc645578.aspx
Scalability and Performance Enhancements
• Filtered Indexes and Statistics
• Query Performance and Processing
• http://technet.microsoft.com/en-us/library/cc645580.aspx

SQL Server 2008 with Windows Server 2008
• Transactional Replication with SQL Server 2008
  • Dramatically outperformed SQL 2005 on Win 2003.
  • Most substantial gains with Publisher/Subscriber model both running on SQL 2008 and Win 2008
• Performance improvements
• Increased resiliency
• System failover benefits
• See Geo-Replication Performance Gains at http://msdn.microsoft.com/en-us/library/dd263442.aspx

Architectural Design Considerations
Search Availability
• Content loading and indexing required concurrently
  • Separate search database log files and transaction log files to independent logical units.
  • Prevent the disk array with the transaction logs from becoming a bottleneck.
  • Help ensure content loading and indexing can occur at the same time without contention.
• The recommended practice for separating the database volume types for the transaction log files to unique LUNs follows.
  • Content Database Log Files.
  • Search Database Log Files.

Architectural Design Considerations
Index Server RAM
• At a minimum, the amount of RAM on the Index server should be greater than or equal to about one-third the size of the index.
Crawling, Indexing, and Computing Ranking
• “Computing Ranking” in Office SharePoint 2007 refers to the process of computing the relevance of the items that have been crawled.
• In a nutshell, this state indicates that the search system is processing anchors that it discovered during the crawl and ensuring that the relevance ranking for the item is current.

Architectural Design Considerations
Crawler Performance
• Crawling is an extremely disk intensive operation on both the computer running SQL Server (read/write) and the disk on which the index catalog is stored.
• Things to consider for increasing disk speed and crawler performance:
  • Check the Current Disk Queue Length on the disks to see if the queue depth is too high.
  • The Physical Disk/Current Disk Queue Length counter should be less than the number of spindles in the disk array plus two.
  • It’s also possible that disk fragmentation is occurring as the database grows (DEFRAG!).

Architectural Design Considerations
Crawler Performance - Continued
• A simple way to know if SQL Server performance is impacting the indexer is to look at two counters on the index server:
  \Office Server Search Archival Plugin(Portal_Content)\Total docs in first queue
  \Office Server Search Archival Plugin(Portal_Content)\Total docs in second queue
• If BOTH of these counters are at 500 for more than 15 seconds, SQL Server is the bottleneck.
• Increase the “performance level” setting of the crawler.
  • This increases the priority at which the service runs at the Windows operating system level.

Architectural Design Considerations
Crawler Performance - Continued
• You can increase the number of threads used to crawl content by defining Crawler Impact Rules in Central Administration.
• For content-intensive indexing with quick search availability, recommend the LARGEST possible computer (processor, RAM) for the index server role.
• With enough power on the index machine (and a capable SQL back-end), you could create a Crawler Impact Rule to “Request 64 documents at a time”.
• CAUTION: Closely monitor the performance impacts of these settings for your specific environment prior to implementation!

Architectural Design Considerations
Index Server Performance
• To reduce traffic on your Web Front End (WFE) servers as well as network hops required, make your Index Server your Target Server.
  • Dedicated Index with Target WFE: Index > WFE > SQL
  • Dedicated Index AS Target WFE: Index/WFE > SQL
• At a minimum, use a dedicated Target Server that is NOT part of the load-balanced front-end.
  • Reduce traffic on your WFEs during the indexing process.

Database Sizes Phase I
[Chart: size of each divisional content database, Div01 through Div17; axis 0.00 to 80.00]

Database Sizes Phase II - Divisional Content
[Chart: Divisional DB Sizes, WSS_Content_MPSC1_Div01.mdf through WSS_Content_MPSC1_Div17.mdf; axis 0.00 to 250.00]

Database Sizes Phase II - Departmental Content
[Chart: WSS_Content_MPSC2_Dpt01.mdf through WSS_Content_MPSC2_Dpt10.mdf; axis 136.40 to 138.00]
Departmental DB Sizes

Content Spread Phase II - Departmental Content
[Chart: number of docx, pptx, xlsx and pdf items in departmental databases 1 through 10; axis 0 to 900,000]

Database Sizes MPSC Phase I
Type                DB Name                            Volume            Size (GB)
Search              MPSC_SharedServices_Search_DB.mdf  SearchDb_Vol(G:)  63.40
Divisional Content  WSS_Content_MPSC1_Div01.mdf        Content1_Vol(H:)  57.00
Divisional Content  WSS_Content_MPSC1_Div02.mdf        Content2_Vol(I:)  60.70
Divisional Content  WSS_Content_MPSC1_Div03.mdf        Content3_Vol(J:)  72.00
Divisional Content  WSS_Content_MPSC1_Div04.mdf        Content4_Vol(K:)  60.10
Divisional Content  WSS_Content_MPSC1_Div05.mdf        Content5_Vol(L:)  23.90
Divisional Content  WSS_Content_MPSC1_Div06.mdf        Content6_Vol(M:)  60.60
Divisional Content  WSS_Content_MPSC1_Div07.mdf        Content7_Vol(N:)  72.70
Divisional Content  WSS_Content_MPSC1_Div08.mdf        Content8_Vol(O:)  69.80
Divisional Content  WSS_Content_MPSC1_Div09.mdf        Content1_Vol(H:)  35.50
Divisional Content  WSS_Content_MPSC1_Div10.mdf        Content2_Vol(I:)  65.60
Divisional Content  WSS_Content_MPSC1_Div11.mdf        Content3_Vol(J:)  61.50
Divisional Content  WSS_Content_MPSC1_Div12.mdf        Content4_Vol(K:)  65.60
Divisional Content  WSS_Content_MPSC1_Div13.mdf        Content5_Vol(L:)  77.70
Divisional Content  WSS_Content_MPSC1_Div14.mdf        Content6_Vol(M:)  64.90
Divisional Content  WSS_Content_MPSC1_Div15.mdf        Content7_Vol(N:)  25.90
Divisional Content  WSS_Content_MPSC1_Div16.mdf        Content8_Vol(O:)  65.50
Divisional Content  WSS_Content_MPSC1_Div17.mdf        Content1_Vol(H:)  78.30
TOTAL STORAGE SIZE:                                                      1,080.70
TOTAL DIVISIONAL CONTENT DB SIZE:                                        1,017.30
AVERAGE DIVISIONAL CONTENT DB SIZE:                                      59.84

Database Sizes MPSC Phase II
DB Type             DB Name                            Volume            Size (GB)  # Items (TIFF)  # Folders (TIFF)
Search              MPSC_SharedServices_Search_DB.mdf  SearchDb_Vol(G:)  539.00
Divisional Content  WSS_Content_MPSC1_Div01.mdf        Content1_Vol(H:)  191.00     2,256,639       13,250
Divisional Content  WSS_Content_MPSC1_Div02.mdf        Content2_Vol(I:)  198.00     2,206,762       13,244
Divisional Content  WSS_Content_MPSC1_Div03.mdf        Content3_Vol(J:)  201.00     1,877,409       12,295
Divisional Content  WSS_Content_MPSC1_Div04.mdf        Content4_Vol(K:)  197.00     2,205,250       13,243
Divisional Content  WSS_Content_MPSC1_Div05.mdf        Content5_Vol(L:)  201.00     1,847,333       13,086
Divisional Content  WSS_Content_MPSC1_Div06.mdf        Content6_Vol(M:)  199.00     2,205,712       13,241
Divisional Content  WSS_Content_MPSC1_Div07.mdf        Content7_Vol(N:)  199.00     1,851,289       12,097
Divisional Content  WSS_Content_MPSC1_Div08.mdf        Content8_Vol(O:)  201.00     1,938,231       12,642
Divisional Content  WSS_Content_MPSC1_Div09.mdf        Content1_Vol(H:)  188.00     1,787,785       13,216
Divisional Content  WSS_Content_MPSC1_Div10.mdf        Content2_Vol(I:)  199.00     2,179,837       11,828
Divisional Content  WSS_Content_MPSC1_Div11.mdf        Content3_Vol(J:)  200.00     2,293,663       12,015
Divisional Content  WSS_Content_MPSC1_Div12.mdf        Content4_Vol(K:)  203.00     2,216,798       12,016
Divisional Content  WSS_Content_MPSC1_Div13.mdf        Content5_Vol(L:)  200.00     1,858,731       11,254
Divisional Content  WSS_Content_MPSC1_Div14.mdf        Content6_Vol(M:)  202.00     2,224,727       12,016
Divisional Content  WSS_Content_MPSC1_Div15.mdf        Content7_Vol(N:)  227.00     1,974,025       12,491
Divisional Content  WSS_Content_MPSC1_Div16.mdf        Content8_Vol(O:)  203.00     2,214,368       12,016
Divisional Content  WSS_Content_MPSC1_Div17.mdf        Content1_Vol(H:)  202.00     1,867,572       11,446
TOTAL DIVISIONAL CONTENT:                                                3,411.00   35,006,131      211,396
AVERAGE DIVISIONAL CONTENT:                                              200.65     2,059,184       12,435

DB Type               DB Name                      Volume            Size (GB)  # Items (Office/PDF)  # Folders (Office/PDF)  docx       pptx     xlsx       pdf
Departmental Content  WSS_Content_MPSC2_Dpt01.mdf  Content1_Vol(H:)  137.00     1,504,884             11,743                  868,033    57,255   521,996    57,600
Departmental Content  WSS_Content_MPSC2_Dpt02.mdf  Content2_Vol(I:)  138.00     1,507,553             11,600                  869,546    57,616   522,878    57,513
Departmental Content  WSS_Content_MPSC2_Dpt03.mdf  Content3_Vol(J:)  138.00     1,506,906             11,601                  868,779    57,804   522,467    57,856
Departmental Content  WSS_Content_MPSC2_Dpt04.mdf  Content4_Vol(K:)  137.00     1,505,731             11,601                  868,455    57,536   522,215    57,525
Departmental Content  WSS_Content_MPSC2_Dpt05.mdf  Content5_Vol(L:)  138.00     1,505,740             11,601                  869,784    57,468   520,818    57,670
Departmental Content  WSS_Content_MPSC2_Dpt06.mdf  Content6_Vol(M:)  138.00     1,507,524             11,601                  869,765    57,765   522,241    57,753
Departmental Content  WSS_Content_MPSC2_Dpt07.mdf  Content7_Vol(N:)  138.00     1,507,108             11,601                  870,284    57,362   521,590    57,872
Departmental Content  WSS_Content_MPSC2_Dpt08.mdf  Content8_Vol(O:)  137.00     1,503,853             11,601                  867,015    57,318   521,748    57,772
Departmental Content  WSS_Content_MPSC2_Dpt09.mdf  Content5_Vol(L:)  137.00     1,505,234             11,601                  868,209    57,518   521,934    57,573
Departmental Content  WSS_Content_MPSC2_Dpt10.mdf  Content7_Vol(N:)  138.00     1,508,146             11,599                  870,315    57,571   522,393    57,867
TOTAL DEPARTMENTAL CONTENT:                                          1,376.00   15,062,679            116,149                 8,690,185  575,213  5,220,280  577,001
AVERAGE DEPARTMENTAL CONTENT:                                        137.60     1,506,268             11,615                  869,019    57,521   522,028    57,700

                              Size (GB)  # Items     # Folders
GRAND TOTAL CONTENT:          5,326.00   50,068,810  327,545
GRAND TOTAL AVERAGE CONTENT:  197.26     1,854,400   12,131

Performance of Components Over Time
Load Averages
Run  Search Load     Modify Load  Insert Load  View Image  Load Test  Crawl Cycle  WFE CPU  Index CPU  Target CPU  SQL CPU  TempDb Q  TranLog Q  SearchDb Q  ContentDb Q  Page Load Time  Image Load Time
#    (Cncnt. Users)  (%)          (Items/Min)  (Count)     (Minutes)  (Minutes)    (%)      (%)        (%)         (%)      (Length)  (Length)   (Length)    (Length)     (Seconds)       (Seconds)
1*   0               0            0            0           20         15           6.325    0.700      0.330       3.900    0.000     0.001      0.000       0.000        N/A             N/A
2    200             0            0            0           20         15           19.350   0.700      0.320       6.500    0.000     0.001      0.000       0.000        0.042           N/A
3    400             0            0            0           20         15           36.700   0.520      0.320       8.700    0.000     0.001      0.000       0.000        0.059           N/A
4*   0               0            0            0           20         5            13.950   0.690      0.310       8.100    0.000     0.002      0.000       0.000        N/A             N/A
5    200             0            0            0           20         5            27.025   0.700      0.300       11.100   0.000     0.002      0.000       0.000        0.045           N/A
6    200             35           0            0           20         5            32.825   0.640      0.470       13.700   0.000     0.005      0.001       0.001        0.080           N/A
7    200             34           38.17        0           20         5            32.325   1.400      1.100       14.000   0.010     0.009      0.020       0.006        0.074           N/A
8    200             32           38.19        13456       20         5            28.750   1.500      1.100       13.300   0.000     0.008      0.020       0.005        0.064           0.030
9    400             0            0            0           20         5            42.900   0.570      0.310       13.600   0.000     0.020      0.000       0.000        0.063           N/A
10   400             34           0            0           20         5            54.6     0.62       0.49        19.6     0.001     0.008      0           0            0.12            N/A
11   400             36           78.29        0           20         5            51.125   2.100      1.400       18.700   0.020     0.015      0.070       0.010        0.110           N/A
12   400             35           79.33        26224       20         5            44.750   2.200      1.600       17.700   0.001     0.014      0.070       0.010        0.090           0.030
13   0               0            37.95        0           20         5            13.225   1.400      1.000       8.200    0.000     0.005      0.060       0.010        N/A             N/A
14   0               0            78.1         0           20         5            13.375   2.000      1.300       8.500    0.000     0.009      0.150       0.020        N/A             N/A
* Denotes Baseline Load Test

How do we pull all this together?!

Pharmaceutical Content Database Distribution
• Substitute “F1” with SQL Server number to generate unique DBs
• Farm 1: 2 SQL
• Farm 2: 1 SQL
• 165 Content Databases!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 Content Database Name CDB_F1_350_1 CDB_F1_350_2 CDB_F1_350_3 CDB_F1_350_4 CDB_F1_350_5 CDB_F1_350_6 CDB_F1_350_7 CDB_F1_300_1 CDB_F1_300_2 CDB_F1_300_3 CDB_F1_300_4 CDB_F1_300_5 CDB_F1_300_6 CDB_F1_300_7 CDB_F1_300_8 CDB_F1_250_1 CDB_F1_250_2 CDB_F1_250_3 CDB_F1_250_4 CDB_F1_250_5 CDB_F1_250_6 CDB_F1_250_7 CDB_F1_250_8 CDB_F1_250_9 CDB_F1_250_10 CDB_F1_250_11 CDB_F1_250_12 CDB_F1_250_13 CDB_F1_250_14 CDB_F1_250_15 CDB_F1_250_16 CDB_F1_250_17 CDB_F1_250_18 CDB_F1_200_1 CDB_F1_200_2 CDB_F1_200_3 CDB_F1_200_4 CDB_F1_200_5 CDB_F1_200_6 CDB_F1_200_7 CDB_F1_200_8 CDB_F1_200_9 CDB_F1_200_10 CDB_F1_200_11 CDB_F1_200_12 CDB_F1_200_13 CDB_F1_200_14 CDB_F1_200_15 CDB_F1_200_16 CDB_F1_200_17 CDB_F1_150_1 CDB_F1_150_2 CDB_F1_150_3 CDB_F1_100_1 CDB_F1_100_2 Database Size(TB) 0.35 0.35 0.35 0.35 0.35 0.35 0.35 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.15 0.15 0.15 0.1 0.1 # of Site Collections Site Collection Names 30 60 60 60 60 60 60 30 60 60 60 60 60 60 60 30 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 30 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 30 60 60 30 60 Site_F1_350_1_1 thru 30 Site_F1_350_2_1 thru 60 Site_F1_350_3_1 thru 60 Site_F1_350_4_1 thru 60 Site_F1_350_5_1 thru 60 Site_F1_350_6_1 thru 60 Site_F1_350_7_1 thru 60 Site_F1_300_1_1 thru 30 Site_F1_300_2_1 thru 60 Site_F1_300_3_1 thru 60 Site_F1_300_4_1 thru 60 Site_F1_300_5_1 thru 60 Site_F1_300_6_1 thru 60 Site_F1_300_7_1 thru 60 Site_F1_300_8_1 thru 60 Site_F1_250_1_1 thru 30 Site_F1_250_2_1 thru 60 Site_F1_250_3_1 thru 60 Site_F1_250_4_1 thru 60 Site_F1_250_5_1 thru 60 Site_F1_250_6_1 thru 60 Site_F1_250_7_1 thru 60 Site_F1_250_8_1 thru 60 Site_F1_250_9_1 thru 60 Site_F1_250_10_1 thru 60 
Site_F1_250_11_1 thru 60 Site_F1_250_12_1 thru 60 Site_F1_250_13_1 thru 60 Site_F1_250_14_1 thru 60 Site_F1_250_15_1 thru 60 Site_F1_250_16_1 thru 60 Site_F1_250_17_1 thru 60 Site_F1_250_18_1 thru 60 Site_F1_200_1_1 thru 30 Site_F1_200_2_1 thru 60 Site_F1_200_3_1 thru 60 Site_F1_200_4_1 thru 60 Site_F1_200_5_1 thru 60 Site_F1_200_6_1 thru 60 Site_F1_200_7_1 thru 60 Site_F1_200_8_1 thru 60 Site_F1_200_9_1 thru 60 Site_F1_200_10_1 thru 60 Site_F1_200_11_1 thru 60 Site_F1_200_12_1 thru 60 Site_F1_200_13_1 thru 60 Site_F1_200_14_1 thru 60 Site_F1_200_15_1 thru 60 Site_F1_200_16_1 thru 60 Site_F1_200_17_1 thru 60 Site_F1_150_1_1 Site_F1_150_1_1 thru 60 Site_F1_150_2_1 thru 60 Site_F1_100_1_1 Site_F1_100_2_1 thru 60 Subsites under each site collection 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 How do we pull all this together?! Pharmaceutical Data Load Statistics Farm 1 Farm 2 SQL 1-1 Total Docs SQL 1-1 Total TB 23,196,327.00 13.16 TB SQL 1-1 Total Docs SQL 1-1 Total TB SQL 1-2 Total Docs SQL 1-2 Total TB Total TB 23,110,478.00 13.12 TB 26.29 TB Total TB Type DOCX XLSX PPTX HTML TXT PDF Total Docs Total Load Time Count % 23,393,946.00 50.52% 9,364,053.00 20.22% 7,007,626.00 15.13% 2,334,473.00 5.04% 2,391,297.00 5.16% 1,815,385.00 3.92% 46,306,780.00 100% 10.92 Days Type DOCX XLSX PPTX HTML TXT PDF Total Docs Total Load Time 25,217,577.00 13.10 TB 13.10 TB Count 12,668,489.00 5,071,017.00 3,801,674.00 1,273,835.00 1,295,295.00 1,107,267.00 25,217,577.00 % 50.24% 20.11% 15.08% 5.05% 5.14% 4.39% 100% 7.96 Days Architectural Design Statistical Results Testing Results – 300GB Content Databases RUN: USERS: Avg. Response Time (Sec) Avg. First Byte Time (Sec) Avg. Page Time (Sec) Avg. Failed Requests Avg. Reqs/Sec Actual RPS * Avg. 
WFE CPU Util Proc's Memory Think Time ALL.300.501 ALL.300.501 ALL.300.501 ALL.300.501 ALL.300.501 200 500 1000 2000 5000 0.27 0.27 0.84 9 665 298 53.28% 16(8x2) 32 0 0.68 0.65 2.0 16 787 352 57.3% 16(8x2) 32 0 1.37 1.26 3.60 12 779 349 58.18% 16(8x2) 32 0 2.8 2.5 7.5 17 493 221 32.67% 16(8x2) 32 0 7.4 5.7 16.7 22 573 256 40.6% 16(8x2) 32 0 Architectural Design Statistical Results Testing Results – 350GB Content Databases 350.300.5 01 350.300.5 01 350.300.5 01 350.300.5 01 350.300.5 01 350.300.5 01 200 500 1000 2000 3000 5000 0.34 0.51 1.17 2.4 2.5 5.04 0.33 .50 1.15 2.3 2.9 4.06 Avg. Page Time (Sec) 1.02 1.50 3.59 7.4 8.6 12.2 Avg. Failed Requests 123 282 258 261 371 520 Avg. Reqs/Sec 537 913 795 617 773 842 Actual RPS * 241 224 356 277 347 378 Avg. WFE CPU Util 42.73% 71.4% 65.88% 46.55% 59.05% 72.77% Proc's 16(8x2) 16(8x2) 16(8x2) 16(8x2) 16(8x2) 16(8x2) 32 32 32 32 32 32 0 0 0 0 0 0 RUN: USERS=: Avg. Response Time (Sec) Avg. First Byte Time (Sec) Memory Think Time Architectural Design Statistical Results Testing Results – 250GB Content Databases RUN: USERS: 250.300.50 250.300.50 250.300.50 250.300.50 250.300.50 250.300.50 1 1 1 1 1 1 200 500 1000 2000 3000 5000 0.33 0.60 1.3 2.58 3.9 6.4 0.33 0.59 1.2 2.17 3.2 5.0 Avg. Page Time (Sec) 1.0 1.81 3.7 6.6 9.8 14.6 Avg. Failed Requests 0 3 1 2 2 3 Avg. Reqs/Sec 554 764 769 714 652 648 Actual RPS * 248 343 345 320 292 291 Avg. WFE CPU Util 44.15% 59.35% 58.22% 51% 48.25% 46.6% Proc's 16(8x2) 16(8x2) 16(8x2) 16(8x2) 16(8x2) 16(8x2) 32 32 32 32 32 32 0 0 0 0 0 0 Avg. Response Time (Sec) Avg. First Byte Time (Sec) Memory Think Time Architectural Design Statistical Results Testing Results – 150GB Content Databases RUN: USERS=: Avg. Response Time (Sec) Avg. First Byte Time (Sec) Avg. Page Time (Sec) Avg. Failed Requests Avg. Reqs/Sec Acutal RPS * Avg. 
WFE CPU Util Proc's Memory Think Time 150.300.50 150.300.50 150.300.50 150.300.50 150.300.50 150.300.50 1 1 1 1 1 1 200 500 1000 2000 3000 5000 0.27 0.60 1.3 1.94 2.5 4.3 0.27 0.81 507 663 297 53.55% 16(8x2) 32 0 0.59 1.78 1079 927 416 77.45% 16(8x2) 32 0 1.2 3.8 875 737 330 77.95% 16(8x2) 32 0 1.84 5.46 939 789 354 67.45% 16(8x2) 32 0 2.4 7.3 1238 706 317 57.8% 16(8x2) 32 0 3.7 11.0 1616 767 344 70.0% 16(8x2) 32 0 Real-world Scenarios Farm 1 Data Load Statistics DOCX XLSX Farm 1: PPTX HTML TXT PDF Total DOCX XLSX PPTX HTML TXT PDF 46,306,780 23,393,946 9,364,053 7,007,626 2,334,473 2,391,297 1,815,385 DOCX_PCT XLSX_PCT PPTX_PCT HTML_PCT TXT_PCT PDF_PCT 50.26% 20.11% 15.07% 4.95% 5.08% 3.46% Farm 2 DOCX XLSX Farm 2: PPTX HTML TXT PDF Total DOCX XLSX PPTX HTML TXT PDF 25,217,577 12,668,489 5,071,017 3,801,674 1,273,835 1,295,295 1,107,267 DOCX_PCT XLSX_PCT PPTX_PCT HTML_PCT TXT_PCT PDF_PCT 50.24% 20.11% 15.08% 5.05% 5.13% 4.39% © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.