
Managing your Blackboard®
System for Growth and
Performance
Presented By Steve Feldman
April 13, 2005
Welcome
• Session Objectives:
– Introduction to Capacity Planning
– Introduction to Performance Management
– Handling Performance and Capacity Issues
– Introduction to Load Testing
• Innovation
– Methodology for Resolving Issues
• Results/Outcomes
– Awareness of what you are doing well or not doing
at all.
Introduction: About Your Presenter
• What do I do at Blackboard?
– Director, Software Performance Engineering and
Architecture
– Part of Product Development, but interface with every
department in Blackboard.
– Manage the Software Performance Engineering (SPE)
Process as part of the development lifecycle.
• A few key points…
– Been at Blackboard since the Fall of 2003.
– Worked on AP2, AP3 and R7.0
– Manage a team of several developer/engineers.
– Practicing Member of CMG
Performance Maturity Model:
Where do you fit in?
Level 1: Reactive Fire Fighting
Level 2: Monitoring and Instrumenting
Level 3: Performance Optimizing
Level 4: Business Optimizing
Level 5: Process Optimizing
Michael Maddox, MCI; A Performance Process Maturity Model
Invest Your Time Understanding
Performance and Capacity
• Set Performance Objectives from the
Start
• Optimize Your Environment from the
Start.
Set Performance and Capacity
Objectives from the Start
• It’s Never too late to define a performance or capacity
objective.
– Come as the result of a problem or issue
– Solving a maintenance window or schedule
– Planning for an upgrade
– Planning for a rollout to new users
– New Blackboard Building Blocks, Features or Integration
• Define Clear and Concise Objectives
– Measurable/Quantifiable and Achievable
– Differentiate between Performance and Capacity
• Processing Time versus Workload
• Growth versus Adoption
• Resource Utilization and Maintenance
Optimize Your Environment from
the Start
• Blackboard environments moving from supported to
mission critical (Application Management Maturity
Model)
• Dedicate equipment and even network bandwidth.
• Understand the working parts
– Acquire knowledge about the integrated subsystems.
– Don’t need to be a web, app or db guru, but know
enough to:
• Manage and Maintain Independently
• Research Knowledge Gaps
• Solve Common Issues without Help
Optimize Your Environment from
the Start
• Optimize Environment from the Start based on
Knowledge of Sub-Systems
• Monitor and Instrument Regularly
• Talk to Your Users about their Experience.
• Investigate Yourself
• Finding the Right Configuration takes time:
– Make 1 Change at a Time
– Make the Change Based on Empirical Information (Not
Hunches…)
– Maintain a Consistent Configuration for 1 period of time
(month, semester or a grading period)
Introduction to Capacity Planning
Capacity Planning: Building an Ideal
Blackboard Environment
• What is Capacity Planning?
• Capacity Planning Factors
– Determine an Initial Deployment Architecture
– Handling Adoption and Growth
– Archiving Data
– Backups and Restoration
– Maintenance Windows and Tasks
– Integrating with External Systems
– Redundancy and Failover
– Business Processes
– Upgrades
– Rolling out New Features
• Capacity Planning Tools
Capacity Planning Factors: Determine
an Initial Deployment Architecture
• It’s Never Too Late to Consider or Reconsider
Your Deployment Architecture.
• Try to Understand Key Components
– Eventual Audience Rollout
– User Behavior
• Session Patterns
• Frequency
• Concurrency
– Data Management Strategy
– Resource Needs
• Processing
• Storage
Capacity Planning Factors: Handling
Adoption and Growth
• Work with Functional Leaders to Understand
Deployment Strategy
– Adoption Patterns of Users and Features
• Study Growth
– Not just users and courses, but data and content.
– Instrument daily, weekly, monthly, yearly, etc.
• Study the Activity Patterns of your Users
(Behavior Modeling)
– Session Times
– Where they go and what they do…
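Session patterns like these can be mined from ordinary web server access logs. A minimal sketch, assuming a hypothetical tab-separated log of timestamp and session id (the format is illustrative; adapt the parsing to your actual Apache/IIS log layout):

```python
from collections import Counter
from datetime import datetime

# Hypothetical access-log lines: "timestamp<TAB>session_id".
SAMPLE_LOG = """\
2005-04-13T09:05:12\tsess-a
2005-04-13T09:17:40\tsess-a
2005-04-13T09:22:03\tsess-b
2005-04-13T10:01:55\tsess-c
"""

def distinct_sessions_per_hour(log_text):
    """Count distinct session ids seen in each clock hour."""
    buckets = {}
    for line in log_text.strip().splitlines():
        stamp, session_id = line.split("\t")
        hour = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        buckets.setdefault(hour, set()).add(session_id)
    return {hour: len(ids) for hour, ids in buckets.items()}

print(distinct_sessions_per_hour(SAMPLE_LOG))
# {'2005-04-13 09:00': 2, '2005-04-13 10:00': 1}
```

Running the same count daily, weekly and per semester gives the growth and adoption trend the slide recommends instrumenting.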
Capacity Planning Factors: Archiving
Data
• A Lot of Data Can be Viewed as Disposable by
Many and Priceless by a Few
• Define a Strategy Early On About Archiving
Data.
– Enable Tracking and Study Last Modified
– Use BB Tools to Archive and Export
– Remove from the System
• Maintain Activity Accumulator Data
– Export
– Purge Regularly
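The export-then-purge routine can be scripted. The sketch below uses an in-memory SQLite table as a stand-in; the table and column names are illustrative, not Blackboard's actual activity accumulator schema, so check your own database before adapting it:

```python
import csv
import io
import sqlite3

# Stand-in table; the real activity accumulator schema will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity_accumulator (event_time TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO activity_accumulator VALUES (?, ?)",
    [("2004-09-01", "login"), ("2005-04-01", "course_access")],
)

def export_then_purge(conn, cutoff, out):
    """Write rows older than the cutoff to CSV, then delete them."""
    rows = conn.execute(
        "SELECT event_time, event FROM activity_accumulator WHERE event_time < ?",
        (cutoff,),
    ).fetchall()
    csv.writer(out).writerows(rows)
    conn.execute("DELETE FROM activity_accumulator WHERE event_time < ?", (cutoff,))
    conn.commit()
    return len(rows)

archive = io.StringIO()
purged = export_then_purge(conn, "2005-01-01", archive)
print(purged, "row(s) archived and removed")
```

The key point is the ordering: the export happens and is verified before the delete, so the purge never destroys the only copy.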
Capacity Planning Factors: Backups
and Restoration
• Database Backups
– Differential versus Full
• Depends on Size, Confidence in Process and Usage
• Plan for the Unexpected
• Restore on Development Environments Routinely
• Store in a Safe Place
• Practice During Maintenance Windows
• File System Backups
– Perform Regularly
– Just as Valuable as database back-ups
– Not just data, but configuration
Capacity Planning Factors:
Maintenance Windows and Tasks
• Keep Your Users Informed
– Downtime/Outages
– Periods where Performance Can be Affected
• Schedule Regularly
– Log Rotations
– Server Restarts
– Database Statistics, Index Rebuilds and Extent
Management
– Data Fragmentation
– Archiving and Purging Data
– Service Packs and Upgrades (discussed later)
Capacity Planning Factors: Integrating
with External Systems
• Understand the integration
– What data is affected
• Inbound versus Outbound
– Frequency of Integration
• Real-time versus Batched/Scheduled
• Hopefully without manual intervention
• Performance of both systems should not be
affected based on integration
Capacity Planning Factors: Failover
and Redundancy
• Have a Plan
• Make a Budget
– If no budget, communicate plan and downtime
• Practice for the Unexpected
• Be Realistic
• Built-In Capabilities for Redundancy and Failover
– Blackboard Load-Balancing
– SQL-Server Clustering and Oracle RAC
• Quality of Service Models
– Tomcat Clusters
Capacity Planning Factors: Business
Processes
• Define Schedule with Functional and Technical
Leaders
– Schedule for an extended period of time
– Map out window based on need and usage
– Model and Prototype
• Make Sure the Window is Large Enough
• Business processes should make sense and be
realistic
• Schedule During Periods of Low Usage and Non-Peak Times
• Make it Repeatable, Automated and Easy to Debug
Capacity Planning Factors: Planning
for Upgrades
• Updating Versions of Blackboard
– Take Advantage of New Features
– Functional Patches
– Performance Same or Optimized
• Performance Requirement for Every Development Release
• Updating Platform Technology
– Platform Patches
– Operating System Upgrades
• Plan for Downtime (Data Restoration)
• Updating Hardware Architecture
– Plan for Downtime (Data Restoration)
– Take Advantage of Faster, Cheaper Equipment
Capacity Planning Factors: Rolling Out
New Features
• Understand How New Features Change the
Following:
– Customer/User Behavior
– Adoption
– Growth
– Resource Utilization
– Integration Patterns
– Business Process Changes
Capacity Planning Tools
• Behavior Modeling
– What is it?
– What tools can you use?
– Valid Instrumentation Periods.
– What to look for and to learn from the data.
• Homegrown Tools (What to Mine)
– Last Modified
– Growth Changes
– Adoption Patterns
– Concurrency Patterns
– Business Processes (Run Times)
Behavior Modeling
Capacity Planning Resources
• Modeling
– SPEED
– IBM Rational
– Simul8
– Opnet
– NetIQ (WebTrends)
– Many Freeware Products on SourceForge
• Resources
– Performance by Design: Computer Capacity
Planning by Example; Menasce, Daniel
Introduction to Performance
Management
Measuring Performance
• What to Focus On
– Response Time
– Processing Time
– Storage/Growth (volumetric patterns)
– Workload (Processing and Memory)
– Network Utilization/Bandwidth
– Adoption/Behavior
– New Features and Deployments
• Plot, Measure and Model
– Distinct Sessions
– Physical Resource Utilization (Workload)
– Logical Resource Utilization
Measuring Performance
[Chart: users and workload plotted against time, with sessions per hour summed over a 60-minute window; annotations mark the peak of concurrency, point of max workload, peak of saturation, slope of abandonment and slope of recovery.]
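The sessions-per-hour measure above relates to concurrency through Little's Law: average concurrency equals arrival rate multiplied by average session time. A small illustrative calculation (the figures are made up, not Blackboard benchmarks):

```python
def average_concurrency(sessions_per_hour, avg_session_minutes):
    """Little's Law: concurrency = arrival rate x average time in system."""
    arrivals_per_minute = sessions_per_hour / 60.0
    return arrivals_per_minute * avg_session_minutes

# 600 sessions/hour, each lasting ~12 minutes on average
print(average_concurrency(600, 12))  # 120.0 concurrent users on average
```

This is why the slides stress sessions per hour and session length over raw "concurrent user" counts: the latter is derived from the former.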
Quality of Service Paradigm
• A web application’s quality of service is measured by
response time, throughput and availability.
• Poor quality of service leads to abandonment, decline
in adoption and potentially permanently lost users.
• QoS is key to assessing how well Web-based
applications meet user expectations on two primary
measures: availability and response time.
Quality of Service: All for One and
One for All Architecture…
• What exactly does this mean?
– In today’s architecture no system, sub-system, use case,
transaction, data element, etc. has a greater utility value
than its neighbor component in the system.
• Is this an accurate representation of the product?
– In Blackboard, all things are not created equally or weighted
equally in value as deemed by our users.
– However, our architecture is such that all things are created
and weighted equally.
• Why is this bad?
– The QoS of the application becomes unpredictable.
– No guarantees can be made for capacity planning and
utilization.
– Clients rarely have the comfort level that their application
environment is ever stable other than in periods of light usage.
Quality of Service: All Things are Not Equal,
So Let’s Not Treat them Equal…
• From a psychological perspective, it’s easy to predict
which systems have greater QoS needs than others.
– Taking an assessment has a greater utility than reading an
announcement.
– Entering gradebook scores has a greater utility than adding
a course document or folder.
• From a workload perspective, it’s easy to
conceptualize which systems demand greater QoS
needs than others.
– A lab of 20 students taking an assessment puts a greater
workload on the system than a lab of 20 students reading a
course document.
– A virtual workshop of 20 users collaborating has a greater
workload than 20 students navigating through a course.
Quality of Service: Where Can We
Go With This…
• Resource management policies and procedures can
be implemented to support the workload needs of the
system.
– Sub-system or potentially task workload monitoring.
– Administrator defined thresholds for application
management.
• Seasonal deployment changes based on
patterns/trends of usage or even predefined
scheduling by course administrators.
• Better utilization of capital expenditures.
– Potentially more expensive with greater adoption.
– Quantifiably reliable.
Quality of Service: Example
[Diagram: a distributed-workload application cluster in which general, adaptive content, content collection and assessment workloads are spread across web/application servers (Apache/IIS), a file server and a collaboration server, each sub-system served by dedicated Tomcat instances alongside general Tomcat instances, all backed by the database over JDBC.]

Anti-Pattern
[Diagram: the bottleneck anti-pattern, in which a single web/application server (Modperl/PerlEx) funnels every sub-system and general Tomcat instance through one interface (DBI) to the database, creating an interface bottleneck.]
Dealing with Performance and
Capacity Issues
Dealing with Performance Issues
Solving a performance issue is no different than solving
a functional issue. The same level of care and effort in
solving the issue should be given. We recommend the
following three steps as the appropriate path for
problem determination and resolution:
• Decompose the Problem
• Resolve the Issue
• Follow Up and Prevent
Dealing with Performance and
Capacity Issues
• Most clients fail to report performance issues. The bulk users of the
system (students) rarely report issues.
• Most issues are reported when:
– Administrators experience performance issues first hand for their own tasks.
– Instructors are performing course administration activities.
– Instructors are working on the product in a classroom environment.
– Administrators pick up student chatter in blogs and discussion boards.
• What does that mean?
– Identifying the actual performance bottleneck is hard and requires a well
formulated approach.
• Primarily, performance issues are the result of:
– Poor System Management in Dealing with Growth
– Changes in Adoption Patterns (Concurrency Thresholds)
– Functional Issues in the Application
– Undersized Hardware and Resources
– User Error (Unrealistic Operations)
Characteristics of a Good
Problem Resolution Methodology
• Measurable
• Reliable
• Deterministic
• Practical
• Finite
• Predictive
• Efficient
• Impact Aware
Performance Resolution Methods
• Trial and Error Method
• Response Time Method
• Do Nothing and Ignore Method
– Blame the Users Sub-Method
– Blame the Hardware Sub-Method
– Blame the Vendor Sub-Method
Trial and Error Method
• Identify that a particular operation X has an
unacceptable response time.
• Make changes with the intent of improving X.
• Remove any changes that make X worse.
• If improvement is not perceived, go back and
make additional changes.
• If the improvement is minor, then go back and
make more changes as it is possible to
produce more improvements with additional
changes.
Response Time Method
• Select the critical operations for which the
business needs improved performance.
• Collect proper diagnostic data during periods
of poor performance with a focus on:
– Response Time Consumption
• Execute the optimization activity that will have
the greatest net payoff to the business.
• If the best payoff activity fails to yield desired
results, then suspend optimization activities
until something changes.
Example #1
Scenario: Butch (Student) logs into Blackboard to access music
files he stores in Content Collection. He selects the appropriate tab
and waits for the left navigation frame to completely load. He ends
up waiting for 2 minutes until the tree fully loads. Angered by
repeated incidents of this he sends a furious email to the system
administrators complaining about his “lost time” waiting for the tree
to load.
Question: How do we address this problem appropriately?
Example #2
Scenario: The accounting department has decided to utilize the
Blackboard assessment engine for high-stakes testing during
semester mid-terms. The department has issued a 1000 question
random block assessment, in which students will be responsible for
answering 25 questions in an all-at-once deployment fashion. The
department wants all 500 students to complete testing during a 2
hour window over the course of a week.
The last time the department used Blackboard for high-stakes
assessment, students complained about page load times and a few
incidents in which students were kicked out of the application
resulting in a locked assessment.
Question: The department has approached you for help. How do
you avoid a repeat of these issues?
Example #3
Scenario: An integration between the campus SCT system and Blackboard
must take place to ensure students and faculty exist in the system and with
the appropriate course enrollment based on recent course registration. The
integration must take place prior to the beginning of the semester. The
same integration took place last semester, but was deemed a failure by the
faculty as it took over a week for all courses, faculty and students to be
entered and associated on the system.
You were/are the administrator in charge of the integration. Part of the
problem was that your data feeds from SCT were unorganized. Another
problem is that you ran into a large number of system-level issues that
caused your integrations to fail.
Question: How do you reduce the risk and ensure successful
integration?
Example #4:
Scenario: You have procured budgetary funding to replace the
older Blackboard servers and storage device for newer hardware.
This new hardware is expected to solve all of your performance
problems. The new servers will arrive in late May, which will give
you 45 days to configure and convert your Blackboard environment
before the bulk of your students get back on the system. You have
been told by your boss that the system can only be down for 48
hours, as the summer school still uses Blackboard.
Question: How do you ensure a smooth conversion with minimal
downtime? What can you do in advance? How would you spend
your 48 hours of downtime?
Example #5:
Scenario: Suzie (Blackboard Administrator) has been contacted by her boss about a
change in the school’s Blackboard licensing. The school had been using a
Blackboard Learning System™ - Basic license for the past two years. They have
upgraded to the Blackboard Learning System and purchased the Blackboard
Community System™ and Blackboard Content System™ in order to support a new
distance learning initiative. Her boss tells Suzie that she is responsible for the
following:
• Purchasing of hardware and storage to support new products.
• Software Upgrade from Blackboard Learning System – Basic Edition to Blackboard
Learning System
• Installation and Configuration of the new implementation.
The new software components are expected to change the way Blackboard has
traditionally been used at the school. There will be lots more data, and the system
will cater to a community 10X the size of the present implementation.
Question: What can Suzie do in order to prepare for the change in features,
adoption and growth?
Performance Resources
• Measurement
– Windows Tool Kit, Top, Sar, VMStat, Prstat
– JProbe, OptimizeIt, HPJmeter, JMPI/Thread Dumps
– Hotsos, Statspack, TKProf, Enterprise Manager, Query Analyzer
– Performasure, Spotlight, Patrol, Unicenter
– Apache Server-Status, JVMStat, VerboseGC
• Resources
– http://support.microsoft.com/kb/224587
– http://www.sql-server-performance.com/jc_sql_server_quantative_analysis1.asp
– http://www.javaperformancetuning.com
– http://www.oraperf.com
– http://www.ixora.com.au
– http://www.hotsos.com
– http://perl.apache.org/docs/1.0/guide/performance.html
Introduction to Load Testing
Introduction to Load Testing
Load Testing is the process of…
• Simulating synthetic workload on a software
application.
• Identifying where bottlenecks exist:
– Software Layer
– Hardware and/or Interface Layer
• Determining software and system capacity
capabilities under a given workload.
• Attempting to meet or exceed predefined
performance objectives.
• Representing conditional patterns of application
usage.
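The abandonment idea in particular is worth a sketch. Below is a toy load generator, not a real tool like those listed later: it runs virtual users concurrently and classifies each as completed or abandoned against a patience threshold. The request function is a stand-in (a sleep), and all delays are invented:

```python
import concurrent.futures
import time

PATIENCE_SECONDS = 0.2  # hypothetical user patience rating

def fake_request(delay):
    """Stand-in for a real HTTP call; a real harness would hit the application."""
    time.sleep(delay)

def run_user(delay):
    """Simulate one virtual user and classify the outcome."""
    start = time.monotonic()
    fake_request(delay)
    elapsed = time.monotonic() - start
    # A real load tool would cancel the slow request; here we just classify it.
    return "abandoned" if elapsed > PATIENCE_SECONDS else "completed"

delays = [0.05, 0.05, 0.5, 0.05]  # synthetic workload mix, one slow response
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_user, delays))
print(results.count("abandoned"), "of", len(results), "virtual users abandoned")
```

Modeling abandonment keeps the simulated workload realistic: real users who leave stop generating load, so a test that never abandons overstates sustained concurrency.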
Introduction to Load Testing
• Software load testing requires a significant
investment from an organization both financially and
operationally.
• Most commercially available load testing tools cost
tens of thousands of dollars to purchase and
maintain.
• Organizing and managing a staff focused on using
these specialized tools bears similar expense.
• Organizations must be prepared to deal with the
results of the load tests.
– Optimizing Software (Refactoring)
– Identifying Accurate Sizing and Capacity Configurations
Components of Load Testing
• Library of Test Assets
– Reusable autonomous actions in the application (Create, Read, Update, Delete and Execute)
– Isolated verification points
– Incorporation of abandonment (patience rating)
• Scenario Definition
– Simulation of realistic scenarios based on actual usage (artifacts)
– Focus on sessions per hour rather than solely on concurrency
– Session Outcomes: Abandon, Abort, Continue or Idle.
• Abandonment
– Define user patience rating (Will users abandon if the transaction or site are slow?)
– Incorporate as a means of preserving realistic/expected usage patterns.
• Volumetrics and Usage Analysis
– Capture statistical overview of current implementations (data models)
– Study usage patterns and trends for simulation
– Develop performance data models based on findings.
Load Testing as a Part of the
Blackboard SDLC
[Diagram: end-to-end performance integration in the Blackboard Software Development Lifecycle. Requirement Development (assess and mitigate performance risk; identify critical use cases for analysis) → Design (review the technical design document; reference acceptable design patterns; warn about unacceptable anti-patterns; model/prototype) → Develop (baseline as functionality can be tested; profile for inefficient calls/executions; identify scalability issues in time to refactor) → Functional, Integrated and Regression Testing → Performance Testing (high-watermark, common scenario and conditional scenario load testing) → Certification (platform and advanced configurations) → General Availability (sizing and capacity guidance).]
Load Testing as a Part of the
Blackboard SDLC
• Five-step process deep-rooted in designing for performance before a
feature is developed.
• Part of the requirements process by assessing risk, defining
performance requirements and isolating high-impact use cases.
• Study artifacts of performance within current implementations:
– Usage Analysis
– Data Collection (Volumetrics within the Data Model)
• Isolate software contention by identifying software anti-patterns.
• Refactor and optimize the software application layer (business logic
and database structure).
• Performance test the software under conditional and common load on
standard/recommended configurations:
– Simulate Abandonment for Calibration Purposes
– Generate enough samples of a given function
– Stay within 2 Sigma (95% response time)
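The "95% response time" criterion above can be checked with a simple nearest-rank percentile over collected samples. The response times below are illustrative, not measured Blackboard data:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: small, dependency-free sketch."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical response-time samples (milliseconds) for one operation.
response_ms = [120, 135, 150, 160, 180, 210, 240, 300, 900, 4000]
print("95th percentile:", percentile(response_ms, 95), "ms")
```

Note how one outlier dominates the 95th percentile here while barely moving the median; that is exactly why the methodology judges response time at a high percentile rather than on the average.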
Load Testing Tools and
Resources
• Simulation
– Mercury LoadRunner
– Segue Silk Performer
– Grinder and Apache JMeter
– Open STA
– Rational Test Studio
– Microsoft WCAT and WAST
• Resources
– http://www.keynote.com/downloads/articles/tradesecrets.pdf
(Abandonment)
– http://www-128.ibm.com/developerworks/rational/library/4169.html
(Great Starter Article)
– Performance Analysis for Java Websites; Joines, Stacy
Closing Slide
• Innovating Together in ‘05:
– Managing Performance and Capacity is something everyone can
do.
– The more quantifiable something is…the more attainable it can be.
• Resources Available:
– Provided throughout the presentation.
• Follow up Contact(s):
– Steve Feldman, [email protected]
• IF YOU ONLY REMEMBER 1 THING:
– It is never too late to think about performance and capacity.