Automated Testing: Better, Cheaper, Faster, For Everything
GDC Tutorial, 2005.
Building Multi-Player Games
Case Study: The Sims Online
Lessons Learned,
Larry Mellon
TSO: Overview
Initial team: little to no MMP experience
Engineering estimate: switching from 4-8 player
peer to peer to MMP client/server would take no
additional development time!
No code / architecture / tool support for
Long-term, continually changing nature of game
Non-deterministic execution, dual platform (win32 /
Linux)
Overall process designed for single-player
complexity, small development team
Limited nightly builds, minimal daily testing
Limited design reviews, limited scalability testing,
no “maintainable/extensible” impl. requirement
TSO: Case Study Outline
(Lessons Learned)
Poorly designed SP → MP → MMP transitions
Scaling
Team & code size, data set size
Build & distribution
Architecture: logical & code
Visibility: development & operations
Testability: development, release, load
Multi-Player, Non-determinism
Persistent user data vs code/content updates
Patching / new content / custom content
Scalability
(Team Size & Code Size)
What were the problems
Side effect breaks & ability to work in parallel
Independent module design & impact on overall system
(initially, no system architect)
#include structure
Limited encapsulation + poor testability + non-determinism =
TROUBLE
win32 / Linux, compile times, pre-compiled headers, ...
What worked
Move to new architecture via Refactoring & Scaffolding
HSB, incSync, nullView Simulator, nullView client, …
Rolling integrations: never dark
Sandboxing & pumpkins
Scalability
(Build & Distribution)
To developers, customers & fielded servers
What didn’t work (well enough)
Pulling builds from developer’s workstations
Shell scripts & manual publication
What worked well
Heavy automation with web tracking
Repeatability, Speed, Visibility
Hierarchies of promotion & test
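The "hierarchies of promotion & test" idea can be sketched as a gated pipeline: a build advances to the next tier only after that tier's automated suite passes. The tier names and functions here are hypothetical, not TSO's actual tooling:

```python
# Hypothetical sketch of a promotion hierarchy: a build advances one
# tier at a time, and only if that tier's test suite passes.

TIERS = ["smoke", "regression", "load", "release-candidate"]

def run_suite(tier: str, build_id: str) -> bool:
    """Stand-in for running one tier's automated tests (always passes here)."""
    print(f"running {tier} suite against build {build_id}")
    return True

def promote(build_id: str) -> str:
    """Return the highest tier the build reached."""
    reached = "none"
    for tier in TIERS:
        if not run_suite(tier, build_id):
            break          # a failure stops promotion at this tier
        reached = tier     # a pass promotes the build to this tier
    return reached

print(promote("build-1042"))  # release-candidate
```

Publishing each tier's result to a web page gives the repeatability, speed, and visibility the slide credits for what worked.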
Scalability
(Architecture)
Logical versus physical versus code structure
Logical: Replicated computing vs client / server
Only physical was not a major, MAJOR issue
Security & stability implications
Code: Client / server isolation & code sharing
Multiple, concurrent logic threads were sharing code&data,
each impacting the others
Nullview client & simulator
Regulators vs Protocols: bug counts & state machines
Go to final architecture ASAP
[Diagram: the multiplayer architecture, with a replicated Sim on every Client ("here be sync hell"), evolves into the client/server architecture, where a single authoritative Sim handles request/command traffic from all Clients ("nice, undemocratic").]
Final Architecture ASAP:
Make Everything Smaller & Separate
[Diagram: the monolithic architecture evolves into many smaller, separate modules.]
Final Architecture ASAP:
Reduce Complexity of Branches
[Diagram: shared code handles packet arrival through branches (if (client), if (server), #ifdef (nullview)), all touching shared state on the way to the client and server event handlers.]
Client & server teams would constantly break
each other via changes to shared state & code
Final Architecture ASAP:
“Refactoring”
Decomposed into Multiple dll’s
Found the Simulator
Interfaces
Reference Counting
Client/Server subclassing
How it helped:
–Reduced coupling. Even reduced compile times!
–Developers in different modules broke each other less often.
–We went everywhere and learned the code base.
Final Architecture ASAP:
It Had to Always Run
Initially clients wouldn’t behave predictably
We could not even play test
Game design was demoralized
We needed a bridge, now!
Final Architecture ASAP:
Incremental Sync
A quick temporary solution…
Couldn’t wait for final system to be finished
High overhead, couldn’t ship it
We took partial state snapshots on the server
and restored to them on the client
How it helped:
–Could finally see the game as it would be.
–Allowed parallel game design and coding
–Bought time to lay in the “right” stuff.
Architecture:
Conclusions
Keep it simple, stupid!
Keep it clean
DLL/module integration points
#ifdef’s must die!
Keep it alive
Client/server
Plan for a constant system architect role: review all
modules for impact on team, other modules & extensibility
Expose & control all inter-process communication
See Regulators: state machines that control transactions
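The Regulator idea, a state machine that owns each inter-process transaction, can be sketched minimally as below. The states and transition table are illustrative only, not TSO's actual protocol:

```python
# Minimal sketch of a "Regulator": every inter-process transaction is
# driven through an explicit state machine, so an illegal transition is
# caught immediately instead of silently corrupting shared state.

class RegulatorError(Exception):
    pass

class Regulator:
    # Legal transitions for one request/response transaction (illustrative).
    TRANSITIONS = {
        "idle":         {"request_sent"},
        "request_sent": {"ack_received", "timed_out"},
        "ack_received": {"committed"},
        "timed_out":    {"idle"},   # retry path
        "committed":    set(),      # terminal state
    }

    def __init__(self):
        self.state = "idle"

    def advance(self, new_state: str) -> None:
        if new_state not in self.TRANSITIONS[self.state]:
            raise RegulatorError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

reg = Regulator()
reg.advance("request_sent")
reg.advance("ack_received")
reg.advance("committed")
print(reg.state)  # committed
```

Because every transition is table-driven, bug counts become countable: each rejected transition is a logged, attributable failure rather than a mystery desync.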
TSO: Case Study Outline
(Lessons Learned)
Poorly designed SP → MP → MMP transitions
Scaling
Team & code size, data set size
Build & distribution
Architecture: logical & code
Visibility: development & operations
Testability: development, release, load
Multi-Player, Non-determinism
Persistent user data vs code/content updates
Patching / new content / custom content
Visibility
Problems
Debugging a client/server issue was very slow & painful
Knowing what to work on next was largely guesswork
Reproducing system failures from live environment
Knowing how one build or server cluster differed from
another was again largely guesswork
What we did that worked
Log / crash aggregators & filters
Live “critical event” monitor
Esper: live player & engine metrics
Repeatable load testing
Web-based Dashboard: health, status, where is everything
Fully automated build & publish procedures
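The log/crash aggregators listed above can start as something very small: bucket incoming lines by crash signature so the worst offenders surface first. A toy sketch, with made-up log formats:

```python
# Toy sketch of a crash aggregator: collapse a flood of log lines into
# counts per crash signature so the worst offenders surface first.

from collections import Counter

def aggregate(log_lines):
    """Group crash lines by their signature (the text after 'CRASH:')."""
    sigs = Counter()
    for line in log_lines:
        if "CRASH:" in line:
            sigs[line.split("CRASH:", 1)[1].strip()] += 1
    return sigs.most_common()

logs = [
    "10:01 server3 CRASH: db_request overflow",
    "10:02 server7 CRASH: null simulator handle",
    "10:05 server2 CRASH: db_request overflow",
]
print(aggregate(logs))
# [('db_request overflow', 2), ('null simulator handle', 1)]
```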
Visibility via “Bread Crumbs”:
Aggregated Instrumentation Flags
[Chart: aggregated instrumentation flags mark trouble spots leading up to a server crash.]
Quickly Find Trouble Spots
[Chart: DB byte count oscillates out of control; the server crashes.]
Drill Down For Details
[Chart: a single DB request is clearly at fault.]
TSO: Case Study Outline
(Lessons Learned)
Poorly designed SP → MP → MMP transitions
Scaling
Team & code size, data set size
Build & distribution
Architecture: logical & code
Visibility: development & operations
Testability: development, release, load
Multi-Player, Non-determinism
Persistent user data vs code/content updates
Patching / new content / custom content
Testability
Development, release, load: all had show-stopper
problems
QA coordination / speed / cost
Repeatability, non-determinism
Need for many, many tests per day, each with
multiple inputs (two to two thousand players
per test)
Testability: What Worked
Automated testing for repeatability & scale
Scriptable test clients: mirrored actual user play sessions
Changed the game’s architecture to increase testability
External test harnesses to control 50+ test clients per CPU,
4,000+ per session
Push-button UI to configure, run & analyze tests (developer
& QA)
Constantly updated Baselines, with “Monkey Test” stats
Pre-checkin regression
QA: web-driven state machine to control testers &
collect/publish results
What didn’t work
Event Recorders, unit testing
Manual-only testing
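An external harness that runs 50+ scripted clients per CPU is, at heart, a fan-out/collect loop. A threaded toy version, with every name invented for illustration (a real harness multiplexes far more clients per process and talks to real game servers):

```python
# Toy sketch of a test harness fanning out many scripted clients and
# collecting pass/fail results. Thread-per-client is used here only
# for brevity; run_client stands in for one full scripted play session.

from concurrent.futures import ThreadPoolExecutor

def run_client(client_id: int) -> bool:
    """Stand-in for one scripted play session; always passes here."""
    return True

def run_session(num_clients: int) -> dict:
    """Run num_clients scripted clients and tally the results."""
    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(run_client, range(num_clients)))
    return {"passed": sum(results), "failed": num_clients - sum(results)}

print(run_session(200))  # {'passed': 200, 'failed': 0}
```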
MMP Automated Testing: Approach
Push-button ability to run large-scale, repeatable tests
Cost
Hardware / Software
Human resources
Process changes
Benefit
Accurate, repeatable, measurable tests during development
and operations
Stable software, faster, measurable progress
Base key decisions on fact, not opinion
Why Spend The Time & Money?
System complexity, non-determinism, scale
Tests provide hard data in a confusing sea of
possibilities
End users: high Quality of Service bar
Dev team: greater comfort & confidence
Tools augment your team’s ability to do their jobs
Find problems faster
Measure / change / measure: repeat as necessary
Production & executives: come to depend on this
data to a high degree
Scripted Test Clients
Scripts are emulated play sessions: just like somebody playing the game
Command steps: what the player does to the game
Validation steps: what the game should do in response
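A script in this style alternates command steps with validation steps; a minimal interpreter could look like this sketch. The FakeGame class and its verbs are entirely hypothetical stand-ins for a real game client API:

```python
# Minimal sketch of a scripted test client: command steps drive the
# game, validation steps assert on what the game should do in response.
# FakeGame and its verbs are hypothetical stand-ins.

class FakeGame:
    def __init__(self):
        self.room = "lobby"
        self.chat = []
    def do(self, verb, arg):
        if verb == "enter_house":
            self.room = arg
        elif verb == "chat":
            self.chat.append(arg)

SCRIPT = [
    ("command",  ("enter_house", "house_12")),
    ("validate", lambda g: g.room == "house_12"),
    ("command",  ("chat", "hello")),
    ("validate", lambda g: "hello" in g.chat),
]

def run_script(game, script):
    """Execute commands; fail fast on the first failed validation."""
    for kind, step in script:
        if kind == "command":
            game.do(*step)
        elif not step(game):
            return False
    return True

print(run_script(FakeGame(), SCRIPT))  # True
```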
Scripts Tailored
To Each Test Application
Unit testing: 1 feature = 1 script
Load testing: Representative play session
The average Joe, times thousands
Shipping quality: corner cases, feature
completeness
Integration: test code changes for catastrophic
failures
Scripted Players: Implementation
[Diagram: the Test Client mirrors the Game Client, with a Script Engine replacing the Game GUI; each drives state and commands through the same Presentation Layer into the Client-Side Game Logic.]
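The key point of the implementation diagram is that the script engine and the real GUI are interchangeable front ends over the same presentation layer, so scripted sessions exercise the identical client-side logic a player would. A sketch under that assumption, with all class names hypothetical:

```python
# Sketch of the test-client trick: the script engine and the game GUI
# both drive the same presentation layer, so scripted sessions exercise
# the identical client-side game logic a real player would.

class PresentationLayer:
    """Single entry point into client-side game logic."""
    def __init__(self):
        self.events = []
    def send_command(self, cmd: str):
        self.events.append(cmd)   # stand-in for real game logic

class ScriptEngine:
    """Front end for automated tests: replays a scripted session."""
    def __init__(self, layer): self.layer = layer
    def play(self, steps):
        for cmd in steps:
            self.layer.send_command(cmd)

class GameGUI:
    """Front end for human players: forwards UI actions."""
    def __init__(self, layer): self.layer = layer
    def on_click(self, widget):
        self.layer.send_command(widget)

layer = PresentationLayer()
ScriptEngine(layer).play(["buy_chair", "sit"])
GameGUI(layer).on_click("stand")
print(layer.events)  # ['buy_chair', 'sit', 'stand']
```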
Process Shift:
Earlier Tools Investment Equals More Gain
[Chart: MMP developer efficiency (amount of work done) over time, from project start to target launch. With strong test support the curve stays above the "not good enough" line; with weak test support it falls below.]
Process Shifts: Automated Testing
Changes The Shape Of The Development
Progress Curve
Stability (Code Base & Servers)
Keep Developers moving forward, not bailing water
Scale & Feature Completeness
Focus Developers on key, measurable roadblocks
Process Shift: Measurable Targets,
Projected Trend Lines
[Chart: core functionality tests for any feature (e.g. # clients) plotted over time. A trend line projected from the first passing test through "now" shows when the target will be complete at any milestone (e.g. Alpha).]
Actionable progress metrics, early enough to react
Process Shift: Load Testing
(Before Paying Customers Show Up)
Expose issues that only occur at scale
Establish hardware requirements
Establish play is acceptable @ scale
Client-Server Comparison
TSO: Case Study Outline
(Lessons Learned)
Poorly designed SP → MP → MMP transitions
Scaling
Team & code size, data set size
Build & distribution
Architecture: logical & code
Visibility: development & operations
Testability: development, release, load
Multi-Player, Non-determinism
Persistent user data vs code/content updates
Patching / new content / custom content
User Data
Oops!
Users stored much more data (with much more variance) than
we had planned for
Caused many DB failures, city failures
BIG problem: their persistent data has to work, always, across all builds
& DB instances
What helped
Regression testing, each build, against live set of user data
What would have helped more
Sanity checks against the DB
Range checks against user data
Better code & architecture support for validation of user data
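The range and sanity checks called for above amount to validating every persistent record against explicit bounds before it reaches the DB. A minimal sketch; the field names and bounds are invented for illustration, not TSO's schema:

```python
# Minimal sketch of user-data validation: range-check each persistent
# field before writing it, so one bad record can't take down a city
# server later. Field names and bounds are invented for illustration.

BOUNDS = {
    "simoleons":   (0, 10_000_000),
    "friends":     (0, 5_000),
    "lot_objects": (0, 20_000),
}

def validate_record(record: dict) -> list:
    """Return a list of violations; an empty list means the record is sane."""
    errors = []
    for field, (lo, hi) in BOUNDS.items():
        value = record.get(field)
        if not isinstance(value, int) or not lo <= value <= hi:
            errors.append(f"{field}={value!r} outside [{lo}, {hi}]")
    return errors

print(validate_record({"simoleons": 500, "friends": 12, "lot_objects": 99}))  # []
print(validate_record({"simoleons": -5, "friends": 12}))  # two violations
```

Running such checks in the build's regression pass against a snapshot of live user data catches both bad records and code that mishandles legitimate extremes.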
Patching / New Content / Custom
Content
Oops!
Initial Patch budget of 1Meg blown in 1st week of
operations
New Content required stronger, more predictable
process
Custom Content required infrastructure able to easily
add new content, on the fly
Key Issue: all effort had gone into going Live, not
creating a sustainable process once Live
Conclusion: designing these in would have been
much easier than retrofitting…
Lessons Learned
autoTest: Scripted test clients and instrumented code rock!
Collection, aggregation and display of test data is vital in making
decisions on a day to day basis
Lessen the panic
autoBuild: make it pushbutton with instant web visibility
Scale&Break is a very clarifying experience
Stable code&servers greatly ease the pain of building a MMP game
Hard data (not opinion) is both illuminating and calming
Use early, use often to get bugs out before going live
Budget for a strong architect role & a strong design review
process for the entire game lifecycle
Scalability, testability, patching & new content & long-term persistence
are requirements: MUCH cheaper to design in than frantic retrofitting
KISS principle is mandatory, as is expecting changes
Lessons Learned
Visibility: tremendous volumes of data require automated
collection&summarization
Get some people on board who’ve been burned before: a lot of
TSO’s pain could have been easily avoided, but little
distributed-system & MMP design experience existed in the
early phases of the project
Fred Brooks, the 31st programmer
Provide drill-down access to details from summary view web pages
Strong tools & process pays off for large teams & long-term operations
Measure & improve your workspace, constantly
Non-determinism is painful & unavoidable
Minimize impact via explicit design support & use strong, constant
calibration to understand it
Biggest Wins
Code Isolation
Scaffolding
Tools: Build / Test / Measure,
Information Management
Pre-Checkin Regression / Load Testing
Biggest Losses
Architecture: Massively peer to peer
Early lack of tools
#ifdef across platform / function
“Critical Path” dependencies
More Details: www.maggotranch.com/MMP (3 TSO Lessons Learned talks)