Automated Testing: Better, Cheaper, Faster, For Everything


GDC Tutorial, 2005.
Building Multi-Player Games
Case Study: The Sims Online
Lessons Learned
Larry Mellon
TSO: Overview

- Initial team: little to no MMP experience
  - Engineering estimate: switching from 4-8 player peer-to-peer to MMP client/server would take no additional development time!
- No code / architecture / tool support for:
  - Long-term, continually changing nature of the game
  - Non-deterministic execution, dual platform (win32 / Linux)
- Overall process designed for single-player complexity and a small development team
  - Limited nightly builds, minimal daily testing
  - Limited design reviews, limited scalability testing, no "maintainable/extensible" implementation requirement

TSO: Case Study Outline
(Lessons Learned)
- Poorly designed SP → MP → MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content
Scalability
(Team Size & Code Size)

What were the problems
- Side-effect breaks & the ability to work in parallel
- Independent module design & its impact on the overall system (initially, no system architect)
- #include structure
  - win32 / Linux, compile times, pre-compiled headers, ...
- Limited encapsulation + poor testability + non-determinism = TROUBLE
What worked
- Move to the new architecture via Refactoring & Scaffolding
  - HSB, incSync, nullView Simulator, nullView client, ...
- Rolling integrations: never dark
- Sandboxing & pumpkins
Scalability
(Build & Distribution)


- To developers, customers & fielded servers
What didn't work (well enough)
- Pulling builds from developers' workstations
- Shell scripts & manual publication
What worked well
- Heavy automation with web tracking
  - Repeatability, Speed, Visibility
- Hierarchies of promotion & test
Scalability
(Architecture)

- Logical versus physical versus code structure
  - Only physical was not a major, MAJOR issue
- Logical: replicated computing vs client/server
  - Security & stability implications
- Code: client/server isolation & code sharing
  - Multiple, concurrent logic threads were sharing code & data, each impacting the others
  - Nullview client & simulator
- Regulators vs Protocols: bug counts & state machines
- Go to the final architecture ASAP
[Diagram: the original multiplayer architecture, with a Sim replicated inside every Client ("Here be Sync Hell"), evolves into the client/server architecture, where a single server-side Sim drives the Clients through a nice, undemocratic request/command flow.]
Final Architecture ASAP:
Make Everything Smaller & Separate
[Diagram: the monolithic code base evolves into smaller, separate modules.]
Final Architecture ASAP:
Reduce Complexity of Branches
[Diagram: a single Shared Code path handles Packet Arrival ("More Packets!!") through if (client), if (server) and #ifdef (nullview) branches over Shared State, fanning out into Client Events and Server Events.]
Client & server teams would constantly break each other via changes to shared state & code.
Final Architecture ASAP:
“Refactoring”

- Decomposed into multiple DLLs
  - Found the Simulator
  - Interfaces
  - Reference counting
  - Client/Server subclassing
How it helped:
- Reduced coupling. Even reduced compile times!
- Developers in different modules broke each other less often.
- We went everywhere and learned the code base.
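To make the before/after concrete, here is a minimal sketch (all type names invented, not TSO code) of the subclassing pattern named above: the if (client) / if (server) / #ifdef (nullview) branches move behind a shared interface, so each team owns its subclass and stops breaking the shared code path.

```cpp
#include <cstdio>
#include <memory>

// Hypothetical packet type standing in for the real network messages.
struct Packet { int id; };

// Shared interface: the only point where client and server code meet.
class SimObject {
public:
    virtual ~SimObject() = default;
    virtual void OnPacket(const Packet& p) = 0;  // replaces if(client)/if(server) branches
};

// Client team owns this subclass/DLL; changes here cannot break the server build.
class ClientSimObject : public SimObject {
public:
    void OnPacket(const Packet& p) override {
        std::printf("client: update presentation for packet %d\n", p.id);
    }
};

// Server team owns this one.
class ServerSimObject : public SimObject {
public:
    void OnPacket(const Packet& p) override {
        std::printf("server: run authoritative sim for packet %d\n", p.id);
    }
};

// The old #ifdef(nullview) build becomes just another (headless) subclass.
class NullViewSimObject : public ServerSimObject {};

int main() {
    std::unique_ptr<SimObject> sim = std::make_unique<ClientSimObject>();
    sim->OnPacket(Packet{42});
}
```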
Final Architecture ASAP:
It Had to Always Run

- Initially, clients wouldn't behave predictably
  - We could not even play test
  - Game design was demoralized
- We needed a bridge, now!
Final Architecture ASAP:
Incremental Sync

- A quick, temporary solution...
  - Couldn't wait for the final system to be finished
  - High overhead, couldn't ship it
- We took partial state snapshots on the server and restored to them on the client
How it helped:
- Could finally see the game as it would be.
- Allowed parallel game design and coding.
- Bought time to lay in the "right" stuff.
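A rough sketch of that stopgap, with invented names and field layout: the server serializes only the fields marked dirty since the last snapshot, and the client simply overwrites its copy. High overhead, as noted, but predictable.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

// Invented example: a sim object's replicated fields, keyed by field id.
struct SimState {
    std::map<uint32_t, int32_t> fields;  // field id -> value
    std::map<uint32_t, bool>    dirty;   // which fields changed since the last snapshot
};

// Server side: capture only the fields that changed (a "partial snapshot").
std::vector<std::pair<uint32_t, int32_t>> SnapshotDirtyState(SimState& s) {
    std::vector<std::pair<uint32_t, int32_t>> snap;
    for (auto& [id, isDirty] : s.dirty) {
        if (isDirty) { snap.emplace_back(id, s.fields[id]); isDirty = false; }
    }
    return snap;  // would be serialized and sent to the client
}

// Client side: blindly overwrite local state with the server's values.
void ApplySnapshot(SimState& s, const std::vector<std::pair<uint32_t, int32_t>>& snap) {
    for (auto& [id, value] : snap) s.fields[id] = value;
}

int main() {
    SimState server, client;
    server.fields[1] = 100; server.dirty[1] = true;  // e.g. a Sim's hunger motive
    ApplySnapshot(client, SnapshotDirtyState(server));
    std::printf("client field 1 = %d\n", client.fields[1]);
}
```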
Architecture:
Conclusions

- Keep it simple, stupid!
- Keep it clean
  - DLL/module integration points
  - #ifdef's must die!
- Keep it alive
  - Client/server
- Plan for a constant system architect role: review all modules for impact on the team, other modules & extensibility
- Expose & control all inter-process communication
  - See Regulators: state machines that control transactions (sketched below)
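The slides do not show Regulator internals, so this is only a hedged illustration of the idea, with hypothetical states and names: one explicit state machine per client/server transaction, so every inter-process exchange is exposed, validated, and easy to count when the bug reports roll in.

```cpp
#include <cstdio>

// Hypothetical Regulator: one explicit state machine per client/server transaction.
class BuyObjectRegulator {
public:
    enum class State { Idle, RequestSent, Committing, Done, Failed };

    void Start()            { Transition(State::Idle,        State::RequestSent); }
    void OnServerApproved() { Transition(State::RequestSent, State::Committing);  }
    void OnCommitted()      { Transition(State::Committing,  State::Done);        }
    void OnError()          { state_ = State::Failed; Log("error");               }

    State state() const { return state_; }

private:
    void Transition(State expected, State next) {
        if (state_ != expected) { OnError(); return; }  // illegal transition surfaces the bug instead of hiding it
        state_ = next;
        Log("transition");
    }
    void Log(const char* what) {
        std::printf("BuyObjectRegulator %s -> state %d\n", what, static_cast<int>(state_));
    }
    State state_ = State::Idle;
};

int main() {
    BuyObjectRegulator r;
    r.Start();
    r.OnServerApproved();
    r.OnCommitted();
}
```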
TSO: Case Study Outline
(Lessons Learned)
- Poorly designed SP → MP → MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content
Visibility

Problems
- Debugging a client/server issue was very slow & painful
- Knowing what to work on next was largely guesswork
- Reproducing system failures from the live environment
- Knowing how one build or server cluster differed from another was again largely guesswork
What we did that worked
- Log / crash aggregators & filters
- Live "critical event" monitor
- Esper: live player & engine metrics
- Repeatable load testing
- Web-based Dashboard: health, status, where is everything
- Fully automated build & publish procedures
Visibility via “Bread Crumbs”:
Aggregated Instrumentation Flags
Quickly find trouble spots:
[Chart: aggregated instrumentation flags mark trouble spots in the run-up to a server crash; the DB byte count oscillates out of control, then the server crashes.]
Drill down for details:
[Chart: drilling into the flagged span shows a single DB request is clearly at fault.]
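A toy sketch of what such bread-crumb instrumentation can look like (names invented): code paths drop named counters, a periodic flush aggregates them, and those aggregated flags are what the dashboard charts and drill-downs are built from.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Invented breadcrumb counter: each interesting code path bumps a named flag.
class BreadCrumbs {
public:
    void Drop(const std::string& name, long amount = 1) { counts_[name] += amount; }

    // Periodically flushed to the aggregator that feeds the web dashboard.
    void Flush() {
        for (const auto& [name, count] : counts_)
            std::printf("crumb %-24s %ld\n", name.c_str(), count);
        counts_.clear();
    }

private:
    std::map<std::string, long> counts_;
};

int main() {
    BreadCrumbs crumbs;
    crumbs.Drop("db.request.bytes", 8192);  // e.g. the runaway DB byte count above
    crumbs.Drop("db.request.count");
    crumbs.Drop("sim.packet.dropped");
    crumbs.Flush();
}
```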
TSO: Case Study Outline
(Lessons Learned)
- Poorly designed SP → MP → MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content
Testability




- Development, release, load: all show-stopper problems
- QA coordination / speed / cost
- Repeatability, non-determinism
- Need for many, many tests per day, each with multiple inputs (two to two thousand players per test)
Testability: What Worked

- Automated testing for repeatability & scale
  - Scriptable test clients: mirrored actual user play sessions
  - Changed the game's architecture to increase testability
  - External test harnesses to control 50+ test clients per CPU, 4,000+ per session (see the sketch below)
  - Push-button UI to configure, run & analyze tests (developer & QA)
  - Constantly updated baselines, with "Monkey Test" stats
  - Pre-checkin regression
  - QA: web-driven state machine to control testers & collect/publish results
What didn't work
- Event recorders, unit testing
- Manual-only testing
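A toy sketch (everything here invented) of the harness idea referenced above: drive many headless scripted clients from one box, aggregate pass/fail, and reduce the run to a single exit code that a pre-checkin gate or load controller can act on.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical headless test client: runs one scripted session, reports pass/fail.
bool RunScriptedClient(int clientId) {
    // ... connect, run command/validation steps, tear down ...
    return clientId % 50 != 0;  // fake a single failure among 50 clients for the example
}

int main() {
    const int kClients = 50;  // "50+ test clients per CPU"
    std::atomic<int> failures{0};
    std::vector<std::thread> clients;

    for (int i = 0; i < kClients; ++i)
        clients.emplace_back([i, &failures] {
            if (!RunScriptedClient(i)) ++failures;
        });
    for (auto& t : clients) t.join();

    std::printf("%d/%d clients passed\n", kClients - failures.load(), kClients);
    return failures.load() == 0 ? 0 : 1;  // non-zero exit fails the pre-checkin gate
}
```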
MMP Automated Testing: Approach


- Push-button ability to run large-scale, repeatable tests
- Cost
  - Hardware / software
  - Human resources
  - Process changes
- Benefit
  - Accurate, repeatable, measurable tests during development and operations
  - Stable software; faster, measurable progress
  - Base key decisions on fact, not opinion
Why Spend The Time & Money?



- System complexity, non-determinism, scale
  - Tests provide hard data in a confusing sea of possibilities
- End users: high Quality of Service bar
- Dev team: greater comfort & confidence
  - Tools augment your team's ability to do their jobs
  - Find problems faster
  - Measure / change / measure: repeat as necessary
- Production & executives: come to depend on this data to a high degree
Scripted Test Clients

- Scripts are emulated play sessions: just like somebody playing the game
  - Command steps: what the player does to the game
  - Validation steps: what the game should do in response
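A minimal sketch of such a script, with hypothetical commands and syntax: each step is either a command (what the player does) or a validation (what the game must do in response).

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical script step: either a command (do) or a validation (expect).
struct Step {
    enum Kind { Command, Validate } kind;
    std::string text;
};

// An emulated play session: what the player does, and what the game should do back.
const std::vector<Step> kBuyChairScript = {
    {Step::Command,  "enter_lot lot=42"},
    {Step::Command,  "buy_object catalog_id=chair price=80"},
    {Step::Validate, "simoleons decreased_by 80"},
    {Step::Validate, "object chair exists_on_lot 42"},
    {Step::Command,  "logout"},
};

int main() {
    for (const auto& step : kBuyChairScript)
        std::printf("%-8s %s\n", step.kind == Step::Command ? "do" : "expect",
                    step.text.c_str());
}
```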
Scripts Tailored
To Each Test Application


- Unit testing: 1 feature = 1 script
- Load testing: representative play session
  - The average Joe, times thousands
- Shipping quality: corner cases, feature completeness
- Integration: test code changes for catastrophic failures
Scripted Players: Implementation
[Diagram: the Test Client next to the Game Client. In the Test Client, a Script Engine replaces the Game GUI; both drive the same Presentation Layer and Client-Side Game Logic underneath, exchanging State and Commands.]
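A sketch of the swap the diagram implies, with invented interface names: the GUI and the script engine are interchangeable drivers of the same presentation-layer API, so a test client is just the game client with the human removed.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Invented presentation-layer API shared by the real GUI and the test driver.
class PresentationLayer {
public:
    void IssueCommand(const std::string& cmd) {
        std::printf("client-side game logic runs: %s\n", cmd.c_str());
    }
    std::string QueryState(const std::string& /*key*/) { return "ok"; }  // stubbed game state
};

// Both drivers sit on top of the same client-side game logic.
class Driver {
public:
    virtual ~Driver() = default;
    virtual void Run(PresentationLayer& game) = 0;
};

class GameGui : public Driver {       // the shipping client: a human clicks buttons
public:
    void Run(PresentationLayer& game) override { game.IssueCommand("click buy_chair"); }
};

class ScriptEngine : public Driver {  // the test client: a script issues the same commands
public:
    void Run(PresentationLayer& game) override {
        game.IssueCommand("buy_object chair");
        std::printf("validate: %s\n", game.QueryState("chair_on_lot").c_str());
    }
};

int main() {
    PresentationLayer game;
    std::unique_ptr<Driver> driver = std::make_unique<ScriptEngine>();  // swapped in for GameGui
    driver->Run(game);
}
```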
Process Shift:
Earlier Tools Investment Equals More Gain
[Graph: MMP developer efficiency (amount of work done) over time, from project start to the target launch. With strong test support the curve climbs well past the "not good enough" line; with weak test support it stays below it.]
Process Shifts: Automated Testing
Changes the Shape of the Development Progress Curve
- Stability (code base & servers): keep developers moving forward, not bailing water
- Scale & feature completeness: focus developers on key, measurable roadblocks
Process Shift: Measurable Targets,
Projected Trend Lines
[Graph: core functionality tests passing for any feature (e.g. # clients) over time, from the first passing test, through "now", projected out to "target complete" at any chosen milestone (e.g. Alpha).]
Actionable progress metrics, early enough to react
Process Shift: Load Testing
(Before Paying Customers Show Up)
- Expose issues that only occur at scale
- Establish hardware requirements
- Establish that play is acceptable at scale (client-server comparison)
TSO: Case Study Outline
(Lessons Learned)
- Poorly designed SP → MP → MMP transitions
- Scaling
  - Team & code size, data set size
  - Build & distribution
  - Architecture: logical & code
- Visibility: development & operations
- Testability: development, release, load
- Multi-Player, Non-determinism
- Persistent user data vs code/content updates
- Patching / new content / custom content
User Data


Oops!
- Users stored much more data (with much more variance) than we had planned for
  - Caused many DB failures, city failures
  - BIG problem: their persistent data has to work, always, across all builds & DB instances
What helped
- Regression testing, each build, against the live set of user data
What would have helped more
- Sanity checks against the DB
- Range checks against user data (see the sketch below)
- Better code & architecture support for validation of user data
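A hedged sketch of that missing validation layer (record fields and limits invented): range-check each persisted record on load and before save, and quarantine bad rows instead of letting them take down a city server or the DB.

```cpp
#include <cstdio>
#include <vector>

// Invented persisted record standing in for a user/lot row.
struct UserRecord {
    int id;
    int simoleons;
    int objectCount;
};

// Sanity/range checks run on load and before each save.
bool Validate(const UserRecord& r) {
    if (r.simoleons < 0 || r.simoleons > 10'000'000) return false;  // invented limits
    if (r.objectCount < 0 || r.objectCount > 5'000)  return false;
    return true;
}

int main() {
    std::vector<UserRecord> rows = {{1, 2'500, 40}, {2, -7, 90'000}};  // second row is corrupt
    for (const auto& r : rows) {
        if (!Validate(r))
            std::printf("record %d failed range checks: quarantine it, don't crash the city\n", r.id);
    }
}
```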
Patching / New Content / Custom Content
Oops!
- Initial patch budget of 1 MB blown in the 1st week of operations
- New Content required a stronger, more predictable process
- Custom Content required infrastructure able to easily add new content, on the fly
- Key issue: all effort had gone into going Live, not creating a sustainable process once Live
- Conclusion: designing these in would have been much easier than retrofitting...
Lessons Learned
- autoTest: scripted test clients and instrumented code rock!
  - Collection, aggregation and display of test data is vital in making day-to-day decisions
  - Lessen the panic
    - Scale & break is a very clarifying experience
    - Stable code & servers greatly ease the pain of building an MMP game
    - Hard data (not opinion) is both illuminating and calming
- autoBuild: make it push-button with instant web visibility
  - Use early, use often to get bugs out before going live
- Budget for a strong architect role & a strong design review process for the entire game lifecycle
  - Scalability, testability, patching & new content & long-term persistence are requirements: MUCH cheaper to design in than to retrofit frantically
  - KISS principle is mandatory, as is expecting changes
Lessons Learned
- Visibility: tremendous volumes of data require automated collection & summarization
  - Provide drill-down access to details from summary-view web pages
- Get some people on board who've been burned before: a lot of TSO's pain could have been easily avoided, but little distributed-system or MMP design experience existed in the early phases of the project
  - Fred Brooks, the 31st programmer
- Strong tools & process pay off for large teams & long-term operations
  - Measure & improve your workspace, constantly
- Non-determinism is painful & unavoidable
  - Minimize its impact via explicit design support & use strong, constant calibration to understand it
Biggest Wins
- Code isolation
- Scaffolding
- Tools: build / test / measure, information management
- Pre-checkin regression / load testing
Biggest Losses
- Architecture: massively peer-to-peer
- Early lack of tools
- #ifdef across platform / function
- "Critical Path" dependencies

More details: www.maggotranch.com/MMP (3 TSO "Lessons Learned" talks)