A step towards data-orientation

Download Report

Transcript A step towards data-orientation

JOHAN TORP
STHLM GAME DEVELOPER FORUM 5/5 2011
<[email protected]>
› M.Sc. Computer Science. OO (Java) and functional programming (Haskell)
› Worked ~5 years outside game industry. C++, generic programming & boost, DbC
› AI coder at DICE ~2½ years
Optimal game dev credentials? 
•OOP
•GP
•FP
•DbC
•TMP
PC & normal sized apps = cache schmache
Games on consoles
5000 L2 misses = ~1ms
- data-oriented design ftw!
› A lot of OO code and knowledge out there
› Incrementally moving from OO to cache-friendlier code
›
›
›
›
Facts needed before looking at code
Cache-friendly pathfinding
Async vs sync code
Questions
›
›
›
›
›
Visual domain specific scripting language
Gameplay / AI code in C++
NavPower pathfinding middleware
EASTL containers
We love to blow up parts of our game worlds – and call this destruction
› PS3 has 1 core:
›
› 360 has 3 cores:
›
32KB data and 32KB instruction L1 cache
512KB L2 I+D cache
32KB data and 32KB instruction L1 cache for each core,
1Mb L2 I+D shared by all cores
› 1 L1 cache miss ~= 40 cycles
“You miss L1 so much that you cry yourself to sleep every night
with a picture of it under your pillow” @okonomiyonda
› 1 L2 cache miss ~= 600 cycles
› 1 L2 cache miss ~= 20 matrix multiplications
› Other than heavy calculations: CPU performance ~= cache misses
Keep copy of common data nearby … in a compact representation
Pointer chasing thrashes both I-cache and D-cache
getBot()->getPlayer()->getControllable()->getWorldPosition()
Often better to copy frequently accessed data once each frame, access copy instead
/// Temporary struct containing information about a single sensor.
/// Never stored between updates.
struct VisionInfo
{
VisionInfo(const AiSettings& settings, EntryComponent& owner, ...);
Vec3 eyePos;
Vec2 eyeForwardXz;
uint playerId;
// Extracted from settings
float centralAngle;
float peripheralAngle;
float seeingDistance;
bool seeThroughTerrain;
};
› Temporary data structures common in Data-Oriented Design
› Stack variables or alloca()
› Not suited for large amounts or large edge cases
› Put aside 8x128kb blocks for ”scratch pad calculations”
› Linear allocator – doesn’t free within block
› Return whole block when done – zero fragmentation
Find a good slot in fragmented memory space Expensive!
Container of new:ed objects scattered in memory Poor cache locality!
Mix short/long lived allocations -> fragmention Lose memory over time!
You should prefer pre-allocated flat vectors and try to minimize new/malloc
AI DECISION MAKING
OO
PATHFINDING
ANIMATION
NAVPOWER
OO
OO
›
›
›
›
›
›
Find path
Load / unload nav mesh section
Add / remove obstacles
Path invalidation detection
Can go-tests
Line- / can go straight-tests, circle tests, triangle tests
›
›
›
›
›
›
Find path
Load / unload nav mesh section
Add / remove obstacles
Path invalidation detection
Can go-tests
Line- / can go straight-tests, circle tests, triangle tests
Collect and batch process for good cache locality
› Pathfinder - find path, path invalidation, circle/line tests
› Random position generator - can go-tests
› Manager - load nav mesh, obstacles, destruction, updates
Let some line tests in AI decision making remain synchronous
class Pathfinder
{
virtual PathHandle* findPath(const PathfindingPosition& start,
const PathfindingPosition& end,
float corridorRadius,
PathHandle::StateListener* listener) = 0;
virtual void releasePath(PathHandle* path) = 0;
virtual bool canGoStraight(Vec3Ref start, Vec3Ref end,
Vec3* collision = nullptr) const = 0;
};
typedef eastl::fixed_vector<Vec3, 8> WaypointVector;
typedef eastl::fixed_vector<float, 8> WaypointRadiusVector;
struct PathHandle {
enum State {ComputingPath, ValidPath, NoPathAvailable, RepathingRequired};
class StateListener {
virtual void onStateChanged(PathHandle* handle) = 0;
};
PathHandle():waypoints(pathfindingArena()), radii(pathfindingArena()) {}
WaypointVector waypoints;
WaypointRadiusVector radii;
State state;
};
typedef eastl::fixed_vector<Vec3, 8> WaypointVector;
typedef eastl::fixed_vector<float, 8> WaypointRadiusVector;
struct PathHandle {
enum State {ComputingPath, ValidPath, NoPathAvailable, RepathingRequired};
class StateListener {
virtual void onStateChanged(PathHandle* handle) = 0;
};
PathHandle():waypoints(pathfindingArena()), radii(pathfindingArena()) {}
WaypointVector waypoints;
WaypointRadiusVector radii;
State state;
};
›
›
›
›
›
›
›
›
›
›
class NavPowerPathfinder : public Pathfinder {
public:
virtual PathHandle* findPath(...) override;
virtual PathHandle* findPathFromDestination(...) override;
virtual void releasePath(...) override;
virtual bool canGoStraight(...) const override;
void updatePaths();
void notifyPathListeners();
private:
bfx::PolylinePathRCPtr m_paths[MaxPaths];
PathHandle m_pathHandles[MaxPaths];
PathHandle::StateListener* m_pathHandleListeners[MaxPaths];
u64 m_usedPaths, m_updatedPaths, m_updatedValidPaths;
};
1.
2.
3.
4.
Copy all new NavPower paths -> temporary representation
Drop unnecessary points for all paths
Corridor adjust all paths
Copy temporaries -> PathHandles
1.
2.
3.
4.
Copy all new NavPower paths -> temporary representation
Drop unnecessary points for all paths
Corridor adjust all paths
Copy temporaries -> PathHandles
typedef eastl::vector<CorridorNode> Corridor;
ScratchPadArena scratch;
Corridor corridor(scratch);
corridor.resize(navPowerPath.size()); // Will allocate memory using scratch pad
for (...) { // Loop through all paths in their corridor representation
dropUnnecessaryPoints(it->corridor, scratchPad);
for (...)
shrinkEndPoints(it->corridor);
for (...)
calculateCornerDisplacements(it->corridor);
for (...)
displaceCorners(it->corridor);
for (...)
shrinkSections(it->corridor);
for (...)
copyCorridorToHandle(it->corridor, it->pathHandle);
}
void NavPowerManager::update(float frameTime)
{
m_streamingManager.update();
m_destructionManager.update();
m_obstacleManager.update();
for (PositionGeneratorVector::const_iterator it= ...)
(**it).update();
bfx::SystemSimulate(frameTime);
for (PathfinderVector::const_iterator it=m_pathfinders.begin(), ...)
(**it).updatePaths();
for (PathfinderVector::const_iterator it=m_pathfinders.begin(), ...)
(**it).notifyPathListeners();
}
AI Decision Making Code
Pathfinding Runtime Code
NavPower Code
Animation Code
EXECUTION
AI Decision Making Code
Pathfinding Runtime Code
NavPower Code
Animation Code
EXECUTION
›
›
›
›
Keep pathfinding code/data cache hot
Avoid call sites cache running cold
Easier to jobify / SPUify
Easy to schedule and avoid spikes
AI DECISION MAKING
OO
PATHFINDING
ANIMATION
NAVPOWER
OO
OO
SCRIPTING
SERVER
CLIENT
AI DECISION MAKING
PATH FOLLOWING
PATHFINDING
NAVPOWER
DRIVING
VEHICLE INPUT
LOCOMOTION
LOCOMOTION
Waypoint Positions
Waypoint Data
Corridor Radii
ANIMATION
Each server update
1. Each AI decision making
2. Pathfinding manager update
All pathfinding requests
All corridor adjustments
All PathHandle notifications -> path following -> server locomotion
3. Network pulse. Server locomotion -> client locomotion
4. ...rest of update
No extra latency added
›
›
›
›
Callbacks. Delay? Fire in batch?
Handle+poll instead of callbacks. Poll in batch?
Record messages/events, act on them later.. in batch?
Assume success, recover from failure next update
+ Cache friendly & parallelizable
+ Easy to profile & schedule
+ Avoid bugs with long synchronous callback chains
+ Modular
- More glue code
managers, handles, polling update calls, multiple representations of the same data
- More bugs
index fiddling, life time handling, latency, representations drifting out of sync
- Callstack won’t tell you everything
break point in sync code gives easy-to-debug vertical slice...
...but can we afford vertical deep dives?
› Do not have to abandon OO nor rewrite the world
› Start small, batch a bit, cut worst pointer chasing, avoid deep dives, grow from there
› Much easer to rewrite a system in a DO fashion afterwards
Existing code is crystallized knowledge, refactor incrementally to learn!
›Background: Console caches, heap allocations expensive, temporary memory
›AI decision making – pathfinding – animation
›Code: Async abstractions, handles, scratch pad, fixed_vector, batch processing
›Latency analysis, pros&cons sync vs async
Think about depth/width of calls, try stay within your system, keep hot data nearby
Avoid rewritis You can retract your synchronous tentacles slowly 
email [email protected]
twitter semanticspeed
slides www.johantorp.com