Robots at MySpace
Scaling a .NET Website with Microsoft Robotics Studio
Erik Nelson Group Architect / [email protected]
Akash Patel Senior Architect / [email protected]
Tony Chow Development Manager / [email protected]
Core Platform
MySpace.com
• MySpace is the largest Social Network
• … based in Los Angeles
• The largest .NET website in the world
• Can’t be big without Caching
• >6 million requests/second to the middle tier at peak
• Many TB of user generated cached data
• Data must not be stale
– Users hate that
• New and interesting features require more than just a “cache”
• Our middle tier is called Data Relay, because it does much more than just cache.
• Data Relay has been in production since 2006!
CCR
• What is CCR?
• Concurrency and Coordination Runtime
• Part of the Robotics Toolkit
• Provides
– Thread pools (Dispatcher)
– Job queues (DispatcherQueue)
– Flexible ways of connecting actions to those queues and pools (Ports and Arbiters)
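
The three pieces listed above can be illustrated with a minimal sketch. It is written in Python rather than C#, and it is only an analogy: the class names Dispatcher and Port are borrowed from the slide, but the methods and behavior shown here are assumptions for illustration, not the CCR API.

```python
import queue
import threading


class Dispatcher:
    """Fixed-size thread pool draining one job queue (the DispatcherQueue role)."""

    def __init__(self, thread_count):
        self.jobs = queue.Queue()
        for _ in range(thread_count):
            threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        while True:
            job = self.jobs.get()
            try:
                job()
            finally:
                self.jobs.task_done()


class Port:
    """Mailbox: every posted item becomes a job that runs the attached handler."""

    def __init__(self, dispatcher, handler):
        self._dispatcher = dispatcher
        self._handler = handler

    def post(self, item):
        self._dispatcher.jobs.put(lambda: self._handler(item))


def demo():
    results = []
    lock = threading.Lock()

    def handler(n):
        with lock:
            results.append(n * 2)

    dispatcher = Dispatcher(thread_count=2)
    port = Port(dispatcher, handler)
    for n in range(5):
        port.post(n)
    dispatcher.jobs.join()  # wait until every posted item has been handled
    return sorted(results)
```

The point of the split is that the code posting a message never runs the handler itself; the pool size and the queue can be tuned independently of the senders.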
Graphs are Cool
[Graph; vertical axis: Requests/Sec]
The Stream
• The stream is everywhere
• The stream is extremely volatile data
– Both the “who” and the “what”
• Updates from our Users don’t just go to us
– Twitter, Google, etc
• ~60,000 Stream Queries per Second
– Over 5 billion a day
• 35 million updates a day
• ~5 TB of Data in our Stream
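
A quick back-of-the-envelope check of the figures above, sketched in Python. It assumes the approximate 60,000 queries per second is sustained over a whole day, which the slide does not state, so the results are rough estimates only.

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

queries_per_second = 60_000              # "~60,000 Stream Queries per Second"
updates_per_day = 35_000_000             # "35 million updates a day"

queries_per_day = queries_per_second * SECONDS_PER_DAY
updates_per_second = updates_per_day / SECONDS_PER_DAY
reads_per_write = queries_per_day / updates_per_day

print(f"{queries_per_day:,} queries/day")                      # 5,184,000,000
print(f"{updates_per_second:.0f} updates/second on average")   # 405
print(f"{reads_per_write:.0f} reads per write")                # 148
```

Under that assumption the stream is read roughly 150 times for every write, which is consistent with the "Over 5 billion a day" figure.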
Why not a DB?
• We decided to be “publisher” based and not
“subscriber” based
• For us, that would involve a massively
distributed query
– Hundreds of databases
• Decoupling writing from reading
OK So How Then?
Robots!
Robots?
• Lots of inputs and outputs!
• Need for minimum latency and decoupling
between jobs!
• Just like a robot!
Abusing a Metaphor
• Our robots must
– Incorporate incoming messages
– Tell their neighbors about any messages they
receive
– Be able to answer lots of questions
– Talk to other robots when they need more info
– Deal with other robots being slow or missing
How Does CCR Help?
• Division of labor
– Incorporate incoming messages
– Tell their neighbors about any messages they
receive
– Be able to answer lots of questions
– Talk to other robots when they need more info
– Deal with other robots being slow or missing
How Does CCR Help?
• Queue Control
– We can has Buckets
• Queue Division
– Different destinations have their own queues
• Strict Pool Control
Akash Patel
Senior Architect
Activity Stream
• Activity Stream (News Feed)
– Aggregation of your friends’ activities
• Activity Stream Generation
– Explicitly: Status Update
– Implicitly: Post New Photo Album
– Auto: 3rd Party App
Friends & Activities
[Animated diagram: Publisher Based Cache]
• Imagine this is You, with Your Friends
• You post a new status update .. an index is created
• You upload a new photo album .. Index Updated
• Index grows with new activities
• Where’s the Activity Stream?
– Activity Associated to Publishing User
Friends & Activities
• Activity Stream Generated by Querying
– Filter & Merge Friends’ Activities
Very Volatile
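
The "Filter & Merge" step can be sketched as follows, in Python rather than C#. The data layout (each friend's activities as a newest-first list of (timestamp, type, text) tuples) and the function name are assumptions for illustration; the slides do not describe the Index Cache's actual structures.

```python
import heapq
from itertools import islice


def merge_friend_activities(per_friend, wanted_types=None, limit=10):
    """Filter each friend's newest-first list, then merge them newest-first.

    per_friend: dict of friend id -> list of (timestamp, type, text),
    each list already sorted by timestamp, newest first.
    """
    filtered = (
        [a for a in activities if wanted_types is None or a[1] in wanted_types]
        for activities in per_friend.values()
    )
    # heapq.merge streams the already-sorted inputs without re-sorting them.
    merged = heapq.merge(*filtered, key=lambda a: a[0], reverse=True)
    return list(islice(merged, limit))
```

Because every publisher's list is kept sorted at write time, the read side only has to merge and can stop as soon as it has enough items.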
Stream Architecture
• Utilizes Data Relay Framework
– Message Based System
• Fire & Forget Msgs [Save, Delete, Update]
• RoundTrip Msgs [Get, Query, Execute]
– Replication & Clustering Built-in
• Index Cache
– Not a Key/Value Store
– Storage & Querying System
– 2 Tiered System (separates index from data)
Data Relay Architecture
Data is Partitioned across clusters
Data is Replicated within Clusters
[Diagram: Groups A and B; Cluster 1 (N1, N2), Cluster 2 (N3, N4, N5), Cluster 3 (N6, N7, N8, N9); hierarchy: Group, Cluster, Node]
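
A minimal sketch of "partitioned across clusters, replicated within clusters", in Python rather than C#. The hash-modulo placement, the Group class, and its save/get methods are assumptions for illustration; the slides do not say how Data Relay actually picks a cluster for a key.

```python
import zlib


class Group:
    """One group: each key belongs to exactly one cluster, on all of its nodes."""

    def __init__(self, clusters):
        self.clusters = clusters  # cluster name -> list of node names
        self.nodes = {n: {} for members in clusters.values() for n in members}

    def cluster_for(self, key):
        # Assumed placement rule: stable hash of the key, modulo cluster count.
        names = sorted(self.clusters)
        return names[zlib.crc32(key.encode("utf-8")) % len(names)]

    def save(self, key, value):
        # Replication: write to every node of the owning cluster.
        for node in self.clusters[self.cluster_for(key)]:
            self.nodes[node][key] = value

    def get(self, key, replica=0):
        # Any replica in the owning cluster can answer the read.
        members = self.clusters[self.cluster_for(key)]
        return self.nodes[members[replica % len(members)]].get(key)


group_a = Group({
    "Cluster 1": ["N1", "N2"],
    "Cluster 2": ["N3", "N4", "N5"],
    "Cluster 3": ["N6", "N7", "N8", "N9"],
})
group_a.save("user:42", "status update")
```

Reads for "user:42" can then be served by any node in the owning cluster, while nodes in the other clusters never see the key.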
Stream Architecture
[Diagram: the Activities Index laid out over Cluster 1 (N1, N2), Cluster 2 (N3, N4, N5), and Cluster 3 (N6, N7, N8, N9)]
Activity Stream Update
[Diagram: a New Activity Msg entering the cluster layout (Clusters 1 to 3, nodes N1 to N9)]
CCR Perspective
[Diagram: N1 sends a New Activity Msg to the Node 2 Proxy (Destination Node). Fire & Forget Msgs and Round Trip Msgs are posted to Ports (labeled Port 1 and Port 2); Arbiters connect the Ports to Dispatcher Queues, each served by a Thread Pool.]
Activity Stream Request
[Diagram: the Client sends an Activity Stream Request. The Distributed Query over the FriendList is split into SubQuery FriendList1, SubQuery FriendList2, and SubQuery FriendList3 across Clusters 1 to 3 (nodes N1 to N9).]
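
The scatter and gather steps of such a distributed query might look like the sketch below, in Python rather than C#. The placement rule (friend id modulo cluster count), the in-memory cluster stores, and all function names are hypothetical; they only illustrate splitting one friend list into per-cluster sub-queries and merging the answers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def cluster_of(friend_id, cluster_count):
    return friend_id % cluster_count  # hypothetical placement rule


def run_sub_query(store, friend_ids):
    """One cluster answers for its share of the friend list, newest first."""
    hits = [activity for f in friend_ids for activity in store.get(f, [])]
    return sorted(hits, reverse=True)


def activity_stream(friend_list, cluster_stores, limit=5):
    # Scatter: split the friend list into one sub-list per cluster.
    sub_lists = [[] for _ in cluster_stores]
    for friend_id in friend_list:
        sub_lists[cluster_of(friend_id, len(cluster_stores))].append(friend_id)
    # Run the sub-queries in parallel, one per cluster.
    with ThreadPoolExecutor(max_workers=len(cluster_stores)) as pool:
        results = list(pool.map(run_sub_query, cluster_stores, sub_lists))
    # Gather: merge the newest-first sub-results into one stream.
    return list(heapq.merge(*results, reverse=True))[:limit]
```

Each store here maps a friend id to a list of (timestamp, text) tuples; a real node would be reached over the network instead of through a local dict.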
CCR Perspective
[Diagram: the Client sends an Activity Stream Query to the Node 1 Proxy (Destination Node). Fire & Forget Msgs and Round Trip Msgs are posted to Ports (labeled Port 1 and Port 2); Arbiters connect the Ports to Dispatcher Queues, each served by a Thread Pool.]
Activity Stream Request
[Diagram: Sub-Query Result1, Sub-Query Result2, and Sub-Query Result3 return from Clusters 1 to 3 (nodes N1 to N9) and form the Query Result]
Activity Stream Request
[Diagram: Sub-Query Result1, Sub-Query Result2, and Sub-Query Result3 from Clusters 1 to 3 (nodes N1 to N9) form the Query Result, which leads to the Activity Stream Response]
Activity Stream Request
[Diagram labels: Activity Index Cache, Query Result, Activities Data Cache, Activity Stream Response]
More Graphs
[Graph: Requests/Sec for Stream Requests and Index Gets]
Index Cache
• De Facto Distributed Querying Platform
– Sort, Merge, Filter
• Ubiquitous when Key/Value Store is not
enough
– Activity Stream
– Videos
– Music
– MySpace Developer Platform
Robots Processing Your Every Move!
• CCR constructs in every NodeProxy
– Ports
– Arbiters
– Dispatcher Queues
– Dispatchers (Shared)
• Message Batching
– Arbiter.Choice
• Arbiter.MultipleItemReceive
• Arbiter.Receive from the TimeoutPort
• Thread Pool Flexibility
– Number of pools
– Flexibility to set & change pool size dynamically*
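
The batching rule named above (take a full batch when one is ready, otherwise whatever has arrived when a timeout fires) can be sketched in Python. This is a stand-in built on queue.Queue, not the CCR Arbiter calls themselves; receive_batch and its parameters are hypothetical names.

```python
import queue
import time


def receive_batch(q, max_items, timeout_s):
    """Return up to max_items messages, or fewer if timeout_s elapses first."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_items:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout branch: send what we have
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return batch
```

With five messages queued and max_items=3, the first call returns three messages immediately and the second returns the remaining two once the timeout expires.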
Activity Stream
• Activities are everywhere
Twitter
MySpace
Google
Tony Chow
Development Manager
Google Searchable Stream
Real-Time Stream
• Pushes user activities out to subscribers using
the PubSubHubbub standard
• Anyone can subscribe to the Real-Time
Stream, free of charge
• Launched in December 2009
• Major subscribers: Google, Groovy, OneRiot
• ~100 million messages delivered per day
What doesn’t work
[Diagram: Users, Front End, and subscribers Google, Groovy, OneRiot, Slow.com]
The Challenges
• Protect the user experience
• Constant stream to healthy subscribers
• Give all subscribers a fair chance at trying
• Prevent unhealthy subscribers from doing damage
Streaming Architecture
[Diagram labels: Users, Front End, activities, Filter, Transaction Manager, <atom> delivery, Subscribers]
Policing the Stream
• Queue
• Partition
• Throttle
• Async I/O
[Diagram: subscribers Google, Groovy, Slow.com, iLike]
Policing the Stream
• So far so good—for occasionally slow
subscribers
• But chronically underperforming subscribers
call for more drastic measures
Policing the Stream
• Discard
• Unsubscribe
[Diagram: Filter and subscribers Google, Groovy, Slow.com, iLike, with an unsubscribe action]
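
The policing measures from the last three slides (a separate queue per subscriber, discarding when a queue overflows, and unsubscribing after repeated failures) can be sketched in Python. The thresholds, the choice to drop the oldest message, and the SubscriberChannel class are assumptions for illustration; throttling and async I/O are left out of this sketch.

```python
from collections import deque


class SubscriberChannel:
    """One queue per subscriber, so a slow one cannot hold up the others."""

    def __init__(self, name, max_queue=3, max_failures=2):
        self.name = name
        self.queue = deque()
        self.max_queue = max_queue
        self.max_failures = max_failures
        self.failures = 0
        self.discarded = 0
        self.subscribed = True

    def enqueue(self, message):
        if not self.subscribed:
            return
        if len(self.queue) >= self.max_queue:
            self.queue.popleft()  # Discard: assumed policy, drop the oldest
            self.discarded += 1
        self.queue.append(message)

    def deliver(self, send):
        """send(message) returns True on success; returns delivered messages."""
        delivered = []
        while self.subscribed and self.queue:
            if send(self.queue[0]):
                delivered.append(self.queue.popleft())
                self.failures = 0
            else:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.subscribed = False  # Unsubscribe the chronic offender
                    self.queue.clear()
                break
        return delivered
```

A healthy channel keeps draining its own queue while a failing one backs off and is eventually dropped, without either affecting the other.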
Transaction Manager is Everywhere @ MySpace!
• Generic platform for reliable persistence
• Supports SQL, SOAP, REST, and SMTP calls
• MySpace Mail
• Friend Requests
• Status/Mood Update
• And much more!
The Role of CCR
• CCR is integral to DataRelay
• CCR Iterator Pattern for Async I/O
Asynchronous I/O
• Synchronous I/O
– Needs lots of threads to do lots of I/O
– Massive context switching
– Doesn’t scale
• Asynchronous I/O
– Efficient use of threads
– Massively scales
– Hard to program, harder to read
– Gnarly and unmaintainable code
The CCR Iterator Pattern
• A better way to write async code
– C# Iterators—makes enumerators easier
– CCR Iterators—makes async I/O easier
• Makes async code look like sync code
The Difference
void Before()
{
    cmd1.BeginExecuteNonQuery(
        result1 =>
        {
            cmd1.EndExecuteNonQuery();
            cmd2.BeginExecuteNonQuery(
                result2 =>
                {
                    cmd2.EndExecuteNonQuery();
                });
        });
}

IEnumerable<ITask> After()
{
    cmd1.BeginExecuteNonQuery(result=>port.Post(1));
    yield return Arbiter.Receive(...);
    cmd1.EndExecuteNonQuery();
    cmd2.BeginExecuteNonQuery(result=>port.Post(1));
    yield return Arbiter.Receive(...);
    cmd2.EndExecuteNonQuery();
}
The CCR Iterator Pattern
• Improves readability and maintainability
• Far less bug-prone
• Indispensable for asynchronous programming
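
The idea behind the iterator pattern can be shown with a small Python analogy: a generator yields each pending operation, and a driver resumes it when the operation's callback fires. The helpers begin_execute and drive are hypothetical, and the callback is invoked immediately to keep the sketch deterministic; real async I/O would call it later from a completion thread.

```python
def begin_execute(command):
    """Return a starter that runs `command` and reports through a callback."""
    def start(on_done):
        on_done(f"{command} done")  # stand-in for an I/O completion callback
    return start


def drive(task):
    """Resume the generator each time the operation it yielded completes."""
    def step(result=None):
        try:
            start = task.send(result)
        except StopIteration:
            return
        start(step)
    step()


def after(log):
    # Reads top to bottom like synchronous code, yet never blocks a thread.
    first = yield begin_execute("cmd1")
    log.append(first)
    second = yield begin_execute("cmd2")
    log.append(second)


log = []
drive(after(log))
print(log)  # ['cmd1 done', 'cmd2 done']
```

The nested callbacks of the "Before" style are still there, but they live once in the driver instead of in every piece of business logic.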
What Now?
• We didn’t show any code samples…
• Because we are going to share more than
samples …
WE ARE
OPEN SOURCING!!
Open Source
• http://DataRelay.CodePlex.com
• Lesser GPL License for…
– Data Relay Base
– Our C#/Managed C++ Berkeley DB Wrapper and
Storage Component
– Index Cache System
– Network transport
– Serialization System
What Now?
• Places in our code with CCR
– Bucketed batch
• \Infrastructure\DataRelay\RelayComponent.Forwarding\Node.cs - ActivateBurstReceive(int count)
– Distributed bulk message handling
• \Infrastructure\DataRelay\RelayComponent.Forwarding\Forwarder.cs - HandleMessages
– General Message Handling
• \Infrastructure\DataRelay\DataRelay.RelayNode\RelayNode.cs
• \Infrastructure\SocketTransport\Server\SocketServer.cs
Evaluate Us!
Please fill out an evaluation for our
presentation!
More evaluations = more better for everyone.
Thank You! Questions?
• Erik Nelson
– [email protected]
• Akash Patel
– [email protected]
• Tony Chow
– [email protected]
• http://DataRelay.CodePlex.com