Robots at MySpace: Scaling a .NET Website with Microsoft Robotics Studio
Erik Nelson, Group Architect / [email protected]
Akash Patel, Senior Architect / [email protected]
Tony Chow, Development Manager / [email protected]
Core Platform, MySpace

MySpace.com
• MySpace is the largest social network, based in Los Angeles
• The largest .NET website in the world
• Can't be big without caching
• >6 million requests/second to the middle tier at peak
• Many TB of user-generated cached data
• Data must not be stale – users hate that
• New and interesting features require more than just a "cache"
• Our middle tier is called Data Relay, because it does much more than just cache
• Data Relay has been in production since 2006!

CCR
• What is CCR? The Coordination and Concurrency Runtime
• Part of the Robotics toolkit
• Provides:
– Thread pools (Dispatcher)
– Job queues (DispatcherQueue)
– Flexible ways of connecting actions to those queues and pools (Ports and Arbiters)

Graphs are Cool
[Chart: requests per second]

The Stream
• The stream is everywhere
• The stream is extremely volatile data – both the "who" and the "what"
• Updates from our users don't just go to us – Twitter, Google, etc.
• ~60,000 stream queries per second – over 5 billion a day
• 35 million updates a day
• ~5 TB of data in our stream

Why not a DB?
• We decided to be "publisher" based and not "subscriber" based
• For us, a DB would involve a massively distributed query – hundreds of databases
• Decoupling writing from reading

OK, So How Then? Robots!

Robots?
• Lots of inputs and outputs!
• Need for minimum latency and decoupling between jobs!
• Just like a robot!

Abusing a Metaphor
• Our robots must:
– Incorporate incoming messages
– Tell their neighbors about any messages they receive
– Be able to answer lots of questions
– Talk to other robots when they need more info
– Deal with other robots being slow or missing

How Does CCR Help?
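The CCR constructs named above (Dispatcher, DispatcherQueue, Port, Arbiter) ship as part of Microsoft's .NET Robotics toolkit, so a minimal, hedged Python analogue may help show the shape of the model: a thread pool pulls work from a queue, and posting a message to a port schedules a handler onto that pool. All class and variable names here are illustrative stand-ins, not the real CCR API.

```python
import queue
import threading

class Dispatcher:
    """Rough analogue of a CCR Dispatcher: a fixed thread pool pulling work."""
    def __init__(self, thread_count=2):
        self.work = queue.Queue()          # stand-in for a DispatcherQueue
        self.threads = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(thread_count)]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            task = self.work.get()
            if task is None:               # sentinel: shut this worker down
                return
            task()

    def stop(self):
        for _ in self.threads:
            self.work.put(None)
        for t in self.threads:
            t.join()

class Port:
    """Rough analogue of a CCR Port: posting an item schedules its handler."""
    def __init__(self, dispatcher, handler):
        self.dispatcher = dispatcher
        self.handler = handler

    def post(self, item):
        self.dispatcher.work.put(lambda: self.handler(item))

# Usage: post two messages and wait until both handlers have run.
dispatcher = Dispatcher(thread_count=2)
done = threading.Event()
results = []

def handle(msg):
    results.append(msg.upper())
    if len(results) >= 2:
        done.set()

port = Port(dispatcher, handle)
port.post("save")
port.post("get")
done.wait(timeout=5)
dispatcher.stop()
```

In real CCR the "connecting" piece is the Arbiter, which binds ports to handlers with richer policies (choice, joins, batching); this sketch hard-wires one handler per port to keep the idea visible.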
• Division of labor:
– Incorporate incoming messages
– Tell their neighbors about any messages they receive
– Be able to answer lots of questions
– Talk to other robots when they need more info
– Deal with other robots being slow or missing

How Does CCR Help?
• Queue control – We can has Buckets
• Queue division – different destinations have their own queues
• Strict pool control

Akash Patel, Senior Architect

Activity Stream
• Activity Stream (News Feed) – an aggregation of your friends' activities
• Activity Stream generation:
– Explicit: status update
– Implicit: post a new photo album
– Auto: 3rd-party app

Friends & Activities
• Imagine you post a new status update… upload a new photo album… where's this new Activity Stream?
• Publisher-based: an index is created in the cache, with each activity associated to the publishing user
• Your index grows and is updated with new activities

Friends & Activities
• The Activity Stream is generated by querying – filter & merge your friends' activities
• Very volatile

Stream Architecture
• Utilizes the Data Relay framework – a message-based system:
– Fire & forget messages [Save, Delete, Update]
– Round-trip messages [Get, Query, Execute]
– Replication & clustering built in
• Index Cache:
– Not a key/value store
– A storage & querying system
– A 2-tiered system (separates index from data)

Data Relay Architecture
• Data is partitioned across clusters and replicated within clusters
• Hierarchy: Group → Cluster → Node
[Diagram: Groups A and B; each group holds Clusters 1–3 containing nodes N1–N9]

Stream Architecture
[Diagram: the Activities Index lives on the partitioned clusters N1–N9]

Activity Stream Update
[Diagram: a New Activity Msg is routed to its destination node among Clusters 1–3]

CCR Perspective
[Diagram: the New Activity Msg reaches the Node 2 proxy (destination node); fire & forget and round-trip messages arrive on separate Ports, Arbiters move them onto Dispatcher Queues, and shared thread-pool Dispatchers run the handlers]

Activity Stream Request
[Diagram: the client's request becomes a distributed query; SubQuery FriendList1 is sent to Cluster 1…]
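The partition-then-replicate layout described above (data partitioned across clusters, replicated to every node within the owning cluster) can be sketched in a few lines. This is a hedged illustration, not Data Relay's actual routing code: the cluster/node names, the `route` hash, and the `save` helper are all invented for the example.

```python
# Data Relay-style layout: partition across clusters, replicate within one.
clusters = {
    "Cluster1": ["N1", "N2", "N3"],
    "Cluster2": ["N4", "N5", "N6"],
    "Cluster3": ["N7", "N8", "N9"],
}

def route(key):
    """Pick the owning cluster for a key (partitioning step)."""
    names = sorted(clusters)
    return names[sum(map(ord, key)) % len(names)]   # deterministic toy hash

def save(store, key, value):
    """Fire & forget Save: write to every replica in the owning cluster."""
    cluster = route(key)
    for node in clusters[cluster]:
        store.setdefault(node, {})[key] = value
    return cluster

store = {}
owner = save(store, "user:42:index", ["new status update"])
# Every node in the owning cluster now holds a replica of the index.
```

Because writes only touch one cluster while that cluster holds several replicas, any replica can answer reads, which is what lets a "fire & forget" write model coexist with heavy query traffic.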
[Diagram, continued: SubQuery FriendList2 is sent to Cluster 2 and SubQuery FriendList3 to Cluster 3]

CCR Perspective
[Diagram: the client's Activity Stream query reaches the Node 1 proxy (destination node); fire & forget and round-trip messages flow through separate Ports and Arbiters onto Dispatcher Queues backed by thread pools]

Activity Stream Response
[Diagram: Sub-Query Results 1–3 return from Clusters 1–3 and are merged into the Query Result, which becomes the Activity Stream response]
[Diagram: the Activity Index Cache and the Activities Data Cache are separate tiers]

More Graphs
[Chart: requests per second – Stream Requests vs. Index Gets]

Index Cache
• De facto distributed querying platform – sort, merge, filter
• Ubiquitous when a key/value store is not enough:
– Activity Stream
– Videos
– Music
– MySpace Developer Platform

Robots Processing Your Every Move!
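The scatter-gather query in the diagrams above (fan sub-queries out to each cluster, then sort/merge/filter the sub-results) can be sketched as follows. The data layout, `sub_query` helper, and timestamps are all made up for illustration; the real system runs this across the network via Data Relay's round-trip messages.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy per-cluster indexes: cluster -> {friend: [(timestamp, activity), ...]}
cluster_indexes = {
    "Cluster1": {"alice": [(3, "alice posted a photo")]},
    "Cluster2": {"bob": [(5, "bob updated his status")]},
    "Cluster3": {"carol": [(1, "carol added an app")]},
}

def sub_query(cluster, friends):
    """One cluster's answer: activities for whichever friends it holds."""
    out = []
    for f in friends:
        out.extend(cluster_indexes[cluster].get(f, []))
    return sorted(out)          # each sub-result comes back time-sorted

def activity_stream(friends, limit=10):
    """Scatter the friend list to every cluster, then merge the results."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda c: sub_query(c, friends),
                              cluster_indexes))
    merged = heapq.merge(*parts)    # merge already-sorted sub-results
    return [activity for _, activity in merged][-limit:]

stream = activity_stream(["alice", "bob", "carol"])
# stream holds the three activities ordered by timestamp.
```

The key property is that each cluster sorts locally, so the coordinating node only pays for an n-way merge rather than a full re-sort, which matters at ~60,000 stream queries per second.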
• CCR constructs in every NodeProxy:
– Ports
– Arbiters
– Dispatcher Queues
– Dispatchers (shared)
• Message batching:
– Arbiter.Choice
– Arbiter.MultipleItemReceive
– Arbiter.Receive from the TimeoutPort
• Thread-pool flexibility:
– Number of pools
– Flexibility to set & change pool size dynamically*

Activity Stream
• Activities are everywhere – Twitter, MySpace, Google

Tony Chow, Development Manager

Google Searchable Stream

Real-Time Stream
• Pushes user activities out to subscribers using the PubSubHubbub standard
• Anyone can subscribe to the Real-Time Stream, free of charge
• Launched in December 2009
• Major subscribers: Google, Groovy, OneRiot
• ~100 million messages delivered per day

What Doesn't Work
[Diagram: users → front end → direct delivery to Google, Groovy, OneRiot, and Slow.com; one slow subscriber stalls the rest]

The Challenges
• Protect the user experience
• Keep a constant stream flowing to healthy subscribers
• Give all subscribers a fair chance at trying
• Prevent unhealthy subscribers from doing damage

Streaming Architecture
[Diagram: users → front end → activities → Filter → Transaction Manager → <atom> delivery to subscribers]

Policing the Stream
• Queue
• Partition
• Throttle
• Async I/O
[Diagram: Filter and Transaction Manager feed Google, Groovy, Slow.com, and iLike]

Policing the Stream
• So far so good – for occasionally slow subscribers
• But chronically underperforming subscribers call for more drastic measures:
– Discard
– Unsubscribe
[Diagram: the Filter unsubscribes Slow.com while Google, Groovy, and iLike keep receiving]

Transaction Manager is Everywhere @ MySpace!
• Generic platform for reliable persistence
• Supports SQL, SOAP, REST, and SMTP calls
• MySpace Mail
• Friend requests
• Status/mood updates
• And much more!
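The queue/partition/throttle/discard/unsubscribe policy above can be sketched with one bounded queue per subscriber: occasional slowness only backs up (and eventually discards from) that subscriber's own queue, while chronic failure triggers the unsubscribe. The thresholds, class, and subscriber behavior below are invented for illustration, not MySpace's actual numbers.

```python
from collections import deque

class SubscriberQueue:
    """One bounded delivery queue per subscriber (partition + queue)."""
    def __init__(self, name, max_pending=3, max_failures=5):
        self.name = name
        self.pending = deque()
        self.failures = 0
        self.max_pending = max_pending
        self.max_failures = max_failures
        self.subscribed = True

    def enqueue(self, activity):
        if not self.subscribed:
            return
        if len(self.pending) >= self.max_pending:
            self.pending.popleft()       # discard oldest rather than block
        self.pending.append(activity)

    def deliver(self, send):
        """Attempt one delivery; unsubscribe after repeated failures."""
        if not (self.subscribed and self.pending):
            return
        if send(self.pending[0]):
            self.pending.popleft()
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.subscribed = False  # drastic measure: unsubscribe

subs = [SubscriberQueue("google"),
        SubscriberQueue("slow.com", max_failures=2)]
for i in range(5):
    for s in subs:
        s.enqueue(f"activity-{i}")
# google accepts every delivery; slow.com times out every time.
for _ in range(5):
    subs[0].deliver(lambda a: True)
    subs[1].deliver(lambda a: False)
```

The point of the per-subscriber partition is isolation: Slow.com's failures never consume the threads or queue space that Google's deliveries need, which is exactly the "protect healthy subscribers" challenge listed above.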
The Role of CCR
• CCR is integral to Data Relay
• The CCR iterator pattern for async I/O

Asynchronous I/O
• Synchronous I/O:
– Needs lots of threads to do lots of I/O
– Massive context switching
– Doesn't scale
• Asynchronous I/O:
– Efficient use of threads
– Scales massively
– Hard to program, harder to read
– Gnarly and unmaintainable code

The CCR Iterator Pattern
• A better way to write async code
– C# iterators make enumerators easier
– CCR iterators make async I/O easier
• Makes async code look like sync code

The Difference

Before (nested callbacks):

    void Before()
    {
        cmd1.BeginExecuteNonQuery(result1 =>
        {
            cmd1.EndExecuteNonQuery();
            cmd2.BeginExecuteNonQuery(result2 =>
            {
                cmd2.EndExecuteNonQuery();
            });
        });
    }

After (CCR iterator):

    IEnumerable<ITask> After()
    {
        cmd1.BeginExecuteNonQuery(result => port.Post(1));
        yield return Arbiter.Receive(...);
        cmd1.EndExecuteNonQuery();
        cmd2.BeginExecuteNonQuery(result => port.Post(1));
        yield return Arbiter.Receive(...);
        cmd2.EndExecuteNonQuery();
    }

The CCR Iterator Pattern
• Improves readability and maintainability
• Far less bug-prone
• Indispensable for asynchronous programming

What Now?
• We didn't show any code samples…
• Because we are going to share more than samples…

WE ARE OPEN SOURCING!!

Open Source
• http://DataRelay.CodePlex.com
• Lesser GPL license for:
– Data Relay base
– Our C#/Managed C++ Berkeley DB wrapper and storage component
– Index Cache system
– Network transport
– Serialization system

What Now?
• Places in our code with CCR:
– Bucketed batch: \Infrastructure\DataRelay\RelayComponent.Forwarding\Node.cs – ActivateBurstReceive(int count)
– Distributed bulk message handling: \Infrastructure\DataRelay\RelayComponent.Forwarding\Forwarder.cs – HandleMessages
– General message handling: \Infrastructure\DataRelay\DataRelay.RelayNode\RelayNode.cs and \Infrastructure\SocketTransport\Server\SocketServer.cs

Evaluate Us!
Please fill out an evaluation for our presentation! More evaluations = more better for everyone.

Thank You! Questions?
• Erik Nelson – [email protected]
• Akash Patel – [email protected]
• Tony Chow – [email protected]
• http://DataRelay.CodePlex.com
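As a closing aside, the CCR iterator pattern from "The Difference" slide has a loose Python analogue: a generator yields at each point where it is waiting on an asynchronous completion, and a small scheduler resumes it when the completion arrives, so callback-style code reads top to bottom. Everything below is an illustrative stand-in (CCR does this with IEnumerable<ITask>, Ports, and Arbiter.Receive; the `begin_execute` and `run` helpers here are invented).

```python
def begin_execute(name, completions):
    """Fake Begin* call: pretend the async operation completed."""
    completions.append(name)

def run(iterator):
    """Minimal scheduler: resume the iterator once per yielded wait."""
    for _ in iterator:
        pass    # in CCR, an Arbiter would resume on a real Port message

def save_then_update(log):
    """Two 'async commands' written as straight-line code."""
    completions = []
    begin_execute("cmd1", completions)
    yield "waiting for cmd1"          # CCR: yield return Arbiter.Receive(...)
    log.append(f"cmd1 done ({completions[0]})")
    begin_execute("cmd2", completions)
    yield "waiting for cmd2"
    log.append(f"cmd2 done ({completions[1]})")

log = []
run(save_then_update(log))
```

The readability win is the same one the slide claims for CCR: the waits are explicit `yield` points, but the control flow stays sequential instead of nesting one callback inside another.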