Slide 1: Adaptive Parallelism for Web Search
Myeongjae Jeon, Rice University
In collaboration with Yuxiong He (MSR), Sameh Elnikety (MSR), Alan L. Cox (Rice), and Scott Rixner (Rice)

Slide 2: Performance of Web Search
1) Query response time
   – Answer users quickly
   – Reduce both mean and high-percentile latency
2) Response quality (relevance)
   – Provide highly relevant web pages
   – Quality improves with the resources and time consumed

Slide 3: How Microsoft Bing Works
• All web pages are partitioned across index servers
• Distributed query processing (embarrassingly parallel): the aggregator sends each query to every index server, and each server searches its own partition
• The aggregator merges the per-partition results into the top K relevant pages
[Figure: an aggregator fans a query out to index servers, one partition of all web pages per server; each server returns its top K pages]

Slide 4: Our Work
• Multicore index server
   – Multiple queries executed concurrently
   – Each query executed sequentially
• Optimization opportunity
   – Low CPU utilization
   – Many idle cores
• Contributions
   – Parallelize a single query
   – Reduce response times

Slide 5: Outline
1. Query parallelism: run a query with multiple threads
2. Adaptive parallelism: select the degree of parallelism per query
3. Evaluation

Slide 6: Query Processing and Early Termination
[Figure: web documents Doc 1 through Doc N sorted by static rank, from highest to lowest]
• Matching documents come from the inverted index (e.g., the posting list for “EuroSys”); processing terminates early, leaving a “not evaluated” tail
• Processing the “not evaluated” part is useless: those documents are unlikely to contribute to the top K relevant results

Slide 7: Assigning Data to Threads
• Purpose: make parallel data processing resemble sequential execution over the rank-sorted documents
• Key approach: threads dynamically alternate over small chunks of the sorted documents (e.g., chunks 1 through 8 claimed as T1, T2, T2, T2, T1, T2, T1, T2)

Slide 8: Share Thread Info Using a Global Heap
• Each thread keeps a local top-k heap and shares information to reduce wasted execution
• Threads update a global heap (the global top K) asynchronously
• Batched updates keep the synchronization overhead small

Slide 9: Outline
1. Query parallelism: run a query with multiple threads
2. Adaptive parallelism: select the degree of parallelism per query
3. Evaluation

Slide 10: Key Ideas
• System load
   – Parallelize a query at light load
   – Execute a query sequentially at heavy load
• Speedup
   – Parallelize a query more aggressively when parallelism yields a better speedup

Slide 11: No Load vs. Heavy Load
[Figure: with no load, query 1 spreads across all cores; under heavy load, queries 1 through 6 each occupy a single core]

Slide 12: No Speedup vs. Linear Speedup
[Charts: speedup and execution time (ms) vs. number of threads, 1 to 6. With no speedup, execution time stays flat as threads are added; with linear speedup, execution time drops proportionally]

Slide 13: Speedup in Reality
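The mechanics on slides 7 and 8 (dynamic alternation over small chunks, local top-k heaps, and batched merges into a shared global top-K heap) can be sketched as below. This is a toy illustration under stated assumptions, not Bing's implementation: `score`, `CHUNK`, and `BATCH` are invented stand-ins, and early termination and the inverted index are omitted.

```python
import heapq
import threading

K = 3        # results to return
CHUNK = 4    # documents per chunk
BATCH = 8    # results buffered locally before syncing globally

def score(doc):
    # Stand-in relevance function; a real server scores documents
    # against the query using its inverted index.
    return -doc % 7

def flush(batch, global_heap, lock):
    # Merge a batch of (score, doc) pairs into the shared top-K heap.
    with lock:
        for item in batch:
            heapq.heappush(global_heap, item)
            if len(global_heap) > K:
                heapq.heappop(global_heap)

def worker(docs, next_chunk, global_heap, lock):
    batch = []
    while True:
        with lock:
            chunk = next_chunk[0]        # dynamic alternation:
            next_chunk[0] += 1           # claim the next free chunk
            full = len(global_heap) == K
            thresh = global_heap[0][0] if full else None
        start = chunk * CHUNK
        if start >= len(docs):
            break
        for doc in docs[start:start + CHUNK]:
            s = score(doc)
            # The shared top-K threshold prunes work that other
            # threads have already made irrelevant.
            if thresh is not None and s <= thresh:
                continue
            batch.append((s, doc))
        if len(batch) >= BATCH:          # batched update keeps
            flush(batch, global_heap, lock)  # sync overhead small
            batch = []
    flush(batch, global_heap, lock)      # final flush

docs = list(range(100))                  # already sorted by static rank
global_heap, next_chunk, lock = [], [0], threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(docs, next_chunk, global_heap, lock))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
top_k = sorted(global_heap, reverse=True)    # best-scoring documents
```

The stale threshold read is deliberately conservative: a thread may fail to prune a document another thread has already beaten, but it never discards one that belongs in the top K.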
[Chart: speedup and execution time (ms) vs. number of threads, 1 to 6; the measured Bing curve lies between the no-speedup and linear-speedup extremes]
• In practice, queries mostly show neither no speedup nor linear speedup

Slide 14: Adaptive Algorithm
• Decide the parallelism degree at runtime: pick the degree p that minimizes response time, i.e., minimize Tp + K · Tp · p / N over p
   – Tp (my execution time): the execution time with parallelism degree p
   – K · Tp · p / N (latency impact on waiting queries): K is the system load (queue length)

Slide 15: Experimental Setup
• Machine setup
   – Two 6-core Xeon processors (2.27 GHz)
   – 32 GB memory, 22 GB of which is dedicated to caching
   – 90 GB web index on SSD
• Experimental system
   – Index server
   – Client: replays obtained queries with Poisson-distributed arrivals at a varying arrival rate (queries per second)
• Workload: 100K Bing user queries

Slide 16: Mean Response Time, Fixed Parallelism
[Chart: mean response time (ms) vs. system load (10–100 QPS) for fixed degrees 1, 2, 3, and 6]
• No fixed degree of parallelism performs well across all loads

Slide 17: Mean Response Time, Adaptive Parallelism
[Chart: mean response time (ms) vs. system load (10–100 QPS) for fixed degrees 1, 2, 3, and 6, plus the adaptive policy]
• Adaptive parallelism is lower than every fixed degree

Slide 18: Mean Response Time, Adaptive Parallelism (degree usage)
[Chart: fraction (%) of queries per selected parallelism degree: 51.04% at degree 1, 40.88% at degree 2, 7.09% at degree 3, and 0.53%, 0.33%, and 0.14% at degrees 4 through 6]
• The adaptive policy can select any degree among all possible options
• Parallelism degrees are utilized unevenly to produce the best performance

Slide 19: Mean Response Time, Adaptive vs. Sequential
[Chart: mean response time (ms) vs. system load (10–100 QPS) for degree 1 and the adaptive policy; over the interesting load range, adaptive is 47% lower]
• Much lower than sequential execution

Slide 20: 95th-Percentile Response Time
[Chart: 95th-percentile response time (ms) vs. system load (10–100 QPS) for fixed degrees 1, 2, 3, and 6, plus the adaptive policy; adaptive is 52% lower than degree 1]
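The degree-selection rule on slide 14 can be sketched directly. The per-degree times below are an invented profile (sublinear speedup, roughly like slide 13), and N is taken here to be the number of cores; the slide itself leaves N implicit, so treat that as an assumption.

```python
N_CORES = 6

# Hypothetical execution-time profile (ms) per parallelism degree:
# diminishing returns from extra threads, as on slide 13.
T = {1: 60.0, 2: 34.0, 3: 26.0, 4: 22.0, 5: 20.0, 6: 19.0}

def pick_degree(queue_length):
    """Pick p minimizing T_p + K * T_p * p / N (slide 14).

    The first term is this query's own latency; the second is its
    estimated latency impact on the K queries waiting in the queue.
    """
    def cost(p):
        return T[p] + queue_length * T[p] * p / N_CORES
    return min(T, key=cost)

light = pick_degree(0)      # empty queue: parallelize aggressively
moderate = pick_degree(3)   # some queued work: middle ground
heavy = pick_degree(30)     # long queue: run sequentially
```

With this profile the policy reproduces the intuition on slide 10: degree 6 at no load, degree 3 at moderate load, and degree 1 when the queue is long.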
• The 99th-percentile response time shows similar improvements

Slide 21: More Experimental Results
• Quality of responses: equivalent to sequential execution
• Importance of speedup: compared with using system load alone, using both system load and speedup improves the mean response time by up to 17%

Slide 22: Conclusion
1. Effectively parallelize the search server
   – Parallelize a single query with multiple threads
   – Adaptively select the parallelism degree per query
2. Response time improves by 47% on average and 52% at the 95th percentile, with the same quality of responses
3. In Bing next week!
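Returning to the experimental setup on slide 15, the client replays logged queries with Poisson-distributed arrivals at a target rate. A standard way to generate such a schedule is to draw exponentially distributed inter-arrival gaps with mean 1/QPS; the sketch below is an illustrative stand-in for the actual harness, and the query log is synthetic.

```python
import random

def poisson_schedule(queries, qps, seed=42):
    """Return (send_time_sec, query) pairs forming a Poisson
    arrival process at rate qps (queries per second)."""
    rng = random.Random(seed)
    schedule, t = [], 0.0
    for q in queries:
        t += rng.expovariate(qps)    # gap with mean 1/qps seconds
        schedule.append((t, q))
    return schedule

# Replay 1000 synthetic queries at 50 QPS.
schedule = poisson_schedule(["q%d" % i for i in range(1000)], qps=50)
mean_gap = schedule[-1][0] / len(schedule)   # close to 1/50 s
```

Varying `qps` sweeps the system load axis (10 to 100 QPS) used throughout the evaluation slides.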