Adaptive Parallelism for Web Search
Myeongjae Jeon
Rice University
In collaboration with Yuxiong He (MSR), Sameh Elnikety (MSR),
Alan L. Cox (Rice), and Scott Rixner (Rice)
Performance of Web Search
1) Query response time
– Answer users quickly
– Reduce both mean and high-percentile latency
2) Response quality (relevance)
– Return highly relevant web pages
– Improves with the resources and time spent
How Microsoft Bing Works
[Diagram: a query arrives at an aggregator, which fans it out to the index servers; each server returns its top K pages.]
- All web pages are partitioned across index servers
- Distributed query processing (embarrassingly parallel)
- The aggregator merges the per-server results into the global top K pages
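The aggregation step described above amounts to a k-way merge of per-server result lists. A minimal sketch (function and variable names are illustrative, not Bing's actual API):

```python
import heapq

def aggregate_top_k(server_results, k):
    """Merge per-server result lists into the global top-k.

    server_results: one list of (score, page_id) pairs per index
    server, each already that server's local top-k.
    Returns the k highest-scoring pages overall.
    """
    return heapq.nlargest(
        k,
        (item for results in server_results for item in results),
        key=lambda pair: pair[0],
    )

# Example: three servers, global top 2
servers = [[(0.9, "a"), (0.4, "b")], [(0.8, "c")], [(0.95, "d"), (0.1, "e")]]
print(aggregate_top_k(servers, 2))  # [(0.95, 'd'), (0.9, 'a')]
```

Because each server already truncates to its local top K, the aggregator only ever merges (number of servers) × K candidates, regardless of index size.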
Our Work
[Diagram: queries 1-3 arriving at a multicore index server.]
• Multicore index server
– Multiple queries executed concurrently
– Each query executed sequentially
• Optimization opportunity
– Low CPU utilization
– Many idle cores
• Contributions
– Parallelize a single query
– Reduce response times
Outline
1. Query parallelism
– Run a query with multiple threads
2. Adaptive parallelism
– Select degree of parallelism per query
3. Evaluation
Query Processing and Early Termination
[Diagram: web documents sorted by static rank, from Doc 1 (highest) down to Doc N (lowest); matching the inverted index for "EuroSys" processes only a prefix of the ranked list, leaving the tail unevaluated.]
• Processing the "Not evaluated" tail is wasted work
– It is unlikely to contribute to the top K relevant results
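The early-termination scan above can be sketched as follows. The slide does not give Bing's actual termination criterion, so this sketch uses a simple fixed processing budget as a stand-in; all names are illustrative:

```python
import heapq

def search_with_early_termination(ranked_docs, matches, k, budget):
    """Scan documents in static-rank order, stopping after `budget` docs.

    ranked_docs: doc ids sorted by static rank (highest first).
    matches: dict doc_id -> relevance score for docs matching the query.
    Docs beyond the budget stay unevaluated, like the diagram's tail.
    """
    top_k = []  # min-heap of (score, doc_id)
    for processed, doc in enumerate(ranked_docs):
        if processed >= budget:
            break  # early termination: tail unlikely to enter the top-k
        score = matches.get(doc)
        if score is None:
            continue  # doc does not match the query terms
        if len(top_k) < k:
            heapq.heappush(top_k, (score, doc))
        elif score > top_k[0][0]:
            heapq.heapreplace(top_k, (score, doc))
    return sorted(top_k, reverse=True)

# Doc 9 matches with a high score but sits past the budget, so it is
# never evaluated -- the gamble early termination takes.
print(search_with_early_termination(
    list(range(10)), {0: 0.5, 1: 0.9, 9: 1.0}, k=2, budget=5))
```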
Assigning Data to Threads
• Purpose: keep parallel processing close to sequential execution order
[Diagram: the rank-sorted document list is split into small chunks 1-8; threads T1 and T2 dynamically claim alternating chunks, so both stay within the high-rank prefix that sequential execution would cover.]
• Key approach: threads dynamically alternate over small chunks
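Dynamic alternation over small chunks can be sketched with a shared chunk counter: each thread claims the next unprocessed chunk of the rank-sorted list, so the set of processed documents grows roughly like a sequential prefix. A minimal sketch (structure and names are assumptions, not the Bing implementation):

```python
import itertools
import threading

def process_in_chunks(docs, num_threads, chunk_size, work_fn):
    """Threads dynamically claim consecutive small chunks of the
    rank-sorted doc list, approximating a sequential prefix scan."""
    next_chunk = itertools.count()  # shared counter of chunk indices
    lock = threading.Lock()
    assignment = {}                 # chunk index -> thread name (illustration)

    def worker(name):
        while True:
            with lock:
                idx = next(next_chunk)  # claim the next chunk
            start = idx * chunk_size
            if start >= len(docs):
                return  # no chunks left
            assignment[idx] = name
            for doc in docs[start:start + chunk_size]:
                work_fn(doc)

    threads = [threading.Thread(target=worker, args=(f"T{i + 1}",))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignment
```

Small chunks keep the threads load-balanced (a slow chunk does not stall the other thread), while the shared counter preserves the high-rank-first processing order that early termination relies on.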
Share Thread Info Using Global Heap
[Diagram: each thread maintains a local top-k heap and periodically syncs it with a global top-K heap.]
• Share information across threads to reduce wasted execution
• Use a global heap, updated asynchronously
• Batched updates keep synchronization overhead small
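The batched global-heap scheme can be sketched as below: each thread buffers scores locally and merges into a lock-protected global heap only every `batch` documents, while reading the global threshold to skip documents that cannot make the cut. The class and method names are hypothetical:

```python
import heapq
import threading

class SharedTopK:
    """Global top-k heap; threads merge their local results in batches."""
    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap of scores; heap[0] is the k-th best
        self.lock = threading.Lock()

    def merge(self, local_scores):
        # One lock acquisition per batch, not per document.
        with self.lock:
            for score in local_scores:
                if len(self.heap) < self.k:
                    heapq.heappush(self.heap, score)
                elif score > self.heap[0]:
                    heapq.heapreplace(self.heap, score)

    def threshold(self):
        # Scores at or below this cannot enter the global top-k.
        with self.lock:
            return self.heap[0] if len(self.heap) == self.k else float("-inf")

def score_docs(shared, scores, batch):
    """One thread's loop: collect scores locally, sync every `batch` docs."""
    local = []
    for i, s in enumerate(scores, 1):
        if s > shared.threshold():  # prune using the shared threshold
            local.append(s)
        if i % batch == 0 and local:
            shared.merge(local)
            local = []
    if local:
        shared.merge(local)  # flush the final partial batch
```

The threshold read is what "reduces wasted execution": once the global heap is full, every thread can discard documents that no thread's results could promote into the top K.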
Outline
1. Query parallelism
– Run a query with multiple threads
2. Adaptive parallelism
– Select degree of parallelism per query
3. Evaluation
Key Ideas
• System load
– Parallelize a query at light load
– Execute a query sequentially at heavy load
• Speedup
– Parallelize a query more aggressively when it shows a better speedup
No Load vs. Heavy Load
[Diagram: at no load, Query 1 has the server to itself; at heavy load, Queries 1-6 compete for the same cores.]
No Speedup vs. Linear Speedup
[Plots: execution time (ms) and speedup versus number of threads (1-6) for two idealized queries; with no speedup, execution time stays flat as threads are added, while with linear speedup it drops proportionally.]
Speedup in Reality
[Plots: execution time (ms) and speedup versus number of threads (1-6) for Bing queries; the measured curves fall between the no-speedup and linear-speedup bounds.]
• Most queries show neither no speedup nor linear speedup
Adaptive Algorithm
• Decide the parallelism degree at runtime
– Pick the degree p that minimizes response time:

  min over p of ( Tp + K · Tp · p / N )

– The first term is the query's own execution time; the second is its latency impact on the waiting queries.
K: system load (queue length)
Tp: execution time with parallelism degree p
N: number of cores
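The cost model above is cheap to evaluate directly at dispatch time. A sketch, where the candidate degrees and the execution-time table are illustrative numbers, not Bing's measured values:

```python
def pick_degree(exec_time, queue_len, num_cores):
    """Pick p minimizing Tp + K * Tp * p / N (the slide's cost model).

    exec_time: dict p -> predicted execution time Tp (e.g. ms)
    queue_len: K, number of queries currently waiting
    num_cores: N, total cores on the server
    """
    def cost(p):
        t_p = exec_time[p]
        # Own latency + delay imposed on the K waiting queries,
        # proportional to the fraction p/N of cores held for time Tp.
        return t_p + queue_len * t_p * p / num_cores

    return min(exec_time, key=cost)

# Illustrative sub-linear speedup: T1=60, T2=35, T3=27, T6=20 (ms)
times = {1: 60.0, 2: 35.0, 3: 27.0, 6: 20.0}
print(pick_degree(times, queue_len=0, num_cores=12))   # no load -> 6
print(pick_degree(times, queue_len=10, num_cores=12))  # moderate load -> 2
print(pick_degree(times, queue_len=40, num_cores=12))  # heavy load -> 1
```

This reproduces the key ideas from the previous slide: an empty queue pushes the choice toward maximum parallelism, while a long queue pushes it toward sequential execution, with the speedup curve (how fast Tp falls with p) deciding the cases in between.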
Experimental Setup
• Machine setup
– Two 6-core Xeon processors (2.27 GHz)
– 32 GB memory (22 GB dedicated to caching)
– 90 GB web index on SSD
• Experimental system
– Index server
– Client that replays captured queries with Poisson arrivals at a varying rate (queries per second)
• Workload
– 100K Bing user queries
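Poisson arrivals at a given QPS mean the client's inter-arrival gaps are i.i.d. exponential with mean 1/QPS. A sketch of how such a replay schedule can be generated (names are illustrative):

```python
import random

def arrival_times(qps, num_queries, seed=42):
    """Generate Poisson arrival times (seconds) at `qps` queries/second:
    inter-arrival gaps are i.i.d. exponential with mean 1/qps."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(num_queries):
        t += rng.expovariate(qps)  # exponential gap, rate = qps
        times.append(t)
    return times

times = arrival_times(qps=50, num_queries=1000)
mean_gap = times[-1] / len(times)
print(f"mean inter-arrival gap ~ {mean_gap * 1000:.1f} ms")  # near 20 ms at 50 QPS
```

The client would sleep until each scheduled timestamp and fire the next captured query, so offered load is controlled purely by the `qps` parameter.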
Mean Response Time - Fixed Parallelism
[Plot: mean response time (ms) versus system load (10-100 QPS) for fixed parallelism degrees 1, 2, 3, and 6.]
• No fixed degree of parallelism performs well for all loads
Mean Response Time - Adaptive Parallelism
[Plot: mean response time (ms) versus system load (10-100 QPS) for fixed degrees 1, 2, 3, 6 and the adaptive policy.]
• Adaptive is lower than all fixed degrees
Mean Response Time - Adaptive Parallelism
[Histogram: fraction (%) of queries per selected parallelism degree 1-6; two degrees dominate (51.04% and 40.88%), with the remaining fractions (7.09%, 0.53%, 0.33%, 0.14%) spread across the other degrees.]
– The policy may select any degree among all possible options
– Parallelism degrees are utilized unevenly to produce the best performance
Mean Response Time - Adaptive Parallelism
[Plot: mean response time (ms) versus system load (10-100 QPS) for degree 1 (sequential) and adaptive; a 47% reduction is marked over the interesting load range.]
• Much lower than sequential execution
95th-Percentile Response Time
[Plot: 95th-percentile response time (ms) versus system load (10-100 QPS) for fixed degrees 1, 2, 3, 6 and adaptive; a 52% reduction is marked.]
• Similar improvements in the 99th-percentile response time
More Experimental Results
• Quality of responses
– Equivalent to sequential execution
• Importance of speedup
– Compared to using system load alone, using both system load and speedup improves the mean response time by up to 17%
Conclusion
1. Effectively parallelized the search server
– Parallelize a single query with multiple threads
– Adaptively select the parallelism degree per query
2. Improved response time:
– 47% on average, 52% at the 95th percentile
– with the same quality of responses
3. In Bing next week!