The Traveling Salesman Problem

Download Report

Transcript The Traveling Salesman Problem

Case Studies:
Bin Packing &
The Traveling Salesman Problem
Part II
David S. Johnson
AT&T Labs – Research
© 2010 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
The Traveling Salesman Problem
Given:
Set of cities {c1,c2,…,cN }.
For each pair of cities {ci,cj}, a distance d(ci,cj).
Find:
Permutation
π : {1,2,..., N}  {1,2,..., N}
N 1
 d(c
i1
π(i)
that minimizes
, c π(i  1) )  d(c π(N) , c π(1) )
N = 10
N = 10
Other Types of Instances
• X-ray crystallography
– Cities: orientations of a crystal
– Distances: time for motors to rotate the crystal
from one orientation to the other
• High-definition video compression
– Cities: binary vectors of length 64 identifying the
summands for a particular function
– Distances: Hamming distance (the number of terms
that need to be added/subtracted to get the next
sum)
No-Wait Flowshop Scheduling
– Cities: Length-4 vectors <c1,c2,c3,c4> of
integer task lengths for a given job
that consists of tasks that require 4
processors that must be used in order,
where the task on processor i+1 must
start as soon as the task on processor i
is done).
– Distances: d(c,c’) = Increase in the
finish time of the 4th processor if c’ is
run immediately after c.
– Note: Not necessarily symmetric: may
have d(c,c’)  d(c’,c).
How Hard?
• NP-Hard for all the above applications
and many more
– [Karp, 1972]
– [Papadimitriou & Steiglitz, 1976]
– [Garey, Graham, & Johnson, 1976]
– [Papadimitriou & Kanellakis, 1978]
– …
How Hard?
Number of possible tours:
N! = 123(N-1)N = Θ(2NlogN)
10! = 3,628,200
20! ~ 2.431018 (2.43 quadrillion)
Dynamic Programming Solution:
O(N22N) = o(2NlogN)
Dynamic Programming Algorithm
• For each subset C’ of the cities containing c1, and
each city cC’, let f(C’,c) = Length of shortest path
that is a permutation of C’, starting at c1 and ending
at c.
• f({c1}, c1) = 0
• For xC’, f(C’{x},x) = MincC’f(C’,c) + d(c,x).
• Optimal tour length = MincCf(C,c) + d(c, c1).
• Running time: ~(N-1)2N-1 items to be computed, at time
N for each = O(N22N)
How Hard?
Number of possible tours:
N! = 123(N-1)N = Θ(2NlogN)
10! = 3,628,200
20! ~ 2.431018 (2.43 quadrillion)
Dynamic Programming Solution:
O(N22N)
102210 = 102,400
202220 = 419,430,400
N = 10
N = 100
N = 1000
N = 10000
Planar Euclidean Application #1
• Cities:
– Holes to be drilled in printed circuit boards
N = 2392
Planar Euclidean Application #2
• Cities:
– Wires to be cut in a “Laser Logic”
programmable circuit
N = 7397
N = 33,810
N = 85,900
Standard Approach to Coping
with NP-Hardness:
• Approximation Algorithms
– Run quickly (polynomial-time for theory,
low-order polynomial time for practice)
– Obtain solutions that are guaranteed to be
close to optimal
– For the latter and the TSP, we need the
triangle inequality to hold:
d(a,c) ≤ d(a,b) + d(b,c)
No -Inequality Danger
• Theorem [Karp, 1972]: Given a graph G = (V,E), it is
NP-hard to determine whether G contains a Hamiltonian
circuit (collections of edges that make up a tour).
• Given a graph, construct a TSP instance in which
{
d(c,c’) =
1 if {c,c’}  E
N2N if {c,c’}  E
• If Hamiltonian circuit exists, OPT = N, if not, OPT > N2N.
• A polynomial-time approximation algorithm with that
guaranteed a tour of length no more than 2N OPT would
imply P = NP.
Nearest Neighbor (NN):
1.
Start with some city.
2. Repeatedly go next to the nearest unvisited neighbor of
the last city added.
3. When all cities have been added, go from the last back
to the first.
A
d(A,D) ≤ d(A,E) + d(E,D)
B
E
d(B,C) ≤ d(B,E) + d(E,C)
D
C
Note: By -inequality, an optimal tour need not contain any crossed edges.
d(A,D) + d(B,C) ≤ (d(A,E) + d(E,D)) + (d(B,E) + d(E,C))
= (d(A,E) + d(E,C)) + (d(B,E) + d(E,D))
= d(A,C) + d(B,D)
For the Euclidean metric, the inequalities are strict
(unless all relevant cities are co-linear)
• Theorem
[Rosenkrantz, Stearns, & Lewis, 1977]:
– There exists a constant c, such that if instance
I obeys the triangle inequality, then we have
NN(I) ≤ clog(N)Opt(I).
– There exists a constant c’, such that for all N >
3, there are N-city instances I obeying the
triangle inequality for which we have NN(I) >
c’log(N)Opt(I).
– For any algorithm A, let
RN(A) = min{A(I)/OPT(I): I is an N-city instance}
– Then RN(NN) = Θ(log(N)) (worst-case ratio)
Lower Bound Examples
F1:
1
1+
(NN starts at left, ends in middle)
1
Li/2 + 1
Fi+1:
Fi
1+
Li/2 + 1
1+
Fi
Let Li be the number of edges encountered if we travel stepby-step from the leftmost vertex in Fi to the rightmost.
L1 = 2,
Li+1 = 2Li + 2
Set the distance from the rightmost vertex of Fk to the
leftmost to 1+, and set all non-specified distances to
their -inequality minimum.
OPT(Fk)/(1+) ≤ N = Lk + 1 = (2Lk-1 + 1) + 1
= (2(2Lk-2 + 1) + 1) + 1 = (2(2(2Lk-3 + 1) + 1) + 1) + 1
k 1
k 1
i
= 2 L1   2 = 2k + 2k – 1 < 2k+1
Log(N) < k+1
i 0
k 1
NN(Fk) ≥ Lk + 1 +  2
i1
k -1-i
k 1
Li > 2k + 2 k -1-i2 i = 2k + (k-1)2k-1
i1
> k2k-1 > (k/4) OPT(Fk)/(1+) = Ω(log(N) OPT(Fk)
Upper Bound Proof (Sketch)
• Observation 1: For symmetric instances with the inequality, d(c,c’) ≤ OPT(I)/2 for all pairs of cities {c,c’}.
P1
c
Optimal Tour
P2
c’
-inequality  d(c,c’) ≤ length(P1) and d(c,c’) ≤ length(P2)
 2d(c,c’) ≤ OPT(I)
• Observation 2: For cities c, let L(c) be the length of
edge added when c was the last city in the tour.
For all pairs {c,c’}, d(c,c’) ≥ min(L(c),L(c’)).
• Proof: Suppose without loss of generality that c is the
first of c,c’ to occur in the NN tour. When c was the
last city, the next city chosen, say c*, satisfied d(c,c*)
≤ c(c,c’’) for all unvisited cities c’’. Since c’ was as yet
unvisited, this implies c(c,c’) ≥ d(c,c*) = L(c).
• Overall Proof Idea: We will partition C into O(logN)
disjoint sets Ci such that cCi L(c) ≤ OPT(I), implying
that
NN(I) = cC L(c) = O(logN)OPT(I).
• Set X0 = C, i = 0 and repeat while |Xi| > 3.
• Let Ti be the set of edges in an optimal tour for Xi -note that |Ti| = |Xi|. By the -inequality, the total
length of these edges is at most OPT(I).
• Let Ti’ be the set of edges in Ti with length greater
than 2OPT(I)/|Ti| -- note that |Ti’| < |Ti|/2.
• For each edge e = {c,c’} in Ti - Ti’, let f(e) be a city c’’
 {c,c’} such that L(c’’) ≤ d(c,c’).
• Set Ci = {f(e): e  Ti - Ti’} -- note that
– cCi L(c) ≤ eTi length(e) ≤ OPT(I).
– |Ci| ≥ |Ti - Ti’|/2 ≥ |Ti|/4 = |Xi|/4 .
• Set Xi+1 = Xi – Ci -- note that |Xi+1| ≤ (3/4)|Xi|.
• Given that |Xi+1| ≤ (3/4)|Xi|, i ≥ 0, this process can only
continue while (3/4)iN > 3, i.e., by the time log(3/4)●i +
logN > log(3), or roughly while log(N)-log(3) > log(3/4)●i = log(4/3)●i.
• Hence the process halts at iteration i*, where
i* 
log(N)  log(3)
log(4/3)
 O(logN)
• At this point there are 3 or fewer cities in Xi*+1. Put
each of them in a separate set Ci*+j, 1 ≤ j ≤ |Xi*+1|, for
which, by Observation 1, we will have L(c) ≤ OPT(I)/2.
Greedy (Multi-Fragment) (GR):
1. Sort the edges, shortest first, and treat them in that
order. Start with an empty graph.
2. While the current graph is not a TSP tour, attempt to add
the next shortest edge. If it yields a vertex degree
exceeding 2 or a tour of length less than N, delete.
• Theorem [Ong & Moore, 1984]: For all instances I
obeying the -Inequality, GR(I) = O(logN)OPT(I).
• Theorem [Frieze, 1979]: There are N-city
instances IN for arbitrarily large N that obey the
-Inequality and have GR[IN ] = Ω(logN/loglogN).
Nearest Insertion (NI):
1. Start with 2-city tour consisting of the some
city and its nearest neighbor.
2. Repeatedly insert the non-tour city that is
closest to a tour city (in the location that
yields the smallest increase in tour length).
• Theorem [Rosenkrantz, Stearns, & Lewis, 1977]:
If instance I obeys the triangle
inequality, then R(NI) = 2.
• Key Ideas of Upper Bound Proof:
– The minimum spanning tree (MST) for I is no longer
than the optimal tour, since deleting an edge from
the tour yields a spanning tree.
– The length of the NI tour is at most twice the
length of an MST.
A
d(B,C) ≤ d(B,A) + d(A,C) by -inequality, so
C
B
d(A,C) + d(B,C) – d(B,A) ≤ 2d(A,C)
(A,C) would be the next edge added by
Prim’s minimum spanning tree algorithm
Lower Bound Examples
2
2
2
2
NI(I) = 2N-2
OPT(I) = N+1
2
2
2
2
Another Approach
• Observation 1: Any connected graph in which
every vertex has even degree contains an
“Euler Tour” – a cycle that traverses each
edge exactly once, which can be found in
linear time.
[Problem 2: Prove this!]
• Observation 2: If the -inequality holds, then
traversing an Euler tour but skipping past
previously-visited vertices yields a Traveling
Salesman tour of no greater length.
Obtaining the Initial Graph
• Double MST algorithm (DMST):
– Combine two copies of an MST.
– Theorem [Folklore]: DMST(I) ≤ 2Opt(I).
• Christofides algorithm (CH):
– Combine one copy of an MST with a minimumlength matching on its odd-degree vertices (there
must be an even number of them since the total
sum of degrees for any graph is even).
– Theorem [Christofides, 1976]: CH(I) ≤ 1.5Opt(I).
Optimal Tour on Odd-Degree Vertices
(No longer than overall Optimal Tour)
Matching M1 + Matching M2 = Optimal Tour
Hence Optimal Matching ≤ min(M1,M2) ≤ OPT(I)/2
Can we do better?
• No polynomial-time algorithm is known that has a
worst-case ratio less than 3/2 for arbitrary instances
satisfying the -inequality.
• Assuming P  NP, no polynomial-time algorithm can do
better than 220/219 = 1.004566… [Papadimitriou &
Vempala, 2006].
• For Euclidean and related metrics, there exists a
polynomial-time approximation scheme (PTAS): For
each  > 0, a polynomial-time algorithm with worst-case
ratio ≤ 1+. [Arora, 1998][Mitchell, 1999].
PTAS RunningTimes
• [Arora, STOC 1996]:
O(N100/)
• [Arora, JACM 1998]:
O(N(logN)O(1/))
• [Rao & Smith, STOC 1998]:
O(1)
(1/)
O(2
N
+ N log N)
• If  = ½ and O(1) = 10, then
O(1)
(1/)
2
> 21000.
Performance “In Practice”
• Testbed: Random Euclidean Instances
(Results appear to be reasonably well-correlated
with those for our real-world instances.)
Nearest Neighbor
Greedy
Smart Shortcuts
Smart-Shortcut Christofides
Smart-Shortcut Christofides
2-Opt
3-Opt
Original Tour
Original Tour
Result of 2-Opt move
Results of 3-Opt moves
2-Opt [Johnson-McGeoch Implementation]
4.2% off optimal
10,000,000 cities in 101 seconds at 2.6 Ghz
3-Opt [Johnson-McGeoch Implementation]
2.4% off optimal
10,000,000 cities in 4.3 minutes at 2.6 Ghz
Lin-Kernighan [Johnson-McGeoch Implementation]
1.4% off optimal
10,000,000 cities in 46 minutes at 2.6 Ghz
Iterated Lin-Kernighan [J-M Implementation]
0.4% off optimal
100,000 cities in 35 minutes at 2.6 Ghz
Worst-Case Running Times
• Nearest Neighbor: O(N2)
• Greedy: O(N2logN)
• Nearest Insertion: O(N2)
• Double MST: O(N2)
• Christofides: O(N3)
• 2-Opt: O(N2●(number of improving moves))
• 3-Opt: O(N3●(number of improving moves))
K-d trees [Bentley, 1975, 1990]
Data Elements
Array T: Permutation of Point indices
Array H: H[i] = index of tree leaf containing Point i
Tree Vertex k:
Index L in
Array T of
First Point
Index U in
Array T of
Last Point
Widest
Dimension
Median Value
in Widest
Dimension
Leaf?
Empty?
Implicit: Index of Parent = k/2
Index of Lower Child = 2k
Index of Higher Child = 2k+1
Operations
• Construct kd-tree (recursively): O(NlogN)
• Delete or temporarily delete points (without
rebalancing): O(1)
• Find nearest neighbor to one of points:
“typically” O(logN)
• Given point p and radius r, find all points p’ with
d(p,p’) ≤ r (“ball search”):
“typically” O(logN + number of points returned).
Finding Nearest Neighbor to Point p
• Initialize M = ∞. Call rnn(H[p],p,M).
• rnn(k,p,M):
– If Vertex k is a leaf,
• For each point x in vertex k’s list,
– If dist(p,x) < M, set M = dist(p,x) and Champion = x.
– Otherwise,
• Let d be vertex k’s widest dimension, m be its median value for
that dimension, and pd the value of p’s coordinate in dimension d.
• If pd > m, let NearChild = 2k+1 and FarChild = 2k.
• Otherwise, NearChild = 2k and FarChild = 2k+1.
• If NearChild has not been explored, rnn(NearChild,p,M).
• If |pd – m| < M, rnn(FarChild,p,M).
– Mark Vertex k as “explored”
Nearest Neighbor Algorithm in
“O(NlogN)”
• Construct kd-Tree on cities.
• Pick a starting city cπ(1).
• While there remains an unvisited city, let the
current last city be cπ(i).
– Use the kd-Tree to find the nearest unvisited city
to cπ(i).
– Delete cπ(i) from the kd-Tree
Greedy (Lazily) in “O(NlogN)”
Initialization:
• Construct a kd-tree on the cities. Let G = (C,). We will maintain
the property that the kd-tree contains only vertices with degree
0 or 1 in G.
• For each city c, let n(c) be its nearest “legal” neighbor. A
neighbor is legal if adding edge {c,n(c)} to the current graph does
not create
– A vertex of degree 3 or
– A cycle of length less than N.
• Put triples (c,n(c),dist(n,n(c))) in a priority queue, sorted in order
of increasing distance.
• For each city c, define otherend(c) = c.
Greedy (Lazily) in “O(NlogN)”
While not yet a tour:
• Extract the minimum (c,n(c),d(c,n(c)) triple from the priority queue.
• If {c,n(c)} is a legal edge, add it to G, and if either c or n(c) now has
degree 2, delete it from the kd-tree.
• If c has degree 2, continue.
• Otherwise n(c) = otherend(c) and so completes a short cycle.
– Temporarily delete otherend(c) from the kd-Tree.
– Find the new nearest neighbor n’(c) of c.
– Add (c,n’(c),dist(c,n’(c)) to the priority queue.
– Put otherend(c) back in the kd-Tree.
“Nearest Insertion” in “O(NlogN)”
• Straightforward if we instead implement the variant in
which each new city is inserted next to its nearest
neighbor in the tour (“Nearest Addition”) – upper bound
proof still goes through, but performance suffers.
• Seemingly no way to avoid Ω(N2) if we need to find the
best place to insert.
• Essentially as good performance can be obtained using
Jon Bentley’s “Nearest Addition+” variant:
– If c is to be inserted and x is the nearest city in the tour to c,
choose the best insertion that places x next to a city x’ with
d(c,x’) ≤ 2d(c,x).
– Idea: Use ball search to find candidates for x’.
Organizing the Search
C
C
B
A
AB
Try each A in turn around the current tour until an improving move is found –
Try all C not adjacent to A in the current tour.
take first one found. If none found after all cities tried for A, halt.
2-Opt in “O(NlogN)”
Loop on cities
Nearer neighbors (via kd-trees)
Accept first rather than best
Approximation: Bounded Nbr Lists w.
stored distances
• Approximation: Don’t Look Bits
•
•
•
•
3-Opt in “NlogN”
• 1st two swaps improve
• Betweenness
• Tour data structure
Doubly-linked List
2-Level Tree
Splay Tree
Lin-Kernighan (k-Opt on the
cheap)
For more on the TSP algorithm performance, see the website for the
DIMACS TSP Challenge:
http://www2.research.att.com/~dsj/chtsp/index.html/
Comparison: Smart-Shortcut Christofides versus 2-Opt
Tour Length
Normalized Running Time
N = 1000 “Random Clustered Instance”
N = 10,000 “Random Clustered Instance”
Estimating Running-Time
Growth Rate for 2-Opt
Microseconds/N
Microseconds/NlogN
Microseconds/N1.25
Held-Karp (or “Subtour”) Bound
• Linear programming relaxation of the following
formulation of the TSP as an integer program:
• Minimize
city pairs {c,c’}(x{c,c’}d(c,c’))
• Subject to
–
c’C x{c,c’} = 2, for all c  C.
–
cS,c’C-S x{c,c’} ≥ 2, for all S  C.
– x
 {0,1}
0{c,c’}
≤ x{c,c’}
≤ 1 , for all pairs {c,c’}  C.
Linear programming relaxation
Percent by which Optimal Tour exceeds Held-Karp Bound
Computing the Held-Karp Bound
• Difficulty: Too many “subtour” constraints:
cS,c’C-S x{c,c’} ≥ 2, for all S  C
• Fortunately, if any such constraint is violated by our current
solution, we can find such a violated constraint in polynomial time:
• Suppose the constraint for S is violated by solution x. Consider
the graph G, where edge {c,c’} has capacity x{c,c’} . For any pair of
vertices (u,v), u  S and v  C-S, the maximum flow from u to v is
less than 2 (and conversely).
• Consequently, an S yielding a violated inequality can be found
using 2N-1 network flow computations, assuming such an
inequality exists.
Lin-Kernighan [Johnson-McGeoch Implementation]
1.4% off optimal
10,000,000 cities in 46 minutes at 2.6 Ghz
Iterated Lin-Kernighan [J-M Implementation]
0.4% off optimal
100,000 cities in 35 minutes at 2.6 Ghz
Concorde Branch-and-Cut Optimization
[Applegate-Bixby-Chvatal-Cook]
Optimum
1,000 cities in median time 5 minutes at 2.66 Ghz
Concorde
•
“Branch-and-Cut” approach exploiting linear programming to determine
lower bounds on optimal tour length.
•
Based on 30+ years of theoretical developments in the “Mathematical
Programming” community.
•
Exploits “chained” (iterated) Lin-Kernighan for its initial upperbounds.
•
Eventually finds an optimal tour and proves its optimality (unless it
runs out of time/space).
•
Also can compute the Held-Karp lower bound for very large instances.
•
Executables and source code can be downloaded from
http://www.tsp.gatech.edu/
Geometric Interpretation
Hyperplane
perpendicular
to the vector
of edge lengths
Optimal Tour
-- Points in RN(N-1)/2 corresponding to a tour.
Optimal Tour is a point on the convex hull of
all tours.
Facet
Unfortunately, the LP relaxation of the TSP can be a very
To improve
it, add to
more
poor
approximation
theconstraints
convex hull (“cuts”)
of tours.
One Facet Class: Comb Inequalities
H
T1
T2 T3
Ts-1
Ts
Teeth Ti are disjoint, s is odd,
all regions contain at least one city.
H
T1
T2
Ts-1
T3
Ts
• For Y the handle or a tooth, let x(Y) be the total
value of the edge variables for edges with one
endpoint in Y and one outside, when the function x
corresponds to a tour
• By subtour inequalities, we must have x(Y) ≥ 2 for
each such Y. It also must be even, which is exploited
to prove the comb inequality:
x(H) 
s
 x(T )  3s  1
i
i1
Branch & Cut
• Use a heuristic to generate a initial “champion” tour
and provide provide an upper bound U ≥ OPT.
• Let our initial “subproblem” consist of an LP with just
the inequalities of the LP formulation (or some subset
of them).
• Handle subproblems as follows:
Branch & Cut
• Keep adding violated inequalities (of various sorts) that you can
find, until
– (a) LP Solution value ≥ U. In this case we prune this case and
if no other cases are left, our current tour is optimal.
– (b) Little progress is made in the objective function. In this
case, for some edge {c,c’} with a fractional value, split into
two subproblems, one with x{c,c’} fixed at 1 (must be in the
tour, and one with it fixed at 0 (must not be in the tour).
• If we ever encounter an LP solution that is a tour and has length
L’ < L, set L = L’ and let this new tour be the champion. Prune any
subproblems whose LP solution exceeds or equals L. If at any
point all your children are pruned, prune yourself.
U = 97
Initial LP, U = 100, LB = 90
X{a,b} = 0
X{a,b} = 1
LB = 92
X{c,d} = 0
LB = 92
LB = 93
X{c,d} = 1
LB = 100
X{a,c} = 0
LB = 98
X{a,c} = 1
LB = 97
New Opt = 97
X{e,a} = 0
LB = 101
X{e,a} = 1
LB = 100
Current World Record (2006)
Using a parallelized version of the
Concorde code, Helsgaun’s
sophisticated variant on Iterated
Lin-Kernighan, and 2719.5 cpu-days
N = 85,900
The optimal tour is 0.09% shorter than the tour DSJ constructed using Iterated Lin-Kernighan
in 1991. In 1986, when computers were much slower, we could only give the Laser Logic people a
Nearest-Neighbor tour, which was 23% worse, but they were quite happy with it…
Running times (in seconds)
for 10,000 Concorde runs
on random 1000-city planar
Euclidean instances (2.66
Ghz Intel Xeon processor
in dual-processor PC,
purchased late 2002).
Range: 7.1 seconds
to 38.3 hours
Concorde Asymptotics
[Hoos and Stϋtzle, 2009 draft]
• Estimated median running time for
planar Euclidean instances.
• Based on
– 1000 samples each for N = 500,600,…,2000
– 100 samples each for N = 2500, 3000,3500,4000,4500
– 2.4 Ghz AMD Opteron 2216 processors with 1MB L2 cache
and 4 GB main memory, running Cluster Rocks Linux v4.2.1.
0.21 · 1.24194 N
Actual median for N = 2000: ~57 minutes, for N = 4,500: ~96 hours
Theoretical Properties of Random
Euclidean Instances
Expected optimal tour length for an N-city
instance approaches CN for some constant
C as N  . [Beardwood, Halton, and Hammersley, 1959]
Key Open Question:
What is the Value of C?
The Early History
• 1959: BHH estimated C  .75, based on hand solutions for a
202-city and a 400-city instance.
• 1977: Stein estimates C  .765, based on extensive simulations
on 100-city instances.
• Methodological Problems:
• Not enough data
• Probably not true optima for the data there is
• Misjudges asymptopia
Stein: C = .765
BHH: C = .75
Figure from [Johnson, McGeoch, Rothberg, 1996]
What is the dependence on N ?
• Expected distance to nearest neighbor proportional
to 1/N, times n cities yields (N)
• O(N) cities close to the boundary are missing some
neighbors, for an added contribution proportional to
(N)(1/N), or (1)
• A constant number of cities are close to two
boundaries (at the corners of the square), which may
add an additional (1/N )
• This yields target function
OPT/N = C + /N + /N
Asymptotic Upper Bound Estimates
(Heuristic-Based Results Fitted to OPT/N =
C + /N)
• 1989: Ong & Huang estimate C ≤ .74, based on runs of
3-Opt.
• 1994: Fiechter estimates C ≤ .73, based on runs of
“parallel tabu search”
• 1994: Lee & Choi estimate C ≤ .721, based on runs of
“multicanonical annealing”
• Still inaccurate, but converging?
• Needed: A new idea.
New Idea (1995): Suppress the variance
added by the “Boundary Effect” by using
Toroidal Instances
• Join left boundary of the unit square to the
right boundary, top to the bottom.

Toroidal Unit Ball
Toroidal Distance
Computations
Toroidal Instance Advantages
• No boundary effects.
• Same asymptotic constant for E[OPT/N] as
for planar instances [Jaillet, 1992] (although
it is still only asymptotic).
• Lower empirical variance for fixed N.
Toroidal Approaches
1996: Percus & Martin estimate
C  .7120 ± .0002.
1996: Johnson, McGeoch, and Rothberg estimate
C  .7124 ± .0002.
2004: Jacobsen, Read, and Saleur estimate
C  .7119.
Each coped with the difficulty of computing optima in a
different way.
With Concorde and today’s computers, we no longer have to.
Optimal Tour Lengths:
One Million 100-City Instances
-1e+07
-5e+06
0
+5e+06
Optimal Tour Lengths Appear to Be Normally Distributed
Optimal Tour Lengths:
One Million 1000-City Instances
-1e+07
-5e+06
0
+5e+06
With a standard deviation that appears to be independent of N
The New Data
• Solver:
– Latest (2003) version of Concorde with a
few bug fixes and adaptations for new metrics
• Primary Random Number Generator:
– RngStream package of Pierre L’Ecuyer,
described in
• “AN OBJECT-ORIENTED RANDOM-NUMBER PACKAGE
WITH MANY LONG STREAMS AND SUBSTREAMS,”
Pierre L'ecuyer, Richard Simard, E. Jack Chen, W. David
Kelton, Operations Research 50:6 (2002), 1073-1075
Toroidal Instances
Number of Cities
Number of
Instances
OPT
HK
N = 3, 4, …, 49, 50
1,000,000
X
X
N = 60, 70, 80, 90, 100
1,000,000
X
X
N = 200, 300, …, 1,000
1,000,000
X
X
10,000
X
X
100,000
X
X
N = 110, 120, …, 1,900
N = 2,000
N = 2,000, 3,000, …, 10,000
N = 100,000
N = 1,000,000
1,000,000
X
1,000
X
100
X
Euclidean Instances
Number of Cities
Number of
Instances
OPT
HK
N = 3, 4, …, 49, 50
1,000,000
X
X
N = 60, 70, 80, 90, 100
1,000,000
X
X
N = 110, 120, …, 1,000, 2,000
10,000
X
X
N = 1,100, 1,200 …, 10,000
10,000
X
N = 20,000, 30,000, …, 100,000
10,000
X
1,000
X
N = 1,000,000
Standard Deviations
N = 100
N = 1,000
99% Confidence Intervals for OPT/N
for Euclidean and Toroidal Instances
99% Confidence Intervals for (OPT-HK)/N
for Euclidean and Toroidal Instances
Least Squares fit for all data from
[30,2000] -- OPT/N = (C + a/N + b/N2)
C = .712401 ± .000005 versus C = .712333 ± .00006
What is the right function?
Power Series in 1/N – the Percus-Martin Choice
Range of N
Function
C
Confidence
[30,2000]
C + a/N + b/N2
.712401
± .000005
[100,2000]
C + a/N + b/N2
.712403
± .000010
[100,2000]
C + a/N
.712404
± .000006
Justification: Expected distance to the kth nearest
neighbor is provably such a power series.
What is the right function?
OPT/sqrt(N) = Power Series in 1/sqrt(N))
Range of N
Function
C
Confidence
[100,2000]
C + a/N0.5
.712296
± .000015
[100,2000]
C + a/N0.5 + b/N
.712403
± .000030
[100,2000]
C + a/N0.5 + b/N + c/N1.5
.712424
± .000080
Justification: This is what we saw in the planar
Euclidean case (although it was caused by boundaries).
What is the right function?
OPT = (1/sqrt(N) · (Power Series in 1/N)
Range of N
Function
C
Confidence
[100,2000]
C + a/N0.5
.712296
± .000015
[100,2000]
C + a/N0.5 + b/N1.5
.712366
± .000022
[100,2000]
C + a/N0.5 + b/N1.5 + c/N2.5
.712385
± .000040
What is the right function?
Range of N
Function
C
Confidence
[30,2000]
C + a/N + b/N2
.712401
± .000005
[100,2000]
C + a/N + b/N2
.712403
± .000010
[100,2000]
C + a/N
.712404
± .000006
[100,2000]
C + a/N0.5
.712296
± .000015
[100,2000]
C + a/N0.5 + b/N
.712403
± .000030
[100,2000]
C + a/N0.5 + b/N + c/N1.5
.712424
± .000080
[100,2000]
C + a/N0.5 + b/N1.5
.712366
± .000022
[100,2000]
C + a/N0.5 + b/N1.5 + c/N2.5
.712385
± .000040
Effect of Data Range on Estimate
[30,2000], [60,2000], [100,2000], [200,2000], [100,1000]
95% Confidence
Intervals
CC ++ a/n
a/n.5.5.5 + b/n + c/n1.5
C + a/n
+ b/n2 + c/n3
C + a/n.5.5
.1.5 + c/n2.5
+ b/n.1.5
The Winners?
C + a/n
+
b/n2
+
C = .71240 ± .00002
.000005
c/n3
Question
Does the HK-based approach
agree?
CHK = .707980 ± .000003
95% confidence interval derived using C + a/N + b/N2 functional
form
C-CHK = .004419 ± .000002
95% confidence interval derived using C + a/N + b/N2 functional
form
HK-Based Estimate
C-CHK = .004419 ± .000002
+ CHK = .707980 ± .000003
C = .712399 ± .000005
Versus (Conservative) Opt-Based Estimate
C = .712400 ± .000020
Combined Estimate?
C = .71240
± .00001
OPEN PROBLEM:
What function truly describes the data?
Our data suggests OPT/sqrt(N) ≈
.71240 + a/N - b/N2 + O(1/N3),
a = .049 ± .004,
b = .3 ± .2
(from fits for ranges [60,2000] and [100,2000])
But what about the range [3,30]?
(95% confidence intervals on data) – f(N),
3 ≤ N ≤ 30
Fit of a + b/N + c/N2 + d/N3 + e/N4 for
[3,30]
95% Confidence Intervals
To date, no good fit of any sort has been found.
Problem
• Combinatorial factors for small N may
make them unfittable:
– Only one possible tour for N = 3 (expected length
of optimal tour can be given in closed form)
– Only 3, 12, 60, 420, … possible tours for N = 4, 5,
6, 7, …, so statistical mechanics phenomena may
not yet have taken hold.
• So let’s throw out data for N < 12
Fit of a + b/N + c/N2 + d/N3 + e/N4 for
[12,2000]
Still Questionable…
Unexplained Phenomenon: Rise and then Fall
Peaks at N = 17
99.7% confidence intervals on OPT/n,
10 ≤ n ≤ 30.
Stop!
Applications
• Cities (3 dimensions):
– Orientations of a crystal when doing X-ray
crystallography – up to 3000 cities
Gnuplot Least Squares fit for the Percus-Martin
values of N -- OPT/N = (C + a/N + b/N2)/(1+1/(8N))
C = .712234 ± .00017 versus originally claimed C = .7120 ± .0002
Least Squares fit for all data from
[12,100] -- OPT/N = (C + a/N + b/N2)
C = .712333 ± .00006 versus C = .712234 ± .00017