Welcome! This post is not going to be discussing technical implementation details or theortical work for my Google Summer of Code project, but rather serve as a summary and recap for the work that I did this summer.
I am very happy with the work I was able to accomplish and believe that I successfully completed my project.
Overview#
My project was titled NetworkX: Implementing the Asadpour Asymmetric Traveling Salesman Problem Algorithm. The updated abstract given on the Summer of Code project project page is below.
This project seems to implement the asymmetric traveling salesman problem developed by Asadpour et al, originally published in 2010 and revised in 2017. The project is broken into multiple methods, each of which has a set timetable during the project. We start by solving the HeldKarp relaxation using the Ascent method from the original paper by Held and Karp. Assuming the result is fractional, we continue into the Asadpour algorithm (integral solutions are optimal by definition and immediately returned). We approximate the distribution of spanning trees on the undirected support of the Held Karp solution using a maximum entropy rounding method to construct a distribution of trees. Roughly speaking, the probability of sampling any given tree is proportional to the product of all its edge lambda values. We sample 2 log n trees from the distribution using an iterative approach developed by V. G. Kulkarni and choose the tree with the smallest cost after returning direction to the arcs. Finally, the minimum tree is augmented using a minimum network flow algorithm and shortcut down to an O(log n / log log n) approximation of the minimum Hamiltonian cycle.
My proposal PDF for the 2021 Summer of Code can be found here.
All of my changes and additions to NetworkX are part of this pull request and can also be found on this branch in my fork of the GitHub repository, but I will be discussing the changes and commits in more detail later.
Also note that for the commits I listed in each section, this is an incomplete list only hitting on focused commits to that function or its tests.
For the complete list, please reference the pull request or the bothTSP
GitHub branch on my fork of NetworkX.
My contributions to NetworkX this summer consist predominantly of the following functions and classes, each of which I will discuss in their own sections of this blog post. Functions and classes which are frontfacing are also linked to the developer documentation for NetworkX in the list below and for their section headers.
SpanningTreeIterator
ArborescenceIterator
held_karp_ascent
spanning_tree_distribution
sample_spanning_tree
asadpour_atsp
These functions have also been unit tested, and those tests will be integrated into NetworkX once the pull request is merged.
References#
The following papers are where all of these algorithms originate form and they were of course instrumental in the completion of this project.
[1] A. Asadpour, M. X. Goemans, A. Madry, S. O. Gharan, and A. Saberi, An O (log n / log log n)approximation algorithm for the asymmetric traveling salesman problem, SODA ’10, Society for Industrial and Applied Mathematics, 2010, p. 379  389 https://dl.acm.org/doi/abs/10.5555/1873601.1873633.
[2] J. Edmonds, Optimum Branchings, Journal of Research of the National Bureau of Standards, 1967, Vol. 71B, p.233240, https://archive.org/details/jresv71Bn4p233
[3] M. Held, R.M. Karp, The travelingsalesman problem and minimum spanning trees. Operations research, 19701101, Vol.18 (6), p.11381162. https://www.jstor.org/stable/169411
[4] G.K. Janssens, K. Sörensen, An algorithm to generate all spanning trees in order of increasing cost, Pesquisa Operacional, 200508, Vol. 25 (2), p. 219229, https://www.scielo.br/j/pope/a/XHswBwRwJyrfL88dmMwYNWp/?lang=en
[5] V. G. Kulkarni, Generating random combinatorial objects, Journal of algorithms, 11 (1990), p. 185–207.
SpanningTreeIterator
#
The SpanningTreeIterator
was the first contribution I completed as part of my GSoC project.
This class takes a graph and returns every spanning tree in it in order of increasing cost, which makes it a direct implementation of [4].
The interesting thing about this iterator is that it is not used as part of the Asadpour algorithm, but served as an intermediate step so that I could develop the ArborescenceIterator
which is required for the Held Karp relaxation.
It works by partitioning the edges of the graph as either included, excluded or open and then finding the minimum spanning tree which respects the partition data on the graph edges.
In order to get this to work, I created a new minimum spanning tree function called kruskal_mst_edges_partition
which does exactly that.
To prevent redundancy, all kruskal minimum spanning trees now use this function (the original kruskal_mst_edges
function is now just a wrapper for the partitioned version).
Once a spanning tree is returned from the iterator, the partition data for that tree is split so that the union of the newly generated partitions is the set of all spanning trees in the partition except the returned minimum spanning tree.
As I mentioned earlier, the SpanningTreeIterator
is not directly used in my GSoC project, but I still decided to implement it to understand the partition process and be able to directly use the examples from [4] before moving onto the ArborescenceIterator
.
This class I’m sure will be useful to the other users of NetworkX and provided a strong foundation to build the ArborescenceIterator
off of.
Blog Posts about SpanningTreeIterator
5 Jun 2021  Finding All Minimum Arborescences
10 Jun 2021  Implementing The Iterators
Commits about SpanningTreeIterator
Now, at the beginning of this project, my commit messages were not very good… I had some problems about merge conflicts after I accidentally committed to the wrong branch and this was the first time I’d used a precommit hook.
I have not changed the commit messages here, so that you may be assumed by my troughly unhelpful messages, but did annotate them to provide a more accurate description of the commit.
Testing  Rewrote Kruskal’s algorithm to respect partitions and tested that while stubbing the iterators in a separate file
I’m not entirely sure how the commit hook works…  Added test cases and finalized implementation of Spanning Tree Iterator in the incorrect file
Moved iterators into the correct files to maintain proper codebase visibility  Realized that the iterators need to be in mst.py
and branchings.py
respectively to keep private functions hidden
Documentation update for the iterators  No explanation needed
Update mst.py to accept suggestion  Accepted doc string edit from code review
Review suggestions from dshult  Implemented code review suggestions from one of my mentors
Cleaned code, merged functions if possible and opened partition functionality to all
Implement suggestions from boothby
ArborescenceIterator
#
The ArborescenceIterator
is a modified version of the algorithm discussed in [4] so that it iterates over the spanning arborescences.
This iterator was a bit more difficult to implement, but that is due to how the minimum spanning arborescence algorithm is structured rather than the partition scheme not being applicable to directed graphs.
In fact the partition scheme is identical to the undirected SpanningTreeIterator
, but Edmonds’ algorithm is more complex and there are several edge cases about how nodes can be contracted and what it means for respecting the partition data.
In order to fully understand the NetworkX implementation, I had to read the original Edmonds paper, [2].
The most notable change was that when the iterator writes the next partition onto the edges of the graph just before Edmonds’ algorithm is executed, if any incoming edge is marked as included, all of the others are marked as excluded.
This is an implicit part of the SpanningTreeIterator
, but needed to be explicitly done here so that if the vertex in question was merged during Edmonds’ algorithm we could not choose two of the incoming edges to the same vertex once the merging was reversed.
As a final note, the ArborescenceIterator
has one more initial parameter than the SpanningTreeIterator
, which is the ability to give it an initial partition and iterate over all spanning arborescence with cost greater than the initial partition.
This was used as part of the branch and bound method, but is no longer a part of the my Asadpour algorithm implementation.
Blog Posts about ArborescenceIterator
5 Jun 2021  Finding All Minimum Arborescences
10 Jun 2021  Implementing The Iterators
Commits about ArborescenceIterator
My commits listed here are still annotated and much of the work was done at the same time.
Testing  Rewrote Kruskal’s algorithm to respect partitions and tested that while stubbing the iterators in a separate file
Moved iterators into the correct files to maintain proper codebase visibility  Realized that the iterators need to be in mst.py
and branchings.py
respectively to keep private functions hidden
Including Black reformat  Modified Edmonds’ algorithm to respect partitions
Modified the ArborescenceIterator to accept init partition  No explanation needed
Documentation update for the iterators  No explanation needed
Update branchings.py accept doc string edit  No explanation needed
Review suggestions from dshult  Implemented code review suggestions from one of my mentors
Cleaned code, merged functions if possible and opened partition functionality to all
Implemented review suggestions from rossbar
Implement suggestions from boothby
held_karp_ascent
#
The Held Karp relaxation was the most difficult part of my GSoC project and the part that I was the most worried about going into this May.
My plans on how to solve the relaxation evolved over the course of the summer as well, finally culminating in held_karp_ascent
.
In my GSoC proposal, I discuss using scipy
to solve the relaxation, but the Held Karp relaxation is a semiinfinite linear problem (that is, it is finite but exponential) so I would quickly surpass the capabilities of virtually any computer that the code would be run on.
Fortunately I realized that while I was still writing my proposal and was able to change it.
Next, I wanted to use the ellipsoid algorithm because that is the suggested method in the Asadpour paper [1].
As it happens, the ellipsoid algorithm is not implemented in numpy
or scipy
and after discussing the practicality of implementing the algorithm as part of this project, we decided that a robust ellipsoid solver was a GSoC project onto itself and beyond the scope of the Asadpour algorithm.
Another method was needed, and was found.
In the original paper by Held and Karp [3], they present three different algorithms for solving the relaxation, the columngeneration technique, the ascent method and the branch and bound method.
After reading the paper and comparing all of the methods, I decided that the branch and bound method was the best in terms of performance and wanted to implement that one.
The branch and bound method is a modified version of the ascent method, so I started by implementing the ascent method, then the branch and bound around it. This had the extra benefit of allowing me to compare the two and determine which is actually better.
Implementing the ascent method proved difficult. There were a number of subtle bugs in finding the minimum 1arborescences and finding the value of epsilon by not realizing all of the valid edge substitutions in the graph. More information about these problems can be found in my post titled Understanding the Ascent Method. Even after this the ascent method was not working proper, but I decided to move onto the branch and bound method in hopes of learning more about the process so that I could fix the ascent method.
That is exactly what happened! While debugging the branch and bound method, I realized that my function for finding the set of minimum 1arborescences would stop searching too soon and possibly miss the minimum 1arborescences. Once I fixed that bug, both the ascent as well as the branch and bound method started to produce the correct results.
But which one would be used in the final project?
Well, that came down to which output was more compatible with the rest of the Asadpour algorithm. The ascent method could find a fractional solution where the edges are not totally in or out of the solution while the branch and bound method would take the time to ensure that the solution was integral. As it would happen, the Asadpour algorithm expects a fractional solution to the Held Karp relaxation so in the end the ascent method one out and the branch and bound method was removed from the project.
All of this is detailed in the (many) blog posts I wrote on this topic, which are listed below.
Blog posts about the Held Karp relaxation
My first two posts were about the scipy
solution and the ellipsoid algorithm.
11 Apr 2021  Held Karp Relaxation
8 May 2021  Held Karp Separation Oracle
This next post discusses the merits of each algorithm presenting in the original Held and Karp paper [3].
3 Jun 2021  A Closer Look At Held Karp
And finally, the last three Held Karp related posts are about the debugging of the algorithms I did implement.
22 Jun 2021  Understanding The Ascent Method
28 Jun 2021  Implementing The Held Karp Relaxation
7 Jul 2021  Finalizing Held Karp
Commits about the Held Karp relaxation
Annotations only provided if needed.
Grabbing black reformats  Initial Ascent method implementation
Working on debugging ascent method plus black reformats
Ascent method terminating, but at nonoptimal solution
minor edits  Removed some debug statements
Fixed termination condition, still given nonoptimal result
Minor bugfix, still nonoptimal result  Ensured reported answer is the cycle if multiple options
Fixed subtle bug in find_epsilon()  Fixed the improper substitute detection bug
Cleaned code and tried something which didn’t work
Black formats  Initial branch and bound implementation
Branch and bound returning optimal solution
black formatting changes  Split ascent and branch and bound methods into different functions
Performance tweaks and testing fractional answers
Asadpour output for ascent method
Removed branch and bound method. One unit test misbehaving
Added asymmetric fractional test for the ascent method
Removed printn statements and tweaked final test to be more asymmetric
Changed HK to only report on the support of the answer
spanning_tree_distribution
#
Once we have the support of the Held Karp relaxation, we calculate edge weights $\gamma$ for support so that the probability of any tree being sampled is proportional to the product of $e^{\gamma}$ across its edges. This is called a maximum entropy distribution in the Asadpour paper. This procedure was included in the Asadpour paper [1] on page 386.
 Set $\gamma = \vec{0}$.
 While there exists an edge $e$ with $q_e(\gamma) > (1 + \epsilon)z_e$:
 Compute $\delta$ such that if we define $\gamma’$ ad $\gamma_e’ = \gamma_e  \delta$ and $\gamma_f’ = \gamma_e$ for all $f \in E \backslash {e}$, then $q_e(\gamma’) = (1 + \epsilon / 2)z_e$
 Set $\gamma \leftarrow \gamma'$
 Output $\tilde{\gamma} := \gamma$.
Where $q_e(\gamma)$ is the probability that any given edge $e$ will be in a sampled spanning tree chosen with probability proportional to $\exp(\gamma(T))$. $\delta$ is also given as
$$ \delta = \frac{q_e(\gamma)(1(1+\epsilon/2)z_e)}{(1q_e(\gamma))(1+\epsilon/2)z_e} $$
so the Asadpour paper did almost all of the heavy lifting for this function. However, they were not very clear on how to calculate $q_e(\gamma)$ other than that Krichhoff’s Tree Matrix Theorem can be used.
My original method for calculating $q*e(\gamma)$ was to apply Krichhoff’s Theorem to the original laplacian matrix and the laplacian produced once the edge $e$ is contracted from the graph. Testing quickly showed that once the edge is contracted from the graph, it cannot affect the value of the laplacian and thus after subtracting $\delta$ the probability of that edge would increase rather than decrease. Multiplying my original value of $q_e(\gamma)$ by $\exp(\gamma_e)$ proved to be the solution here for reasons extensively discussed in my blog post _The Entropy Distribution* and in particular the “Update! (28 July 2021)” section.
Blog posts about spanning_tree_distribution
13 Jul 2021  Entropy Distribution Setup
20 Jul 2021  The Entropy Distribution
Commits about spanning_tree_distribution
Draft of spanning_tree_distribution
Changed HK to only report on the support of the answer  Needing to limit $\gamma$ to only the support of the Held Karp relaxation is what caused this change
Fixed contraction bug by changing to MultiGraph. Problem with prob > 1  Because the probability is only proportional to the product of the edge weights, this was not actually a problem
Black reformats  Rewrote the test and cleaned the code
Fixed pypi test error  The pypi tests do not have numpy
or scipy
and I forgot to flag the test to be skipped if they are not available
Further testing of dist fix  Fixed function to multiply $q_e(\gamma)$ by $\exp(\gamma_e)$ and implemented exception if $\delta$ ever misbehaves
Can sample spanning trees  Streamlined finding $q_e(\gamma)$ using new helper function
Review suggestions from dshult  Implemented code review suggestions from one of my mentors
Implement suggestions from boothby
sample_spanning_tree
#
What good is a spanning tree distribution if we can’t sample from it?
While the Asadpour paper [1] provides a rough outline of the sampling process, the bulk of their methodology comes from the Kulkarni paper, Generating random combinatorial objects [5]. That paper had a much more detailed explanation and even this pseudo code from page 202.
$U = \emptyset,$ $V = E$
Do $i = 1$ to $N$;
$\qquad$Let $a = n(G(U, V))$
$\qquad\qquad a’$ $= n(G(U \cup {i}, V))$
$\qquad$Generate $Z \sim U[0, 1]$
$\qquad$If $Z \leq \alpha_i \times \left(a’ / a\right)$
$\qquad\qquad$then $U = U \cup {i}$,
$\qquad\qquad$else $V = V  {i}$
$\qquad$end.
Stop. $U$ is the required spanning tree.
The only real difficulty here was tracking how the nodes were being contracted.
My first attempt was a mess of if
statements and the like, but switching it to a mergefind data structure (or disjoint set data structure) proved to be a wise decision.
Of course, it is one thing to be able to sample a spanning tree and another entirely to know if the sampling technique matches the expected distribution.
My first iteration test for sample_spanning_tree
just sampled a large number of trees (50000) and they printed the percent error from the normalized distribution of spanning tree.
With a sample size of 50000 all of the errors were under 10%, but I still wanted to find a better test.
From my AP statistics class in high school I remembered the $X^2$ (Chisquared) test and realized that it would be perfect here.
scipy
even had the ability to conduct one.
By converting to a chisquared test I was able to reduce the sample size down to 1200 (near the minimum required sample size to have a valid chisquared test) and use a proper hypothesis test at the $\alpha = 0.01$ significance level.
Unfortunately, the test would still fail 1% of the time until I added the @py_random_state
decorator to sample_spanning_tree
, and then the test can pass in a Random
object to produce repeatable results.
Blog posts about sample_spanning_tree
21 Jul 2021  Preliminaries For Sampling A Spanning Tree
28 Jul 2021  Sampling A Spanning Tree
Commits about sample_spanning_tree
Developing test for sampling spanning tree
Changed sample_spanning_tree test to Chi squared test
Adding test cases  Implemented @py_random_state
decorator
Review suggestions from dshult  Implemented code review suggestions from one of my mentors
asadpour_atsp
#
This function was the last piece of the puzzle, connecting all of the others together and producing the final result!
Implementation of this function was actually rather smooth.
The only technical difficulty I had was reading the support of the flow_dict
and the theoretical difficulties were adapting the min_cost_flow
function to solve the minimum circulation problem.
Oh, and that if the flow is greater than 1 I need to add parallel edges to the graph so that it is still eulerian.
A brief overview of the whole algorithm is given below:
 Solve the Held Karp relaxation and symmertize the result to made it undirected.
 Calculate the maximum entropy spanning tree distribution on the Held Karp support graph.
 Sample $2 \lceil \ln n \rceil$ spanning trees and record the smallest weight one before reintroducing direction to the edges.
 Find the minimum cost circulation to create an eulerian graph containing the sampled tree.
 Take the eulerian walk of that graph and shortcut the answer.
 return the shortcut answer.
Blog posts about asadpour_atsp
29 Jul 2021  Looking At The Big Picture
10 Aug 2021  Completing The Asadpour Algorithm
Commits about asadpour_atsp
untested implementation of asadpour_tsp
Fixed runtime errors in asadpour_tsp  General traveling salesman problem function assumed graph were undirected. This is not work with an atsp algorithm
black reformats  Fixed parallel edges from flow support bug
Fixed rounding error with tests
Review suggestions from dshult  Implemented code review suggestions from one of my mentors
Implemented review suggestions from rossbar
Future Involvement with NetworkX#
Overall, I really enjoyed this Summer of Code. I was able to branch out, continue to learn python and more about graphs and graph algorithms which is an area of interest for me.
Assuming that I have any amount of free time this coming fall semester, I’d love to stay involved with NetworkX. In fact, there are already some things that I have in mind even though my current code works as is.

Move
sample_spanning_tree
tomst.py
and rename it torandom_spanning_tree
. The ability to sample random spanning trees is not a part of the greater NetworkX library and could be useful to others. One of my mentors mentioned it being relevant to Steiner trees and if I can help other developers and users out, I will. 
Adapt
sample_spanning_tree
so that it can use both additive and multiplicative weight functions. The Asadpour algorithm only needs the multiplicative weight, but the Kulkarni paper [5] does talk about using an additive weight function which may be more useful to other NetworkX users. 
Move my Krichhoff’s Tree Matrix Theorem helper function to
laplacian_matrix.py
so that other NetworkX users can access it. 
Investigate the following article about the Held Karp relaxation. While I have no definite evidence for this one, I do believe that the Held Karp relaxation is the slowest part of my implementation of the Asadpour algorithm and thus is the best place for improving it. The ascent method I am using comes from the original Held and Karp paper [3], but they did release a part II which may have better algorithms in it. The citation is given below.
M. Held, R.M. Karp, The travelingsalesman problem and minimum spanning trees: Part II. Mathematical Programming, 1971, 1(1), p. 6–25. https://doi.org/10.1007/BF01584070

Refactor the
Edmonds
class inbranchings.py
. That class is the implementation for Edmonds’ branching algorithm but uses an iterative approach rather than the recursive one discussed in Edmonds’ paper [2]. I did also agree to work with another person, lkora to help rework this class and possible add aminimum_maximal_branching
function to find the minimum branching which still connects as many nodes as possible. This would be analogous to a spanning forest in an undirected graph. At the moment, neither of us have had time to start such work. For more information please reference issue #4836.
While there are areas of this problem which I can improve upon, it is important for me to remember that this project was still a complete success. NetworkX now has an algorithm to approximate the traveling salesman problem in asymmetric or directed graphs.