The last post discussing the **VF2++ helpers** can be found here.
Now that we’ve figured out how to solve all the sub-problems that **VF2++** consists of, we are ready to combine our
implemented functionalities to create the final solver for the **Graph Isomorphism** problem.

We should quickly review the individual functionalities used in the VF2++ algorithm:

- **Node ordering**, which finds the optimal order in which to access the nodes, such that those more likely to match are placed first. This reduces the possibility of infeasible searches taking place first.
- **Candidate selection**, such that, given a node $u$ from $G_1$, we obtain the candidate nodes $v$ from $G_2$.
- **Feasibility rules**, introducing easy-to-check cutting and consistency conditions which, if satisfied by a candidate pair of nodes $u$ from $G_1$ and $v$ from $G_2$, allow the mapping to be extended.
- **$T_i$ updating**, which updates the $T_i$ and $\tilde{T}_i$, $i=1,2$ parameters when a new pair is added to the mapping, and restores them when a pair is popped from it.

We are going to use all these functionalities to form our **Isomorphism solver**.

First of all, let’s describe the algorithm in simple terms, before presenting the pseudocode. The algorithm will look something like this:

1. Check that all **preconditions** are satisfied before calling the actual solver. For example, there's no point examining two graphs with different numbers of nodes for isomorphism.
2. Initialize all the necessary **parameters** ($T_i$, $\tilde{T}_i$, $i=1,2$) and cache some information that is going to be used later.
3. Take the next unexamined node $u$ from the ordering.
4. Find its candidates and check if there's a candidate $v$ such that the pair $u-v$ satisfies the **feasibility rules**:
    - if there is, extend the mapping and **go to step 3**;
    - if not, pop the last pair $\hat{u}-\hat{v}$ from the mapping and try a different candidate $\hat{v}$ from the remaining candidates of $\hat{u}$.
5. The two graphs are **isomorphic** if the number of **mapped nodes** equals the number of nodes of the two graphs.
6. The two graphs are **not isomorphic** if there are no remaining candidates for the first node of the ordering (the root).

The official code for **VF2++** is presented below.

```
# Check if there's a graph with no nodes in it
if G1.number_of_nodes() == 0 or G2.number_of_nodes() == 0:
    return False

# Check that both graphs have the same number of nodes and degree sequence
if not nx.faster_could_be_isomorphic(G1, G2):
    return False

# Initialize parameters (Ti/Ti_tilde, i=1,2) and cache necessary information about degree and labels
graph_params, state_params = _initialize_parameters(G1, G2, node_labels, default_label)

# Check if G1 and G2 have the same labels, and that number of nodes per label is equal between the two graphs
if not _precheck_label_properties(graph_params):
    return False

# Calculate the optimal node ordering
node_order = _matching_order(graph_params)

# Initialize the stack to contain node-candidates pairs
stack = []
candidates = iter(_find_candidates(node_order[0], graph_params, state_params))
stack.append((node_order[0], candidates))

mapping = state_params.mapping
reverse_mapping = state_params.reverse_mapping

# Index of the node from the order, currently being examined
matching_node = 1

while stack:
    current_node, candidate_nodes = stack[-1]

    try:
        candidate = next(candidate_nodes)
    except StopIteration:
        # If no remaining candidates, return to a previous state, and follow another branch
        stack.pop()
        matching_node -= 1

        if stack:
            # Pop the previously added u-v pair, and look for a different candidate _v for u
            popped_node1, _ = stack[-1]
            popped_node2 = mapping[popped_node1]
            mapping.pop(popped_node1)
            reverse_mapping.pop(popped_node2)
            _restore_Tinout(popped_node1, popped_node2, graph_params, state_params)
        continue

    if _feasibility(current_node, candidate, graph_params, state_params):
        # Terminate if mapping is extended to its full
        if len(mapping) == G2.number_of_nodes() - 1:
            cp_mapping = mapping.copy()
            cp_mapping[current_node] = candidate
            yield cp_mapping
            continue

        # Feasibility rules pass, so extend the mapping and update the parameters
        mapping[current_node] = candidate
        reverse_mapping[candidate] = current_node
        _update_Tinout(current_node, candidate, graph_params, state_params)

        # Append the next node and its candidates to the stack
        candidates = iter(
            _find_candidates(node_order[matching_node], graph_params, state_params)
        )
        stack.append((node_order[matching_node], candidates))
        matching_node += 1
```

This section is dedicated to the performance comparison between **VF2** and **VF2++**. The comparison was performed on
**random graphs** without labels, with node counts ranging from $100$ to $2000$. The results are depicted
in the two following diagrams.

We notice that the maximum speedup achieved is **14x**, and it continues to increase as the number of nodes increases.
It is also quite prominent that increasing the number of nodes doesn't affect the performance of **VF2++** to
a significant extent, compared to its drastic impact on the performance of **VF2**. Our results are almost identical
to those presented in the original **VF2++ paper**, verifying the theoretical analysis and premises of the literature.

The achieved boost is due to some key improvements and optimizations, specifically:

- **Optimal node ordering**, which avoids following unfruitful branches that would result in infeasible states. We make sure that the nodes most likely to match are accessed first.
- **Implementation in a non-recursive manner**, avoiding Python's maximum recursion limit while also reducing function call overhead.
- **Caching** of both node degrees and nodes per degree at the beginning, so that we don't have to recompute those features on every degree check. For example, instead of doing

```
res = []
for node in G2.nodes():
    if G1.degree[u] == G2.degree[node]:
        res.append(node)
# do stuff with res ...
```

to get the nodes with the same degree as $u$ (which happens many times in the implementation), we just do:

```
res = G2_nodes_of_degree[G1.degree[u]]
# do stuff with res ...
```

where `G2_nodes_of_degree` stores the set of nodes for each given degree. The same is done with node labels.
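A one-time construction of such caches might look like this (a minimal sketch with plain adjacency and label dicts; the names `adj2` and `labels2` are illustrative, not part of the actual implementation):

```
from collections import defaultdict

# Toy graph: adjacency sets and a label per node
adj2 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
labels2 = {"a": "red", "b": "blue", "c": "red", "d": "blue"}

# Build the caches once, so later degree/label lookups are O(1)
G2_nodes_of_degree = defaultdict(set)
for node, neighbors in adj2.items():
    G2_nodes_of_degree[len(neighbors)].add(node)

G2_nodes_of_label = defaultdict(set)
for node, label in labels2.items():
    G2_nodes_of_label[label].add(node)

print(sorted(G2_nodes_of_degree[2]))  # ['a', 'b', 'c', 'd']
print(sorted(G2_nodes_of_label["red"]))  # ['a', 'c']
```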

- **Extra shrinking of the candidate set for each node**, by adding more checks to the candidate selection method and removing some from the feasibility checks. In simple terms, instead of checking many conditions on a large set of candidates, we check fewer conditions on a more targeted and significantly smaller set of candidates. For example, in this code:

```
candidates = set(G2.nodes())
for candidate in candidates:
    if feasibility(u, candidate):
        do_stuff()
```

we take a huge set of candidates, which results in poor performance: the number of calls to `feasibility` is maximized, and the feasibility checks are performed on a very large set. Now compare that to the following alternative:

```
candidates = [
    n
    for n in G2_nodes_of_degree[G1.degree[u]].intersection(
        G2_nodes_of_label[G1_labels[u]]
    )
]
for candidate in candidates:
    if feasibility(u, candidate):
        do_stuff()
```

We have immediately and drastically reduced the number of checks performed and calls to the function, as we now only apply them to nodes with the same degree and label as $u$. This is a simplification for demonstration purposes; in the actual implementation there are more checks and extra shrinking of the candidate set.

Let's demonstrate our **VF2++** solver on a real graph. We are going to use the graph from the Graph Isomorphism Wikipedia page.

Let's start by constructing the graphs from the image above. We'll call the graph on the left `G` and the graph on the right `H`:

```
import networkx as nx

G = nx.Graph(
    [
        ("a", "g"),
        ("a", "h"),
        ("a", "i"),
        ("g", "b"),
        ("g", "c"),
        ("b", "h"),
        ("b", "j"),
        ("h", "d"),
        ("c", "i"),
        ("c", "j"),
        ("i", "d"),
        ("d", "j"),
    ]
)
H = nx.Graph(
    [
        (1, 2),
        (1, 5),
        (1, 4),
        (2, 6),
        (2, 3),
        (3, 7),
        (3, 4),
        (4, 8),
        (5, 6),
        (5, 8),
        (6, 7),
        (7, 8),
    ]
)
```

```
res = nx.vf2pp_is_isomorphic(G, H, node_label=None)
# res: True
res = nx.vf2pp_isomorphism(G, H, node_label=None)
# res: {1: "a", 2: "h", 3: "d", 4: "i", 5: "g", 6: "b", 7: "j", 8: "c"}
res = list(nx.vf2pp_all_isomorphisms(G, H, node_label=None))
# res: all isomorphic mappings (there might be more than one). This function is a generator.
```

```
# Assign some label to each node
G_node_attributes = {
    "a": "blue",
    "g": "green",
    "b": "pink",
    "h": "red",
    "c": "yellow",
    "i": "orange",
    "d": "cyan",
    "j": "purple",
}
nx.set_node_attributes(G, G_node_attributes, name="color")
H_node_attributes = {
    1: "blue",
    2: "red",
    3: "cyan",
    4: "orange",
    5: "green",
    6: "pink",
    7: "purple",
    8: "yellow",
}
nx.set_node_attributes(H, H_node_attributes, name="color")
res = nx.vf2pp_is_isomorphic(G, H, node_label="color")
# res: True
res = nx.vf2pp_isomorphism(G, H, node_label="color")
# res: {1: "a", 2: "h", 3: "d", 4: "i", 5: "g", 6: "b", 7: "j", 8: "c"}
res = list(nx.vf2pp_all_isomorphisms(G, H, node_label="color"))
# res: [{1: "a", 2: "h", 3: "d", 4: "i", 5: "g", 6: "b", 7: "j", 8: "c"}] - a single mapping, since each label is unique
```

Notice how, in the first case, our solver may return a different mapping every time, since the absence of labels means some nodes can map to more than one node. For example, node 1 can map to both a and h, since the graph is symmetric. In the second case, though, the existence of a single, unique label per node means there is only one match for each node, so the mapping returned is deterministic. This is easily observed from the output of `list(nx.vf2pp_all_isomorphisms)`, which, in the first case, returns all possible mappings, while in the latter it returns a single, unique isomorphic mapping.
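The effect of labels on the number of valid mappings can also be reproduced with a tiny brute-force check (a pure-Python sketch unrelated to the VF2++ implementation; the graphs and label names here are made up for illustration):

```
from itertools import permutations

# Count the isomorphic mappings between two 4-cycles,
# without and with unique node labels.
E1 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
E2 = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}

def mappings(labels1=None, labels2=None):
    found = []
    for perm in permutations("abcd"):
        m = dict(zip(range(4), perm))
        # every edge of G1 must map to an edge of G2
        if any(frozenset((m[u], m[v])) not in E2 for u, v in map(tuple, E1)):
            continue
        # optional label constraint: matched nodes must carry the same label
        if labels1 is not None and any(labels1[u] != labels2[m[u]] for u in m):
            continue
        found.append(m)
    return found

print(len(mappings()))  # 8 -- the automorphisms of a 4-cycle

unique1 = {0: "w", 1: "x", 2: "y", 3: "z"}
unique2 = {"a": "w", "b": "x", "c": "y", "d": "z"}
print(len(mappings(unique1, unique2)))  # 1 -- unique labels force a single mapping
```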

The previous post can be found here; be sure to check it out so you can follow the process step by step. Since then, two more very significant features of the algorithm have been implemented and tested: **node pair candidate selection** and **feasibility checks**.

As previously described, in the ISO problem we are basically trying to create a **mapping** such that every node
from the first graph is matched to a node from the second graph. This search for "feasible pairs" can be visualized
as a tree, where each node is a candidate pair that we should examine. This becomes much clearer if we take a look
at the figure below.

In order to check if the graphs $G_1$, $G_2$ are isomorphic, we check every candidate pair of nodes and, if it is feasible, we extend the mapping and go deeper into the tree of pairs. If it's not feasible, we climb back up and follow a different branch, until every node of $G_1$ is mapped to a node of $G_2$. In our example, we start by examining node 0 from $G_1$ together with node 0 of $G_2$. After some checks (details below), we decide that nodes 0 and 0 match, so we go deeper to map the remaining nodes. The next pair is 1-3, which fails the feasibility check, so we have to examine a different branch as shown. The new branch is 1-2, which is feasible, so we continue with the same logic until all the nodes are mapped.
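The extend/backtrack process just described can be sketched as a short recursive search. This is a deliberately naive simplification over plain adjacency dicts, not the actual VF2++ code (which is iterative and far more selective about candidates):

```
# Naive DFS over candidate pairs: extend the mapping when a pair is
# consistent, backtrack when no candidate works. Assumes both graphs
# have the same number of nodes.
def naive_isomorphism(adj1, adj2, mapping=None):
    mapping = {} if mapping is None else mapping
    if len(mapping) == len(adj1):
        return dict(mapping)  # every node of G1 is covered
    u = next(n for n in adj1 if n not in mapping)  # next uncovered node of G1
    for v in adj2:
        if v in mapping.values():
            continue  # v is already matched to some other node
        # consistency: each already-mapped node w is adjacent to u
        # exactly when its image is adjacent to v
        if all((mapping[w] in adj2[v]) == (w in adj1[u]) for w in mapping):
            mapping[u] = v  # feasible: extend the mapping and go deeper
            result = naive_isomorphism(adj1, adj2, mapping)
            if result is not None:
                return result
            del mapping[u]  # dead end: backtrack and try the next candidate
    return None
```

On two 4-cycles this returns a complete mapping; on a 4-cycle versus a 4-node path it returns `None`.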

Although in our example we examine candidate pairs in an arbitrary order, in the actual implementation we are able to target
specific pairs that are more likely to match, hence boosting the performance of the algorithm. The idea is that, in
every step of the algorithm, **given a node** $u\in V_1$, **we compute its candidates** $v\in V_2$,
where $V_1$ and $V_2$ are the node sets of $G_1$ and $G_2$ respectively. Now this is a puzzle that does not require a lot of
specific knowledge of graphs or the algorithm itself. Keep up with me, and you will realize it yourself. First, let $M$
be the mapping so far, which includes all the "covered" nodes up to this point. There are actually **three** different
types of node $u$ that we might encounter.

Node $u$ has no neighbors (the degree of $u$ equals zero). It would be redundant to test, as candidates for $u$, nodes from $G_2$ that have any neighbors. Therefore, we eliminate most of the possible candidates and keep only those that have the same degree as $u$ (in this case, zero). Pretty easy, right?

Node $u$ has neighbors, but none of them belong to the mapping. This situation is illustrated in the following figure.

The grey lines indicate that the nodes of $G_1$ (left 1,2) are mapped to the nodes of $G_2$ (right 1,2); they are basically the mapping. Again, given $u$, we make the observation that candidates $v$ of $u$ should also have no neighbors in the mapping, and should have the same degree as $u$ (as in the figure). Notice how, if we add a neighbor to $v$, or if we place one of its neighbors inside the mapping, there is no point examining the pair $u-v$ for matching.

Node $u$ has neighbors and some of them belong to the mapping. This scenario is also depicted in the below figure.

In this case, to obtain the candidates for $u$, we must look into the neighborhoods of those nodes of $G_2$ which map back to the covered neighbors of $u$. In our example, $u$ has one covered neighbor (1), and 1 from $G_1$ maps to 1 from $G_2$, which has $v$ as a neighbor. Also, for $v$ to be considered as a candidate, it should obviously have the same degree as $u$. Notice how every node that is not in the neighborhood of 1 (in $G_2$) cannot be matched to $u$ without breaking the isomorphism.
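Putting the three cases together, candidate selection might be sketched like this (plain adjacency dicts and a hypothetical degree cache; the real `_find_candidates` applies more checks, including label constraints):

```
def find_candidates(u, adj1, adj2, mapping, reverse_mapping, G2_nodes_of_degree):
    degree = len(adj1[u])
    covered_neighbors = [n for n in adj1[u] if n in mapping]

    if not covered_neighbors:
        # Cases 1 and 2: u has no covered neighbors (possibly no neighbors
        # at all), so candidates are the uncovered nodes of G2 with the
        # same degree and no covered neighbors of their own.
        return {
            v
            for v in G2_nodes_of_degree.get(degree, set())
            if v not in reverse_mapping
            and not any(n in reverse_mapping for n in adj2[v])
        }

    # Case 3: candidates must be common uncovered neighbors of the images
    # of u's covered neighbors, with the same degree as u.
    common = set.intersection(*(adj2[mapping[n]] for n in covered_neighbors))
    return {v for v in common if v not in reverse_mapping and len(adj2[v]) == degree}
```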

Let’s assume that given a node $u$, we obtained its candidate $v$ following the process described in the previous section.
At this point, the **Feasibility Rules** are going to determine whether the mapping should be extended by the pair $u-v$
or if we should try another candidate. The **feasibility** of a pair $u-v$ is examined by **consistency** and
**cutting** checks.

First, I am going to present the mathematical expression of the consistency check. It may seem complicated at first, but it becomes simple with a visual illustration. Using the notation $nbh_i(u)$ for the neighborhood of $u$ in graph $G_i$, the consistency rule is:

$$\forall\tilde{v}\in nbh_2(v)\cap M: (u, M^{-1}(\tilde{v}))\in E_1 \wedge \forall\tilde{u}\in nbh_1(u)\cap M: (v, M(\tilde{u}))\in E_2$$

We are going to use the following simple figure to demystify the above equation.

The mapping is depicted as grey lines between the nodes that are already mapped, meaning that 1 maps to A and 2 to B. What the equation says is that, for two nodes $u$ and $v$ to pass the consistency check, the neighbors of $u$ that belong to the mapping must map to neighbors of $v$ (and vice versa). This can be checked with code as simple as:

```
for neighbor in G1[u]:
    if neighbor in mapping:
        if mapping[neighbor] not in G2[v]:
            return False
        elif G1.number_of_edges(u, neighbor) != G2.number_of_edges(
            v, mapping[neighbor]
        ):
            return False
```

where the final lines also check that the number of edges between node $u$ and its neighbor $\tilde{u}$ is the same as the number of edges between $v$ and the node that $\tilde{u}$ maps to. At a very high level, we could describe this as a 1-look-ahead check.

We have previously discussed what $T_i$ and $\tilde{T_i}$ represent (see the previous post). These sets are used in the cutting checks as follows: the number of neighbors of $u$ that belong to $T_1$ should be equal to the number of neighbors of $v$ that belong to $T_2$. Take a moment to observe the figure below.

Once again, node 1 maps to A and 2 to B. The red nodes (4,5,6) form $T_1$ and the yellow ones (C,D,E) form $T_2$. Notice that in order for $u-v$ to be feasible, $u$ must have the same number of neighbors inside $T_1$ as $v$ has in $T_2$. In any other case, the two graphs are not isomorphic, which can be verified visually. In this example, both nodes have 2 of their neighbors (4,6 and C,E) in $T_1$ and $T_2$ respectively. Careful! If we delete the $V-E$ edge and connect $V$ to $D$, the cutting condition is still satisfied; however, feasibility will fail due to the consistency checks of the previous section. Simple code to apply the cutting check would be:

```
if len(T1.intersection(G1[u])) != len(T2.intersection(G2[v])) or len(
    T1out.intersection(G1[u])
) != len(T2out.intersection(G2[v])):
    return False
```

where `T1out` and `T2out` correspond to $\tilde{T_1}$ and $\tilde{T_2}$ respectively. And yes, we have to check those as well; we skipped them in the explanation above for simplicity.

At this point, we have successfully implemented and tested all the major components of the **VF2++** algorithm:

- **Node Ordering**
- **$T_i/\tilde{T_i}$ Updating**
- **Feasibility Rules**
- **Candidate Selection**

This means that, in the next post, hopefully, we are going to discuss our first, full and functional implementation of
**VF2++**.

This post includes all the major updates since the last post about VF2++. Each section is dedicated to a different sub-problem and presents the progress on it so far. General progress, milestones and related issues can be found here.

The node ordering is one major modification that **VF2++** proposes. Basically, the nodes are examined in an order that
makes the matching faster by first examining nodes that are more likely to match. This part of the algorithm has been
implemented; however, there is an issue: the existence of detached nodes (nodes not connected to the rest of the graph)
causes the code to crash. Fixing this bug will be a top priority in the next steps. The ordering implementation is
described by the following pseudocode.

Matching Order

- Set $M = \varnothing$.
- Set $\bar{V_1}$: nodes not in the order yet.
- **while** $\bar{V_1}$ not empty **do**
    - $rareNodes=[$nodes from $\bar{V_1}$ with the rarest labels$]$
    - $maxNode=argmax_{degree}(rareNodes)$
    - $T=$ BFS tree with $maxNode$ as root
    - **for** every level $d$ in $T$ **do**
        - $V_d=[$nodes of the $d^{th}$ level$]$
        - $\bar{V_1} = \bar{V_1} \setminus V_d$
        - $ProcessLevel(V_d)$
- Output $M$: the matching order of the nodes.

Process Level

- **while** $V_d$ not empty **do**
    - $S=[$nodes from $V_d$ with the most neighbors in $M]$
    - $maxNodes=argmax_{degree}(S)$
    - $m=$ node from $maxNodes$ with the rarest label
    - $V_d = V_d \setminus \{m\}$
    - Append $m$ to $M$
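As a rough translation of this pseudocode into Python (a sketch over plain adjacency dicts; the actual implementation differs in details, e.g. it re-evaluates the "most neighbors in $M$" criterion as each level is consumed, while here each level is sorted once):

```
from collections import Counter

def matching_order(adj, labels):
    label_rarity = Counter(labels.values())  # how common each label is
    order = []
    remaining = set(adj)
    while remaining:  # one pass per connected component (handles detached nodes)
        # root: rarest label first, ties broken by highest degree
        root = min(remaining, key=lambda n: (label_rarity[labels[n]], -len(adj[n])))
        seen = {root}
        level = [root]
        while level:
            # within a level: most already-ordered neighbors, then highest
            # degree, then rarest label
            level.sort(
                key=lambda n: (
                    -sum(nb in order for nb in adj[n]),
                    -len(adj[n]),
                    label_rarity[labels[n]],
                )
            )
            order.extend(level)
            remaining.difference_update(level)
            nxt = [nb for n in level for nb in adj[n] if nb not in seen]
            seen.update(nxt)
            level = list(dict.fromkeys(nxt))  # next BFS level, deduplicated
    return order
```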

According to the VF2++ paper notation:

$$T_1=\{u\in V_1 \setminus m: \exists \tilde{u} \in m: (u,\tilde{u})\in E_1\}$$

where $V_1$ and $E_1$ contain all the nodes and edges of the first graph respectively, and $m$ is a dictionary mapping every node of the first graph to a node of the second graph. Now if we interpret the above equation, we conclude that $T_1$ contains the uncovered neighbors of covered nodes. In simple terms, it includes all the nodes that do not yet belong to the mapping $m$, but are neighbors of nodes that are in the mapping. In addition,

$$\tilde{T_1}=V_1 \setminus m \setminus T_1$$

The following figure is meant to provide some visual explanation of what exactly $T_i$ is.

The blue nodes 1,2,3 are nodes from graph G1 and the green nodes A,B,C belong to the graph G2. The grey lines connecting those two indicate that in this current state, node 1 is mapped to node A, node 2 is mapped to node B, etc. The yellow edges are just the neighbors of the covered (mapped) nodes. Here, $T_1$ contains the red nodes (4,5,6) which are neighbors of the covered nodes 1,2,3, and $T_2$ contains the grey ones (D,E,F). None of the nodes depicted would be included in $\tilde{T_1}$ or $\tilde{T_2}$. The latter sets would contain all the remaining nodes from the two graphs.

Regarding the computation of these sets, it’s not practical to use the brute force method and iterate over all nodes in every step of the algorithm to find the desired nodes and compute $T_i$ and $\tilde{T_i}$. We use the following observations to implement an incremental computation of $T_i$ and $\tilde{T_i}$ and make VF2++ more efficient.

- $T_i$ is empty in the beginning, since there are no mapped nodes ($m=\varnothing$) and therefore no neighbors of mapped nodes.
- $\tilde{T_i}$ initially contains all the nodes from graph $G_i, i=1,2$ which can be realized directly from the notation if we consider both $m$ and $T_1$ empty sets.
- Every step of the algorithm either adds one node $u$ to the mapping or pops one from it.

We can conclude that in every step, $T_i$ and $\tilde{T_i}$ can be incrementally updated. This method avoids a ton of redundant operations and results in significant performance improvement.

The above graph shows the difference in performance between exhaustively recomputing $T_i$ and $\tilde{T_i}$ by brute force and updating them incrementally. The graph used to obtain these measurements was a $G(n, p)$ random graph with edge probability $p=0.7$. It can clearly be seen that the execution time of the brute-force method increases much more rapidly with the number of nodes/edges than that of the incremental update method, as expected. The brute-force method looks like this:

```
def compute_Ti(G1, G2, mapping, reverse_mapping):
    T1 = {nbr for node in mapping for nbr in G1[node] if nbr not in mapping}
    T2 = {
        nbr
        for node in reverse_mapping
        for nbr in G2[node]
        if nbr not in reverse_mapping
    }
    T1_out = {n1 for n1 in G1.nodes() if n1 not in mapping and n1 not in T1}
    T2_out = {n2 for n2 in G2.nodes() if n2 not in reverse_mapping and n2 not in T2}
    return T1, T2, T1_out, T2_out
```

If we assume that G1 and G2 have the same number of nodes (N), the average number of nodes in the mapping is $N_m$, and the average node degree of the graphs is $D$, then the time complexity of this function is:

$$O(2N_mD + 2N) = O(N_mD + N)$$

in which we have excluded the lookup times in $T_i$, $mapping$ and $reverse\_mapping$ as they are all $O(1)$. Our incremental method works like this:

```
def update_Tinout(
    G1, G2, T1, T2, T1_out, T2_out, new_node1, new_node2, mapping, reverse_mapping
):
    # This function should be called right after feasibility is established
    # and new_node1 is mapped to new_node2.
    uncovered_neighbors_G1 = {nbr for nbr in G1[new_node1] if nbr not in mapping}
    uncovered_neighbors_G2 = {
        nbr for nbr in G2[new_node2] if nbr not in reverse_mapping
    }

    # Add the uncovered neighbors of new_node1 and new_node2 to T1 and T2 respectively
    T1.discard(new_node1)
    T2.discard(new_node2)
    T1 = T1.union(uncovered_neighbors_G1)
    T2 = T2.union(uncovered_neighbors_G2)

    # The newly covered nodes and their uncovered neighbors leave T1_out/T2_out
    T1_out.discard(new_node1)
    T2_out.discard(new_node2)
    T1_out = T1_out - uncovered_neighbors_G1
    T2_out = T2_out - uncovered_neighbors_G2
    return T1, T2, T1_out, T2_out
```

which based on the previous notation, is:

$$O(2D + 2(D + M_{T_1}) + 2D) = O(D + M_{T_1})$$

where $M_{T_1}$ is the expected (average) number of elements in $T_1$.

Certainly, the complexity is much better in this case, as $D$ and $M_{T_1}$ are significantly smaller than $N_mD$ and $N$.

In this post we investigated how node ordering works at a high level, and also
how we are able to calculate some important parameters so that the space and
time complexity are reduced.
The next post will continue by examining two more significant components of
the VF2++ algorithm: the **candidate node pair selection** and the
**cutting/consistency** rules that decide when the mapping should or shouldn't
be extended.
Stay tuned!

I got accepted as a **GSoC** contributor, and I am so excited to spend the summer working on such an incredibly
interesting project. The mentors are very welcoming, communicative, fun to be around, and I really look forward to
collaborating with them. My application for GSoC 2022 can
be found here.

My name is Konstantinos Petridis, and I am an **Electrical Engineering** student at the Aristotle University of
Thessaloniki. I am currently in my 5th year of studies, with a **Major in Electronics & Computer Science**. Although a
wide range of scientific fields fascinate me, I have a strong passion for **Computer Science**, **Physics** and
**Space**. I love to study, learn new things and don’t hesitate to express my curiosity by asking a bunch of questions
to the point of being annoying. You can find me on GitHub @kpetridis24.

The project I'll be working on is the implementation of **VF2++**, a state-of-the-art algorithm for the
**Graph Isomorphism** problem, which lies in the complexity class **NP**. The algorithm works like a more complex
form of a regular **DFS**, performed on the space of possible solutions rather than on the graph nodes. To verify or
reject the isomorphism between two graphs, we examine every possible candidate pair of nodes (one from the first and
one from the second graph) and check whether going deeper into the DFS tree is feasible, using specific rules. If
feasibility is established, the DFS tree is expanded and deeper pairs are investigated. When a pair is not feasible,
we go back up the tree and follow a different branch, just like in a regular **DFS**. More details about the algorithm
can be found here.

The major reasons I chose this project stem from both my love for **Graph Theory** and the fascinating nature of the
project itself. The algorithm is so recent that **NetworkX** will possibly hold one of its first implementations,
which might become a reference that helps other organisations further develop and optimize future implementations.
Regarding my personal gain, I will become more familiar with open source communities and their philosophy, collaborate
with highly skilled individuals, and gain significant experience in researching, working as a team, getting feedback
and help when needed, and contributing to an actual scientific library.