This post includes all the major updates since the last post about VF2++. Each section is dedicated to a different sub-problem and presents the progress on it so far. General progress, milestones and related issues can be found here.
Node ordering#
The node ordering is one major modification that VF2++ proposes. Basically, the nodes are examined in an order that makes the matching faster by first examining nodes that are more likely to match. This part of the algorithm has been implemented, however there is an issue. The existence of detached nodes (not connected to the rest of the graph) causes the code to crash. Fixing this bug will be a top priority during the next steps. The ordering implementation is described by the following pseudocode.
Matching Order
- Set
. - Set
: nodes not in order yet - while
not empty do
nodes from with the rarest labels BFSTree with as root - for every level in
do
nodes of the level - Output
: the matching order of the nodes.
Process Level
- while
not empty do
nodes from with the most neighbors in M node from with the rarest label - Append m to M
and #
According to the VF2++ paper notation:
where
The following figure is meant to provide some visual explanation of what exactly
The blue nodes 1,2,3 are nodes from graph G1 and the green nodes A,B,C belong to the graph G2. The grey lines connecting
those two indicate that in this current state, node 1 is mapped to node A, node 2 is mapped to node B, etc. The yellow
edges are just the neighbors of the covered (mapped) nodes. Here,
Regarding the computation of these sets, it’s not practical to use the brute force method and iterate over all nodes in
every step of the algorithm to find the desired nodes and compute
is empty in the beginning, since there are no mapped nodes ( ) and therefore no neighbors of mapped nodes. initially contains all the nodes from graph which can be realized directly from the notation if we consider both and empty sets.- Every step of the algorithm either adds one node
to the mapping or pops one from it.
We can conclude that in every step,
The above graph shows the difference in performance between using the exhaustive brute force and incrementally updating
def compute_Ti(G1, G2, mapping, reverse_mapping):
T1 = {nbr for node in mapping for nbr in G1[node] if nbr not in mapping}
T2 = {
nbr
for node in reverse_mapping
for nbr in G2[node]
if nbr not in reverse_mapping
}
T1_out = {n1 for n1 in G1.nodes() if n1 not in mapping and n1 not in T1}
T2_out = {n2 for n2 in G2.nodes() if n2 not in reverse_mapping and n2 not in T2}
return T1, T2, T1_out, T2_out
If we assume that G1 and G2 have the same number of nodes (N), the average number of nodes in the mapping is
in which we have excluded the lookup times in
def update_Tinout(
G1, G2, T1, T2, T1_out, T2_out, new_node1, new_node2, mapping, reverse_mapping
):
# This function should be called right after the feasibility is established and node1 is mapped to node2.
uncovered_neighbors_G1 = {nbr for nbr in G1[new_node1] if nbr not in mapping}
uncovered_neighbors_G2 = {
nbr for nbr in G2[new_node2] if nbr not in reverse_mapping
}
# Add the uncovered neighbors of node1 and node2 in T1 and T2 respectively
T1.discard(new_node1)
T2.discard(new_node2)
T1 = T1.union(uncovered_neighbors_G1)
T2 = T2.union(uncovered_neighbors_G2)
# todo: maybe check this twice just to make sure
T1_out.discard(new_node1)
T2_out.discard(new_node2)
T1_out = T1_out - uncovered_neighbors_G1
T2_out = T2_out - uncovered_neighbors_G2
return T1, T2, T1_out, T2_out
which based on the previous notation, is:
where
Certainly, the complexity is much better in this
case, as
In this post we investigated how node ordering works at a high level, and also how we are able to calculate some important parameters so that the space and time complexity are reduced. The next post will continue with examining two more significant components of the VF2++ algorithm: the candidate node pair selection and the cutting/consistency rules that decide when the mapping should or shouldn’t be extended. Stay tuned!