[Tutorial] Tree Isomorphism CSES

Introduction

I have no school right now, and there are no tutorials on Tree Isomorphism on Codeforces, so I decided to write one :).

What is a tree isomorphism?

Maybe you recognize the word isomorphic from abstract algebra, where two objects are called isomorphic when they are effectively the same but are labelled differently. We can extend this to trees as well: two trees are isomorphic if they are the same, but may have different node labels. For example, a path with edges $$$1 - 2$$$ and $$$2 - 3$$$ and a path with edges $$$2 - 1$$$ and $$$1 - 3$$$

are isomorphic because we can relabel the nodes. If you want a more rigorous definition, then two trees $$$T_1 = (V_1, E_1)$$$ and $$$T_2 = (V_2, E_2)$$$ are isomorphic iff there exists a bijection $$$\phi : V_1 \to V_2$$$ such that $$$\{u, v\} \in E_1$$$ if and only if $$$\{\phi(u), \phi(v)\} \in E_2$$$.
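To make the definition concrete, here is a minimal sketch that checks whether a given relabeling is an isomorphism. The names `isIsomorphismWitness`, `normalized`, and `Edge` are made up for this illustration, and it assumes 0-indexed vertices and edge lists without duplicates (which is true for trees).

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Store an undirected edge with the smaller endpoint first.
static Edge normalized(int u, int v) {
    return {std::min(u, v), std::max(u, v)};
}

// Returns true if phi (vertex i of T1 goes to vertex phi[i] of T2) is a
// bijection that maps the edge set e1 exactly onto the edge set e2.
bool isIsomorphismWitness(int n, const std::vector<Edge>& e1,
                          const std::vector<Edge>& e2,
                          const std::vector<int>& phi) {
    if ((int)phi.size() != n || e1.size() != e2.size()) return false;
    // phi must be a permutation of 0..n-1.
    std::vector<bool> used(n, false);
    for (int x : phi) {
        if (x < 0 || x >= n || used[x]) return false;
        used[x] = true;
    }
    std::set<Edge> target;
    for (const Edge& e : e2) target.insert(normalized(e.first, e.second));
    // Every edge of T1 must land on an edge of T2. Since phi is injective and
    // both edge lists have the same size, the image is then all of E2.
    for (const Edge& e : e1) {
        if (!target.count(normalized(phi[e.first], phi[e.second]))) return false;
    }
    return true;
}
```

Checking one given $$$\phi$$$ is easy; the hard part, which the rest of this post is about, is deciding whether any such $$$\phi$$$ exists.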

What is the problem?

We want to determine if two unrooted trees are isomorphic in $$$\mathcal{O}(N \log N)$$$ time.

If we can solve the problem for rooted trees, can we solve it for unrooted trees?

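Yes. We can easily transform the problem of checking whether two unrooted trees are isomorphic into checking whether two rooted trees are isomorphic. How? Find the centroids of the first tree and the centroids of the second tree, and root each tree at one of its centroids. A tree has at most two centroids, and an isomorphism must map centroids to centroids, so it is enough to try every pair of centroids (at most four rooted comparisons). So now, we have transformed our unrooted case into a rooted case.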
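Here is one possible standalone way to find the centroids, written iteratively so it does not depend on recursion depth. The function name `findCentroids` is just for this sketch, and it assumes the tree is a connected, 0-indexed adjacency list.

```cpp
#include <algorithm>
#include <vector>

// Returns the one or two centroids of a tree, in increasing vertex order.
std::vector<int> findCentroids(const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    if (n == 0) return {};
    std::vector<int> parent(n, -1), order, sub(n, 1), heaviest(n, 0);
    order.reserve(n);
    // Iterative DFS from vertex 0: every vertex appears after its parent.
    std::vector<int> stack = {0};
    while (!stack.empty()) {
        int v = stack.back();
        stack.pop_back();
        order.push_back(v);
        for (int u : adj[v]) {
            if (u != parent[v]) {
                parent[u] = v;
                stack.push_back(u);
            }
        }
    }
    // Reverse order: each vertex is finished before its parent is updated.
    for (int i = n - 1; i >= 0; --i) {
        int v = order[i];
        if (parent[v] != -1) {
            sub[parent[v]] += sub[v];
            heaviest[parent[v]] = std::max(heaviest[parent[v]], sub[v]);
        }
    }
    std::vector<int> centroids;
    for (int v = 0; v < n; ++v) {
        // Largest component left after deleting v: a child subtree or the rest.
        int largest = std::max(heaviest[v], n - sub[v]);
        if (largest <= n / 2) centroids.push_back(v);
    }
    return centroids;
}
```

On a path with four vertices this returns the two middle vertices, and on a star it returns only the center.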

Heuristics

You can skip this, if you'd like. I'm just going to list out some ideas for tree isomorphism checking. None of the following ideas solves the problem on its own, but some of them work decently well.

Idea 1: Find the degrees of all nodes and check if $$$\text{oc}_{T1}[i] = \text{oc}_{T2}[i]$$$, where $$$\text{oc}_{T}[i]$$$ is the number of vertices of degree $$$i$$$. If $$$\text{oc}_{T1}[i] = \text{oc}_{T2}[i] \forall i$$$ does not hold, then we know that the trees are not isomorphic.
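As a rough sketch of Idea 1: comparing the sorted degree lists is the same as comparing $$$\text{oc}[i]$$$ for every $$$i$$$. The function name `degreeSequence` is an assumption of this example, and the trees are 0-indexed adjacency lists.

```cpp
#include <algorithm>
#include <vector>

// Sorted list of vertex degrees. Isomorphic trees always have equal lists,
// but equal lists do not prove that the trees are isomorphic.
std::vector<int> degreeSequence(const std::vector<std::vector<int>>& adj) {
    std::vector<int> deg;
    deg.reserve(adj.size());
    for (const auto& neighbours : adj) {
        deg.push_back(static_cast<int>(neighbours.size()));
    }
    std::sort(deg.begin(), deg.end());
    return deg;
}
```

If `degreeSequence(t1) != degreeSequence(t2)`, the answer is definitely "not isomorphic". The check can be fooled, though: a vertex with branches of lengths 1, 1, 3 and a vertex with branches of lengths 1, 2, 2 give the same degree list but different trees.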

Idea 2: Root the trees at their centroids and check if $$$\text{oc}_{T1}[i] = \text{oc}_{T2}[i]$$$, where $$$\text{oc}_{T}[i]$$$ is the number of vertices with subtree size $$$i$$$. If $$$\text{oc}_{T1}[i] = \text{oc}_{T2}[i] \forall i$$$ does not hold, then we know that the trees are not isomorphic.

Idea 3: If the diameters of the two trees are not the same, then they cannot be isomorphic.
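One common way to get the diameter is the double-BFS trick: run a BFS from any vertex, then a second BFS from the farthest vertex found. A small sketch, with the helper names `farthest` and `treeDiameter` invented here and a connected 0-indexed adjacency list assumed:

```cpp
#include <queue>
#include <utility>
#include <vector>

// BFS from src; returns {a farthest vertex, its distance in edges}.
static std::pair<int, int> farthest(const std::vector<std::vector<int>>& adj, int src) {
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[src] = 0;
    q.push(src);
    int best = src;
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        if (dist[v] > dist[best]) best = v;
        for (int u : adj[v]) {
            if (dist[u] == -1) {
                dist[u] = dist[v] + 1;
                q.push(u);
            }
        }
    }
    return {best, dist[best]};
}

// Diameter of a tree, measured in edges.
int treeDiameter(const std::vector<std::vector<int>>& adj) {
    if (adj.empty()) return 0;
    int endpoint = farthest(adj, 0).first;  // one end of some longest path
    return farthest(adj, endpoint).second;  // distance to the other end
}
```

Different diameters rule out isomorphism; equal diameters tell us nothing definite.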

Idea 4: We know that for fixed $$$k$$$, the number of paths of length $$$k$$$ must be the same in both trees.

If we merge all of the ideas together, we get a pretty good heuristic: each check is a necessary condition, so any mismatch proves that the trees are not isomorphic, and passing all of them makes isomorphism likely. But none of them is sufficient, so that's not good enough.

The idea

The idea is to parenthesize our tree. We can recursively define:

$$$val[v] = "(" + \sum_{\text{children c} \in v} val[c] + ")",$$$

where $$$+$$$ denotes string concatenation. But obviously, the string parenthesization of a rooted tree does not always yield the same result, even for isomorphic trees: a root whose children are a leaf and a two-vertex chain yields either $$$(()(()))$$$ or $$$((())())$$$, depending on the order in which the children are visited.

So what can we do? It turns out, we can just "order" the parenthesization. That is, concatenate in increasing or decreasing order of the string parenthesization.

$$$val[v] = "(" + \sum_{\text{children c} \in v, \text{in increasing order of val[c]}} val[c] + ")",$$$

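What would it look like if we did that? For instance, a root with a leaf child and a two-vertex chain child always gets $$$((())())$$$ under increasing order, no matter how its children are listed.

So, in short, the idea is to parenthesize the tree and then concatenate in some fixed order (one easy way is in increasing order of the strings).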
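The ordered parenthesization can be sketched directly with strings. This is the slow version, before any hashing; the function name `canonicalForm` is hypothetical, and the tree is assumed to be a 0-indexed adjacency list.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Canonical parenthesization of the subtree of v, where p is v's parent
// (-1 for the root). Children are concatenated in increasing string order.
std::string canonicalForm(const std::vector<std::vector<int>>& adj, int v, int p = -1) {
    std::vector<std::string> parts;
    for (int u : adj[v]) {
        if (u != p) parts.push_back(canonicalForm(adj, u, v));
    }
    std::sort(parts.begin(), parts.end());
    std::string result = "(";
    for (const std::string& s : parts) result += s;
    result += ")";
    return result;
}
```

Two rooted trees are isomorphic exactly when `canonicalForm` returns the same string for both roots.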

But that's $$$\mathcal{O}(N^2)$$$ in the worst case, since we're dealing with a lot of potentially large strings (on a path, the string lengths add up to about $$$N^2$$$). No problem! We can replace each opening parenthesis with a $$$1$$$ and each closing parenthesis with a $$$0$$$. For example, $$$(()())$$$ would become $$$110100$$$. We can then convert the binary string to a number by reading it as a binary representation. If the number is too large, we can take it modulo some large prime $$$p$$$, which turns the value into a hash: equal strings always get equal numbers, but two different strings can, in principle, collide.
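As a tiny worked example of that conversion, the sketch below reads a parenthesization as a binary number and reduces it modulo $$$p$$$ along the way. The name `encodeParens` is made up here, and the default modulus $$$10^9 + 9$$$ is simply the `MOD` used in the code below.

```cpp
#include <cstdint>
#include <string>

// Reads s as a binary number ('(' is 1, ')' is 0) and returns it modulo mod.
std::uint64_t encodeParens(const std::string& s, std::uint64_t mod = 1000000009ULL) {
    std::uint64_t value = 0;
    for (char c : s) {
        // Shift left by one bit and append the next bit.
        value = (value * 2 + (c == '(' ? 1 : 0)) % mod;
    }
    return value;
}
```

For $$$(()())$$$ this gives $$$110100_2 = 52$$$. A real solution never builds the strings; it combines the children's numbers directly.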

The end.

Code

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>
 
using namespace std;
const int MOD = 1e9 + 9;
 
class Tree {
public:
    vector<vector<int>> adj;
    vector<int> centroid;
    vector<int> sub;
    vector<int> id;
    vector<int64_t> powr;
 
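    // Computes subtree sizes (sub), subtree hashes (id), and the centroids.
    void dfs(int curNode, int prevNode) {
        sub[curNode] = 1;
        bool is_centroid = true;
        vector<pair<int, int>> nodes; // (hash of child, child)
        for (int v: adj[curNode]) {
            if (v != prevNode) {
                dfs(v, curNode);
                sub[curNode] += sub[v];
                // A centroid has no child subtree larger than N / 2.
                if (sub[v] > (int) adj.size() / 2) {
                    is_centroid = false;
                }
                nodes.emplace_back(id[v], v);
            }
        }
        // Fixed child order: increasing hash.
        sort(nodes.begin(), nodes.end());
        id[curNode] = 1; // the opening parenthesis
        for (auto& p: nodes) {
            // Shift left by sub[child] + 1 bits, then add the child's hash.
            id[curNode] = ((powr[sub[p.second] + 1] * id[curNode]) % MOD + id[p.second]) % MOD;
        }
        // The closing parenthesis: append a 0 bit.
        id[curNode] *= 2;
        id[curNode] %= MOD;
        // The part of the tree above curNode must not exceed N / 2 either.
        if ((int) adj.size() - sub[curNode] > (int) adj.size() / 2) {
            is_centroid = false;
        }
        if (is_centroid) {
            centroid.push_back(curNode);
        }
    }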
 
    vector<int> Centroid() {
        dfs(0, -1);
        return centroid;
    }
 
    // Hashes both trees from the given roots and compares the root hashes.
    // Equal hashes mean "isomorphic, unless a hash collision occurred".
    bool isIsomorphic(int root1, Tree& t2, int root2) {
        dfs(root1, root1);
        t2.dfs(root2, root2);
        return (id[root1] == t2.id[root2]);
    }
 
    void add_edge(int u, int v) {
        adj[u].push_back(v), adj[v].push_back(u);
    }
 
    Tree(int n) {
        adj.resize(n);
        sub.resize(n);
        id.resize(n);
        powr.resize(n + 1);
        powr[0] = 1;
        for (int i = 1; i <= n; i++) {
            powr[i] = 2 * powr[i - 1];
            powr[i] %= MOD;
        }
    }
};
 
int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int t;
    cin >> t;
    while (t--) {
        int N;
        cin >> N;
        Tree t1(N), t2(N);
        for (int i = 0; i < N - 1; i++) {
            int u, v;
            cin >> u >> v;
            u--, v--;
            t1.add_edge(u, v);
        }
        for (int i = 0; i < N - 1; i++) {
            int u, v;
            cin >> u >> v;
            u--, v--;
            t2.add_edge(u, v);
        }
        vector<int> c1 = t1.Centroid();
        vector<int> c2 = t2.Centroid();
        bool done = false;
        for (int i: c1) {
            for (int j: c2) {
                if (!done && t1.isIsomorphic(i, t2, j)) {
                    cout << "YES\n";
                    done = true;
                    break;
                }
            }
        }
        if (!done) {
            cout << "NO\n";
        }
    }
}

URL to submit

map<vector<int>, int> hasher;

int hashify(vector<int> x) {
    ranges::sort(x);
    if (!hasher[x]) {
        hasher[x] = hasher.size();
    }
    return hasher[x];
}

int hash(int v) { // get a "hash" of v's subtree
    vector<int> children;
    for (int u: g[v]) {
        children.push_back(hash(u));
    }
    return hashify(children);
}
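The snippet above relies on a global graph `g` that stores only child edges. A self-contained sketch of the same map-based idea might look like the following; the struct `SubtreeIds` and its method `idOf` are names invented for this example, and it takes an undirected 0-indexed adjacency list plus an explicit parent instead.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Gives every rooted subtree an integer id such that two subtrees get the
// same id exactly when they are isomorphic. No hashing, so no collisions.
struct SubtreeIds {
    std::map<std::vector<int>, int> ids; // sorted child ids -> id

    int idOf(const std::vector<std::vector<int>>& adj, int v, int p = -1) {
        std::vector<int> children;
        for (int u : adj[v]) {
            if (u != p) children.push_back(idOf(adj, u, v));
        }
        std::sort(children.begin(), children.end());
        auto it = ids.find(children);
        if (it != ids.end()) return it->second;
        // First time we see this multiset: give it the next unused number.
        int fresh = static_cast<int>(ids.size()) + 1;
        ids.emplace(std::move(children), fresh);
        return fresh;
    }
};
```

The same `SubtreeIds` object has to be used for both trees, otherwise the ids are not comparable. Two rooted trees are then isomorphic when `idOf` returns the same number for both roots.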

Comments (9)

nor

2 years ago, # |

+35

There is a way to avoid using hashing too. Compress the corresponding parentheses strings into unique integers, and check for each level separately. Note that there can be a total of $$$O(n)$$$ unique integers, so if you use radix sort, you can do this in $$$O(n)$$$. This is a good reference imo.

Edit: This is an implementation of this idea, hopefully it's clearer now.

Olympia

2 years ago, # ^ |

+17

Interesting!

SleepyShashwat

+10

norz

adamant

You don't need a notion of bracket sequences to check rooted trees for isomorphism.

You can do it like this — let's assign each rooted tree a unique number. Knowing the assigned number for a rooted tree is the same as knowing a multiset of assigned numbers of the subtrees of its root children.

If we've already seen the multiset before, we extract its number from a map, otherwise we assign it the first unassigned number. It is implemented like this:

It is $$$O(n \log n)$$$ rather than $$$O(n)$$$, but I guess the radix sort idea above could still apply?

Another problem to test rooted tree isomorphism algorithms: 102354J - Tree Automorphisms.

P. S. If you don't sort vector x in hashify, you'll still get a meaningful compression — with this approach you get to check if two rooted trees are isomorphic as plane trees.

+11

My implementation in particular doesn't use any bracket sequence (rather, it only relies on the fact that for a given level, we assign each unique subtree with root at that level a unique integer; uniqueness of subtrees is defined up to isomorphism). My initial idea was similar to yours: map each vector to a unique integer; however, it is more efficient to only care about the current level and do a radix-sort style compression for assigning vertex representations to the current level.

dkg7888

There is a mistake in the third right-side image: you forgot to count one node. BTW, nice editorial, and thanks, I didn't know anything about this.

VLamarca

This tree isomorphism problem appeared in Latin America ICPC Regionals a week ago. I am the author of it, I thought this post would help many teams to solve it, but sadly no ACs during contest. The problem is not as straight forward as checking if 2 trees are isomorphic though.

tfg

+21

The problem is pretty straightforward, it's just a huge implementation problem in a contest with too many problems.

Olympia's blog
