Hello everyone,
finding the diameter is one of the most frequent ways to solve problems about trees. In this tutorial we will see how to find a diameter and some of its properties, and we will use them to solve some problems of increasing difficulty. I tried to put a lot of examples to make the understanding easier.
The first part of the tutorial is quite basic, so feel free to skip it and jump to the problems if you already know the concepts.
Target: rating $$$[1400, 2100]$$$ on CF
Prerequisites: basic graph theory, greedy
The diameter
Given an unweighted tree, let's define $$$\text{dist}(a, b) =$$$ the number of edges in the simple path $$$a \rightarrow b$$$.
A diameter of the tree is a simple path $$$a \rightarrow b$$$ that maximizes $$$\text{dist}(a, b)$$$ over all pairs of nodes. If there are multiple diameters, let's pick any of them.
The same definition is valid for a weighted tree with nonnegative weights (with $$$\text{dist}(a, b) =$$$ the sum of the weights of the edges in the simple path $$$a \rightarrow b$$$).
Finding a diameter
Given a tree with $$$n$$$ nodes are multiple ways to find a diameter. Here is one of the simplest ways:
Run a DFS from any node $$$p$$$. Let $$$a$$$ be a node whose distance from node $$$p$$$ is maximized. Run another DFS from node $$$a$$$. Let $$$b$$$ be a node whose distance from node $$$a$$$ is maximized. $$$a \rightarrow b$$$ is a diameter.
Tree = edges of a diameter + forest
Before proving the previous algorithm, let's analyze the structure of the tree (we will mention the diameter, but we will not use the fact that $$$a \rightarrow b$$$ is actually a diameter before proving it).
We started a DFS from node $$$p = 11$$$, and we got that node $$$a = 1$$$ is the farthest from $$$p$$$, and node $$$b = 7$$$ is the farthest from $$$a$$$.
Let's represent the diameter on a line. If you remove the edges of the diameter, you get a forest (i.e., several trees). Let's root each tree at the node in the diameter. What's the height (i.e., the maximum distance from the root to any node) of each component?
Let $$$q$$$ be the root of the component of $$$p$$$. Let's consider any component whose root $$$d$$$ is between $$$a$$$ and $$$p$$$ (included), and one of its nodes $$$c$$$.
We get
$$$\text{dist}(p, a) \geq \text{dist}(p, c) \implies \text{dist}(p, a) - \text{dist}(p, d) \geq \text{dist}(p, c) - \text{dist}(p, d) \implies \text{dist}(c, a) \geq \text{dist}(c, d)$$$.
In other words, the height of each component is at most the distance of the root of the component from one end of the diameter.
You can prove the same statement for the other end of the diameter, using that $$$b$$$ is the farthest node from $$$a$$$.
Proof that $$$a \rightarrow b$$$ is a diameter
For each $$$x$$$, let $$$f(x)$$$ be the number of inversions if you consider only the elements from $$$1$$$ to $$$x$$$ in the permutation.
First, let's put $$$x$$$ at the end of the permutation: this requires $$$x - \text{pos}(x)$$$ moves. That's optimal (the actual proof is similar to Proof 1; in an intuitive way, if you put the last element to the end of the array, it doesn't interfere anymore with the other swaps).
For example, if you have the permutation $$$[2, 3, 7, 8, 6, 9, 1, 4, 5]$$$ and you move the $$$9$$$ to the end, you get $$$[2, 3, 7, 8, 6, 1, 4, 5, 9]$$$ and now you need to sort $$$[2, 3, 7, 8, 6, 1, 4, 5]$$$. Hence, $$$f(x) = f(x-1) + x - \text{pos}(x)$$$. For each $$$x$$$, $$$x - \text{pos}(x)$$$ is actually the number of pairs $$$(i, j)$$$ ($$$1 \leq i < j \leq x$$$) such that $$$x = a_i > a_j$$$. So $$$f(x)$$$ is equal to the number of inversions.
Counting inversions in $$$O(n \log n)$$$
You can use a Fenwick tree (or a segment tree). There are other solutions (for example, using divide & conquer + merge sort), but they are usually harder to generalize.
For each $$$j$$$, calculate the number of $$$i < j$$$ such that $$$a_i > a_j$$$.
The Fenwick tree should contain the frequency of each value in $$$[1, n]$$$ in the prefix $$$[1, j - 1]$$$ of the array.
So, for each $$$j$$$, the queries look like
- $$$res := res + \text{range_sum}(a_j + 1, n)$$$
- add $$$1$$$ in the position $$$a_j$$$ of the Fenwick tree
Observations / slight variations of the problem
By using a Fenwick tree, you are actually calculating the number of inversions for each prefix of the array.
You can calculate the number of swaps required to sort an array (not necessarily a permutation, but for now let's assume that its elements are distinct) by compressing the values of the array. For example, the array $$$[13, 18, 34, 38, 28, 41, 5, 29, 30]$$$ becomes $$$[2, 3, 7, 8, 6, 9, 1, 4, 5]$$$.
You can also calculate the number of swaps required to get an array $$$b$$$ (for now let's assume that its elements are distinct) starting from $$$a$$$, by renaming the values. For example,
$$$a = [2, 3, 7, 8, 6, 9, 1, 4, 5], b = [9, 8, 5, 2, 1, 4, 7, 3, 6]$$$
is equivalent to
$$$a = [4, 8, 7, 2, 9, 1, 5, 6, 3], b = [1, 2, 3, 4, 5, 6, 7, 8, 9]$$$
$$$a^{-1}$$$ (a permutation such that $$$(a^{-1})_{a_x} = x$$$, i.e. $$$(a^{-1})_x$$$ is equal to the position of $$$x$$$ in $$$a$$$) has the same number of inversions as $$$a$$$. For example, $$$[2, 3, 7, 8, 6, 9, 1, 4, 5]$$$ and $$$[7, 1, 2, 8, 9, 5, 3, 4, 6]$$$ have both $$$16$$$ inversions. Sketch of a proof: note that, when you swap two elements in adjacent positions in $$$a$$$, you are swapping two adjacent values in $$$a^{-1}$$$, and the number of inversions in $$$a^{-1}$$$ also increases by $$$1$$$ or decreases by $$$1$$$ (like in Proof 1).
Hint 1This problem seems very similar to "calculate the number of swaps required to get an array $$$b$$$ (with distinct elements) starting from $$$a$$$", but there can be equal elements.
Hint 2Some swaps are useless (which ones?)
Hint 3It's useless to swap equal characters. For each character, can you get its final position in the reversed string? (Treat equal characters as distinguishable values)
SolutionNote that it's useless to swap equal characters (in fact, the string doesn't change and you waste $$$1$$$ move).
Consider each character $$$c$$$ from 'a'
to 'z'
separately. The relative order of a subsequence of equal characters doesn't change in the optimal solution.
Hence, the $$$i$$$-th leftmost occurrence of $$$c$$$ in the initial string should move to the $$$i$$$-th leftmost occurrence of $$$c$$$ in the reversed string. If you number each character of the string, treating equal characters as distinguishable values, you can calculate the array $$$a$$$ such that $$$a_i$$$ is the final position of $$$s_i$$$ in the reversed string, and the result is the number of inversions of $$$a$$$.
For example, consider the string acabcaaababbcb
. The reversed string is bcbbabaaacbaca
. The positions of 'b'
in the reversed string are $$${1, 3, 4, 6, 11}$$$, and this means that $$$a_4 = 1, a_9 = 3, \dots, a_{14} = 11$$$ (in fact, $$$a = [5, 2, 7, 1, 10, 8, 9, 12, 3, 14, 4, 6, 13, 11]$$$).
Hint 1If there are two adjacent elements, it's optimal to remove them.
Hint 2In Proof 1, we've seen that a swap changes the number of inversions by $$$1$$$. Here, you should calculate something slightly different than the number of inversions.
Hint 3Consider two pairs of equal elements. What should their relative positions be at the end?
Hint 4The answer is $$$n + \text{# of pairs of integers (x, y) such that their order in the array is xyxy}$$$. Can you calculate it efficiently?
SolutionConsider two pairs of equal elements (integers $$$(x, y)$$$). If you want to remove those pairs, in some moment their relative positions should be $$$\text{xyyx}$$$ (or $$$\text{yxxy}$$$). Instead, if the relative positions are $$$\text{xyxy}$$$, you want to swap two of these elements. If $$$k$$$ is the number of pairs of integers $$$(x, y)$$$ such that their order in the array is $$$\text{xyxy}$$$, do you need exactly $$$k$$$ swaps (and $$$n$$$ removals)? It turns out that the answer is yes. Sketch of a proof:
- if there are no $$$\text{xyxy}$$$, there are two adjacent equal elements (so you can remove them)
- if you remove two adjacent equal elements, the number of $$$\text{xyxy}$$$ doesn't change
- if there are no adjacent equal elements, there is a $$$\text{xyxy}$$$ such that two of these characters are adjacent (so you can swap them and eliminate a $$$\text{xyxy}$$$)
Now let's calculate $$$k$$$. Let $$$(x_i, y_i)$$$ be $$$n$$$ pairs such that $$$x_i < y_i, a_{x_i} = a_{y_i}$$$: now you have to calculate, for each $$$j$$$, how many indices $$$i$$$ meet $$$x_i < x_j, x_j < y_i < y_j$$$.
You can use a Fenwick tree that contains how many times each value in $$$[1, n]$$$ occurs as a $$$y_i$$$, considering only pairs such that $$$x_i < x_j$$$.
So, for each $$$j$$$, the queries look like
- $$$res := res + \text{range_sum}(x_j + 1, y_j - 1)$$$
- add $$$1$$$ in the position $$$y_j$$$ of the Fenwick tree
Hint 2The answer is -1
if there are $$$\geq 2$$$ characters with an odd number of occurrences in the string. Otherwise, can you calculate the optimal final position of each character, like in 1430E - String Reversal?
Hint 3Once again, it's useless to swap equal characters. So, what's the relative position of equal characters? Which characters should be located in symmetrical positions (i.e. $$$i, n - i + 1$$$)?
Hint 4The $$$i$$$-th leftmost occurrence and the $$$i$$$-th rightmost occurrence of a character $$$c$$$ in the initial string should move to symmetrical positions in the final string. Let's consider such "symmetrical pairs": what's the optimal relative order of two pairs with distinct letters? What about the letter in the center of the string (if its length is odd)?
SolutionIf there are $$$\geq 2$$$ characters with an odd number of occurrences in the string, you can't make a palindrome.
Since it's useless to swap equal characters, you know which pairs of characters will stay on symmetrical positions in the final string, i.e. the $$$i$$$-th leftmost occurrence and the $$$i$$$-th rightmost occurrence of any character $$$c$$$.
In the string acabcaaababbc
, for example, you can make these "symmetrical pairs": positions $$$(1, 10), (2, 13), (3, 8), (4, 12), (6, 7), (9, 11)$$$. The character in position $$$5$$$ should go in the center of the final string (i.e. position $$$7$$$).
The symmetrical pairs have to be nested inside each other, in some order. Given two symmetrical pairs, which of them should be located outside the other one?
It turns out that the pair that contains the leftmost element should be located outside. In fact, if you want to reach $$$\text{xyyx}$$$, the initial configurations with $$$x$$$ on the left ($$$\text{xxyy}$$$, $$$\text{xyxy}$$$, $$$\text{xyyx}$$$) require $$$2, 1, 0$$$ moves respectively, while reaching $$$\text{yxxy}$$$ requires $$$2, 1, 2$$$ moves respectively.
So you can sort the symmetrical pairs by leftmost element and construct the array $$$a$$$ such that $$$a_i$$$ is the final position of $$$s_i$$$ in the palindromic string (in this case, $$$[1, 2, 3, 4, 7, 5, 9, 11, 6, 13, 8, 10, 12]$$$) and the result is the number of inversions of $$$a$$$.
Implementation (C++)
Hint 1Which balls can you put at the end?
Hint 2At the end, you have a ball with number $$$n$$$. For example, if it's white, you have to sort the remaining $$$n-1$$$ white balls and $$$n$$$ black balls. How to calculate efficiently the cost required?
Hint 3Let $$$dp_{i,j}$$$ be the minimum cost required to sort the first $$$i$$$ white balls (with a number from $$$1$$$ to $$$i$$$) and the first $$$j$$$ black balls (with a number from $$$1$$$ to $$$j$$$). You need transitions in $$$O(1)$$$.
Hint 4Re-read Proof 2 in this tutorial.
SolutionIt's easy to prove that the leftmost $$$k$$$ balls in the final arrangement are the white balls with a number in $$$[1, i]$$$ and the black balls with a number in $$$[1, j]$$$ for some $$$i, j$$$ such that $$$i+j = k$$$.
For fixed $$$i, j$$$, let $$$dp_{i,j}$$$ be the minimum cost (considering only such balls). If the last ball is white, you can move it immediately to the end (see Proof 2), and the cost is the number of $$$l$$$ in the initial array such that
- $$$l \leq i - 1$$$ if the ball is white, $$$l \leq j$$$ if the ball is black
- $$$\text{pos}(l) > \text{pos}(i)$$$
Then you spend $$$dp_{i-1,j}$$$ to sort the rest of the array. You can calculate the cost similarly if the last ball is black.
Computing such transitions in $$$O(1)$$$ can be done with simple prefix sums (for example, let $$$ps_{i, c, j}$$$ be the number of balls with position $$$\geq i$$$, color $$$c$$$ and value $$$\leq j$$$).
The result is $$$dp_{n,n}$$$.
Implementation (C++)
Other problems
IOI 2019/1
arc120_c (suggested by Ghassane)
Hackerearth — Swapping numbers (Inferno03)
Hackerearth — Make the strings equal (Inferno03)
1526D - Kill Anton (somil_jain_120)
JOI 2021/3 (Final Round) (you can submit here)
Conclusions
We've seen that a lot of problems where you have to swap adjacent elements can be tackled with greedy observations, such as looking at the optimal relative positions of the values in the final array; then, a lot of these problems can be reduced to "find the number of inversions" or similar.
Of course, suggestions/corrections are welcome. In particular, please share in the comments other problems where you have to swap adjacent elements.
I hope you enjoyed the blog!