Big integers with negative digits: The Trygub numbers

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	jiangly	3578
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	tourist	3565
8	maroonrk	3531
9	Radewoosh	3521
10	Um_nik	3482

#	User	Contrib.
1	maomao90	174
2	awoo	165
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	147
9	orz	146
10	pajenegod	145

Hi everyone!

Today we will talk about something that might be useful in programming competitions. Yay! Also great thanks to antontrygubO_o for sharing this with me and others in a Discord server.

Let's implement a data structure that maintains a big number $$$N$$$ in base $$$b$$$ and supports the following:

Given (possibly negative) integers $$$|x|, |y| \leq n$$$, add $$$x b^y$$$ to $$$N$$$.
Given $$$k$$$ and assuming $$$N \geq 0$$$, print the $$$k$$$-th digit of $$$N$$$.
Check if $$$N$$$ is positive, negative or equals to $$$0$$$.

Each operation should take at most $$$O(\log n)$$$ amortized time and $$$O(q)$$$ memory. While some approach you may think of immediately would imply using segment tree, or a similar structure, the solution proposed here only requires std::map, so it's much shorter and easier to implement (at the slight expense of increased constant factor). It may be used in the following problems:

If you implement the big integers in these numbers the standard way (i.e. keeping digits in the $$$[0, b)$$$ segment, carefully executing carries, etc), you will quickly learn that you may get in trouble because you may be forced to do and undo a lot of carry operations which chain up and so you need to frequently change large segments between the values of $$$0$$$ and $$$b-1$$$.

Now, stop being fooled by the non-negative propaganda! You don't have to do it! Let's give ourselves some slack and allow negative digits. Well, just a bit of them. Instead of maintaining digits in $$$[0,b)$$$, let's now maintain them in the interval $$$(-b, b)$$$. It seems like a tiny change, but the effect is tremendous. On one hand, the representation of any number is not unique anymore. On the other hand, when we actually reach the value $$$b$$$ or $$$-b$$$, we wrap them back to $$$0$$$, and carry $$$1$$$ or $$$-1$$$ to the next digit correspondingly.

Noticed anything? The carry now wraps us from the endpoints of the interval to its middle instead of from one endpoint to another! It would be easy to add $$$1$$$ to a particular bit, turn it into $$$b$$$ and cause a chain of carries by it. But! If after that we add $$$-1$$$ to the same bit, it will not wrap all the bits back to $$$b-1$$$! It will just change this specific bit to $$$-1$$$! So, we give up the uniqueness of the representation, but we gain a whole lot of stability in exchange.

The C++ implementation for the first two queries is also quite concise:

code

struct trygub_num {
    const int base = 1 << 30;

    map<int, int> digs;

    void add(int a, int b) {
        digs[b] += a;
        int t;
        do {
            t = digs[b] / base;
            digs[b + 1] += t;
            digs[b] -= t * base;
            if(digs[b] == 0) {
                digs.erase(b);
            }
            b++;
        } while(t);
        if(digs[b] == 0) {
            digs.erase(b);
        }
    }

    int get(int k) {
        auto it = digs.lower_bound(k);
        int ans = it == end(digs) || it->first > k ? 0 : it->second;
        if(it != begin(digs) && prev(it)->second < 0) {
            ans--;
        }
        return (ans + base) % base;
    }
} me;

I tested it on #2302. 「NOI2017」整数 and it works!

P.S. Applying it to 1817E - Полусумма and 1810F - M-дерево is left to the curious reader as an exercise :)

P.P.S. Is this trick well-known? Does it have a name?

Comments (10)

Write comment?

adamant

12 months ago, # |

+58

At the request of jeroenodb and in honor of great antontrygubO_o I also suggest calling the structure "Trygub num".

→ Reply

ffao

+20

Lyrically

+77

Finally adamant posted something that we can understand:P

bvd

+30

Allowing negative digits in number representations has already appeared in Concrete Mathematics, page 15-16, but this is the first time I see the "wrapping around" part.

← Rev. 3 →

Now that I think about it, the intended solution to 102354E - Decimal Expansion uses pentagonal number theorem to represent the constant $$$\frac{9}{10}\frac{99}{100}\frac{999}{1000} \dots$$$ as a Trygub number in base $$$10$$$ and recover the corresponding digit. So many applications!

alwyn

Another problem: INC 2022 C — Powers of Two

jeroenodb

← Rev. 6 →

+24

I really wanted to know why the amortized complexity is good. So I decided to prove it for myself:

We will be using a potential function, which represents how bad we messed up the data structure. If the potential function is high it means we need to clean up a lot of trash.

Let's take as potential function:

$$$\phi(\text{trygub num DS}) = c\log(n) \sum | \text{digs}[i]|/b$$$

The absolute value of the digits, times a constant $$$c \log(n)$$$, which isn't too important for now. It's intuitively clear that the higher the absolute values of the digits, the more carries can happen in future operations, so such a state of the data structure is worse. Good to note is that this potential function starts at $$$0$$$ and its value is always non-negative.

To prove the amortized bound, let's see how the potential changes with the add operation. (It is the only operation that changes the potential function).

In the $$$\texttt{add}$$$ function, if there are $$$k$$$ carries, the function needs to do $$$O( (1+ k) \cdot \log(n))$$$ work. When a carry is happening, it means we subtract $$$\text{carry} \cdot b$$$ from $$$\text{digs}[i]$$$ and add back $$$\text{carry}$$$ to $$$\text{digs}[i+1]$$$. The carrying always decreases $$$|\text{digs}[i]|$$$ and either increases or decreases $$$|\text{digs}[i+1]|$$$. In the worst-case, the potential function decreases by at least $$$ c \log(n) \cdot (1 - \frac{1}{b})$$$. So

$$$T_\text{amortized} = \Delta \phi + \text{actual work} \leq -k \cdot c \log(n) (1 - \frac{1}{b} ) + O( (k+1) \log(n))$$$

If the constant $$$c$$$ is chosen big enough, such that the $$$-k$$$ term is bigger than the $$$+k$$$ term inside the big Oh notation, the $$$+k$$$ and $$$-k$$$ terms can cancel. So $$$T_\text{amortized} = O(\log(n))$$$.

We didn't yet account for the extra potential that is due to the addition of $$$x$$$ to the digit. If we can assume $$$x \leq b$$$, then this adds potential $$$\leq c \log(n) \cdot b / b = c \log(n)$$$, so it is still amortized fast.

If $$$b \ll x $$$, the amortized analysis does not work (you can prove some slightly worse bounds though). Luckily, this can be easily fixed. It doesn't bother us if we change the base used in the data structure to $$$b^\prime = b^k$$$, for $$$k>0$$$, so we just choose a large enough $$$k$$$, such that $$$b^k \geq \max X$$$.

You can even prove if $$$\max X = O( \text{base}^c)$$$, for some constant $$$c$$$, the amortized analysis still works.

12 months ago, # ^ |

Btw, a good way to think about how a potential function actually proves an amortized bound is this:

The potential function can be seen as a bank account, where you can put money into to save it for later. Whenever you're decreasing the potential you are taking away money. Everyone knows money = time. So in this analogy, to pay for the time of a slow operation, you take some money out of the bank account. With the money you saved in earlier operations by paying extra and putting it in the bank account. By ensuring the potential function is non-negative you are ensuring you're never spending more money than you have in the bank.

Motherhacker123

Nice

peltorator

+16

Now, stop being fooled by the non-negative propaganda!

Thank you adamant for keeping Codeforces from propaganda!

adamant's blog