Given a nontrivial rectifiable closed plane curve $\gamma\colon\mathbb{R}/\ell\mathbb{Z}\to\mathbb{R}^{2}$ of length $\ell$ parameterized by the arclength, the arc-chord constant of $\gamma$ is defined as
This post revisits the topic of integer division, building upon the discussion in the previous post. Specifically, I’ll delve into removing trailing zeros in the decimal representation of an input integer, or more broadly, factoring out the highest power of a given constant that divides the input. This exploration stems from the problem of converting floating-point numbers into strings, where certain contemporary algorithms, such as Schubfach and Dragonbox, may yield outputs containing trailing zeros.
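To make the problem statement concrete, here is a minimal baseline sketch that strips trailing decimal zeros by repeated division. It is not the optimized method the post goes on to discuss; the names `TrailingZeroResult` and `remove_trailing_zeros_naive` are hypothetical and introduced only for illustration, and a 32-bit unsigned input is assumed.

```cpp
#include <cstdint>

// Hypothetical result type: the input with trailing decimal zeros stripped,
// together with how many zeros were removed.
struct TrailingZeroResult {
    std::uint32_t significand;  // n divided by the highest power of 10 dividing it
    int removed;                // exponent of that power of 10
};

// Naive baseline: keep dividing by 10 while the remainder is zero.
// Every iteration costs one remainder and one division, which is exactly
// the cost that faster approaches try to avoid.
inline TrailingZeroResult remove_trailing_zeros_naive(std::uint32_t n) {
    TrailingZeroResult r{n, 0};
    if (n == 0) {
        return r;  // zero is divisible by every power of 10; stop to avoid looping forever
    }
    while (r.significand % 10 == 0) {
        r.significand /= 10;
        ++r.removed;
    }
    return r;
}
```

For example, `remove_trailing_zeros_naive(1200)` yields a significand of `12` with `2` zeros removed. Replacing the constant `10` with another divisor gives the more general "factor out the highest power" variant.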
It is well known that integer division is quite a heavy operation on modern CPUs. It is so slow, in fact, that it has become common wisdom to avoid it at all costs in performance-critical sections of a program. I do not know why division is particularly hard to optimize from the hardware perspective. My guess is that (1) every general algorithm is essentially a minor variation of good old long division, (2) which is almost impossible to parallelize. In any case, that is not the topic of this post.
This post is about an ingenious algorithm for printing integers into decimal strings. It sounds like an extremely simple problem, but it is in fact considerably more complicated than one might imagine. Let us define more precisely what we want to do: we take an integer of a specific bit-width and a byte buffer, convert the input integer into a string consisting of its decimal digits, and then write it into the given buffer. For simplicity, we will assume that the integer is unsigned and $32$ bits wide. So, we want to implement the following function written in C++:
char* itoa(std::uint32_t n, char* buffer) {
    // Convert n into decimal digit string and write it into buffer.
    // Returns the position right next to the last character written.
}
There are numerous algorithms for doing this, and I will dig into a clever algorithm invented by James Anhalt (jeaiii), which seems to be the fastest known algorithm at the time of writing.
When I was working on Dragonbox and Grisu-Exact (which are float-to-string conversion algorithms with some nice properties) I had to come up with a fast method for computing things like $\lfloor n\log_{10}2 \rfloor$ or $\lfloor n\log_{2}10 \rfloor$, or more generally $\lfloor nx\rfloor$ for some integer $n$ and a fixed positive real number $x$. Actually, the sign of $x$ isn’t extremely important, but let us just assume $x>0$ for simplicity.
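One common way to compute such a quantity is to replace $x$ by a rational approximation $m/2^{k}$ and evaluate $\lfloor nm/2^{k}\rfloor$ with a multiplication and a shift. The sketch below does this for $x=\log_{10}2$ with $m=315653$ and $k=20$. These constants, the function names, and the range of valid $n$ are assumptions made for illustration: the approximation is only correct for a bounded range of $n$, and the reference function here checks it exactly only for $0\le n\le 63$.

```cpp
#include <cstdint>

// Approximates floor(n * log10(2)) as floor(n * 315653 / 2^20).
// 315653 / 2^20 is slightly larger than log10(2), so the result is only
// guaranteed for a limited range of n (assumed small and non-negative here).
inline int floor_log10_pow2(int n) {
    return static_cast<int>((static_cast<std::int64_t>(n) * 315653) >> 20);
}

// Exact reference for 0 <= n <= 63: floor(log10(2^n)) equals the number of
// decimal digits of 2^n minus one, computed with integer arithmetic only.
inline int floor_log10_pow2_reference(int n) {
    std::uint64_t p = std::uint64_t{1} << n;
    int result = 0;
    while (p >= 10) {
        p /= 10;
        ++result;
    }
    return result;
}
```

For instance, `floor_log10_pow2(10)` gives `3`, matching the fact that $2^{10}=1024$ has four decimal digits. Comparing the two functions over the intended input range is a simple way to validate a choice of $m$ and $k$.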