## Motivation:

Given the set of prime numbers $\mathbb{P} \subset \mathbb{N}$, the Copeland–Erdős constant $\mathcal{C}$ is defined as (1):

\begin{equation}
\mathcal{C} = \sum_{n=1}^{\infty} p_n \cdot 10^{-\left(n + \sum_{k=1}^{n} \lfloor \log_{10} p_k \rfloor\right)} \tag{1}
\end{equation}

where $p_n$ is the $n$th prime number.
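
As a quick sanity check (a minimal sketch; the prime generator and function names here are my own), the partial sums of (1) reproduce the digit-by-digit concatenation of the primes exactly:

```python
from fractions import Fraction
from itertools import count, islice

def primes():
    # Trial-division prime generator; fine for a small demonstration.
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n

def concatenated_digits(num_primes):
    # The digits of C are the decimal expansions of the primes, concatenated.
    return "".join(str(p) for p in islice(primes(), num_primes))

def partial_sum(num_primes):
    # Partial sum of (1); offset accumulates sum_k floor(log10 p_k),
    # computed exactly as len(str(p)) - 1 to avoid floating point.
    total, offset = Fraction(0), 0
    for n, p in enumerate(islice(primes(), num_primes), start=1):
        offset += len(str(p)) - 1
        total += Fraction(p, 10 ** (n + offset))
    return total

digits = concatenated_digits(8)
assert partial_sum(8) == Fraction(int(digits), 10 ** len(digits))
print(digits)  # -> 235711131719
```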

Now, it is generally known that $\mathcal{C}$ is normal and that normal numbers are incompressible by finite-state machines. Since machine learning systems are finite-state machines, it occurred to me that prime formulas are not PAC-learnable by machine learning systems, regardless of their computational power (5). While there are probably several approaches to this question, compression bounds driven by the Shannon source coding theorem seem natural to me, since that theorem shows that, in the asymptotic limit, i.i.d. data cannot be compressed so that the average number of bits per symbol is less than the Shannon entropy of the source.

## Question:

Might there be an effective information-theoretic approach to normal numbers motivated by the Shannon source coding theorem? In principle, such an approach would allow us to derive compression bounds for any normal number.

To clarify what I mean, I have added a description of my attempts to find an effective information-theoretic definition of normal numbers.

## An information-theoretic approach to normal numbers:

### An information-theoretic definition of normal number:

Given the alphabet $\Sigma$ with $\lvert \Sigma \rvert = \alpha$ and $X = \Sigma^{\infty}$ we may define:

\begin{equation}
\Sigma^n \cap X_N := \bigcup_{w_i \in \Sigma^n} \{w_i\} \cap \{x_i\}_{i=1}^{N-n+1} \tag{2}
\end{equation}

where $x_i = X(i-1, i+n-2)$ is the length-$n$ window of $X$ starting at position $i$ (windows may overlap, giving $N-n+1$ of them), and $\lim\limits_{N \to \infty} X_N = X$.

In this context, $X$ is normal to base $\alpha$ if, for any $Z_N \sim U(\Sigma^n \cap X_N)$ with $N \gg n$, the average amount of information gained from observing each length-$n$ block of $Z_N$ converges to:

\begin{equation}
\log_2 \lvert \Sigma^n \rvert = n \cdot \log_2 \lvert \Sigma \rvert \tag{3}
\end{equation}

as $N \to \infty$.
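
To make this concrete, here is a small numerical sketch (the function names and the choice of $2000$ primes are mine): the entropy of the empirical distribution over the overlapping length-$n$ windows of a prefix of $\mathcal{C}$ should approach $n \log_2 \alpha$ as the prefix grows.

```python
import math
from collections import Counter

def block_entropy(x, n):
    # Shannon entropy (bits) of the empirical distribution over the
    # N - n + 1 overlapping length-n windows of the string x.
    windows = [x[i:i + n] for i in range(len(x) - n + 1)]
    total = len(windows)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(windows).values())

def prime_digits(num_primes):
    # Digits of the Copeland-Erdos constant: concatenate the primes.
    ps, n = [], 2
    while len(ps) < num_primes:
        is_prime = True
        for p in ps:
            if p * p > n:
                break
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            ps.append(n)
        n += 1
    return "".join(map(str, ps))

x = prime_digits(2000)
for n in (1, 2):
    print(n, round(block_entropy(x, n), 3), round(n * math.log2(10), 3))
```

For these prefix lengths the empirical entropy sits just below the $n \log_2 10$ ceiling, as the definition predicts.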

### Proof of equivalence with the usual definition:

From a frequentist perspective, the usual definition of normality states that the probabilities satisfy:

\begin{equation}
\forall w_i \in \Sigma^n, \quad p_{w_i} = \lim_{N \to \infty} \frac{\mathcal{N}(X_N, w_i)}{N-n+1} = \frac{1}{\lvert \Sigma^n \rvert} \tag{4}
\end{equation}

where $\mathcal{N}(X_N, w_i)$ counts the number of times the string $w_i$ appears as a substring of $X_N$.
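
The counting function is straightforward to implement; note that occurrences may overlap, which is why there are $N - n + 1$ candidate positions in the denominator of (4). (The helper name `occurrences` is mine.)

```python
def occurrences(x, w):
    # Count occurrences of w among the len(x) - len(w) + 1 windows of x,
    # including overlapping ones, matching the counting function N(X_N, w).
    n = len(w)
    return sum(x[i:i + n] == w for i in range(len(x) - n + 1))

# Overlaps count: "aaaa" contains "aa" at positions 0, 1 and 2.
print(occurrences("aaaa", "aa"))    # -> 3
print(occurrences("235711", "11"))  # -> 1
```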

We have a uniform distribution over $\Sigma^n$ in the sense that:

\begin{equation}
\forall w_i, w_j \in \Sigma^n, \quad p_{w_i} = p_{w_j} \tag{5}
\end{equation}

Now, we define the random variable $Z_N \sim U(\Sigma^n \cap X_N)$ whose Shannon entropy is given by:

\begin{equation}
H(Z_N) = -\sum_{i=1}^{\lvert \Sigma^n \rvert} P(Z_N = w_i) \log_2 P(Z_N = w_i) \tag{6}
\end{equation}

The probabilities in (6) are arbitrarily close to uniform for sufficiently large $N$, since by (4):

\begin{equation}
\forall w_i \in \Sigma^n \; \forall \epsilon > 0 \; \exists m \in \mathbb{N} \; \forall N \geq m, \quad \Bigl\lvert P(Z_N = w_i) - \frac{1}{\lvert \Sigma^n \rvert} \Bigr\rvert = \Bigl\lvert \frac{\mathcal{N}(X_N, w_i)}{N-n+1} - \frac{1}{\lvert \Sigma^n \rvert} \Bigr\rvert < \epsilon \tag{7}
\end{equation}

and therefore we have:

\begin{equation}
\lim_{N \to \infty} H(Z_N) = \log_2 \lvert \Sigma^n \rvert = n \cdot \log_2 \lvert \Sigma \rvert \tag{8}
\end{equation}
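
The convergence in (8) can be observed numerically. As a hedged sketch (using a seeded pseudorandom i.i.d. source over $\Sigma = \{0, \dots, 9\}$ as a stand-in for a normal sequence; the function name is mine), the empirical block entropy approaches $n \log_2 10$:

```python
import math
import random
from collections import Counter

def block_entropy(x, n):
    # Empirical entropy (bits) over the overlapping length-n windows of x.
    windows = [x[i:i + n] for i in range(len(x) - n + 1)]
    total = len(windows)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(windows).values())

random.seed(0)
x = "".join(random.choices("0123456789", k=50_000))
for n in (1, 2, 3):
    print(n, round(block_entropy(x, n), 4), round(n * math.log2(10), 4))
```

The small remaining gap is the usual finite-sample bias of plug-in entropy estimates, which vanishes as $N \to \infty$.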

## Application to the Copeland–Erdős constant:

Given that $\mathcal{C}$ is normal in base $10$, it is finite-state incompressible. In particular, if $\mathcal{A}$ is the description language for all finite-state automata (which includes all learnable programs), then we may deduce that for large $N$ (6):

\begin{equation}
\mathbb{E}(K_{\mathcal{A}}(\mathcal{C}_N)) \sim N \cdot \log_2 10 \tag{9}
\end{equation}

where $\mathcal{C}_N$ denotes the first $N$ digits of $\mathcal{C}$ and $K_{\mathcal{A}}(\cdot)$ denotes prefix-free Kolmogorov complexity relative to $\mathcal{A}$. From (9) we may deduce that a prime formula is not PAC-learnable.
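
As a rough empirical stand-in for (9) (this is a sketch, not a measurement of $K_{\mathcal{A}}$ itself; the prime count of $5000$ and the use of zlib are my own choices), a general-purpose dictionary compressor cannot push the digits of $\mathcal{C}$ much below $\log_2 10 \approx 3.32$ bits per digit:

```python
import math
import zlib

def prime_digits(num_primes):
    # Digits of the Copeland-Erdos constant: concatenate the primes.
    ps, n = [], 2
    while len(ps) < num_primes:
        is_prime = True
        for p in ps:
            if p * p > n:
                break
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            ps.append(n)
        n += 1
    return "".join(map(str, ps))

digits = prime_digits(5000)
compressed = zlib.compress(digits.encode("ascii"), level=9)
bits_per_digit = 8 * len(compressed) / len(digits)
print(f"{bits_per_digit:.3f} bits/digit vs log2(10) = {math.log2(10):.3f}")
```

Note that zlib is only a crude proxy for the finite-state bound: a Turing-complete description language could, of course, compress $\mathcal{C}_N$ dramatically by encoding prime gaps instead of digits.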

## References:

1. Copeland, A. H. and Erdős, P. Note on Normal Numbers. Bull. Amer. Math. Soc. 52, 857-860, 1946.
2. Kolmogorov, A. N. Three Approaches to the Quantitative Definition of Information. Problems of Information Transmission, 1(1):1-7, 1965.
3. Rioul, O. This is IT: A Primer on Shannon's Entropy and Information. Séminaire Poincaré, 2018.
4. Witten, E. A Mini-Introduction to Information Theory. 2019.
5. Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
6. Grünwald, P. and Vitányi, P. Shannon Information and Kolmogorov Complexity. 2010.