nt.number theory – Sum of inverse squares of numbers divisible only by primes in the kernel of a quadratic character

Let $chi$ be a primitive quadratic Dirichlet character of d modulus $m$, and consider the product
$$prod_{substack{p text{ prime} \ chi(p) = 1}} (1-p^{-2})^{-1}.$$

What can we say about the value of this product? Do we have good upper or lower bounds?

Some observations, ideas, and auxiliary questions

  • When $chi$ is trivial, it has value $zeta(2)$.
  • In general, since Chebotarev density theorem (CDT) tells us that $chi(p)$ is equidistributed in the limit, I would “want” the value to be something like

$$Big(zeta(2)prod_{p | m} (1-p^{-2})Big)^{frac{1}{2}}.$$

However, if I’m not mistaken, it seems that the error terms in effective forms of CDT may cause this to be very far from the truth. We can’t ignore what happens before we are close to equidistribution as the tail and the head are both $O(1)$. We can’t even control the error term well (without GRH) because of Siegel zeroes.

  • I don’t think we can appeal to Dirichlet density versions of CDT since those only tell us things in the limit as $s$ goes to $1$ and here $s = 2$.
  • Is there a way to “Dirichlet character”-ify a proof of $zeta(2) = pi^2/6$ to get a formula for this more general case? At least with Euler’s proof via Weierstrass factorization, it seems that we would need some holomorphic function which has zeroes whenever $chi(n) = 1$.

I had a few other ideas but they all seem to run into the same basic problem of “can’t ignore the stuff before the limit”… am I missing something?

nt.number theory – Stabilizers of points in the upper half-plane

Suppose that $Gamma$ is a group acting discontinuously on $mathcal{H} = { z in mathbb{C} : operatorname{Im}(z) > 1}$. In order to keep things simple, suppose that $Gamma subseteq operatorname{GL}_{2}(mathbb{R})$ and is acting via the linear fractional transformations. In general, for $z in mathcal{H}$, what can be said about $operatorname{Stab}_{Gamma}(z)$? Of course, it is very well-known that $operatorname{Stab}_{operatorname{SL}_{2}(mathbb{R})}(i) = operatorname{SO}(2)$. I have also seen that when $Gamma$ is a quaternion algebra over a totally real number field $F$, certain points’ stabilizers are CM extensions of $F$ (contained in $Gamma$). Are there any other exceptional examples? Specifically, when $Gamma$ is a quaternion algebra?

nt.number theory – Please get to the prime number theorem for arithmetic progressions

Thanks for contributing an answer to MathOverflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

nt.number theory – On the determinant $det[gcd(i-j,n)]_{1le i,jle n}$

In Sept. 2013, I investigated the determinant
$$D_n=det(gcd(i-j,n))_{1le i,jle n}$$
and computed the values $D_1,ldots,D_{100}$ (cf. http://oeis.org/A228884). To my surprise, they are all positive!

Question. Does $D_n>0$ hold for all $n=1,2,3,ldots$?

I believe that $D_n$ is always positive. How to prove this?

It is easy to see that $D_n$ is divisible by $sum_{k=1}^ngcd(k,n)=sum_{dmid n}varphi(d)frac nd$.
It seems that $varphi(n)^{varphi(n)}sum_{k=1}^ngcd(k,n)$ divides $D_n$. Maybe there is a simple explanation for this.

Your comments are welcome!

nt.number theory – Prove using Dyck naturals: for $n in mathbb{N}_{+}$ and big enough $k in mathbb{N}_{+}$, $p_{k-1}

While conducting research in connection with arXiv:2102.02777 (“Recursive Prime Factorizations: Dyck Words as Numbers”), I noticed certain interesting patterns, one of which inspired the following conjecture:

Conjecture. Let $n in mathbb{N}_{+}$. Then there exists a positive integer $k_{text{min}}$ such that for all integers $k ge k_{text{min}}$,

$p_{k-1} < cdots < np_{k-a_{n}},$

where $a_{n}$ is the number of prime power divisors of $n$, i.e., $a$ is Sequence A073093 in the Online Encyclopedia of Integer Sequences (OEIS).

Example. Let $n = 10$, and let $k$ be an integer greater than or equal to $k_{text{min}} = 51$. Then I conjecture that

$p_{k-1} < 2p_{k-2} < 3p_{k-2} < 4p_{k-3} < 5p_{k-2} < 6p_{k-3} < 7p_{k-2} < 8p_{k-4} < 9p_{k-3} < 10p_{k-3}$.

These inequalities are very easily obtained by observing trends in the Dyck naturals (ibid., Definition 2.9). Let $theta(n)$ be that finite subsequence of $mathbb{N}$ such that every term in the subsequence is represented by a Dyck natural of semilength $n$. For example, $theta(4) = (5,6,8,9,16)$, since the Dyck natural representations of the terms in $theta(4)$ are $()()(()), (())(()), (()(())), ()((()))$ and $(((())))$ respectively, all of these being words of semilength 4. I call this “the stripe of semilength 4” (ibid., Definition 5.1), which may be somewhat misleading, as “semilength” refers not to the stripe itself but to the lengths of the recursive prime factorizations (RPFs) of the terms in the stripe. With that caveat, let us consider stripes $theta(0)$ through $theta(10)$, looking for patterns in the $i$th term of each stripe having an $i$th term.

$theta(0) = (0)$

$theta(1) = (1)$

$theta(2) = (2)$

$theta(3) = (3,4)$

$theta(4) = (5,6,8,9,16)$

$theta(5) = (7, 10, 12, 15, 18, 25, 27, 32, 64, 81, 256, 512, 65536)$

$theta(6) = (11, 14, 20, 21, 24, 30, 35, 36, 45, 48, 49, 50, 54, ldots,theta(6)_{text{last}})$

$theta(7) = (13, 22, 28, 33, 40, 42, 55, 60, 63, 70, 72, 77, 80, ldots,theta(7)_{text{last}})$

$theta(8) = (17, 26, 39, 44, 56, 65, 66, 84, 91, 99, 110, 112, 120, ldots,theta(8)_{text{last}})$

$theta(9) = (19, 34, 51, 52, 78, 85, 88, 117, 119, 130, 132, 168, 176, ldots,theta(9)_{text{last}})$

$theta(10) = (23, 38, 57, 68, 95, 102, 104, 133, 153, 156, 170, 208, 209, ldots,theta(10)_{text{last}})$

That should be enough to give an idea of the pattern inspiring the conjecture. For example, the pattern as manifested in the first terms is obvious: for stripes $theta(k)$ where $k ge 2$, the first term in $theta(k)$ is the prime number $p_{k-1}$. Also, for stripes $theta(k)$ where $k ge 3$, the second term is the semiprime $2p_{k-2}$. And for stripes $theta(k)$ where $k ge 8$, the third term is the semiprime $3p_{k-2}$.

For the next revision of arXiv:2102.02777, I plan to include a modification of the conjecture expressing the idea in terms of recursive prime factorizations. But obviously a theorem would be better than a conjecture (if the conjecture is true, of course), so I wonder whether someone could point the way. The conjecture would read like this:

Conjecture. Let $n in mathbb{N}_{+}$. Then there exists a positive integer $k_{text{min}}$ such that for all positive integers $k ge k_{text{min}}$, the $n$th term in stripe $theta(k)$ is equal to $np_{k-a_{n}}$, where $p_{i}$ is the $i$th prime number and $a_{n}$ is the number of prime power divisors of $n$, i.e., $a$ is Sequence A073093 in the Online Encyclopedia of Integer Sequences (OEIS).

I asked a related question on Math Stackexchange (https://math.stackexchange.com/questions/4033170/how-might-i-prove-or-disprove-this-conjecture-7p-k-2-8p-k-4-for-all-k), and users @lulu and @QC_QAOA were extremely helpful, but I cannot immediately see how their proof strategies could be adapted to prove that the patterns apply to Dyck naturals (which doesn’t invalidate their strategies, because I did not constrain the form of the desired answer to involve Dyck naturals). A different kind of approach seems more suitable, something in the realm of formal language theory (perhaps using combinatorics on words?), since this is all coming from patterns observed in Dyck language representations of numbers. Can anyone out there help me find a proof? And does someone out there see a pattern in the sequence of $k_{text{min}} = (2,3,8,8,13,13,35,35,10,14,ldots)$? Any help would be greatly appreciated…

nt.number theory – Prove this is true only when $s equiv pm r^{pm 1} pmod{q}$

Let us fix a positive integer $q$, and let us define two functions $P, Q: mathbb{N}^2 to mathbb{N}$ as follows:
$$ P(s,t) := sum_{j=1}^t leftlfloor frac{j (s-1) + t}{q} rightrfloor$$
$$ Q(s,t) := sum_{j=1}^t leftlceil frac{j (s+1) – t}{q} rightrceil$$

If we define the function $A_s(t) : mathbb{N} to mathbb{N}$ by:
$$ A_s(t) := P(s,t) – Q(s,t) $$

I claim that $A_s = A_r$ (as functions of $t$) if and only if one of the following is true:

  • $s equiv r pmod{q}$
  • $s equiv -r pmod{q}$
  • $sr equiv 1 pmod{q}$
  • $sr equiv -1 pmod{q}$

I do have a proof of both implications, but they are rather involved and a bit too technical. The hardest part is to show that $A_s = A_r$ implies one of the four bullets. I suspect that there must exist an easier argument to solve this. For example, with this notation one can deduce some straightforward identities such as:

$$P(-s,t) = -Q(s,t)$$

which helps to prove that the second bullet implies $A_s = A_r$. I am wondering if this is indeed a hard problem and technical stuff has to play a role in a proof or if I am missing a simpler proof. Even a simple proof of the fact that both the third or fourth bullet imply $A_s = A_r$ would be nice, since what I have is lengthy and ugly.

nt.number theory – On the equivalence of incompressibility and incompleteness in the natural sciences


In the history of science the issue of whether the scientific method is observer independent has been a recurring question. This appears to date back to Planck who supplied a useful intuition:

Science can’t solve the ultimate mystery of nature. And that is because, in the last analysis, we ourselves are a part of the mystery that we are trying to solve.-Planck

In my attempts to address Planck’s claim, I managed to derive a direct correspondence between incompressibility, epistemic uncertainty and incompleteness. In particular, I found a mathematical inequality from which we may deduce that science is not observer independent.

As this result is general, I think it may be of general interest and I am also curious about related results. However, perhaps the biggest challenge involves formulating the problem of scientific measurement and scientific induction in a clear and mathematically precise manner.

Note: This question is a bit long and my estimate is that it might take 15 minutes to read in total but I tried as much as possible to optimise for clarity and focus on intuitions and insights. From my internet searches, it appears that not many mathematicians have given much thought to this general question but I could be wrong. Hence my decision to share it here.

The general problem of scientific measurement:

The general problem of the mathematical sciences involves describing the behaviour of dynamical systems in terms of the solutions to certain classes of differential equations. Now, given that any partial derivative may be expressed as a ratio of two observables that generally have distinct units and more generally the chain rule in multivariable calculus implements a product of scientific units, dimensional independence is an essential requirement of any system of fundamental units $U={u_i}_{i=1}^n$.

All other scientific types are derived from $U$ so if $mathbb{C}$ denotes the set of composite scientific types:

forall c in mathbb{C}, exists alpha_i in mathbb{Z}, c = prod_{i=1}^n u_i^{alpha_i} tag{1}

Furthermore, if we define the n-dimensional vector space:

text{span}(log U) = big{ sum_{i=1}^n alpha_i log u_i lvert u_i in U, alpha_i in mathbb{Z} big} tag{2}

the $u_i$ are dimensionally independent in the sense that $forall u_i,u_{jneq i} in U, log u_i perp log u_{j neq i}$:

exists alpha_i in mathbb{Z}, sum_{i=1}^n alpha_i log u_i iff alpha_i = 1 land alpha_{j neq i} = 0 tag{4}

It follows that all scientific types are derived types relative to $U$ and from (4) we may deduce that all natural signals have a natural factorisation in terms of $U$.

This result is generally known as the Buckingham-Pi theorem and the International System of Units is a concrete instance of this theorem. These are defined in terms of the fundamental constants $bar{h}, c, e, k_B…$ for two reasons. First, these constants are dimensionally independent as desired. Second, these are constants so these definitions are not expected to change.

Paradoxically, as all the relations in science are defined in terms of the fundamental constants these represent the epistemic limits of Quantum Field Theory(QFT). For this reason, high-precision measurements of the fundamental constants not only advance the frontiers of high-precision engineering but also provide fundamental tests of QFT. As Poincaré pointed out scientists only discover the relation between things and not the things themselves. I am digressing a little but I shall return to this point later on.

The reason why modern science has been so successful is that only a small number of observables are dimensionally independent, and this insight leads us to a different approach to scientific discovery that is equivalent in power to the approach of classifying and analysing systems of differential equations that most scientists are familiar with.

The general problem of scientific induction:

In principle, any physical system may be modelled as a discrete dynamical system:

forall k in mathbb{Z}, x_{k+1} = T circ x_k tag{5}

where, without loss of generality, $x_k subset S subset mathbb{R}^n$, $k$ is a discrete time index and $T:S rightarrow S$ is a dynamic map. This representation is epistemologically sound as data collected from dynamical systems always comes in discrete-time samples.

Within this context, we may represent data as evaluations of functions of the state $x_k$, known as observables. In fact, if $g: S rightarrow mathbb{R}$ is an observable associated with the system then the collection of all such observables forms a vector space due to the Buckingham-Pi theorem.

Now, the Koopman operator $U$ is a linear transform on this vector space given by:

Ug(x) = g circ T(x) tag{6}

which in a discrete setting implies:

Ug(x_k) = g circ T(x_k) = g(x_{k+1}) tag{7}

where the linearity of the Koopman operator follows from the linearity of the composition operator:

U circ (g_1 + g_2)(x) = g_1 circ T(x) + g_2 circ T(x) tag{8}

So we may think of the Koopman operator as lifting dynamics of the state space to the space of observables.

In this setting of Koopman operators, we may formulate the general problem of scientific induction as the challenge of discovering a machine learning model $M$ that approximates the eigenfunctions of $U$ so for a dataset $X={X_{i+1},X_i}$ we are looking for $M$ such that:

hat{X}_{i+1} = M circ X_i tag{9}

lVert hat{X}_{i+1} – X_{i+1} rVert approx 0 tag{10}

In general $S$ is a complex system and $M$ has bounded resources, so the statistical and epistemic uncertainty associated with $S$ relative to $M$ is best handled by a family of probabilistic models $P_M$. So the general problem involves discovering a machine learning model $P in P_M$ such that the entropy satisfies:

H(P(X_{i+1} lvert X_i) approx 0 tag{11}

where (11) encodes both the statistical and inductive generalisation aspects of scientific induction.

On the equivalence of incompressibility and incompleteness in the natural sciences:

Given our formulation of scientific induction, the Minimum Description Length principle implies that the Kolmogorov Complexity $K(cdot)$ of our dataset $X$ is given by:

K_P(D) = min_{P in P_M} (K_P(X) + lvert P rvert) = min_{P in P_M} K(X lvert P) + K(P) tag{12}

so there is an implicit tradeoff between model accuracy and model complexity. Now, let’s suppose $exists P^* in P_M$ such that:

K(X|P^*) + K(P^*) = min_{P in P_M} K(X lvert P) + K(P) tag{13}

as $P^*$ is a probabilistic program, we may represent the expected Kolmogorov Complexity as:

mathbb{E}(K(D)) = mathbb{E}(K(X|P^*)) + mathbb{E}(K(P^*)) tag{14}

where the expectation is calculated with the distributions of $P^*$ and $P(D)$. Now, since the expected Kolmogorov Complexity equals the Shannon entropy we find:

mathbb{E}(K(D)) = H(D|P^*) + H(P^*) tag{15}

where the implicit assumption in (14) is that in the regime of repeatable scientific experiments, ergodic assumptions are satisfied.

From (14) we may deduce the following lower bound:

mathbb{E}(K(D)) geq H(P^*) tag{16}

This lower-bound addresses Planck’s claim as it shows that science is not observer independent. In particular, the structure underlying the fundamental axioms embedded in $P^*$ are beyond the knowledge of $P^*$. To be precise, the epistemic uncertainty of $P^*$ concerning itself is exactly the Minimum Description Length of $P^*$ relative to a universal computer. As any computational process is ultimately a mathematical description of a physical process, we may identify such a universal simulator with the universe itself. Such an identification is consistent with the implications of the Physical Church-Turing thesis.

In particular, since $P^*$ implicitly contains a theory of computation and all such theories require arithmetic an optimal encoding of arithmetic is beyond the knowledge of $P^*$ independently of the computational resources available to $P^*$.


My intuition for why the primes are incompressible rests upon two reasons that are very similar to the reasons why the fundamental constants represent the epistemic limits of the totality of science. First, they are dimensionally independent. By this I mean that we may define the infinite-dimensional vector space:

text{span}(log mathbb{P}) = Big{sum_{i=1}^infty alpha_i log p_i lvert p_i in mathbb{P}, alpha_i in mathbb{Z}Big} tag{17}

where $log mathbb{Q_+} subset text{span}(log mathbb{P})$, and $forall p_j, p_{i neq j} in mathbb{P}, log p_{i neq j} perp log p_j$ since:

exists alpha_i in mathbb{Z}, sum_{i=1}^infty alpha_i log p_i = log p_j iff alpha_j = 1 land alpha_{i neq j} = 0 tag{18}

which implies that prime factorisations are unique.

Second, any mathematical system that is sufficiently powerful to formulate a general theory of computation must contain
arithmetic and since all data types are derived from the integers which have a unique representation in terms of the primes, it follows that all the other types in
such a system are derived types relative to the prime numbers.


  1. Bernard Koopman. Hamiltonian systems and Transformations in Hilbert Space.
  2. Steven L. Brunton. Notes on Koopman operator theory. 2019.
  3. Peter D. Grünwald. The Minimum Description Length Principle . MIT Press. 2007.
  4. M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Graduate Texts in Computer Science. Springer. 1997.
  5. Fine & Rosenberger. Number Theory: An Introduction Via the Distribution of Primes. 2007.
  6. Copeland, B. Jack, “The Church-Turing Thesis”, The Stanford Encyclopedia of Philosophy (Summer 2020 Edition), Edward N. Zalta (ed.), URL = https://plato.stanford.edu/archives/sum2020/entries/church-turing/.
  7. Henri Poincaré. Science et Hypothèse. Champs sciences. 2014.
  8. Bertrand Russell. Human Knowledge: Its Scope and Limits. 2009.

nt.number theory – Examples of “total cancellation functions” $f_Q(n)$?

Let $f_Q(n):mathbb{N}tomathbb{N}$ be a series of arithmetic functions for every real value $Q$, for which the expected values $mathbb{E}_{ninmathbb{N}}(f_Q(n))$ all exist. We say that $f_Q(n)$ is a total cancellation function if, for any bounded sequence $(a_n)$, it holds that $lim_{Qtoinfty}mathbb{E}_{ninmathbb{N}}(a_nf_Q(n))=0$ provided that the expected values the limit is being taken over all exist.

I’ve been messing around for a while trying find examples of total cancellation functions, and the only example I have discovered is the functions $f_Q(n)=sum_{substack{d|n \ d<Q}}mu(d)$ where $mu(d)$ is the Mobius function.

Finding more examples might be interesting as it could lead to conjectures or theorems about conditions functions $f_Q(n)$ must have to be total cancellation.

nt.number theory – Periodic decimal expansion of $012345679$

When $frac{A}{81}$ has a periodic decimal expansion equal to $012345679$?

A is an integer.

A=1, A=23005 give the required decimal expansion.

Is there a way to predict when A will give that periodic expansion?

nt.number theory – information-theoretic derivation of the prime number theorem


While going through a couple interesting papers on the Physics of the Riemann Hypothesis (1) and the Minimum Description Length Principle (2), a derivation(not a proof) of the Prime Number Theorem occurred to me. I thought I would share it here as I wonder whether other mathematicians have pursued this direction. It appears to have interesting implications although the arguments are elementary.

An information-theoretic derivation of the prime number theorem:

If we know nothing about the primes in the worst case we may assume that each prime number less than or equal to $N$ is drawn uniformly from $(1,N)$. So our source of primes is:

X sim U((1,N)) tag{1}

where $H(X) = ln(N)$ is the Shannon entropy of the uniform distribution.

Now, given a strictly increasing integer sequence of length $N$, $U_N = {u_i}_{i=1}^N in (1,N)^N$ we may define the prime encoding of $U_N$ as the binary sequence $X_N = {x_i}_{i=1}^N$ where $x_i =1$ if $u_i$ is prime and
$x_i=0$ otherwise. With no prior knowledge, given that each integer is either prime or not prime,
we have $2^N$ possible prime encodings(i.e. arrangements of the primes) in $(1,N) subset mathbb{N}$.

If there are $pi(N)$ primes less than or equal to $N$ then the average number of bits per arrangement gives us the average amount of information gained from correctly identifying each prime number in $U_N$ as:

S_c = frac{log_2 (2^N)}{pi(N)}= frac{N}{pi(N)} tag{2}

Furthermore, if we assume a maximum entropy distribution over the primes then we would expect that each prime is drawn from a uniform distribution as in (1) so we would have:

S_c = frac{N}{pi(N)} sim ln(N) tag{3}

and this implies:

pi(N) sim frac{N}{ln(N)} tag{4}

which happens to be equivalent to the prime number theorem.


Has this line of reasoning already been developed?

Note: I wrote a few more thoughts on the subject on my blog.


  1. Dániel Schumayer and David A. W. Hutchinson. Physics of the Riemann Hypothesis. Arxiv. 2011.
  2. Peter D. Grünwald. The Minimum Description Length Principle . MIT Press. 2007.