This is a variant of this question. Consider two probability distributions $D$ and $U$, over $n$-bit strings, where $U$ is the uniform distribution. Assume that $D$ and $U$ are far apart in total variation distance, ie,

begin{equation}

d_{text{TV}}(D, U) geq frac{2}{3}.

end{equation}

We are not given an explicit description of $D$: we are only given black-box access, ie, we are only given a sampling device that can sample from $D$. Consider a sample $z in {0, 1}^{n}$, taken from either $D$ or $U$. We want to know which one is the case and to do that, we consider polynomial-time algorithms that use the sampling device. I am looking for a single optimal deterministic algorithm that works optimally for all $D$.

Let $A$ be this optimal algorithm. For each $D$, our algorithm $A$ is optimal in the sense that

$$

Pr_{z sim D} (A(z) = 1) geq frac{3}{4}, \

Pr_{z sim U} (A(z) = 0) geq frac{3}{4},

$$

Output $1$ is interpreted as “the sample comes from $D$, and output $0$ is interpreted as “the sample comes from $U$.”

Let $A$ use the black-box sampling device a polynomial number of times (at most) and get samples $z_{1}, z_{2}, ldots, z_{k}$ from $D$, for some polynomial $k$. My intuition is that, if this best algorithm decides that $z$ indeed came from $D$, then it must be true that $z_{i} = z$ for at least one $i in (k)$. In other words, since we know nothing about $D$ or its support, we have to “see” $z$ at least once in the samples we collect from $D$ to ascertain that $z$ indeed came from $D$. How do I mathematically formalize this statement?

Also, does this same intuition hold if we are given a polynomial number of samples as input (taken from either $D$ or $U$) instead of just one and are also given access to a black-box sampler for $D$?