Wikipedia’s article on parametric statistical models (https://en.wikipedia.org/wiki/Parametric_model) mentions that you could parameterize all probability distributions with a one-dimensional real parameter, since the set of all probability measures and $\mathbb{R}$ have the same cardinality.

This fact is mentioned in the cited text (Bickel et al., Efficient and Adaptive Estimation for Semiparametric Models), but it is neither proved nor elaborated on.

This is pretty neat to me. (If I’d been forced to guess, I would have guessed the set of possible probability distributions to be bigger, since pdfs are functions $\mathbb{R}\rightarrow\mathbb{R}$, and we’re counting probability distributions that don’t have a density, too. It’s got to be countable additivity constraining the number of possible distributions, but how?)

Where could I go to find a proof of this, or is it straightforward enough to outline in an answer here? Does its proof depend on AC or the continuum hypothesis? We need some kind of condition on the cardinality of the sample space that neither Wikipedia nor Bickel mentions, right? (If it’s too big, then there are too many degenerate probability distributions.)
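
For the special case where the sample space is $\mathbb{R}$ with its Borel $\sigma$-algebra, here is a tentative sketch of the counting argument I would expect. This is my own guess, not something taken from Wikipedia or Bickel et al.; the notation $\mathcal{P}(\mathbb{R})$ for the set of Borel probability measures, $F_\mu$ for the CDF of $\mu$, and $\delta_x$ for the point mass at $x$ is mine.

$$
\begin{aligned}
&\text{Upper bound: } \mu \mapsto F_\mu,\quad F_\mu(x) = \mu\big((-\infty, x]\big), \text{ is injective, and by right-continuity} \\
&\qquad F_\mu(x) = \inf_{q \in \mathbb{Q},\ q > x} F_\mu(q), \text{ so } F_\mu \text{ is determined by } F_\mu|_{\mathbb{Q}} \in [0,1]^{\mathbb{Q}}. \\
&\qquad \text{Hence } |\mathcal{P}(\mathbb{R})| \le \big|[0,1]^{\mathbb{Q}}\big| = \big(2^{\aleph_0}\big)^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0}. \\
&\text{Lower bound: } x \mapsto \delta_x \text{ is injective, so } |\mathcal{P}(\mathbb{R})| \ge |\mathbb{R}| = 2^{\aleph_0}. \\
&\text{Schröder-Bernstein then gives } |\mathcal{P}(\mathbb{R})| = 2^{\aleph_0} = |\mathbb{R}|.
\end{aligned}
$$

If this is right, countable additivity enters through the right-continuity of $F_\mu$, and the continuum hypothesis does not appear anywhere. What I can’t see is how far this extends beyond $\mathbb{R}$, which is why I suspect a condition on the sample space is needed.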