I’m trying to understand the proof of a simple exercise on conditional variance.

Let $X$ and $Y$ be two real-valued random variables such that $Y$ is square-integrable. We call the random variable $\mathbf{E}\left((Y-\mathbf{E}(Y \mid X))^{2} \mid X\right)$ the conditional variance of $Y$ given $X$, denoted by $\operatorname{Var}(Y \mid X)$. Show that for all Borel measurable $f: \mathbf{R} \rightarrow \mathbf{R}$ such that $f(X)$ is square-integrable,

$$
\mathbf{E}\left((Y-f(X))^{2}\right)=\mathbf{E}(\operatorname{Var}(Y \mid X))+\mathbf{E}\left((\mathbf{E}(Y \mid X)-f(X))^{2}\right).
$$

Here is the solution, where I seem to be missing something simple.

We have

$$
\begin{aligned}
\mathbf{E}\left((Y-f(X))^{2}\right) &=\mathbf{E}\left(((Y-\mathbf{E}(Y \mid X))+(\mathbf{E}(Y \mid X)-f(X)))^{2}\right) \\
&=\mathbf{E}\left((Y-\mathbf{E}(Y \mid X))^{2}\right)+2 \mathbf{E}((Y-\mathbf{E}(Y \mid X))(\mathbf{E}(Y \mid X)-f(X))) \\
&+\mathbf{E}\left((\mathbf{E}(Y \mid X)-f(X))^{2}\right)
\end{aligned}
$$

The first term in the last line equals $\mathbf{E}(\operatorname{Var}(Y \mid X))$, while the second term vanishes since $\mathbf{E}(Y \mid X)-f(X)$ is $\sigma(X)$-measurable. The proof is complete.

Okay, but what does $\mathbf{E}(Y \mid X)-f(X)$ being $\sigma(X)$-measurable have to do with anything? Why does it imply that $2 \mathbf{E}((Y-\mathbf{E}(Y \mid X))(\mathbf{E}(Y \mid X)-f(X))) = 0$?
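
My best guess at the missing step, sketched under the assumption that the solution is using the tower property together with "taking out what is known": write $Z=\mathbf{E}(Y \mid X)-f(X)$ (my own shorthand, not from the solution). If $Z$ is $\sigma(X)$-measurable, it should be possible to pull it out of the conditional expectation given $X$, which would give

$$
\begin{aligned}
\mathbf{E}((Y-\mathbf{E}(Y \mid X)) Z) &=\mathbf{E}\left(\mathbf{E}((Y-\mathbf{E}(Y \mid X)) Z \mid X)\right) \\
&=\mathbf{E}\left(Z \, \mathbf{E}(Y-\mathbf{E}(Y \mid X) \mid X)\right) \\
&=\mathbf{E}\left(Z \,(\mathbf{E}(Y \mid X)-\mathbf{E}(Y \mid X))\right)=0.
\end{aligned}
$$

If that reading is right, measurability is exactly what licenses the second equality, and the product is integrable by Cauchy-Schwarz because both factors are square-integrable.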