I found a book where the author uses the following property of mutual information:

Let $X$, $Y$, $Z$ be arbitrary discrete random variables, and let $W$ be an indicator random variable.

$$
I(X : Y \mid Z) = \Pr(W = 0)\, I(X : Y \mid Z, W = 0) + \Pr(W = 1)\, I(X : Y \mid Z, W = 1). \tag{1}
$$

I do not understand why this property holds in general.
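For what it's worth, a small numerical experiment is consistent with (1) in at least one setting. Here $W$ is chosen as a deterministic indicator of $Z$, which is purely my assumption (the book does not say how $W$ relates to $Z$), so this checks only a special case:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint pmf over (X, Y, Z) with small supports.
nx, ny, nz = 2, 3, 4
p_xyz = rng.random((nx, ny, nz))
p_xyz /= p_xyz.sum()

# Assumption (mine, not the book's): W is a deterministic
# indicator of Z, namely W = 1 iff Z >= 2.
w_of_z = np.array([0, 0, 1, 1])

def cond_mi(p_sub):
    """I(X : Y | Z) computed from an (unnormalized) sub-pmf over (X, Y, Z)."""
    p = p_sub / p_sub.sum()
    total = 0.0
    for z in range(p.shape[2]):
        pz = p[:, :, z].sum()                    # p(z)
        if pz == 0:
            continue
        pxy = p[:, :, z] / pz                    # p(x, y | Z = z)
        px = pxy.sum(axis=1, keepdims=True)      # p(x | Z = z)
        py = pxy.sum(axis=0, keepdims=True)      # p(y | Z = z)
        mask = pxy > 0
        # pz * I(X : Y | Z = z), i.e. one term of E_z[...]
        total += pz * np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask]))
    return total

lhs = cond_mi(p_xyz)                             # I(X : Y | Z)

rhs = 0.0
for w in (0, 1):
    sub = p_xyz[:, :, w_of_z == w]               # restrict to the event W = w
    pw = sub.sum()                               # Pr(W = w)
    rhs += pw * cond_mi(sub)                     # Pr(W = w) * I(X : Y | Z, W = w)

print(lhs, rhs)  # the two sides agree up to float error for this choice of W
```

Of course this is only a sanity check under one particular coupling of $W$ and $Z$, not an argument for the general claim.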

To prove it, I was thinking of proceeding as follows:

\begin{align}
I(X : Y \mid Z)
&= \mathbb{E}_z\bigl[I(X : Y \mid Z = z)\bigr] \\
&= \mathbb{E}_w\bigl[\mathbb{E}_z[I(X : Y \mid Z = z) \mid W = w]\bigr] \\
&= \Pr(W = 0)\, \mathbb{E}_z[I(X : Y \mid Z = z) \mid W = 0] \\
&\quad + \Pr(W = 1)\, \mathbb{E}_z[I(X : Y \mid Z = z) \mid W = 1],
\end{align}

where the second line follows from the law of total expectation.

However, this does not seem to be the right approach, since it is not clear to me that

$$
\mathbb{E}_z\bigl[I(X : Y \mid Z = z) \mid W = w\bigr] = I(X : Y \mid Z, W = w)
$$

holds.

What is the correct way to show (1)?