It's worth looking at a gamma chart for additional perspective when thinking about this. The standard display gamma, for example, is 2.2. The curve looks like this:
50% gray in an 8-bit space is code value 127 (horizontal axis). On a 2.2 gamma curve this corresponds to roughly 20% luminance output from the screen. For both display and printing, gamma matters because it provides the mapping between the linear data (camera / image) and the roughly logarithmic sensitivity of the human eye.
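A quick check of those numbers, as a minimal sketch that assumes a pure power-law gamma of 2.2. Real sRGB uses a piecewise curve with a short linear segment near black, so its values differ slightly. The helper names are hypothetical, chosen for illustration.

```python
# Assumption: a pure power-law display gamma of 2.2 (not the exact
# piecewise sRGB transfer function).
GAMMA = 2.2
MAX_CODE = 255  # 8-bit encoding

def code_to_luminance(code: int, gamma: float = GAMMA) -> float:
    """Relative screen luminance (0..1) produced by an 8-bit code value."""
    return (code / MAX_CODE) ** gamma

def luminance_to_code(luminance: float, gamma: float = GAMMA) -> int:
    """8-bit code value needed to produce a given relative luminance."""
    return round(MAX_CODE * luminance ** (1 / gamma))

print(f"{code_to_luminance(127):.1%}")  # mid-scale code value -> 21.6%
print(luminance_to_code(0.18))          # code value for 18% luminance -> 117
```

So the "50%" code value lands a little above 20% luminance, and 18% luminance needs a code value of about 117, slightly below mid-scale.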
The human eye can resolve on the order of 10 to 14 f-stops of dynamic range at a fixed pupil size. This is up to ~3 stops better than the best 14-bit RAW DSLR cameras. Our brain is also capable of using all that data at the same time: it is as if we had a 16-bit RAW image processor built into our visual cortex[*] that automatically adjusts the light and shadow levels, etc., to obtain a perfect exposure in real time. ~18% gray is just an empirical value that matches the processing our eyes naturally apply to the scene they see.
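Since each f-stop is a factor of two in light, dynamic range in stops converts to a contrast ratio with a base-2 logarithm. The sketch below also uses the CIE 1976 L* lightness formula, which the answer does not mention; it is swapped in here as one standard model of perceived lightness, to illustrate why ~18% luminance reads as medium gray (it lands close to L* = 50, the middle of the 0 to 100 scale). Function names are hypothetical.

```python
import math

def stops_to_contrast(stops: float) -> float:
    """Each f-stop doubles the light, so n stops span a 2**n : 1 ratio."""
    return 2.0 ** stops

def contrast_to_stops(ratio: float) -> float:
    """Inverse of stops_to_contrast."""
    return math.log2(ratio)

def cie_lightness(y: float) -> float:
    """CIE 1976 L* for relative luminance Y in 0..1 (white = 1)."""
    eps = 216 / 24389    # ~0.008856, threshold of the linear segment
    kappa = 24389 / 27   # ~903.3, slope of the linear segment
    if y > eps:
        return 116 * y ** (1 / 3) - 16
    return kappa * y

for stops in (10, 14):
    print(f"{stops} stops = {stops_to_contrast(stops):,.0f}:1")

# How far 18% gray sits below a 100% white, and its perceived lightness.
print(f"{contrast_to_stops(1 / 0.18):.2f} stops below white")
print(f"L* of 18% gray: {cie_lightness(0.18):.1f}")
```

Under this model, 18% gray is about 2.5 stops below white yet sits almost exactly halfway up the perceptual lightness scale, which is the non-linear compression described above.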
It is empirical because it works: it looks medium gray in a typical scene. However, the eye is easily fooled and is extremely sensitive to context. The brain will ruthlessly Photoshop what the eyes see to try to make sense of it, and grays are routinely reinterpreted as whatever shade makes sense to us. A classic illusion demonstrates this:
The squares are identical in brightness. So, yes, the eye is extremely non-linear and, moreover, its response is not even uniform across our visual field. Shadows are lifted, highlights are pulled down, and the whole scene is compressed into a narrow range of perception from which we can extract detail.
When shooting high dynamic range scenes, this is, I think, intuitive for photographers: we really have to work in post-processing to balance a high dynamic range scene so that it looks similar to what the eye perceives. When we can control the light, we add LOTS of it: fill light. Getting a balanced color photo that does not need much post-processing requires adding as much light as possible to fill the dark areas of the scene, reducing the dynamic range as much as possible to produce a flatter, more evenly lit scene (just as our brain tries to do with the scenes we see).
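Why fill light shrinks dynamic range can be seen with simple arithmetic. This is a minimal sketch under a simplifying assumption: the fill adds the same absolute amount of light to the brightest and darkest areas, which affects the shadows proportionally far more than the highlights. The scene values and function name are hypothetical.

```python
import math

def dynamic_range_stops(highlight: float, shadow: float, fill: float = 0.0) -> float:
    """Scene dynamic range in stops after adding a uniform fill light.

    Simplifying assumption: the fill adds the same absolute amount of
    light to both the brightest and the darkest area of the scene.
    """
    return math.log2((highlight + fill) / (shadow + fill))

# Hypothetical scene: highlights 1000x brighter than the deepest shadows.
for fill in (0, 10, 50, 200):
    print(f"fill={fill:>3}: {dynamic_range_stops(1000, 1, fill):.1f} stops")
```

With no fill the scene spans about 10 stops; a fill of 50 units brings it under 5 stops, which a camera can capture in a single exposure with room to spare.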
To answer the comment below, this is taken from the image above to make the point:
[*] To be more precise, for those who want it: part of the processing and compression of the initial image is done by several layers of specialized cells within the retina itself, before the information is sent to the brain.