Of everything in the resolution series, the theory about how much detail a human can perceive is the only part that is hard to measure and quantify. When someone tells you that the maximum you can perceive is governed by the size and spacing of your cone cells, they are certainly wrong. But how far perception goes beyond that is hard to quantify, and it likely varies significantly from person to person. That would go a long way towards explaining the disparity between people who can't imagine, let alone see, the difference between a 1080p and a 4k phone display, and people who see it so clearly that it's hard to imagine someone without a visual impairment not being able to see it.
In this article, I’ll outline the mechanisms at play. If you’re finding this part of the greater resolution topic hard-going, you may want to start with the deep-dive into how fonts are displayed on a 1080p display.
Here is a video overview of what is perceptible to a human:
Mental model of the scene
Look at this illustration of what the retina sees in the Additional Images section of Wikipedia’s retina entry.
Notice how little of that image is clear. Here is my representation from the video:
Above: A frame from the original video showing an approximation of how much of a human eye's field of view is detailed. In reality I way under-sold how narrow the detailed area is. Please refer to the Wikipedia link above for a more accurate representation.
Yet unless you have some form of visual deficiency, you have a complete picture, and everything is clear. More like this:
Above: A frame from the original video representation showing a complete mental model of the world.
To properly understand this, I suggest beginning with visual memory, working memory, sensory memory, and decay theory (in relation to working memory), and following the rabbit hole from there. It's a fascinating one, but I'm not going to do it justice here. The gist of it is:
- Our eyes are regularly moving, scanning the scene so that the detailed section of the retina can gain highly detailed information, painting our mental model of the world with detail.
- With more time, we can fill in more detail.
- Some predictable movement (eye, head, body, object) helps us build a much more detailed picture.
- More/unpredictable movement detracts from our ability to build detail.
- After some time of not scanning part of the scene, that part of the memory will expire (there's a toy simulation of this scan-and-decay loop below). A common place to experience this is while watching an interesting movie on a small screen, at a distance:
Above: A simulation of a reduced visual model during concentration sufficient to suspend regular eye movement.
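To make that scan-and-decay loop concrete, here's a toy simulation in Python. It is not a physiological model: the scene size, fovea radius, decay rate, and random fixation schedule are all invented for illustration (real eye movements are far more purposeful).

```python
import numpy as np

# Toy model: a "detail map" of the scene, painted by a small high-detail
# fovea that jumps between fixation points, and eroded by memory decay.
# All of the constants are illustrative, not physiological measurements.
SCENE = (40, 60)        # scene size in arbitrary "detail cells"
FOVEA_RADIUS = 4        # the small region captured in full detail per fixation
DECAY_PER_STEP = 0.02   # fraction of remembered detail lost each step

rng = np.random.default_rng(0)
detail = np.zeros(SCENE)                    # 0 = nothing remembered, 1 = fully detailed
ys, xs = np.mgrid[0:SCENE[0], 0:SCENE[1]]

def fixate(cy, cx):
    """One fixation: the fovea fills in full detail around (cy, cx)."""
    dist2 = (ys - cy) ** 2 + (xs - cx) ** 2
    detail[dist2 <= FOVEA_RADIUS ** 2] = 1.0

for step in range(300):
    # Saccade to a random point, then let unvisited regions fade a little.
    fixate(rng.integers(SCENE[0]), rng.integers(SCENE[1]))
    detail *= 1.0 - DECAY_PER_STEP

print(f"mean remembered detail while scanning: {detail.mean():.2f}")

# Stop scanning (e.g. concentrating on a movie) and the model fades:
for step in range(300):
    detail *= 1.0 - DECAY_PER_STEP
print(f"after 300 steps without scanning:      {detail.mean():.2f}")
```

Scanning holds the map at a rough equilibrium between fresh fixations and decay; stop scanning and it drains away, which is the effect the image above simulates.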
Interpolation
One of the things that struck me while researching all of this, and that really adds weight to how much we interpolate detail to increase our understanding of the world, is the ConeMosaics diagram on Wikipedia's Retina entry. What jumped out at me was how non-uniform it is. This says so much, so let's work through it.
Do we interpolate?
What would it look like if we didn’t?
There are very few straight lines of cone cells on the retina, and none for any meaningful distance. Without interpolation, straight lines would look jagged, and those jagged edges would dance around inconsistently as the line moves through our vision. This isn't exactly it, but you can get the idea by looking at diagonal lines in old low-resolution DOS games:
Above: A screenshot of an old DOS game. Importantly, the horizon is tilted slightly, causing it to have regular stair-step distortions.
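You can reproduce that stair-stepping with a few lines of code. This sketch rasterises a slightly tilted line onto a coarse grid with no interpolation, so each cell is simply on or off; the grid size and tilt are arbitrary choices:

```python
# Rasterise a slightly tilted "horizon" onto a coarse grid with no
# interpolation: every cell is either fully on or fully off.
WIDTH, HEIGHT = 60, 8
SLOPE = 0.05  # a tilt of roughly 3 degrees

rows = [[" "] * WIDTH for _ in range(HEIGHT)]
for x in range(WIDTH):
    y = round(HEIGHT / 2 + SLOPE * x)  # nearest-cell quantisation
    rows[y][x] = "#"

print("\n".join("".join(r) for r in rows))
```

The output is a straight line broken into flat runs with sudden one-cell jumps: the same regular distortions called out in the caption above.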
With the cones laid out in non-uniform positions, they give us some interesting advantages:
Reducing bias towards particular orientations
As an object rotates in our vision, it has a relatively good chance of having a similar number of cones cleanly matching it regardless of the orientation. I.e. we don't have to tilt our heads to perceive any given object well.
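Here's a rough way to see that property numerically. This sketch compares a perfectly regular grid of "cones" against a randomly jittered layout (a crude stand-in for the real mosaic), counting how many sample points land near a line as it rotates; the spacing, jitter amount, and hit threshold are all arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two sample layouts: a regular 21x21 grid, and the same grid with random
# jitter as a crude stand-in for the retina's non-uniform cone mosaic.
grid = np.mgrid[-10:11, -10:11].reshape(2, -1).T.astype(float)
jittered = grid + rng.uniform(-0.5, 0.5, grid.shape)

def hits_near_line(points, angle, threshold=0.25):
    """Count points within `threshold` of a line through the origin."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    proj = points @ direction
    dist = np.linalg.norm(points - np.outer(proj, direction), axis=1)
    return int((dist < threshold).sum())

angles = np.radians(np.arange(0, 180, 5))
for name, pts in (("regular grid", grid), ("jittered", jittered)):
    counts = [hits_near_line(pts, a) for a in angles]
    print(f"{name:13s} min={min(counts):3d} max={max(counts):3d}")
```

The regular grid swings wildly with orientation (whole rows of points line up with the line at 0, 45 and 90 degrees, and almost none at the angles in between), while the jittered layout stays much more consistent: no orientation is strongly favoured.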
Simulating a higher resolution
But this is where it gets really cool.
- Draw an imaginary line through that diagram intersecting the center of one of the represented retinas (there are two in the diagram).
- Move that line slightly.
- Notice that all along the line, the cones that cleanly match it are now different from the ones that cleanly matched a moment ago.
- Also notice how some cones that matched cleanly a moment ago now match only partially, and some not at all.
- As we move the line, we are filling in detail about it that we didn't have before, because we know how much we moved it, and we have a clear signal for which cones are cleanly matching the line.
- This is why a small amount of movement that you control can make something easier to read, while a small amount of movement that you don't control can make it much harder. I.e. movement that you don't control is harder to predict, which breaks your mental model and your tracking, making it even harder to regain confidence.
- Now think back to the interpolation from a couple of sections ago.
- We can accurately determine where that line is.
- Knowing where that line is in relation to something else, e.g. the other side of an “l”, gives us more confidence in the shape.
All in all, this gives us significantly higher logical resolution than we physically have, and it has a lot in common with how modern camera sensors work.
I think that all of this was probably the inspiration for Google's Super Res Zoom.
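Here's a minimal one-dimensional sketch of that idea, with a made-up signal and shift schedule. A coarse sensor samples a sharp edge several times, shifted by known sub-sample amounts between frames (the controlled movement), and the frames are merged onto a finer grid. No single frame can place the edge more precisely than one sensor bucket, but the merged estimate can; this is the essence of multi-frame super-resolution.

```python
import numpy as np

# A fine "ground truth" signal: a sharp edge that one coarse frame can
# only place to within a whole sensor bucket.
FINE, COARSE = 64, 8
BUCKET = FINE // COARSE                 # 8 fine cells per coarse sample
truth = (np.arange(FINE) >= 27).astype(float)

def capture(shift):
    """One 'frame': the signal shifted by `shift` fine cells, then
    averaged into coarse buckets (an idealised low-res sensor)."""
    return np.roll(truth, -shift).reshape(COARSE, BUCKET).mean(axis=1)

# Four frames at known quarter-bucket shifts -- the controlled movement.
shifts = [0, 2, 4, 6]

# Merge: spread each coarse sample back over the fine cells it covered
# (undoing its shift) and average the overlapping estimates.
acc, cnt = np.zeros(FINE), np.zeros(FINE)
for s in shifts:
    for b, value in enumerate(capture(s)):
        cells = (np.arange(b * BUCKET, (b + 1) * BUCKET) + s) % FINE
        acc[cells] += value
        cnt[cells] += 1
merged = acc / cnt

# Locate the edge as the first fine cell whose estimate reaches 0.5.
single = np.repeat(capture(0), BUCKET)
print("edge truly at fine cell 27")
print("single frame estimate:", int(np.argmax(single >= 0.5)))  # 24
print("merged estimate:      ", int(np.argmax(merged >= 0.5)))  # 26
```

Each frame on its own is equally blurry; it's the known sub-bucket shifts between them that add the information, just like the controlled eye movement above.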
Spatial encoding
Until I dug into the research, I had no idea that we do spatial encoding. The retina effectively compresses the image it sees in order to get it to the brain over limited bandwidth. I'm mentioning it here because it effectively reduces the logical resolution that the brain receives. Not enough to counter the points above, but it is part of the bigger picture.
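As a loose illustration of the flavour of that compression (a sketch, not the retina's actual circuitry), the following encodes each value as its difference from the average of its neighbours, responding to local contrast rather than absolute brightness, in the spirit of centre-surround receptive fields. Smooth regions encode to near-zero values, which is what makes the signal cheap to transmit:

```python
import numpy as np

# A 1D "scene": two smooth gradients with one sharp edge between them.
scene = np.concatenate([np.linspace(0.0, 0.5, 32), np.linspace(0.9, 1.0, 32)])

# Centre-surround-flavoured encoding: each value minus the average of its
# two neighbours, so the output signals local contrast, not brightness.
padded = np.pad(scene, 1, mode="edge")
surround = (padded[:-2] + padded[2:]) / 2
encoded = scene - surround

# Almost every value encodes to ~0; the "bandwidth" is spent on the one
# place that carries new information: the edge.
print(f"raw values span {scene.min():.2f}..{scene.max():.2f}")
print(f"encoded values over 0.01 in magnitude: "
      f"{(np.abs(encoded) > 0.01).sum()} of {encoded.size}")  # 2 of 64
```

A signal that is mostly zeros compresses extremely well, which is the sense in which this kind of encoding reduces the bandwidth needed, at the cost of some logical resolution.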
Sticks and stones, and questions
Above: A frame from the explanation in the video of why the resolution of the retina is not the limiting factor.
The visualisation in the video was nice, and I'm happy with it. But I think I've explained it better in this blog post. If anything here is unclear, or you have any other questions, please post them in the comments on the video, and I'll post updates from time to time.
Connecting the dots
Something that jumps out at me with all of this is that, from one person to the next, we may not all use these techniques to the same level.
I remember that throughout my days at school, I used to get reactions from people about how small my handwriting was. I didn't do it for the sake of it; I did it because that was what was easiest for me to read. On occasion, I'd show off, and my capital letters would be less than 1mm high. That level wasn't practical: the pen wasn't fine enough, and my hands would cramp pretty quickly. But staying under 2mm was very comfortable for me. It wasn't easy for anyone else who needed to read it, like my teachers, so in hindsight… that didn't age well. Sorry to my teachers, and thank you for your patience.
A few decades have passed since then, and my eyes are a few decades older. But to this day, the most comfortable size for me to read is where the capitals are around 1.5mm high. I expect this to get larger as I get older.
As I was recalling all of this, I connected a couple of dots that may or may not be supposed to be connected…
If I was regularly reading at this size when I was little, I probably trained myself to make use of all of these techniques to a greater extent than someone who wasn't. This is another reason why getting people to try the app and gather some data would be really interesting, regardless of which resolution your phone has.
Wrapping up
This post was a lot less clear-cut than I'd normally like, but I thought it was important to cover. Next week, I have something that you can directly try yourself. In the meantime, if you haven't checked out the deep-dive into how fonts are displayed on a 1080p display, you totally should.