Can you read this title with your peripheral vision?
Probably not — only the middle of your sight has the necessary “resolution”. But why? AI might just have an answer…
One of the main areas in which AI is utilized is image processing. Be it the detection of cars by a Tesla’s auto-pilot or the recognition of cancer cells — AI is good at analyzing images:
Such AI programs are popularly imagined as resembling human vision in some way — hence the name neural networks, albeit artificial ones. But are they really all that similar?
Human vision & computer vision — the difference
Take the task of classifying visual information, for example — that is, being able to tell what object is in an image. Does an AI program see the same thing in the image that you see in real life?
One may say that we have a sense of depth in what we see, which an image doesn’t provide, but the difference is even more fundamental. The human eye has what’s called “variable” resolution. You’re reading this article now, but would you be able to if you moved your screen into your peripheral vision, without turning your eyes toward it?
Hey, don’t try too hard; you can come back now.
In any case, if you really did move your screen into your periphery and aren’t using a 70pt+ font, you probably weren’t able to read much.
Why is that?
Only the center of our vision — the fovea — is packed with enough photoreceptor cells to see in high resolution, which makes sense from an evolutionary perspective: vision is expensive. Over 50% of our brain is already dedicated to processing what we see. If we saw in high resolution everywhere, we’d probably need the head of an alien to house our brain… Now, consider what an AI program “sees” in an image. Here’s a comparison of the two:
Notice a difference? You may not have considered it before, but your eye’s resolution gradually decreases away from the point of focus — hence the term “variable” resolution — while what a camera captures doesn’t (the odd macro portrait aside). This “variable” resolution strategy significantly reduces the amount of information our brains need to process.
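If you’d like to see this effect for yourself, here’s a minimal sketch of “variable” resolution in code. It keeps a central region (the “fovea”) at full resolution and progressively coarsens the periphery by representing ever-larger pixel blocks with a single value. The function name, the ring boundaries, and the block sizes are all illustrative assumptions of mine — not the method used in the research described later in this article.

```python
import numpy as np

def foveate(img, fovea_frac=0.2):
    """Crude 'variable' resolution sketch for a 2D grayscale image:
    full resolution inside a central fovea, coarser blocks farther out.
    All thresholds and block sizes below are illustrative assumptions."""
    h, w = img.shape
    cy, cx = h / 2, w / 2
    out = img.astype(float).copy()
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity: normalized distance from the fixation point (image center).
    ecc = np.hypot((ys - cy) / (h / 2), (xs - cx) / (w / 2))
    # Resolution drops with eccentricity: 2x2 blocks, then 4x4, then 8x8.
    for lo, hi, k in [(fovea_frac, 0.5, 2), (0.5, 0.8, 4), (0.8, np.inf, 8)]:
        ring = (ecc >= lo) & (ecc < hi)
        # Represent each k-by-k block by its top-left pixel
        # (a crude stand-in for local averaging).
        blocky = img[(ys // k) * k, (xs // k) * k]
        out[ring] = blocky[ring]
    return out
```

Running this on any photo gives a rough feel for what your retina actually delivers to your brain: sharp at the fixation point, increasingly blurry toward the edges.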
“AI achieves superhuman accuracy at image classification”
Some AI snobs conclude
Yeah, right — except that AI programs “see” a lot more information in an image than your eyes do in the same field of view.
If our brain receives “variable” resolution input to recognize what we see, while an AI program receives a full-resolution one, is it fair to compare the two? Probably not.
+1 point for humans?
Not so quickly.
Thinking about this gave rise to the research I conducted, led by Prof. Shimon Ullman and Dr. Daniel Harari at the Weizmann AI Center: What would happen if we gave AI programs the same “variable” resolution mess that we see? Would they still do well? Can we build an AI program tuned specifically for such “variable” resolution input?
Setting the technical details aside, in short: you can never do better at analyzing an image if you’re provided with less information to begin with.
Essentially, this means that an AI program given a “variable” resolution image will virtually always perform worse than one given the full-resolution image — simply due to the lack of information.
This makes sense if you think about it. Imagine you could only see through a small circle comprising 5% of your normal field of view:
The circle clearly reduces the amount of information you perceive. Naturally, it would be harder to tell what you’re seeing, right?
The same goes for AI programs — the less information they’re given, the more likely they are to make a mistake.
One of the interesting tricks our brain has learned, however, is to distribute this limited information very effectively — depicting the entire scene at variable resolution, rather than just a small circle. But that’s a topic for another time!
Now for the interesting part: exactly how much worse would an AI perform, given only 5% of the information in an image? 2X worse? 3X?
The result will surprise you.
Consider these two images of a lamp:
One image contains 20X less information than the other. Imagine we have a million such pairs of images showing everyday objects (the ImageNet dataset), and we train two AI programs (ResNets) — one to classify the full-resolution images (left) and one the “variable” resolution ones (right).
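To make the “20X less information” concrete, here’s the pixel arithmetic. I’m assuming the standard 224×224 input crop that ResNets are commonly trained on — the exact resolution used in the study isn’t stated here:

```python
# Pixel budget, assuming a standard 224x224 ResNet input crop
# (the crop size is my assumption, not taken from the study).
full_pixels = 224 * 224               # full-resolution input
variable_pixels = full_pixels // 20   # "variable" resolution keeps ~1/20

print(full_pixels)                    # 50176
print(variable_pixels)                # 2508
print(variable_pixels / full_pixels)  # ~0.05, i.e. about 5%
```

So the “variable” resolution AI has to make do with roughly 2,500 pixels’ worth of information where its counterpart gets over 50,000.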
As discussed, we expect the “variable” resolution AI (right) to perform worse. But can you guess how much worse — considering the 20X less information it is given?
Stop here if you want to think about this.
The AI given the full-resolution images (left) classified them correctly 75% of the time. The AI given the “variable” resolution images (right) — which, keep in mind, contain 20X less information — classified them correctly 66% of the time.
Only a 9-percentage-point difference for so much less information! Would you have guessed that? Let’s see why it is significant.
We just demonstrated that an AI program tuned to “variable” resolution images — which contain only 5% of the original information — performs only 9 percentage points worse than its full-resolution counterpart. In essence, the extra 95% of information that the full-resolution AI has access to apparently contributes very little — indeed, only 9 points (66% vs 75%).
Starting to understand why we’ve evolved to see in “variable” resolution? Even if we did see in high resolution everywhere, it probably wouldn’t have made a huge difference in how well we comprehend the world! It would, however, have made a huge difference to our head size & energy needs; more information to process requires more brain. Evolution figured all of this out on its own!
Personally, I find it fascinating to be able to recreate this result through artificial neural networks. I hope you did too!
Did you know you were seeing in “variable” resolution before reading this article?