
Study offers glimpse into how monkeys — and machines — see a 3D world

New visual modeling research revealed a specialized algorithm deep in the primate brain that transforms 2D images into 3D mental models.


Hakan Yilmaz, a Ph.D. candidate in Yale’s Graduate School of Arts and Sciences, with Ilker Yildirim, an assistant professor of psychology in Yale’s Faculty of Arts and Sciences.

Photo by Dan Renzetti


Yale researchers have discovered a process in the primate brain that sheds new light on how visual systems work and could lead to advances in both human neuroscience and artificial intelligence.

Working with a new computational model, researchers uncovered an algorithm that reveals how the primate brain constructs internal three-dimensional (3D) representations of an object when viewing a two-dimensional (2D) image of that object.

“This gives us evidence that the goal of vision is to establish a 3D understanding of an object,” said study senior author Ilker Yildirim, an assistant professor of psychology in Yale’s Faculty of Arts and Sciences. “When you open your eyes, you see 3D scenes — the brain’s visual system is able to construct a 3D understanding from a stripped-down 2D view.”

Researchers have dubbed this process “inverse graphics”: the brain’s visual processing system works like a computer graphics pipeline run in reverse. It moves from a 2D image, through a less view-dependent “2.5D” intermediate representation, to a much more view-tolerant 3D representation of the object.

The findings were published in the Proceedings of the National Academy of Sciences.

A human brain, in essence, transforms 2D images that one sees — perhaps on paper or on a screen — into 3D mental models. Computer graphics, meanwhile, do the opposite, rendering 3D scenes into 2D images.

“This is a significant advance in understanding computational vision,” Yildirim said. “Your brain automatically does this, and it’s hard work, computationally. It remains a challenge to get machine vision systems to come close to doing this for the everyday scenes we can encounter.”

The finding could fuel research in human neuroscience and vision disorders, as well as advance the creation of machine vision systems with primate vision capabilities, researchers say. 

In their work, researchers found that part of the temporal lobe of the primate brain — specifically, the inferotemporal cortex, an area critical for visual processing — transforms images into 3D mental models of objects.

They did this by deploying what is known as a Body Inference Network (BIN), a neural network-based model able to create a 2D representation of an object based on properties of shape, posture, and orientation.

But in this case, researchers trained BIN to invert this process, constructing 3D human and monkey bodies directly from images that were labeled with 3D data. Trained this way, BIN was shown to reverse the usual computer graphics process, arriving at 3D properties derived from the 2D images.
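To make the two directions concrete, here is a minimal Python sketch. It is not the study's BIN model. It assumes a hypothetical six-point 3D "body" whose shape is already known, recovers only the viewing angle, and swaps the trained neural network for a brute-force search over candidate angles. The names BODY_3D, render, and infer_angle are invented for illustration.

```python
import math

# Hypothetical 3D "body": a few (x, y, z) keypoints, invented for illustration.
BODY_3D = [
    (0.0, 1.8, 0.0),   # head
    (0.0, 1.2, 0.1),   # chest
    (-0.4, 1.0, 0.3),  # left hand
    (0.5, 1.1, -0.2),  # right hand
    (-0.2, 0.0, 0.0),  # left foot
    (0.2, 0.0, 0.2),   # right foot
]


def render(points_3d, angle_deg):
    """Graphics direction: rotate about the vertical axis, then drop depth."""
    a = math.radians(angle_deg)
    return [(x * math.cos(a) + z * math.sin(a), y) for x, y, z in points_3d]


def infer_angle(image_2d, points_3d):
    """Inverse direction: pick the whole-degree angle whose rendering best matches the image."""
    def error(angle_deg):
        guess = render(points_3d, angle_deg)
        return sum((gx - ix) ** 2 + (gy - iy) ** 2
                   for (gx, gy), (ix, iy) in zip(guess, image_2d))
    return min(range(360), key=error)


# Render a 2D "image" at a known angle, then try to recover that angle from the image alone.
image = render(BODY_3D, 40)
recovered = infer_angle(image, BODY_3D)
print(recovered)
```

In this toy case the search recovers the 40-degree viewing angle used to render the image. Real images are far harder: shape, posture, and orientation are all unknown at once, which is part of why the article describes the task as computationally hard.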

After comparing this BIN data with brain data recorded in macaques as they were shown macaque body images, the researchers found that BIN’s processing stages matched activity in the two regions of the macaque brain (MSB and ASB) involved with processing body shapes.

“Our model explained the visual processing in the brain much more closely than other AI models typically do,” Yildirim said. “We are most interested in the neuroscience and cognitive science aspects of this, but also with the hope that this can help inspire new machine vision systems and facilitate possible medical interventions in the future.”

Other authors of the study included first author Hakan Yilmaz and Aalap Shah, who are both Ph.D. candidates in Yale’s Graduate School of Arts and Sciences, and researchers from Princeton University and KU Leuven in Belgium.