# Concept

## Color and neural networks

I’m hoping I can stumble on a better description of human color perception by trying to answer the question:

What can be said about the relationship between neural networks for vision and human color perception?

In many ways, this question is problematic. Human color perception is the subject of deep, unanswered questions. Color is a property of our mind, an appearance in consciousness, and I currently believe that we won’t get a satisfying answer to the question “what is color?” without knowing something deep about consciousness. So it is tenuous to take the idea of color perception and ask how it relates to neural networks. The question might be useless for other reasons too: the two ideas might be so different that comparing them yields nothing. Sure, the idea of pain is as nebulous a topic as color, but I’m reasonably confident that the question “what can be said about the relationship between pain and NAND logic gates?” is pretty useless.

At this point I can start to mount a defense of the question. To me, it doesn’t seem obvious that color and neural networks are so vastly different as to be unrelatable. If the answer to the question is “No, nothing much can be said about the relationship between neural networks for vision and color”, then this conclusion would be an interesting and valuable insight in itself.

We can shift the inquiry slightly by avoiding the word color and instead talking about light and reflectance; this switch helps avoid the interesting but nebulous questions of consciousness. Having said that, it is difficult to explain the ideas without reaching for the word color, and I don’t attempt to avoid it.

## Project map

The following mind map lays out some of the project ideas.

## Paper ideas

A brainstorming activity. If the following were paper titles, would they make sense, would they be interesting and could supporting evidence ever be found?

• CNNs trained on ImageNet are colorblind.
• ResNet trained on ImageNet is colorblind.
• ImageNet classification can be achieved in grayscale.
• CNNs trained on ImageNet develop a 2D colorspace.
• CNNs trained on ImageNet are [not] unaware of related colors.
• CNNs trained on ImageNet are [not] invariant to illumination changes.
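The grayscale and illumination titles suggest a simple probe: perturb the inputs and count how often a trained classifier's prediction survives. The following is a minimal sketch under stated assumptions: images are (H, W, 3) float arrays of linear RGB in [0, 1], grayscale uses the Rec. 709 luminance weights, an illumination change is modeled as a von Kries style per-channel gain, and `classify` is a hypothetical stand-in for a trained ImageNet model that returns a label. None of these helper names come from an existing library.

```python
import numpy as np

# Rec. 709 / sRGB relative-luminance weights, applied to *linear* RGB.
LUMINANCE_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Replace each pixel by its luminance, repeated across the three channels
    so the result keeps the (H, W, 3) shape an RGB network expects."""
    luminance = image @ LUMINANCE_WEIGHTS  # shape (H, W)
    return np.repeat(luminance[..., np.newaxis], 3, axis=-1)


def apply_illuminant(image: np.ndarray, gains) -> np.ndarray:
    """Simulate an illumination change as a per-channel (von Kries style)
    gain, clipped back into the valid [0, 1] range."""
    return np.clip(image * np.asarray(gains, dtype=float), 0.0, 1.0)


def prediction_agreement(classify, images, perturb) -> float:
    """Fraction of images whose predicted label is unchanged by `perturb`.
    `classify` is a hypothetical stand-in for a trained model."""
    unchanged = [classify(img) == classify(perturb(img)) for img in images]
    return float(np.mean(unchanged))


# Usage with a deliberately color-dependent toy "classifier":
# label 1 if the image is, on average, redder than it is blue.
def toy_classify(img):
    return int(img[..., 0].mean() > img[..., 2].mean())


rng = np.random.default_rng(0)
images = [rng.random((8, 8, 3)) for _ in range(100)]
print(prediction_agreement(toy_classify, images, to_grayscale))
print(prediction_agreement(toy_classify, images,
                           lambda im: apply_illuminant(im, (1.2, 1.0, 0.8))))
```

A real experiment would also need to respect the model's own preprocessing (typical ImageNet models expect sRGB-encoded inputs normalized with dataset statistics), so the perturbation would be applied before that step.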

These questions can be reworded with different network types, different vision tasks and different datasets.
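The “2D colorspace” title could be probed by asking how many dimensions a layer's responses to color actually use. A minimal sketch, assuming we already have an activation matrix with one row per uniform color patch and one column per unit (for example, spatially pooled activations of some layer): count the principal components needed to explain most of the variance. No real network is loaded here; the two “layers” are hypothetical linear stand-ins, one that depends on only two combinations of RGB and one that depends on all three. A real network is nonlinear, so the count would be an approximation that depends on the chosen threshold.

```python
import numpy as np


def effective_dimensionality(responses: np.ndarray, threshold: float = 0.99) -> int:
    """Number of principal components needed to explain `threshold` of the
    variance in `responses`, an (n_stimuli, n_units) activation matrix."""
    centered = responses - responses.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    explained = np.cumsum(variance) / variance.sum()
    # First index whose cumulative explained variance reaches the threshold.
    return int(np.searchsorted(explained, threshold) + 1)


rng = np.random.default_rng(0)
colors = rng.random((500, 3))  # 500 uniform color patches, one RGB row each

# Hypothetical layer A: 64 units that only "see" two combinations of RGB.
two_axes = np.array([[1.0, -1.0, 0.0],    # a red-green-like difference
                     [0.5, 0.5, -1.0]])   # a yellow-blue-like difference
layer_a = colors @ two_axes.T @ rng.normal(size=(2, 64))

# Hypothetical layer B: 64 units driven by all three channels.
layer_b = colors @ rng.normal(size=(3, 64))

print(effective_dimensionality(layer_a))  # 2
print(effective_dimensionality(layer_b))  # 3
```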

## The dataset that trained human vision

The evolution that gave us human vision can be thought of as having run on a dataset that is now impossible to recreate. Even if we fix a period, say 40,000 to 50,000 years ago, and ask what the distribution of scenes humans lived in was, or what the distribution of light they saw was, even this question seems impossible to answer. But this is the sort of data needed to understand the directions in which evolution pushed human vision. Even if we were to obtain this type of data, we would need a good idea of the state of human vision at that point, as evolution does not produce optimal systems; it reworks existing ones while being ignorant of any global optima. For example, our understanding of the three human cone types cannot ignore the loss of two cone types experienced by mammals during the time of the dinosaurs and the later duplication of the red cone [1].

## The need for human-like vision

For many tasks there is no requirement for a model to create any specific intermediate representation before outputting a result; standard object recognition tasks fall into this category. Tasks that ask questions about human perception, however, inherently require a degree of parity with human vision. If the prediction task is to determine the response of a human, say to a visual stimulus, then the nature of human vision, such as its representation of shape or color, becomes relevant. For a specific example, consider a photo editing application taking instructions from a human like “Change the color of the boots to look more brown.” For the application to succeed, it seems important for the system to understand how to more strongly elicit the sensation of brown in humans.
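Part of what makes the brown example hard is that brown is a related color: an orange or yellow surface tends to look brown only when it is darker than its surroundings, which is the surround effect studied in the brown-induction papers in the bibliography. A minimal sketch of one quantity such an editing system would at least have to track, assuming the boots and their surround are each summarized by a single sRGB color: the luminance of the target relative to its surround. The helper names and the example colors are hypothetical, and no threshold for when brown appears is implied.

```python
import numpy as np


def srgb_to_linear(c):
    """Undo the sRGB transfer function (IEC 61966-2-1) for values in [0, 1]."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)


def relative_luminance(srgb):
    """Relative luminance Y of an sRGB color (Rec. 709 primaries, white = 1)."""
    return float(srgb_to_linear(srgb) @ np.array([0.2126, 0.7152, 0.0722]))


def target_to_surround_ratio(target_srgb, surround_srgb):
    """Luminance of a target region (e.g. the boots) relative to its surround."""
    return relative_luminance(target_srgb) / relative_luminance(surround_srgb)


orange = (0.8, 0.45, 0.1)  # a hypothetical boot color
dark_surround = (0.2, 0.2, 0.2)
light_surround = (0.95, 0.95, 0.95)

# Above 1: the orange is much brighter than a dark surround.
print(target_to_surround_ratio(orange, dark_surround))
# Below 1: the same orange is darker than a light surround, the kind of
# configuration in which it is more likely to look brown.
print(target_to_surround_ratio(orange, light_surround))
```

Luminance contrast is only one ingredient; hue and the spatial arrangement of the surround are not modeled here.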

A second benefit of models having a degree of parity with human vision is that it allows greater model interpretability and explainability. Consider a model whose decisions depend on fine texture detail imperceptible to humans, and compare it to a model whose decisions depend on representations with a degree of parity with human vision: the latter affords an easier exploration of the model’s behavior.

## Color science techniques applied to neural networks

Can the approach taken in the paper “Could a Neuroscientist Understand a Microprocessor?” be applied to neural networks and color science? The paper explains itself clearly in its abstract:

> There is a popular belief in neuroscience that we are primarily data limited, and that producing large, multimodal, and complex datasets will, with the help of advanced data analysis algorithms, lead to fundamental insights into the way the brain processes information. These datasets do not yet exist, and if they did we would have no way of evaluating whether or not the algorithmically-generated insights were sufficient or even correct. To address this, here we take a classical microprocessor as a model organism, and use our ability to perform arbitrary experiments on it to see if popular data analysis methods from neuroscience can elucidate the way it processes information.

The authors arrived at the conclusion that the analytic approaches in neuroscience fall short of expectations.

The abstract method of this paper is to take techniques used to investigate an unknown system and apply them to a known system. A priori, we are confident about which concepts are necessary and sufficient to understand the known system. After applying the techniques to it, we can investigate to what extent the techniques reveal these concepts. Difficulty in arriving at the expected concepts is evidence that the techniques are lacking. Conversely, the concepts that the techniques do reveal may refute this claim of insufficiency by instead offering an alternative description of the system.

When considering color and neural networks, is there a mapping between tools and systems that would make this type of investigation interesting? Techniques such as color matching have had success laying the foundations of colorimetry and, later, color appearance models. What is it about human vision that made these techniques useful? Can they be used to investigate other vision systems, and if not, what is missing that prevents their application? Going one step further, what are the minimum properties a vision system must have for the techniques to be applicable?
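One commonly cited reason color matching worked for human vision is that the first stage, cone absorption, is approximately linear (Grassmann’s laws), so a match reduces to a small linear system. A minimal sketch of how the same experiment could be posed to an arbitrary system, assuming spectra sampled every 5 nm, three Gaussian “sensors” standing in for cone fundamentals (not real cone data), and three narrowband primaries: the hypothetical `color_match` solves for primary intensities by least squares, then measures the residual with the system itself. For a linear trichromatic observer the residual is essentially zero. If `respond` were instead, say, a hypothetical function returning a network layer’s pooled activations for a rendered patch of the spectrum, a large residual would be one signal that the matching technique does not transfer.

```python
import numpy as np

wavelengths = np.arange(400, 701, 5, dtype=float)  # nm, 61 samples


def gaussian(center, width):
    """A Gaussian curve over the wavelength grid (peak value 1)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Hypothetical observer: Gaussian stand-ins for S, M and L cone
# sensitivities, one column per sensor. NOT real cone fundamentals.
SENSITIVITIES = np.stack([gaussian(440, 25), gaussian(535, 35), gaussian(565, 40)], axis=1)

# Three narrowband primaries, one spectrum per column.
PRIMARIES = np.stack([gaussian(450, 5), gaussian(540, 5), gaussian(610, 5)], axis=1)


def linear_observer(spectrum):
    """Linear response of the hypothetical observer: one value per sensor."""
    return SENSITIVITIES.T @ spectrum


def color_match(respond, test_spectrum, primaries=PRIMARIES):
    """Solve for primary intensities w assuming `respond` is linear (responses
    to the primaries add), then measure the residual with `respond` itself,
    so for a nonlinear system it shows how far that assumption misses."""
    primary_responses = np.stack([respond(p) for p in primaries.T], axis=1)
    target = respond(test_spectrum)
    w, *_ = np.linalg.lstsq(primary_responses, target, rcond=None)
    residual = float(np.linalg.norm(respond(primaries @ w) - target))
    return w, residual


# Color matching functions: match every monochromatic test light in turn.
tests = np.eye(len(wavelengths))  # column i has unit power at wavelengths[i]
cmfs = np.array([color_match(linear_observer, t)[0] for t in tests.T])
print(cmfs.shape)  # (61, 3)
```

Negative entries in `cmfs` would mean that primary has to be added to the test field instead, as in classic matching experiments.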

Coming from the reverse direction, there are many papers that claim to train neural networks to approach the behavior of the networks in human vision. Tom Baden’s lab and other labs are making progress uncovering the circuits of the retina. If these retinal circuits are taken as the “known system”, do the neural network approximations hold up under comparison?

## 3D prior

My general sentiment is that many neural network models used for vision tasks such as classification do not develop a single 3D model representation. There is too much flexibility for the network to develop multiple parallel expedient representations. If networks are designed with constraints that force them to develop a sort of 3D interface, then maybe this is a space in which it is possible to find scene representations, including surface reflectance properties, that can be compared to human experience.

## People list

Vincent Sitzmann
Assistant Professor at MIT EECS, running the Scene Representation Group (Jan, 2022). Working on representing scenes with neural networks. The representations Sitzmann studies are better suited than 2D convolutional networks to searching for and encouraging material encodings.
Tom Baden
Professor at University of Sussex, running the Baden Lab (Jan, 2022). Deciphering vision by understanding retinal networks. Everything from Tom Baden’s lab is excellent. A talk on the work in his lab.
Philipp Henzler
3rd year PhD student (Nov, 2021). Working on neural texture representation and 3D reconstruction. Wrote the 2021 paper “Generative Modelling of BRDF Textures from Flash Images”.
Takuma Morimoto
Postdoctoral fellow at the Department of Experimental Psychology, University of Oxford (Jan, 2022). Has done research into color constancy, contextual colors and the color brown.
Keiji Uchikawa
Professor at Tokyo Institute of Technology. Investigating color constancy.
Akiyoshi Kitaoka
Professor of psychology at Ritsumeikan University in Osaka (Jan, 2022). Running the journal of illusions. Illusions are a good source of inspiration for identifying test cases to compare human and machine behavior.

## Bibliography

1. Baden, T. & Osorio, D. The Retinal Basis of Vertebrate Color Vision. Annual Review of Vision Science 5, 177–200 (2019).
2. Morimoto, T., Kusuyama, T., Fukuda, K. & Uchikawa, K. Human color constancy based on the geometry of color distributions. Journal of Vision 21, 7 (2021).
3. Uchikawa, K., Morimoto, T. & Matsumoto, T. Understanding individual differences in color appearance of ‘#TheDress’ based on the optimal color hypothesis. Journal of Vision 17, 10 (2017).
4. Buck, S. L. & DeLawyer, T. A new comparison of brown and yellow. Journal of Vision 12, 9 (2012).
5. DeLawyer, T., Morimoto, T. & Buck, S. L. Dichoptic perception of brown. Journal of the Optical Society of America A: Optics, Image Science, and Vision 33, A123–128 (2016).
6. Buck, S. L. et al. Influence of surround proximity on induction of brown and darkness. Journal of the Optical Society of America A: Optics, Image Science, and Vision 33 (2016).
7. Morimoto, T., Slezak, E. & Buck, S. L. No effects of surround complexity on brown induction. Journal of the Optical Society of America A: Optics, Image Science, and Vision 33 (2016).
8. Araujo, A., Norris, W. & Sim, J. Computing receptive fields of convolutional neural networks. Distill (2019) doi:10.23915/distill.00021.