In January 2002, I was asked to give an opening talk and performance for NAMM, the annual trade show for makers and sellers of musical instruments. I created a rhythmic beat by making the most extreme funny faces I could in quick succession. A computer was watching my face through a digital camera and generating varied, raucous percussive sounds according to which funny face it recognized. Keeping a rhythm with your face is a new, strange trick. It tickles while you do it. We should expect a generation of kids to adopt the practice en masse any year now.
This is the sort of deceptively silly event that should be taken seriously as an indicator of technological change. My sense is that by the end of this decade, pattern-recognition tasks like facial tracking will become commonplace. On one level, this means we have to rethink policy related to privacy, since hypothetically a network of security cameras could automatically determine where everyone is and what faces they are making, but there are many other extraordinary possibilities. Imagine that your avatar in Second Life (or better yet, in fully realized, immersive virtual reality) was conveying the subtleties of your facial expressions at every moment. Wouldn’t that lead to a splendid new outpouring of creative interpersonal energy?
A deeper meaning for me is that science is gaining an ability to use formal descriptions of ideas like metaphor and similarity that were previously reserved for artists and poets (see Jaron’s World: Computer Evolution for possible implications regarding the future of scientific simulations). Having explicit, rigorous ways to describe the kinds of processes that go on inside brains will bring us closer to a scientific understanding of ourselves. Indeed, pattern-recognition technology and neuroscience are growing up together.
The software I used at NAMM was a perfect example of this intertwining. It was developed by a little company called Eyematic, where I served as chief scientist. The original project had begun under the auspices of Christoph von der Malsburg, a University of Southern California neuroscientist, and his students, especially postdoc Hartmut Neven. Christoph might be best known for his influential theory from the early 1980s that synchronous firing (when multiple neurons go off at the same moment) is important to the way that neural networks function.
In this case, Christoph was trying to develop hypotheses about what functions are performed by particular patches of tissue in the visual cortex—the part of the brain that receives input from the eyes. There aren’t yet any instruments that can measure what a large, complicated neural net is doing in detail, especially while it is part of a living brain, so scientists have to find indirect ways of testing their ideas about what’s going on in there. For instance, if a hypothesis about what a part of the brain is doing turns out to inspire a working technology, that certainly gives the hypothesis a boost.
These days, neuroscience can inspire practical technology rather quickly. Although Eyematic folded, Hartmut Neven and many of the original students started a successor company to salvage the software, and that company was swallowed up by Google last year. What Google plans to do with the stuff isn’t clear yet. I hope they’ll come up with some creative applications along with the expected searching of images on the Net.
Will the age of pattern recognition inspire more privacy anxieties or creative festivities? One determining factor might be whether enough people can develop practical intuitions about how these algorithms work. Informed users will take charge, while ignorant ones will be bamboozled. I’d like to do my little bit to help, so here is an attempt at a commonsense explanation of how image-recognition algorithms work.
I’ll start with a childhood memory. When I was a boy growing up in the desert of southern New Mexico, I encountered a simple example of pattern recognition in the dirt roads. The roads had wavy “corduroy” bumps created by the tires of previous cars; the spacing of the bumps was determined by the average speed of the drivers on the road. When your speed matched that average, the ride would feel less bumpy. You couldn’t see the bumps with your eyes except right at sunset, when the horizontal red light rays highlighted every irregularity in the ground. At midday you had to drive to perceive the hidden information in the road.
Digital algorithms must approach pattern recognition in a similarly indirect way, and they often have to make use of a common procedure that’s a little like running virtual tires over virtual bumps. It’s called the Fourier transform. A Fourier transform detects how much action there is at particular “speeds” (frequencies) in a block of digital information. The graphic equalizer display on many audio players, which shows the intensity of the music in different frequency bands, is a familiar example.
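For readers who'd rather see the bookkeeping than the metaphor, here is a toy sketch in Python—my illustration, not anything from a real product—of a naive discrete Fourier transform run over a simulated corduroy road. The signal and the bin numbering are made up for the example; real software would use a fast FFT library rather than this brute-force sum.

```python
import math

def dft_magnitudes(signal):
    """Naive discrete Fourier transform: the magnitude of each frequency
    bin says how much 'bumpiness' the signal has at that speed."""
    n = len(signal)
    mags = []
    for k in range(n):
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# A stretch of "road" whose corduroy bumps repeat exactly 4 times.
road = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
mags = dft_magnitudes(road)

# The strongest response lands in frequency bin 4 -- the transform has
# "felt" the spacing of the bumps without ever seeing them.
peak_bin = max(range(len(mags) // 2), key=lambda k: mags[k])  # → 4
```

The graphic-equalizer display mentioned above is doing exactly this: computing `mags` over a short stretch of audio and drawing the result as bars.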
Unfortunately, the Fourier transform isn’t powerful enough to recognize a face, but it has a more sophisticated older brother, the Gabor wavelet transform, that can get us halfway there. This mathematical process identifies individual blips of action at particular frequencies in particular places, while the Fourier transform just tells you what frequencies are present overall.
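To see the difference in miniature, here is a toy one-dimensional Gabor wavelet in Python (the signal, window width, and numbers are invented for illustration; real face software uses two-dimensional Gabor filters at many orientations and scales). A Gabor wavelet is just a sinusoid damped by a Gaussian bell, so it responds only when its frequency is present near its center.

```python
import math

def gabor_response(signal, center, freq, sigma):
    """Correlate the signal with a Gabor wavelet: a sinusoid inside a
    Gaussian window centered at `center`. The response is large only
    when frequency `freq` is present *near* that spot."""
    re = im = 0.0
    for t, x in enumerate(signal):
        window = math.exp(-((t - center) ** 2) / (2 * sigma ** 2))
        re += x * window * math.cos(2 * math.pi * freq * (t - center))
        im += x * window * math.sin(2 * math.pi * freq * (t - center))
    return math.hypot(re, im)

# A signal whose bumps live only in the first half.
signal = [math.sin(2 * math.pi * 0.25 * t) if t < 32 else 0.0
          for t in range(64)]

early = gabor_response(signal, center=16, freq=0.25, sigma=4.0)  # large
late = gabor_response(signal, center=48, freq=0.25, sigma=4.0)   # near zero
```

A plain Fourier transform of this signal would report "frequency 0.25 is present" and stop there; the wavelet additionally reports *where*, which is exactly what you need to find a mouth corner rather than merely conclude that the image contains edges.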
In order to explain wavelets, I’m going to take us back to cars driving across bumpy surfaces—but this time I’m also going to invoke some preposterous imaginary aliens. Suppose for a moment that Earth was visited by huge, jovial aliens who enjoyed imprinting patterns on the deserts by pressing city-size coins down against the dirt. The aliens’ coins mimic ours: They have flattened sculptures of human faces on each side. Like the old roads I grew up with, the impressions left on the desert floor are invisible except at sunset.
Your job is to ride around on the desert, then find and recognize the faces without waiting until sunset. You can also invite friends to ride along with you. There are a lot of strategies you could use, but I’ll describe one here that works pretty well. First, assign each driver a specific face spot to look for. For instance, imagine you are looking for the left corner of the mouth. This is where the thin ends of the lips meet, and there may be an angular wedge of darkness in between them, depending on how open the mouth is.
You have no idea how the coin was rotated, where it was pressed to the ground, or how big the face on it is. So your best bet is to start riding around at random locations in spiral motions. Why spirals? They can match up equally well with an impression of the juncture of the lips regardless of its size or orientation. This is an important idea: Your driving strategy at the most minute level determines what kinds of results you can get in seeing the big picture.
You are looking for a spot where, as you spiral around, you feel two impressions that have a gulf in between them—corresponding to the corner where the two lips and the opening of the mouth meet. As you spiral out, the impressions and the gulf should get smoothly larger, just like real lips.
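The spiral route itself is easy to write down. Here's a minimal sketch—numbers and parameter names are mine, chosen for the example—of an outward Archimedean spiral sampler. The point is the one made above: because the path sweeps through every direction at a steadily growing radius, the same probe can line up with a lip-corner impression at any orientation or size.

```python
import math

def spiral_points(cx, cy, turns=3, samples_per_turn=16, growth=1.0):
    """Sample points along an outward Archimedean spiral centered at
    (cx, cy). Each full turn ends `growth` units farther from center."""
    points = []
    for i in range(turns * samples_per_turn):
        angle = 2 * math.pi * i / samples_per_turn
        radius = growth * angle / (2 * math.pi)
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points

path = spiral_points(0.0, 0.0)
# After one full turn (16 samples) the driver is 1.0 unit from center.
```

A recognizer would read the terrain height at each of these points and check for the signature described above: two impressions with a smoothly widening gulf between them.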
Think you’ve found your spot? Now you call your friends and tell them your GPS coordinates. All the drivers have been issued a simple outline of the face spots that make up a generic face, and now they use it as a guide. If a driver thinks she has the edge of a nostril, then other drivers will look in the most likely places for the other end of the mouth, the corners of the eyes, and so on. Philosophers take note: The generic face map is like Plato’s ancient idea of an ideal version of a thing.
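The generic face map amounts to a little table of offsets. Here is a toy version in Python—the landmark names and distances are entirely made up, and for simplicity it ignores the rotation and scale problem that the spiral strategy handles in the full story (a real system would estimate those first and transform the offsets accordingly).

```python
# Hypothetical generic-face map: where other landmarks sit relative to
# the left mouth corner, in arbitrary "desert" units.
GENERIC_OFFSETS = {
    "right_mouth_corner": (40, 0),
    "left_eye": (5, 60),
    "right_eye": (35, 60),
    "nose_tip": (20, 35),
}

def predict_search_spots(found_xy):
    """One driver reports a left mouth corner at GPS position `found_xy`;
    tell the other drivers the most likely places to look next by adding
    the generic-face offsets to that position."""
    x, y = found_xy
    return {name: (x + dx, y + dy) for name, (dx, dy) in GENERIC_OFFSETS.items()}

spots = predict_search_spots((100, 200))
# spots["nose_tip"] → (120, 235): start the nose-tip search there.
```

Each driver then runs the spiral probe only in a small window around the predicted location, which is enormously cheaper than searching the whole desert for every landmark.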
It’s possible that the person whose image was impressed into the ground was covering part of his face with a hand, so you might not find all the spots you’re looking for, but even so, if you find a bunch of them, you can feel confident you’ve found a face. But whose face? And what expression is it making?
To answer these questions, you need to refer to details about all the face spots that have been found in previous expeditions. Fortunately, we have an extensive database of previously recognized lists of spots that are known to correspond to individual people, to certain facial expressions, and to other qualities, like age and sex. Your new list of spots won’t exactly match any entry in this database, but it’s easy to find the closest matches. By this point, you’re pretty good at finding faces impressed in the dirt.
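The "closest match" step is nothing more exotic than a distance computation. Here's a toy sketch—the names and numbers in the database are invented, and each face is boiled down to four numbers where a real system would store a long vector of wavelet responses per landmark.

```python
import math

# Hypothetical database: each known face reduced to a short list of
# numbers summarizing its spots (in a real system, Gabor-wavelet
# responses at every landmark).
DATABASE = {
    "Alice": [0.9, 0.1, 0.4, 0.7],
    "Bob":   [0.2, 0.8, 0.6, 0.3],
    "Carol": [0.5, 0.5, 0.5, 0.5],
}

def closest_match(new_spots):
    """Return the name of the database entry nearest the new face,
    by Euclidean distance. The new list never matches any entry
    exactly; we simply take the closest."""
    def distance(known):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(new_spots, known)))
    return min(DATABASE, key=lambda name: distance(DATABASE[name]))

who = closest_match([0.85, 0.15, 0.45, 0.65])  # → "Alice"
```

The same machinery, pointed at a database of expressions instead of identities, answers the second question: is this a smile, a frown, or one of my NAMM funny faces?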
Where did that wonderful facial database come from? A lot of hard work—but mostly during the early phases of development. Initially, researchers had to grade the algorithm’s performance on a multitude of trials, retaining only those face patterns that gave correct results. The early stages of database gathering aren’t foolproof. A particular lab that has mostly clean-shaven engineers might initially fail to include examples from bearded guys like me, for instance. Later, once the system is performing reasonably well, it can gather more face patterns automatically. A Darwinian phase eventually begins, in which the algorithm evolves, ridding itself of incorrect face patterns and getting better and better with time.
There are striking parallels between what works in engineering and what is observed in human brains, including the Platonic/Darwinian duality: A newborn infant can track a simple diagrammatic face, but a child needs to see people in order to learn how to recognize individuals.
I’m happy to report that Hartmut’s group earned some top scores in a government-sponsored competition in face recognition. The National Institute of Standards and Technology tests these systems in the same spirit in which drugs and cars are tested; they’re important enough that the public needs to know which ones are trustworthy.
I’m less happy to report that I suffer from mild prosopagnosia, a subnormal ability to recognize faces. Computers are still not quite as good as most people at recognizing faces, but the algorithms I’ve described here are already better at the task than I am.