Alexander "Sandy" Pentland has wavy reddish brown hair, a matching beard, a straight nose, a ruddy complexion, and a quick smile. Spend a few hours with him at the MIT Media Lab, where he heads research into the emerging science of perceptual computing, and you’ll easily get to know his face and his expressions. No doubt you would instantly recognize him if you happened to run into him at MIT several months later. But what if he shaved his beard and went grocery shopping? Would you recognize him beardless, completely out of context, pushing a cart down the dairy aisle? Perhaps not, says Pentland. But you might say, "Gee, do his eyes look like someone’s I’ve seen before?"
Pentland’s goal is to teach computers not only to ask that question but to answer it. Toward that end he has captured thousands of faces in a photographic computer database--he calls it his facebase--and has developed software for searching through this collection and picking one face out of the multitude. It’s a tough trick. Just think of your own facebase, stored in your brain, containing the faces of everyone you know. When you are out on the street, walking past a parade of people, you are comparing each passing face against the ones you’ve remembered. If you get a match, a bell goes off in your head and you instantly recall who that person is. This is essentially what Pentland’s software does.
How does it do that? How do we, for that matter, distinguish one face from another? Our faces are really more similar than they are different. Each has a nose, a mouth, two eyes, two ears, a chin, cheeks, and a brow. Undoubtedly, recognition comes from seeing particular features in combination--a narrow nose, say, teamed with squinty eyes, round cheeks, and a large forehead. But they must all be working in concert with some other factors that we can’t quite put into words. In the end, though we take the process of recognizing a fellow human’s face for granted, the neurological task is as mysterious as it is instantaneous. It is a talent that is partly hard-wired into our brains and partly learned through constant practice. A baby, after all, can recognize its mother’s face in a room full of people. A senior citizen at a high school reunion can identify the face of someone he hasn’t seen in 50 years. From birth to death, the face is our most important visual stimulus, yet we don’t know precisely how we tell one face from another.
Still, just because we don’t know exactly how our brains work doesn’t mean we can’t find a way to goad computers into mimicking our actions. With computers, recognizing faces begins with the tiny dots known as pixels (for picture elements) that make up an image on the screen. Any computer image, whether text or graphic, is simply a map of pixels, with each assigned a specific shade or color. In Pentland’s facebase, a typical photo of a face is stored as a map of 128 by 128 pixels, for a total of more than 16,000.
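In modern terms, that pixel map is just a two-dimensional array of brightness values, and the math that follows treats it as one long vector. A minimal sketch in Python with NumPy (the blank array here is a stand-in, not an actual facebase photo):

```python
import numpy as np

# A grayscale photo is a map of pixel brightness values. Pentland's
# facebase stores each face as 128 x 128 pixels (a blank stand-in here).
face = np.zeros((128, 128), dtype=np.uint8)

# For the arithmetic that follows, the image is treated as one long
# vector of 16,384 numbers.
face_vector = face.reshape(-1)
```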
Once the image is stored, Pentland’s software, called Photobook, goes through a series of steps to preprocess it. Just like rinsing dishes before they go into the dishwasher, the preprocessing makes the main work much easier. Every picture, for instance, must first be normalized so that all the photos will appear to have been taken with the same camera in the same lighting. The computer does this by electronically adjusting the pixels of each image until they conform to a common standard of brightness, coloration, and so on. The significant factor here is not the overall brightness of the picture but the pattern of brightness.
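One simple way to normalize, sketched below, is to give every image the same average brightness and the same contrast, so that only the pattern of brightness remains. This is an illustration of the idea; Photobook's actual preprocessing is more elaborate.

```python
import numpy as np

def normalize(image):
    """Adjust an image toward a common brightness standard.

    Subtracting the mean and dividing by the standard deviation removes
    overall lighting differences between photos, leaving only the
    *pattern* of brightness (a sketch, not Photobook's actual routine).
    """
    image = image.astype(float)
    return (image - image.mean()) / image.std()
```

Two photos of the same face taken under different exposures come out nearly identical after this step, which is exactly what the later comparisons require.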
Then Photobook compresses the digital data needed to re-create each image so that they take up less space in the computer’s memory. Later on this compression will allow the computer to search through many images quickly. Pentland calls the specific technique he uses "semantics-preserving image compression" because it retains the essential meaning of each photo but expresses the information more concisely by describing it in terms of facial characteristics. For example, this method might represent an eye not as a collection of dots that correspond to points on a computer screen but in a mathematical shorthand that describes the eye’s shape and coloration.
This technique distinguishes Photobook from other types of image-recognition software. The simplest of these rely on typed verbal descriptions of an image such as "photograph of Sandy Pentland wearing a Groucho Marx disguise." The computer merely sifts through the text of these descriptions and calls up a picture when it finds a word match. This crude method is effective for broad categorization, but it’s useless for finding one face among thousands of nameless faces that can’t be adequately described in words. Some other image databases keep a running index of the basic shapes of the images they hold. This makes it easy for an engineer to keep track of pistons and crankshafts and engine blocks and other automobile parts, but it hardly works for faces, which are all essentially the same shape. Still other databases differentiate among images by measuring the amount of certain colors, but by itself this method cannot sort apples from fire trucks, let alone one face from the next. For that task all these approaches are grossly inadequate, even when used in combination. The problem is that the differences between faces are so subtle that discerning them demands a far more explicit rendering of details, in a way that allows them to be compared and identified precisely.
Thus arises the last important step in establishing the facebase, a somewhat disturbing process called averaging. Just as you might get an average weight for a roomful of people by adding their individual weights and then dividing the sum by the number of individuals weighed, Pentland takes a mathematical average of faces. The computer looks at measurements it has taken of each feature of each of a few hundred faces--the shape and position of the eyes, ears, nose, mouth, cheekbones, and so on--and from them calculates a geometrically average constellation of features. The result invariably looks simultaneously eerie and pleasant. It will almost always look like an androgynous 17- or 18-year-old, one that by definition has no distinguishing characteristics. It will also look surprisingly attractive (see "Such a Lovely Face," page 87).
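The averaging step itself is simple arithmetic: a pixel-by-pixel mean over the whole collection. A sketch, using random vectors as stand-ins for the few hundred real photos:

```python
import numpy as np

# Stand-in facebase: one row per person, one column per pixel.
# (Random data here; in Pentland's lab these are real photographs.)
rng = np.random.default_rng(0)
faces = rng.random((300, 16384))

# The "average face" is simply the pixel-by-pixel mean of the collection.
average_face = faces.mean(axis=0)
```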
Since no two people on this planet of more than 4 billion look exactly alike, you might think that there must be millions of ways in which faces differ from one another. Not so, Pentland says. Faces actually vary according to a mere 100 factors. "Each face is a unique mixture," says Pentland, "but it’s a mixture of only 100 things, at most." Most faces, in fact, are adequately described by 20 factors.
What are these factors? "You can’t put English words on them," Pentland says. You can say a certain person has a wide nose, big eyes, a crooked mouth, or a cleft in his chin. But these 100 factors are more complicated than that. For instance, where is the nose wide? At the top, between the eyes, or down by the nostrils? And how does the nose sit in relation to the rest of the face? "It’s configurational," Pentland says. "It’s holistic. You can’t explain it."
Yet the computer must understand what these 100 factors are. Pentland fosters such understanding through the use of a technique whereby each face image is deconstructed into separate eigenfaces, a word derived from the German prefix eigen, meaning "own" or "individual." An eigenface is a set of facial characteristics that tend to occur in tandem--in other words, if a person has one of these characteristics, he has them all. By the same token, the characteristics that make up one eigenface have no correlation with those of any of the others; having one set of characteristics implies nothing about having any others.
Pentland’s eigenfaces are purely mathematical constructions. To generate them, the computer first takes the initial group of a few hundred faces that were used to build the average face and goes through them one by one, measuring how much its features differ from those of the average face. Then it correlates the measurements and sorts them according to which deviations tend to occur together. Each group of deviations constitutes an eigenface.
On Pentland’s computer screen, some of the eigenfaces seem to concentrate on easily defined areas: one might highlight the slope of the forehead, another the curve of the upper lip. Looking at them, you might think that Pentland has come up with a snazzy, computerized version of Mr. Potato Head, with its stick-on eyes, eyebrows, ears, nose, and mouth. The other eigenfaces, however, quickly dispel this impression. They appear as fuzzy, darkened faces with several unconnected areas brightly highlighted--a vague region somewhere on the underside of the jaw, for example, teamed with similarly borderless regions under the nose and encircling the eyes.
With its set of 100 eigenfaces, the computer can now easily analyze all the faces in its facebase, each of which can be expressed as a combination of the eigenfaces--more of some, less of others. The eigenfaces are like filters that allow the computer to see just one aspect of a face at a time. They are also a sort of shorthand for describing just how each face differs from the average.
Finally the computer is ready to analyze a target face to see if it matches one or more of the faces in the facebase. The first step is to analyze the target face in terms of its component eigenfaces, an operation that Photobook accomplishes in seconds. As it turns out, in most cases a face can be identified using only a handful of eigenfaces--that is, for any given target face, most eigenface features will probably be nearly or completely absent while a few will be strongly present. After that analysis, it is a simple matter for the computer to reach into the facebase and pull out those faces that have similar eigenface features.
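The matching step can likewise be sketched: describe every face by its eigenface signature (how much of each eigenface it contains), then rank the facebase by how close each signature lies to the target's. The function names and the 12-choice default below are illustrative, not Photobook's.

```python
import numpy as np

def describe(face, average, efs):
    """A face's signature: how much of each eigenface it contains."""
    return efs @ (face - average)

def best_matches(target, facebase, average, efs, n=12):
    """Rank facebase entries by distance to the target's signature.

    The whole search happens in the compressed signature space --
    a handful of numbers per face -- rather than pixel by pixel,
    which is what makes it fast (an illustrative sketch).
    """
    signatures = (facebase - average) @ efs.T
    distances = np.linalg.norm(signatures - describe(target, average, efs), axis=1)
    return np.argsort(distances)[:n]
```

If the target is itself in the facebase, its own photo comes back as the closest match, with other poses of the same person typically ranked just behind it.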
Pentland has several facebases; the largest contains about 7,500 photos of roughly 3,000 people, showing them at various angles and wearing different expressions. To demonstrate the system, Pentland calls up an image of a man with dark hair and a square face. With a click of his mouse, he instructs Photobook to find the photo or photos in the facebase that best match this target image. After a momentary pause, Photobook responds with 12 choices ranked in order of how closely they match. Sure enough, the first two images show the target face in slightly different poses.
Pentland’s work in face recognition began several years ago as a potential Orwellian nightmare. At the time, the Arbitron Company was looking for ways to leapfrog its main rival, the Nielsen Company, in measuring the habits of TV viewers. Nielsen, of course, is famous for its determinations of how many people watch a given television show, and its ratings are used to decide how much advertisers pay for time on the program. Nielsen obtains its ratings through a meter installed in the TVs of a few thousand families around the country. But the device determines only when the TV is on and what channel it is tuned to, not who is actually watching the program or when they are looking at the screen. In 1987, Arbitron executives approached Media Lab director Nicholas Negroponte; they wanted to know if it was possible to invent a people meter, a box that would sit on top of the TV and watch the watchers, recognizing precisely when each member of a household was in the TV room and what he or she was doing.
Negroponte knew just the person to lead the Arbitron project. Sandy Pentland had done his undergraduate work in psychology and computer science and had received his doctorate from MIT in psychology and artificial intelligence. Negroponte had lured him back to MIT a few years later to head a group studying machine perception.
Pentland agreed to take on the Arbitron project, but after three years the sponsors lost interest. It wasn’t because the public became aware of the scheme and objected to it. And it wasn’t because Pentland and his crew couldn’t make face recognition work the way Arbitron wanted. On the contrary, says Pentland, Arbitron pulled out of the research because face recognition worked too well. The company decided that if advertisers found out too much about people’s real viewing habits, they might scale back spending, thus upsetting Arbitron’s clients. What if they realized that most people fall asleep when watching baseball? Or that most evening news shows aren’t really watched, just used as background noise?
Pentland has since expanded his work into new areas, one of the most important of which is known as expression analysis. This offshoot of his technology is predicated on the notion that if computers are ever to become better servants, they should be able to tell whether we’re in a good mood or bad, bored or anxious.
Chief among the members of his staff working on the problem is computer scientist Irfan Essa. To get computers to read facial expressions such as happiness or anger, Essa has designed three-dimensional animated models of common facial movements. His animated faces move according to biomedical data gathered from facial surgeons and anatomists. Essa uses this information to simulate exactly what happens when a person’s static, expressionless face, whose muscles are completely relaxed and free of stress, breaks out into a laugh or a frown or some other expression of emotion.
Essa is piggybacking on cross-cultural studies done in the sixties and seventies by psychologists Paul Ekman at the University of California Medical Center in San Francisco and Carroll Izard of the University of Delaware. Their studies have shown that different peoples use surprisingly similar facial muscle movements to convey expression. Specifically, Ekman’s studies have revealed six expressions that are constant for all of humanity: anger, disgust, surprise, happiness, sadness, and fear. In other words, one person’s angry or surprised expression looks much like everyone else’s. "There is no culture, for instance, in which a furrowed brow means happiness," Essa notes. "Anyone could walk into the Amazon jungle with a broad smile. The people there wouldn’t be able to talk with him, but they would know he was happy."
Hoping to use this work as a basis for computers that can recognize expressions, Pentland and Essa are processing video footage of faces and generating motion energy maps. These are pictograms that use bright blotches of color to show how the mouth moves, the eyes squint, and the cheeks scrunch. By matching these motion patterns with one of the six common expression maps, Essa can already get his computer to determine whether someone is happy, sad, angry, or surprised, and so on.
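A crude stand-in for a motion energy map can be computed by summing the frame-to-frame changes across a video clip: the bright values in the result mark the regions of the face that moved most. This is a simplification of Essa's method, meant only to show the idea.

```python
import numpy as np

def motion_energy(frames):
    """Sum of absolute frame-to-frame changes in a video clip.

    `frames` is a stack of grayscale frames (time, height, width).
    Large values in the returned map mark where the face moved --
    a toy version of the motion energy maps described above.
    """
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)
```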
Essa is not content with the six universal expressions, however, and is trying to expand the range of emotions the system can identify. "Boredom is a tough one," he says. "To detect it, you need to know the context of what the person is doing." Confusion is even more difficult because it can vary even among different people in the same culture. Essa is also working on distinguishing between a fake smile, in which just the lips move, and a real smile, which involves a softening of the eyes. This difference explains why good actors must actually feel the emotion they are trying to express. If you can’t fool a computer, you can’t fool an audience.
Pentland envisions many applications for Essa’s work. "Let’s have the computer read our faces," he says. If a kid using an educational CD-ROM appears bored or confused, the software should respond appropriately. "It could jack up the entertainment quotient," he says. "Or it could slow down and backtrack."
Another potential application involves teleconferencing. One of the major barriers to sending live video back and forth over long-distance phone lines is that video requires more bandwidth, or transmission capacity, than most of today’s phone lines have. If a set of your mother’s most common facial expressions were stored locally on your videophone, the phone could evoke a certain expression and display it every time your mother made it. When the videophone on your mother’s end recognized that she was totally disgusted, it would have to transmit only a brief code to indicate the emotion.
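The bandwidth argument is easy to put in numbers. Assuming an uncompressed 128-by-128 grayscale frame and a hypothetical one-byte expression code (both figures are illustrative assumptions, not specifications of any real videophone):

```python
# A toy illustration of the videophone idea: transmit a one-byte
# expression code instead of a full video frame. The six universal
# expressions come from Ekman's studies; the byte counts are
# assumptions for illustration only.
EXPRESSIONS = ["anger", "disgust", "surprise", "happiness", "sadness", "fear"]

frame_bytes = 128 * 128  # one uncompressed 128 x 128 grayscale frame
code_bytes = 1           # an index into the locally stored expressions

savings = frame_bytes / code_bytes  # how much less data crosses the wire
```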
The marketplace may find even more applications for Pentland’s face-recognition technology. Last year, for example, British Telecommunications--which provides partial funding for Pentland’s work--began developing a security system based on Photobook. It would use video cameras to scan crowds of shoppers, and Pentland’s software to match those faces against a database of mug shots of criminals who have repeatedly been caught shoplifting. If a match occurred, the system would alert security guards.
The U.S. Army, too, has been funding Pentland’s research, with applications in mind not only for itself but for other branches of the military and for federal law-enforcement authorities. One is a simple secure-entry system. Military personnel would have their faces stored in the system. Then, when someone was trying to gain entry to, say, a nuclear submarine, the face-recognition software could check to see if that person was authorized to do so. If not, access would be denied. According to Pentland, preliminary tests by the Army found the technology to be 98 percent accurate, meaning that on average it errs on 2 faces out of every 100. Pentland expects to improve the reliability of his software so that such a system would be more secure than encoded ID cards. "Unlike a card," Pentland points out, "a face cannot be lost or stolen." Photobook can even penetrate a heavy disguise, Pentland claims, by homing in on the bone structure around the eyes, which, he says, is the most enduring and difficult-to-change part of the human face.
Recently the White House questioned Pentland about using face recognition to thwart terrorists and drug runners. "We know who the terrorists are," Pentland says. "There is a small set of bad guys." As was alleged in the Oklahoma bombing case, the suspects typically scope out a building or other target many times before doing a job. Face-recognition cameras around public locations could check whether certain known suspects were showing up frequently. In addition, cameras at customs checkpoints could spot the faces of known drug dealers who typically use disguises, fake passports, and phony visas.
By the end of the decade, Pentland predicts, face-recognition technology will be everywhere. Already several states are testing early versions of the software, which have been licensed to several commercial companies. The Massachusetts Department of Motor Vehicles plans to test it on drivers who claim they’ve lost their licenses and want a replacement. The goal would be to see if those drivers really are who they claim to be, thus foiling those trying to obtain phony identification. Fingerprints, of course, could also verify identities, but fingerprinting takes so much time that it is impractical to use on everyone. Since every driver’s photo is already on file, scanning a face and matching it against a large set of faceprints would be easier.
Such uses inevitably raise the question of privacy: Will cameras that recognize you eventually track you down and feed your itinerary into government and corporate databases? Perhaps. But Pentland prefers to accentuate the positive aspects of his technology. Already more and more video cameras are appearing in public places, he points out, and people accept the trade-off. At automatic teller machines, for instance, the presence of video cameras makes people feel more secure. It may even deter crime. Pentland says those systems should all have face-recognition software built into them so that legitimate customers can be recognized. In this respect, your face could be used instead of, or in addition to, your four- or five-letter password.
Used in the proper way, face recognition might foster what Pentland calls a small-town environment. "We would like to make the world seem like a small town," he says, "where everyone might know your business but where everything is done on a friendly basis. In a small town, you know the bully. You watch out for him. But for the good people, doors should open and services should be available to them just because they show their faces." "Whether that’s wonderful or not," he adds, "depends on whether you believe small towns are good or bad." Privacy is not so much a question of technology, he says, as of preventing authorities from giving out information about where you go and what you do. As long as different companies and government agencies don’t provide such information to a central source, people shouldn’t have a problem with it.
In the future, Pentland hopes, all our machines, from PCs to cars, will get to know who we are and what we prefer. That will let the machines handle mundane tasks and free people to live and work more comfortably and intelligently. And though Pentland acknowledges the fear some people have that artificially smart computers will wind up actually running things and controlling our lives, he’s pretty sure we are on the right road.
At the heart of Pentland’s techno-optimism is a distinction he believes is critical. His overarching goal is to give machines the same perceptual skills as people, through the complementary technologies of face recognition and expression analysis. This might sound like the research on artificial intelligence, that elusive quest to bestow thinking skills on computer chips, but Pentland vigorously shuns the AI label. "AI is the study of how to replace people with machines," he says. "I want to make people more powerful."
The next time you gaze in rapture at a supermodel, just remember that though that face may be seen by the world as breathtakingly beautiful, it is also likely to be unusually average.
Or geometrically normalized, as Nancy Etcoff would say. Etcoff, a psychologist from MIT who’s now on staff at Massachusetts General Hospital, has long been interested in notions of facial beauty. These days she regularly visits Sandy Pentland’s shop at the MIT Media Lab to study the averaged faces that his software yields. She has found that they bear a striking resemblance to those of supermodels such as Kate Moss, one of the most celebrated faces of the 1990s. Moss, in Etcoff’s opinion, looks like an androgynous 18-year-old with few distinguishing features.
Moreover, Etcoff’s studies suggest that the more faces you combine, the more attractive the result seems to become. Etcoff believes her findings refute assertions such as those made by Naomi Wolf in her best-selling book The Beauty Myth that attractiveness is subjective and that feminine beauty is a construct of Madison Avenue.
"There really is something recognizable and exciting about a beautiful woman," Etcoff says. "Madison Avenue simply exploits the preferences that we are born with. There may be in our brain some sort of averageness computer." Composite faces and beautiful ones seem to share a tendency to be symmetrical, with no discernible differences between the right and left sides. Some studies suggest that animals with symmetrical features are more likely to find mates, but the issue is unresolved.
Of course, beautiful faces are often anything but average. Extreme traits, such as the large eyes of Bette Davis, the robust jawline of Humphrey Bogart, or the mole near Cindy Crawford’s mouth, can be deemed ultra-attractive. "It’s the peacock’s tail idea," Etcoff says. Some people may be viewed as so attractive that they’re able to afford one or more eccentric or flamboyant traits. By contrast, if you had a face that had the largest variance from the average, you might look like the Joker. On the other hand, if you conscientiously try to achieve the smallest variance from average, you risk mimicking Michael Jackson, who seems to be cosmetically altering his face so that it approaches a universal ideal. -- E.I.S.