I always had this weird idea that I had to know lots of different things, says a faintly uncomfortable Marty Sereno. He shifts in his chair, turning his back to a Post-it-fringed computer screen and a table so deep in open journals that it looks like the cross-bedded planes of a geologic formation. Sereno is sitting in his office at the University of California at San Diego, where drawn miniblinds shut out the distracting southern California sunshine. When you’re trying to do interdisciplinary stuff, he continues, it’s no good putting two specialists together in a room, because they can’t talk to each other. You’ve got to pretend you’re in the other field; you have to go and live with the natives. It has to be all in one head.
To understate the case considerably, the 40-year-old Sereno has a lot of things in one head. There is his primary research interest, of course, which is the neurological architecture of vision in primates and rodents. Then there are the new techniques in brain imaging that he has helped pioneer, and the computer programs he and his collaborators have conceived to display the results. There is, as well, a wealth of information on subjects as various as linguistics, communication systems in animals, philosophy, and modern jazz (he’s an avid guitarist).
And then there is his unconventional theory about brain evolution and the origins of human language, which has been simmering on a back burner of his mind since graduate school. The theory appears flamboyantly interdisciplinary and complex. But Sereno merely shrugs at that characterization. Some things just have a lot of parts, he says. Not an impossible number, but enough that ten won’t do. Sometimes it just has to be a hundred.
Reduced to almost haiku proportions, Sereno’s idea is this: language ability arose in the human brain not through the development of a new, uniquely human language organ, as most accounts have it, but by a relatively minor rewiring of a neural system that was already there. And that neural wiring belonged largely to the visual system, a part of the brain that recent research--including Sereno’s own--has shown to be almost unimaginably complex.
These are statements slightly less heretical than those an earlier Martin nailed to the door of Wittenberg Castle Church, but not by much. Language is often regarded as a cognitive boundary, one of the last things that separate us from our primate cousins. But if Sereno is right, and language rode into our brains on the coattails, so to speak, of vision, we humans are once again a little less special than we thought.
For the moment, the evidence Sereno produces to support his theory is largely circumstantial; he cites chiefly the path taken by the brain in the course of its evolution. Roughly 500 million years ago, when the first vertebrates appeared, a small lump at the rear of the brain stem expanded to become the cerebellum. At the same time, a pair of small, primitive structures surrounding the brain stem and cerebellum expanded into the two hemispheres of the cerebrum. Finally, between 200 and 300 million years ago, the six-layered cerebral cortex appeared in mammals as a blanket of nerve cells covering the cerebrum. In humans, this eighth-of-an-inch-thick layer is folded in intricate wrinkles, and the rear two-thirds of it is divided into areas corresponding to the senses, or what neuroscientists call modalities: hearing, touch, vision, and so on.
As Sereno points out, we know that in the other, nonhuman higher primates, the visual processing system takes up half the cortex. It is not known precisely how much of the human brain is given over to vision--the kinds of invasive experiments traditionally needed to determine that can only be done in animals, not people. But Sereno thinks it likely that new techniques in brain scanning will reveal that we, too, have that much visual cortex in our brains. And he thinks that natural selection could well have jury-rigged those preexisting structures to perform some new functions. What could be more logical, he asks, than running the new train of language on the old tracks of vision? We ought to pay more attention to the things animals do that might have been built upon for language, he says. Look--the system is definitely souped up in us. People can do a lot more stuff than monkeys can. But the basic hardware is not that different.
According to the prevailing notion of how the human brain is organized, language is centered in a couple of areas on the left side of the brain that are named for the nineteenth-century scientists who discovered them. One, called Broca’s area, sits just below the temple; it is involved in language production. The other, Wernicke’s area, is just behind the ear and seems to control language comprehension.
Broca’s and Wernicke’s areas are certainly involved with language, says Sereno; his quarrel is with the idea that language is confined there. In his view, localizing a higher-order function such as language to two quarter-size patches of cortex smacks of a prescientific mind-set. It’s sort of like a holdover from the phrenologists, he says, referring to proponents of the eighteenth-century notion that such individual traits as musical ability or a tendency toward violence could be detected from bumps on the skull. Sereno thinks instead that language centers might be scattered all over the brain, largely in the mosaic of cortical areas devoted to visual processing but also in parts devoted to motor coordination and auditory perception.
Most scenarios of language evolution tend to sidestep the knotty issue of what might have gone on in the human brain to make language possible. Since the researchers who are interested in language evolution tend to be linguists and anthropologists, not neuroscientists, they focus on questions such as when language evolved or what its early phases might have been like. But Sereno brings a much broader perspective to the table.
The eclecticism that has flowered into Sereno’s language theory has deep roots. His mother is an artist and art teacher. His father is a former civil engineer whose heart always really belonged to psychology and philosophy; he quit his job to become a mail carrier when Sereno was a teenager. The two gave Martin, their oldest child, the middle name Irenaeus, after a theologian of the second century.
Martin isn’t the only family member possessed by a thirst to know things. His brother, Paul, is a paleontologist specializing in dinosaurs; two of his four sisters are psycholinguists, the others neuroscientists. Thanksgiving dinners at our house are pretty weird, he says. When you get together with your siblings, you tend to revert to childhood anyway, but we have all this new stuff to fight about!
Sereno has been casting his academic net wide for years. After majoring in geology in college, and uncertain what interest to pursue, he fired off applications to graduate programs in anthropology, geology, linguistics, philosophy, and paleontology before settling on an interdisciplinary program at the University of Chicago. The program required students to complete a master’s degree in a hands-on scientific discipline as a grounding for more theoretical doctoral research. Sereno chose neurobiology and began a project that involved mapping the brains of turtles. (A reminder of those days, a pancake-size turtle named Spanky, lives in an aquarium in the Serenos’ kitchen.)
But he continued studying linguistics and philosophy along with brain biology, and late one night in 1980 those disparate strands began to twist together in the idea that the visual system might be a pathway to language. At the time, for one paper, he’d been reviewing the evidence that mammals have several separate brain areas devoted to vision; simultaneously he was writing another paper on the grammar of sign language. In his spare time, meanwhile, he was doodling mentally about the concept of codes: sign language as one kind of code, spoken language as another, DNA as a code that tells cells what proteins to make.
It all jarred me loose into thinking about language in a more general way, Sereno says. He began to see a similarity between what the mysterious language system in the brain was doing as it tacked together the meaning extracted from individual words in a series, and what the visual system was doing as it put together the information gathered from a series of glances. If the mental tasks were so similar, why couldn’t the brain be using some of the same wiring?
When we look at a scene, Sereno explains, we feel as though we’re taking it all in at once, but what actually happens is quite different: our eyes scan the scene in a long staccato series of rapid jumps, called saccades, which occur at a rate of several per second. Each saccade ends in a brief fixation that projects a new part of the external scene onto the retina, the network of light-sensitive cells at the back of the eyeball. The optic nerve carries that image (after a quick stop-off in a part of the brain called the dorsal lateral geniculate nucleus) to the primary visual area, in the back of the cortex.
The image received by the primary visual area is a sort of distorted map of differences in the intensity of the light that falls on the retina. Some researchers have called this map the primal sketch. The sketch is then shuttled forward in the brain to a number of higher-order visual areas, each of which specializes in analyzing one of its aspects--color, say, or motion or form--although there appears to be a fair degree of overlap in their functions. In monkeys there are between 20 and 30 visual areas, and Sereno thinks there are probably about that many in people too. These visual areas of the brain knit the threads of the scene together, reconstructing it as a collection of objects with volume that occupy space.
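To make the primal-sketch idea concrete, here is a toy computation of my own--an illustration of the general idea, not anything drawn from Sereno's work, and assuming only that numpy and scipy are available. A center-surround filter turns an image into a map of local intensity differences: strong where the light changes, near zero where it is uniform.

```python
# Toy illustration of an intensity-difference map ("primal sketch"), assuming
# numpy and scipy. Not Sereno's analysis -- just the general idea.
import numpy as np
from scipy.ndimage import gaussian_laplace

# A synthetic "scene": a bright square on a dark background.
scene = np.zeros((64, 64))
scene[20:44, 20:44] = 1.0

# A center-surround (Laplacian-of-Gaussian) filter responds where intensity
# changes -- at the square's edges -- and stays near zero over uniform regions.
sketch = gaussian_laplace(scene, sigma=2.0)

print("response at the square's edge:   %.3f" % abs(sketch[20, 32]))
print("response in the square's middle: %.3f" % abs(sketch[32, 32]))
```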
The main job of the visual part of the brain is looking around and updating some kind of representation of the world, Sereno explains, for the purpose of getting around in it. The job of the language part of the brain is very similar, he says, except for the obvious difference that language can deal with what isn’t there--the past and the future, the imagined--as well as what is there, in the present.
Individual words, Sereno thinks, are like individual saccades, each revealing only part of the fictive scene. The brain of a speaker produces a string of these words in a specific order governed by the rules of syntax; the brain of a listener collects the individual words in a short-term memory storehouse, where it attempts to fit them together until enough have accumulated to create a mental picture. Without the stuff around it, Sereno says, you can’t get much information out of a single glance. Understanding a scene requires putting together a piece of information you get ten glances down the road with one you got ten glances ago. That’s very much like language, where you have a sequence of words, and then you refer with a pronoun to something that happened earlier in the sentence.
Suppose you said, ‘John went to the store. Then he went home.’ A person listening to you would assume that the pronoun he, the guy who went home, is the same person you were referring to initially, the one who went to the store. You have the same problem in vision all the time. For example, when I glance around the room, I look at the door, I look at Bill the monkey, I look at my bicycle over there, and I look back at Bill the monkey. That’s a very similar situation: you have to figure out that’s the same Bill, the same monkey you were looking at before. Has Bill changed? Has Bill’s relationship to other things in the scene changed?
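As a rough illustration of that buffering idea--a toy sketch of my own, not a model Sereno has published--the snippet below keeps recent referents in a short-term store and links a pronoun back to the most recent one, much the way a new glance gets linked back to the same Bill.

```python
# Toy sketch of the short-term "storehouse" idea -- an illustration only.
# Recent referents are buffered; a pronoun is resolved to the latest one.
def resolve_pronouns(words, known_names):
    memory = []                       # referents seen so far, most recent last
    resolved = []
    for word in words:
        if word in known_names:
            memory.append(word)       # a name enters the short-term store
            resolved.append(word)
        elif word.lower() == "he" and memory:
            resolved.append(memory[-1])  # pronoun points back to the latest referent
        else:
            resolved.append(word)
    return resolved

sentence = "John went to the store . Then he went home .".split()
print(" ".join(resolve_pronouns(sentence, {"John"})))
# prints: John went to the store . Then John went home .
```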
Sereno believes both kinds of decoding take place in the visual areas of the cortex. When someone’s speaking to you, he says, words generate a stream of patterns in your auditory cortex. Somehow those patterns, which are representations of speech sounds, are recognized in groups that stand for words. And somehow they travel to your visual cortex and activate a little glop--a glancelike portion--of higher visual cortex activity over there. They simulate what would happen if you actually saw something. Of course, that’s just the crude version of the theory: it only explains how you talk about things you can actually see. But since language works through metaphors, as many scholars argue, you can use concrete images to talk about abstract concepts as well.
The anatomy of an animal’s brain mirrors the way the animal gets information about its surroundings. A llama makes its living grazing, so the area of llama brain devoted to lip sensation is bigger than the combined areas devoted to sensations from all the rest of its body. Bats avoid obstacles by bouncing sound off their surroundings and listening to the echoes; accordingly, they have a huge auditory cortex.
Stored in Sereno’s lab in the Cognitive Sciences Building are drawer after drawer of thin-sectioned animal brains, from rats to ground squirrels to various species of monkey. Each translucent slice of brain is affixed to a slide and carefully labeled; each of them, Sereno says, illustrates one more event in the expansion of the visual cortex.
He began accumulating slides as a student in Chicago, and he has continued doing so ever since. He is especially interested in defining the boundaries of primate visual areas. Though researchers claimed there were as many as 25, the borders of all but a few were too subtle to be detected without electrophysiological mapping. In a way the work was tedious, he says, like being an anatomist in 1600 and discovering where the bones are.
The experiments themselves are painstaking. First the animal is anesthetized, and a small hole is opened in its skull. A tiny electrode, finer than a hair, is implanted in the animal’s brain, and its eyes are trained on a half-sphere of clear plastic marked with a grid. Then a light source is passed behind the plastic; when the implanted electrode picks up the signal of a neuron firing, the researchers know that the site of the electrode marks the portion of visual cortex that processes that precise point in the animal’s visual field. Then the electrode is moved and the process is repeated. A single experiment routinely establishes up to 600 such points and may run up to 90 hours.
To the untutored eye, each of the thousands of slides Sereno and his students have generated over the years is an amorphous gray blob, devoid of anatomy. To Sereno, though, every one is rich with information. See the whiskers? he says, pointing at an area of a thin section with the tip of a pencil. Sure enough, a couple of dozen tiny white dots cluster in the sensory area of the rat brain, each marking the spot in the cortex where the sensation from a single whisker is processed.
Sereno puts the slide of a ground squirrel, a rodent not much bigger than a rat, on the light table. Because the squirrel uses vision instead of touch to find its way around, its whisker areas are actually somewhat smaller than the rat’s. Its primary visual area, however, is four times bigger. And the higher-level visual areas have expanded even more. Look--this is a giganto visual area, Sereno says, pointing to a place in the squirrel brain called TP. It’s eight times bigger than the equivalent rat area. Overall, the squirrel’s brain is two to three times the size of the rat’s, mostly because it has so much more volume devoted to vision. This is my model for how our brains expanded, Sereno explains.
It’s not, however, the model that has informed most notions about the brain since the Middle Ages. Medieval scholars figured there had to be some central place in the human brain where the straw of raw sensory input got turned into the gold of thought; where the visual image of a steeple and the sound of a bell combined to create the idea of church. These philosophers called the hypothetical area the common sensorium, from which we get our term common sense. The idea that this common sensorium existed, and was the sole property of humans, persisted for centuries. But as nineteenth-century and early-twentieth-century researchers began to map the brains of animals, from rats to higher primates, they found no such area. What they did find was that much of the cortex was committed to so-called lower functions--sensory input and motor function.
Nevertheless, the idea of the common sensorium remained influential in the budding science of neuropsychology; researchers were convinced that human brains had to be different. After all, monkeys are smarter than lemurs, apes are smarter than monkeys, and we’re smarter than apes. Perhaps as one went up the evolutionary ladder, scientists thought, the area of the brain devoted to combining and analyzing raw sensory input--no longer called the common sensorium but the polymodal cortex--would get bigger and bigger. It seemed logical to think our smarts came from all that polymodal cortex just sitting there under our skulls, waiting to think.
Logical perhaps, but probably wrong. Cognition seems so unified, somehow, agrees Sereno. It makes sense that there should be a place in there where everything comes together and the mind works on it. But from what we see, it doesn’t look like the brain is wired that way. On the contrary, as the cortex has expanded, the increase has been in areas devoted to one modality or another. To me, Sereno says, this suggests that much of the computation done by the cortex--functions such as language and thought--is tied to one or another modality. He believes we use our visual areas as our primary means of processing language because they’re what we use to make sense of our surroundings. If you were a talking bat, you would process language with your auditory system. If you were a platypus, you’d use the cortical areas dedicated to your bill.
Until recently, there were virtually no data on exactly how the human cortex was parceled out, because noninvasive technology to investigate localization of brain function hadn’t been invented yet. Since most people are understandably reluctant to have electrodes poked into their brains in the interest of science, data had to come from patients who had suffered injuries in particular regions of the brain.
The last five years, however, have seen an explosion in brain-imaging technology, an explosion to which Sereno and his colleagues have made profound contributions. Discovering what parts of our brains we use for what kinds of thought requires imaging techniques that provide instantaneous, well-localized signals of brain activity. Until around 1990, the available technology--electroencephalography, or EEG, and magnetoencephalography, or MEG--was inadequate for the task. These techniques record the brain’s electrical activity (and, in MEG’s case, the magnetic fields that activity generates) as fast as it happens, but on their own they don’t reveal where the signals originate.
In the early nineties Sereno and then-graduate student Anders Dale figured out how to make a computer combine the data from MEG and EEG and determine where the signals were coming from. They still needed a surface on which to display their findings, so Dale wrote the first computer program that could automatically reconstruct a three-dimensional picture of the brain from a set of two-dimensional magnetic resonance images, or MRIs. As a bonus, this program can plump up the resulting image of the highly fissured brain like a raisin in a steamer; on the gently inflated cortex, the boundaries of each cortical area can be precisely traced. This technique turned out to be of enormous benefit to researchers trying to draw the boundaries of what Sereno calls states in the cortical country.
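The raisin-in-a-steamer effect can be suggested with a much-simplified sketch--an illustration under loose assumptions, not Dale's reconstruction program: repeatedly nudging each vertex of a folded surface toward the average of its neighbors smooths the wrinkles out while keeping the sheet intact.

```python
# Much-simplified sketch of surface "inflation" -- an illustration only.
# Each vertex of a folded strip is repeatedly nudged toward the average of
# its neighbors, so the wrinkles smooth out.
import numpy as np

n = 100
x = np.linspace(0.0, 2 * np.pi, n)
folded = np.stack([x, np.sin(5.0 * x)], axis=1)   # (position, height) vertices

def inflate(vertices, iterations=200, step=0.5):
    v = vertices.copy()
    for _ in range(iterations):
        target = v.copy()
        target[1:-1] = 0.5 * (v[:-2] + v[2:])     # average of the two neighbors
        v += step * (target - v)                  # move partway toward it
    return v

smoothed = inflate(folded)
print("wrinkle depth before: %.2f" % np.ptp(folded[:, 1]))
print("wrinkle depth after:  %.2f" % np.ptp(smoothed[:, 1]))
```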
Sereno and his collaborators have recently set out to map human visual areas using a noninvasive technique called functional magnetic resonance imaging, or fMRI. By using magnetic fields to measure changes in blood flow to the brain, fMRI can reveal which areas of the brain are working on particular tasks; it thus provides a localized picture of ongoing brain activity (though it doesn’t catch split-second changes, the way EEG and MEG can). At last, with the technology to record brain activity in real time, and a smooth, unfolded surface on which to display it, Sereno has all the tools at his disposal to prove, or fail to prove, his vision-language theory.
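In spirit, that kind of task mapping works something like the toy example below--a hypothetical sketch of my own, not the lab's actual pipeline: voxels whose signal rises and falls in step with the task's on-and-off timing are flagged as active, while unrelated voxels are not.

```python
# Hypothetical sketch of task-related fMRI analysis -- not the actual pipeline.
# A voxel that tracks the task's on/off blocks correlates strongly with the
# stimulus timing; an unrelated voxel does not.
import numpy as np

rng = np.random.default_rng(0)

# Block design: 20 s of task alternating with 20 s of rest, one scan every 2 s.
task = np.tile(np.r_[np.ones(10), np.zeros(10)], 4)   # 1 = stimulus on, 0 = off
n_scans = task.size

active_voxel = 2.0 * task + rng.normal(0.0, 1.0, n_scans)  # follows the task
silent_voxel = rng.normal(0.0, 1.0, n_scans)               # pure noise

def task_correlation(signal, design):
    return np.corrcoef(signal, design)[0, 1]

print("task-driven voxel: r = %.2f" % task_correlation(active_voxel, task))
print("unrelated voxel:   r = %.2f" % task_correlation(silent_voxel, task))
```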
Over the past few years Sereno and his wife, Claudia (a social worker with San Diego’s homeless), along with neurobiologist Roger Tootell of the Massachusetts General Hospital Nuclear Magnetic Resonance Center in Boston, some friends, colleagues, and a few paid volunteers, have entombed themselves for hours at a stretch in an fMRI scanner. While the subject lies motionless, Sereno or an assistant flashes images--sometimes patterns, sometimes words--on a screen a few inches away.
In its own way, this work is as hard on the researchers as the marathon neurophysiology experiments. The subjects, supine on a table, are slid into a huge metal tube nearly ten feet long and six feet in diameter. They clench a bite bar between their jaws to keep their heads steady and must focus their eyes intently, since even a very small eye motion muddies the results. They wear metal cages on their heads for mapping, and earplugs to block the 100-decibel clanking of the giant magnet. I think it’s sort of cozy, myself, says Sereno. It’s kind of a Zen thing. Of course, you have to remember not to drink too much coffee.
Through this scanner work, Sereno has discovered that linguistic tasks produce a high level of activity in areas of his subjects’ brains that, had those friends and colleagues been monkeys, would be higher visual areas. What’s more, the levels of activity are much higher in those areas when the subjects are shown meaningful sentences than when they’re shown random words. Other researchers, he adds, have found that stimulating those areas with electrodes inhibits speech production in much the same way that stimulating the classical language areas of Broca and Wernicke does.
As support for his theory, these results aren’t much yet, but they’re a start. I’d say the onus is on people to show clearly there’s some evidence of a new language area that just got stuck in there, says Sereno. The default position is to assume that the human brain is more or less like animal brains, but we use it in different ways.
Several years ago, Sereno recalls, he was giving a talk at Ohio State, a stronghold of ape cognition research. He broached one of his favorite analogies: the one between birdsong and human language. Some guy got up and said, ‘Are you trying to tell me you think birds have more sophisticated vocal learning than apes?’ And I said, ‘Well, yes. Absolutely.’
The main thing you notice about nonhuman primates and vocal learning is how bad they are at it, he explains. Songbirds are easily a thousand times better.
Two of the most important properties of language, Sereno points out, are syntax, which is roughly equivalent to structure, and semantics, which is roughly equivalent to meaning. The natural communication system of apes has semantics aplenty, he says, but no syntax; ape calls can be put together in any order without changing the meaning. Further, most nonhuman primate communication is limbic--emotionally determined--rather than learned. A monkey deaf from birth will make the full range of monkey calls; a deafened songbird chick, on the other hand, will not sing.
The learning ability of songbirds is what makes Sereno think that comparing birdsong with human speech might be productive. Chicks spend several months listening to mature birds singing before they begin to imitate. Then they produce subsong, meaningless sounds analogous to human baby babbling. A little later they start to produce fragments of songs. Finally they produce adult songs, sometimes hundreds of distinct ones, each comprising up to 20 song fragments, or syllables, concatenated.
In Sereno’s view, then, birds have all the prerequisites for language: they have the vocal machinery, they have the distinct sounds, they have the capacity to string sounds together. The only thing they lack is semantics. If birds had anything to say, he says, they could definitely say it. But evidently they don’t, because most birdsong is just a bizarrely elaborated way of saying ‘Get out of here!’ or ‘Mate with me!’
In the Sereno version of human language evolution, hominids might have developed the capacity to make noises like birdsong--sounds without much semantic content. They could have done it to attract mates, he thinks, an idea not without precedent. After all, some birdsong experts think the phenomenon of elaborate song arose because it signaled reproductive fitness, and that the bird with the capacity for the greatest amount of sustained song is the fittest mate.
If true, this scenario gets language evolution over its big hurdle: the time frame. Many anthropologists believe that humans began to speak relatively late in the last 100,000 years. For evidence they point to the sudden appearance of many new types of stone tools--a phenomenon Sereno calls ‘this incredible riffing in stone’--after nearly a million years of little change. Some take this to mean that humans were finally able to think symbolically, to remember complicated sequences, to communicate instructions.
But the question is, how? Making language in the human sense is not merely a complicated mental problem but a complicated anatomic problem as well. Take, for instance, the position of the larynx, the opening at the top of the windpipe, which connects the lungs with the throat. It is high up in the throat of nonhuman primates, so the root of the tongue can shield the opening during eating and drinking. Humans are born with a high larynx. By the time we start to talk, however, the larynx has descended to a low position. Some researchers think our low larynx is what enables us to speak intelligibly: the tongue has room to move around and form vowels without blocking the larynx. This placement has a major disadvantage, though, putting us at risk of choking every time we eat or drink.
Technical details like larynx placement have caused many scenarios for the evolution of language to come apart: Why would natural selection have exposed humans to anatomic risks unless language was developed enough to provide an important survival advantage? But how could language have developed enough to provide an important selective advantage if no one had the vocal equipment to speak it?
The birdsong analogy solves the problem. If the pressure of sexual selection, not the pressure to communicate, was driving the refinement of the vocal machinery, that process could have been going on throughout hominid evolution. Then, when Sereno’s proposed relatively minor neural rewiring of the visual system made language possible, the vocal mechanism could already have been up and running, waiting to pronounce the first meaningful words.
What if, Sereno suggests in an as yet unpublished paper, early humans had evolved an elaborate system of essentially phonetic vocalizations, a kind of ‘talking song’ with no component semantics? And he adds, in a burst of lyricism, Perhaps early hominid pairs duetted like bay wrens, virtually innocent of reference.
To date, Sereno’s idea about the connection between vision and language has circulated mostly among his immediate circle of colleagues. True, he’s floated the theory at a few meetings and seminars, and he’s published abbreviated treatments in journals specializing in knotty philosophical-scientific problems, such as the Journal of Theoretical Biology. But he’s never given the vision-into-language theory a full-length, full-fledged airing in a peer-reviewed journal. This is largely because the sort of hard evidence that might convince literal-minded colleagues--fMRI scans, for instance, rather than compelling analogies--is only beginning to come in. (But now that I’ve got tenure, Sereno says, laughing and rubbing his hands together in exaggerated anticipation, I’m really going to hit this hard.)
While few neuroscientists are prepared to go all the way to presemantic, prelapsarian duetting hominids with Sereno, the skepticism with which they tend to regard the idea is tempered by their respect for his more mainstream work. Linguists, on the other hand, are cutting Sereno a little less slack. But the skeptical reactions of colleagues seem to energize Sereno rather than discourage him. I know, he says cheerfully, I’m really extreme. A lot of times, progress happens when somebody is slightly nervous in another field. It’s scary when you start out--like when you go into a seminar room and you don’t even understand the words people are using. But I think you’re more flexible, more attentive, less hidebound when you’re afraid someone may call you a dilettante.