Technology

Emerging Technology: Page Rank

How the mighty Internet search engine's rankings of results can be manipulated


Illustration by Leo Espinosa

Does Google have a favorite in the upcoming presidential election? Not Google’s CEO or its charismatic young founders, not the shareholders of the company—Google, the search engine itself. Does it favor one candidate over the other?

This is not a preposterous question. Just type the words miserable failure as a search query, and see the top result. The document that Google thinks is most relevant is a biography of the 43rd president of the United States, George W. Bush.

Before some of you write angry letters to the editor about the liberal bias infecting search engines, consider this: The results for “miserable failure” were not directly shaped by anyone at Google. Instead, Web users manipulated the rankings by altering their own Web pages. The practice is called Googlebombing.

This kind of group intervention is possible because of the ingenious way that Google calculates its search results, a much-scrutinized system called PageRank. The exact PageRank search recipe is the digital-age version of the original Coca-Cola formula—known only to the wizards inside Google’s Silicon Valley headquarters. But it involves two known variables.

PageRank was designed to deal with the epidemic of informational overload that accompanied the Web’s explosive growth in the mid-1990s. If you search for a relatively common word like paleontology, you’ll find hundreds of thousands of pages that contain the word. A search engine is not much use unless it finds the most valuable pages and returns them as the top results. Theoretically, Google could have hired thousands of humans to read all the pages in its index (there are around 4 billion), but that approach would be prohibitively expensive. Instead, the company decided to outsource the problem: It tapped the free labor of all the people in the world who were already creating Web pages.

Most Web pages are littered with hypertext links to other Web sites, since linking is one of the Web’s fundamental innovations. Google’s software recognizes those links as votes. Every time someone somewhere on the Web decides to link to a page, Google tracks that link and files it away as an endorsement of the page’s content. The PageRank system tallies all the votes for every page it has found on the Web. Pages that have attracted more links become more prominent in Google’s rankings, while pages with few links get pushed down the list. This enables Google to separate out the signal from the noise online. When you query “paleontology” you get more than half a million results, but the top 10 are the 10 pages that have received the most links. Because people on average are more likely to link to sites that they find valuable, most of the time quality rises to the top.
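The vote counting described above can be sketched in a few lines of Python. This is only an illustration of the idea of treating links as endorsements, not Google's actual recipe, which the company keeps secret. The function name `rank_by_inbound_links` and the page names are invented for the example, and the sketch assumes every link counts equally and that repeated links from the same page count once.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by how many distinct pages link to them.

    links: iterable of (source_page, target_page) pairs.
    Assumption for this sketch: every link is one equal vote.
    """
    votes = Counter()
    seen = set()
    for source, target in links:
        # Ignore self-links and repeated links from the same source.
        if source == target or (source, target) in seen:
            continue
        seen.add((source, target))
        votes[target] += 1
    # Most-linked pages first; ties broken alphabetically.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


links = [
    ("a", "museum"), ("b", "museum"), ("c", "museum"),
    ("a", "blog"), ("b", "blog"),
    ("a", "museum"),   # duplicate vote, ignored
    ("blog", "blog"),  # self-link, ignored
]
print(rank_by_inbound_links(links))  # [('museum', 3), ('blog', 2)]
```

A flat tally like this is the simplest possible reading of "links as votes." The formula Google's founders published goes further and gives more weight to a link from a page that is itself heavily linked.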

The second ingredient necessary for Googlebombing is a lesser-known quirk of the PageRank algorithm. Let’s say you create a link from your page to a site devoted to the life of baseball great Willie Mays. Google tracks that link as a vote for the quality of the Willie Mays site, but it also pays attention to the words you use to describe the site. If you link to the Mays site with the phrase “Barry Bonds’s godfather,” Google learns to associate the phrase with that site, even if it doesn’t mention Barry Bonds. If enough people on the Web publish the same link with the same phrase, eventually Google will serve up the Mays site as the top result for the search query “Barry Bonds’s godfather.”

In this example, Google has actually learned something about Willie Mays that wasn’t included in the primary site, since Willie Mays is in fact Barry Bonds’s godfather. The learning comes out of watching patterns of linking activity across the entire Web and looking for commonalities and trends in all that data. That knack for pattern recognition is central to Google’s intelligence, but it can be exploited. Encourage enough people to link to a given page with a specific phrase and you can manipulate Google’s results. This is how Google came to think of George W. Bush as a miserable failure. A computer programmer named George Johnston linked to the president’s biography with the “miserable failure” phrase and encouraged other like-minded Netizens to put up similar links from their sites.

If this makes you think that the mighty oracle of Google can be easily manipulated, keep in mind that Googlebombs are vastly more effective with unusual search phrases, like “miserable failure.” (The original Googlebomb was a prank among Web designers, pointing to one designer’s home page with the phrase “talentless hack.”) You won’t be able to redirect the world to your home page by Googlebombing the phrase “Britney Spears,” because there are millions of existing links referencing Britney Spears; your Googlebomb would be like a tiny sparkler next to that massive arsenal.
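To see why an unusual phrase is so much easier to capture than a popular one, here is a rough Python sketch of the link-text quirk. It assumes, purely for illustration, that the engine keeps a tally of which page each link phrase points to and returns the page with the most links for that phrase. The names `build_anchor_index` and `top_result` and all the page names are hypothetical, and the real system surely weighs many other signals.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Map each link phrase to a tally of the pages it points at.

    links: iterable of (source_page, anchor_text, target_page).
    """
    index = defaultdict(Counter)
    for _source, anchor_text, target in links:
        phrase = anchor_text.strip().lower()
        index[phrase][target] += 1
    return index


def top_result(index, query):
    """Return the page most often linked with this phrase, or None."""
    targets = index.get(query.strip().lower())
    if not targets:
        return None
    # Highest tally wins; ties broken alphabetically.
    return min(targets, key=lambda page: (-targets[page], page))


# Fifty pranksters face a thousand existing links for a popular phrase,
# but no competition at all for an obscure one.
links = (
    [(f"fan{i}", "Britney Spears", "official-site") for i in range(1000)]
    + [(f"prankster{i}", "Britney Spears", "my-home-page") for i in range(50)]
    + [(f"prankster{i}", "talentless hack", "designer-home-page") for i in range(50)]
)
index = build_anchor_index(links)
print(top_result(index, "Britney Spears"))   # official-site
print(top_result(index, "talentless hack"))  # designer-home-page
```

The same fifty links that vanish beneath the popular phrase are enough to own the obscure one, because nobody else was using it.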

Googlebombing began as a digital in-joke, but it has already become performance art. Brooklyn-based artist and comic Ze Frank has integrated Googlebombing into his live performances. “I’d been thinking about how a magic trick might work online, and I decided to see if I could Googlebomb the phrase ‘what was I going to say next,’ ” he says. Frank spent weeks encouraging visitors to his Web site to link to a special page he’d created with that unusual phrase. Then, during a performance at the Technology, Entertainment, and Design Conference in Monterey, California, earlier this year, he interrupted his presentation to call up a live Web connection to Google on a giant screen visible to the entire crowd. Feigning uncertainty about where his talk was headed, Frank announced that he would ask Google what he should say next. He typed in the query “what was I going to say next,” and like a ventriloquist’s dummy, Google delivered up as its top result the first sentence of Frank’s next presentation slide.

Staging artificial results in the Google index is something like launching an experimental theater project in a city square: In the midst of all this real life and real information, something staged and artificial appears. But as the history of graffiti art has shown, there’s a fine line between sophisticated public art and irritating public nuisance. That’s one of the concerns expressed by Rael Dornfest, coauthor of the book Google Hacks: 100 Industrial-Strength Tips & Tools. Dornfest and his colleagues deliberately left Googlebombing out of their book. “In public space, there are social norms; so should there be around Google,” he says. “While, yes, they’re ‘just a company,’ they’re also a commons of sorts to be tended by the Web as a community. Manipulations of its index only serve to confabulate results for everyone else. I say all that knowing full well that much of this is all in the spirit of fun. If I search for ‘talentless hack,’ chances are I am not conducting a search with the hope of real results. But this could just as easily be turned into something more real. Think about linking the phrase ‘Red Cross hunger relief fund’ to an interloper with a PayPal button linked to his private bank account.”

Google’s director of search quality, Peter Norvig, says that traditional Googlebombs aren’t a major concern: “The point of Googlebombs, in a way, is that they don’t really matter—with ‘miserable failure,’ it worked because there was nobody on the Web advertising themselves as a miserable failure, and in our query stream, nobody was asking for ‘miserable failure’ either. The only reason they ask for it now is because they got an e-mail saying, ‘Look at this.’ It was a kind of ecological niche that no one wanted to occupy, and so it’s easy for anything to crawl in there.”

Norvig and his team are more concerned about attempts to manipulate Google’s results for profit: “That’s a more serious matter. There are hotly competed queries—say ‘digital cameras’—where we want to bring people to authoritative sites. So there we have to check because people do things like register a hundred different sites and have them all link to each other. If you’re charitable, you call it ‘search engine optimization’; if you’re less charitable, you call it ‘search engine spam.’ We try to recognize when that happens. What we’ll do is say, ‘Ah, here are a bunch of sites that are all interconnected,’ and we’ll count that as one link, not a hundred.”
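Norvig does not say how Google recognizes a cluster of interconnected sites, so the following Python sketch rests on a loud assumption: two sites belong to the same cluster whenever they link to each other. Each cluster then gets a single vote for the page it promotes. The function name `count_independent_votes` and the site names are made up for the example.

```python
def count_independent_votes(links, target):
    """Count votes for target, treating each cluster of mutually
    linked sites as one voter.

    links: iterable of (source_site, destination_site) pairs.
    Assumption: sites that link to each other form one cluster.
    """
    link_set = set(links)
    parent = {}

    def find(site):
        # Union-find lookup with path halving.
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    def union(a, b):
        parent[find(a)] = find(b)

    for source, dest in link_set:
        find(source)
        find(dest)
        if (dest, source) in link_set:  # mutual link: same cluster
            union(source, dest)

    voters = {find(source) for source, dest in link_set if dest == target}
    voters.discard(find(target))  # a page's own cluster cannot vote for it
    return len(voters)


# A hundred farm sites link to their neighbors in both directions,
# and all of them link to the shop. Two reviewers link independently.
farm = [f"farm{i}" for i in range(100)]
links = []
for i, site in enumerate(farm):
    neighbor = farm[(i + 1) % len(farm)]
    links += [(site, neighbor), (neighbor, site), (site, "camera-shop")]
links += [("review-a", "camera-shop"), ("review-b", "camera-shop")]

raw_votes = sum(1 for _source, dest in set(links) if dest == "camera-shop")
print(raw_votes)                                      # 102
print(count_independent_votes(links, "camera-shop"))  # 3
```

Under this assumption the hundred-site farm collapses into one voter, so the shop ends up with three endorsements rather than 102.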

In the end, what the people who run Google hope to protect is the confidence among its users: When you get an unusual result, they don’t want you wondering to yourself, “Is this just another Googlebomb?” Consider the top result that Google delivered a few weeks ago for the word Jew: a hateful anti-Semitic site called Jewwatch.com. Is this an accurate reflection of the general pattern of links across the Web, or is the prominence of Jewwatch.com the result of a targeted Googlebomb by a small number of linkers?

To many people, the Googlebomb “miserable failure” will be offensive. But is it different from real-world attempts to manipulate the public’s political views? You can stand in the town square with a sign calling the president a miserable failure, or you can create a network of links to do the same thing online. If TV stations accept advertising approved by George W. Bush that suggests something as absurd as John Kerry not caring about the lives of soldiers in Iraq, why shouldn’t we tolerate public-opinion linking? After all, if you don’t like the results for “miserable failure,” you can always create a new set of links and try to overthrow the top result.

Of course, that battle has already begun. At press time, the second and third top Google search results for “miserable failure” linked to biography pages for Jimmy Carter and Michael Moore.

Google provides nearly instantaneous answers to more than 200 million search queries on an average day. During peak hours, the service responds to roughly 2,000 queries a second, relying on a network of more than 10,000 computers running the open-source Linux operating system.

Copyright © 2023 Kalmbach Media Co.