• Google's revival of a Usenet archive opens up a wealth of possibilities

    New Economy; Google's revival of a Usenet archive opens up a wealth of possibilities but also raises some privacy issues.
    By Susan Stellin, May 7, 2001, NY Times
    WITH some decent search skills, a bit of free time and a degree of skepticism about what one ultimately finds, a person can learn a lot by trolling through an archive of Internet bulletin boards that has once again become available online.

    In February, the Internet search service Google bought from the failed information site Deja.com an archive of more than 650 million messages posted on electronic bulletin boards, known as Usenet newsgroups, that dates to 1995.

    Started in 1979 as a tool for computer programmers, Usenet became more accessible in the mid-1990's when Deja developed software that made posting messages on the public system easier and began to archive the postings. Usenet blossomed into a vibrant,
    chaotic community of users communicating about almost every imaginable subject -- from arcane technical topics like data compression to more general interests like ham radio, parenthood or motorcycles.

    But with the e-commerce explosion, Deja shifted its focus toward online retailing and scaled back its public Usenet archive a year ago, before the company drowned in the market undertow earlier this year.

    Google bought Deja's Usenet archive in February, with the intention of restoring the full archive and improving the message posting and searching capabilities. The company refers to its new Usenet search service, available at groups.google.com,
    as an ''archive of human conversation.'' To tap in is to experience something similar to old-fashioned eavesdropping: long stretches of dull discussion interspersed with juicy tidbits of gossip and the occasional confession that makes the listener squirm.

    Among the mundane postings, noteworthy primarily for their name power: a 1996 message by Marc Andreessen, who at that time was busy building Netscape, seeking tips for making sure his year-old bulldog played nicely with another bulldog soon to join the
    family; a message, around the same time, apparently from MacKenzie Bezos, the wife of Amazon's founder, inquiring about a good dog obedience school in Seattle.

    Beyond the issues of canine behavior in the households of busy Internet executives, such postings raise questions about the implications of increasingly sophisticated search technology. To read about pet queries is innocuous enough. But a message posted
    by another Internet professional, responding to a posting that suggested he had left his previous position because of inappropriate behavior with female staff members, moves into different territory.

    As Google's chief executive and co-founder, Larry Page, noted, ''When you search Google, you're searching the equivalent of a stack of paper that's 110 miles high -- in half a second.'' And he acknowledged that such an awe-inspiring information tool had
    social and cultural repercussions: ''You have more access to information, but that means you have access to bad things, as well.''

    With respect to most of the Web pages that Google searches with its 8,000 server computers, it is fair to say that the people who publish those pages understand they are putting information into the public domain. But the history of Usenet is somewhat
    more complicated.

    In Usenet's original incarnation, messages posted to newsgroups disappeared within weeks, replaced by other comments on the same topic in what was perceived as an ongoing electronic conversation. When Deja.com, then called Deja News, began archiving
    messages in 1995 and making them searchable, there were protests by those who felt the bulletin boards were never intended to be permanent.

    In response, Deja made it possible for users to exclude their postings from its archive by typing the phrase ''X-No-archive: yes'' at the beginning of a message. With that change, and as Deja subsequently shifted its business model toward
    consumer-written product reviews and trimmed its public Usenet archive, the privacy issue faded to the background.
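
    For readers curious how an archive might honor such a request, the following is a minimal sketch in Python, not Deja's or Google's actual code. It assumes the phrase may appear either as a message header or as the first non-empty line of the message body, and the function name opts_out_of_archive is hypothetical.

```python
from email import message_from_string


def opts_out_of_archive(raw_message: str) -> bool:
    """Return True if a post asks to be left out of the archive.

    Assumption for illustration: the opt-out phrase "X-No-archive: yes"
    may appear as a header or as the first non-empty line of the body.
    """
    msg = message_from_string(raw_message)

    # Case 1: the phrase is a header (header names match case-insensitively).
    if msg.get("X-No-Archive", "").strip().lower() == "yes":
        return True

    # Case 2: the phrase is typed at the beginning of the message text.
    body = msg.get_payload()
    if isinstance(body, str):
        for line in body.splitlines():
            if line.strip():
                # Only the first non-empty line counts as "the beginning".
                normalized = line.strip().lower().replace(" ", "")
                return normalized == "x-no-archive:yes"
    return False


# Three hypothetical posts: header opt-out, in-body opt-out, no opt-out.
with_header = "From: a@example.com\nX-No-Archive: Yes\n\nhello\n"
in_body = "From: a@example.com\nSubject: t\n\nX-No-archive: yes\nplease skip me\n"
plain = "From: a@example.com\nSubject: t\n\nhello world\n"
```

    An archiver built this way would skip any post for which the check returns True and store the rest.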

    Google's acquisition of the archive, however, not to mention a mass-audience popularity that Deja never achieved, may revive some of those privacy concerns. Although Google may be preserving an important historical resource -- an effort that some have
    lauded -- the company is also making the record of this ''human conversation'' accessible in ways that its participants may not have been able to anticipate.

    Some of the messages on Usenet involve caustic personal attacks -- or equally vitriolic defenses against those attacks. Others display ill-conceived opinions, rash statements or embarrassing late-night rants. And all of it is now searchable by entering a
    key word, a date range or a name. Postings include a name and e-mail address; the text of messages can also be searched to see if someone is mentioned by name.

    ''Being able to search large amounts of information is something we know how to do better than anyone else,'' Mr. Page said. But Google is also trying to address the privacy implications of that feat, he said, by honoring the ''X-No-archive'' standard
    and allowing individuals to remove their old posts. And once Google introduces the ability to post new messages through its service, called Google Groups, the company plans to use an authentication process to make sure that people posting under a
    particular e-mail address are who they say they are.

    Theoretically, this authentication measure will help cut down on postings falsely attributed to someone else -- like a disparaging message about Intel ostensibly posted by the Internet pioneer Vinton G. Cerf, who said via e-mail that the posting ''is
    absolutely NOT from me.'' But Mr. Page said people might still find a way to fake a posting under someone else's name; users can also post messages under an alias, which he says he himself does, as long as they do so under a valid e-mail address.

    Surprisingly, for all the privacy issues that Google's acquisition of the Usenet archive might seem to raise, the subject has not been taken up by the usual online privacy-rights advocates. Deborah Pierce, a lawyer at the Electronic Frontier Foundation,
    said the reason was Usenet's open-discussion platform. ''If that's not a public forum,'' she said, ''I don't know what is.''

    But she said people might have qualms if they learned that ''what they thought is an ephemeral conversation is now going to be stored -- possibly forever.''

    As for the privacy implications of advances in search technology, Bruce Koball, a longtime organizer of the Computers, Freedom and Privacy conference, said the matter might draw more scrutiny in coming years, as the impact became clearer.

    ''People can be rightfully mortified when they come back five years from now and see a post that they made,'' Mr. Koball said. ''And now it's enshrined in magnetic media for time immemorial.''

    He described the data trail a person now generates as a ''digital doppelgänger'' -- a record of information about an individual's activities and behavior. In the past, that type of record might have been compiled by the government or a law enforcement
    agency for some deliberate purpose, he said. But now, ''what we have is people essentially creating their own dossier -- just through their everyday activities.''

    Along with that, however, comes what some call information transparency. Google may have inadvertently created a tool for mass surveillance, but that tool is accessible to anyone, Mr. Page said. ''You go to Google and you type in your name,'' he said,
    ''and see exactly what we know.''

    And yet he acknowledged that Google's search capabilities might have opened Pandora's box.

    ''To be able to find things with high accuracy and high reliability really quickly has an incredible impact on the world,'' Mr. Page said. ''Over all, I think that's going to be a net positive, but it is something we worry about.''


