#Amazonfail, the Google Books Settlement, and the importance of open access for preserving cultural heritage: In honor of National Library Week

Over the past two years for National Library Week, I have posted about the importance of open publication, the accessibility of government information, and the limitations of relying on Google. Free Government Information, Public.Resource.org, OpentheGovernment (PDF), and others continue to do a great job of promoting openness with regard to government (and scholarly) information. Unfortunately, most people are not aware of how useful and important government information is. But they do know about Amazon, Google, and YouTube, and many of us use them every day. Where would many people turn to find information if these services stopped working?

The #Amazonfail censorship / glitch / griefing situation last weekend shows the power of publics working together and the organic nature of much tagging and movementsourcing. People can often create a simple way of communicating information with each other: the first person to use the #Amazonfail tag on Twitter did so because it worked as a folksonomy of the situation, and it spiralled from there because it was effective. But the episode also shows how hard it is for everyone when most people rely on one source, Amazon, for information about bestsellers and similar items.

Siva Vaidhyanathan says that #Amazonfail is about more than crowdsourcing and user tagging; it is about "metadata, cataloging, books, Web commerce, and justice." A commenter quoted in the New York Times states that "We have to now keep a more diligent eye on Amazon and how they handle the world's cultural heritage."

Have we really placed Amazon (and similar companies) in charge of our cultural heritage? Perhaps not directly, but many people have high expectations for these companies' ability to make information accessible, even if those expectations do not take into account most aspects of information literacy.

But libraries differ from these for-profit companies in how they organize information and why they exist. Most libraries are nonprofit, and their goal is to serve some type of public (what librarians call a patron group). Libraries generally share common organizational systems, such as Library of Congress or Dewey classification, and they are intentionally duplicative in their collections. Not only do libraries often hold the same items, but interlibrary loan also ties them together in a larger network. And unlike with Amazon and Google, even if a library's online catalog stopped working, a user could still use its classification system to find useful information.

Another major difference is that libraries (and even Twitter) rely directly on people, not on an algorithm as Amazon and Google do, for the system to work. As we've seen with Googlebombing, and likely with #Amazonfail, an algorithm can be fooled or can provide inaccurate information.

We rely on Google quite openly, even though its information is sometimes wrong. For example, as of this writing, the top result when googling "four stages of tornadoes" gives the blunt answer of "u suck balls" from wiki.answers. This can't possibly be anywhere close to the correct answer to this scientific question, but it is the one Google's algorithm is choosing!

In my previous posts, I mentioned that what Google has promised from Google Books isn't what is actually available in many cases. Even so, some expect this settlement between two private, non-public entities to somehow also protect the interests of the public. Many disagree, some vehemently, including Siva Vaidhyanathan. A group of professors is attempting to intervene in the Google settlement on behalf of the public:

“The proposed settlement will make Google the only company in the world with a license to use orphaned works.  No other company will be able to buy a similar license because, outside the context of the proposed class-action settlement in this case, there is no one from whom to buy such a license….The settling parties plot a cartel in orphaned works.

…  Because exclusive rights in orphaned works do not serve the ultimate purpose of copyright, the public domain has a claim to free, fair use of orphaned works.

We have the right to intervene to present the public domain’s claim to free, fair use of orphaned works.  None of the present parties will present our claim….”

And what about YouTube? Much government information is posted there, but what happens if the company goes out of business? Free Government Information asks:

[Are] agencies that rely on YouTube as a channel of communication keeping copies of the videos they post there? Would they make them available through another channel? What if … libraries had copies?

Relying on private companies such as Google, YouTube, and West to give us access to government information leaves us without options if these access points disappear.

Access to government-funded scientific information is currently being challenged by H.R. 801, the Fair Copyright in Research Works Act, introduced by Rep. John Conyers. If enacted, the bill would reverse the National Institutes of Health (NIH) Public Access Policy on public access to taxpayer-funded research and would make it impossible for other federal agencies to put similar policies into place. Publicly funded medical research is the metadata of our lives: we don't see it, but it affects our health and how we live.

Many oppose this bill, including Harvard University, which has written a letter against the legislation:

The NIH public access policy has meant that all Americans have access to the important biomedical research results that they have funded through NIH grants. Some 3,000 articles in the life sciences are added to this invaluable public resource each month because of the NIH policy, and one million visitors a month use the site to take advantage of these research papers. The policy respects copyright law and the valuable work of scholarly publishers.

[Instead of passing this bill], Congress should broaden the mandate to other agencies, by passing the Federal Research Public Access Act first introduced in 2006. Doing so would increase transparency of government and of the research that it funds, and provide the widest availability of research results to the citizens who funded it.

Google, Amazon, and the publishing industry provide highly valuable and useful tools and services, but we should not allow closed, proprietary systems to determine how we handle information that belongs entirely or in part to the public, such as the public domain, government publications, and publicly funded studies. And even when "public" information is not at issue, we need to become more wary of relying solely on these systems.

Multiple systems, locations, and means of access are essential to preserving our cultural heritage, as Free Government Information argues with regard to government information in a passage that applies to so much more:

… no single digital archive or repository can ever be as secure and safe as multiple archives, libraries, and repositories. … The nature of digital information is that it can easily be corrupted, altered, lost, or destroyed. It can become unreadable or unusable without constant attention. Relying on any single entity is simply not as safe as relying on multiple organizations. … But this is about more than redundant copies. It is also about relying on different organizations because they have different funding sources, different constituencies, different technologies, and different collections. No single digital collection can ever be as safe as multiple, reliable digital collections.
