Saturday, November 20, 2010

Unit 11: Web Search and OAI Protocol

OAI Protocol
This article, filled with acronyms, gives an overview of OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting), covering both its original purpose and the different ways it's being used today. OAI-PMH's original goal was to provide access to "diverse e-print archives through metadata harvesting," but it is now used in a variety of communities. The article highlights three of them: the Open Language Archives Community, the Sheet Music Consortium, and the National Science Digital Library. The Open Language Archives Community uses OAI-PMH to create a "network of repositories" drawing on 27 metadata harvesters. The Sheet Music Consortium uses Dublin Core to describe sheet music, which is a cataloging challenge, and OAI-PMH provides a means of building a "virtual collection." For the National Science Digital Library, OAI-PMH acts as the aggregator of metadata. It's interesting that OAI's original purpose has branched out to help such a wide range of communities in the information world.
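
To make the protocol a little less abstract: OAI-PMH is really just HTTP GET requests with a "verb" parameter, and Dublin Core records come back as XML. Here's a minimal harvesting sketch using only Python's standard library. The repository URL is a hypothetical placeholder, but the ListRecords verb, the oai_dc metadata prefix, and the XML namespaces come straight from the protocol spec.

```python
# Minimal OAI-PMH harvesting sketch (Python standard library only).
# The base URL is a hypothetical placeholder; real repositories publish theirs.
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint

# XML namespaces defined by the OAI-PMH and Dublin Core specs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records(metadata_prefix="oai_dc"):
    """Issue a ListRecords request and yield (identifier, title) pairs."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    with urlopen(BASE_URL + "?" + urlencode(params)) as response:
        tree = ET.parse(response)
    for record in tree.iter("{http://www.openarchives.org/OAI/2.0/}record"):
        header = record.find("oai:header", NS)
        title = record.find(".//dc:title", NS)
        yield (
            header.findtext("oai:identifier", default="", namespaces=NS),
            title.text if title is not None else "(no title)",
        )

if __name__ == "__main__":
    for identifier, title in list_records():
        print(identifier, "-", title)
```

A real harvester would also follow the resumptionToken elements the protocol uses to page through large result sets; that part is omitted here for brevity.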


Web Search Engines, Parts 1 & 2
After reading, I've realized that I had never given much thought to what goes into a search engine producing the results that it does. I am amazed not only at how complex the process is, but also at how incredibly quickly we are given our search results. Another important aspect of search engines is PageRank, Google's popularity score for web pages. It assures us that if we search for "University of Pittsburgh," the school's website will almost certainly be the first hit in the list, rather than some random person's website mentioning that they graduated from the "University of Pittsburgh" in 1989.
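
The basic idea behind PageRank fits in a few lines: a page's score is the chance that a "random surfer" lands on it, where at each step the surfer either follows a link from the current page or jumps to a random page. Below is only an illustrative sketch of the classic formulation, with a made-up four-page link graph, not how Google computes it at web scale.

```python
# Toy PageRank via power iteration. The link graph is invented for illustration.
damping = 0.85  # probability the "random surfer" follows a link vs. jumping anywhere

# page -> list of pages it links to (hypothetical four-page web)
links = {
    "pitt.edu": ["news.com", "blog.net"],
    "news.com": ["pitt.edu"],
    "blog.net": ["pitt.edu", "news.com"],
    "lonely.org": ["pitt.edu"],
}

pages = list(links)
rank = {page: 1.0 / len(pages) for page in pages}  # start with uniform scores

for _ in range(50):  # iterate until the scores settle
    new_rank = {page: (1 - damping) / len(pages) for page in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share  # each page passes rank to pages it links to
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, pitt.edu ends up with the highest score because every other page links to it, which is exactly why the official site outranks a stray mention on someone's personal homepage.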

The Deep Web
Like most topics in the readings for this class, I didn't know much about the Deep Web. This article provides quite a bit of information about what is buried on the web, and it makes me wonder what I'm missing out on when researching or just fooling around online. I was amazed to read that "public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web." That figure could be even greater now, since the article was written in 2001. The article also states that "directed query technology" is the method needed to bring deep and surface web information together.
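
As I understand it, a "directed query" means actually submitting a search term to a site's query form and harvesting the results, since those result pages only exist in response to a query and an ordinary link-following crawler never sees them. Here's a minimal sketch of that idea; the form URL and field name are hypothetical, not from the article.

```python
# Sketch of a "directed query": POST a search term to a database-backed form,
# the way deep-web content is reached. These result pages have no static URLs
# for a link-following crawler to discover. URL and field name are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen, Request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href attributes from the returned results page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

def directed_query(term):
    data = urlencode({"q": term}).encode()  # "q" is an assumed form field name
    request = Request("https://archive.example.org/search", data=data)  # POST
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    return collector.links  # links a surface-web crawl would never have indexed

print(directed_query("sheet music"))
```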

1 comment:

  1. Hey Caitlin,

    I agree with you about the deep web article. It was shocking to see how much larger it is compared to the surface web. I think that computer scientists and librarians should continue to work together to develop techniques that will "fish" for some of the information in the deep web. That wealth of knowledge would be more satisfying to users and researchers alike.
