Sunday, December 09, 2007

Deep Web Search Technology Poem


Bright Planet, Deep Web

www.allwatchers.com and www.allreaders.com are web sites in the sense that a file is downloaded to the user's browser when he or she surfs to these addresses. But that is where the similarity to ordinary, static web sites ends. These web pages are front-ends, gates to underlying databases. The databases contain records regarding the plots, themes, characters and other features of, respectively, movies and books. Every user query generates a unique web page whose contents are determined by the query parameters. The number of distinct pages capable of being generated in this way is mind-boggling. Search engines operate on the same principle - vary the search parameters slightly and totally new pages are generated. It is a dynamic, user-responsive and chimerical sort of web.
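To make the idea of a query-generated page concrete, here is a minimal sketch in Python. It is not how either site is actually built: the records, the field names and the `render_page` function are all hypothetical, and a small in-memory list stands in for the underlying database.

```python
from html import escape

# Hypothetical records standing in for a database of film plots and themes.
RECORDS = [
    {"title": "Film A", "theme": "revenge", "setting": "city"},
    {"title": "Film B", "theme": "revenge", "setting": "desert"},
    {"title": "Film C", "theme": "friendship", "setting": "city"},
]


def render_page(**params):
    """Build an HTML page listing the records that match every query parameter."""
    matches = [
        record for record in RECORDS
        if all(record.get(key) == value for key, value in params.items())
    ]
    items = "".join(f"<li>{escape(record['title'])}</li>" for record in matches)
    query = ", ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>"
        f"<ul>{items}</ul></body></html>"
    )


# Two slightly different queries yield two different pages.
page_one = render_page(theme="revenge")
page_two = render_page(theme="revenge", setting="city")
```

Neither page exists until someone asks for it, which is why a crawler that merely follows links between static files never sees them.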

These are good examples of what www.brightplanet.com calls the "Deep Web" (previously inaccurately described as the "Unknown or Invisible Internet"). The company believes that the Deep Web is 500 times the size of the "Surface Internet" (a portion of which is spidered by traditional search engines). This translates to c. 7500 terabytes of data (versus 19 terabytes in the whole known web, excluding the databases of the search engines themselves) - or 550 billion documents organized in 100,000 deep web sites. By comparison, Google, the most comprehensive search engine ever, stores 1.4 billion documents in its immense caches at www.google.com. The natural inclination to dismiss these pages of data as mere re-arrangements of the same information is wrong. Actually, this underground ocean of covert intelligence is often more valuable than the information freely available or easily accessible on the surface. Hence the ability of c. 5% of these databases to charge their users subscription and membership fees. The average deep web site receives 50% more traffic than a typical surface site and is much more heavily linked to by other sites. Yet it is invisible to classic search engines and little known to the surfing public.
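The figures quoted above can be checked against one another with a few lines of arithmetic. The sketch below uses only the numbers given in this post; the variable names are my own.

```python
# Figures as quoted in the paragraph above.
deep_terabytes = 7500
surface_terabytes = 19
deep_documents = 550e9      # 550 billion documents
google_documents = 1.4e9    # 1.4 billion documents
deep_sites = 100_000

# How many times larger the deep web is, measured by data volume.
size_ratio = deep_terabytes / surface_terabytes

# How many times more documents it holds than Google's index.
document_ratio = deep_documents / google_documents

# Average number of documents behind a single deep web site.
documents_per_site = deep_documents / deep_sites

print(f"size ratio: about {size_ratio:.0f}x")
print(f"document ratio: about {document_ratio:.0f}x")
print(f"documents per deep web site: {documents_per_site:,.0f}")
```

Both ratios come out a little under 400, the same order of magnitude as the "500 times" estimate though not identical to it, and the average deep web site works out to 5.5 million documents.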

It was only a question of time before someone came up with a search technology to tap these depths (www.completeplanet.com).

LexiBot, in the words of its inventors, is...

"...the first and only search technology capable of identifying, retrieving, qualifying, classifying and organizing "deep" and "surface" content from the World Wide Web. The LexiBot allows searchers to dive deep and explore hidden data from multiple sources simultaneously using directed queries. Businesses, researchers and consumers now have access to the most valuable and hard-to-find information on the Web and can retrieve it with pinpoint accuracy."

It places dozens of queries in dozens of threads simultaneously and spiders the results (rather as a "first generation" search engine would do). This could prove very useful with massive databases such as the human genome, weather patterns, simulations of nuclear explosions, thematic, multi-featured databases, intelligent agents (e.g., shopping bots) and third generation search engines. It could also have implications for the wireless internet (for instance, in analysing and generating location-specific advertising) and for e-commerce (which amounts to the dynamic serving of web documents).
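The general pattern of sending one directed query to many sources at once can be sketched in Python with a thread pool. This is an illustration of the technique only, not LexiBot's actual implementation, which is not described here. The source names, the `query_source` and `directed_search` functions and the canned results are all hypothetical, and no network access is involved.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep web sources: each maps a query to "documents".
SOURCES = {
    "movies_db": lambda q: [f"movies_db:{q}:1", f"movies_db:{q}:2"],
    "books_db": lambda q: [f"books_db:{q}:1"],
    "genome_db": lambda q: [],
}


def query_source(name, query):
    """Send one directed query to one source and return (source name, results)."""
    return name, SOURCES[name](query)


def directed_search(query, max_threads=12):
    """Fan the query out to every source in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(query_source, name, query) for name in SOURCES]
        for future in futures:
            name, results = future.result()  # blocks until that source answers
            merged[name] = results
    return merged


results = directed_search("revenge")
for source, documents in results.items():
    print(source, len(documents))
```

In a real tool each stand-in would be replaced by a request to a database's own search form, and the merged results would then be filtered and ranked before being shown to the user.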

This transition from the static to the dynamic, from the given to the generated, from the one-dimensionally linked to the multi-dimensionally hyperlinked, from the deterministic content to the contingent, heuristically-created and uncertain content - is the real revolution and the future of the web. Search engines have lost their efficacy as gateways. Portals have taken over but most people now use internal links (within the same web site) to get from one place to another. This is where the deep web comes in. Databases are about internal links. Hitherto they existed in splendid isolation, universes closed to all but the most persistent and knowledgeable. This may be about to change. The flood of relevant, quality information this will unleash will dwarf anything that preceded it.

Sam Vaknin is the author of "Malignant Self Love - Narcissism Revisited" and "After the Rain - How the West Lost the East". He is a columnist in "Central Europe Review", United Press International (UPI) and ebookweb.org and the editor of mental health and Central East Europe categories in The Open Directory, Suite101 and searcheurope.com. Until recently, he served as the Economic Advisor to the Government of Macedonia. His web site: http://samvak.tripod.com.

