Marcus P. Zillman, M.S., A.M.H.A. Author/Speaker/Consultant

Internet Happenings, Events and Sources

Friday, April 29, 2005

QProber

QProber: Classifying and Searching "Hidden-Web" Text Databases
http://qprober.cs.columbia.edu/

Project Summary

Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces, so traditional search engines do not index this information. One way to facilitate access to "hidden-web" databases is through commercial Yahoo!-like directories, which organize these databases manually into categories that users can browse. Our QProber system automates the classification of searchable text databases (whether their contents are "hidden" or not) by adaptively probing the databases with queries derived from document classifiers, without retrieving any documents. A large-scale experimental evaluation over 130 real web databases indicates that our technique produces highly accurate database classification results using, on average, fewer than 200 queries of four words or fewer to classify a database (TOIS'03 paper; SIGMOD'01 paper). Interestingly, our technique is attractive for classifying even crawlable text databases (i.e., databases whose contents are not "hidden") as long as search interfaces for the databases are available (IEEE Data Engineering Bulletin'02 paper).

An alternative way to facilitate access to text databases is through "metasearchers," which provide a unified query interface to search many databases at once. For efficiency, a critical task for a metasearcher is the selection of the most promising databases to search for a query, a task that typically relies on statistical summaries of the database contents. We derive content summaries from searchable text databases by exploiting our probing-based classification algorithm to adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. We can then build content summaries from these topically focused document samples. A large-scale experimental evaluation over a variety of databases indicates that our content-summary construction technique is efficient and produces more accurate summaries than those from previously proposed strategies (VLDB'02 paper, SIGMOD'04 paper).
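
To make the probing idea in the summary more concrete, here is a minimal Python sketch. It is not the QProber implementation. It only illustrates classifying a database from the match counts its search interface reports, descending into subcategories only where the counts look promising and never retrieving documents. The Category class, the count_matches callback, the probe words, and the min_share and min_matches thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Category:
    """A node in a topic hierarchy (hypothetical structure for this sketch)."""
    name: str
    probes: List[str] = field(default_factory=list)  # short queries tied to this category
    children: List["Category"] = field(default_factory=list)


def classify(
    count_matches: Callable[[str], int],
    node: Category,
    min_share: float = 0.4,   # assumed threshold: share of matches among siblings
    min_matches: int = 10,    # assumed threshold: absolute number of matches
    path: Tuple[str, ...] = (),
) -> List[Tuple[str, ...]]:
    """Return the category paths assigned to the database behind count_matches.

    count_matches(query) stands in for sending a query to the database's
    search interface and reading only the reported number of matches.
    """
    path = path + (node.name,)
    if not node.children:
        return [path]

    # Probe once per child category and total the reported match counts.
    totals = {
        child.name: sum(count_matches(q) for q in child.probes)
        for child in node.children
    }
    grand_total = sum(totals.values())

    assigned: List[Tuple[str, ...]] = []
    for child in node.children:
        matches = totals[child.name]
        share = matches / grand_total if grand_total else 0.0
        # Adaptive step: only well-supported children are probed further.
        if matches >= min_matches and share >= min_share:
            assigned.extend(
                classify(count_matches, child, min_share, min_matches, path)
            )
    # If no child qualifies, the database stays at the current category.
    return assigned or [path]


# Toy hierarchy and a fake "database" that only reports match counts.
hierarchy = Category("Root", children=[
    Category("Health", probes=["cancer", "diabetes"], children=[
        Category("Diseases", probes=["aids", "hepatitis"]),
        Category("Fitness", probes=["aerobic", "cardio"]),
    ]),
    Category("Sports", probes=["soccer", "basketball"]),
])

toy_counts = {
    "cancer": 700, "diabetes": 300, "soccer": 20, "basketball": 5,
    "aids": 400, "hepatitis": 200, "aerobic": 10, "cardio": 15,
}


def toy_count_matches(query: str) -> int:
    return toy_counts.get(query, 0)


result = classify(toy_count_matches, hierarchy)
print(result)
```

In this toy run the health-related probes dominate the match counts, so only the Health branch is probed at the second level and the Sports branch is skipped. A real system would derive the probe queries from a trained document classifier, as the summary describes, rather than hard-coding them.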

Panagiotis G. Ipeirotis recommended the following papers from the project's comprehensive publications database:

http://qprober.cs.columbia.edu/publications/icde2005-abstract.html
http://qprober.cs.columbia.edu/publications/tois2003-abstract.html
http://qprober.cs.columbia.edu/publications/vldb2002-abstract.html

This has been added to the Deep Web Research Subject Tracer™ Information Blog.

posted by Marcus Zillman | 4:25 AM


© 2000 - 2018 Marcus P. Zillman, M.S., A.M.H.A.