Purpose:
A Web user wishing to find information on a
particular subject must guess at the keywords under which that information might be
classified by a standard search engine. The user must then wade through the results (Web
page titles and addresses) produced by the search engine, which are often too
numerous to pursue in full detail. Certain Web-page results can be eliminated immediately.
Others must be examined before the retain/eliminate decision is made. Links from an
examined Web page must often be followed before a retain/eliminate decision can be made.
Approach:
We propose to automate as much of this process as
possible for a significant class of queries. In particular, we propose to address three
very practical problems:
- The problem of matching firms with information relevant to the
  pursuit of their business.
- The problem of matching customers with companies that provide
  products, services or expertise.
- The problem of matching firms with potential customers.
The proposed approach differs from the one currently
used by available search engines in two ways. First, we will utilize strategies derived
from the areas of data mining, software agents, natural language processing, Bayesian
networks, and "ontologies". Second, we will support both the classical active
user-search scenario and a background match-making scenario. In the background
match-making scenario, the search engine will continuously search the Web for
people/businesses with certain profiles and for customers, products and services of
potential interest to those people/businesses.
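The background match-making scenario can be illustrated with a small sketch. It is not the project's implementation: the `Profile` structure, the `match_page` helper, the keyword-overlap threshold and the example URLs are all hypothetical, and a plain keyword overlap stands in for the data-mining and ontology-based strategies named above. It shows only the control flow, in which standing profiles are checked against each page as it arrives, rather than a user issuing a query.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A standing interest profile (hypothetical structure, for illustration)."""
    owner: str
    keywords: set[str]
    matches: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return set(cleaned.split())


def match_page(url: str, text: str, profiles: list[Profile], min_overlap: int = 2) -> None:
    """Record the page on every profile that shares at least min_overlap keywords."""
    words = tokenize(text)
    for profile in profiles:
        if len(profile.keywords & words) >= min_overlap:
            profile.matches.append(url)


profiles = [
    Profile("bike-shop", {"cycling", "bicycle", "mountain"}),
    Profile("bookstore", {"novel", "poetry", "reading"}),
]

# Assumption: in a real system these pages would be supplied by a
# continuously running crawler; here they are hard-coded examples.
pages = [
    ("http://example.org/a", "I love mountain cycling and my new bicycle."),
    ("http://example.org/b", "Reading poetry is my favorite hobby."),
    ("http://example.org/c", "Pictures of my cat."),
]

for url, text in pages:
    match_page(url, text, profiles)

for profile in profiles:
    print(profile.owner, profile.matches)
```

The active user-search scenario would invert this loop: a single query is matched against many stored pages instead of many stored profiles being matched against each incoming page.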
Status:
Web Search for extracting Interest Areas
We have implemented a Web front end on which users can select among many different interest
choices. They can also select among different demographic choices, corresponding to the
features in the Ontologies for Web Matching project ("Representation
of Knowledge about Populations of Customers"). The system will return
listings of home pages of "web-citizens" that appear to fit the given
interest areas and demographic choices. We are currently evaluating several competing
methods for linking keywords on a home page with interest areas and demographic
information. These methods include a rule-based system (JESS, the Java Expert System Shell),
Information Retrieval (IR) techniques, e.g., Support Vector Machines, and probabilistic
methods (Bayesian networks). We have also collected home pages, starting from the popular
Yahoo/GeoCities portal site. These home pages have a very regular structure and are easier
to analyze than "random" home pages. They will be used as a first
training set for our keyword-to-interest-area association analysis.
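As an illustration of the probabilistic option, the sketch below links home-page keywords to interest areas with a naive Bayes classifier, which can be viewed as the simplest form of Bayesian network. It is a minimal sketch under stated assumptions, not the method being evaluated in the project: the function names `train` and `classify`, the interest areas and the tiny training set are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(labeled_pages):
    """Count words per interest area from (words, area) training pairs."""
    word_counts = defaultdict(Counter)
    area_counts = Counter()
    vocab = set()
    for words, area in labeled_pages:
        area_counts[area] += 1
        word_counts[area].update(words)
        vocab.update(words)
    return word_counts, area_counts, vocab


def classify(words, word_counts, area_counts, vocab):
    """Return the interest area with the highest log-posterior score."""
    total_pages = sum(area_counts.values())
    best_area, best_score = None, -math.inf
    for area, n_pages in area_counts.items():
        # Log prior: fraction of training pages labeled with this area.
        score = math.log(n_pages / total_pages)
        total_words = sum(word_counts[area].values())
        for word in words:
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[area][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_area, best_score = area, score
    return best_area


# Hypothetical training data: keywords from home pages, labeled by hand.
training = [
    (["guitar", "band", "concert"], "music"),
    (["piano", "concert", "album"], "music"),
    (["soccer", "team", "league"], "sports"),
    (["tennis", "match", "team"], "sports"),
]

model = train(training)
print(classify(["concert", "guitar", "tickets"], *model))
print(classify(["team", "league"], *model))
```

A rule-based system would replace the learned word counts with hand-written keyword rules, and an SVM would learn a decision boundary over keyword vectors; the same labeled home pages could serve as the training set for either.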
Related Publications: