NJCSE - Web Matching

PROJECT TITLE:	Web Matching
PROJECT ADVISOR:	James Geller and Yehoshua Perl


Purpose:

A Web user wishing to find information on a particular subject must guess at the keywords under which that information might be classified by a standard search engine. The user must then wade through the results (Web page titles and addresses) produced by the search engine, which are often too numerous to pursue in full detail. Certain Web-page results can be eliminated immediately. Others must be examined before the retain/eliminate decision is made. Links from an examined Web page must often be followed before a retain/eliminate decision can be made.

Approach:

We propose to automate as much of this process as possible for a significant class of queries. In particular, we propose to address three very practical problems:

The problem of matching firms with information relevant to the pursuit of their business.
The problem of matching customers with companies that provide products, services or expertise.
The problem of matching firms with potential customers.

The proposed approach differs from that currently used by available search engines in two ways. First, we will utilize strategies derived from the areas of data mining, software agents, natural language processing, Bayesian networks, and "ontologies". Second, we will support both the classical active user-search scenario and a background match-making scenario. In the background match-making scenario, the search engine will continuously search the Web for people/businesses with certain profiles and for customers, products and services of potential interest to those people/businesses.
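
To make the background match-making scenario more concrete, the following is a minimal Python sketch of standing profiles that are checked against each newly seen page. It is an illustration only, not the project's implementation: the Profile class, the min_overlap threshold, the tokenize and match_page helpers, and the example.org pages are all hypothetical, and simple keyword overlap stands in for the richer strategies listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A standing query (hypothetical structure, not taken from the project)."""
    owner: str
    keywords: set[str]
    min_overlap: int = 2  # assumed threshold: shared keywords needed for a match
    matches: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return set(cleaned.split())


def match_page(url: str, text: str, profiles: list[Profile]) -> list[str]:
    """Record the page under every profile it satisfies; return the matched owners."""
    words = tokenize(text)
    matched = []
    for profile in profiles:
        if len(profile.keywords & words) >= profile.min_overlap:
            profile.matches.append(url)
            matched.append(profile.owner)
    return matched


profiles = [
    Profile("bike-shop", {"cycling", "bicycle", "touring"}),
    Profile("travel-agent", {"travel", "touring", "hotel"}),
]

# In a background scenario these pages would arrive from a crawler over time;
# here they are two made-up examples.
pages = [
    ("http://example.org/a", "My cycling diary: touring Europe by bicycle."),
    ("http://example.org/b", "Recipes for sourdough bread."),
]
for url, text in pages:
    match_page(url, text, profiles)
```

The point of the design is that profiles persist while pages stream past, which is the reverse of the active user-search scenario, where the page collection persists and queries arrive one at a time.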

Status:

Web Search for extracting Interest Areas

We have implemented a Web front end on which users can select from many different interest areas. They can also select among demographic choices, corresponding to the features in the Ontologies for Web Matching project ("Representation of Knowledge about Populations of Customers"). The system will return listings of home pages of "web-citizens" that appear to fit the selected interest areas and demographic choices. We are currently evaluating several competing methods for linking keywords on a home page with interest areas and demographic information. These methods include a rule-based system (JESS, the Java Expert System Shell), Information Retrieval (IR) techniques such as Support Vector Machines, and probabilistic methods (Bayesian networks). We have also collected home pages, starting from the popular Yahoo/Geocity portal site. These home pages have a very regular structure and are easier to analyze than "random" home pages. They will be used as a first training set for our keyword-to-interest-area association analysis.
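
As an illustration of the probabilistic option, the sketch below associates home-page keywords with interest areas using a naive Bayes classifier, which can be viewed as the simplest form of Bayesian network. This is an assumption made for the sake of a short example, not the method the project settled on. The class name NaiveBayesInterest, the interest areas, and the tiny training set are all invented.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesInterest:
    """Multinomial naive Bayes with add-one smoothing over page keywords."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # area -> keyword counts
        self.page_counts = Counter()             # area -> number of training pages
        self.vocabulary = set()

    def train(self, keywords, area):
        """Add one labeled home page (given as a list of keywords)."""
        self.page_counts[area] += 1
        self.word_counts[area].update(keywords)
        self.vocabulary.update(keywords)

    def score(self, keywords, area):
        """Log of P(area) times the product of P(keyword | area)."""
        total_pages = sum(self.page_counts.values())
        log_prob = math.log(self.page_counts[area] / total_pages)
        denom = sum(self.word_counts[area].values()) + len(self.vocabulary)
        for word in keywords:
            if word in self.vocabulary:  # skip words never seen in training
                log_prob += math.log((self.word_counts[area][word] + 1) / denom)
        return log_prob

    def classify(self, keywords):
        """Return the interest area with the highest score."""
        return max(self.page_counts, key=lambda area: self.score(keywords, area))


# Toy, made-up training set standing in for collected home pages.
model = NaiveBayesInterest()
model.train(["guitar", "band", "concert"], "music")
model.train(["piano", "concert", "album"], "music")
model.train(["soccer", "league", "goal"], "sports")
model.train(["tennis", "match", "league"], "sports")

predicted = model.classify(["concert", "guitar", "tickets"])
print(predicted)
```

The same keyword lists could equally feed a rule-based system or a Support Vector Machine; only the train and classify steps would change.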

Related Publications: