Book chapter
A Similarity Reinforcement Algorithm for Heterogeneous Web Pages
Web Technologies Research and Development - APWeb 2005, pp.121-132
Lecture Notes in Computer Science, Springer Berlin Heidelberg
2005
DOI: 10.1007/978-3-540-31849-1_13
Abstract
Many machine learning and data mining algorithms rely crucially on similarity metrics. However, most early approaches, such as the Vector Space Model or Latent Semantic Indexing, use only a single relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. We then propose a novel similarity-calculating algorithm over the IITRM that iteratively integrates information from the heterogeneous sources. The algorithm can help detect latent relationships among heterogeneous data objects. It is based on the intuition that intra-type relationships should affect inter-type relationships, and vice versa. Experimental results on the MSN logs dataset show that our algorithm outperforms traditional cosine similarity.
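The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of iterative similarity reinforcement between two object types (for example, queries and pages in a click log). The function names, the identity initialization, and the SimRank-style update are all assumptions made for illustration, not the authors' formulation. The sketch also uses only an inter-type link matrix and omits the intra-type blocks of the IITRM.

```python
from math import sqrt

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def normalize(s):
    """Scale a symmetric matrix so every positive diagonal entry becomes 1."""
    d = [sqrt(s[i][i]) if s[i][i] > 0 else 1.0 for i in range(len(s))]
    return [[s[i][j] / (d[i] * d[j]) for j in range(len(s))] for i in range(len(s))]

def reinforce(r, iterations=3):
    """Hypothetical reinforcement loop (assumed rule, not from the paper).

    r[i][j] is the inter-type link weight between object i of type A
    and object j of type B. Each pass recomputes the similarity of one
    type from the current similarity of the other type.
    """
    rt = transpose(r)
    sim_a, sim_b = identity(len(r)), identity(len(rt))
    for _ in range(iterations):
        new_a = normalize(matmul(matmul(r, sim_b), rt))
        new_b = normalize(matmul(matmul(rt, sim_a), r))
        sim_a, sim_b = new_a, new_b
    return sim_a, sim_b

# Toy click matrix: three queries (rows) and two pages (columns).
# Query 0 and query 2 share no page, but both overlap with query 1.
clicks = [[1, 0],
          [1, 1],
          [0, 1]]

first_pass, _ = reinforce(clicks, iterations=1)
second_pass, _ = reinforce(clicks, iterations=2)
print(first_pass[0][2], second_pass[0][2])
```

Under this assumed rule, the first pass reduces to cosine similarity between rows of the link matrix, so queries 0 and 2 score zero. After a second pass they receive a positive score because the two pages they link to have themselves become similar, which is the kind of latent relationship the abstract refers to.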
Details
- Title
- A Similarity Reinforcement Algorithm for Heterogeneous Web Pages
- Creators
- Ning Liu - Tsinghua University; Jun Yan - Peking University; Fengshan Bai - Tsinghua University; Benyu Zhang - Microsoft Research Asia; Wensi Xi - Virginia Tech; Weiguo Fan - Virginia Tech; Zheng Chen - Microsoft Research Asia; Lei Ji - Microsoft Research Asia; Chenyong Hu - Institute of Software; Wei-Ying Ma - Microsoft Research Asia
- Resource Type
- Book chapter
- Publication Details
- Web Technologies Research and Development - APWeb 2005, pp.121-132
- Publisher
- Springer Berlin Heidelberg; Berlin, Heidelberg
- Series
- Lecture Notes in Computer Science
- DOI
- 10.1007/978-3-540-31849-1_13
- eISSN
- 1611-3349
- ISSN
- 0302-9743
- Language
- English
- Date published
- 2005
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380453602771