THU TREC 2002: Web Track Experiments

  • Min Zhang
  • Ruihua Song
  • Chuan Lin
  • Liang Ma
  • Zhe Jiang
  • Yijiang Jin
  • Yiqun Liu
  • Le Zhao
  • Shaoping Ma

The Eleventh Text Retrieval Conference (TREC 2002)

Anchor text proved effective in earlier TREC experiments on the homepage finding task[1]
and somewhat useful for ad hoc retrieval through result combination[2]. This year, our conclusions
were consistent with those earlier findings. In addition, we examined the use of the URL and of the
links inside a web page. Again, results on the training set are encouraging.
We assumed that a key resource is more likely to link to multiple relevant documents.
Accordingly, the out-degree of a page and the similarities of the documents that the page points to
were used as the two factors for key resource selection. Experimental results were quite good,
showing that these factors can find key resources on a single server.
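
The two factors can be illustrated with a minimal sketch. The abstract does not state how out-degree and similarity were combined, so the formula below (a log-damped out-degree multiplied by the mean similarity of the linked documents) is an assumption made for illustration only, as are the function name `key_resource_score` and the input dictionaries `out_links` and `similarity`.

```python
import math


def key_resource_score(page, out_links, similarity):
    """Score a page as a key-resource candidate (illustrative formula).

    out_links:  dict mapping a page to the list of pages it links to
                (hypothetical input format).
    similarity: dict mapping a page to its query similarity score
                (hypothetical input format).
    """
    targets = out_links.get(page, [])
    if not targets:
        return 0.0
    # Mean query similarity of the documents this page points to;
    # targets without a similarity score count as 0.
    mean_sim = sum(similarity.get(t, 0.0) for t in targets) / len(targets)
    # Damp the raw out-degree with a logarithm. This combination is an
    # assumption, not the formula used in the paper.
    return math.log(1 + len(targets)) * mean_sim


out_links = {"hub": ["a", "b", "c"], "leaf": []}
similarity = {"a": 0.9, "b": 0.6, "c": 0.3}
hub_score = key_resource_score("hub", out_links, similarity)
leaf_score = key_resource_score("leaf", out_links, similarity)
```

Under this sketch, a page that links to several documents with high similarity scores above a page with no out-links, which matches the stated assumption about key resources.
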
Two site uniting (SU) approaches were studied for selecting proper pages to represent
one server. (1) A document that has index characteristics and a sufficiently high similarity is
kept as a key resource. (2) Documents from the same server in the result list are given different
reliability factors, which decay as similarity decreases. Both are useful on the given
examples (used as a training set) in this year’s Web track, especially the latter. Better results
were obtained by combining the SU approach with the out-degree factor mentioned above to find key resources.
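
The second SU approach can be sketched as follows. The abstract does not give the decay function, so the geometric decay by rank within each server, the `decay` parameter, and the function name `site_unite` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_unite(results, decay=0.5):
    """Down-weight lower-ranked documents from the same server.

    results: list of (url, similarity) pairs from an initial retrieval run.
    decay:   hypothetical per-rank decay of the reliability factor.
    Returns a list of (url, adjusted_score), best first.
    """
    by_host = defaultdict(list)
    for url, sim in results:
        by_host[urlparse(url).netloc].append((url, sim))

    adjusted = []
    for docs in by_host.values():
        # Within one server, the most similar document keeps factor 1.0;
        # each following document gets a smaller reliability factor.
        docs.sort(key=lambda d: d[1], reverse=True)
        for rank, (url, sim) in enumerate(docs):
            adjusted.append((url, sim * decay ** rank))

    adjusted.sort(key=lambda d: d[1], reverse=True)
    return adjusted


reranked = site_unite([
    ("http://a.example/index.html", 0.9),
    ("http://a.example/x.html", 0.8),
    ("http://b.example/", 0.5),
])
```

In this example the second page from `a.example` drops below the page from `b.example`, so each server is represented near the top of the list by its strongest page.
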
All of our experiments were run on the Okapi system. It has quite a few parameters to
tune, and they affect performance greatly. Therefore, we also proposed and implemented a
dynamic parameter learning approach based on a genetic algorithm and applied it to all the tasks.
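
A genetic algorithm for parameter tuning might look like the sketch below. The abstract does not describe the encoding, operators, or which parameters were learned, so everything here is an assumption: truncation selection, uniform crossover, Gaussian mutation, elitism, and the function name `ga_tune`. The toy fitness function stands in for a real evaluation measure computed on a training set, and its two dimensions are merely suggestive of BM25-style parameters such as `k1` and `b`.

```python
import random


def ga_tune(fitness, bounds, pop_size=20, generations=30, mutation=0.1, seed=0):
    """Maximize `fitness` over real-valued parameters within `bounds`.

    fitness: callable taking a list of parameter values (hypothetical;
             in practice a retrieval measure on training topics).
    bounds:  list of (low, high) pairs, one per parameter.
    Returns (best_parameters, best_fitness_per_generation).
    """
    rng = random.Random(seed)

    def clip(x, lo, hi):
        return max(lo, min(hi, x))

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation scaled to the range.
            child = [
                clip(rng.choice((x, y)) + rng.gauss(0, mutation * (hi - lo)), lo, hi)
                for x, y, (lo, hi) in zip(p, q, bounds)
            ]
            children.append(child)
        population = children
    return max(population, key=fitness), history


def toy_fitness(params):
    # Stand-in objective with a single peak at (1.2, 0.75).
    return -((params[0] - 1.2) ** 2 + (params[1] - 0.75) ** 2)


best, history = ga_tune(toy_fitness, [(0.0, 3.0), (0.0, 1.0)])
```

Because the best individual is always carried into the next generation, the best fitness recorded in `history` never decreases from one generation to the next.
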