Big Scholarly Data in CiteSeerX: Information Extraction from the Web
- Alexander G. Ororbia
- Jian Wu
- Madian Khabsa
- Kyle Williams
- C. Lee Giles
Proceedings of BigScholar at WWW
Published by ACM
We examine CiteSeerX, an intelligent system designed to automatically acquire and organize large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and alternative modes of search, we discuss various functional aspects of this complex system with an eye towards ongoing and future research developments.