Big Scholarly Data in CiteSeerX: Information Extraction from the Web

Alexander G. Ororbia; Jian Wu; Madian Khabsa; Kyle Williams; C. Lee Giles

Proceedings of BigScholar at WWW | May 2015

Published by ACM

We examine CiteSeerX, an intelligent system designed to automatically acquire and organize large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and alternative modes of search, we review various functional aspects of this complex system with an eye towards ongoing and future research developments.