A 2D Conditional Random Fields Model for Web Information Extraction

  • Jun Zhu
  • Zaiqing Nie
  • Ji-Rong Wen
  • Bruce Zhang
  • Wei-Ying Ma

MSR-TR-2005-44

The Web contains an abundance of useful semi-structured information about real-world objects, and our empirical study shows that Web information about objects of the same type exhibits strong sequence characteristics across different Web sites. This paper introduces a two-dimensional Conditional Random Fields (CRF) model that incorporates both these sequence characteristics and 2D neighborhood dependencies to automatically extract object information from the Web. We also present experimental results comparing our model with the linear-chain CRF model in the domain of product information extraction. The results show that our model significantly outperforms existing CRF models.
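To make the idea of 2D neighborhood dependencies concrete, the following is a minimal illustrative sketch, not the formulation used in the report. It assumes the elements of a page are laid out on a rectangular grid and that each cell's label interacts with its right and lower neighbors, whereas a linear-chain model would only link consecutive elements in one direction. The function name `grid_score` and the score tables `node`, `horiz`, and `vert` are hypothetical names introduced here for illustration.

```python
from typing import List

Grid = List[List[int]]


def grid_score(
    labels: Grid,
    node: List[List[List[float]]],
    horiz: List[List[float]],
    vert: List[List[float]],
) -> float:
    """Unnormalized log-score of one labeling of a 2D grid.

    labels[i][j]  : label index assigned to cell (i, j)
    node[i][j][y] : score for giving cell (i, j) the label y
    horiz[a][b]   : score for label a sitting directly left of label b
    vert[a][b]    : score for label a sitting directly above label b
    All tables are assumptions made for this sketch.
    """
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            y = labels[i][j]
            total += node[i][j][y]
            if j + 1 < cols:  # dependency on the right-hand neighbor
                total += horiz[y][labels[i][j + 1]]
            if i + 1 < rows:  # dependency on the neighbor below
                total += vert[y][labels[i + 1][j]]
    return total


# Toy 2x2 grid with two labels (0 and 1); every cell uses the same node scores.
example_labels = [[0, 1], [1, 1]]
example_node = [[[1.0, 2.0] for _ in range(2)] for _ in range(2)]
example_horiz = [[0.5, -0.5], [0.0, 1.0]]
example_vert = [[0.25, 0.75], [0.0, 0.5]]
example_score = grid_score(example_labels, example_node, example_horiz, example_vert)
```

Dropping the `vert` terms reduces this score to a set of independent row-wise chains, which is one way to see what the second dimension adds. A full model would also need a normalization over all labelings and learned scores, both of which are outside this sketch.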