SffinX: Satori Free-text Facet Ingestion with Nlp eXtraction

Established: March 1, 2014


=> Mar 2014 : Dec 2015 …

SffinX aimed to extract millions of entities, relations, and facts from the free text of web pages in a specific domain and then map them to the Satori ontology (MSO). It targeted low human effort for training, requiring only a handful of seed examples, and low processing time, so that millions of web pages could be processed and millions of entities, relations, and facts extracted within hours.

First, SffinX collects a domain-specific corpus around the given examples. It then extracts entities using a 22-class NER and verb-based relations using a constituency parser, and combines them to formulate facts. SffinX then clusters relations lexically and semantically and ranks the facts based on signals from a Confidence Scorer. Finally, SffinX maps the entities, relations, and facts to their KB counterparts, where found. SffinX also has a real-time experience in which entities, relations, and facts are extracted on the fly from news RSS feeds.
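
The clustering and ranking steps can be illustrated with a minimal Python sketch. This is not the SffinX implementation: the Fact type, the auxiliary-word list, and the function names are all hypothetical, the confidence values stand in for whatever the Confidence Scorer produced, and only a simple lexical grouping is shown (the semantic clustering is omitted).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical (subject, relation, object) triple with a score."""
    subject: str
    relation: str
    obj: str
    confidence: float  # assumed to lie in [0, 1]


# Assumed stop list: auxiliary verbs dropped before comparing relation phrases.
AUXILIARIES = {"is", "are", "was", "were", "has", "have", "had", "been"}


def relation_key(relation: str) -> str:
    """Normalize a verb phrase: lowercase it and drop auxiliary verbs."""
    tokens = [t for t in relation.lower().split() if t not in AUXILIARIES]
    return " ".join(tokens)


def cluster_relations(facts):
    """Group facts whose relation phrases share the same normalized key."""
    clusters = defaultdict(list)
    for fact in facts:
        clusters[relation_key(fact.relation)].append(fact)
    return dict(clusters)


def rank_facts(facts):
    """Order facts from highest to lowest confidence."""
    return sorted(facts, key=lambda f: f.confidence, reverse=True)


# Made-up example facts, for illustration only.
facts = [
    Fact("CompanyA", "was founded by", "PersonB", 0.9),
    Fact("CompanyC", "founded by", "PersonD", 0.4),
    Fact("CompanyE", "acquired", "CompanyF", 0.7),
]
clusters = cluster_relations(facts)
ranked = rank_facts(facts)
```

Here "was founded by" and "founded by" fall into the same cluster because they normalize to the same key, while "acquired" forms its own cluster.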

SffinX was a three-way collaboration between ATL Cairo, the Microsoft Satori Extraction team, and MSR Asia.