Discussion Graph Tool

Established: April 25, 2014

Discussion Graph Tool (DGT) is an easy-to-use analysis tool that provides a domain-specific language for extracting co-occurrence relationships from social media. It automates best practices such as tracking the context of each relationship. DGT provides a single-machine implementation, and also generates map-reduce-like programs for distributed, scalable analyses.

DGT simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.

With just 3-4 simple lines of script, you can load your social media data, extract complex features, and generate a graph among arbitrary features. Throughout, DGT automates best practices, such as tracking the context of relationships.
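To make the idea of a co-occurrence relationship concrete, here is a minimal Python sketch. It is not DGT's script language or API. It only illustrates the concept under simple assumptions: the feature is a hashtag pulled out with a regular expression, two hashtags co-occur when they appear in the same message, and the "context" tracked for each relationship is the set of users who produced it. The function name `cooccurrence_graph` and the sample messages are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def cooccurrence_graph(messages):
    """Build a hashtag co-occurrence graph from (user, text) pairs.

    Returns two mappings keyed by an alphabetically ordered hashtag pair:
    edge weights (number of messages containing both hashtags) and
    per-edge context (how many of those messages came from each user).
    """
    edge_weights = Counter()
    edge_context = defaultdict(Counter)
    for user, text in messages:
        # Deduplicate and normalize case so each pair counts once per message.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            edge_weights[(a, b)] += 1
            edge_context[(a, b)][user] += 1
    return edge_weights, edge_context


# Hypothetical sample data for illustration only.
messages = [
    ("alice", "Loving the #ICWSM talks on #socialmedia"),
    ("bob", "#socialmedia analysis at #icwsm is fun"),
    ("carol", "New #python release"),
]
weights, context = cooccurrence_graph(messages)
for pair, weight in weights.items():
    print(pair, weight, dict(context[pair]))
```

Keeping the context alongside each edge makes it possible to ask later who or what is behind a relationship, rather than only how strong it is.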



  • Out-of-the-box feature extraction for common scenarios, including mood and geo-location, as well as customizable dictionary-based and regular-expression-based extraction.

    Analyze text for signs of joviality, fatigue, sadness, guilt, hostility, fear, and serenity. Map lat-lon coordinates to FIPS county codes. Recognize gender based on name.

    Identify co-occurrence relationships within social media messages, user behaviors, locations or other features.

    Extract planar graphs and hyper-graphs of co-occurrence relationships, and track contextual statistics for each relationship.

    Import raw social media data from existing sources.

    Read delimiter-separated TSV and CSV files, line-based JSON format (including output of common Twitter downloaders) and multi-line record formats.

    Analyze results in popular tools such as R, Gephi, and Excel.

    Output JSON, TSV and GEXF.

    Extend DGT with custom feature extractors

    Incorporate your own feature extractors with DGT through a simple API. This makes it easy for others to build on your techniques and mix-and-match with others.

    More coming soon…

  • Aug 13: Some people were seeing errors trying to run the binaries because of an invalid signature on the binaries. We’ve fixed that now. Thanks for the bug reports!

    Aug 8: We’ve updated the DGT release, adding support for weighting data and projection on weighted values.  We’ve also updated and expanded our location mapping capabilities to map lat-lon coordinates and user-specified locations to countries, US states and US counties.

    June 19: Our first release is available!  Get in touch with your questions.  We’re looking for feedback. Tweet @emrek or email the team at discussiongraph@microsoft.com.  Thanks!

    June 16:  In preparation for our tool release, we’ve added 2 new step-by-step walkthroughs on analyzing the moods of product reviews and extracting graphs of hashtag relationships on Twitter.

  • Our step-by-step walkthroughs and our reference guide give details about the tool and its usage.

    Read more about our tool and using it for deeper contextual analyses in our ICWSM 2014 paper, “Discussion Graphs: Putting Social Media Analysis in Context”, by Kıcıman, Counts, Gamon, De Choudhury and Thiesson. [PDF]

  • Have a question about how to use DGT for an analysis? Have feedback or a bug report? Want to use your own feature extractor within DGT?

    Contact @emrek via Twitter or reach all of us via email at discussiongraph@microsoft.com.

  • We are continuing development of the public release of DGT.  Here is what is currently under development:

    • Qualitative sampling of raw data that supports each extracted relationship.
    • FILTER command for conditioning analyses on demographic or other feature values.
    • (Now available as of version 0.6) Improved support for extracting relationships among continuous or weighted feature values.
    • Improved aggregation/summarization performance.