Python+Machine Learning tutorial – Working with textual data
- Olivier Grisel | INRIA
Machine learning on text data is useful for social network analytics, for instance to perform sentiment analysis. Extracting a "machine learnable" representation from raw text is an art in itself. In this session we will introduce the bag-of-words representation and its implementation in scikit-learn via its text vectorizers. We will discuss preprocessing with NLTK, n-gram extraction, TF-IDF weighting and the use of SciPy sparse matrices. Finally, we will use that data to train and evaluate a Naive Bayes classifier and a linear Support Vector Machine.
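As a rough illustration of the workflow described above, the following minimal sketch builds a TF-IDF weighted bag-of-words representation with unigrams and bigrams, then trains and evaluates both classifiers. The tiny sentiment corpus, the variable names and the parameter choices are assumptions made up for this example and are not taken from the session; NLTK preprocessing is left out to keep it short.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus invented for illustration only (not from the session).
train_texts = [
    "I love this phone, great battery",
    "what a great and wonderful movie",
    "I really love this wonderful song",
    "terrible service, I hate waiting",
    "what an awful and terrible movie",
    "I really hate this awful song",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

test_texts = ["love wonderful great", "hate awful terrible"]
test_labels = ["pos", "neg"]

# Bag of words with unigrams and bigrams, weighted by TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

# The vectorizer returns a SciPy sparse matrix: one row per document,
# one column per n-gram in the learned vocabulary.
print("sparse:", sparse.issparse(X_train), "shape:", X_train.shape)

# Chain vectorization and classification so that raw strings go in
# and predicted labels come out.
nb_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
svm_model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())

nb_model.fit(train_texts, train_labels)
svm_model.fit(train_texts, train_labels)

nb_pred = nb_model.predict(test_texts)
svm_pred = svm_model.predict(test_texts)

print("Naive Bayes accuracy:", accuracy_score(test_labels, nb_pred))
print("Linear SVM accuracy:", accuracy_score(test_labels, svm_pred))
```

On a corpus this small the accuracy figures carry no meaning; with real data one would hold out a proper test set or use cross-validation.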
Speaker Details
Olivier Grisel is a software engineer in the Parietal team at Inria. He works to improve the speed and scalability of the scikit-learn machine learning library for the Python / NumPy / SciPy ecosystem. He also likes to share interesting machine learning papers and tricks on Twitter: @ogrisel