Rapid interaction between a human teacher and a learning machine presents substantial benefits, and real challenges, when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest, while the system leverages big data to find the examples that maximize the training value of each interaction with the teacher.
Building classifiers and entity extractors is currently an inefficient process involving machine learning experts, developers, and labelers. PICL enables teachers with no expertise in machine learning to build classifiers and entity extractors. PICL’s user interface reflects this objective and exposes a few key actions that require neither ML nor engineering skills. Teachers using PICL can (1) search or sample items to label, (2) label these items, (3) select and edit features, (4) monitor accuracy, and (5) review errors.
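The five teacher actions can be sketched as a minimal programmatic surface. None of these names come from PICL itself; the class below is a hypothetical illustration of the interaction model only.

```python
# Hypothetical sketch of the five teacher actions as a minimal API.
# All names (TeachingSession, search, label, ...) are illustrative,
# not PICL's actual interface.

class TeachingSession:
    def __init__(self):
        self.labeled = {}      # item -> teacher's label
        self.features = []     # active feature names

    def search(self, items, query):
        """Action 1: find candidate items to label via a text query."""
        return [it for it in items if query.lower() in it.lower()]

    def label(self, item, value):
        """Action 2: record the teacher's label for an item."""
        self.labeled[item] = value

    def add_feature(self, name):
        """Action 3: activate a feature the model may use."""
        if name not in self.features:
            self.features.append(name)

    def accuracy(self, predictions):
        """Action 4: fraction of labeled items the model currently gets right."""
        hits = sum(predictions[i] == l for i, l in self.labeled.items())
        return hits / len(self.labeled) if self.labeled else 0.0

    def errors(self, predictions):
        """Action 5: labeled items the model disagrees with, for review."""
        return [i for i, l in self.labeled.items() if predictions[i] != l]
```

In this sketch, accuracy monitoring and error review both compare the teacher's labels against the model's current predictions, which is what lets the teacher see at a glance where the model still disagrees with them.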
Training, scoring, and regularizing are not teacher actions in PICL. Rather, these computations happen implicitly and transparently: training and scoring start automatically after the teacher modifies features or provides labels. The teachers are always aware of the state of the system, as a status bar indicates which actions are not yet reflected in the current model.
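One way to picture this implicit retraining is an event-driven loop: every label or feature edit is queued as a pending action, a background pass retrains when there is pending work, and a status string reports whether the model is up to date. This is a sketch under those assumptions, not PICL's actual mechanism.

```python
# Minimal sketch of implicit training: teacher actions mark the model stale,
# and a background pass (invoked automatically in the real system) retrains.
# AutoTrainer and its method names are hypothetical.

class AutoTrainer:
    def __init__(self, train_fn):
        self.train_fn = train_fn   # pending actions -> new model
        self.pending = []          # actions not yet reflected in the model
        self.model = None

    def on_change(self, action):
        """Called whenever the teacher labels an item or edits a feature."""
        self.pending.append(action)

    def step(self):
        """One background training pass; retrains only if work is pending."""
        if self.pending:
            self.model = self.train_fn(self.pending)
            self.pending = []

    def status(self):
        """What a status bar would display to the teacher."""
        return "up to date" if not self.pending else f"{len(self.pending)} action(s) pending"
```

Keeping the pending-action queue separate from the model is what makes the status bar possible: the interface only has to check whether the queue is empty.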
When teachers start building models in PICL, they select at least one initial feature and search for some seed positive and negative items via a text query. They can then label the search results and submit these labels. From this point, PICL automatically trains a model and starts making predictions on new data, i.e. producing scores. The teachers can then sample data that are deemed useful to improve the model (active learning), or keep searching the dataset. If a model is available (i.e., after the cold-start period), PICL pre-labels the examples shown to the teacher with the current model’s most-likely prediction. As a result, the teacher can label efficiently by simply correcting the pre-labels that are wrong. Moreover, the process of explicitly correcting the model helps the teacher understand the weaknesses of the current model.
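The sample-and-correct loop above can be sketched in two small functions: pick the unlabeled items the current model is least sure about (a common active-learning heuristic, uncertainty sampling, used here as a stand-in for whatever sampler PICL employs), and pre-label each item with the model's most likely class.

```python
# Sketch of the sample-and-correct loop. score_fn stands in for the current
# model's scorer, returning P(positive) in [0, 1]; it is a placeholder,
# not PICL's actual learner.

def sample_uncertain(items, score_fn, k):
    """Active learning: the k items whose scores sit closest to the
    0.5 decision boundary, i.e. where the model is least confident."""
    return sorted(items, key=lambda it: abs(score_fn(it) - 0.5))[:k]

def prelabel(items, score_fn):
    """Pre-label each item with the most likely prediction, so the
    teacher only has to correct the mistakes."""
    return {it: int(score_fn(it) >= 0.5) for it in items}
```

With a toy scorer, the most uncertain items are surfaced first, while confident items arrive already pre-labeled, which is exactly the division of labor the paragraph describes.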
Teachers can also supply features to PICL. A teacher can either browse a corpus of existing features or create a new feature from scratch (e.g. a dictionary). The active features, which represent the information the model currently ‘sees’, are always visible in the interface.
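A dictionary feature of the kind a teacher might author from scratch can be sketched as a binary function that fires when an item contains any term from a teacher-supplied word list. The exact semantics of PICL's features may differ; this is only an illustration, and the example word list is invented.

```python
# Sketch of a teacher-authored dictionary feature: a binary feature that
# fires when the text contains any term from the teacher's word list.
# dictionary_feature and the example terms are hypothetical.

def dictionary_feature(terms):
    """Build a binary feature from a teacher-supplied word list."""
    vocab = {t.lower() for t in terms}
    def feature(text):
        return int(any(tok in vocab for tok in text.lower().split()))
    return feature

# Example: a feature for commerce-related vocabulary.
money_words = dictionary_feature(["refund", "price", "invoice"])
```

Because such a feature is just a named function over the input text, it is easy to show in an "active features" list exactly what information the model currently sees.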
At any point in time, teachers can evaluate their models: PICL splits the labeled data into a training set and a test set so that it can compute and display performance metrics, including estimates of the generalization performance. Teachers are therefore empowered to label, add features, review, debug, and search, and they can understand the performance of the model they produce. When they feel confident about their model, it can be exported for deployment.
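The evaluation step can be sketched as a hold-out split: train on part of the labeled data, score the rest, and report test accuracy as an estimate of generalization. The split policy here (last 25% held out) is an assumption for illustration; PICL's actual split is not specified above.

```python
# Sketch of hold-out evaluation. The 25% hold-out fraction and the
# function names are assumptions, not PICL's documented behavior.

def split_and_evaluate(labeled, train_fn, holdout=0.25):
    """labeled: list of (item, label) pairs.
    train_fn: training data -> prediction function.
    Returns test accuracy as a generalization estimate."""
    cut = int(len(labeled) * (1 - holdout))
    train, test = labeled[:cut], labeled[cut:]
    predict = train_fn(train)
    hits = sum(predict(x) == y for x, y in test)
    return hits / len(test) if test else float("nan")
```

Because the test items never reach the learner, the reported accuracy reflects how the model behaves on data it has not seen, which is what makes it a generalization estimate rather than a training score.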
This is a project from the Machine Teaching Group.