Support Vector Machines

Established: November 5, 2001

Support vector machines are a set of algorithms that learn from data by creating models that maximize their margin of error.

Support vector machines (opens in new tab) (SVMs) are a family of algorithms for classification (opens in new tab), regression (opens in new tab), transduction (opens in new tab), novelty detection (opens in new tab), and semi-supervised learning (opens in new tab). They work by choosing a model that maximizes the error margin (opens in new tab) of a training set.

SVMs were originally developed by Vladimir Vapnik (opens in new tab) in 1963. Since the mid-90s, a energetic research community has grown around them. If you want to learn more about SVMs, you can read Chris Burges’ tutorial. Nello Cristianini (opens in new tab) and John Shawe-Taylor (opens in new tab) have written a textbook (opens in new tab) about them. Bernhard Schölkopf (opens in new tab) and Alex Smola (opens in new tab) wrote a textbook about kernel methods (opens in new tab), which are a closely-related set of methods.

Since 1998, we’ve done basic research into making SVMs be more user-friendly. Our research has resulted in:

SMO: A fast algorithm for training SVMs from data, which is easy to understand and code.
A method for calibrating the output of an SVM to yield probabilities.
A simple method to convert a multi-class problem into a series of faster-to-solve two-class SVMs
A method to apply SVMs to find unusual items in a training set (novelty detection).
An online approximation to SVMs.

See the list of publications, below, for complete citations.

The real-world data sets described in the technical report (below) are available in a compressed ASCII format (zip format). Both the adult data and the web data are available. There is a readme.txt file in each zip archive that explains the format of the file. The testing set for the adult data, the testing set for the web data set, and the MNIST data set (opens in new tab) is also available.

MSR currently does not have any software that implements SVMs. LIBSVM (opens in new tab) is a popular package that is based on a SMO-like algorithm.

Check here for errata on the SMO “Fast training” physical paper (already corrected in the on-line version).
Sathiya Keerthi (opens in new tab) and colleagues have a paper (opens in new tab) that describes an improved SMO: instead of updating a single threshold, they update the bounds on permissible thresholds. They report substantial improvement in speed, especially for extreme C values.

Gary Flake and Steve Lawrence have an efficient SMO algorithm for Support Vector Regression.
There are some unclear ideas and errors in the “Fast Training” paper that I (John Platt) would like to clarify on this web page:
- Equations (12.9) and (12.10) in the paper are for when the decision function (1.7) subtracts b, not adds it.
- Equation (12.18) has a sign error: the plus should be a minus.
- The sentence after (12.24) should read “is negative” not “is positive”
- In the pseudo-code, after the “if (eta < 0)” statement, right before the “if (|a2-alph2| < …)” statement, a2 should be checked to see if it is within 1e-8 of 0 or C. If so, then a2 should be set to the nearest bound. This prevents round-off error from mistakenly forcing a point to be a support vector. Richard Lippmann points out that this value of 1e-8 is appropriate when inputs have approximately unit variance. Therefore, it is a good idea to scale your inputs (or kernel outputs) so that they are on the order of unity.
Opens in a new tab