Towards Understandable Neural Networks for High Level AI Tasks – Part 6


November 16, 2015


Paul Smolensky


Microsoft / Johns Hopkins University


Encoding discrete symbol structures as numerical vectors for neural network computation enables the similarity structure inherent in vectorial representations to yield generalizations that reflect content-similarity in a structure-sensitive fashion. Two examples will be presented.

  1. In language understanding, the mapping of arguments from syntactic roles (subject, object, etc.) to semantic roles (agent, patient, etc.) is controlled by the argument structure of verbs. Verbs differ in their argument structures, but fall into a modest number of similarity classes. The similarity of verbs on combined semantic and argument-structure dimensions can be encoded vectorially in distributed representations using the tensor product representation framework presented in previous lectures in this series (and briefly reviewed in this lecture).
  2. In a well-studied class of speech errors in the production of ‘tongue-twisters’, consonants are displaced from their target position to an incorrect position. It has been documented that such errors are more likely when the displacement preserves the syllable-internal position of the consonant (e.g., if a consonant’s target position is syllable-initial, an error is more likely to displace it into the initial position of another syllable). Errors are also more likely when a displaced consonant replaces a consonant to which it is featurally similar. Finally, a consonant is more likely to be displaced into a particular syllable position when consonants featurally similar to it are more frequent in that position. The Gradient Symbolic Computation (GSC) networks introduced in previous lectures in this series reproduce these structure- and content-similarity effects in simulation, and the basis for this model behavior can be understood formally.
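
As a rough illustration of how a tensor product representation can make similarity sensitive to both content and structure, the following minimal sketch binds consonant fillers to syllable-position roles and compares the resulting vectors. Everything specific in it is an assumption made for illustration and is not taken from the lectures: the four binary features, the consonant vectors `P`, `B`, `T`, the two orthonormal role vectors `ONSET` and `CODA`, and the helper names `outer`, `bind`, `unbind`, and `similarity`.

```python
# Minimal tensor product representation (TPR) sketch in pure Python.
# All vectors and names below are hypothetical, chosen only for illustration.

# Filler vectors: toy features [labial, coronal, stop, voiced] (assumed).
P = [1, 0, 1, 0]
B = [1, 0, 1, 1]
T = [0, 1, 1, 0]

# Role vectors: orthonormal, so unbinding below is exact (assumed).
ONSET = [1, 0]
CODA = [0, 1]


def outer(filler, role):
    """Tensor (outer) product of one filler vector with one role vector."""
    return [[f * r for r in role] for f in filler]


def bind(pairs):
    """Encode a structure as the sum of filler-role tensor products."""
    n_f, n_r = len(pairs[0][0]), len(pairs[0][1])
    total = [[0] * n_r for _ in range(n_f)]
    for filler, role in pairs:
        product = outer(filler, role)
        for i in range(n_f):
            for j in range(n_r):
                total[i][j] += product[i][j]
    return total


def unbind(tpr, role):
    """Recover the filler bound to `role` (exact for orthonormal roles)."""
    return [sum(x * r for x, r in zip(row, role)) for row in tpr]


def similarity(a, b):
    """Dot product of two TPRs, treating each as a flattened vector."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Three toy syllable frames: onset and coda consonants only.
pat = bind([(P, ONSET), (T, CODA)])
bat = bind([(B, ONSET), (T, CODA)])
tap = bind([(T, ONSET), (P, CODA)])

print(unbind(pat, ONSET))      # the onset filler of `pat`
print(similarity(pat, bat))    # similar fillers in the same positions
print(similarity(pat, tap))    # same fillers in swapped positions
```

In this toy setting, `pat` comes out more similar to `bat` (a featurally similar consonant in the same position) than to `tap` (the same consonants in swapped positions), which is the kind of structure-sensitive content similarity the two examples above rely on. The sketch does not implement a GSC network or its dynamics; it only shows the encoding step.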

Two remaining potential topics for this lecture series are:

  – comparison of the size of tensor product representations to the size of other schemes for encoding symbol structures in actual neural network models
  – programming GSC networks to perform function application in λ-calculus and tree adjunction (as in Tree-Adjoining Grammar), thereby demonstrating that GSC networks truly have complete symbol-processing (or ‘algebraic’) capabilities, which Gary Marcus and others have argued (at MSR and elsewhere) are required for neural networks (artificial or biological) to achieve genuine human intelligence.

Overview of talk series: Current AI software relies increasingly on neural networks (NNs). The universal data structure of NNs is the numerical vector of activity levels of model neurons, typically with activity distributed widely over many neurons. Can NNs in principle achieve human-like performance in higher cognitive domains – such as inference, planning, grammar – where theories in AI, cognitive science, and linguistics have long argued that abstract, structured symbolic representations are necessary? The work I will present seeks to determine whether, and precisely how, distributed vectors can be functionally isomorphic to symbol structures for computational purposes relevant to AI – at least in certain idealized limits such as unbounded network size. This work – defining and exploring Gradient Symbolic Computation (GSC) – has produced a number of purely theoretical results. Current work at MSR is exploring the use of GSC to address large-scale practical problems using NNs that can be understood because they operate under the explanatory principles of GSC.

Part 1 is available at http://resnet/resnet/fullvideo.aspx?id=36339, Part 2 at http://resnet/resnet/fullvideo.aspx?id=36370, Part 3 at http://resnet/resnet/fullvideo.aspx?id=36371, Part 4 at http://resnet/resnet/fullvideo.aspx?id=36402, and Part 5 at http://resnet/resnet/fullvideo.aspx?id=36411.


Paul Smolensky

Paul Smolensky is Krieger-Eisenhower Professor of Cognitive Science at Johns Hopkins University. His research addresses mathematical unification of the continuous and the discrete facets of cognition: principally, the development of grammar formalisms that are grounded in cognitive and neural computation. A member of the Parallel Distributed Processing (PDP) Research Group at UCSD (1986), he developed Harmony Theory, proposing what is now known as the ‘Restricted Boltzmann Machine’ architecture. He then developed Tensor Product Representations (1990), a compositional, recursive technique for encoding symbol structures as real-valued activation vectors. Combining these two theories, he co-developed Harmonic Grammar (1990) and Optimality Theory (1993), general grammatical formalisms now widely used in phonological theory. His publications include the books Mathematical perspectives on neural networks (1996, with M. Mozer, D. Rumelhart), Optimality Theory: Constraint interaction in generative grammar (1993/2004, with A. Prince), Learnability in Optimality Theory (2000, with B. Tesar), and The harmonic mind: From neural computation to optimality-theoretic grammar (2006, with G. Legendre). He was awarded the 2005 David E. Rumelhart Prize for Outstanding Contributions to the Formal Analysis of Human Cognition, a Blaise Pascal Chair in Paris (2008-9), and the 2015 Sapir Professorship of the Linguistic Society of America.