Learning Natural Languages by Learning Matrices (a brief overview)

Natural Language Understanding (NLU) is the broad research area within Natural Language Processing (NLP) that develops methods to analyze natural language and understand its meaning. It is a key component of any AI system that aims to interact naturally with humans. It is also a key component of automatic systems that perform machine reading of the web and social media, which, given the current volume of information, is the only practical way to access this content.

First I will give a brief overview of Natural Language Processing tasks and the evolution of machine learning approaches in recent years. Natural language is structured, very rich, ambiguous, and offers a limitless ability to say new things. Because of this, we would like machine learning algorithms that learn hidden-state compositional models of language and answer questions such as: what are the units and parts of a language? what is the meaning of each part? how do we compose parts into bigger parts? what is the meaning of a composed expression? how do we use these models to solve specific needs?

Deep learning has made great progress on these questions. Today we have giant neural models like BERT or GPT-3 that are trained on web-scale data and have proved useful for virtually any empirical NLP task. However, it is largely unclear what these models are learning, and what their capacity to generalize is (as opposed to memorizing data). Also, the cost of training these models is huge.

In the second part of this talk, I will focus on compositional models of language that take the form of weighted automata, which are a restricted class of recurrent neural networks. I will describe Spectral Learning algorithms, a family of learning algorithms that reduces the problem of learning a weighted automaton to some form of matrix learning. This reduction is based on theoretical connections between formal languages and distributions over the strings they generate. I will highlight several appealing properties of this family of techniques, and contrast them with deep learning approaches.
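To make the matrix-learning reduction concrete, here is a minimal sketch of the classical spectral algorithm: arrange string probabilities into a Hankel matrix, take its truncated SVD, and read off the operators of a weighted automaton. The specific 2-state automaton, the alphabet {a, b}, and the prefix/suffix basis below are illustrative choices of mine, not taken from the talk; the Hankel entries are filled with exact probabilities, whereas in practice they would be empirical frequencies estimated from a corpus.

```python
import numpy as np

# An illustrative 2-state probabilistic weighted automaton over {a, b}.
alpha1 = np.array([1.0, 0.0])                       # initial weights
alpha_inf = np.array([0.2, 0.5])                    # stopping weights
A = {'a': np.array([[0.3, 0.2], [0.0, 0.0]]),
     'b': np.array([[0.0, 0.3], [0.5, 0.0]])}

def f(x):
    """String probability f(x) = alpha1^T A_{x_1} ... A_{x_k} alpha_inf."""
    v = alpha1
    for c in x:
        v = v @ A[c]
    return float(v @ alpha_inf)

# Hankel blocks over the prefix/suffix basis {'', 'a', 'b'}:
# H[p, s] = f(ps) and H_c[p, s] = f(p c s).
basis = ['', 'a', 'b']
H = np.array([[f(p + s) for s in basis] for p in basis])
Hc = {c: np.array([[f(p + c + s) for s in basis] for p in basis]) for c in 'ab'}

# Spectral recovery: truncated SVD of H, then solve for the operators.
n = 2                                               # target number of states
_, _, Vt = np.linalg.svd(H)
V = Vt[:n].T                                        # top-n right singular vectors
P = np.linalg.pinv(H @ V)
B = {c: P @ Hc[c] @ V for c in 'ab'}                # recovered operators
b1 = H[0] @ V                                       # row for the empty prefix
b_inf = P @ H[:, 0]                                 # column for the empty suffix

def f_hat(x):
    """Probability under the recovered automaton."""
    v = b1
    for c in x:
        v = v @ B[c]
    return float(v @ b_inf)

# With exact statistics the recovered automaton reproduces the distribution.
print(abs(f('abba') - f_hat('abba')) < 1e-10)
```

The recovered operators live in a different basis than the original ones, but they define the same function on strings; this basis-invariance is what lets SVD-based matrix factorization stand in for explicit state discovery.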

Finally, I will describe some lines of research on unsupervised spectral learning of natural language grammars that I will pursue in the coming years.