07/01/2021
By Olga Kovaleva

Ph.D. Candidate: Olga Kovaleva, Department of Computer Science
Date: July 7, 2021, noon
Zoom Link

Committee members:

  • Anna Rumshisky (supervisor), Associate Professor, Computer Science Department, UMass Lowell
  • Hong Yu, Professor, Computer Science Department, UMass Lowell
  • Tingjian Ge, Professor, Computer Science Department, UMass Lowell
  • Byron Wallace, Assistant Professor, Khoury College of Computer Sciences, Northeastern University

Abstract:
The recently proposed Transformer neural network architecture has revolutionized the field of Natural Language Processing (NLP). Transformer-based architectures currently give state-of-the-art performance on many NLP benchmark tasks, but little is known about the exact mechanisms that contribute to their success. This dissertation aims to address some of the gaps in our understanding of how Transformer-based models work, with a particular focus on the model that first demonstrated their success in natural language understanding: BERT.

Using a subset of natural language understanding tasks and a set of handcrafted features of interest, we propose a methodology for qualitative and quantitative analysis of the information encoded within BERT's self-attention mechanism and carry it out. Our findings suggest that a limited set of attention patterns is repeated across different model elements, indicating that the model is overparametrized. We show that manually disabling attention in certain components leads to a performance improvement over regularly fine-tuned BERT models.
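
For illustration, the sketch below shows one way to disable (prune) attention heads in a pretrained BERT model using the Hugging Face transformers library. The checkpoint name and the layer/head indices are arbitrary placeholders, not the components identified in the dissertation, and structural head pruning is only an approximation of the head-disabling experiments described above.

```python
# Minimal sketch: removing a few attention heads from a BERT model with
# Hugging Face `transformers`. The layer/head indices are hypothetical
# examples, not the heads identified in the dissertation.
from transformers import BertTokenizer, BertForSequenceClassification

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Map of layer index -> list of head indices to remove in that layer.
heads_to_disable = {0: [2, 5], 7: [0]}  # hypothetical choice
model.prune_heads(heads_to_disable)

# The pruned model is used exactly like the original one.
inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```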

Furthermore, we examine the space of hidden representations computed by BERT-like models and present a heuristic for detecting their most fragile parts. Extending our methodology to other Transformer-based models, we confirm that similar effects are observed across a wide variety of commonly used architectures.
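
As a rough illustration of this kind of analysis, the sketch below extracts the per-layer hidden representations of a BERT-like model with the Hugging Face transformers library; it shows where such representations come from, not the fragility heuristic itself.

```python
# Minimal sketch: inspecting the per-layer hidden states of a BERT-like
# model with Hugging Face `transformers`. The checkpoint is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

inputs = tokenizer("Transformers encode text layer by layer.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# `hidden_states` is a tuple: the embedding output plus one tensor per layer,
# each of shape (batch_size, sequence_length, hidden_size).
for i, layer_states in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(layer_states.shape)}")
```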