11/04/2022
By Shree Thatte

The Richard A. Miner School of Computer & Information Sciences invites you to attend a master’s thesis defense by Shree Thatte on “Transformer-based Program Synthesis Through Abstract Syntax Trees.”

Candidate Name: Shree Thatte
Degree: Master’s
Date: Wednesday, Nov. 16, 2022
Time: 1 to 2 p.m.
Location: Via Zoom

Thesis/Dissertation Title: Transformer-based Program Synthesis Through Abstract Syntax Trees

Committee Members:

  • Tingjian Ge (advisor), Computer Science Department, University of Massachusetts Lowell
  • Anna Rumshisky, Computer Science Department, University of Massachusetts Lowell
  • Cindy Chen, Computer Science Department, University of Massachusetts Lowell
  • Ruizhe Ma, Computer Science Department, University of Massachusetts Lowell

Brief Abstract:

Program synthesis is the task of training a model to generate code that satisfies user requirements given as a text description or as test cases. Transformer-based deep learning models have driven many recent advances in program synthesis. Through experiments, we study how effectively pre-trained Transformer models generate programming language code for a problem stated in English, and how Abstract Syntax Trees (ASTs) affect model performance. We first use T5 and then T5-code pre-trained models, and find that T5-code with its corresponding tokenizer provides much better results than T5. We conduct experiments to generate Java and Python code from English-language prompts, and then to generate Python ASTs from English-language prompts. We find that T5-code fine-tuned on English-Java text is able to synthesize Java functions for simple prompts. We also find that fine-tuning on English-Python AST data provides a much better BLEU score (63) than fine-tuning on English-Python data (30). We further convert the model-generated Python code to ASTs and check whether the BLEU score improves, but it does not beat the BLEU score of the model-generated ASTs. Nonetheless, we have also found some limitations of generating the AST directly.
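
To make the code-to-AST conversion step mentioned in the abstract concrete, here is a minimal sketch using Python's standard `ast` module. The abstract does not say how the thesis serializes ASTs, so the helper name `code_to_ast_string` and the use of `ast.dump` as the linearized text form are assumptions for illustration only.

```python
import ast
from typing import Optional


def code_to_ast_string(source: str) -> Optional[str]:
    """Return a linearized AST for Python source, or None if it does not parse.

    Hypothetical helper: ast.dump is one possible text form of a tree,
    not necessarily the format used in the thesis.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Generated code is not guaranteed to be valid Python.
        return None
    return ast.dump(tree)


snippet = "def add(a, b):\n    return a + b\n"
ast_text = code_to_ast_string(snippet)
print(ast_text)

# Source that fails to parse yields None instead of raising.
print(code_to_ast_string("def broken(:"))
```

A string produced this way can be compared against a reference AST string with the same text metric used for code, which is the kind of comparison the abstract describes.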