02/08/2024
By Yicheng Sun
The Kennedy College of Sciences, Miner School of Computer & Information Sciences, invites you to attend a doctoral dissertation defense by Yicheng Sun on "Automatic WordNet Construction and Its Application in Generating Distractors for Cloze Questions."
Ph.D. Candidate: Yicheng Sun
Date: Monday, Feb. 12, 2024
Time: 10 a.m. Eastern Time
Location: This will be a virtual defense via Zoom. Meeting ID: 960 371 7060
Committee Members:
- Jie Wang (advisor), Professor, Department of Computer Science
- Tingjian Ge (member), Professor, Department of Computer Science
- Li Feng (member), Instructional Design Manager, The TJX Companies
Abstract:
We study how to automatically generate cloze questions from given texts to help assess reading comprehension, where a cloze question consists of a stem with a blank placeholder for the answer key, together with the correct answer key and a few distractors meant to create confusion. We present a generative method called CQG (Cloze Question Generator) for constructing cloze questions from a given article, using neural networks and WordNet, with an emphasis on generating multi-gram distractors. WordNet is a hypernym-hyponym network of synsets, where each synset is a set of lemmas sharing the same gloss labeled by a lexname. CQG harnesses word-sense disambiguation, text-to-text transformation, and WordNet’s synset taxonomies to select an answer key from a sentence, segment the answer key into instances, and generate instance-level distractor candidates (IDCs) using a transformer and sibling synsets. After ranking the IDCs based on contextual embedding similarities and on synset and lexical relatedness, CQG forms distractor candidates and checks whether they align with people's writing conventions to determine whether they can serve as distractors. CQG significantly outperforms state-of-the-art (SOTA) results, as confirmed by the high quality of the generated distractors assessed by human judges.
CQG, however, is confined by WordNet's limited vocabulary and therefore fails to generate cloze questions for answer keys whose lemma forms are not included in WordNet. There are tens of thousands of new lemmas that are not yet included in WordNet. It is therefore desirable to construct an automated system that can add new lemmas to WordNet when their glosses, which can be readily obtained from Wiktionary and other open sources, are the only information available. This task is challenging. We tackle it by devising a system called WordNeter that, given a new lemma-gloss pair, predicts a lexname for the given gloss, determines whether a new synset should be formed for the lemma, predicts a hypernym for the new synset, and updates the existing hypernym-hyponym relations in WordNet. We show that WordNeter achieves a 94.6% F1 score on lexname predictions for given glosses and 64.8% exact matches of the predicted direct hypernyms with the true direct hypernyms, which significantly outperforms GPT-3.5-Turbo, whether used directly or fine-tuned, and other fine-tuned models. We note that even without exact matches of hypernym predictions, most predicted hypernyms are still helpful for generating high-quality distractors. Integrating WordNeter with CQG greatly expands CQG’s ability to generate satisfactory distractors for cloze questions with answer keys outside WordNet's current vocabulary, advancing the methodology of cloze question generation.
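For readers unfamiliar with sibling synsets, the idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the CQG or WordNeter implementation: the toy taxonomy, the synset names, and the functions `sibling_synsets` and `add_synset` are hypothetical and assume only that each synset has a single direct hypernym. Siblings are the synsets that share that hypernym, and attaching a new synset under a predicted hypernym immediately gives it siblings that could serve as distractor candidates.

```python
from typing import Dict, List

# Hypothetical toy taxonomy (not real WordNet data):
# each synset name maps to its single direct hypernym.
HYPERNYM_OF: Dict[str, str] = {
    "dog": "canine",
    "wolf": "canine",
    "fox": "canine",
    "cat": "feline",
    "canine": "carnivore",
    "feline": "carnivore",
}


def sibling_synsets(synset: str, hypernym_of: Dict[str, str]) -> List[str]:
    """Return the synsets that share a direct hypernym with `synset`."""
    parent = hypernym_of.get(synset)
    if parent is None:
        # Root or unknown synset: no hypernym, hence no siblings.
        return []
    return sorted(
        name
        for name, hypernym in hypernym_of.items()
        if hypernym == parent and name != synset
    )


def add_synset(
    synset: str, predicted_hypernym: str, hypernym_of: Dict[str, str]
) -> Dict[str, str]:
    """Return a copy of the taxonomy with `synset` attached under a hypernym."""
    updated = dict(hypernym_of)
    updated[synset] = predicted_hypernym
    return updated


if __name__ == "__main__":
    # Siblings of an existing synset are candidate distractors.
    print(sibling_synsets("dog", HYPERNYM_OF))
    # A new lemma placed under a predicted hypernym gains siblings too.
    extended = add_synset("dingo", "canine", HYPERNYM_OF)
    print(sibling_synsets("dingo", extended))
```

In this sketch, a predicted hypernym that is not an exact match can still place the new synset near semantically related siblings, which is consistent with the abstract's remark that inexact hypernym predictions remain helpful for generating distractors.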