The EMNLP 2020 Best Paper Award has been announced!

Angus Tay
Nov 19, 2020 · 7 min read
(Credit to NASA, source: https://2020.emnlp.org/)

EMNLP 2020 is underway and the Best Paper Award has been announced! This year the award goes to David Gaddy and Dan Klein from the University of California, Berkeley, for their paper Digital Voicing of Silent Speech.

Abstract: In this paper, we consider the task of digitally voicing silent speech, where silently mouthed words are converted to audible speech based on electromyography (EMG) sensor measurements that capture muscle impulses. While prior work has focused on training speech synthesis models from EMG collected during vocalized speech, we are the first to train from EMG collected during silently articulated speech. We introduce a method of training on silent EMG by transferring audio targets from vocalized to silent signals. Our method greatly improves the intelligibility of audio generated from silent EMG compared to a baseline that only trains with vocalized data, decreasing transcription word error rate from 64% to 4% in one data condition and 88% to 68% in another. To spur further development on this task, we share our new dataset of silent and vocalized facial EMG measurements.

Other awards include the Honourable Mentions (Best Paper) and the Best Demo Paper Award.

Honourable Mention (Best Paper):

1. Visually Grounded Compound PCFGs, by Yanpeng Zhao and Ivan Titov from Institute for Logic, Language and Computation, University of Edinburgh; ILLC, University of Amsterdam

Abstract: Exploiting visual groundings for language understanding has recently been drawing much attention. In this work, we study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual groundings. Existing work on this task (Shi et al., 2019) optimizes a parser via REINFORCE and derives the learning signal only from the alignment of images and sentences. While their model is relatively accurate overall, its error distribution is very uneven, with low performance on certain constituent types (e.g., 26.2% recall on verb phrases, VPs) and high on others (e.g., 79.6% recall on noun phrases, NPs). This is not surprising as the learning signal is likely insufficient for deriving all aspects of phrase-structure syntax and gradient estimates are noisy. We show that using an extension of the probabilistic context-free grammar model we can do fully-differentiable end-to-end visually grounded learning. Additionally, this enables us to complement the image-text alignment loss with a language modeling objective. On the MSCOCO test captions, our model establishes a new state of the art, outperforming its non-grounded version and, thus, confirming the effectiveness of visual groundings in constituency grammar induction. It also substantially outperforms the previous grounded model, with largest improvements on more 'abstract' categories (e.g., +55.1% recall on VPs).

2. Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems, by Jan Deriu et al. from Zurich University of Applied Sciences, National Distance Education University, University of the Basque Country, Synapse Developpement

Abstract: The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate for each entity in a conversation whether they think it is human or not (assuming there are human participants in these conversations). These annotations then allow us to rank chatbots regarding their ability to mimic the conversational behavior of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chatbot can uphold human-like behavior the longest, i.e., Survival Analysis. This metric has the ability to correlate a bot's performance to certain of its characteristics (e.g., fluency or sensibleness), yielding interpretable results. The comparably low cost of our framework allows for frequent evaluations of chatbots during their evaluation cycle. We empirically validate our claims by applying Spot The Bot to three domains, evaluating several state-of-the-art chatbots, and drawing comparisons to related work. The framework is released as a ready-to-use tool.
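
If you are curious what the survival-analysis metric looks like in practice, here is a rough, generic sketch: given, for each conversation, the turn at which a bot was spotted (or the conversation length if it never was), a Kaplan-Meier estimator gives the probability that a bot is still passing as human after t turns. This is not the authors' code; it uses the lifelines library and invented toy numbers purely for illustration.

from lifelines import KaplanMeierFitter

# Toy data, invented for illustration: for each conversation, the turn at which
# the bot was identified, and whether it was identified at all before the
# conversation ended (1) or not (0 = censored observation).
turns_until_spotted = [2, 3, 3, 5, 6, 6, 8, 10]
spotted = [1, 1, 1, 1, 0, 1, 1, 0]

kmf = KaplanMeierFitter()
kmf.fit(turns_until_spotted, event_observed=spotted, label="bot_A")
print(kmf.survival_function_)  # estimated probability of still passing as human after t turns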

3. GLUCOSE: GeneraLized and COntextualized Story Explanations, by Nasrin Mostafazadeh et al. from Elemental Cognition

Abstract: When humans read or listen, they make implicit commonsense inferences that frame their understanding of what happened and why. As a step toward AI systems that can build similar mental models, we introduce GLUCOSE, a large-scale dataset of implicit commonsense causal knowledge, encoded as causal mini-theories about the world, each grounded in a narrative context. To construct GLUCOSE, we drew on cognitive psychology to identify ten dimensions of causal explanation, focusing on events, states, motivations, and emotions. Each GLUCOSE entry includes a story-specific causal statement paired with an inference rule generalized from the statement. This paper details two concrete contributions. First, we present our platform for effectively crowdsourcing GLUCOSE data at scale, which uses semi-structured templates to elicit causal explanations. Using this platform, we collected a total of ~670K specific statements and general rules that capture implicit commonsense knowledge about everyday situations. Second, we show that existing knowledge resources and pretrained language models do not include or readily predict GLUCOSE’s rich inferential content. However, when state-of-the-art neural models are trained on this knowledge, they can start to make commonsense inferences on unseen stories that match humans’ mental models.

4. If Beam Search is the Answer, What was the Question? by Clara Meister et al. from ETH Zurich, Johns Hopkins University, University of Cambridge

Abstract: Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Rather, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which merits the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.
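
For readers who want a concrete picture of the decoding procedure the paper analyzes, here is a minimal, generic beam search sketch. It is not the authors' implementation; log_prob_fn is a stand-in for any model that returns next-token log-probabilities given a prefix.

from typing import Callable, Dict, List, Tuple

def beam_search(
    log_prob_fn: Callable[[List[str]], Dict[str, float]],
    beam_size: int = 5,
    max_len: int = 20,
    eos: str = "</s>",
) -> List[str]:
    """Return the highest-scoring hypothesis found by beam search."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]  # (prefix, cumulative log-prob)
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        # Expand every active hypothesis by every candidate next token.
        candidates = [
            (prefix + [token], score + lp)
            for prefix, score in beams
            for token, lp in log_prob_fn(prefix).items()
        ]
        # Keep only the top-k hypotheses; this pruning is the source of the
        # "search errors" relative to exact MAP decoding discussed in the paper.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break
    best_prefix, _ = max(finished or beams, key=lambda c: c[1])
    return best_prefix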

Best Demo Paper Award:

Transformers: State-of-the-Art Natural Language Processing, by Thomas Wolf et al. from Hugging Face

Github: https://github.com/huggingface/transformers

Playground to test: https://transformer.huggingface.co/

Abstract: Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments.
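
If you want to try the library right away, the pipeline API is the quickest entry point. A minimal sketch (the tasks and model name below are common examples I picked, not something prescribed by the paper):

from transformers import pipeline

# Downloads a default pretrained model and tokenizer on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("EMNLP 2020 announced some excellent papers this year."))
# -> a list like [{'label': 'POSITIVE', 'score': ...}]

# The same one-liner style works for other tasks, e.g. masked language modeling.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The capital of France is [MASK]."))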

If you’re interested in the demo paper, I think last year’s paper would grab your attention too: AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

Website: https://allennlp.org/interpret

Abstract: Despite constant advances and seemingly super-human performance on constrained domains, state-of-the-art models for NLP are imperfect. These imperfections, coupled with today’s advances being driven by (seemingly black-box) neural models, leave researchers and practitioners scratching their heads asking, why did my model make this prediction?

We present AllenNLP Interpret, a toolkit built on top of AllenNLP for interactive model interpretations. The toolkit makes it easy to apply gradient-based saliency maps and adversarial attacks to new models, as well as develop new interpretation methods. AllenNLP Interpret contains three components: a suite of interpretation techniques applicable to most models, APIs for developing new interpretation methods (e.g., APIs to obtain input gradients), and reusable front-end components for visualizing the interpretation results.
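
As a rough sketch of how those interpretation APIs are typically called (the model archive path and input key below are placeholders, and module paths can differ across AllenNLP versions, so treat this as illustrative rather than a verified recipe):

from allennlp.predictors import Predictor
from allennlp.interpret.saliency_interpreters import SimpleGradient

# Placeholder path; substitute any AllenNLP model archive you have trained or downloaded.
predictor = Predictor.from_path("path/to/model.tar.gz")

# Gradient-based saliency: roughly, how much each input token influenced the prediction.
interpreter = SimpleGradient(predictor)
saliency = interpreter.saliency_interpret_from_json(
    {"sentence": "a very well-made, funny and entertaining picture."}
)
print(saliency)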

The website presents links to:

  • The paper describing the framework, the technical implementation details, and some example use cases.
  • Live demos for various models and tasks, such as:
    • Masked Language Modeling using BERT, to explain why it made certain mask predictions.
    • NER using an LSTM-CRF model based on ELMo.

Citation:

@inproceedings{Wallace2019AllenNLP,
  author    = {Eric Wallace and Jens Tuyls and Junlin Wang and Sanjay Subramanian and Matt Gardner and Sameer Singh},
  booktitle = {Empirical Methods in Natural Language Processing},
  year      = {2019},
  title     = {{AllenNLP Interpret}: A Framework for Explaining Predictions of {NLP} Models}
}
