Clément Rebuffel

Quant Researcher at G-Research

Made in France 🇫🇷

Deep Learning aficionado

Natural Language Processing 📚

London, UK
clement.rebuffel@protonmail.com

Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation

QuestEval is a reference-less metric used in text-to-text tasks that compares generated summaries directly to the source text by automatically asking and answering questions. Adapting it to Data-to-Text tasks is not straightforward, as it requires multimodal Question Generation and Answering systems for the considered tasks, which are seldom available. To this end, we propose a method to build synthetic multimodal corpora that make it possible to train the multimodal components of a data-QuestEval metric.

Clément Rebuffel, Thomas Scialom, Laure Soulier, Benjamin Piwowarski, Sylvain Lamprier, Jacopo Staiano, Geoffrey Scoutheeten, Patrick Gallinari

Empirical Methods in Natural Language Processing (EMNLP 2021)

Published August 2021
Controlling Hallucinations at Word Level in Data-to-Text Generation

Data-to-Text Generation (DTG) is a subfield of Natural Language Generation that aims to transcribe structured data into natural language descriptions. The field has recently been boosted by neural generators which, on the one hand, exhibit great syntactic skills without the need for hand-crafted pipelines; on the other hand, the quality of the generated text reflects the quality of the training data, which in realistic settings offers only imperfectly aligned structure-text pairs.

Clément Rebuffel, Marco Roberti, Laure Soulier, Rossella Cancelliere, Geoffrey Scoutheeten, Patrick Gallinari

Data Mining and Knowledge Discovery (DMKD 2021)

Published January 2021
PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation

In language generation models conditioned on structured data, classical training via maximum likelihood almost always leads models to pick up on dataset divergences (i.e., hallucinations or omissions) and to erroneously reproduce them in their own generations at inference. In this work, we build on top of previous Reinforcement Learning based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is efficient at reducing both hallucinations and omissions.

Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari

International Conference on Natural Language Generation (INLG 2020)

Published December 2020
A Hierarchical Model for Data-to-Text Generation

Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as “data-to-text”. These structures generally group multiple elements, as well as their attributes. Most attempts rely on translation encoder-decoder methods, which linearize elements into a sequence. This, however, loses most of the structure contained in the data. In this work, we propose to overcome this limitation with a hierarchical model that encodes the data structure at the element level and the structure level.

Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari

European Conference on Information Retrieval (ECIR 2020)

Published April 2020
