Learning to Converse with Noisy Data: Generation with Calibration

Published in IJCAI, 2018

Authors: Mingyue Shang, Zhenxin Fu, Nanyun Peng, Yansong Feng, Dongyan Zhao, Rui Yan

Abstract: The availability of abundant conversational data on the Internet brought prosperity to the generationbased open domain conversation systems. In the training of the generation models, existing methods generally treat all the training data equivalently. However, the data crawled from the websites may contain many noises. Blindly training with the noisy data could harm the performance of the final generation model. In this paper, we propose a generation with calibration framework, that allows high quality data to have more influences on the generation model and reduces the effect of noisy data. Specifically, for each instance in training set, we employ a calibration network to produce a quality score for it, then the score is used for the weighted update of the generation model parameters. Experiments show that the calibrated model outperforms baseline methods on both automatic evaluation metrics and human annotations.

Download paper here