How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent works in response generation have adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements …
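For illustration, here is a minimal sketch of the single-reference comparison the abstract refers to, scoring a generated response against one target response with sentence-level BLEU (one of the machine-translation metrics the paper examines) via NLTK. The example sentences are made up for demonstration, not taken from the paper's data.

```python
# Sketch: score a generated dialogue response against a single reference
# response using sentence-level BLEU (NLTK). Example strings are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "i am going to the movies tonight".split()
generated = "i am going to see a movie tonight".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which happens frequently for short dialogue responses.
score = sentence_bleu(
    [reference],  # a single target response, as in the setup the paper critiques
    generated,
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU against a single reference: {score:.3f}")
```

Because many distinct responses can be appropriate in open-domain dialogue, word-overlap scores like this one against a single reference are exactly the kind of metric the paper finds to correlate poorly with human judgements.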