13 January 2021

Chatbot Evaluations

  • ChatEval 
  • Acute-Eval
  • SSA 
  • NUC 
  • SASSI 
  • WER 
  • DSTC 
  • DSTC2 
  • BLEU 
  • PARADISE
  • QoE
  • IQ
  • Perplexity
  • F1
  • Hits@k
  • Average Utterance Length
  • Ratio of Rare Words
  • Number of Repetitions
  • Number of System Questions
  • Comprehensible
  • Interesting
  • Topical Relevance
  • Response Incorrectness
  • Conversation Continuity
  • Engagement
  • Conversational Depth
  • Coherence
  • Domain Coverage
  • Conversational Diversity and Breadth
  • Naturalness
  • Informativeness
  • Unification Quality
  • User Satisfaction (Questionnaires)
  • User Simulations
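
A few of the automatic metrics listed above (Average Utterance Length, F1, Hits@k, Perplexity) are simple enough to sketch in a few lines. The sketch below is illustrative only. It assumes whitespace tokenization, word-overlap F1 between a response and a reference, Hits@k over a single ranked candidate list, and perplexity computed from per-token natural-log probabilities. The function names are hypothetical and the definitions are common ones, not necessarily those used by any particular benchmark named in this list.

```python
import math
from collections import Counter


def average_utterance_length(utterances):
    """Mean number of whitespace-separated tokens per utterance (assumed tokenization)."""
    if not utterances:
        return 0.0
    return sum(len(u.split()) for u in utterances) / len(utterances)


def token_f1(prediction, reference):
    """Word-overlap F1 between a generated response and a reference response."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def hits_at_k(ranked_candidates, gold, k):
    """1.0 if the gold response is among the top-k ranked candidates, else 0.0."""
    return 1.0 if gold in ranked_candidates[:k] else 0.0


def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

In practice each score would be averaged over a whole test set; for example, Hits@k is usually reported as the mean of `hits_at_k` across all evaluation dialogues.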