12 May 2020

Text Production Datasets

Data-to-Text Generation
  • WikiBio
  • WikiNLG
  • SBNation
  • RotoWire
MR-to-Text Generation (Meaning Representation)
  • SR'18
  • E2E
Text-to-Text Generation
  • Summarization
    • DUC 2001-2005
    • CNN
    • DailyMail
    • NYTimes
    • NewsRoom
    • XSum
  • Simplification
    • PWKP
    • WikiLarge
    • Newsela
  • Compression
    • Gigaword
    • Automatically Created Extractive Sentence/Compression Pairs
    • MASC
    • Multi-Reference Corpus for Abstractive Compression
    • Cohn and Lapata's Corpus
  • Paraphrasing
    • MSRP
    • PIT-2015
    • Twitter News URL Corpus
    • ParaNMT-80
    • ParaNMT-50M
    • MTC
    • PPDB