Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
- Paper
- Sep 30, 2021
- #ArtificialIntelligence #ComputerScience
Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available...