LongWanjuan: Towards Systematic Measurement for Long Text Quality

The quality of training data is crucial for enhancing the long-text capabilities of foundation models. Despite existing efforts to refine data quality through heuristic rules and evaluations based on data diversity and difficulty, there is a lack of systematic approaches specifically tailored for assessing long texts. Addressing this gap, our work …