Dolphin: A Spoken Language Proficiency Assessment System for Elementary Education

Type: Preprint

Publication Date: 2020-04-20

Citations: 21

DOI: https://doi.org/10.1145/3366423.3380018

Abstract

Spoken language proficiency is critically important for children's growth and personal development. Because educational resources in China are limited and unevenly distributed, elementary students have few chances to improve their oral language skills in class. Verbal fluency tasks (VFTs) were introduced to let students practice their spoken language after school. VFTs are simple but concrete math-related questions that ask students not only to report an answer but also to speak through their entire thinking process. Despite the great success of VFTs, they place a heavy grading burden on elementary teachers. To alleviate this problem, we develop Dolphin, a spoken language proficiency assessment system for Chinese elementary education. Dolphin automatically evaluates both the phonological fluency and the semantic relevance of students' VFT answers. We conduct a wide range of offline and online experiments to demonstrate the effectiveness of Dolphin. In our offline experiments, we show that Dolphin improves both phonological fluency and semantic relevance evaluation performance compared with state-of-the-art baselines on real-world educational data sets. In our online A/B experiments, we test Dolphin with 183 teachers from 2 major cities in China (Hangzhou and Xi'an) for 10 weeks, and the results show that the grading coverage of VFT assignments improves by 22%.

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data (2024). Qianwen Zhang, Haochen Wang, Fang Li, Siyu An, Lingfeng Qiao, Liangcai Gao, Di Yin, Xing Sun
  • Automatic Assessment of Spoken Language Proficiency of Non-native Children (2019). Roberto Gretter, Marco Matassoni, Katharina Allgaier, Svetlana Tchistiakova, Daniele Falavigna
  • ChatPRCS: A Personalized Support System for English Reading Comprehension based on ChatGPT (2024). Xizhe Wang, Yihua Zhong, Changqin Huang, Xiaodi Huang
  • Reading Miscue Detection in Primary School through Automatic Speech Recognition (2024). Lingyun Gao, Cristian Tejedor-García, Helmer Strik, Catia Cucchiarini
  • ChatPRCS: A Personalized Support System for English Reading Comprehension based on ChatGPT (2023). Xizhe Wang, Yihua Zhong, Changqin Huang, Xiaodi Huang
  • Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency (2023). Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber
  • MALAMUTE: A Multilingual, Highly-granular, Template-free, Education-based Probing Dataset (2024). Sagi Shaier, George Arthur Baker, Chiranthan Sridhar, Lawrence Hunter, Katharina von der Wense
  • NLP and Education: using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom (2024). Túlio Sousa de Gois, F. Freitas, Julián Tejada, Raquel Meister Ko. Freitag
  • Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5 (2024). Qiao Wang, Ralph L. Rose, Naho Orita, Ayaka Sugawara
  • PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models (2024). Qian Zhang, Panfeng Chen, Jiali Li, Shuo Feng, Shuyu Liu, Mei Chen, Hui Li, Yanhao Wang
  • Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU (2023). Fajri Koto, Nurul Aisyah, Haonan Li, Timothy Baldwin
  • ADPS—A Prescreening Tool for Students With Dyslexia in Learning Traditional Chinese (2024). Ka Yan Fung, Kit-Yi Tang, Tze Leung Rick Lui, Kuen Fung Sin, Lik‐Hang Lee, Huamin Qu, Shenghui Song
  • \llinstruct: An Instruction-tuned model for English Language Proficiency Assessments (2024). Debanjan Ghosh, Sophia Siu Chee Chan
  • Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates (2024). Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, Taro Watanabe
  • F-Eval: Asssessing Fundamental Abilities with Refined Evaluation Methods (2024). Yu Sun, Keyu Chen, Shujie Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin
  • COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning (2024). Yuelin Bai, Xinrun Du, Yiming Liang, Yonggang Jin, Ziqiang Liu, Junting Zhou, Tianyu Zheng, Xincheng Zhang, Nuo Ma, Zekun Wang
  • A Survey on Machine Reading Comprehension—Tasks, Evaluation Metrics and Benchmark Datasets (2020). Changchang Zeng, Shaobo Li, Qin Li, Jie Hu, Jianjun Hu
  • Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education (2024). Owen Henkel, Libby Hills, Adam Boxer, Bill Roberts, Zachary Levonian