Difference between revisions of "Speech Recognition for Education"

From Penn Center for Learning Analytics Wiki
(correction)
 
(2 intermediate revisions by one other user not shown)
Line 1: Line 1:
Wang et al. (2018) [[https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]]
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]
*Automated scoring model for evaluating English spoken responses
*Automated scoring model for evaluating English spoken responses
*SpeechRater gave a significantly lower score than human raters for German
*SpeechRater gave a significantly lower score than human raters for German students
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean
*SpeechRater gave higher scores to students from China than human raters, with H1-rater scores higher than mean




  Loukina & Buzick (2017) [[https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]]
  Loukina & Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]
*a model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments
*a model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments
*SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ<sup>2</sup> = .57) than test takers who were given accommodations for documented disabilities (ρ<sup>2</sup> = .73)
*SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ<sup>2</sup> = .57) than test takers who were given accommodations for documented disabilities (ρ<sup>2</sup> = .73)
Loukina et al. (2019) [https://aclanthology.org/W19-4401.pdf pdf]
*Models providing automated speech scores on English language proficiency assessment
*L1-specific model trained on the speaker’s native language was the least fair, especially for Chinese, Japanese, and Korean speakers, but not for German speakers
*All models (Baseline, Fair feature subset, L1-specific) performed worse for Japanese speakers

Latest revision as of 06:09, 10 June 2022

Wang et al. (2018) pdf

  • Automated scoring model for evaluating English spoken responses
  • SpeechRater gave German students significantly lower scores than human raters did
  • SpeechRater gave students from China higher scores than human raters did, with H1-rater scores higher than mean


  Loukina & Buzick (2017) pdf

  • A model (SpeechRater) that automatically scores open-ended spoken responses, applied to speakers with documented or suspected speech impairments
  • SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ² = .57) than for test takers who were given accommodations for documented disabilities (ρ² = .73)


Loukina et al. (2019) pdf

  • Models providing automated speech scores on English language proficiency assessment
  • L1-specific model trained on the speaker’s native language was the least fair, especially for Chinese, Japanese, and Korean speakers, but not for German speakers
  • All models (Baseline, Fair feature subset, L1-specific) performed worse for Japanese speakers