https://www.pcla.wiki/api.php?action=feedcontributions&user=Seiyon&feedformat=atomPenn Center for Learning Analytics Wiki - User contributions [en]2024-03-29T01:59:36ZUser contributionsMediaWiki 1.37.1https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&diff=433At-risk/Dropout/Stopout/Graduation Prediction2022-12-21T23:52:12Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
* JRip decision trees achieved much lower Kappa and AUC for male students than female students<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]<br />
* Model predicting student graduation and SAT scores for military-connected students<br />
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.<br />
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.<br />
<br />
<br />
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did<br />
* J-Rip decision rules achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs<br />
<br />
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]<br />
* An algorithm predicting dropout from university after the first year<br />
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.<br />
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.<br />
<br />
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]<br />
* Predicting dropout in XuetangX platform using neural network<br />
* A range of over-sampling methods tested<br />
* Regardless of over-sampling method used, dropout performance was slightly better for males.</div>Seiyonhttps://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&diff=432At-risk/Dropout/Stopout/Graduation Prediction2022-12-21T23:51:36Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
* JRip decision trees achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]<br />
* Model predicting student graduation and SAT scores for military-connected students<br />
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.<br />
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.<br />
<br />
<br />
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did<br />
* J-Rip decision rules achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs<br />
<br />
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]<br />
* An algorithm predicting dropout from university after the first year<br />
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.<br />
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.<br />
<br />
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]<br />
* Predicting dropout in XuetangX platform using neural network<br />
* A range of over-sampling methods tested<br />
* Regardless of over-sampling method used, dropout performance was slightly better for males.</div>Seiyonhttps://www.pcla.wiki/index.php?title=Gender:_Male/Female&diff=326Gender: Male/Female2022-05-19T13:03:12Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Model predicting course outcome<br />
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups<br />
* Inconsistent in direction between algorithms.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for male students than female students<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.<br />
* The fairness of models improved when a combination of institutional and click data was used in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students<br />
* The model showed better recall for male students, especially for those studying in person<br />
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Models predicting course outcome of students in a virtual learning environment (VLE)<br />
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms<br />
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value<br />
<br />
<br />
Bridgeman et al. (2009) <br />
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=325Latino/Latina/Latinx/Hispanic Learners in North America2022-05-19T12:57:47Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person <br />
* The model showed better recall for URM students, whether they were in residential or online program<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&diff=324Black/African-American Learners in North America2022-05-19T12:57:13Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Black students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person <br />
* The model showed better recall for URM students, whether they were in residential or online program<br />
<br />
<br />
Ramineni & Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]<br />
* Revised automated scoring engine for assessing GRE essay<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE</div>Seiyonhttps://www.pcla.wiki/index.php?title=White_Learners_in_North_America&diff=323White Learners in North America2022-05-19T12:47:24Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&diff=322Black/African-American Learners in North America2022-05-19T12:47:09Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Black students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Ramineni & Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]<br />
* Revised automated scoring engine for assessing GRE essay<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=321National Origin or National Location2022-05-19T12:34:19Z<p>Seiyon: </p>
<hr />
<div><br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and gave better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) essay<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=320Automated Essay Scoring2022-05-19T12:34:06Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and gave better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) essay<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students<br />
<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=319Automated Essay Scoring2022-05-19T12:33:57Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and gave better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) <br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students<br />
<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=318National Origin or National Location2022-05-19T12:31:27Z<p>Seiyon: </p>
<hr />
<div><br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and gave better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=317Automated Essay Scoring2022-05-19T12:31:07Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and gave better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students<br />
<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=316National Origin or National Location2022-05-19T12:30:02Z<p>Seiyon: </p>
<hr />
<div><br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=315National Origin or National Location2022-05-19T12:29:42Z<p>Seiyon: </p>
<hr />
<div><br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=Asian/Asian-American_Learners_in_North_America&diff=314Asian/Asian-American Learners in North America2022-05-19T12:29:09Z<p>Seiyon: </p>
<hr />
<div>Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=313Latino/Latina/Latinx/Hispanic Learners in North America2022-05-19T12:28:33Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Native_Language_and_Dialect&diff=312Native Language and Dialect2022-05-19T12:27:20Z<p>Seiyon: </p>
<hr />
<div>Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]<br />
* Model that measures L2 learners’ lexical sophistication with the frequency list based on the native speaker corpora<br />
* Arabic-speaking learners are rated systematically lower across all levels of human-assessed English proficiency than speakers of Chinese, Japanese, Korean, and Spanish<br />
* Level 5 Arabic-speaking learners are inaccurately evaluated to have similar level of lexical sophistication as Level 4 learners from China, Japan, Korean and Spain<br />
* When used on the ETS corpus, essays by Japanese-speaking learners with higher human-rated lexical sophistication are rated significantly lower in lexical sophistication than Arabic, Japanese, Korean and Spanish peers<br />
<br />
<br />
<br />
Loukina et al. (2019) [https://aclanthology.org/W19-4401.pdf pdf]<br />
<br />
* Models providing automated speech scores on English language proficiency assessment<br />
* L1-specific model trained on the speaker’s native language was the least fair, especially for Chinese, Japanese, and Korean speakers, but not for German speakers<br />
* All models (Baseline, Fair feature subset, L1-specific) performed worse for Japanese speakers</div>Seiyonhttps://www.pcla.wiki/index.php?title=Native_Language_and_Dialect&diff=311Native Language and Dialect2022-05-19T12:26:51Z<p>Seiyon: </p>
<hr />
<div>Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]<br />
* Model that measures L2 learners’ lexical sophistication with the frequency list based on the native speaker corpora<br />
* Arabic-speaking learners are rated systematically lower across all levels of human-assessed English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.<br />
* Level 5 Arabic-speaking learners are inaccurately evaluated to have similar level of lexical sophistication as Level 4 learners from China, Japan, Korean and Spain .<br />
* When used on the ETS corpus, essays by Japanese-speaking learners with higher human-rated lexical sophistication are rated significantly lower in lexical sophistication than Arabic, Japanese, Korean and Spanish peers.<br />
<br />
<br />
<br />
Loukina et al. (2019) [https://aclanthology.org/W19-4401.pdf pdf]<br />
<br />
* Models providing automated speech scores on English language proficiency assessment<br />
* L1-specific model trained on the speaker’s native language was the least fair, especially for Chinese, Japanese, and Korean speakers, but not for German speakers<br />
* All models (Baseline, Fair feature subset, L1-specific) performed worse for Japanese speakers</div>Seiyonhttps://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&diff=310At-risk/Dropout/Stopout/Graduation Prediction2022-05-19T12:02:23Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
* JRip decision trees achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]<br />
* Model predicting student graduation and SAT scores for military-connected students<br />
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.<br />
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.<br />
<br />
<br />
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did<br />
* J-Rip decision rules achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs</div>Seiyonhttps://www.pcla.wiki/index.php?title=Socioeconomic_Status&diff=309Socioeconomic Status2022-05-19T12:00:54Z<p>Seiyon: </p>
<hr />
<div>Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&rep=rep1&type=pdf pdf]<br />
<br />
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)<br />
<br />
* Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion<br />
* Models trained on schools with low, medium proportion of SES students performed similarly well for schools with high proportions of low-SES students<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
<br />
* Models predicting undergraduate course grades and average GPA<br />
<br />
* Students from low-income households were inaccurately predicted to perform worse for both short-term (final course grade) and long-term (GPA)<br />
* Fairness of model improved if it included only clickstream and survey data<br />
<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
*Models predicting college dropout for students in residential and fully online program<br />
*Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students with greater financial needs<br />
*The model showed better recall for students with greater financial needs, especially for those studying in person</div>Seiyonhttps://www.pcla.wiki/index.php?title=Gender:_Male/Female&diff=308Gender: Male/Female2022-05-19T11:59:35Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Model predicting course outcome<br />
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups<br />
* Inconsistent in direction between algorithms.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for male students than female students<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.<br />
* The fairness of models improved when a combination of institutional and click data was used in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for male students, and worse accuracy if male students are studying online<br />
* The model showed better recall for male students, especially for those studying in person<br />
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Models predicting course outcome of students in a virtual learning environment (VLE)<br />
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms<br />
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value<br />
<br />
<br />
Bridgeman et al. (2009) <br />
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=307Latino/Latina/Latinx/Hispanic Learners in North America2022-05-19T11:57:48Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&diff=306Black/African-American Learners in North America2022-05-19T11:56:06Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Black students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Ramineni & Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]<br />
* Revised automated scoring engine for assessing GRE essay<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-rater gave significantly higher score for 11th grade essays written by Asian American and Hispanic students, particularly, Hispanic female students<br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students<br />
* E-rater gave slightly lower score for GRE essays (argument and issue) written by Black test-takers while e-rated scores were higher for Asian test-takers in the U.S<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=305National Origin or National Location2022-05-19T11:52:04Z<p>Seiyon: </p>
<hr />
<div><br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=304National Origin or National Location2022-05-19T11:51:57Z<p>Seiyon: </p>
<hr />
<div><br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=Gender:_Male/Female&diff=303Gender: Male/Female2022-05-19T11:50:50Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Model predicting course outcome<br />
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups<br />
* Inconsistent in direction between algorithms.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for male students than female students<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.<br />
* The fairness of models improved when a combination of institutional and click data was used in the model<br />
<br />
<br />
Yu and colleagues (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for male students, and worse accuracy if they are studying online<br />
* The model showed better recall for male students, especially for those studying in person<br />
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Models predicting course outcome of students in a virtual learning environment (VLE)<br />
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms<br />
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value<br />
<br />
<br />
Bridgeman et al. (2009) <br />
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=302Automated Essay Scoring2022-05-19T11:37:29Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students<br />
<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=301National Origin or National Location2022-05-19T11:37:08Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
*A later version of E-Rater system for automatic grading of GSE essay<br />
* Chinese students were given higher scores than when graded by human essay raters<br />
*Speakers of Arabic and Hindi were given lower scores<br />
<br />
<br />
<br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the same country, except for Philippines where model built on that country which was outperformed slightly by model built on U.S. data<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly higher score for students from China and South Korea than 14 other countries when assessing independent writing task in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly higher scores for GRE analytical writing, both argument and issue prompts, by students from China whose written responses tended to be the longest and below average on grammar, usage and mechanics<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=Asian/Asian-American_Learners_in_North_America&diff=300Asian/Asian-American Learners in North America2022-05-19T11:36:26Z<p>Seiyon: </p>
<hr />
<div>Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=299Latino/Latina/Latinx/Hispanic Learners in North America2022-05-19T11:35:52Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=298Latino/Latina/Latinx/Hispanic Learners in North America2022-05-19T11:35:34Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) https://arxiv.org/pdf/2007.00088.pdf pdf<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=297Latino/Latina/Latinx/Hispanic Learners in North America2022-05-19T11:35:23Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Gender:_Male/Female&diff=294Gender: Male/Female2022-05-18T21:10:24Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Model predicting course outcome<br />
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups<br />
* Inconsistent in direction between algorithms.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for male students than female students<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.<br />
* The fairness of models improved when a combination of institutional and click data was used in the model<br />
<br />
<br />
Yu and colleagues (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for male students, and worse accuracy if they are studying online<br />
* The model showed better recall for male students, especially for those studying in person<br />
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Models predicting course outcome of students in a virtual learning environment (VLE)<br />
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms<br />
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value<br />
<br />
<br />
Bridgeman et al. (2009) <br />
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Gender:_Male/Female&diff=293Gender: Male/Female2022-05-18T21:10:14Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Model predicting course outcome<br />
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups<br />
* Inconsistent in direction between algorithms.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for male students than female students<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.<br />
* The fairness of models improved when a combination of institutional and click data was used in the model<br />
<br />
<br />
Yu and colleagues (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for male students, and worse accuracy if they are studying online<br />
* The model showed better recall for male students, especially for those studying in person<br />
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Models predicting course outcome of students in a virtual learning environment (VLE)<br />
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms<br />
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value<br />
<br />
<br />
Bridgeman et al. (2009) <br />
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=292National Origin or National Location2022-05-18T21:09:32Z<p>Seiyon: </p>
<hr />
<div>Bridgeman, Trapani, and Attali (2009) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.577.7573&rep=rep1&type=pdf pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
*A later version of E-Rater system for automatic grading of GSE essay<br />
* Chinese students were given higher scores than when graded by human essay raters<br />
*Speakers of Arabic and Hindi were given lower scores<br />
<br />
<br />
<br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S.or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the dataset, except for Philippines which was outperformed slightly by U.S. model<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly higher score for students from China and South Korea than 14 other countries when assessing independent writing task in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly higher scores for GRE analytical writing, both argument and issue prompts, by students from China whose written responses tended to be the longest and below average on grammar, usage and mechanics<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=291Automated Essay Scoring2022-05-18T21:08:33Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE<br />
* E-rater gave better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) and GRE essays<br />
* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students<br />
<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&diff=290Black/African-American Learners in North America2022-05-18T20:44:35Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Black students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Ramineni & Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]<br />
* Revised automated scoring engine for assessing GRE essay<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-rater gave significantly higher score for 11th grade essays written by Asian American and Hispanic students, particularly, Hispanic female students<br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students<br />
* E-rater gave slightly lower score for GRE essays (argument and issue) written by Black test-takers while e-rated scores were higher for Asian test-takers in the U.S<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=289Automated Essay Scoring2022-05-18T20:38:38Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
*E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=288Automated Essay Scoring2022-05-18T20:38:22Z<p>Seiyon: </p>
<hr />
<div>Bridgeman et al. (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave particularly lower score for African-American, and American-Indian males, when assessing written responses to issue prompt in GRE<br />
* The score was significantly lower when e-rater was assessing GRE written responses to argument prompt by African-American test-takers, both males and females.<br />
<br />
* E-rater gave slightly higher scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing written responses to independent prompt in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-rater gave significantly higher scores for test-takers from Mainland China than from Taiwan, Korea and Japan when assessing their GRE writings which tended to be below average on grammar, usage, and mechanics but longest response<br />
<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&diff=287Black/African-American Learners in North America2022-05-18T20:37:44Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Black students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Ramineni & Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]<br />
* Revised automated scoring engine for assessing GRE essay<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-rater gave significantly higher score for 11th grade essays written by Asian American and Hispanic students, particularly, Hispanic female students<br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students<br />
* E-rater gave slightly lower score for GRE essays (argument and issue) written by Black test-takers while e-rated scores were higher for Asian test-takers in the U.S<br />
<br />
<br />
Bridgeman et al. (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE</div>Seiyonhttps://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&diff=286National Origin or National Location2022-05-18T20:25:43Z<p>Seiyon: </p>
<hr />
<div>Bridgeman, Trapani, and Attali (2009) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.577.7573&rep=rep1&type=pdf pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
*A later version of E-Rater system for automatic grading of GSE essay<br />
* Chinese students were given higher scores than when graded by human essay raters<br />
*Speakers of Arabic and Hindi were given lower scores<br />
<br />
<br />
<br />
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]<br />
<br />
* Multi-national models predicting learning gains from student's help-seeking behavior<br />
* Models built on only U.S.or combined data sets performed extremely poorly for Costa Rica<br />
* Models performed better when built on and applied for the dataset, except for Philippines which was outperformed slightly by U.S. model<br />
<br />
<br />
<br />
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]<br />
<br />
* Model predicting student achievement on the standardized examination PISA<br />
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
<br />
* Automated scoring model for evaluating English spoken responses<br />
* SpeechRater gave a significantly lower score than human raters for German<br />
* SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave significantly higher score for students from China and South Korea than 14 other countries when assessing independent writing task in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly higher scores for GRE analytical writing, both argument and issue prompts, by students from China whose written responses tended to be the longest and below average on grammar, usage and mechanics<br />
<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave slightly higher scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing written responses to independent prompt in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly lower scores for Arabic and Hindi speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-rater gave significantly higher scores for test-takers from Mainland China than from Taiwan, Korea and Japan when assessing their GRE writings which tended to be below average on grammar, usage, and mechanics but longest response</div>Seiyonhttps://www.pcla.wiki/index.php?title=Gender:_Male/Female&diff=285Gender: Male/Female2022-05-18T20:24:54Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Model predicting course outcome<br />
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups<br />
* Inconsistent in direction between algorithms.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for male students than female students<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.<br />
* The fairness of models improved when a combination of institutional and click data was used in the model<br />
<br />
<br />
Yu and colleagues (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for male students, and worse accuracy if they are studying online<br />
* The model showed better recall for male students, especially for those studying in person<br />
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model<br />
<br />
<br />
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]<br />
* Models predicting course outcome of students in a virtual learning environment (VLE)<br />
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms<br />
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value<br />
<br />
<br />
Bridgeman et al. (2009) <br />
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt</div>Seiyonhttps://www.pcla.wiki/index.php?title=Asian/Asian-American_Learners_in_North_America&diff=284Asian/Asian-American Learners in North America2022-05-18T20:20:33Z<p>Seiyon: </p>
<hr />
<div>Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=283Automated Essay Scoring2022-05-18T20:19:58Z<p>Seiyon: </p>
<hr />
<div>Bridgeman, Trapani, and Attali (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave particularly lower score for African-American, and American-Indian males, when assessing written responses to issue prompt in GRE<br />
* The score was significantly lower when e-rater was assessing GRE written responses to argument prompt by African-American test-takers, both males and females.<br />
<br />
* E-rater gave slightly higher scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing written responses to independent prompt in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-rater gave significantly higher scores for test-takers from Mainland China than from Taiwan, Korea and Japan when assessing their GRE writings which tended to be below average on grammar, usage, and mechanics but longest response<br />
<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=282Automated Essay Scoring2022-05-18T20:19:52Z<p>Seiyon: </p>
<hr />
<div>Bridgeman, Trapani, and Attali (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students<br />
* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean<br />
* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers<br />
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave particularly lower score for African-American, and American-Indian males, when assessing written responses to issue prompt in GRE<br />
* The score was significantly lower when e-rater was assessing GRE written responses to argument prompt by African-American test-takers, both males and females.<br />
<br />
* E-rater gave slightly higher scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing written responses to independent prompt in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-rater gave significantly higher scores for test-takers from Mainland China than from Taiwan, Korea and Japan when assessing their GRE writings which tended to be below average on grammar, usage, and mechanics but longest response<br />
<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=281Latino/Latina/Latinx/Hispanic Learners in North America2022-05-18T19:32:25Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater<br />
* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Automated_Essay_Scoring&diff=280Automated Essay Scoring2022-05-18T18:40:27Z<p>Seiyon: </p>
<hr />
<div>Bridgeman, Trapani, and Attali (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]<br />
<br />
* E-rater gave significantly higher score for 11th grade essays written by Asian American and Hispanic students, particularly, Hispanic female students<br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students.<br />
* E-rater gave slightly lower score for GRE essays (argument and issue) written by Black test-takers while e-rated scores were higher for Asian test-takers in the U.S<br />
<br />
* E-rater gave significantly higher score for students from China and South Korea than 14 other countries when assessing independent writing task in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly higher scores for GRE analytical writing, both argument and issue prompts, by students from China whose written responses tended to be the longest and below average on grammar, usage and mechanics<br />
<br />
* E-rater performed accurately for male and female students when assessing 11th grade English essays and independent writing task in Test of English as a Foreign Language<br />
* While feature-level score differences were identified across gender and ethnic groups (e.g. e-rater gave better scores for word length and vocabulary level but less on grammar and mechanics when grading 11th grade essays written by Asian American female students), the authors called for larger samples to confirm the findings<br />
<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]<br />
<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave particularly lower score for African-American, and American-Indian males, when assessing written responses to issue prompt in GRE<br />
* The score was significantly lower when e-rater was assessing GRE written responses to argument prompt by African-American test-takers, both males and females.<br />
<br />
* E-rater gave slightly higher scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing written responses to independent prompt in Test of English as a Foreign Language (TOEFL)<br />
* E-rater gave slightly lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL<br />
* E-rater gave significantly higher scores for test-takers from Mainland China than from Taiwan, Korea and Japan when assessing their GRE writings which tended to be below average on grammar, usage, and mechanics but longest response<br />
<br />
* The score difference between human rater and e-rater was marginal when written responses to GRE issue prompt by male and female test-takers were compared<br />
* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt<br />
<br />
<br />
<br />
Ramineni & Williamson (2018) [https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192 pdf]<br />
<br />
* Revised automated scoring engine for assessing GSE essay<br />
<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
<br />
<br />
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]<br />
*Automated scoring model for evaluating English spoken responses<br />
*SpeechRater gave a significantly lower score than human raters for German<br />
*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean</div>Seiyonhttps://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&diff=279At-risk/Dropout/Stopout/Graduation Prediction2022-05-18T14:37:22Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
* JRip decision trees achieved much lower Kappa and AUC for male students than female students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.<br />
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.<br />
<br />
<br />
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
* The decision trees showed very minor differences in AUC between female and male students<br />
<br />
<br />
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]<br />
* Model predicting MOOC dropout, specifically through slicing analysis<br />
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence<br />
<br />
<br />
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]<br />
* Model predicting student graduation and SAT scores for military-connected students<br />
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.<br />
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.<br />
<br />
<br />
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did<br />
* J-Rip decision rules achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs, especially if they are studying in person<br />
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs<br />
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs if they were studying online</div>Seiyonhttps://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&diff=278Latino/Latina/Latinx/Hispanic Learners in North America2022-05-18T14:26:27Z<p>Seiyon: </p>
<hr />
<div>Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]<br />
* Models predicting six-year college graduation<br />
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used<br />
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-rater gave significantly higher score for 11th grade essays written by Asian American and Hispanic students, particularly, Hispanic female students<br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students</div>Seiyonhttps://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&diff=277Black/African-American Learners in North America2022-05-18T14:25:36Z<p>Seiyon: </p>
<hr />
<div>Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]<br />
* Models predicting student retention in an online college program<br />
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students<br />
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students<br />
<br />
<br />
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]<br />
* Models predicting if a college student will fail in a course<br />
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse<br />
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science<br />
<br />
<br />
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]<br />
* Models predicting student's high school dropout<br />
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander.<br />
<br />
<br />
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]<br />
* Models predicting college success (or median grade or above)<br />
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)<br />
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values<br />
<br />
<br />
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]<br />
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success<br />
* Black students were inaccurately predicted to perform worse for both short-term and long-term<br />
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model<br />
<br />
<br />
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]<br />
* Models predicting college dropout for students in residential and fully online program<br />
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if they are studying in person <br />
* The model showed better recall for URM students<br />
<br />
<br />
Ramineni & Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]<br />
* Revised automated scoring engine for assessing GRE essay<br />
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts<br />
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization<br />
<br />
<br />
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]<br />
* Automated scoring models for evaluating English essays, or e-rater <br />
* E-rater gave significantly higher score for 11th grade essays written by Asian American and Hispanic students, particularly, Hispanic female students<br />
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students<br />
* E-rater gave slightly lower score for GRE essays (argument and issue) written by Black test-takers while e-rated scores were higher for Asian test-takers in the U.S<br />
<br />
<br />
Bridgeman, Trapani, and Attali (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]<br />
* A later version of automated scoring models for evaluating English essays, or e-rater<br />
* E-rater gave slightly lower scores for African-American, Hispanic, and American-Indian test-takers, particularly lower for African-American, and American-Indian males, when assessing written responses to issue prompt in GRE<br />
* The score was significantly lower when e-rater was assessing GRE written responses to argument prompt by African-American test-takers</div>Seiyon