Gender: Male/Female

From Penn Center for Learning Analytics Wiki

Revision as of 16:22, 31 August 2022

Kai et al. (2017) pdf

  • Models predicting student retention in an online college program
  • J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students
  • JRip decision rules achieved much lower Kappa and AUC for male students than female students
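
Kappa is computed from thresholded predictions, while AUC is computed from raw scores, so the two metrics can favor different groups for the same model (as with the J48 result above). A minimal per-group sketch in plain Python, assuming binary labels (1 = retained), scores in [0, 1], and a hypothetical 0.5 cut-off. The function names are illustrative and are not taken from Kai et al.:

```python
import math


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length lists of labels."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    # Agreement expected by chance from each list's label frequencies.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1:
        return math.nan  # kappa is undefined when chance agreement is perfect
    return (observed - expected) / (1 - expected)


def auc(y_true, scores):
    """ROC AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def metrics_by_group(y_true, scores, groups, threshold=0.5):
    """Kappa (needs a threshold) and AUC (threshold-free) computed within each group."""
    results = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        yt = [y_true[i] for i in idx]
        sc = [scores[i] for i in idx]
        yp = [1 if s >= threshold else 0 for s in sc]
        results[g] = {"kappa": cohen_kappa(yt, yp), "auc": auc(yt, sc)}
    return results
```

Because Kappa depends on the chosen threshold, a different cut-off can change Kappa for a group while leaving its AUC untouched.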


Christie et al. (2019) pdf

  • Models predicting students' high school dropout
  • The decision trees showed very minor differences in AUC between female and male students


Hu and Rangwala (2020) pdf

  • Models predicting if a college student will fail in a course
  • The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course.
  • Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.


Anderson et al. (2019) pdf

  • Models predicting six-year college graduation
  • False negative rates were higher for male students than for female students when SVM, Logistic Regression, and SGD were used
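
Assuming graduation is coded as the positive class (1), a false negative is a student the model predicts will not graduate who in fact does. A short sketch of computing the false negative rate separately for each group; the function names are hypothetical:

```python
import math


def false_negative_rate(y_true, y_pred):
    """Share of actual positives (label 1) that the model predicted as 0."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return math.nan  # undefined when a group has no actual positives
    return sum(1 for p in preds_for_positives if p == 0) / len(preds_for_positives)


def fnr_by_group(y_true, y_pred, groups):
    """False negative rate computed within each group."""
    out = {}
    for g in sorted(set(groups)):
        yt = [t for t, x in zip(y_true, groups) if x == g]
        yp = [p for p, x in zip(y_pred, groups) if x == g]
        out[g] = false_negative_rate(yt, yp)
    return out
```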


Gardner, Brooks and Baker (2019) pdf

  • Models predicting MOOC dropout, evaluated using slicing analysis
  • Some algorithms performed worse for female students than for male students, particularly in courses where 45% or fewer of the students were male
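
The slicing analysis of Gardner, Brooks and Baker (2019) compares groups using ABROCA (absolute between-ROC area), the area between two groups' ROC curves: ABROCA = integral from 0 to 1 of |ROC_a(t) - ROC_b(t)| dt, where ROC_g(t) is group g's true positive rate at false positive rate t. Unlike a difference in AUC, regions where the curves cross cannot cancel out. The plain-Python sketch below interpolates each group's ROC curve linearly on a grid of false positive rates and integrates with the trapezoidal rule. It is an illustration, not the authors' implementation, and the function names are hypothetical:

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points, sweeping the threshold from the highest score down."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= th)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= th)
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(fpr, points):
    """Linearly interpolate the TPR at a given FPR, skipping vertical segments."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= fpr <= x1:
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    return points[-1][1]


def abroca(y_true, scores, groups, group_a, group_b, n_grid=10001):
    """Absolute area between the ROC curves of two groups."""
    curves = []
    for g in (group_a, group_b):
        yt = [t for t, x in zip(y_true, groups) if x == g]
        sc = [s for s, x in zip(scores, groups) if x == g]
        curves.append(roc_points(yt, sc))
    step = 1.0 / (n_grid - 1)
    diffs = [abs(tpr_at(i * step, curves[0]) - tpr_at(i * step, curves[1]))
             for i in range(n_grid)]
    # Trapezoidal rule over the evenly spaced FPR grid.
    return sum((diffs[i] + diffs[i + 1]) / 2 * step for i in range(n_grid - 1))
```

An ABROCA of 0 means the two groups' ROC curves coincide; larger values indicate a larger gap in ranking quality between the groups, regardless of which group is favored.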


Riazy et al. (2020) pdf

  • Models predicting course outcome
  • Marginal differences between groups were found in prediction quality and in the overall proportion of students predicted to pass
  • These differences were inconsistent in direction across algorithms.


Lee and Kizilcec (2020) pdf

  • Models predicting college success (defined as earning the median grade or above)
  • Random forest algorithms performed significantly worse for male students than female students
  • Both the model's fairness (measured by demographic parity and equality of opportunity) and its accuracy improved after the decision threshold was changed from 0.5 to group-specific values
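
Demographic parity asks that each group receive positive predictions at the same rate; equality of opportunity asks that each group have the same true positive rate. The sketch below measures both gaps under a threshold per group. As one possible way of choosing group-specific thresholds (an assumption, since the procedure Lee and Kizilcec used is not described here), it picks for each group the observed score that maximizes accuracy within that group. All function names are hypothetical:

```python
def rates(y_true, scores, threshold):
    """Positive-prediction rate and true positive rate at one threshold.

    Assumes the group contains at least one actual positive.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    positive_rate = sum(preds) / len(preds)
    preds_for_positives = [p for t, p in zip(y_true, preds) if t == 1]
    tpr = sum(preds_for_positives) / len(preds_for_positives)
    return positive_rate, tpr


def split(values, groups, g):
    """Values belonging to group g."""
    return [v for v, x in zip(values, groups) if x == g]


def fairness_gaps(y_true, scores, groups, thresholds):
    """Max-minus-min gaps across groups in positive rate (demographic parity)
    and in TPR (equality of opportunity), given a threshold for each group."""
    pos_rates, tprs = [], []
    for g in sorted(set(groups)):
        p, t = rates(split(y_true, groups, g), split(scores, groups, g), thresholds[g])
        pos_rates.append(p)
        tprs.append(t)
    return max(pos_rates) - min(pos_rates), max(tprs) - min(tprs)


def best_threshold(y_true, scores):
    """Observed score that gives the highest accuracy when used as the cut-off."""
    def accuracy(th):
        return sum((s >= th) == (t == 1) for t, s in zip(y_true, scores)) / len(scores)
    # Ties go to the highest threshold, since candidates are sorted high to low.
    return max(sorted(set(scores), reverse=True), key=accuracy)
```

On a toy data set where one group's scores sit systematically lower, a single 0.5 cut-off can leave that group with no positive predictions, while per-group thresholds chosen this way can close both gaps. That outcome is specific to the toy data and is not guaranteed in general.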


Yu et al. (2020) pdf

  • Model predicting undergraduate short-term (course grades) and long-term (average GPA) success
  • Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.
  • The fairness of models improved when a combination of institutional and click data was used in the model


Yu et al. (2021) pdf

  • Models predicting college dropout for students in residential and fully online programs
  • Whether or not socio-demographic information was included, the model showed lower true negative rates and lower accuracy for male students
  • The model showed better recall for male students, especially for those studying in person
  • The differences in recall and true negative rates were smaller, indicating greater fairness, for male students studying online when their socio-demographic information was not included in the model


Riazy et al. (2020) pdf

  • Models predicting course outcome of students in a virtual learning environment (VLE)
  • More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms
  • Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value
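
The summary does not say which two variables the normalized mutual information was computed between. Assuming it was between the model's predicted outcome and student gender (so that lower values mean the predictions carry less information about gender), a sketch using natural logarithms and arithmetic-mean normalization, which is also scikit-learn's current default for `normalized_mutual_info_score`. The function names are hypothetical:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length label lists."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(a, b):
    """MI divided by the mean of the two entropies.

    Returns 0.0 when both lists are constant (a convention chosen for this sketch).
    """
    denom = (entropy(a) + entropy(b)) / 2
    return mutual_information(a, b) / denom if denom > 0 else 0.0
```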


Bridgeman et al. (2009) pdf

  • Automated scoring models for evaluating English essays (e-rater)
  • The e-rater system performed with comparable accuracy for male and female students when assessing their 11th-grade essays


Bridgeman et al. (2012) pdf

  • A later version of the automated scoring models for evaluating English essays (e-rater)
  • The e-rater system's scores correlated comparably well with human raters' scores for TOEFL and GRE essays written by male and female students
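
One way to check whether an automated scorer agrees with human raters equally well across groups is to compute the machine-human correlation separately within each group. A minimal sketch using Pearson correlation; the exact agreement statistics Bridgeman et al. reported are not listed here, and the function names are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_group(machine, human, groups):
    """Machine-human score correlation within each group."""
    out = {}
    for g in sorted(set(groups)):
        m = [v for v, x in zip(machine, groups) if x == g]
        h = [v for v, x in zip(human, groups) if x == g]
        out[g] = pearson(m, h)
    return out
```

Comparable per-group correlations suggest the scorer ranks essays similarly relative to human raters for each group. Correlation does not detect a constant offset, though, so mean score differences are usually checked as well.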


Verdugo et al. (2022) pdf

  • Algorithms predicting dropout from university after the first year
  • Several algorithms achieved better AUC for male than female students; results were mixed for F1.


Zhang et al. (in press)

  • Detectors of student use of self-regulated learning (SRL) in mathematical problem-solving processes
  • For each SRL-related detector, relatively small differences in AUC were observed across gender groups.
  • No gender group consistently had the best-performing detectors


Rzepka et al. (2022) pdf

  • Models predicting whether a student will quit a spelling learning activity without completing it
  • Multiple algorithms had slightly better false positive rates and AUC ROC for male students than for female students, but equivalent performance on several other metrics.


Li, Xing, & Leite (2022) pdf

  • Models predicting whether two students will communicate on an online discussion forum
  • Multiple fairness approaches led to ABROCA under 0.01 for female versus male students


Sha et al. (2021) pdf

  • Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant
  • Some algorithms achieved ABROCA under 0.01 for female versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06

  • Balancing the size of each group in the training set reduced ABROCA
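
Group sizes can be balanced in several ways. The simplest is random oversampling, which duplicates randomly chosen training rows from smaller groups until every group matches the size of the largest one. This is only one possible approach and not necessarily the one Sha et al. used; the function name and row format are hypothetical:

```python
import random
from collections import defaultdict


def oversample_groups(rows, group_of, seed=0):
    """Return a new list in which every group has as many rows as the largest group.

    rows: list of training examples; group_of: function mapping a row to its group.
    Extra rows are drawn with replacement from each group's own rows.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[group_of(row)].append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```

Resampling like this is normally applied to the training set only, so that evaluation still reflects the real group proportions.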


Litman et al. (2021) html

  • Automated essay scoring models inferring text evidence usage
  • For all algorithms studied, whether a student was female or male explained less than 1% of the error
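
One way to read "percent of error explained" is as the share of variance in each essay's scoring error that is accounted for by a group indicator. For a single categorical predictor, this equals the R² of regressing the error on that indicator (eta squared). The sketch assumes the error is the absolute difference between the predicted and human-assigned score. Litman et al.'s exact formulation may differ, and the function name is hypothetical:

```python
def variance_explained_by_group(errors, groups):
    """Eta squared: between-group sum of squares divided by total sum of squares."""
    grand = sum(errors) / len(errors)
    ss_total = sum((e - grand) ** 2 for e in errors)
    if ss_total == 0:
        return 0.0  # no variation in error, so nothing to explain
    ss_between = 0.0
    for g in set(groups):
        members = [e for e, x in zip(errors, groups) if x == g]
        mean_g = sum(members) / len(members)
        ss_between += len(members) * (mean_g - grand) ** 2
    return ss_between / ss_total
```

Under this reading, "less than 1% of error explained" corresponds to a value below 0.01.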

Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]

  • Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)
  • A range of over-sampling methods was tested
  • Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.