Gender: Male/Female

From Penn Center for Learning Analytics Wiki
Revision as of 16:22, 31 August 2022

Kai et al. (2017) pdf

  • Models predicting student retention in an online college program
  • J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students
  • JRip decision rules achieved much lower Kappa and AUC for male students than female students
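The Kai et al. (2017) entry compares Cohen's kappa between male and female students. The sketch below shows one way to compute kappa separately for each group. It assumes labels and predictions are stored in plain Python lists. The helper names `cohen_kappa` and `kappa_by_group` are illustrative and do not come from the cited paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what the label frequencies predict by chance."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the two marginal frequencies, summed over labels.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # a single label in both lists: kappa is undefined
    return (observed - expected) / (1.0 - expected)


def kappa_by_group(y_true, y_pred, groups):
    """Compute kappa separately for each group label (e.g. "F", "M")."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = cohen_kappa([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result
```

For example, `kappa_by_group([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0], ["F", "F", "F", "M", "M", "M"])` gives 1.0 for "F" and approximately -0.5 for "M".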


Christie et al. (2019) pdf: https://files.eric.ed.gov/fulltext/ED599217.pdf

  • Models predicting students' high school dropout
  • The decision trees showed very minor differences in AUC between female and male students


Hu and Rangwala (2020) pdf

  • Models predicting if a college student will fail in a course
  • The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), especially in the Psychology course.
  • Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.


Anderson et al. (2019) pdf

  • Models predicting six-year college graduation
  • False negative rates were higher for male students than for female students when SVM, Logistic Regression, and SGD were used


Gardner, Brooks and Baker (2019) pdf: https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf

  • Model predicting MOOC dropout, evaluated through slicing analysis
  • Some algorithms performed worse for female students than for male students, particularly in courses where 45% or fewer of the students were male
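Several entries compare AUC between female and male students. Slicing analysis, understood generally as evaluating a model separately on each subgroup ("slice") of the data, can be sketched as follows. The sketch assumes binary outcomes (1 = dropout) and real-valued model scores. It computes AUC with the rank (Mann-Whitney) formulation and counts tied scores as half a win. `auc` and `auc_by_slice` are hypothetical helper names, and the pairwise loop is quadratic, so this version suits small examples only.

```python
def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_by_slice(y_true, scores, groups):
    """Evaluate AUC separately on each slice (subgroup) of the data."""
    result = {}
    for g in sorted(set(groups)):
        ys = [y for y, grp in zip(y_true, groups) if grp == g]
        ss = [s for s, grp in zip(scores, groups) if grp == g]
        result[g] = auc(ys, ss)
    return result
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. Comparing the values that `auc_by_slice` returns for each group yields the kind of between-group AUC difference reported above.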


Riazy et al. (2020) pdf: https://www.scitepress.org/Papers/2020/93241/93241.pdf

  • Model predicting course outcome
  • Marginal differences between groups were found in prediction quality and in the overall proportion of students predicted to pass
  • The direction of these differences was inconsistent across algorithms.


Lee and Kizilcec (2020) pdf: https://arxiv.org/pdf/2007.00088.pdf

  • Models predicting college success (defined as earning the median grade or above)
  • Random forest algorithms performed significantly worse for male students than female students
  • The model's fairness (measured by demographic parity and equality of opportunity) and its accuracy both improved after the classification threshold was changed from 0.5 to group-specific values
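The sketch below illustrates the idea of replacing a single 0.5 threshold with group-specific thresholds. It is not the authors' procedure. As an assumption, it chooses for each group the highest score cutoff whose true positive rate (TPR) reaches a common target, which aims at equality of opportunity (equal TPR). It measures demographic parity as the gap in positive-prediction rates. All function names and the toy data are hypothetical. In practice, cutoffs would be chosen on held-out data, not on the data used to report results.

```python
def rates(y_true, scores, threshold):
    """Return (positive-prediction rate, true positive rate) at a cutoff."""
    preds = [1 if s >= threshold else 0 for s in scores]
    positive_rate = sum(preds) / len(preds)
    hits = [p for p, y in zip(preds, y_true) if y == 1]
    tpr = sum(hits) / len(hits) if hits else float("nan")  # nan if no positives
    return positive_rate, tpr


def split_by_group(y_true, scores, groups):
    """Map each group label to its (labels, scores) lists, in first-seen order."""
    data = {}
    for y, s, g in zip(y_true, scores, groups):
        ys, ss = data.setdefault(g, ([], []))
        ys.append(y)
        ss.append(s)
    return data


def fairness_gaps(y_true, scores, groups, thresholds):
    """(demographic parity gap, equality of opportunity gap) under per-group cutoffs."""
    stats = [rates(ys, ss, thresholds[g])
             for g, (ys, ss) in split_by_group(y_true, scores, groups).items()]
    pos_rates = [pr for pr, _ in stats]
    tprs = [tpr for _, tpr in stats]
    return max(pos_rates) - min(pos_rates), max(tprs) - min(tprs)


def threshold_for_tpr(y_true, scores, target_tpr):
    """Highest cutoff (among observed scores) whose TPR reaches target_tpr."""
    for t in sorted(set(scores), reverse=True):
        if rates(y_true, scores, t)[1] >= target_tpr:
            return t
    return min(scores)  # lowest cutoff: everyone predicted positive


# Toy data: group B's scores are systematically lower than group A's.
y = [1, 1, 0, 0, 1, 1, 0, 0]
s = [0.9, 0.6, 0.4, 0.2, 0.45, 0.3, 0.2, 0.1]
g = ["A"] * 4 + ["B"] * 4
print(fairness_gaps(y, s, g, {"A": 0.5, "B": 0.5}))  # (0.5, 1.0)
per_group = {grp: threshold_for_tpr(ys, ss, 1.0)
             for grp, (ys, ss) in split_by_group(y, s, g).items()}
print(per_group, fairness_gaps(y, s, g, per_group))  # {'A': 0.6, 'B': 0.3} (0.0, 0.0)
```

In this toy case, equalizing TPR also happens to equalize positive-prediction rates. That outcome is not guaranteed in general, because the two fairness criteria can conflict.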


Yu et al. (2020) pdf: https://files.eric.ed.gov/fulltext/ED608066.pdf

  • Model predicting undergraduate short-term (course grades) and long-term (average GPA) success
  • Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.
  • Model fairness improved when institutional and click data were used in combination


Yu et al. (2021) pdf: https://dl.acm.org/doi/pdf/10.1145/3430895.3460139

  • Models predicting college dropout for students in residential and fully online programs
  • Whether or not socio-demographic information was included, the model showed worse true negative rates and worse accuracy for male students
  • The model showed better recall for male students, especially for those studying in person
  • For male students studying online, differences in recall and true negative rates were smaller, and thus fairer, when socio-demographic information was excluded from the model
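The Yu et al. (2021) entry compares true negative rate, recall, and accuracy across groups. The Anderson et al. (2019) entry compares false negative rate, which equals 1 minus recall. All four rates come from per-group confusion-matrix counts. The sketch assumes binary labels and predictions, with 1 marking the predicted outcome (e.g., dropout). `rates_by_group` is a hypothetical helper.

```python
def rates_by_group(y_true, y_pred, groups):
    """Per-group recall (TPR), false negative rate, true negative rate, and accuracy."""
    nan = float("nan")
    out = {}
    for g in sorted(set(groups)):
        tp = fn = tn = fp = 0
        for y, p, grp in zip(y_true, y_pred, groups):
            if grp != g:
                continue
            if y == 1:
                if p == 1:
                    tp += 1
                else:
                    fn += 1
            elif p == 0:
                tn += 1
            else:
                fp += 1
        pos, neg = tp + fn, tn + fp
        out[g] = {
            "recall": tp / pos if pos else nan,  # share of actual positives caught
            "fnr": fn / pos if pos else nan,     # equals 1 - recall
            "tnr": tn / neg if neg else nan,     # share of actual negatives correctly cleared
            "accuracy": (tp + tn) / (pos + neg),
        }
    return out
```

Comparing `rates_by_group(...)["F"]` with `rates_by_group(...)["M"]` shows the kind of gaps described above, for example worse true negative rates alongside better recall for one group.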


Riazy et al. (2020) pdf: https://www.scitepress.org/Papers/2020/93241/93241.pdf

  • Models predicting course outcome of students in a virtual learning environment (VLE)
  • More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms
  • Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value
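ABROCA (Absolute Between-ROC Area) appears here and in several later entries. It is the area between two groups' ROC curves, that is, the integral over false positive rate from 0 to 1 of the absolute difference in true positive rate. The sketch makes several assumptions: binary labels, both classes present in each group, linear interpolation between ROC points, and midpoint-rule integration on a fixed grid. This is a simplification, and published implementations may interpolate or integrate differently. The function names are hypothetical.

```python
def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) points, lowering the cutoff one distinct score at a time."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("each group needs both positive and negative examples")
    ranked = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Move all examples tied at this score across the cutoff together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """Linearly interpolate TPR at a given FPR (vertical segments are skipped)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= fpr <= x1:
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    return points[-1][1]


def abroca(y_a, scores_a, y_b, scores_b, steps=10_000):
    """Area between two groups' ROC curves, by the midpoint rule on [0, 1]."""
    roc_a, roc_b = roc_points(y_a, scores_a), roc_points(y_b, scores_b)
    total = 0.0
    for k in range(steps):
        fpr = (k + 0.5) / steps
        total += abs(tpr_at(roc_a, fpr) - tpr_at(roc_b, fpr))
    return total / steps
```

As a sanity check, compare a group whose scores separate the classes perfectly (ROC curve along the top edge) with a group whose scores are all tied (diagonal ROC curve). The sketch gives about 0.5, the area of the triangle between the two curves. Identical groups give 0.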


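Bridgeman et al. (2009) pdf: https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring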

  • Automated scoring model for evaluating English essays (e-rater)
  • The e-rater system was comparably accurate for male and female students when scoring their 11th-grade essays


Bridgeman et al. (2012) pdf: https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true

  • A later version of the e-rater automated essay scoring model
  • The e-rater system's scores correlated comparably well with human raters' scores for TOEFL and GRE essays written by male and female students


Verdugo et al. (2022) pdf: https://dl.acm.org/doi/abs/10.1145/3506860.3506902

  • Models predicting dropout from university after the first year
  • Several algorithms achieved better AUC for male than female students; results were mixed for F1.


Zhang et al. (in press)

  • Detectors of students' use of self-regulated learning (SRL) during mathematical problem solving
  • For each SRL-related detector, relatively small differences in AUC were observed across gender groups.
  • No gender group consistently had the best-performing detectors


Rzepka et al. (2022) pdf: https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621

  • Models predicting whether a student will quit a spelling learning activity without completing it
  • Multiple algorithms had slightly better false positive rates and ROC AUC for male students than for female students, but equivalent performance on several other metrics.


Li, Xing, & Leite (2022) pdf: https://dl.acm.org/doi/pdf/10.1145/3506860.3506869

  • Models predicting whether two students will communicate on an online discussion forum
  • Multiple fairness approaches led to ABROCA values under 0.01 for female versus male students


Sha et al. (2021) pdf: https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf

  • Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant
  • Some algorithms achieved ABROCA under 0.01 for female versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06
  • Balancing the size of each group in the training set reduced ABROCA
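Random over-sampling is one simple way to balance group sizes in a training set. It duplicates randomly chosen rows from each smaller group until every group is as large as the largest one. This may not be the method the authors used; the Sha et al. (2022) entry below notes that a range of over-sampling methods exists. The helper name `balance_groups` and the row format are assumptions. Balancing is applied to the training data only, so evaluation metrics such as ABROCA are still computed on the original distribution.

```python
import random


def balance_groups(rows, group_of, seed=0):
    """Randomly over-sample every group up to the size of the largest group."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_group = {}
    for row in rows:
        by_group.setdefault(group_of(row), []).append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        # Keep every original row, then add random duplicates to reach the target.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows: (gender, feature value, label).
train = [("female", 0.2, 1), ("female", 0.7, 0),
         ("male", 0.1, 0), ("male", 0.4, 1), ("male", 0.9, 1), ("male", 0.3, 0)]
balanced = balance_groups(train, group_of=lambda row: row[0])
print(sum(row[0] == "female" for row in balanced), len(balanced))  # 4 8
```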


Litman et al. (2021) html: https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21

  • Automated essay scoring models inferring text evidence usage
  • For all algorithms studied, whether a student was female or male explained less than 1% of the error
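One way to read "error explained by" gender is as the share of variance in scoring error that group membership accounts for. This quantity is eta squared, which for a single grouping variable equals the R² from regressing the error on group indicators. This interpretation, the definition of error as predicted score minus human score, and the helper name are all assumptions, not details taken from Litman et al. (2021).

```python
def variance_explained_by_group(errors, groups):
    """Between-group sum of squares divided by total sum of squares (eta squared)."""
    n = len(errors)
    grand_mean = sum(errors) / n
    ss_total = sum((e - grand_mean) ** 2 for e in errors)
    if ss_total == 0:
        return 0.0  # no variation in error at all
    ss_between = 0.0
    for g in set(groups):
        members = [e for e, grp in zip(errors, groups) if grp == g]
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
    return ss_between / ss_total


# Hypothetical errors: (predicted score - human score) per essay.
errors = [0.5, -0.5, 1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
groups = ["female"] * 4 + ["male"] * 4
print(variance_explained_by_group(errors, groups))  # 0.0: identical group means
```

A value below 0.01 would correspond to "less than 1% of error explained" under this reading.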

Sha et al. (2022) https://ieeexplore.ieee.org/abstract/document/9849852

  • Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)
  • A range of over-sampling methods was tested
  • Regardless of the over-sampling method used, performance was moderately better for male students in predicting course pass/fail, slightly better for male students in predicting dropout, and moderately better for female students in predicting forum post relevance.