Gender: Male/Female
Latest revision as of 00:13, 28 November 2023
Kai et al. (2017) pdf
- Models predicting student retention in an online college program
- J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students
- JRip decision rules achieved much lower Kappa and AUC for male students than female students
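Kai et al.'s comparison, like most entries on this page, comes down to computing the same metric separately for each gender group. Below is a minimal pure-Python sketch under these assumptions:

- labels are binary (1 = positive class);
- model scores lie in [0, 1];
- group labels come as a parallel list.

The helpers `cohen_kappa`, `auc` and `by_group` are hypothetical names, not code from any cited paper. The sketch also shows why the two metrics can disagree. Kappa depends on the thresholded predictions, whereas AUC depends only on how the scores rank students.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of marginal label frequencies, summed over labels.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both sides use one identical label
    return (observed - expected) / (1.0 - expected)


def auc(y_true, scores):
    """AUC: chance a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def by_group(metric, y_true, outputs, groups):
    """Evaluate `metric` separately within each subgroup (e.g. "F" and "M")."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = metric([y_true[i] for i in idx], [outputs[i] for i in idx])
    return result


# Hypothetical toy data: labels, model scores, thresholded predictions, gender labels.
y = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5, 0.3, 0.8]
preds = [int(s >= 0.5) for s in scores]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

print(by_group(auc, y, scores, groups))         # {'F': 1.0, 'M': 0.25}
print(by_group(cohen_kappa, y, preds, groups))  # {'F': 1.0, 'M': -0.5}
```

Real studies would normally use a tested library implementation. The pairwise AUC here is O(n²) and only meant to make the definition concrete.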
Christie et al. (2019) pdf
- Models predicting students' high school dropout
- The decision trees showed very minor differences in AUC between female and male students
Hu and Rangwala (2020) pdf
- Models predicting if a college student will fail in a course
- The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course.
- Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, especially in Computer Science and Electrical Engineering.
Anderson et al. (2019) pdf
- Models predicting six-year college graduation
- False negative rates were higher for male students than for female students when SVM, Logistic Regression, and SGD were used
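A false negative rate is the share of actual positives that the model misses, FN / (TP + FN). Which outcome counts as "positive" (graduating or not graduating) depends on how each study codes its label. This sketch therefore simply assumes that 1 marks the positive class. The data and the helper names `confusion_rates` and `rates_by_group` are hypothetical. The same per-group rates also cover the recall, true negative rate and false positive rate comparisons reported in several later entries.

```python
def confusion_rates(y_true, y_pred):
    """Recall (TPR), false negative rate, true negative rate and false positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    pos, neg = tp + fn, tn + fp
    nan = float("nan")
    return {
        "recall": tp / pos if pos else nan,
        "fnr": fn / pos if pos else nan,  # equals 1 - recall
        "tnr": tn / neg if neg else nan,
        "fpr": fp / neg if neg else nan,  # equals 1 - tnr
    }


def rates_by_group(y_true, y_pred, groups):
    """Compute confusion_rates separately for each subgroup."""
    return {
        g: confusion_rates(
            [t for t, x in zip(y_true, groups) if x == g],
            [p for p, x in zip(y_pred, groups) if x == g],
        )
        for g in sorted(set(groups))
    }


# Hypothetical toy data: 1 = positive outcome, predictions already thresholded.
y = [1, 1, 1, 0, 1, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
rates = rates_by_group(y, pred, groups)
print(rates["F"]["fnr"], rates["M"]["fnr"])  # F misses 1 of 3 positives, M misses 1 of 2
```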
Gardner, Brooks and Baker (2019) pdf
- Models predicting MOOC dropout, evaluated through slicing analysis
- Some algorithms studied performed worse for female students than for male students, particularly in courses where 45% or fewer of the students were male
Riazy et al. (2020) pdf
- Model predicting course outcome
- Only marginal differences were found between groups in prediction quality and in the overall proportion of students predicted to pass
- The direction of these differences was inconsistent across algorithms.
Lee and Kizilcec (2020) pdf
- Models predicting college success (a median grade or above)
- Random forest algorithms performed significantly worse for male students than for female students
- Both fairness (demographic parity and equality of opportunity) and accuracy improved after replacing a single 0.5 decision threshold with group-specific thresholds
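The entry does not say how the group-specific thresholds were chosen. As one simple illustration of the idea, the sketch below picks, for each group, the threshold whose true positive rate is closest to a shared target. That aims at equality of opportunity (equal TPR across groups). This is an assumption, not necessarily Lee and Kizilcec's procedure, and `tpr`, `group_thresholds` and `predict` are hypothetical helpers.

```python
def tpr(y_true, scores, threshold):
    """True positive rate when flagging every score >= threshold."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)


def group_thresholds(y_true, scores, groups, target_tpr):
    """Per group, pick the observed score whose TPR is closest to target_tpr.

    Ties go to the higher threshold, which never flags more students.
    Assumes every group contains at least one positive example.
    """
    thresholds = {}
    for g in sorted(set(groups)):
        ys = [y for y, x in zip(y_true, groups) if x == g]
        ss = [s for s, x in zip(scores, groups) if x == g]
        thresholds[g] = min(
            sorted(set(ss)),
            key=lambda t: (abs(tpr(ys, ss, t) - target_tpr), -t),
        )
    return thresholds


def predict(scores, groups, thresholds):
    """Apply each student's group-specific threshold."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Hypothetical toy data where group M's scores run lower overall.
y = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.45, 0.35, 0.3, 0.1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

th = group_thresholds(y, scores, groups, target_tpr=1.0)
print(th)  # {'F': 0.6, 'M': 0.35}
print(predict(scores, groups, th))
```

With a single 0.5 threshold, group M's true positive rate in this toy data is 0. The per-group thresholds bring both groups to a TPR of 1. Applying such thresholds requires knowing each student's group at prediction time.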
Yu et al. (2020) pdf
- Model predicting undergraduate short-term (course grades) and long-term (average GPA) success
- Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.
- Model fairness improved when institutional and click data were combined in the model
Yu et al. (2021) pdf
- Models predicting college dropout for students in residential and fully online programs
- Whether or not socio-demographic information was included, the models showed worse true negative rates and worse accuracy for male students
- The models showed better recall for male students, especially for those studying in person
- For male students studying online, the gaps in recall and true negative rates were smaller, and thus fairer, when socio-demographic information was excluded from the model
Riazy et al. (2020) pdf
- Models predicting course outcome of students in a virtual learning environment (VLE)
- More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms
- Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value
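ABROCA (absolute between-ROC area) is the statistic introduced in Gardner, Brooks and Baker's (2019) slicing analysis. It is the integral over t in [0, 1] of |ROC_a(t) - ROC_b(t)|, where ROC(t) is a group's true positive rate at false positive rate t. A value of 0 means the two groups' ROC curves coincide. Below is a minimal pure-Python sketch, assuming binary labels with both classes present in each group. It uses linear interpolation between ROC points and midpoint-rule integration, which may differ in detail from published implementations. The helper names are hypothetical.

```python
def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) points, sweeping the threshold from high to low."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points  # ends at (1.0, 1.0): the lowest threshold flags everyone


def tpr_at_fpr(points, x):
    """Linearly interpolate the ROC polyline at FPR = x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1 and x1 > x0:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def abroca(y_true, scores, groups, group_a, group_b, steps=10_000):
    """Area between two groups' ROC curves, via the midpoint rule on [0, 1]."""
    def subset(g):
        idx = [i for i, x in enumerate(groups) if x == g]
        return [y_true[i] for i in idx], [scores[i] for i in idx]

    roc_a = roc_points(*subset(group_a))
    roc_b = roc_points(*subset(group_b))
    h = 1.0 / steps
    return h * sum(
        abs(tpr_at_fpr(roc_a, (i + 0.5) * h) - tpr_at_fpr(roc_b, (i + 0.5) * h))
        for i in range(steps)
    )


# Hypothetical toy data: F's scores separate the classes perfectly, M's are uninformative.
y = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.5, 0.5]
groups = ["F", "F", "M", "M"]
gap = abroca(y, scores, groups, "F", "M")
print(round(gap, 3))  # 0.5
```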
Bridgeman et al. (2009) pdf
- Automated scoring model for English essays (e-rater)
- The e-rater system was comparably accurate for male and female students when scoring 11th-grade essays
Bridgeman et al. (2012) pdf
- A later version of the e-rater automated essay scoring model
- The e-rater system correlated comparably well with human raters when scoring TOEFL and GRE essays written by male and female students
Verdugo et al. (2022) pdf
- Models predicting university dropout after the first year
- Several algorithms achieved better AUC for male than female students; results were mixed for F1.
Zhang et al. (2022)
- Detecting students' use of self-regulated learning (SRL) during mathematical problem solving
- For each SRL-related detector, relatively small differences in AUC were observed across gender groups.
- No gender group consistently had the best-performing detectors
Rzepka et al. (2022) pdf
- Models predicting whether a student will quit a spelling learning activity without completing it
- Multiple algorithms had slightly better false positive rates and ROC AUC for male students than for female students, but equivalent performance on several other metrics.
Li, Xing, & Leite (2022) pdf
- Models predicting whether two students will communicate on an online discussion forum
- Multiple fairness approaches led to ABROCA under 0.01 for female versus male students
Sha et al. (2021) pdf
- Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant
- Some algorithms achieved ABROCA under 0.01 for female versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06
- Balancing the size of each group in the training set reduced ABROCA
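This entry does not describe Sha et al.'s exact balancing procedure. One common way to equalize group sizes is random oversampling of the smaller group, and the sketch below assumes that approach. `balance_groups` is a hypothetical helper. Sha et al. (2022), listed further down this page, compare a range of over-sampling methods.

```python
import random


def balance_groups(rows, group_of, seed=0):
    """Randomly duplicate rows from smaller groups until each group matches the largest.

    Apply this to the training split only; the test split should keep its real proportions.
    """
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(group_of(row), []).append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap up to the largest group's size.
        balanced.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows: (gender, label).
train = [("F", 1), ("F", 0), ("F", 1), ("M", 0)]
balanced = balance_groups(train, group_of=lambda row: row[0])
print(len(balanced), sum(r[0] == "M" for r in balanced))
```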
Litman et al. (2021) html
- Automated essay scoring models inferring text evidence usage
- For all algorithms studied, whether a student was female or male explained less than 1% of the error
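One way to read "less than 1% of error explained" is as the R² of regressing each essay's scoring error on a gender indicator. The sketch below computes that quantity directly as the between-group share of the errors' total sum of squares. For a categorical predictor, this equals the OLS R². Two points are assumptions: treating "error" as predicted minus human score, and the helper name. Litman et al.'s exact analysis may differ.

```python
def variance_explained_by_group(errors, groups):
    """Share of variance in `errors` explained by group membership.

    Equals R^2 from an OLS regression of the errors on group indicators
    (eta squared in one-way ANOVA terms).
    """
    n = len(errors)
    grand_mean = sum(errors) / n
    ss_total = sum((e - grand_mean) ** 2 for e in errors)
    ss_between = 0.0
    for g in set(groups):
        members = [e for e, x in zip(errors, groups) if x == g]
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
    return ss_between / ss_total if ss_total else 0.0


# Hypothetical per-essay errors (predicted minus human score) and gender labels.
errors = [0.5, -1.0, 0.0, 1.0, -0.5, 0.5, -1.0, 0.5]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
share = variance_explained_by_group(errors, groups)
print(f"{share:.1%} of error variance explained by gender")
```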
Sha et al. (2022) html
- Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)
- A range of over-sampling methods was tested
- Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.
Deho et al. (2023) pdf
- Models predicting whether a student's course grade will be above or below 0.5
- Better prediction for female students in some courses and for male students in others
Perdomo et al. (2023) pdf
- Paper discusses Wisconsin's Dropout Early Warning System (DEWS), which predicts the probability of on-time graduation
- DEWS prediction quality is comparable for male and female students
Zhang et al. (2023) pdf
- Models detecting attributes of students’ feedback on other students’ mathematics solutions, reflecting three constructs: 1) commenting on the process, 2) commenting on the answer, and 3) relating to self.
- Models have approximately equal performance for males and females.
Almoubayyed et al. (2023) pdf
- Models predicting reading comprehension ability from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction, and examining how well their performance generalizes
- The model trained on the smaller dataset made fairer predictions across male and female students
- For the model trained on the larger dataset, predictions were more accurate for female students than for male students.
Chiu (2020) pdf
- Models use affective states (boredom, concentration, confusion, frustration, off-task behavior, and gaming) detected during middle school students’ online mathematics learning to predict whether they choose to study STEM in higher education.
- The detectors are based on students' interactions with the ASSISTments system
- Models performed better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than for female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR).
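To put the reported values on one scale, each model's male-minus-female AUC gap is a simple difference of the figures above:

```latex
\Delta\mathrm{AUC}_{\mathrm{RFPS}} = 0.641 - 0.492 = 0.149, \qquad
\Delta\mathrm{AUC}_{\mathrm{LR}} = 0.571 - 0.535 = 0.036
```

An AUC of 0.5 corresponds to chance-level ranking. The RFPS model's 0.492 for female students is therefore essentially no better than chance.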
Cock et al. (2023) pdf
- Paper investigates biases in models designed for early identification of middle school students at risk of failing, in a flipped-classroom course and in an open-ended exploration environment (TugLet)
- Model performs worse for males in open-ended environment (FNR=0.70 for males and FNR=0.53 for females)
- Model performs worse for females in flipped classrooms (FNR=0.56 for females and FNR=0.43 for males)