<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.pcla.wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Valdemar</id>
	<title>Penn Center for Learning Analytics Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.pcla.wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Valdemar"/>
	<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php/Special:Contributions/Valdemar"/>
	<updated>2026-05-04T18:53:45Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.37.1</generator>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=493</id>
		<title>National Origin or National Location</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=493"/>
		<updated>2024-09-02T00:13:45Z</updated>

		<summary type="html">&lt;p&gt;Valdemar: Add empty lines&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Švábenský et al. (2024) [https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.82/2024.EDM-posters.82.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
*Classification models for predicting grades (worse than an average grade, “unsuccessful”, or equal/better than an average grade, “successful”)&lt;br /&gt;
*Investigating bias based on university students' regional background in the context of the Philippines&lt;br /&gt;
*Demographic groups based on 1 of 5 locations from which students accessed online courses in Canvas&lt;br /&gt;
*Bias evaluation using AUC, weighted F1-score, and MADD showed consistent results across all groups; no unfairness was observed (see the sketch below)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
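A minimal illustrative sketch (not from the paper) of the MADD metric as commonly defined, i.e. comparing the binned densities of predicted probabilities between two demographic groups; the group labels and bin count below are assumptions:&lt;br /&gt;
&lt;pre&gt;
# Hypothetical sketch: MADD compares how differently a model's predicted
# probabilities are distributed for two groups, independent of accuracy.
import numpy as np

def madd(probs_group_a, probs_group_b, n_bins=100):
    """Model Absolute Density Distance between two groups (0 = identical, 2 = disjoint)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    dens_a, _ = np.histogram(probs_group_a, bins=bins)
    dens_b, _ = np.histogram(probs_group_b, bins=bins)
    dens_a = dens_a / dens_a.sum()  # normalize counts to densities
    dens_b = dens_b / dens_b.sum()
    return float(np.abs(dens_a - dens_b).sum())
&lt;/pre&gt;
&lt;br /&gt;
&lt;br /&gt;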
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
*Model predicting student achievement on the standardized examination PISA&lt;br /&gt;
*Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf] &lt;br /&gt;
&lt;br /&gt;
*Automated scoring model for evaluating English spoken responses&lt;br /&gt;
*SpeechRater gave a significantly lower score than human raters for German students&lt;br /&gt;
*SpeechRater gave higher scores than human raters for Chinese students, with H1-rater scores higher than the mean&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf] &lt;br /&gt;
&lt;br /&gt;
*Multi-national models predicting learning gains from students' help-seeking behavior&lt;br /&gt;
*Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica&lt;br /&gt;
*Models performed better when built on and applied to the same country, except for the Philippines, where the model built on that country's data was slightly outperformed by the model built on U.S. data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf] &lt;br /&gt;
&lt;br /&gt;
*A later version of e-rater, an automated scoring model for evaluating English essays&lt;br /&gt;
*E-rater gave better scores to Chinese-speaking (Mainland China, Taiwan, Hong Kong) and Korean-speaking test-takers when assessing TOEFL essays (independent prompt)&lt;br /&gt;
*E-rater gave lower scores to Arabic, Hindi, and Spanish speakers when assessing their written responses to the independent prompt in TOEFL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
&lt;br /&gt;
*Automated scoring model for evaluating English essays (e-rater)&lt;br /&gt;
*E-Rater gave significantly better scores than human raters for TOEFL essays (independent task) written by speakers of Chinese and Korean&lt;br /&gt;
*E-Rater correlated poorly with human raters and gave better scores than human raters for GRE essays (both issue and argument prompts) written by Chinese speakers&lt;/div&gt;</summary>
		<author><name>Valdemar</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=492</id>
		<title>National Origin or National Location</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=492"/>
		<updated>2024-09-02T00:11:49Z</updated>

		<summary type="html">&lt;p&gt;Valdemar: Reordered from latest (did not change content)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Švábenský et al. (2024) [https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.82/2024.EDM-posters.82.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Classification models for predicting grades (worse than an average grade, “unsuccessful”, or equal/better than an average grade, “successful”)&lt;br /&gt;
* Investigating bias based on university students' regional background in the context of the Philippines&lt;br /&gt;
* Demographic groups based on 1 of 5 locations from which students accessed online courses in Canvas&lt;br /&gt;
* Bias evaluation using AUC, weighted F1-score, and MADD showed consistent results across all groups; no unfairness was observed&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]&lt;br /&gt;
* Model predicting student achievement on the standardized examination PISA&lt;br /&gt;
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)&lt;br /&gt;
&lt;br /&gt;
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model for evaluating English spoken responses&lt;br /&gt;
* SpeechRater gave a significantly lower score than human raters for German students&lt;br /&gt;
* SpeechRater gave higher scores than human raters for Chinese students, with H1-rater scores higher than the mean&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Multi-national models predicting learning gains from students' help-seeking behavior&lt;br /&gt;
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica&lt;br /&gt;
* Models performed better when built on and applied to the same country, except for the Philippines, where the model built on that country's data was slightly outperformed by the model built on U.S. data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
&lt;br /&gt;
* A later version of e-rater, an automated scoring model for evaluating English essays&lt;br /&gt;
* E-rater gave better scores to Chinese-speaking (Mainland China, Taiwan, Hong Kong) and Korean-speaking test-takers when assessing TOEFL essays (independent prompt)&lt;br /&gt;
* E-rater gave lower scores to Arabic, Hindi, and Spanish speakers when assessing their written responses to the independent prompt in TOEFL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model for evaluating English essays (e-rater)&lt;br /&gt;
* E-Rater gave significantly better scores than human raters for TOEFL essays (independent task) written by speakers of Chinese and Korean&lt;br /&gt;
* E-Rater correlated poorly with human raters and gave better scores than human raters for GRE essays (both issue and argument prompts) written by Chinese speakers&lt;/div&gt;</summary>
		<author><name>Valdemar</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Course_Grade_and_GPA_Prediction&amp;diff=491</id>
		<title>Course Grade and GPA Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Course_Grade_and_GPA_Prediction&amp;diff=491"/>
		<updated>2024-09-02T00:06:09Z</updated>

		<summary type="html">&lt;p&gt;Valdemar: Add Svabensky@EDM'24&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Švábenský et al. (2024) [https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.82/2024.EDM-posters.82.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Classification models for predicting grades (worse than an average grade, “unsuccessful”, or equal/better than an average grade, “successful”)&lt;br /&gt;
* Investigating bias based on university students' regional background in the context of the Philippines&lt;br /&gt;
* Demographic groups based on 1 of 5 locations from which students accessed online courses in Canvas&lt;br /&gt;
* Bias evaluation using AUC, weighted F1-score, and MADD showed consistent results across all groups; no unfairness was observed&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; American Indian, Black, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than for non-URM students (White and Asian)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than for female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values (see the sketch below)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
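A minimal illustrative sketch (not the authors' code) of choosing group-specific decision thresholds instead of a single 0.5 cutoff; the group names, target rate, and synthetic data below are assumptions:&lt;br /&gt;
&lt;pre&gt;
# Hypothetical sketch: pick, per group, the largest cutoff whose positive-prediction
# rate reaches a shared target, a demographic-parity-style correction.
import numpy as np

def threshold_for_rate(probs, target_rate):
    """Largest cutoff whose positive-prediction rate is at least target_rate."""
    for t in sorted(np.unique(probs), reverse=True):
        if np.mean(probs >= t) >= target_rate:
            return float(t)
    return 0.0

rng = np.random.default_rng(0)
probs_by_group = {"group_a": rng.uniform(size=200), "group_b": rng.beta(2, 3, size=200)}
thresholds = {g: threshold_for_rate(p, 0.5) for g, p in probs_by_group.items()}
preds = {g: (p >= thresholds[g]).astype(int) for g, p in probs_by_group.items()}
&lt;/pre&gt;
&lt;br /&gt;
&lt;br /&gt;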
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
* Students who are international, first-generation, or from low-income households were inaccurately predicted to get lower course grades and average GPAs than their peers, and the fairness of the models improved with the inclusion of clickstream and survey data&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students, and the fairness of the models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value, i.e. the largest area between the groups' ROC curves (see the sketch below)&lt;br /&gt;
* Students with a self-declared disability were predicted to pass the course more often&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
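A minimal illustrative sketch (not from the paper) of ABROCA as commonly defined, i.e. the area between two groups' ROC curves; the use of scikit-learn and the variable names are assumptions:&lt;br /&gt;
&lt;pre&gt;
# Hypothetical sketch: interpolate each group's ROC curve onto a common
# false-positive-rate grid and integrate the absolute gap between them.
import numpy as np
from sklearn.metrics import roc_curve

def abroca(y_true_a, probs_a, y_true_b, probs_b, grid_size=1000):
    fpr_a, tpr_a, _ = roc_curve(y_true_a, probs_a)
    fpr_b, tpr_b, _ = roc_curve(y_true_b, probs_b)
    grid = np.linspace(0.0, 1.0, grid_size)
    gap = np.abs(np.interp(grid, fpr_a, tpr_a) - np.interp(grid, fpr_b, tpr_b))
    return float(np.trapz(gap, grid))  # 0 means identical ROC curves
&lt;/pre&gt;
&lt;br /&gt;
&lt;br /&gt;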
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020) [https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Five algorithms and three metrics (independence, separation, sufficiency) analyzed (see the sketch after this list)&lt;br /&gt;
* Poorer performance for Latinx students on course grade prediction for all three metrics; poorer performance for Latinx students on GPA prediction in terms of independence and sufficiency, but not separation&lt;br /&gt;
* Poorer performance for first-generation students on course grade prediction for independence and separation, and for some algorithms for GPA prediction as well&lt;br /&gt;
* Poorer performance for low-income students in several cases (about 1/3 of the cases checked)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
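A minimal illustrative sketch (not the authors' code) of the three fairness criteria as group-conditional rates; the variable names and 0/1 encoding are assumptions:&lt;br /&gt;
&lt;pre&gt;
# Hypothetical sketch: independence compares positive-prediction rates,
# separation compares TPR/FPR, and sufficiency compares precision across groups.
import numpy as np

def fairness_rates(y_true, y_pred, group):
    rates = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        rates[g] = {
            "independence_pos_rate": yp.mean(),
            "separation_tpr": yp[yt == 1].mean(),
            "separation_fpr": yp[yt == 0].mean(),
            "sufficiency_ppv": yt[yp == 1].mean(),
        }
    return rates
&lt;/pre&gt;
&lt;br /&gt;
&lt;br /&gt;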
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tended to overpredict Asian and White students' performance and underpredict Black, Hispanic, and Native American students' performance&lt;br /&gt;
* Several fairness-correction methods equalized false positive and false negative rates across groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852 page]&lt;br /&gt;
* Predicting course pass/fail with random forest in Open University data&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of the over-sampling method used, pass/fail prediction performance was moderately better for male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1 pdf]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
* Generally worse prediction for international students&lt;/div&gt;</summary>
		<author><name>Valdemar</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=490</id>
		<title>National Origin or National Location</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=490"/>
		<updated>2024-09-02T00:05:42Z</updated>

		<summary type="html">&lt;p&gt;Valdemar: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Švábenský et al. (2024) [https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.82/2024.EDM-posters.82.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Classification models for predicting grades (worse than an average grade, “unsuccessful”, or equal/better than an average grade, “successful”)&lt;br /&gt;
* Investigating bias based on university students' regional background in the context of the Philippines&lt;br /&gt;
* Demographic groups based on 1 of 5 locations from which students accessed online courses in Canvas&lt;br /&gt;
* Bias evaluation using AUC, weighted F1-score, and MADD showed consistent results across all groups; no unfairness was observed&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Multi-national models predicting learning gains from students' help-seeking behavior&lt;br /&gt;
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica&lt;br /&gt;
* Models performed better when built on and applied to the same country, except for the Philippines, where the model built on that country's data was slightly outperformed by the model built on U.S. data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Model predicting student achievement on the standardized examination PISA&lt;br /&gt;
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model for evaluating English spoken responses&lt;br /&gt;
* SpeechRater gave a significantly lower score than human raters for German students&lt;br /&gt;
* SpeechRater gave higher scores than human raters for Chinese students, with H1-rater scores higher than the mean&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model for evaluating English essays (e-rater)&lt;br /&gt;
* E-Rater gave significantly better scores than human raters for TOEFL essays (independent task) written by speakers of Chinese and Korean&lt;br /&gt;
* E-Rater correlated poorly with human raters and gave better scores than human raters for GRE essays (both issue and argument prompts) written by Chinese speakers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
&lt;br /&gt;
* A later version of e-rater, an automated scoring model for evaluating English essays&lt;br /&gt;
* E-rater gave better scores to Chinese-speaking (Mainland China, Taiwan, Hong Kong) and Korean-speaking test-takers when assessing TOEFL essays (independent prompt)&lt;br /&gt;
* E-rater gave lower scores to Arabic, Hindi, and Spanish speakers when assessing their written responses to the independent prompt in TOEFL&lt;/div&gt;</summary>
		<author><name>Valdemar</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=489</id>
		<title>National Origin or National Location</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=National_Origin_or_National_Location&amp;diff=489"/>
		<updated>2024-09-02T00:05:13Z</updated>

		<summary type="html">&lt;p&gt;Valdemar: Add Svabensky@EDM'24&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Švábenský et al. (2024) [https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.82/2024.EDM-posters.82.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Classification models for predicting grades (worse than an average grade, “unsuccessful”, or equal/better than an average grade, “successful”)&lt;br /&gt;
* Investigating bias based on university students' regional background in the context of the Philippines&lt;br /&gt;
* Demographic groups based on 1 of 5 locations from which students accessed online courses in Canvas&lt;br /&gt;
* Bias evaluation using AUC, weighted F1-score, and MADD showed consistent results across all groups; no unfairness was observed&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ogan et al. (2015) [https://link.springer.com/content/pdf/10.1007/s40593-014-0034-8.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Multi-national models predicting learning gains from students' help-seeking behavior&lt;br /&gt;
* Models built on only U.S. or combined data sets performed extremely poorly for Costa Rica&lt;br /&gt;
* Models performed better when built on and applied to the same country, except for the Philippines, where the model built on that country's data was slightly outperformed by the model built on U.S. data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li et al. (2021) [https://arxiv.org/pdf/2103.15212.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Model predicting student achievement on the standardized examination PISA&lt;br /&gt;
* Inaccuracy of the U.S.-trained model was greater for students from countries with lower scores of national development (e.g. Indonesia, Vietnam, Moldova)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model for evaluating English spoken responses&lt;br /&gt;
* SpeechRater gave a significantly lower score than human raters for German students&lt;br /&gt;
* SpeechRater gave higher scores than human raters for Chinese students, with H1-rater scores higher than the mean&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model for evaluating English essays (e-rater)&lt;br /&gt;
* E-Rater gave significantly better scores than human raters for TOEFL essays (independent task) written by speakers of Chinese and Korean&lt;br /&gt;
* E-Rater correlated poorly with human raters and gave better scores than human raters for GRE essays (both issue and argument prompts) written by Chinese speakers&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
&lt;br /&gt;
* A later version of e-rater, an automated scoring model for evaluating English essays&lt;br /&gt;
* E-rater gave better scores to Chinese-speaking (Mainland China, Taiwan, Hong Kong) and Korean-speaking test-takers when assessing TOEFL essays (independent prompt)&lt;br /&gt;
* E-rater gave lower scores to Arabic, Hindi, and Spanish speakers when assessing their written responses to the independent prompt in TOEFL&lt;/div&gt;</summary>
		<author><name>Valdemar</name></author>
	</entry>
</feed>