Workshop 1: Using Natural Language Processing, Writing Process Analysis, Artificial Intelligence, and Machine Learning to Model Learner Writing
Paul Deane
Monday June 5 and Tuesday June 6 from 9 am - 4 pm (In-Person)
This workshop is FULL. You can add your name to the waitlist.
Paul D. Deane, a Principal Research Scientist in the Research Division at Educational Testing Service, earned a Ph.D. in linguistics at the University of Chicago in 1987. He taught linguistics at the University of Central Florida from 1986 to 1994, where he authored Grammar in Mind and Brain (Mouton de Gruyter, 1994), a study of the interaction of cognitive structures in syntax and semantics. From 1994 to 2001, he worked in industrial natural language processing, where he focused on lexicon development, parser design, and semantic information retrieval. He joined Educational Testing Service in 2001. His current research interests include automated writing evaluation, vocabulary assessment, and building cognitive and computational models of reading and writing skills. During his career at ETS he has worked on a variety of natural language processing and assessment projects, including automated item generation, tools to support verbal test development, scoring of collocation errors, reading and vocabulary assessment, and automated essay scoring. His work currently focuses on the development and scoring of writing assessments.
As writing has increasingly become a digital task, and as Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP) techniques have matured, it has become increasingly possible to use these techniques to analyze L2 student writing. Recently, analysis of student texts has been supplemented by the use of writing process logs (keystroke logs) to better understand student writing processes. The aim of this workshop is to help teachers and researchers better understand how these techniques can be used to evaluate and describe the English language and writing skills of students at every level. During the workshop, participants will be introduced to readily available tools that can be used to model student writing and will apply some of these tools to sample datasets to get hands-on experience with some of the relevant analytical techniques.
The workshop will focus on several related but separable topics. Multidimensional models of writing, of the kind pioneered by Biber (1988), capture important generalizations about variation in language. Classical automated essay scoring techniques, of the kind pioneered by Page (1966) and Burstein, Kukich, Wolff, & Lu (1998), capture important generalizations about the characteristics of stronger and weaker writers, and are often used as measures of written-mode L2 language acquisition in high-stakes language assessments. Many of these classical techniques can be applied using open-source tools or tools that are relatively inexpensive or freely available for research, such as LIWC for analyzing word usage, the Coh-Metrix tool for analyzing cohesion, or the SALAT suite of NLP tools for the social sciences. Process-based measures of writing derived from keystroke logs, of the kind pioneered by Severinson Eklundh & Kollberg (1992), make it possible to study the dynamics of writing in real time, which can provide important insights into the development of language learners. Once again, freely available tools, such as InputLog, make it relatively easy for researchers to implement writing process studies. Finally, there are the newer deep-learning-based methods of analyzing texts, such as BERT and GPT-3. These provide ways to predict linguistic behavior without explicitly modeling all of the key dimensions in advance, and have led to significant advances in predictive power.
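For instance, pause and burst measures are among the most widely used process indices derived from keystroke logs. The following is a minimal Python sketch of one such measure, assuming a simplified log of (timestamp, key) pairs rather than InputLog's actual export format; the two-second pause threshold is a common convention in writing-process research, not a fixed standard.

```python
# Minimal sketch: segment a keystroke log into bursts of fluent typing.
# Assumes a simplified log of (timestamp_ms, key) pairs, not InputLog's format.
PAUSE_THRESHOLD_MS = 2000  # pauses of 2+ seconds often mark planning episodes

def burst_lengths(log):
    """Return the number of keystrokes in each burst, where bursts are
    stretches of typing separated by pauses of PAUSE_THRESHOLD_MS or more."""
    bursts, current, previous_time = [], 0, None
    for timestamp, _key in log:
        if previous_time is not None and timestamp - previous_time >= PAUSE_THRESHOLD_MS:
            bursts.append(current)
            current = 0
        current += 1
        previous_time = timestamp
    if current:
        bursts.append(current)
    return bursts

# Three keystrokes, a 3-second pause, then two more keystrokes.
log = [(0, "T"), (150, "h"), (300, "e"), (3300, " "), (3450, "c")]
print(burst_lengths(log))  # -> [3, 2]
```

Longer mean bursts and fewer long pauses are typically read as signs of greater fluency, which is one way such logs inform studies of language learners' development.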
The workshop will review each of these major areas of research, walk participants through sample analyses that apply some of these techniques, and provide them with the opportunity to try some of the freely available tools on publicly available datasets. It is designed both for novices to AI-based linguistic analysis and for experienced researchers who wish to review and deepen their knowledge in some of the specific areas covered by the workshop. One of the major issues I will discuss is the balance between classical techniques, which support an explicit argument linking the specific features measured to a definition of the constructs being measured, and newer machine learning and deep learning techniques, which demonstrate considerable predictive power but present greater challenges in interpretability.
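To make that contrast concrete, here is a minimal Python sketch of the classical approach; the features and weights are purely illustrative, not drawn from any operational scoring engine. Its appeal is that each coefficient explicitly ties a measurable text feature to the construct, whereas a deep model's predictions cannot be decomposed this way.

```python
# Minimal sketch of a classical, interpretable essay-scoring model.
# Features and weights are illustrative only; in practice the weights
# would be estimated by regressing features against human scores.
import re

def essay_features(text):
    """Compute a few transparent, construct-linked text features."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_words": len(words),  # productivity
        "mean_sentence_length": len(words) / max(len(sentences), 1),  # syntactic elaboration
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),  # lexical diversity
    }

WEIGHTS = {"n_words": 0.004, "mean_sentence_length": 0.05, "type_token_ratio": 1.5}

def score(text, intercept=1.0):
    """A linear scoring rule: every feature's contribution is inspectable."""
    return intercept + sum(WEIGHTS[name] * value
                           for name, value in essay_features(text).items())

print(round(score("The quick brown fox jumps. It jumps again."), 2))
```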
Workshop 2: Creating Authentic L2 Listening Assessments
Elvis Wagner & Gary J. Ockey
Monday June 5 from 9 am to 4 pm (In-Person)
Elvis Wagner is Associate Professor of TESOL at Temple University, where he coordinates the PhD in Applied Linguistics program and the World Languages Education program. His research interests include the assessment of L2 oral ability, specifically focusing on how L2 listeners process and comprehend unscripted, spontaneous spoken language.
Gary J. Ockey is a Professor of Applied Linguistics and Technology at Iowa State University, where he is the director of the university’s oral communication placement test. He investigates second language oral ability with a focus on technology and quantitative methods to facilitate its measurement.
The assessment of second language (L2) listening presents unique challenges to teachers and test developers. In this workshop, we will address some of those challenges while providing hands-on training in creating practical, reliable, and valid listening assessment tasks. We focus on the concept of authenticity, using Bachman and Palmer’s (1996) definition: “…the degree of correspondence of the characteristics of a given language test task to the features of a TLU (target language use) task” (p. 23), and argue for the creation of listening assessment tasks that have the characteristics and qualities of the types of listening that L2 learners will have to do in the real world. We believe that this approach can lead to effective listening assessments and provide positive washback on teaching and learning practices.
To promote the authenticity of listening assessment tasks, we focus on four features of L2 listening tasks:
- Using real-world spoken texts
- Including different speech varieties in the spoken input
- Using audio-visual spoken texts
- Assessing listening as part of an interactive speaking/listening construct (Ockey & Wagner, 2018)
The workshop will include an extensive discussion of ways to define listening for various assessment contexts and then create tasks to measure these constructs. It will explore how to choose and/or create spoken texts for listening assessment. It will also include a discussion of various item formats, including multiple-choice comprehension items, retells, and pair/group discussions, as well as listening as part of larger integrated tasks. We will also discuss how listening can be assessed as part of interactive speaking and listening, rather than as interpretive listening only.
The workshop will be very much “hands-on,” as participants will be asked to:
- Create listening constructs for particular contexts
- Judge the appropriateness of different spoken texts for use as listening input for a given context
- Modify those spoken texts to make them more representative of real-world spoken input
- Explore how multimedia texts can be used to enhance the authenticity of listening assessment tasks
- Create test tasks and items of various types
- Share these tasks with group members for feedback and discussion
- Create a list of “dos and don’ts” for creating and adapting listening test tasks
- Discuss potential constructs and tasks for their own listening contexts
Ultimately, the goal of this workshop is to help test developers create authentic listening test tasks that can effectively assess test takers’ listening and oral communication abilities. Using more authentic listening test tasks should not only support more valid inferences about test takers’ L2 listening ability in real-world contexts, but should also have a positive washback effect on how L2 listening is taught and learned. Participants with little or no listening test development experience, as well as those with extensive experience, should benefit from this workshop.
Gary J. Ockey and Elvis Wagner are co-authors of Assessing L2 Listening: Moving Towards Authenticity (John Benjamins, 2018).
Workshop 3: Language Assessment for Immigration and Citizenship (while visiting the Statue of Liberty and Ellis Island)
Antony Kunnan (assisted by Coral Yiwei Qin)
Tuesday, June 6 (time TBD, approximately 9 am to 4 pm - In-Person)
Antony John Kunnan, Principal Assessment Scientist at Duolingo, Inc. and Senior Research Fellow at Carnegie Mellon University, has given talks and published on the topic of language assessment for immigration and citizenship. His latest publication on this topic is a chapter titled “Revisiting language assessment for immigration and citizenship” in The Routledge Handbook of Language Testing (2nd edition).
Coral Yiwei Qin is a PhD candidate at the University of Ottawa specializing in language assessment for immigration and citizenship in Canada.

Language assessments for immigration and citizenship have now been introduced in about 40 countries as instruments of government public policy. These assessments raise many problems of theory and practice: there is often no explicitly stated rationale for the assessment, interpretations of assessment results can be confusing, and the social consequences are most often detrimental. This workshop will address these issues through a short presentation, illustrative examples, participant activities and discussion, and a tour of the Statue of Liberty and Ellis Island.
In Part 1, participants will be introduced to the Naturalization Act of 1790 and to modern examples of language assessment policies and assessment instruments from a few countries, with special reference to U.S. policy and practice, including the Literacy Test of 1917 (with literacy cards in many languages) and the Immigration and Nationality Act (INA) of 1952, which stipulated the educational requirements still in use today. Those requirements made English language and history and civics necessary hurdles for immigrants to cross on their way to becoming naturalized citizens of the U.S. The current U.S. Naturalization Test is the operationalization of the INA of 1952.
In Part 2, participants will examine statistical information from the U.S. Department of Homeland Security to discover for themselves citizenship patterns (of applications, naturalizations, and denials) for the last few decades by country of origin and destination area in the U.S.
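As an illustration of the kind of exploration involved, the short Python sketch below aggregates naturalizations by decade and country of birth. The filename and column names are assumptions for the sake of the example; the DHS Yearbook of Immigration Statistics distributes its tables in several formats.

```python
# Minimal sketch: explore naturalization patterns by decade and country.
# Assumes a hypothetical CSV export with columns "year",
# "country_of_birth", and "naturalizations".
import pandas as pd

df = pd.read_csv("naturalizations.csv")  # hypothetical filename
df["decade"] = (df["year"] // 10) * 10

patterns = (
    df.groupby(["decade", "country_of_birth"])["naturalizations"]
      .sum()
      .sort_values(ascending=False)
)
print(patterns.head(10))  # the ten largest decade-by-country totals
```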
The workshop will also be enhanced by a trip to the Statue of Liberty and Ellis Island’s National Immigration Museum, a multi-faceted museum of immigration. The Ellis Island station operated from 1892 to 1954, during which time more than 12 million immigrants were processed there. It is estimated that more than 40% of the American population (about 140 million people) can trace their ancestry to immigrants who came through Ellis Island. Workshop participants can walk through the museum as well as explore the archives, including searchable records of immigrant arrivals and their stories.
Workshop 4: Open Science in Language Testing: Theories, Tools, and Practice (ONLINE AND RECORDED)
Paula Winke and Dylan Burton
Tuesday, June 6 from 9:00am to 12:00pm (Online and recorded. You do not need to attend LTRC to register for this workshop)
Dr. Paula Winke, a former Peace Corps Volunteer to China, a two-time Fulbright Scholar (Hungary 2008; Germany 2022), and a four-time national and international research award winner, teaches courses on language assessment and language teaching methods. Dr. Winke’s research, which has been featured in The New York Times, The Los Angeles Times, Public Radio International, SLATE Magazine, The Associated Press, and other outlets, focuses on making language assessments that are fair, reliable, and valid. She is a Co-Editor of the journal Language Testing, and she is an advisor to the U.S. Foreign Service Institute (FSI) on language testing principles. She is a project lead at Michigan State University's National Language Resource Center and Co-PI on a U.S. National Security Agency STARTALK Grant that supports K-12 Chinese teachers' professional development in assessment skills.
J. Dylan Burton is a PhD candidate in the Second Language Studies program at Michigan State University, USA, and editorial assistant for the journal Language Testing with co-editors Paula Winke and Luke Harding. He holds an MA in Language Testing from Lancaster University. Dylan’s research interests include the development and validation of speaking tests, rater cognition, and non-verbal behavior. He is an advocate for Open Science, which emphasizes research transparency in the social sciences.
Over the past decade, applied linguists have joined the broader social sciences and STEM fields in adopting open science approaches that support accessible, transparent, and reproducible research. The need to support these practices in language testing is paramount; access to materials and data allows researchers to independently assess the veridicality of research findings, to plan replication studies, and to extend the use of validated tools to new contexts. The benefits of adopting open science practices are numerous, including a reduction in publication bias (Nosek et al., 2019) and the ability to support or challenge past findings through replication research (Porte & McManus, 2019). Nonetheless, language testers have been slow to adopt these principles (Burton, 2023), possibly because of a lack of exposure and training on how best to leverage these tools. This half-day, online workshop will attempt to bridge that gap by introducing attendees to the principles of open science and the most popular tools for disseminating findings and building a stronger open research profile.
The first aim of the workshop is to build a solid knowledge base of open science. The workshop will include discussions of what open science is and how it is used in applied linguistics. We will discuss, amongst other topics, research preregistration, open data and materials, registered reports, preprints, and postprints. For each key area, we will present a rationale for the practice and key examples from the field. Attendees will be encouraged to participate in discussions surrounding the benefits, challenges, and possible disadvantages of these practices. We will also provide data on the ways engagement in open science is recognized and how it is filtering into tenure and promotion decisions.
The second core aim of the workshop is to explore practical tools, giving attendees the opportunity to gain hands-on experience with setting up research profiles and data repositories. We will guide attendees through four different tools available online that can help expand, disseminate, and track researchers’ scientific output: ORCID, the Open Science Framework, IRIS, and Publons. With ORCID iDs, we will show attendees how to set up an account, add vital biographical data, and link publications to their account. Using the Open Science Framework, we will guide attendees through account creation, preregistering studies, creating projects for data and materials, blinding materials for peer review, and setting up their own public digital object identifier (DOI) for their datasets. We will also introduce attendees to open materials databases such as IRIS. Finally, we will introduce Publons, an online service that tracks peer review for promotion and tenure purposes, and show attendees how to set up their own profile and log their academic service.
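As a small illustration of how these services expose researcher records programmatically, the Python sketch below pulls a public works list from the ORCID public API. The ORCID iD shown is the fictitious example iD used in ORCID's own documentation, not a real researcher's record.

```python
# Minimal sketch: list the titles of a researcher's public works via the
# ORCID public API. The iD below is ORCID's fictitious documentation example.
import requests

ORCID_ID = "0000-0002-1825-0097"
url = f"https://pub.orcid.org/v3.0/{ORCID_ID}/works"

response = requests.get(url, headers={"Accept": "application/json"})
response.raise_for_status()

# Each "group" bundles duplicate records of one work; print the first
# summary's title from each group.
for group in response.json().get("group", []):
    summary = group["work-summary"][0]
    print(summary["title"]["title"]["value"])
```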
Participants will walk away from the workshop with a greater understanding of open science and of the tools they can use in their own practice. The workshop will be valuable for graduate students as well as senior scholars who would like to learn more about this increasingly important topic.
Burton, J. D. (2023). Reflections on the past and future of language testing and assessment: An emerging scholar’s perspective. Language Testing, 40(1).
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van’t Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences, 23(10), 815-818. https://doi.org/10.1016/j.tics.2019.07.009
Porte, G., & McManus, K. (2019). Doing replication research in applied linguistics. Routledge.