Analyzing the Use of ChatGPT for the Generation of Automatic Feedback
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
This poster describes an analysis of the possible applications of ChatGPT 3.5 for the creation of automatically generated feedback (AGF) on learning content in the subject of computer science. The aim of the research was to evaluate the quality and applicability of AGF in comparison to human-generated feedback and to analyze its potential as a source of task-related feedback for digital learning environments. In particular, the possibility of generating elaborated feedback (EF) was examined. The following research questions were at the center of the investigation: 1. How do experts evaluate the feedback generated by ChatGPT 3.5 on a given task and learner response? 2. What types of feedback can be generated with ChatGPT? 3. Can the predefined prompts be applied to a new task context?

To create fitting AGF, the prompt used was iteratively refined until the AGF met given criteria for useful feedback. These criteria were derived from the research of Narciss (e.g., 2005), according to which useful feedback requires a cognitive and a motivational component as well as several subcomponents (e.g., an informative function, a specifying function). To evaluate whether these criteria were better met by AGF or by human feedback, both types of feedback were compared with each other. To investigate whether the predefined prompt can be applied in different task contexts, the task description and student answers were changed while the rest of the prompt remained the same.

The final evaluation was conducted through semi-structured open expert interviews with 12 professionals in the areas of education and computer science and was based on a self-created example scenario that included a task testing basic knowledge of data security and a fictitious student response. As mentioned above, the criteria for elaborated feedback derived from Narciss (e.g., 2005) served as the basis for the interviews. The focus was on identifying and describing mistakes in the students' answers (cognitive component) as well as motivating students to perform better in follow-up tasks (motivational component). In addition, semantic and linguistic aspects of the feedback were compared (e.g., in terms of concreteness or lengthiness).

The results show that AGF effectively identifies mistakes and provides relevant cues but is overall less accurate than human feedback. The motivational component of AGF was rated as comparable to human feedback, while the cognitive component performed worse. In addition, AGF has difficulties in effectively incorporating "knowledge of results", which limits its suitability for elaborated feedback. Despite these challenges, ChatGPT 3.5 demonstrated the ability to generate detailed text-based feedback, and the prompt templates developed could be transferred to similar tasks in the subject of computer science. However, it took many supervised attempts to produce satisfactory AGF, which leads to the conclusion that an autonomous deployment of ChatGPT 3.5 as a source of feedback is not recommended at the moment. Furthermore, more research is needed, since this study focused on only one subject (computer science) and compared the generated AGF with only one piece of human-generated feedback. Nevertheless, the present results suggest that with the further development of LLMs, possible applications of LLMs in the field of education will increase.
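To illustrate the prompt-template approach described in the abstract, the following is a minimal sketch (not taken from the poster) of how a fixed prompt scaffold with an interchangeable task description and student answer could be sent to the model via the OpenAI Python client. The prompt wording and all identifiers (`PROMPT_TEMPLATE`, `generate_feedback`, the example task and answer) are assumptions for illustration; the actual prompts used in the study are not reproduced here.

```python
# Minimal sketch: a fixed feedback-prompt scaffold in which only the task
# description and the student answer are swapped out, loosely following the
# criteria for elaborated feedback after Narciss (cognitive and motivational
# components). Wording and variable names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT_TEMPLATE = """You are a tutor giving feedback on a computer science task.

Task: {task}
Student answer: {answer}

Write elaborated feedback that:
1. identifies and describes the mistakes in the answer (cognitive component),
2. encourages the student to improve in follow-up tasks (motivational component).
"""

def generate_feedback(task: str, answer: str) -> str:
    """Fill the fixed scaffold with a new task context and query the model."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # ChatGPT 3.5, as examined in the study
        messages=[
            {"role": "user",
             "content": PROMPT_TEMPLATE.format(task=task, answer=answer)},
        ],
    )
    return response.choices[0].message.content

# Transferring the template to a new task context changes only these two strings:
print(generate_feedback(
    task="Explain why passwords should be stored as salted hashes.",
    answer="Because hashing makes passwords shorter and easier to store.",
))
```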
Details
| Original language | English |
|---|---|
| Title | Proceedings of CSEDU 2025. Poster |
| Publication status | Published - Apr. 2025 |
| Peer review status | Yes |
External IDs
| ORCID | /0009-0000-0900-2158/work/183564560 |
|---|---|