JDSE

The Journal of Dental Sciences and Education covers General Dentistry, Pediatric Dentistry, Restorative Dentistry, Orthodontics, Oral Diagnosis and Dentomaxillofacial Radiology, Endodontics, Prosthetic Dentistry, Periodontology, Oral and Maxillofacial Surgery, Oral Implantology, Dental Education, and other fields of dentistry, and accepts articles on these topics. The journal publishes original research articles, review articles, case reports, editorial commentaries, letters to the editor, educational articles, and conference/meeting announcements.

Original Article
Performance comparison of artificial intelligence-based language models on oral pathology questions in dental education
Aims: This study aimed to comparatively evaluate the diagnostic accuracy of current large language models (LLMs), namely ChatGPT-3.5, ChatGPT-4, Gemini, and DeepSeek, using multiple-choice questions on oral pathology, a core subject in undergraduate dental education.
Methods: A total of 30 multiple-choice questions covering oral pathology topics in the dental curriculum were prepared by a domain expert using current textbooks and course materials. The questions were presented in text-based format to the four LLMs (ChatGPT-3.5, ChatGPT-4, Gemini, DeepSeek), and only each model's first response was recorded, without additional prompting or correction. Diagnostic accuracy was assessed as the percentage of correct answers, and agreement with the reference answers was evaluated using the kappa statistic. A p-value < 0.05 was considered statistically significant.
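The abstract does not report the scoring procedure as code; the following is a minimal illustrative sketch, not the authors' actual script, of how per-model accuracy and Cohen's kappa against the answer key could be computed in Python with scikit-learn. The function name, option labels, and example responses are hypothetical.

```python
# Illustrative sketch only: scoring one model's answers against the answer key.
# The item count, labels, and responses below are hypothetical examples.
from sklearn.metrics import cohen_kappa_score

def score_model(answer_key, responses):
    """Return (accuracy in %, Cohen's kappa) for one model's first responses."""
    assert len(answer_key) == len(responses)
    correct = sum(r == k for r, k in zip(responses, answer_key))
    accuracy = 100.0 * correct / len(answer_key)
    kappa = cohen_kappa_score(answer_key, responses)  # chance-corrected agreement
    return accuracy, kappa

# Hypothetical 5-item example with options A-E (the study used 30 items):
answer_key = ["A", "C", "B", "E", "D"]
model_answers = ["A", "C", "B", "E", "A"]
print(score_model(answer_key, model_answers))  # -> (80.0, 0.75)
```

Because Cohen's kappa corrects raw agreement for agreement expected by chance, two models with identical accuracy can yield slightly different kappa values if their distributions of selected options differ.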
Results: ChatGPT-3.5 and ChatGPT-4 each achieved an accuracy rate of 93.33%, with kappa values of κ=0.951 and κ=0.927, respectively. Gemini reached 86.67% accuracy (κ=0.879), while DeepSeek achieved 80% accuracy (κ=0.855). All models showed statistically significant agreement with the reference answers (p<0.001).
Conclusion: This study demonstrated that ChatGPT-3.5 and ChatGPT-4 outperformed the other models in diagnostic accuracy and agreement when answering fundamental oral pathology questions. Although all models showed significant concordance with the reference answers, the ChatGPT-based models stood out in overall accuracy. However, given their potential margins of error, these models should be used only as complementary tools in dental education and should not replace students' primary sources of knowledge.


Volume 3, Issue 2, 2025
Pages: 31-34