Article Type: Research Article
Authors
1 Corresponding author, Department of Information Technology Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran
2 Department of Information Technology Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran
3 Department of Information Technology Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran
4 Professor, Department of Philosophy and Logic, Faculty of Humanities, Tarbiat Modares University
Abstract
Objective: This study investigates the role of large language models (LLMs) in detecting logical fallacies during the peer-review process, with the aim of improving the accuracy, transparency, and reliability of scientific publications. The research also evaluates the potential of LLMs to reduce the workload on human reviewers and to standardize evaluation practices.
Method: The research involved a series of experiments designed to evaluate the ability of advanced language models, such as ChatGPT (versions 4 and o1), to identify and classify logical fallacies, solve reasoning problems, and analyze academic texts of varying length and complexity. Standard datasets were used, including the ElecDeb60To16 debate corpus and logic questions from the Iranian Ph.D. Entrance Exam. Classical machine learning models, including Support Vector Machine (SVM) and Random Forest, served as baselines. Optimization techniques and zero-shot learning approaches were applied to prepare the language models for the analyses.
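To make the zero-shot setup concrete, here is a minimal sketch of how a single argument could be submitted to a chat model for fallacy classification. It assumes the OpenAI Python SDK; the model name, label set, and prompt wording are illustrative assumptions, not the study's exact protocol.

```python
# Minimal zero-shot fallacy classification sketch, assuming the OpenAI
# Python SDK (openai >= 1.0). The label set and prompt are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical label set; the study's fallacy taxonomy may differ.
FALLACY_LABELS = [
    "ad hominem", "appeal to emotion", "appeal to authority",
    "false cause", "slippery slope", "none",
]

def classify_fallacy(argument: str, model: str = "gpt-4o") -> str:
    """Ask the model, zero-shot, which fallacy (if any) an argument commits."""
    prompt = (
        "Classify the logical fallacy in the following argument. "
        f"Answer with exactly one label from this list: {FALLACY_LABELS}.\n\n"
        f"Argument: {argument}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output makes evaluation reproducible
    )
    return response.choices[0].message.content.strip().lower()

# Example: classify_fallacy("My opponent is wrong because he is a dishonest
# person.") would be expected to return "ad hominem".
```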
Results: The results demonstrated the exceptional performance of advanced language models, particularly ChatGPT o1, which achieved 98.1% accuracy in detecting logical fallacies and 100% accuracy in solving logic problems from the Ph.D. Entrance Exam. In contrast, classical machine learning models, such as SVM and Random Forest, recorded significantly lower accuracies of 48% and 49%, respectively. Other advanced models, such as Mistral and Llama, exhibited moderate performance, with accuracies ranging from 76% to 78.5% in identifying logical fallacies. On longer and more complex texts, ChatGPT o1 maintained 100% accuracy in identifying and naming fallacies, while the other models' capabilities degraded, with accuracies falling below 50%.
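For context on the classical baselines, the following sketch shows a standard text-classification pipeline of the kind those figures imply: TF-IDF features fed to an SVM and a Random Forest via scikit-learn. The loader `load_fallacy_dataset()` is hypothetical, and the features and hyperparameters are assumptions, since the study's exact preprocessing is not specified.

```python
# Classical baseline sketch: TF-IDF + SVM / Random Forest with scikit-learn.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical loader returning (list of argument texts, list of fallacy labels).
texts, labels = load_fallacy_dataset()

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

for name, clf in [("SVM", SVC(kernel="linear")),
                  ("Random Forest", RandomForestClassifier(n_estimators=200))]:
    # Each pipeline vectorizes the raw text, then fits the classifier.
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipeline.fit(X_train, y_train)
    acc = accuracy_score(y_test, pipeline.predict(X_test))
    print(f"{name}: accuracy = {acc:.1%}")
```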
In addition to their accuracy, the advanced LLMs displayed a remarkable ability to analyze complex arguments, identify subtle logical errors, and provide structured feedback. These features highlight their potential for improving both the efficiency and the quality of the peer-review process by reducing human error and offering detailed, objective evaluations.
Conclusion: Large language models, particularly ChatGPT o1, have shown substantial potential to redefine traditional peer-review practices. These models can enhance the speed, precision, and transparency of evaluations, thereby supporting the publication of high-quality research articles. By identifying logical fallacies and cognitive biases, they offer structured feedback that aids authors in refining their work and ensures the integrity of scientific literature. However, human reviewers remain essential as final arbiters in the process, ensuring a balanced integration of AI's analytical capabilities with human expertise. This synergy can pave the way for a more robust, efficient, and transparent peer-review system, fostering progress in scientific research.