Essay
Research trends and comparisons of major generative artificial intelligence platforms for systematic literature reviews
Sang-Jun Kimorcid
Science Editing 2025;12(2):200-205.
DOI: https://doi.org/10.6087/kcse.384
Published online: August 12, 2025

Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea

Correspondence to Sang-Jun Kim sjkim@kribb.re.kr
• Received: July 21, 2025   • Accepted: July 28, 2025

Copyright © 2025 Korean Council of Science Editors

This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Generative artificial intelligence (AI), such as ChatGPT (OpenAI), has revolutionized all sectors, including academia, with the majority of ChatGPT-related articles published in the medical field since its emergence in late 2022 [1]. The estimated use of articles by large language models (LLMs) has been increasing since ChatGPT’s release [2]. While LLMs possess the capability to generate human-like content across multiple modalities, including text, images, audio, and video, their dependence on pretrained knowledge can lead to inaccuracies or hallucinations [2]. Moreover, AI is having such a significant impact on academia that LLMs are now even being used to assist in systematic literature reviews (SLRs) to minimize human intervention.
With the rising adoption of AI platforms for SLRs (SLR-AIs) that support SLR functionalities in academic writing, researchers face the challenge of selecting the most appropriate SLR-AI for high-quality research. As various SLR-AIs continue to appear, each offering distinct features, choosing and using the right SLR-AI has become a key determinant of research productivity. Researchers now face a new dilemma: should they opt for an SLR-AI trained on bibliographic databases, or should they utilize the SLR functionalities of general-purpose AIs? This is an increasingly relevant concern in the research community.
This study aims to provide a concise introduction to SLR-AIs that support SLR processes, to compare their features and performance, and to present this information to researchers as guidance for their selection and use of SLR-AIs. I intend to focus exclusively on major SLR-AIs, with no conflicts of interest regarding any products, and to share personal experiences as objectively and fairly as possible. This analysis draws on my experience as a practicing researcher and librarian, in addition to a review of relevant literature, in order to present a balanced overview of both theoretical and practical aspects of SLR-AIs.
For this essay, I initially gathered foundational materials by searching Google and bibliographic databases. Subsequently, using major SLR-AIs that were freely or affordably accessible, I collected additional data for comparison by reviewing their responses to targeted queries. By comparing the theoretical orientation of SLR-AIs outlined in recent research with the real-world characteristics of available tools, this essay approaches the topic from the perspective of practical utilization rather than technical development.
Usefulness and limitations of SLR-AIs
ChatGPT-like AIs can serve as effective SLR-AIs to assist novice researchers due to their breadth and accuracy of knowledge. ChatGPT demonstrates potential in automating article screening during SLRs, achieving high sensitivity and substantial reductions in workload, although it exhibits lower specificity compared to human reviewers [3]. GPT-4, in particular, has proven to be an effective tool for enhancing the efficiency of article screening, demonstrating “human-like” and “almost perfect” performance comparable to human experts [4]. Additionally, LLMs contribute to SLR processes by selecting relevant studies and extracting key information [5]. In the SLR workflow, which encompasses literature search, review, data extraction, and synthesis, a retrieval-augmented generation (RAG)-based LLM, which combines real-time information retrieval with generative capabilities, can markedly improve accuracy, relevance, and contextual understanding [6]. Although LLMs have not yet fully supplanted human experts in abstract screening, ChatGPT 4.0 has shown considerable promise for improving SLR with a good balance of sensitivity, specificity, and high overall accuracy [7]. Therefore, LLM-based SLR-AIs are increasingly used to enhance research discovery, visualization, and efficient summarization. A comparative assessment of academic AIs is needed to support the implementation of robust AI-driven scholarly information services, facilitate reliable data-driven research, and improve academic writing [8]. Thus, more detailed quality comparisons of LLMs for SLR should employ qualitative metrics such as accuracy, response time, consistency, depth of knowledge, contextual understanding, and transparency [9]. While LLMs for SLR can reduce the time required and increase the value of analytical results, human oversight remains necessary to ensure the accuracy and reliability of findings [10].
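The sensitivity/specificity trade-off described above can be made concrete with a small sketch. The decision lists below are hypothetical illustrations, not data from the cited studies:

```python
# Sketch: evaluating an AI screener against human include/exclude labels
# (1 = include). The lists are invented for illustration only.
def screening_metrics(human, ai):
    """Return sensitivity, specificity, and accuracy of AI screening decisions."""
    tp = sum(1 for h, a in zip(human, ai) if h == 1 and a == 1)
    tn = sum(1 for h, a in zip(human, ai) if h == 0 and a == 0)
    fp = sum(1 for h, a in zip(human, ai) if h == 0 and a == 1)
    fn = sum(1 for h, a in zip(human, ai) if h == 1 and a == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # relevant articles caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # irrelevant articles excluded
    accuracy = (tp + tn) / len(human)
    return sensitivity, specificity, accuracy

human = [1, 1, 0, 0, 1, 0, 0, 1]
ai    = [1, 1, 1, 0, 1, 0, 1, 1]  # the AI over-includes two irrelevant articles
sens, spec, acc = screening_metrics(human, ai)  # 1.0, 0.5, 0.75
```

The pattern reported in the literature appears here in miniature: perfect sensitivity (no relevant article missed) at the cost of lower specificity, which is usually the acceptable direction of error for screening, since false inclusions are cheap to discard in a later human pass.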
Present and future of SLR-AIs
General generative AI models, such as ChatGPT, originally trained on broad, nonacademic data from the internet, are now offering SLR functionalities due to rapid technological advancements. In response, bibliographic database providers have released SLR-AIs trained on the abstracts within their databases and are offering these as additional subscription services to libraries. Consequently, recent research on SLR-AIs has primarily focused on the screening and extraction phases of article review [11]. According to an analysis of articles retrieved from PubMed, ChatGPT and other GPT-based LLMs represent the most promising architectures for SLR-AIs, despite their relatively low recall, and are expected to transform SLR methodologies [12]. However, SLR-AIs often generate divergent queries even when provided with the same prompt, resulting in suboptimal recall. Therefore, caution is advised when employing these tools for rapid systematic reviews. Continued technological development of SLR-AIs is necessary to enhance the precision of article retrieval, enable systematic integration of retrieved data, and support learning from a wide array of specialized and emerging information sources to generate value-added insights—such as identifying research gaps. Producing reliable and actionable SLR-AI results remains a significant technical and ethical challenge, necessitating increased accuracy for sophisticated analytics. In domains where accuracy and reliability are paramount, such as medicine, there is also a pressing need to implement the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines when utilizing SLR-AIs [13].
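The recall problem caused by divergent queries can be illustrated with a toy example. The article identifiers and retrieved sets below are invented assumptions, not measurements of any real platform:

```python
# Sketch: why divergent queries for the same prompt hurt recall.
# "gold" is a hypothetical set of truly relevant articles; each run is
# what one invocation of an SLR-AI hypothetically retrieves.
gold = {"a1", "a2", "a3", "a4", "a5"}
runs = [
    {"a1", "a2", "a3", "x1"},   # run 1: one phrasing of the query
    {"a2", "a4", "x2", "x3"},   # run 2: same prompt, divergent query
    {"a1", "a5"},               # run 3: yet another query formulation
]

def recall(retrieved, relevant):
    """Fraction of relevant articles that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant)

recalls = [recall(r, gold) for r in runs]            # 0.6, 0.4, 0.4 per run
union_recall = recall(set().union(*runs), gold)      # pooling all runs: 1.0
```

Each individual run misses relevant articles, which is why caution is advised for rapid reviews; pooling repeated runs recovers coverage, at the cost of more irrelevant hits to screen out.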
Table 1 summarizes the characteristics, advantages, and disadvantages of major SLR-AIs that I have used or that have been described in the literature; only the key highlights are presented here. DeepSeek R1 (DeepSeek) achieves performance comparable to OpenAI's o1 on reasoning tasks, with transparent, explainable reasoning and structured problem solving that breaks complex problems down into steps. I have no direct experience with it because Korea restricts the use of DeepSeek R1.
Bibliographic database–based SLR-AIs

Scopus AI

Scopus AI (Elsevier) was developed using abstracts from articles published after 2003 in the Scopus database. The aim of Scopus AI is to help avoid hallucinated references and ensure there are no “fake” citations [14]. Scopus AI formulates queries and performs vector searches, then constructs a summary and generates answers whose cited references are checked for validation and transparency. Scopus AI may be particularly useful for the initial stages of SLRs, or for fields with less stringent methodological requirements, as it offers a level of quality and reliability grounded in peer-reviewed articles that surpasses that of general LLM-generated answers. Now available in a mobile version, Scopus AI may also influence competing AIs by introducing new approaches to SLR [15].
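The retrieve-then-cite pattern described here can be sketched with a toy bag-of-words embedding. Real systems use learned dense vectors, and all document texts and identifiers below are invented for illustration; the point is only that answers citing exclusively from the retrieved set cannot fabricate references:

```python
import math

# Sketch of retrieve-then-cite: embed abstracts, rank by cosine similarity
# to the query, and constrain citations to the retrieved documents.
def embed(text, vocab):
    """Toy bag-of-words vector; a real system would use a dense encoder."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

abstracts = {  # hypothetical corpus of indexed abstracts
    "doc1": "large language models for systematic review screening",
    "doc2": "crop yield prediction with satellite imagery",
    "doc3": "systematic review automation with language models",
}
vocab = sorted({w for t in abstracts.values() for w in t.lower().split()})
query = "language models for systematic reviews"
qv = embed(query, vocab)
ranked = sorted(abstracts,
                key=lambda d: cosine(embed(abstracts[d], vocab), qv),
                reverse=True)
top = ranked[:2]  # the answer may cite only these IDs, so every citation is real
```

Because the generator is restricted to `top`, every citation in the final answer corresponds to an actual indexed record, which is the grounding property the text attributes to bibliographic database–based SLR-AIs.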

Research Assistant

The Research Assistant (RA; Literature Review 2.0, Clarivate) has been offered as an add-on for Web of Science (WoS) subscribing libraries since 2025. Clarivate developed RA for the WoS Core Collection, which began compiling a bibliographic database of articles before Scopus existed. Clarivate trained the RA model on WoS article abstracts to ensure the high quality and reliability of its learning sources. Unlike Scopus AI, RA displays, “I’ll create a search strategy with keywords and synonyms covering these aspects,” during the question-and-answer process. It works by generating answers with GPT technology through an expanded search based on the initial keywords. While this approach is similar to Scopus AI’s, it differs in that RA first automates the selection of search keywords and then uses an LLM to generate answer texts for the most relevant studies in the search results, with results restricted to the library’s subscribed WoS years.

Dimensions Research GPT

Dimensions (Digital Science) manages research data that includes more than 70% of publications with full-text indexing, and has built one of the most comprehensive collections of grants, publications, patents, clinical trials, and datasets [14]. Since 2000, the Dimensions database has grown more rapidly than WoS and includes many smaller publishers and open access articles [16]. As a result, Dimensions Research GPT has access to a broader range of references, enabling it to generate summaries, insights, and citations based on scientific evidence, including datasets, and thus offer new perspectives.

Elicit Reports

Elicit (Ought Inc) relies on article abstracts from bibliographic databases [17]. The Elicit Reports feature (Elicit Systematic Review) generates answers based on highly cited articles sourced from Semantic Scholar (Ai2), PubMed, arXiv, JAMA, and other platforms. Elicit Reports uses semantic search, focusing on search intent rather than simple keywords, and presents results as summaries of eight relevant articles selected from up to 500 retrieved results.

SciSpace Deep Review

SciSpace (Business Integra and d3i) is promoted as an all-in-one AI platform for students and researchers, offering functionalities such as PDF chatting and AI writing support. It allows users to create custom columns, with both free standard and premium Deep Review options, and produces reports based on 50 highly relevant papers. While SciSpace Deep Review features a well-organized table of contents compared to other SLR-AIs, its answers tend to be overly long and detailed, and their accuracy is sometimes uncertain.
Generative AI–based SLR-AIs

Deep Research

OpenAI, Perplexity, and Google have all released LLM-based SLR-AIs called “Deep Research,” intensifying technological competition and, at times, causing confusion for users. These tools summarize findings using logical reasoning and provide citations from both scholarly sources and publicly available web information to support up-to-date SLR. “Deep Research” tools reorganize, expand, modify, and adjust search terms in response to user prompts, but their repeatability, reproducibility, and result consistency for identical prompts tend to be weak. Nonetheless, the use of OpenAI’s Deep Research is strongly recommended [18]. Perplexity’s and Google’s “Deep Research” were omitted from the comparison because I found no significant differences from OpenAI’s.

Felo Agent

Felo Agent (Felo Inc) is not technically an SLR-AI, as it is not built on a bibliographic database, but it is included here for convenience. It searches across numerous academic publications, incorporating models such as DeepSeek R1, o4-mini (OpenAI), GPT-4o (OpenAI), and Claude 4.0 Sonnet (Anthropic), enabling users to easily select and utilize responses from multiple AIs with a single prompt. When Felo Agent provides integrated responses, its answer organization and PPT generation system are logical and well-structured, offering an agent-like SLR-AI experience. In my experience, Felo Agent was stronger at surfacing recent, non–peer-reviewed academic evidence than other SLR-AIs, but it was also more likely to present fictitious references.
Understand the accuracy and reliability of the SLR-AI training sources
With the emergence of bibliographic database–based AI, SLR-AIs now have the capability to generate the introduction section of a research article without human intervention. However, even the most accurate and reliable LLMs should function as helpful adjuncts to the SLR process, rather than as leading contributors [2]. Although bibliographic database–based SLR-AIs are highly accurate and reliable in terms of their learning sources, questions remain regarding whether the AI technologies used to generate answers are sufficiently robust. For SLR-AIs trained on internet-sourced information, there are concerns that the accuracy and reliability of the results cannot be guaranteed. When evaluating the quality of SLR-AIs, it is essential to consider not only the reliability of their training sources but also the frequent technical updates and rapid evolution of these systems, which can impact the consistency of their learning sources [19].
Be aware of possible plagiarism, copyright law, and research ethics violations
Although bibliographic database–based SLR-AIs claim to have been trained on the abstracts of all the articles within their databases, it remains unclear whether all abstracts have actually been used, including those with copyright restrictions [14]. This issue is even more pronounced for SLR-AIs from Clarivate and Digital Science, which are not themselves publishers and therefore face greater limitations in accessing copyrighted abstracts, compared to Elsevier’s Scopus AI [14]. Generative AI technologies used for manuscript writing, data analysis, peer review, and editorial activities can introduce a range of ethical concerns for authors, reviewers, and editors alike [17,20]. To avoid these issues, outputs generated by SLR-AIs should be thoroughly rewritten in human-understandable language and properly incorporated into manuscripts. Overreliance on AIs has already resulted in article retractions due to research ethics violations [21]. Therefore, it is crucial to disclose the use of SLR-AIs, such as by providing a dedicated methods section to increase transparency [17].
Weigh the usefulness of SLR-AI’s capabilities against potential hallucinations
The risk of hallucinations by SLR-AIs increases when they are tasked with new topics or fields that lack sufficient article coverage, leading to optimism for academic AI models trained on full-text and web-based information rather than just abstracts. Because of the inherent limitations of SLR-AIs, human intervention remains indispensable. SLR-AIs should ideally allow users to upload their own PDFs and interact directly with the results. To promote widespread adoption, it is necessary to thoroughly evaluate both the usefulness and potential hallucinations of SLR-AIs, as well as to support question-and-answer interactions in languages beyond English.
Keep an eye on new technologies such as RAG or AI agent
Frequent updates to incorporate new features can result in rapid obsolescence of SLR-AI models, imposing a continuous learning burden on users. After an initial SLR-AI run, it may become necessary to re-prompt for supplementary research as time advances. This does not mean relying on a single SLR-AI; rather, users should understand the strengths and weaknesses of various SLR-AIs and select the most appropriate tool according to the topic and research context. By leveraging advanced technologies such as RAG or AI agent systems, SLR-AI can become more accurate and widely used, ultimately enhancing research productivity [6]. For deep research tasks, agent-based AI (autonomous problem solving without user intervention) can even outperform RAG and prompt engineering-based approaches, so further development of agent-based SLR-AIs is anticipated. With new releases like OpenAI’s ChatGPT agent, which boasts deep research capabilities similar to Felo Agent, competition among SLR-AIs is intensifying.
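The agentic search described above (autonomous iteration without user intervention) can be sketched as a simple loop that re-queries until coverage stops improving. Every name, the toy corpus, and the stopping rule below are illustrative assumptions, not any vendor's actual design:

```python
# Toy sketch of an agentic search loop: refine the query from what was
# found, and stop when a round adds no new results.
def agent_search(initial_query, search_fn, expand_fn, max_rounds=5):
    found, query = set(), initial_query
    for _ in range(max_rounds):
        new = search_fn(query) - found
        if not new:                      # convergence: nothing new was found
            break
        found |= new
        query = expand_fn(query, new)    # autonomous query refinement
    return found

# Hypothetical search backend: each refined query surfaces further results.
corpus = {
    "llm review": {"a", "b"},
    "llm review+ab": {"b", "c"},
}
search_fn = lambda q: corpus.get(q, set())
expand_fn = lambda q, new: q + "+" + "".join(sorted(new))

result = agent_search("llm review", search_fn, expand_fn)  # {"a", "b", "c"}
```

The contrast with plain RAG is visible in the loop itself: rather than answering from a single retrieval pass, the agent decides on its own when to search again and when to stop, which is why agent-based approaches can reach material that the first query never surfaces.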
A variety of SLR-AIs have emerged since ChatGPT, accelerating the processes of literature search and summarization. Bibliographic database–based SLR-AIs, in particular, are effective as starting points at the outset of research. They improve researchers’ understanding of new concepts, enable the collection of reliable articles, and generate summaries by automating keyword selection. However, they may produce incomplete answers in emerging fields. Nonbibliographic database–based SLR-AIs, on the other hand, often overlook key references and important findings. A growing number of SLR-AIs offer natural language summaries, visualized results, and suggested follow-up questions.

The ongoing technological race in the SLR-AI landscape, marked by advances in LLM reasoning, agent technology, and RAG, remains fierce. Current SLR-AI technology cannot fully replace researchers but rather serves to augment human expertise. Human researchers provide critical analysis, uniquely identify trends, and offer perspectives on future research directions that go beyond mere summarization. Exclusive reliance on SLR-AIs does not foster the growth necessary to become an independent researcher.

Based on my experience, it remains difficult to precisely determine the reliability of SLR-AIs. The future of SLR-AI should be grounded in trustworthy and efficient content, aided by responsible AI practices. This study suggests that careful selection of SLR-AIs can optimize the efficiency of literature reviews. Solutions such as Felo Agent, in conjunction with adherence to the four implications discussed above, represent a prudent choice. However, excessive reliance on SLR-AIs that draw from public sources poses immediate research ethics risks, making final, comprehensive human review essential. As more articles are published utilizing SLR-AI, increasing effort will be required to verify their authenticity, which risks becoming a societal burden.

Conflict of interest

No potential conflict of interest relevant to this article was reported.

Funding

The author received no financial support for this article.

Acknowledgments

Artificial intelligence was used for testing products, but not for writing articles.

Data availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

The author did not provide any supplementary materials for this article.
Table 1.
Comparison of the characteristics of the main SLR-AIs tested by the author or researched in literature

Bibliographic database–based SLR-AI

 Scopus AI (Elsevier)
  Learning source: Scopus database (2003 to present); trained on approximately 60 million article abstracts
  Default no. of articles: 10, 30. RAG technology: yes
  Advantages: vector and keyword search, citation tracking, deeper questions, visual maps
  Disadvantages: restricted to post-2003 articles, requires a library database subscription, occasional hallucinations on niche queries

 Research Assistant (Clarivate)
  Learning source: WoS Core Collection database; trained on approximately 60 million article abstracts
  Default no. of articles: 8, 25, 50. RAG technology: yes
  Advantages: semantic search, citation tracking, deeper questions, visual maps
  Disadvantages: restricted to subscribed WoS years, requires a library database subscription, occasional hallucinations on niche queries

 Dimensions Research GPT (Digital Science)
  Learning source: Dimensions database; trained on approximately 160 million publications
  Default no. of articles: -. RAG technology: -
  Advantages: includes grants and datasets
  Disadvantages: subscription required, lack of RAG

 Elicit Reports (Ought Inc.)
  Learning source: Semantic Scholar (Ai2), PubMed, arXiv, user PDFs; trained on approximately 126 million publications
  Default no. of articles: 8, 50. RAG technology: yes
  Advantages: vector and keyword search, PRISMA workflows, supports PDF uploads
  Disadvantages: limited customization for domain-specific queries

 SciSpace Deep Review (Business Integra and d3i)
  Learning source: Semantic Scholar (Ai2), PubMed, open access repositories; trained on approximately 285 million publications
  Default no. of articles: 5, 50. RAG technology: yes
  Advantages: writing assistant, uploading PDFs to chat with
  Disadvantages: occasional inaccuracies in summaries, overlap between repository and published versions of articles

Generative AI–based SLR-AI

 Deep Research (OpenAI)
  Learning source: web-scale corpora, iterative agentic search
  Default no. of articles: -. RAG technology: yes
  Advantages: deep chain-of-thought reasoning, iterative refinement
  Disadvantages: slower generation, expensive subscription fees

 Felo Agent (Felo Inc.)
  Learning source: web index of multiple AIs
  Default no. of articles: -. RAG technology: yes
  Advantages: uses multiple AIs at one time, generates tables and PPT files
  Disadvantages: different answers depending on the selected AI

This table is based on personal opinions about the platforms tested by the author and researched in relevant literature. All listed products reportedly support non-English languages.

SLR, systematic literature review; AI, artificial intelligence; RAG, retrieval-augmented generation; WoS, Web of Science; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; PPT, PowerPoint (Microsoft Corp).
