Recommended practices for supplemental data
Abstract
Since various forms of supplemental data (SD) have been introduced in academic publications, it has become necessary to establish guidelines to systematically process, indicate, and distribute such data. This article aims to help science journals establish rational SD policies and guidelines, ensure compliance with those policies, and manage SD consistently. Generally, SD are approached in a literal way, by categorizing ‘appendices’ as ‘additional or separately added complementary materials’ and ‘supplements’ as ‘materials supplemental to the research in a comprehensive sense,’ rather than by viewing SD as an independent component of an article. The recommended practices of the National Information Standards Organization (NISO) of the USA advise classifying SD as either ‘integral content’ or ‘additional content’ according to the content’s functional relationship to the associated article. If a public repository is used for SD, the author can ensure persistent access to the data by assigning a digital object identifier. Science journals should adopt appropriate SD policies and describe them in detail in the instructions for authors to ensure consistent compliance. Additionally, they should inspect and maintain the links, repositories, and metadata associated with the SD of specific articles on an ongoing basis.
Introduction
In traditional academic publishing, supplemental data (SD) historically referred only to non-critical collateral data: material posted as an additional reference, or resources that could not be presented in the main article for technical reasons. Such data were typically included at the end of the article as an appendix. However, as the content and format (including media) of academic publications, especially scientific publications, have undergone paradigm-shifting changes, SD have become highly diverse in content and format. Moreover, as awareness of the importance of data sharing has grown, SD have increasingly served as a means of making research data available.
However, confusion has mounted among the various parties involved in academic publishing in Korea, including journal publishers and manuscript editors, as these parties lack appropriate guidelines on how to process, indicate, archive, and distribute SD.
In light of these circumstances, we proposed establishing recommended practices for the appropriate and uniform treatment of SD at a workshop for holders of the Korea Manuscript Editor Certification, hosted by the Korean Council of Science Editors on July 12, 2019; lively discussion among the participants followed the presentation.
Hence, by summarizing and publishing our presentation in this article, we aim to encourage the construction of complete and systematic guidelines so that the editors of scholarly journals can establish a reasonable SD policy covering items such as the selection of SD, criteria for repositories, and regulation of the review process. Editors should be able to explain this policy in detail in the instructions for authors to ensure consistent compliance and to maintain related repositories and metadata effectively.
In this article, SD do not include supplementary databases at the level of journals or books; the term refers specifically to additional content (AC) within an individual article. We focus on the processing of SD in a web-based environment rather than in traditional print or PDF publications.
Concept of SD
With the recent heightened expectation that data sharing will advance science and improve the reproducibility of research, the use of SD has also expanded greatly. According to this modern understanding, data sharing should not only enable communication between colleagues but also ensure unrestricted access to research-related information through archiving on a web-based platform. A data sharing policy should set the level of data sharing required, select or recommend appropriate repositories for each field, state data availability requirements in the instructions for authors, and specify how datasets should be mentioned or cited in the body of an article. Establishing and implementing a proper data sharing policy is becoming not just a recommended measure but a necessary step for academic journals. Increasingly, prominent and influential journals (or publishers) with high impact factors have established data sharing policies and require submissions to include research data in the form of SD or supplemental uniform resource locators (URLs). Among Korean journals, for example, the Journal of Educational Evaluation for Health Professions (JEEHP) delineates its data sharing policy in detail in its instructions for authors (https://www.jeehp.org/authors/authors.php) [1].
As previously mentioned, the conventional definition of SD is changing dramatically in response to the changing environment. We therefore reviewed the guidance on SD provided by major scientific editing organizations.
Currently, many influential editing manuals offer only brief remarks on SD rather than detailed policies, so these guidelines need further development. The American Medical Association (AMA) [2] advises authors to post supplementary materials only if they are necessary. The AMA guidelines suggest that supplementary materials be indicated in the main text with a notation such as ‘Appendix 1’ and placed before the reference list at the end of an article. The AMA also recommends online-only materials for resources too cumbersome to include in a print article for technical reasons; such materials should receive a separate designation, such as ‘eTable 1,’ and readers should be told how to access them. The AMA guidelines also require that supplementary materials undergo peer review and editorial processes similar to those for the article itself.
The guidelines of the American Psychological Association (APA) [3] distinguish two types of supplementary materials. Resources short enough to be presented in print are classified as appendices; resources that are web-based and difficult to present in print are classified as supplements. The APA guidelines indicate that appendices should be referred to in the main body as ‘Appendix’ (for a single item) or ‘Appendix A’ (when there are several). Web-based supplements should undergo peer review, but they need not be further processed to match the journal’s format.
The Council of Science Editors (CSE) [4] advises that materials that would interrupt the flow of the text because of excessive length or detail be added as appendices, but it does not encourage publishing such materials. The CSE guidelines give only simple instructions: label these materials ‘Appendix’ if there is one, or ‘Appendix 1’ (and so on) if there are several. Notably, the CSE guidelines require appendices to be published online only.
The guidelines of the American Physiological Society [5] broadly categorize SD into three types: 1) SD that do not represent significant findings but can be used to assist in understanding an article; 2) source data, which corroborate the research findings; and 3) supporting information, which is provided only for peer review.
Based on our examination of these guidelines, we believe that most scientific communities do not treat SD as an independent part of an article but rather treat SD in a literal sense, categorizing an appendix as ‘additional or separately added complementary materials’ and a supplement as ‘materials supplemental to the research in a comprehensive sense.’ We conclude that international standards for processing SD are currently lacking, as existing standards differ greatly across disciplines and among individual journals.
Standard Recommendations for American STM journals
Like Korean scholarly communities, international journals have experienced confusion and difficulty in processing SD. Early on, many parties involved in American academic journals grappled with several problems: a large proportion of SD produced in online publishing environments received only a superficial review or none at all, and the treatment of citations within SD and the methods for processing SD were often not standardized even within a single journal. One might argue that the excessive and largely unregulated use of SD may be harmful to science [6]. In response to such confusion, the National Information Standards Organization (NISO) and the National Federation of Advanced Information Services of the United States organized a working group to develop recommendations for standardizing SD policies. First officially convened in August 2010, the working group published its recommendations in 2013 (Recommended Practices for Online Supplemental Journal Article Materials, https://www.niso.org/publications/niso-rp-15-2013-recommended-practices-online-supplemental-journal-article-materials; Suppl 1). NISO’s Recommended Practices (RPs) are intended to give publishers and editors an international standard for processing SD so that they can guide authors and peer reviewers through the process. The NISO RPs are influential guidelines and could serve as a reference when Korean journals establish SD policies.
The NISO RPs consist of two parts. Part A contains practices recommended by the Business Working Group concerning the selection, editing, management, and hosting of materials and ensuring their discoverability. Part A also covers citation, retention of links, provision of acceptable metadata and context, archiving, and copyright management, as well as the roles and responsibilities of the various parties involved with SD. Part B contains recommendations by the Technical Working Group and covers the more technical aspects of Part A, such as providing metadata, assigning persistent identifiers (IDs), archiving, and packaging and exchange.
The NISO RPs promote categorizing SD as integral content (IC) or AC according to the functional relationship between the SD and the article. IC refers to materials that are essential for understanding the research but are included as SD for technical, business, or logistical reasons. AC refers to materials that are not essential but are included as SD because they may help readers understand the article. The NISO RPs posit that the proportion of IC will decrease as technology advances. The RPs also recommend slightly different procedures for IC and AC with regard to curation, hosting, selection (including editorial evaluation), editing, provision of reference information for other publications, citation within SD, and preservation. The relevant content is briefly summarized in Table 1. Because IC, AC, or both may be added at any point in the lifecycle of a scientific article, each journal can adopt a flexible definition of document type. The RPs recommend that detailed context be provided as metadata for SD files, which may include a persistent ID for part or all of the set of SD files (or objects); a persistent ID for the article; the relationship type of the SD to the article (IC, AC, or both); a description of the SD, such as a title or summary; the file name of, or an external link to, the SD files; and the format of the SD files. The guidelines also suggest recording bibliographic information in the SD file and providing direct bidirectional links (for internal or external archiving), as well as an independent public ID, such as a digital object identifier (DOI), DataCite DOI, or Protein Data Bank ID.
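The metadata elements listed above can be pictured as a single record per SD item. The Python sketch below is only an illustration under our own assumptions: the class and field names (`SDMetadata`, `Relationship`, `sd_id`, and so on) are hypothetical and are not defined by the NISO RPs, and the example IDs are placeholders rather than real DOIs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relationship(Enum):
    """Relationship of an SD item to its article (IC, AC, or both)."""
    INTEGRAL = "IC"
    ADDITIONAL = "AC"
    BOTH = "IC+AC"


@dataclass
class SDMetadata:
    """Hypothetical metadata record for one SD file or object."""
    sd_id: str                   # persistent ID for the SD item or set
    article_id: str              # persistent ID of the associated article
    relationship: Relationship   # relationship type to the article
    description: str             # title or summary of the SD
    file_format: str             # format of the SD file, e.g., "text/csv"
    file_name: Optional[str] = None      # file name if hosted with the article
    external_link: Optional[str] = None  # link if archived externally

    def __post_init__(self) -> None:
        # The RPs mention "the file name of, or an external link to" the SD,
        # so this sketch requires at least one of the two.
        if not (self.file_name or self.external_link):
            raise ValueError("SD metadata needs a file name or an external link")


# Placeholder IDs for illustration only; they do not resolve to real records.
record = SDMetadata(
    sd_id="10.0000/hypothetical.sd1",
    article_id="10.0000/hypothetical.article",
    relationship=Relationship.ADDITIONAL,
    description="Supplementary Material 1. Raw survey responses",
    file_format="text/csv",
    file_name="supplementary_material_1.csv",
)
print(record.relationship.value)  # AC
```

Storing the relationship type as an explicit field lets a production system tell IC from AC without reopening the article, which is what the differentiated handling described in the RPs would require.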
The NISO RPs’ two-pronged approach, which handles SD differently depending on their content, offers a reasonable way around the confusion that uniform processing of SD may cause. However, categorizing SD as the NISO RPs advise may make it difficult to determine which type a given piece of content belongs to.
Making the SD
While SD can take different forms depending on the journal, readers must be able to recognize the presence of SD easily; SD must be identifiable and extractable by internal or external indexing databases and archives; and their presentation must be consistent across all articles published by a single journal.
While SD can be given a variety of names, such as Supplementary Material, Supplemental Material, or Supplementary Information, each journal should use a single name consistently across all of its articles. Like figures or tables, SD should be numbered in order within an individual article, and each SD item should refer to the article it belongs to in a consistent manner. Journals in Korean and international scientific communities follow different SD policies: inserting pointers to SD within the main body, placing a separate element with links to SD before the reference list at the end of the article, or adding captions indicating the existence of SD on the title page. Journals may also combine two or more of these methods. Regardless of the particular policy, pointers must be placed in a consistent location within a journal so that staff throughout the distribution chain do not omit them and readers can easily spot them.
The NISO RPs state that the choice of whether to process SD to align with the house style of a journal or to publish SD in the original format can depend on the nature of the SD (IC or AC).
Each journal employs a different process to convert its articles into an offline version (e.g., PDF) and an online version (e.g., XML). SD should be handled with awareness of the discrepancies or omissions that may arise from producing these two versions. To avoid such discrepancies, Nature Medicine makes the SD of an article accessible on the journal’s website through the article DOI given in the PDF provided on that website, while the PDF provided via PubMed Central states, “Refer to the web version on PubMed Central for supplementary material.” Moreover, PubMed Central assigns a direct URL to each SD file in each article, so readers can access each SD set directly (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039259/).
Genes places SD at the end of an article, before the reference list, providing a URL and metadata; clicking the URL downloads the SD file directly to the viewing device (e.g., https://www.mdpi.com/2073-4425/7/12/108). Administrative Science Quarterly, published by SAGE, references SD with a URL in the main body. Clicking the URL takes the reader to the journal’s supplementary repository, where previews and metadata can be reviewed; the reader may then download the SD file or inspect the data in ‘figshare,’ an archiving repository (e.g., https://journals.sagepub.com/doi/10.1177/0001839219855033). The Journal of Applied Laboratory Medicine also references SD in the main body with a URL. Although the web version of each article provides an individual URL for each supplementary file, clicking an SD URL in the main body redirects the reader to the table of contents of the journal issue, which seems inconvenient (e.g., https://academic.oup.com/jalm/article/2/2/244/5587484).
Moving on to examples of journals published by Korean scientific societies, JEEHP presents SD at the end of the main body of an article, in front of the reference list. The SD section contains the URL and metadata regarding SD included in the article. When the reader clicks the URL, they are redirected to the Harvard Dataverse, a public repository, to download the SD files (e.g., https://doi.org/10.7910/DVN/T6WC1T). Korean editors should establish SD identification policies that fit the requirements of each journal, considering situations such as those presented in this article, as well as the efficiency and convenience of the readers.
SD and Citation
SD can be important scholarly documents or resources in their own right. Therefore, each journal must provide appropriate guidelines to ensure that SD are actively cited and fairly evaluated. Although only a few institutions have established policies or guidelines on citing SD, the Science Journal Group [7], the Nature Journal Group [8], and the American Geophysical Union [9] have adopted guidelines on publication and citation (Table 2), and sample articles with supplementary content are available (e.g., https://science.sciencemag.org/content/363/6424/276, https://www.nature.com/articles/s41467-019-10549-7#Sec19, and https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019EF001190). The NISO RPs also describe the citation of SD in relatively great detail (Table 1). However, even with guidelines in place, maintaining compliance is difficult, as some publications fail to follow the standards accurately and consistently.
Apart from citations of the SD themselves, references cited only within the SD and not in the main body of an article are not simple to process. Whether to include such references in the reference list is an issue that extends beyond limits on the number of references. If a citation cannot be traced, fairness becomes an issue, because the reference would not count toward the impact factor. Conversely, if less important references are listed alongside the more important references of the main body, this could cause citation exaggeration, distortion of significance, and bias in analyses [6]. Hence, appropriate methods are needed to indicate and process references included in SD; four approaches are outlined below.
The citation is indicated within the main body and included in the reference list
One way to reduce distortion or confusion is to give the citation numbers of references cited within the SD in the SD pointer in the main body and to include those references in the main reference list, as in the following example.
CompPASS employs a database of interacting proteins (including data for all baits reported here and an additional 102 unrelated proteins)⁹ and three specificity metrics (WDN-Score, an associated p-value, and Zscore) to identify HCIPs (Methods and Supplementary Fig. S2A–D¹⁰⁻¹³).
However, this method may be hard to apply in journals that do not use a numbered citation system, do not place SD pointers in the main body, or limit the number of references. Moreover, if the SD are classified as AC, treating the references cited in them like those cited in the main body could cause citation exaggeration.
A notice is posted that references are included in the SD in a separate SD pointer section, without marking the citations in the main body
The second method is to state, in a separate SD pointer section at the end of the article, that the SD contain references, without marking those citations in the main body. In this case, the references within the SD can be listed after the main reference list. However, this method is also subject to the reference-limit and citation-exaggeration issues noted above for the first method.
Supplementary Materials
eTable A can be found in the online version of this article with 11 references.
Citations are not included in the main body, only in the SD
If a journal provides a separate reference list for the SD only and does not mark those citations in the main body, the following questions should be considered. First, should the same reference be allowed to be cited in both the main body and the SD? Second, if the journal uses a numbered citation system, how should the references in the SD be numbered: should numbering continue from the last number in the main reference list, or should a separate numbering sequence be started for the SD? Third, if a significant reference is cited only in the SD, it may be undervalued in impact factor calculations.
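The second question can be made concrete with a small sketch. The Python function below is hypothetical and is not drawn from any journal’s system; it compares the two numbering options, with the ‘S’ prefix for a separate sequence chosen only for illustration. In the continued scheme, a reference already in the main list keeps its main-body number, which is one possible answer to the first question.

```python
from typing import Dict, List


def number_sd_references(main_refs: List[str], sd_refs: List[str],
                         continue_numbering: bool) -> Dict[str, str]:
    """Label references cited in the SD under one of two numbering schemes.

    continue_numbering=True: new SD references continue after the last
    main-body number; references already in the main list reuse that number.
    continue_numbering=False: SD references get their own sequence (S1, S2, ...).
    """
    main_index = {ref: n for n, ref in enumerate(main_refs, start=1)}
    labels: Dict[str, str] = {}
    next_main = len(main_refs) + 1
    next_sd = 1
    for ref in sd_refs:
        if ref in labels:
            continue  # a reference cited twice in the SD keeps its first label
        if not continue_numbering:
            labels[ref] = f"S{next_sd}"
            next_sd += 1
        elif ref in main_index:
            labels[ref] = str(main_index[ref])
        else:
            labels[ref] = str(next_main)
            next_main += 1
    return labels


main_list = ["Ref A", "Ref B", "Ref C"]
sd_list = ["Ref B", "Ref D", "Ref E"]
print(number_sd_references(main_list, sd_list, continue_numbering=True))
# {'Ref B': '2', 'Ref D': '4', 'Ref E': '5'}
print(number_sd_references(main_list, sd_list, continue_numbering=False))
# {'Ref B': 'S1', 'Ref D': 'S2', 'Ref E': 'S3'}
```

The separate sequence keeps the main reference list short, but a reference cited in both places then carries two labels, which is exactly the ambiguity raised by the first question.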
Differentiated treatment according to the properties of SD
Finally, following the NISO RPs, references can be treated differently depending on whether the SD are classified as IC or AC. If SD classified as IC contain references, those citations should be marked in the main body and the references included in the reference list. If SD classified as AC contain references, those references should be listed separately within the SD or briefly noted in the main body as footnotes or in another acceptable form. While this approach elegantly addresses the issues noted above, its downside is the extra workload of determining the category of each piece of SD.
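The differentiated rule can be expressed as a simple routing step. This Python sketch is hypothetical: it assumes each SD item has already been classified as ‘IC’ or ‘AC’, sends references from IC items to the main reference list, and keeps references from AC items in a separate SD list; the footnote option mentioned above is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReferenceLists:
    """Hypothetical container for an article's two reference lists."""
    main: List[str] = field(default_factory=list)
    sd_only: List[str] = field(default_factory=list)


def route_sd_references(category: str, sd_refs: List[str],
                        lists: ReferenceLists) -> ReferenceLists:
    """Place the references cited in one SD item according to its category."""
    if category == "IC":
        target = lists.main      # cite in the main body and main reference list
    elif category == "AC":
        target = lists.sd_only   # list separately within the SD
    else:
        raise ValueError(f"unknown SD category: {category!r}")
    for ref in sd_refs:
        if ref not in target:    # avoid listing the same reference twice
            target.append(ref)
    return lists


lists = ReferenceLists(main=["Ref A"])
route_sd_references("IC", ["Ref B"], lists)
route_sd_references("AC", ["Ref C", "Ref D"], lists)
print(lists.main)     # ['Ref A', 'Ref B']
print(lists.sd_only)  # ['Ref C', 'Ref D']
```

The extra workload noted above appears here as the `category` argument: someone has to decide it for every SD item before the routing can run.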
Management of SD
Authors provide SD in various forms to improve the quality of their publications and to help readers better understand their research. However, SD are sometimes lost in the distribution chain of an article. One noteworthy example is the interlibrary loan process: if SD pointers are difficult to identify, librarians tend to fulfill the loan without recognizing that SD exist. Furthermore, some journals do not provide separate download links for certain forms of SD (e.g., a video clip in an article; https://doi.org/10.1007/s00464-019-06849-0), so these SD are accessible only by right-clicking the original link in the publication. Unclear SD pointers can thus hinder proper distribution of the article. In light of these concerns, each publisher should establish systematic policies to ensure the efficient distribution and management of SD. The NISO RPs also briefly suggest methods of SD distribution: since a scholarly publication consists of the article, the SD, and metadata, the best method of distribution is to use standardized web-based repository and packaging formats designed for transmitting digital content. The NISO RPs particularly emphasize that such a distribution system is essential for interlibrary loans.
In addition to managing SD properly, publishers are responsible for archiving them. For preservation, publishers may host their own archives or outsource archiving to specific repositories, or they may simply specify criteria for acceptable repositories. Criteria for an external repository may include public access, the capability of permanent preservation, and the availability of direct bidirectional links at the level of an individual dataset. For IC in particular, the NISO RPs suggest that SD undergo an archiving process similar to that used for the main article. They also recommend that publishers provide authors with a list of trusted repositories, obtain resources from the listed repositories, and store multiple copies in several repositories to ensure the safety of the data. In practice, however, many journals [5,10-12] appear simply to instruct authors to archive SD in an appropriate repository and then link to that repository, without classifying the SD as IC or AC (Table 3).
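Once a journal records the repository criteria above, checking a deposit against them is mechanical. The Python sketch below is illustrative only: the `Repository` fields mirror the three criteria named in the text, the repository names are invented, and the threshold of two acceptable copies for IC is our own assumption about what ‘multiple copies’ means.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repository:
    """Hypothetical description of a candidate repository."""
    name: str
    public_access: bool
    permanent_preservation: bool
    dataset_level_bidirectional_links: bool

    def is_acceptable(self) -> bool:
        # All three criteria must hold for the repository to qualify.
        return (self.public_access
                and self.permanent_preservation
                and self.dataset_level_bidirectional_links)


def deposit_ok(category: str, deposits: List[Repository],
               min_ic_copies: int = 2) -> bool:
    """Check that SD are deposited in enough acceptable repositories.

    IC: at least `min_ic_copies` acceptable repositories (assumed threshold).
    Any other category is treated like AC: one acceptable repository suffices.
    """
    acceptable = sum(1 for repo in deposits if repo.is_acceptable())
    required = min_ic_copies if category == "IC" else 1
    return acceptable >= required


# Invented examples; they do not describe real services.
repo_a = Repository("Hypothetical Repository A", True, True, True)
repo_b = Repository("Hypothetical Repository B", True, True, True)
private_server = Repository("Hypothetical publisher server", True, False, False)

print(deposit_ok("IC", [repo_a, private_server]))  # False
print(deposit_ok("IC", [repo_a, repo_b]))          # True
print(deposit_ok("AC", [repo_a]))                  # True
```

Keeping the criteria as explicit fields also makes it easy for a journal to add or drop a criterion as its policy changes, without altering the check itself.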
Various Practices of SD Management
As mentioned above, procedures for processing SD differ widely across disciplines and among journals, and multiple terms are used for SD in general. A few examples of SD practices are presented in Figs. 1-5. One article in Advances in Physiology Education stated in the main body that the informed consent forms, quizzes, and booklets used in the research were archived in a repository, and we confirmed that these resources are deposited in figshare as SD (Fig. 1). The Journal of the American Heart Association includes, as SD, all search strategies used in each database for literature reviews and the references used in meta-analyses (Fig. 2). Circulation lists all members of a group author in an appendix (Fig. 3); for an article with a group author, presenting the members’ names as SD is one alternative to listing every author on the title page. Nutrients provides a direct link to a repository hosted by the journal instead of archiving SD in a public repository (Fig. 4). In contrast, JEEHP archives SD, such as raw data, in the Harvard Dataverse, a public repository (Fig. 5). Finally, The Korean Journal of Physiology & Pharmacology provides direct links to a server managed by its publishing company, which is rare for a Korean journal. However, this practice is not recommended, because such private servers are not as stable as public repositories; publishers that use them should consider a backup repository. If each publisher adopts a method that fits its situation while drawing on the variety of precedents described here, we believe that SD can be managed more systematically.
Conclusion
We have analyzed guidelines related to SD in scientific publications and actual practices in various journals. However, this analysis alone is not sufficient to establish concise, firm guidelines for SD processing in Korean STM journals. Therefore, more case studies and discussion among Korean publishers must follow.
However, we believe that this article will have fulfilled its purpose if it helps enhance understanding of best practices for SD and build a consensus on editors’ roles and responsibilities. First, scientific journal editors should adopt a reasonable SD policy that considers the characteristics of the journal, the convenience of authors and readers, and accessibility in online and offline publishing. The details of the SD policy, including the selection of supplemental materials, regulations on formats and media for each type of SD, the handling of IDs and URLs, repository standards, and review standards, must be clearly posted in the instructions for authors. Second, editors must ensure consistent compliance with these established SD processing methods. Third, links, repositories, and metadata related to published articles should be inspected and maintained regularly, and the rules should be updated on an ongoing basis in search of more sustainable and effective ways to keep pace with the shifting web environment.
Furthermore, we must continue to emphasize that the publication of scientific articles is transforming from a linear form centered around the print version into an online-based multilateral form. Thus, in response to this ever-changing environment, we believe that it is especially important to pursue flexibility in a way that is grounded in fundamental principles, rather than through strict regulations.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgements
We express our sincere gratitude to Sun-Im Ryu, Se Jueng Kim, and Ji Hi Jung for investigating, analyzing, and organizing the various ways in which SD are presented in the many journals included in this publication.
Supplementary Material
Supplementary file is available from https://doi.org/10.6087/kcse.200