Open-source code to convert Journal Article Tag Suite Extensible Markup Language (JATS XML) to various viewers and other XML types for scholarly journal publishing
Abstract
There are many ways to use open-source code to implement digital standards for scholarly journal publishing. However, providing digital services with open-source code can be a challenge, especially for small, local academic society journals. This paper presents key examples drawn from the many open-source code resources available to the public. Journal Article Tag Suite (JATS) Extensible Markup Language (XML) has become an essential tool and is now used by most journals for digital publication. JATS XML can be converted to other viewer formats, including Extensible Hypertext Markup Language, PubReader, and EPUB 3.0, and it can also be used to create dynamic, interactive PDFs. It can be converted to other XML formats, including Crossref XML, PubMed XML, and DOAJ XML. Open-source code published on GitHub, by the National Information Standards Organization, and by the US National Library of Medicine can be used for Crossref XML deposition for digital object identifier registration and for Crossmark stamp insertion. These open-source tools need to be implemented on journal websites to provide local academic journal publishers with various critical functions. This paper explains how best to realize these digital standards so that journal content can be delivered to readers in a more user-friendly and effective way.
Introduction
Background
Open-source code is widely used in scholarly journal publishing in all fields, including in scientific society journals. The biggest changes in scholarly journal publishing over the past decade have been the shift from analog to digital publishing and the expansion of open access [1]. As open access has evolved in the digital era, journal publishing services based on open-source code have improved considerably. Ideally, local academic society journal publishers would develop their own technology with abundant funding, as international commercial publishing companies do; however, the lack of available experts and technical limitations generally make open-source solutions more practical. Like the open-science movement, the open-source movement is actively underway. A global public indexing database has been proposed that would include more scholarly journals than the commercial databases do. If many academic society journal publishers can produce full-text Journal Article Tag Suite (JATS) Extensible Markup Language (XML) files and deposit them in the proposed database, this vision can be realized quickly [2].
Objectives
This paper shows how to implement digital standards in journal publishing with several examples of open-source code applications. Specifically, it describes the conversion of JATS XML to various viewers, such as Extensible Hypertext Markup Language (XHTML), PubReader (National Library of Medicine, Bethesda, MD, USA; https://www.ncbi.nlm.nih.gov/pmc/about/pubreader/), EPUB 3.0 (International Digital Publishing Forum; https://idpf.org/epub/30/), and interactive PDFs, as well as conversion from JATS XML to Crossref XML, PubMed XML, and Directory of Open Access Journals (DOAJ) XML, automatic deposition of Crossref XML, and insertion of the Crossmark stamp.
JATS XML as a Key Format in Scholarly Journal Publishing
JATS provides a common XML format in which publishers and archives can exchange journal content. It defines a set of XML elements and attributes for describing the textual and graphical content of journal articles, as well as some non-article material such as letters, editorials, and book and product reviews [3]. To give readers access to the content, JATS XML must be converted for various viewers; conversions to XHTML, PubReader, EPUB 3.0, and PDF are basic utilities for journal publishing. JATS is a standard for scholarly journal publishing adopted by the American National Standards Institute and the National Information Standards Organization (ANSI/NISO Z39.96), and it is open for anyone to use.
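To make this structure concrete, the following minimal Python sketch reads a few front-matter fields from a JATS file with the standard-library ElementTree module. The element paths (front/article-meta/title-group/article-title and article-id with pub-id-type="doi") follow the JATS tag set; the function names are hypothetical, and the sample used in testing is a reduced illustration rather than a complete, valid JATS article.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def text_of(root: ET.Element, path: str) -> Optional[str]:
    """Text of the first element matching path (relative to <article>), or None."""
    elem = root.find(path)
    if elem is None:
        return None
    # itertext() also collects text inside inline markup such as <italic>.
    return "".join(elem.itertext()).strip()


def article_summary(jats: bytes) -> dict:
    """Pull a few fields from a JATS article (hypothetical helper)."""
    root = ET.fromstring(jats)  # the <article> element; JATS elements have no namespace
    return {
        "article_type": root.get("article-type"),
        "title": text_of(root, "front/article-meta/title-group/article-title"),
        "doi": text_of(root, "front/article-meta/article-id[@pub-id-type='doi']"),
        # Top-level sections of the article body
        "body_sections": len(root.findall("body/sec")),
    }
```

For example, article_summary(open("kcse-205.xml", "rb").read()) would return the article type, title, DOI, and number of top-level body sections, the kind of values that the later conversions to Crossref, PubMed, and DOAJ XML reuse.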
JATS XML to XHTML for Viewing in Browsers
Because JATS XML contains only XML elements and attributes, with no instructions for visual presentation, it must be converted to XHTML to be displayed on a website. Fig. 1 shows how Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) together specify the overall layout of web documents displayed as XHTML converted from articles written in JATS XML.
The open-source JATS Preview XSL Transformations (XSLT) stylesheets required to convert JATS XML to XHTML are available from the NCBI GitHub account (https://github.com/ncbi/JATSPreviewStylesheets). The stylesheet “jats-html.xsl” converts JATS XML to HTML, which, combined with a CSS file that specifies the layout of the web document, can be displayed as a dynamic web document. The stylesheet is declared in the XML file as shown below.
Example process of converting kcse-205.xml to XHTML (Suppl. 1)
Step 1.
<?xml version="1.0" encoding="UTF-8"?>
<!-- At the declaration of "kcse-205.xml", add and call "jats-html.xsl" as below. The CSS styles defined in "jats-preview.css" are included in jats-html.xsl. -->
<?xml-stylesheet type="text/xsl" href="jats-html.xsl"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.0" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
…
Step 2.
Fig. 2 shows the result of opening “kcse-205.xml” in Firefox version 102.0.1. Fig. 3 presents a screen view of “kcse-205.xml” showing the font size, font style, font color, line breaks, and other properties defined in the CSS file.
The resulting output is displayed on the website as XHTML.
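Adding the processing instruction by hand is practical for a single article, but for a back catalog it can be scripted. The following Python sketch is a minimal illustration, assuming each JATS file is text that may begin with an XML declaration; the helper name add_stylesheet_pi is hypothetical, and the default stylesheet name is the jats-html.xsl used in Step 1.

```python
import re

# An XML declaration such as <?xml version="1.0" encoding="UTF-8"?>.
# Requiring whitespace after "xml" keeps <?xml-stylesheet ...?> from matching.
XML_DECL = re.compile(r'<\?xml\s[^?]*\?>')


def add_stylesheet_pi(xml_text: str, xsl_href: str = "jats-html.xsl") -> str:
    """Return xml_text with an xml-stylesheet processing instruction added.

    Hypothetical helper: text that already has an xml-stylesheet PI is returned unchanged.
    """
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>'
    match = XML_DECL.match(xml_text)  # the declaration can only appear at the very start
    if match:
        end = match.end()
        # Insert directly after the declaration, which must remain first in the file.
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    # No declaration: the PI can simply lead the document.
    return pi + "\n" + xml_text


sample = '<?xml version="1.0" encoding="UTF-8"?>\n<article article-type="research-article"/>'
print(add_stylesheet_pi(sample))
# <?xml version="1.0" encoding="UTF-8"?>
# <?xml-stylesheet type="text/xsl" href="jats-html.xsl"?>
# <article article-type="research-article"/>
```

The instruction is placed after the declaration rather than at the top of the file because an XML declaration, when present, must be the first thing in the document.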
JATS XML to PubReader
PubReader is another, more reader-friendly web presentation option for literature in PubMed Central (PMC) and Bookshelf. Designed especially for enhanced readability on tablets and other small-screen devices, PubReader can also be used on desktops and laptops and in multiple web browsers [4]. The open-source code and process for converting JATS XML documents to PubReader are available from the NCBI GitHub account (https://github.com/ncbi/PubReader). This conversion method generates HTML files for PubReader from JATS XML. Alternatively, to make continuous maintenance of published articles smoother, the PubReader open-source code can be applied through an XSLT processor without storing HTML files. XSLT is a language for converting XML documents into other formats, such as other XML documents or HTML for web pages. The following examples convert the JATS XML document to PubReader format with an XSLT processor, which must be installed on the application server.
There are two ways to produce PubReader files.
The first method is to transform the XML file into HTML with Saxon-HE (“saxon9he.jar”), as below:
>> java -jar saxon9he.jar -xsl:/xsl/test-page.xsl -s:/xml/kcse-205.xml > kcse-205.html
The second method is to use the XSLTProcessor class of PHP's xsl extension, which applies an XSLT stylesheet to an XML document and returns the transformed document as output.
Once the extension is installed, the XML file can be converted to PubReader with the following code, without creating an HTML file:
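// Load the PubReader XSLT stylesheet
$xsl_file = "/xsl/test-page.xsl";
$xsl = new DOMDocument();
$xsl->load($xsl_file);
// Load the JATS XML source; transformToXML() expects a DOM document, not a file path
$xml = new DOMDocument();
$xml->load("/xml/kcse-205.xml");
$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);
// The PubReader HTML is returned as a string and can be sent directly to the browser
$html = $processor->transformToXML($xml);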
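When many articles must be regenerated, for example after the PubReader stylesheet is updated, the Saxon command from the first method can be run over a whole directory. The Python sketch below assembles one command per file and runs them; it assumes Java and saxon9he.jar are available and uses Saxon's -s:, -xsl:, and -o: options, written without a space after the colon. The directory layout and function names are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def saxon_command(jar: str, xsl: str, source: Path, output: Path) -> list:
    """Build one Saxon command line; option values follow the colon with no space."""
    return ["java", "-jar", jar, f"-s:{source}", f"-xsl:{xsl}", f"-o:{output}"]


def batch_commands(jar: str, xsl: str, xml_dir: Path, html_dir: Path) -> list:
    """One command per *.xml file in xml_dir, writing <name>.html into html_dir."""
    return [
        saxon_command(jar, xsl, src, html_dir / (src.stem + ".html"))
        for src in sorted(xml_dir.glob("*.xml"))
    ]


def run_all(commands: list) -> None:
    for cmd in commands:
        # check=True raises CalledProcessError at the first failing transformation.
        subprocess.run(cmd, check=True)
```

A call such as run_all(batch_commands("saxon9he.jar", "/xsl/test-page.xsl", Path("/xml"), Path("/html"))) would regenerate every article. Passing each command as a list rather than a single string means file names are never re-parsed by a shell.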
JATS XML to EPUB 3.0
EPUB is an open, free eBook format established by the International Digital Publishing Forum as a worldwide standard for eBooks. EPUB content reflows automatically to suit the characteristics of different devices and can be read in any eBook viewer that supports the EPUB format [5]. An EPUB file has the “.epub” extension and is a single compressed (ZIP) archive. Fig. 4 shows the components of an EPUB 3.0 file after its extension is changed to “.zip” and it is decompressed. The EPUB 3 specifications that such a conversion must follow are maintained on the GitHub account of the World Wide Web Consortium (W3C; https://github.com/w3c/epub-specs).
The “kcse-205-epub” folder in the Supplement contains three essential files: “/OPS/epub.ncx,” “/OPS/epub.opf,” and “se-205.xml.” The conversion should be carried out by, or under the guidance of, appropriately trained computer engineers.
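Because an EPUB file is a ZIP archive with a few packaging rules, the final zipping step can be sketched with Python's zipfile module. Under the EPUB Open Container Format, the first entry must be an uncompressed file named mimetype containing application/epub+zip, and META-INF/container.xml must point to the package (.opf) file. The sketch assumes the package file is OPS/epub.opf, as in the Supplement's folder; the function names are hypothetical, and the sketch regenerates container.xml rather than copying an existing one.

```python
import io
import zipfile
from pathlib import Path

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_bytes(files: dict, opf: str = "OPS/epub.opf") -> bytes:
    """Package content files (archive path -> bytes) into EPUB (ZIP) bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # "mimetype" must be the first entry and stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # container.xml tells reading systems where the package (.opf) file is.
        zf.writestr("META-INF/container.xml", CONTAINER_XML.format(opf=opf),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name in sorted(files):
            # Skip entries already written above.
            if name not in ("mimetype", "META-INF/container.xml"):
                zf.writestr(name, files[name], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def write_epub(src_dir: Path, epub_path: Path, opf: str = "OPS/epub.opf") -> None:
    """Zip an unpacked EPUB folder (e.g. kcse-205-epub) into a .epub file."""
    files = {p.relative_to(src_dir).as_posix(): p.read_bytes()
             for p in src_dir.rglob("*") if p.is_file()}
    epub_path.write_bytes(build_epub_bytes(files, opf))
```

For example, write_epub(Path("kcse-205-epub"), Path("kcse-205.epub")) would produce the package, which can then be checked with an EPUB validator such as EPUBCheck.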
JATS XML to PDF
To convert JATS XML to PDF, an Extensible Stylesheet Language Formatting Objects (XSL-FO) document must first be created. XSL-FO is an XML-based markup language for formatting output to various media, including paper and screens. The process for producing an FO document is explained in Fig. 5: the FO document is generated by applying an FO XSLT stylesheet to the XML file. The generated XSL-FO document can then be rendered as the final PDF by any of several readily available formatters. Open-source code that generates XSL-FO documents from JATS XML is also available on the NCBI GitHub site (https://github.com/ncbi/JATSPreviewStylesheets/blob/master/xslt/main/jats-xslfo.xsl or https://github.com/ncbi/JATSPreviewStylesheets/tree/master/shells/saxon/). The conversion method is similar to the conversion of JATS XML to XHTML. Note that an FO document generated with this method sets the page width to 8.267717 in (210 mm, the A4 width) and the page height to 11.023622 in (280 mm, slightly shorter than the 297 mm of a full A4 sheet), as shown below.
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master margin-right="0.5in" margin-left="0.5in" margin-bottom="0.5in" margin-top="0.5in" page-width="8.267717in" page-height="11.023622in" master-name="cover">
<fo:region-body margin-right="0in" margin-left="0in" margin-bottom="0.5in" margin-top="24pt" region-name="body"/>
</fo:simple-page-master>
…
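The page dimensions in this fragment are given in inches. A quick conversion (1 in = 25.4 mm) shows that the width equals the A4 width of 210 mm, while the height corresponds to 280 mm rather than the 297 mm of a full A4 sheet, so the page-height value would need to change if full A4 output is wanted. The helper names below are illustrative.

```python
MM_PER_INCH = 25.4  # exact by definition


def in_to_mm(inches: float) -> float:
    return inches * MM_PER_INCH


def mm_to_in(mm: float) -> float:
    return mm / MM_PER_INCH


# Values from the fo:simple-page-master above
print(round(in_to_mm(8.267717), 1))   # 210.0 (A4 width)
print(round(in_to_mm(11.023622), 1))  # 280.0
# Full A4 height, for comparison
print(round(mm_to_in(297), 6))        # 11.692913
```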
The generated FO document can easily be converted to PDF with a publicly available formatter such as Apache FOP (https://xmlgraphics.apache.org/fop/fo.html). PDFs can also be generated with commercial formatters, such as the Antenna House AH Formatter used in the example below.
An example of converting an XML file to PDF is presented below.
Example of generating “kcse-205.pdf” from “kcse-205.xml”
Step 1. Transform “kcse-205.xml” with Saxon (“saxon9.jar”) in a Java environment to produce “kcse-205.fo.”
>> java -jar /saxon/saxon9.jar -o:/pdf/kcse-205.fo -s:/xml/kcse-205.xml -xsl:/shells/saxon/jats-PMCcit-print-fo.xsl
Step 2. Format “kcse-205.fo” with “Antenna House AH Formatter” to produce “kcse-205.pdf.”
>> sh /Formatter/AHFormatterV62_64/run.sh -d /xml2pdf/fo/kcse-205.fo -o /xml2pdf/pdf/kcse-205.pdf -x 4 -i /xml2pdf/config.xml
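Where a commercial formatter is not available, the same two steps can be run with open-source tools only. The Python sketch below chains Saxon and Apache FOP, assuming that the fop launcher script from the Apache FOP distribution is on the PATH and accepts its usual -fo and -pdf options; the paths are those used above, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def xml_to_pdf_commands(saxon_jar: str, fo_xsl: str, xml: Path, fo: Path, pdf: Path) -> list:
    """Step 1: JATS XML -> XSL-FO with Saxon; step 2: XSL-FO -> PDF with Apache FOP."""
    return [
        # Saxon option values follow the colon with no space.
        ["java", "-jar", saxon_jar, f"-s:{xml}", f"-xsl:{fo_xsl}", f"-o:{fo}"],
        # Assumes the 'fop' launcher script is on PATH.
        ["fop", "-fo", str(fo), "-pdf", str(pdf)],
    ]


def run_pipeline(commands: list) -> None:
    for cmd in commands:
        # check=True raises CalledProcessError, so FOP never runs if Saxon failed.
        subprocess.run(cmd, check=True)
```

For example, run_pipeline(xml_to_pdf_commands("/saxon/saxon9.jar", "/shells/saxon/jats-PMCcit-print-fo.xsl", Path("/xml/kcse-205.xml"), Path("/xml2pdf/fo/kcse-205.fo"), Path("/xml2pdf/pdf/kcse-205.pdf"))) would run both steps, stopping at the first failure.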
JATS XML to Crossref XML, PubMed XML, and DOAJ XML
Open-source options are available for sending article metadata to several indexing databases. Using open-source code, JATS XML can be converted to other XML formats for depositing metadata in Crossref, PubMed, and DOAJ. Receiving and sending metadata as XML is currently the most efficient and convenient way to deliver article metadata to indexing databases. Crossref XML, PubMed XML, and DOAJ XML are the best formats for deposit into Crossref, PubMed, and DOAJ, respectively. Open-source code for converting JATS XML into Crossref XML and PubMed XML is available from the NCBI GitHub account (https://github.com/ncbi/PMCXMLConverters). Because PMC XML is based on JATS, a JATS XML file can be converted easily with two of these open-source stylesheets, “pmc2pubmed.xsl” and “pmc2crossref.xsl.” DOAJ provides a schema and sample XML files for DOAJ XML (https://doaj.org/docs/xml/#the-doajarticlesxsd-schema-file), so JATS XML can be converted to DOAJ XML following that schema. Examples of converting JATS XML to Crossref XML, PubMed XML, and DOAJ XML are given below.
The source file “kcse-205.xml” must declare the JATS DTD as follows:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.0" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
…
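Since this declaration is required, a short pre-flight check can flag files that lack it, or that are not well-formed, before any of the conversions below are run. The Python sketch compares only the public identifier shown above (files using other JATS versions would need a different expected value) and uses the standard-library parser, which does not fetch the DTD, so it checks well-formedness rather than DTD validity. The function name preflight is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Public identifier from the declaration above (JATS Journal Publishing 1.0).
JATS_PUBLIC_ID = "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
DOCTYPE_RE = re.compile(rb'<!DOCTYPE\s+article\s+PUBLIC\s+"([^"]+)"')


def preflight(jats: bytes, expected_id: str = JATS_PUBLIC_ID) -> list:
    """Return a list of problems found before conversion (empty list if none)."""
    problems = []
    match = DOCTYPE_RE.search(jats)
    if match is None:
        problems.append("no <!DOCTYPE article PUBLIC ...> declaration")
    else:
        found = match.group(1).decode("ascii", "replace")
        if found != expected_id:
            problems.append(f"unexpected DTD: {found}")
    try:
        root = ET.fromstring(jats)  # well-formedness only; the DTD is not loaded
    except ET.ParseError as err:
        problems.append(f"not well-formed: {err}")
    else:
        if root.tag != "article":
            problems.append(f"root element is <{root.tag}>, not <article>")
    return problems
```

Running preflight(Path("/xml/kcse-205.xml").read_bytes()) (with pathlib imported) before Saxon would surface missing declarations early instead of as a failed transformation.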
1. Conversion of JATS XML to Crossref XML
Transform “kcse-205.xml” with Saxon (“saxon9he.jar”) in a Java environment to produce “crossref-kcse-205.xml”:
>> java -jar saxon9he.jar -xsl:/xsl/pmc2crossref.xsl -s:/xml/kcse-205.xml > /crossref/crossref-kcse-205.xml
2. Conversion of JATS XML to PubMed XML
>> java -jar saxon9he.jar -xsl:/xsl/pmc2pubmed.xsl -s:/xml/kcse-205.xml > /pubmed/pubmed-kcse-205.xml
3. Conversion of JATS XML to DOAJ XML
An example DOAJ XML file and the DOAJ XML schema file are provided on the DOAJ website, https://doaj.org/docs/xml/.
A JATS XML file can be converted to DOAJ XML using XPath queries such as the following (PHP SimpleXML):
$xml = simplexml_load_file("/xml/kcse-205.xml");
// Each xpath() call returns an array of matching nodes; (string)$issn[0] gives the text
$publisher_name = $xml->xpath("/article/front/journal-meta/publisher/publisher-name");
$journal_title = $xml->xpath("/article/front/journal-meta/journal-title-group/journal-title");
$issn = $xml->xpath("/article/front/journal-meta/issn[@pub-type='ppub']");
$eissn = $xml->xpath("/article/front/journal-meta/issn[@pub-type='epub']");
The converted DOAJ XML is as follows:
<records>
  <record>
    <language>eng</language>
    <publisher>Korean Council of Science Editors</publisher>
    <journalTitle>Science Editing</journalTitle>
    <issn>2288-8063</issn>
    <eissn>2288-7474</eissn>
    …
  </record>
</records>
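The extraction and output steps above can also be sketched in Python with the standard-library ElementTree module, which supports the simple path predicates used here. This is a minimal sketch, not a complete DOAJ converter: it covers only the journal-level fields of the example record, assumes the caller supplies the language code (default eng, as in the example), and takes the journal title from journal-title-group/journal-title, where the JATS tag set stores it. The function names are hypothetical, and the output should be validated against the DOAJ schema before upload.

```python
import xml.etree.ElementTree as ET

# JATS paths (relative to <article>) for the journal-level fields in the example record.
JATS_PATHS = {
    "publisher": "front/journal-meta/publisher/publisher-name",
    "journalTitle": "front/journal-meta/journal-title-group/journal-title",
    "issn": "front/journal-meta/issn[@pub-type='ppub']",
    "eissn": "front/journal-meta/issn[@pub-type='epub']",
}
# Element order follows the example record; the DOAJ schema defines further fields.
DOAJ_ORDER = ["language", "publisher", "journalTitle", "issn", "eissn"]


def jats_to_doaj_fields(jats: bytes, language: str = "eng") -> dict:
    """Collect the DOAJ journal fields that are present in a JATS file."""
    root = ET.fromstring(jats)
    fields = {"language": language}
    for name, path in JATS_PATHS.items():
        elem = root.find(path)
        if elem is not None:
            fields[name] = "".join(elem.itertext()).strip()
    return fields


def doaj_records(records: list) -> bytes:
    """Serialize field dicts as a DOAJ-style <records> document (UTF-8 bytes)."""
    root = ET.Element("records")
    for fields in records:
        record = ET.SubElement(root, "record")
        for name in DOAJ_ORDER:
            if name in fields:
                ET.SubElement(record, name).text = fields[name]
    ET.indent(root)  # pretty-printing; requires Python 3.9+
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Building the output with ElementTree rather than by string concatenation means that characters such as "&" in a publisher name are escaped automatically.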
Automatic Deposition of Crossref XML and Crossmark Stamp
Because Crossref services are now essential in scholarly journal publishing, most journals assign digital object identifiers (DOIs), which are registered by depositing Crossref XML. To insert Crossmark images into PDFs quickly, the Crossref pdfstamp tool can be downloaded from the Crossref GitHub account (https://github.com/CrossRef/pdfstamp). With pdfstamp.jar, the Crossmark stamp can be inserted into a PDF using the following command.
In this command, the resolution is set to 250 DPI (-d 250), the Crossmark logo file is “CROSSMARK_Color_square_108.png,” and the stamp position is set to X: 520 and Y: 756 (-l 520,756).
The corresponding Java command is shown below; the Crossmark URL is quoted because the shell would otherwise treat “&” as a command separator:
>> java -jar ./pdfstamp.jar -d 250 -i /img/CROSSMARK_Color_square_108.png -l 520,756 -o /pdfstamp/ -u "https://crossmark.crossref.org/dialog/?doi=10.6087/kcse.205&domain=pdf" -p 1 /pdf/kcse-205.pdf
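Because the Crossmark link contains an "&", it must be quoted when typed into a shell. Building the arguments as a list in Python avoids shell parsing entirely and makes it easy to stamp many articles. The sketch below mirrors the command above; the function names are hypothetical, and the option names should be checked against the usage message of the pdfstamp.jar version actually downloaded.

```python
from urllib.parse import urlencode


def crossmark_url(doi: str) -> str:
    """Crossmark dialog link in the form used in the pdfstamp command."""
    # safe="/" keeps the DOI's slash readable; other reserved characters are escaped.
    query = urlencode({"doi": doi, "domain": "pdf"}, safe="/")
    return "https://crossmark.crossref.org/dialog/?" + query


def pdfstamp_command(doi: str, pdf: str, logo: str, x: int, y: int,
                     dpi: int = 250, out_dir: str = "/pdfstamp/", page: int = 1) -> list:
    """pdfstamp.jar arguments as a list, so '&' in the URL never reaches a shell."""
    return ["java", "-jar", "./pdfstamp.jar",
            "-d", str(dpi),          # target resolution
            "-i", logo,              # Crossmark logo image
            "-l", f"{x},{y}",        # stamp position
            "-o", out_dir,           # output directory
            "-u", crossmark_url(doi),
            "-p", str(page),         # page to stamp
            pdf]
```

For example, subprocess.run(pdfstamp_command("10.6087/kcse.205", "/pdf/kcse-205.pdf", "/img/CROSSMARK_Color_square_108.png", 520, 756), check=True) would stamp page 1 of one article.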
Fig. 6 shows a simple example of loading a PDF file from a website, specifying the X and Y coordinates, loading the Crossmark logo file, and inserting the Crossmark logo with the above command.
There are various ways to deposit Crossref XML, including the web deposit form, which requires logging in to https://doi.crossref.org with a Crossref account. Because logging in to deposit every time can be cumbersome, Crossref provides a tool for depositing automatically from the user's server (https://www.crossref.org/documentation/content-registration/direct-deposit-xml/https-post-using-java-program/). The “crossref-upload-tool.jar” file can be downloaded from the above URL and installed on the user's server to deposit Crossref XML easily.
Direct deposit of a Crossref XML file is possible with the following command.
>> java -jar doUpload.jar -u {Crossref_username} -p {Crossref_password} -f /xml/crossref-kcse-205.xml
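Crossref also describes deposit by a plain HTTPS POST without a Java tool: a multipart form sent to https://doi.crossref.org/servlet/deposit with the fields operation=doMDUpload, login_id, login_passwd, and the XML file in fname. The following standard-library Python sketch builds such a request. The endpoint and field names are assumptions based on Crossref's published HTTPS POST instructions and should be confirmed against the current documentation (https://test.crossref.org can be used for trial deposits); the function names are hypothetical.

```python
import urllib.request
import uuid
from pathlib import Path

# Assumed production endpoint; use https://test.crossref.org/servlet/deposit for trials.
DEPOSIT_URL = "https://doi.crossref.org/servlet/deposit"


def build_deposit_request(username: str, password: str, xml_name: str, xml_bytes: bytes,
                          url: str = DEPOSIT_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data deposit request."""
    boundary = uuid.uuid4().hex
    fields = {"operation": "doMDUpload", "login_id": username, "login_passwd": password}
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The Crossref XML file itself goes in the "fname" field.
    file_header = (f'--{boundary}\r\nContent-Disposition: form-data; name="fname"; '
                   f'filename="{xml_name}"\r\nContent-Type: application/xml\r\n\r\n')
    parts.append(file_header.encode() + xml_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        url, data=b"".join(parts), method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def deposit(username: str, password: str, xml_path: Path) -> str:
    """Send the deposit and return the response body as text."""
    req = build_deposit_request(username, password, xml_path.name, xml_path.read_bytes())
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Reading the credentials from environment variables, for example deposit(os.environ["CROSSREF_USER"], os.environ["CROSSREF_PASSWORD"], Path("/xml/crossref-kcse-205.xml")) with hypothetical variable names, keeps the password out of shell history and process listings, unlike passing it on the command line.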
The files used for converting JATS XML to various viewers and other XML types for scholarly publishing are available in Suppl. 1.
Conclusion
By creating JATS XML documents for articles, publishers can improve the digital publication of academic journals using the open-source code examples discussed above. JATS XML files are therefore a core element of online journal publication, and creating JATS XML is a necessary step toward building more digital services for academic journals. The open-source methods discussed above can support the online publication of scholarly journals by simplifying the publishing process and enabling various digital services. In the future, new open-source applications based on the currently available code will likely be developed and used more widely by volunteers and other interested parties.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Funding
The author received no financial support for this article.
Supplementary Material
Supplementary file is available from: https://doi.org/10.7910/DVN/S1BHP0