On the Marginal Cost of Scholarly Communication

Authors

  1. Tiffany Bogich (a)
  2. Sébastien Ballesteros (a)
  3. Robin Berjon (a)

Affiliation

  1. (a) science.ai, Standard Analytics, New York, NY, USA

License

CC-BY-4.0

Abstract

We assessed the marginal cost of scholarly communication from the perspective of an agent looking to start an independent, peer-reviewed scholarly journal. We found that various vendors can accommodate all of the services required for scholarly communication for a price ranging between $69 and $318 per article. In contrast, if an agent had access to software solutions replacing the services provided by vendors, the marginal cost of scholarly communication would be reduced to the cloud infrastructure cost alone and drop to between $1.36 and $1.61 per article. Incidentally, DOI registration alone accounts for between 82% and 98% of this cost. While vendor cost typically decreases with higher volume, new offerings in cloud computing exhibit the opposite trend, challenging the notion that large volume publishers benefit from economies of scale as compared to smaller publishers. Given the current lack of software solutions fulfilling the functions of scholarly communication, we conclude that the development of high quality “plug-and-play” open source software solutions would have a significant impact in reducing the marginal cost of scholarly communication, making it more open to experimentation and innovation.

Introduction

According to the Association of Research Libraries (Association of Research Libraries 2014), scholarly communication is the system of creating, evaluating via peer review, disseminating, and preserving scholarly work. While the dissemination of scholarly work was originally facilitated by the printing press, today, nearly all scholarly work is disseminated on the Web (Ware and Mabe 2015).

In this article, we assess the marginal cost of scholarly communication from the perspective of an agent looking to start an independent, peer-reviewed, web-based scholarly journal. The marginal cost only takes into account the cost of producing one additional scholarly article, therefore excluding fixed costs related to normal business operations.

We contrast the marginal costs of two extreme approaches:

  • One where all services are provided by various vendors (excluding any setup, customization, or service fees).

  • One where an end-to-end in-house technological solution has been developed, and therefore requires only cloud infrastructure costs (excluding the cost of technology development).

Methods

Vendor Costs

We searched the Web to identify vendor pricing data from vendor websites as well as other sources such as blog posts, news articles, and conference proceedings. If pricing data could not be found on a given vendor’s website or in secondary sources, we emailed the vendor to request pricing data directly (Figure 1).

Figure 1

Vendor cost data collection methodology, with the number of vendors (n) given for each possible outcome. A list of vendors is provided in Supporting Table 1.

We defined the minimum requirements for scholarly communication as: 1) submission, 2) management of editorial workflow and peer review, 3) typesetting, 4) DOI registration, and 5) long-term preservation. We then matched vendor capabilities to each of these five requirements using a combination of vendor feature lists and white papers from vendor websites (Supporting Table 1). We consolidated submission and management of editorial workflow and peer review into one category for the purposes of calculating cost, as these services are typically offered together by a single vendor. For typesetting, we included document formatting and styling, document conversion, and the extraction and disambiguation of article and resource metadata (e.g. citations, funding sources, and affiliation data). DOI registration and preservation were included as their own categories.

Submission and Management of Editorial Workflow and Peer Review

We obtained pricing data on submission and management of editorial workflow and peer review for 3 vendors (1 email response, 2 from secondary sources). We calculated submission on a per published article basis and assumed an average 50% acceptance rate (Ware and Mabe 2015) for submitted articles. Aries Editorial Manager (A. O’Connell 2015, pers. comm., 26 Oct) and ScholarOne Manuscript Central (Casler and Byron 2009) pricing data were originally reported on a per submission basis. Therefore, the per article cost for these two vendors was calculated as two times the submission cost. Submission costs were reported on a per article basis for SciELO ($130/article (Brembs 2015)).
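The conversion from a per submission price to a per published article price can be sketched as follows. This is a minimal illustration, assuming the 50% acceptance rate stated above; the function name is hypothetical, and the $30 and $35 per submission inputs are the figures listed in Supporting Table 1.

```python
def cost_per_published_article(cost_per_submission, acceptance_rate=0.5):
    """Spread the cost of every submission over the accepted articles only."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return cost_per_submission / acceptance_rate


# Per submission prices from Supporting Table 1
print(cost_per_published_article(30))  # Aries Editorial Manager
print(cost_per_published_article(35))  # ScholarOne Manuscript Central
```

At a 50% acceptance rate, dividing by the acceptance rate is the same as the doubling described above.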

Typesetting

We obtained pricing data on typesetting for 5 vendors (3 email responses, 1 primary source, and 1 secondary source). To obtain a range of per article typesetting prices for vendors, we took the per page price of typesetting and multiplied it by the minimum and maximum average article length (7.4 pages/article and 12.4 pages/article), as reported by Ware and Mabe (Ware and Mabe 2015) and further corroborated by Falagas, Zarkali, Karageorgopoulos, Bardakas, and Mavros (Falagas et al. 2013). We validated our findings with data from the National Library of Medicine on the cost of typesetting author-submitted manuscripts (Anderson 2013).
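The per article range described above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and the $1 per page input is the DOCX to NISO JATS conversion price listed in Supporting Table 1.

```python
MIN_PAGES = 7.4   # minimum average article length, pages (Ware and Mabe 2015)
MAX_PAGES = 12.4  # maximum average article length, pages (Ware and Mabe 2015)


def typesetting_range(low_price_per_page, high_price_per_page=None):
    """Return the (minimum, maximum) per article typesetting price.

    The lowest per page price is applied to the shortest average article and
    the highest per page price to the longest average article.
    """
    if high_price_per_page is None:
        high_price_per_page = low_price_per_page
    return (
        round(low_price_per_page * MIN_PAGES, 2),
        round(high_price_per_page * MAX_PAGES, 2),
    )


print(typesetting_range(1.0))  # a flat price of $1 per page
```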

DOI

We obtained DOI pricing data for 5 vendors, all from publicly available pricing pages on vendor websites. As agents typically only have one option for DOI registration based on country of operation, we used data from a single source, Crossref, when calculating DOI costs. Crossref charges a fee of $1 per article and $0.06 per component, data set, or data element in addition to a revenue-based annual membership fee (Crossref 2015). As our focus is on marginal cost, we excluded the membership fee from our calculations.

To calculate the total DOI fee, we first estimated the total number of components per submission. Using data from the open access subset of articles on PubMed Central (>900,000 articles), we found an average of 1 text file, 0.2 tables, 0.02 audio or video files, and 4 images per submission. We calculated the total DOI cost as the sum of the per component fee ($0.06) times the total number of components (5.22 components) and the per article fee ($1).
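The DOI fee calculation can be written out as a short sketch, using only the fees and component averages quoted above; the names are illustrative.

```python
# Average number of components per submission (PubMed Central open access subset)
COMPONENTS_PER_SUBMISSION = {
    "text file": 1,
    "tables": 0.2,
    "audio or video files": 0.02,
    "images": 4,
}
ARTICLE_FEE = 1.00    # Crossref fee per article, USD
COMPONENT_FEE = 0.06  # Crossref fee per component, USD


def doi_cost_per_article(components=COMPONENTS_PER_SUBMISSION):
    """Per article fee plus the per component fee for every component."""
    total_components = sum(components.values())
    return ARTICLE_FEE + COMPONENT_FEE * total_components


print(round(sum(COMPONENTS_PER_SUBMISSION.values()), 2))  # total components
print(round(doi_cost_per_article(), 4))                   # DOI cost per article
```

With the rounded averages quoted above, this gives about $1.31 per article. Tables 1 and 2 report $1.33, and the text does not state which unrounded inputs produce that figure.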

Preservation

For the cost of preservation, we used publicly available pricing data for CLOCKSS as a proxy (see Footnote 1). CLOCKSS charges a $0.25 per article fee in addition to a revenue-based annual membership fee (CLOCKSS 2015). As our focus is on marginal cost, we excluded the membership fee from our calculations. Because CLOCKSS waives the per article fee for the first 500 articles, the per article preservation cost varies with the total number of articles published per year. Therefore, we calculated the cost of preservation across a range of annual publication volumes.

We determined a representative set of categories for the number of articles published per year using data collected from a sample of 136 publishers. We used the quantiles from this data to calculate the categories presented in Table 1 and Table 2. Additionally, we included a category for the total number of scientific articles published per year globally (2,000,000 articles, (Ware and Mabe 2015)), as this provides an upper bound to our cost calculations.

For agents publishing more than 500 articles per year, the per article cost of preservation was the total number of articles published minus the first free 500 articles, times $0.25, and divided by the total number of articles published.
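The preservation rule above can be sketched as follows; the function and constant names are illustrative, and the fee and waiver are the CLOCKSS figures quoted in this section.

```python
FREE_ARTICLES = 500      # articles per year with the per article fee waived
FEE_PER_ARTICLE = 0.25   # CLOCKSS per article fee, USD


def preservation_cost_per_article(articles_per_year):
    """Average per article preservation cost at a given annual volume."""
    if articles_per_year <= 0:
        raise ValueError("articles_per_year must be positive")
    billable = max(articles_per_year - FREE_ARTICLES, 0)
    return billable * FEE_PER_ARTICLE / articles_per_year


for volume in (50, 500, 1_000, 2_500, 300_000, 2_000_000):
    print(volume, round(preservation_cost_per_article(volume), 4))
```

The average cost is zero up to 500 articles per year and approaches $0.25 as volume grows, because the 500 waived articles become a negligible share of the total.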

Infrastructure Cost

We evaluated two offerings from Amazon Web Services (AWS) as a proxy to compute the cost of running scholarly communication services on cloud infrastructure: Elastic Compute Cloud (EC2) and Lambda. EC2 is ideal for users with predictable usage and volume requirements over time, and offers significant discounts when services are reserved for one or three year terms and paid for upfront. For agents wishing to start a journal, however, usage requirements are hard to predict and upfront payments may be suboptimal. Further, even the lowest tier option of EC2 is overpowered for most publishing operations, resulting in paid resources going unused. With Lambda, costs are only incurred for compute time used and all users have access to one low cost, on demand solution. Therefore, we selected Lambda to compute the cost of running scholarly communication services on cloud infrastructure, as it minimizes cost irrespective of projected usage and maintains the same level of service quality as EC2.

Submission and Management of Editorial Workflow and Peer Review

To calculate the cost of submission and management of editorial workflow and peer review, we first estimated the total number and size of components per submission, using data from the open access subset of articles on PubMed Central (>900,000 articles) as before. We found an average of 1 text file (10MB/file), 0.2 tables (1MB/table), 0.02 audio or video files (300MB/audio or video), and 4 images (5MB/image) per submission. We calculated the cost of submission as the cost to put one copy of all components on AWS Simple Storage Service (S3). Then, we calculated the cost of management of editorial workflow and peer review as the sum of the cost of S3 for 6.41 months (the mean time from submission to acceptance (Björk and Solomon 2013)), 50 emails using AWS Simple Email Service (SES), and 100 requests to put and get all components of the submission from S3 three times (as a proxy for the revision cycle of an average submission).
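The storage footprint behind these estimates can be sketched from the averages above. The sketch stops at megabytes and GB-months, since this section does not quote S3, SES, or request prices; multiplying by a provider's rates is left to the reader. The names are illustrative, and the conversion of 1 GB = 1,000 MB is an assumption.

```python
# name: (average count per submission, average size in MB)
COMPONENTS = {
    "text file": (1, 10),
    "table": (0.2, 1),
    "audio or video file": (0.02, 300),
    "image": (4, 5),
}
MONTHS_IN_REVIEW = 6.41  # mean submission-to-acceptance time (Björk and Solomon 2013)


def submission_size_mb(components=COMPONENTS):
    """Expected size of one copy of all components of a submission, in MB."""
    return sum(count * size_mb for count, size_mb in components.values())


def storage_gb_months(size_mb, months=MONTHS_IN_REVIEW):
    """Storage consumed while under review (assumes 1 GB = 1,000 MB)."""
    return size_mb / 1000 * months


size = submission_size_mb()
print(round(size, 1))                     # MB per submission
print(round(storage_gb_months(size), 3))  # GB-months per submission
```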

Typesetting

We defined the activities of typesetting and the compute time and memory requirements for each as 1) file type detection (0.1 CPU-s, 1GB memory), 2) document formatting, styling, conversion, and metadata extraction, e.g. citations or author affiliations (60 CPU-s, 1GB memory), 3) tabular data extraction (5 CPU-s, 1GB memory), 4) audio and video transcoding and metadata extraction (60 CPU-s, 1GB memory), and 5) image transcoding and metadata extraction (10 CPU-s, 1GB memory).

We calculated the total typesetting costs as the sum of the cost to run all activities described above using Lambda plus the cost to put all artefacts on S3 after processing has completed. In all calculations, we accounted for the fact that, as of today (see Footnote 2), the first 1M requests and the first 400,000 GB-s per month are free on Lambda.
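The compute side of this estimate can be sketched as below. Several things here are assumptions rather than the authors' stated method: how often each activity runs per submission (taken from the component averages in the previous subsection), the treatment of CPU-seconds as billed duration, and all names. The sketch stops at billable GB-seconds, since no per GB-second rate is quoted in the text, and it is not claimed to reproduce the figures in Table 2.

```python
# activity: (CPU-seconds per run, memory in GB, assumed runs per submission)
ACTIVITIES = {
    "file type detection": (0.1, 1, 5.22),      # assumed: once per component
    "document processing": (60, 1, 1),          # assumed: once per text file
    "tabular data extraction": (5, 1, 0.2),     # assumed: once per table
    "audio/video transcoding": (60, 1, 0.02),   # assumed: once per audio or video file
    "image transcoding": (10, 1, 4),            # assumed: once per image
}
FREE_GB_SECONDS_PER_MONTH = 400_000  # Lambda free tier quoted above


def gb_seconds_per_article(activities=ACTIVITIES):
    """Expected Lambda usage for one submission, in GB-seconds."""
    return sum(seconds * memory_gb * runs
               for seconds, memory_gb, runs in activities.values())


def billable_gb_seconds(articles_per_month):
    """Monthly GB-seconds left to pay for after the free tier is used up."""
    used = articles_per_month * gb_seconds_per_article()
    return max(used - FREE_GB_SECONDS_PER_MONTH, 0)


print(round(gb_seconds_per_article(), 3))
print(billable_gb_seconds(1_000))   # low volume: within the free tier
print(round(billable_gb_seconds(25_000)))
```

Under these assumptions a low volume journal never leaves the free tier, which is one way to read the near-zero typesetting cost reported for small publishers.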

DOI

DOI costs were calculated using the same method as described above for vendor costs.

Preservation

Preservation costs were calculated using the same method as described above for vendor costs.

Results

Vendor Costs

We found that an end-to-end scholarly communication solution can be provided exclusively by a combination of vendor services. Based on price data from 15 vendors (Figure 1, Supporting Table 1), we found that the marginal cost of scholarly communication was between $69 per article and $318 per article (Table 1). We found that these costs were not affected by the input format used by authors (e.g., Microsoft Word or LaTeX).

All vendors we communicated with for submission, management of editorial workflow and peer review, and typesetting services indicated that pricing could be discounted for high volumes, further corroborated by references to economies of scale for large publishers (Clarke 2015). However, the per article price data we obtained did not include any such volume-based discounts. Therefore, we provided a single estimate across volumes (Table 1), though we expect prices to decrease at the highest volumes.

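Number of articles published per year | Submission, management of editorial workflow and peer review | Typesetting | DOI (Crossref) | Long-term preservation (CLOCKSS) | Total
50 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.00 | $68 - $317
500 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.00 | $68 - $317
1,000 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.13 | $69 - $317
2,500 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.20 | $69 - $318
300,000 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.25 | $69 - $318
1,000,000 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.25 | $69 - $318
2,000,000 | $60 - $130 | $7.40 - $186.37 | $1.33 | $0.25 | $69 - $318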
Table 1

Annual cost per article for manuscript submission and management of editorial workflow and peer review, typesetting, DOI registration (Crossref), and preservation (CLOCKSS).

Infrastructure Cost

We found that the cost of the different cloud infrastructure services necessary to run software solutions able to support all the functions of scholarly communication was between $1.36 and $1.61 per article (Table 2). DOI registration (Crossref) and long-term preservation costs (CLOCKSS) were included in calculating the infrastructure cost as these services are critical to scholarly communication and cannot currently be substituted by software.

Apart from the free tier that the cloud infrastructure provider we considered offers at low volume, we found that costs were invariant to volume (number of submissions). This is because services like AWS guarantee that charges are only applied while computations are actively running, which removes the problem of allocating computational resources optimally. In practice, we found that low volume publishers can benefit from a marginal cost identical to that of high volume publishers (or lower, given the availability of a free option at low volume).

Number of articles published per year | Submission, management of editorial workflow and peer review | Typesetting | DOI (Crossref) | Long-term preservation (CLOCKSS) | Total
50 | $0.030 | $0.0001 | $1.33 | $0.00 | $1.36
500 | $0.030 | $0.0001 | $1.33 | $0.00 | $1.36
1,000 | $0.030 | $0.0001 | $1.33 | $0.13 | $1.48
2,500 | $0.030 | $0.0001 | $1.33 | $0.20 | $1.56
300,000 | $0.030 | $0.0048 | $1.33 | $0.25 | $1.61
1,000,000 | $0.030 | $0.0052 | $1.33 | $0.25 | $1.61
2,000,000 | $0.030 | $0.0054 | $1.33 | $0.25 | $1.61
Table 2

Annual cost per article for submission, management of editorial workflow and peer review, typesetting, cloud infrastructure, DOI registration (Crossref), and preservation (CLOCKSS).

Discussion

Today, an agent looking to start an independent, peer-reviewed scholarly journal has two main options: develop software in-house or contract vendors. These two options are not mutually exclusive, and interesting solutions can be built by combining these two approaches. However, contrasting these two extreme options offers interesting insights.

Vendors offer ready-made solutions able to fulfill all the functions of scholarly communication at a price ranging between $69 and $318 per new scholarly article. This cost is enough to create a substantial financial burden for organizations like the National Library of Medicine (Text Box 1) or innovative communities starting new journals like eLife (Text Box 2).

In contrast, we found that the marginal cost of the cloud infrastructure necessary to fulfill the functions of scholarly communication is relatively low: between $1.36 and $1.61 per article. Given that cloud computing prices are expected to follow Moore’s Law (Hoff 2015), we can expect that this cost will continue to decrease (with the caveat that DOI registration alone accounts for 82% to 98% of this cost). The rise of new, decentralized solutions for the generation of reliable identifiers, such as solutions based on the blockchain, may prove useful in providing an alternative to DOIs, unlocking the potential for further cost reductions.
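The share of the infrastructure cost attributable to DOI registration can be checked with a short sketch, using the $1.33 DOI cost and the $1.36 and $1.61 totals from Table 2; the names are illustrative.

```python
DOI_COST = 1.33                  # per article DOI cost, USD (Table 2)
TOTAL_COSTS = {"low volume": 1.36, "high volume": 1.61}  # per article totals, USD


def doi_share(total_cost, doi_cost=DOI_COST):
    """Fraction of the per article total that goes to DOI registration."""
    return doi_cost / total_cost


for label, total in TOTAL_COSTS.items():
    print(f"{label}: {doi_share(total):.1%}")
```

The two shares come out at roughly 97.8% and 82.6%, in line with the 82% to 98% range given in the text. The share is highest at low volume, where preservation is free and compute stays within the free tier.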

While vendor cost typically decreases with higher volume, new offerings in cloud computing such as AWS Lambda exhibit the opposite trend, challenging the notion that large volume publishers benefit from economies of scale as compared to smaller publishers.

As of today, there is a lack of high quality “plug-and-play” software solutions covering the functions of scholarly communication. Therefore, a significant investment in software development must be made before an agent can benefit from the relatively low marginal cost of cloud infrastructure found here. Open source initiatives such as pandoc or mammoth.js offer good reason to believe that this gap will eventually be bridged, but existing solutions are currently not robust enough to match the quality offered by traditional vendors or publishers. Until this issue is addressed, the marginal cost of scholarly communication is likely to remain high enough to constrain large-scale experimentation and rapid innovation (Text Box 1, Text Box 2).

Supplemental Material

Requirement: Submission, management of editorial workflow and peer review

Vendors:
  • Aries Editorial Manager
  • eJournal Press
  • Highwire Bench Press
  • SciELO
  • ScholarOne Manuscript Central

Pricing data:
  • $60.00/article ($30/submission, (A. O’Connell 2015, pers. comm., 26 Oct)).
  • $70/article ($35/submission, ScholarOne, (Casler and Byron 2009)).
  • $130/article (SciELO, (Brembs 2015)).

Requirement: Typesetting

Vendors:
  • Cenveo
  • Charlesworth Group
  • Devaland
  • Formax
  • National Library of Medicine
  • River Valley Technologies
  • SPi

Pricing data:
  • $59.20/article - $148.80/article (S. Tangri 2015, pers. comm., Nov 12).
  • $74/article - $148.80/article (C. Meadows 2015, pers. comm., Nov 11).
  • $27.84/article - $186.37/article (EUR 3.50 to EUR 7.00 per page for page layout and EUR 3.50 to EUR 7.00 per page for proofs and corrections, (Devaland)).
  • $7.40/article - $12.40/article ($1/page for DOCX to NISO JATS conversion, (B. Blodgett 2015, pers. comm., Nov 18)).
  • $35/article (National Library of Medicine, (Anderson 2013)).
  • $27.84/article - $119.78/article (River Valley Technologies, EUR 3.50 and EUR 9 per page, (Taylor, Wedel, and Naish 2015)).

Requirement: DOI

Vendors:
  • Airiti DOI
  • ChinaDOI
  • Crossref
  • DataCite
  • mEDRA

Pricing data:
  • $0.90 - $6.00/article (for publishers with 25 to 5000 or more DOIs per year, (mEDRA)).
  • $1/article plus membership fees based on annual revenues, e.g. $275/year for a publisher with <$1M in revenues vs $50K for a publisher with >$500M in revenues (Crossref 2015).
  • $0.16/article plus membership fees, starting at $78/year (China DOI; Crossref 2015; DataCite).
  • $0.82 - $0.98/article plus membership fees from $214/year to $305/year (Airiti DOI).
  • $9,027/year for unlimited DOI registrations, but must be a non-profit organization (DataCite).

Requirement: Preservation

Vendors:
  • CLOCKSS
  • Portico

Pricing data:
  • Membership fee based on annual revenues from $250/year for publishers with <$250K/year revenues up to $81,960/year for publishers with >$200M/year revenues (Portico 2015).
  • $1/article plus $0.25/article with no charge for first 500 articles plus membership fee of $225/year for publishers with <$250K/year revenues and $26,500/year for publishers with >$200M/year revenues (CLOCKSS 2015).

Supporting Table 1

Publishing requirements, vendors offering services to fulfill each of the requirements, and pricing data where available.

Funding

The work was funded by
  • Knight Enterprise Fund
  • Starpower Fund

Footnotes

  1. Agents may have access to other free or membership only options for preservation (e.g. LOCKSS or Portico) depending on their organization status.

  2. AWS Lambda free tier services are available to both new and existing customers indefinitely (see: https://aws.amazon.com/free/)

References

  1. (Not) giving credit where credit is due: Citation of data sets, by Joan E. Sieber and Bruce E. Trumbo; published in Science and Engineering Ethics.
  2. Scholarly Communication, by Association of Research Libraries (2014).
  3. The STM Report. An overview of scientific scholarly journal publishing, by Mark Ware and Michael Mabe (2015).
  4. Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data, by CODATA-ICSTI Task Group on Data Citation Standards and Practice; published in volume 12 of Data Science Journal.
  5. Managing Peer Review Online, by Robert Casler and Janet Byron (2009).
  6. Joint Declaration of Data Citation Principles, by Data Citation Synthesis Group.
  7. Achieving human and machine accessibility of cited data in scholarly publications, by Joan Starr, Eleni Castro, Mercè Crosas, Michel Dumontier, Robert R. Downs, Ruth Duerr, Laurel L. Haak, Melissa Haendel, Ivan Herman, Simon Hodson, Joe Hourclé, John Ernest Kratz, Jennifer Lin, Lars Holm Nielsen, Amy Nurnberger, Stefan Proell, Andreas Rauber, Simone Sacchi, Arthur Smith, Mike Taylor, and Tim Clark; published in PeerJ CompSci.
  8. What goes into making a scientific manuscript public, by Bjorn Brembs (2015); published in The Winnower. DOI: 10.15200/winn.143497.72670.
  9. Enabling Scientific Data on the Web, by Raymond Alexander Miłowski.
  10. The Price of Posting - PubMed Central Spends Most of Its Budget Handling Author Manuscripts, by Kent Anderson (2013).
  11. Citing Data in Journal Articles using JATS, by Deborah Aleyne Lapeyre.
  12. The Impact of Article Length on the Number of Future Citations: A Bibliometric Analysis of General Medicine Journals, by Matthew E Falagas, Angeliki Zarkali, Drosos E Karageorgopoulos, Vangelis Bardakas, and Michael N Mavros (2013); published in PLOS One. DOI: 10.1371/journal.pone.0049476.
  13. Publisher fees, by Crossref (2015).
  14. For Attribution - Developing Data Attribution and Citation Practices and Standards, by NRC.
  15. Contribute to CLOCKSS, by CLOCKSS (2015).
  16. Semantic Versioning 2.0.0, by Tom Preston-Werner.
  17. The publishing delay in scholarly peer-reviewed journals, by B-C Björk and D Solomon (2013); published in issue 4, volume 7, of Journal of Informetrics, pages 914-923. DOI: 10.1016/j.joi.2013.09.001.
  18. semver - the semantic versioner for npm, by npm.
  19. The Changing Nature of Scale in STM and Scholarly Publishing, by M Clarke (2015).
  20. Scholarly HTML, by Robin Berjon and Sébastien Ballesteros.
  21. http://schema.org/, by schema.org.
  22. Google Says Cloud Prices Will Follow Moore’s Law: Are We All Renters Now?, by T Hoff (2015).
  23. Typesetting Prices, by Devaland.
  24. DOCX Standard Scientific Style, by Tiffany Bogich and Sébastien Ballesteros.
  25. How much does "typesetting" cost?, by Mike Taylor, Matt Wedel, and Darren Naish (2015).
  26. First 1000 responses – most popular tools per research activity, by 101 Innovations.
  27. mEDRA doi, by mEDRA.
  28. Metadata Vocabulary for Tabular Data, by Jeni Tennison, Gregg Kellogg, and W3C.
  29. Style Guide DB, by Standard Analytics.
  30. http://dx.chinadoi.cn/, by China DOI.
  31. https://www.datacite.org/, by DataCite.
  32. http://doi.airiti.com/, by Airiti DOI.
  33. Publishers, by Portico (2015).
  34. A view on the future: The eLife Sciences 2014 Annual report, by eLife Sciences.
  35. Open access: The true cost of science publishing, by Richard Van Noorden; published in volume 495 of Nature, pages 426-429. DOI: 10.1038/495426a.
  36. Revisiting: The Price of Posting - PubMed Central Spends Most of Its Budget Handling Author Manuscripts, by David Crotty; published in The Scholarly Kitchen.