Python utilities for Manubot: Manuscripts, open and automated
Includes backwards incompatible changes to the manubot.cite API and major enhancements to the flexibility of citation processing.
Manubot version 0.3.1 includes the pandoc-manubot-cite command, which is a Pandoc filter for citation by identifier.
The manubot process command has a --skip-citations option that leaves citation processing to pandoc-manubot-cite.
This option may become required in the future.
See commits for additional enhancements in this release.
Manubot version 0.3.0 updates the schema of output variables and metadata for the manubot process command. Pandoc's header-includes metadata field is now set to provide manuscript-specific metadata that improves indexing by bibliographic databases and assists sharing on social media.
The terminology around citations has been updated. We now refer to identifiers for specific references as "citekeys" or "citation keys". The following external-facing functions have updated names: manubot.cite.citekey_to_csl_item and manubot.cite.standardize_citekey.
There is a new subcommand, manubot webpage, for managing creation of a webpage directory for manuscripts.
<meta> (#138)
process referenced before assignment (#166)
Authors of commits included in this release:
Manubot version 0.2.4 contains various enhancements and improvements to the citation processing workflow.
Create the new manubot.pandoc submodule with code for interacting with the system Pandoc installation (see GH103). This module provides an organized location for Pandoc-related code, which will help with developing a Pandoc filter for citation-by-identifier (see GH99).
Manual references can now be supplied in formats other than CSL JSON. Formats supported by pandoc-citeproc --bib2json can now be supplied to manubot process. See GH100 and GH104.
Additional refactoring of the manubot.cite submodule has moved the package closer to a well-defined processing pipeline for citations (GH113 and GH114). The column names in citations.tsv changed to [manuscript_id, detagged_id, standard_id, short_id].
Make any missing parent directories for the --output-directory and --cache-directory arguments of manubot process. See GH102 and GH115.
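As an illustration of the behavior described above, the following is a minimal sketch of creating a directory together with any missing parents. The helper name ensure_directory is hypothetical and is not taken from the Manubot codebase; it only shows the standard-library call that gives this effect.

```python
from pathlib import Path
import tempfile


def ensure_directory(path):
    """Create `path` and any missing parents; do nothing if it already exists.

    Hypothetical helper for illustration, not Manubot's actual function.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Demonstrate with a nested path inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    output_dir = ensure_directory(Path(tmp) / "output" / "nested")
    created = output_dir.is_dir()
```

Passing exist_ok=True makes the call safe to repeat, so the same code path works whether or not the directory was created on an earlier run.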
Read text files using the utf-8-sig encoding (to strip BOMs if present). Write text files using utf-8 encoding. UTF-8 ensures compatibility with Pandoc, which uses it for I/O. It also keeps behavior consistent across files and platforms. See GH125 and GH127.
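The difference between the two encodings can be seen in a short standard-library sketch. The file contents and names here are made up for the demonstration.

```python
import codecs
import os
import tempfile

# Write a file that starts with a UTF-8 byte order mark (BOM).
with tempfile.NamedTemporaryFile(delete=False, suffix=".md") as handle:
    handle.write(codecs.BOM_UTF8 + "# Title\n".encode("utf-8"))
    path = handle.name

try:
    # utf-8-sig strips a leading BOM if present and otherwise acts like utf-8.
    with open(path, encoding="utf-8-sig") as read_file:
        text = read_file.read()
    # Plain utf-8 keeps the BOM as the character U+FEFF at the start.
    with open(path, encoding="utf-8") as read_file:
        raw_text = read_file.read()
finally:
    os.remove(path)
```

Reading with utf-8-sig is harmless for files without a BOM, which is why it is a reasonable default for input while plain utf-8 is used for output.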
The README has been updated with improved installation instructions and the Manubot software paper citation. See GH118 and GH121.
Manubot version 0.2.3 contains various enhancements. In addition, the source code location has moved from https://github.com/greenelab/manubot to https://github.com/manubot/manubot (see GH94).
Citations of shortDOIs are now supported (see GH92 and GH93). shortDOIs, which start with 10/ rather than 10., can now be cited just like a DOI. For example, @doi:10/gddkhn is a supported citation. Manubot expands shortDOI citations to their regular DOIs, e.g. @doi:10.1098/rsif.2017.0387, such that manubot process will treat both the short and regular form as the same citation.
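The prefix rule above can be sketched as follows. The function classify_doi is a hypothetical name used only for illustration. Expanding a shortDOI to its regular DOI requires a network lookup, which this sketch deliberately leaves out.

```python
def classify_doi(identifier):
    """Classify an identifier by its prefix.

    Hypothetical helper: returns "shortDOI" for identifiers starting with
    "10/", "DOI" for identifiers starting with "10.", and "invalid" otherwise.
    """
    if identifier.startswith("10/"):
        return "shortDOI"
    if identifier.startswith("10."):
        return "DOI"
    return "invalid"


# The two forms mentioned in the release notes.
short_form = classify_doi("10/gddkhn")
regular_form = classify_doi("10.1098/rsif.2017.0387")
```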
Queries to Manubot's translation-server now specify single=1 to enforce returning a single record per persistent identifier (see GH90). Previously, multiple results were sometimes returned, causing Manubot's CSL JSON retrieval to fail. Furthermore, Zotero child notes are now ignored, fixing another failure mode for CSL export of Zotero metadata.
Null authors are now allowed in metadata.yaml and do not crash Manubot with a TypeError (see GH91).
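A YAML key with no value parses to None in Python, and iterating over None raises a TypeError. The sketch below shows one way to guard against that. The key name authors and the helper get_authors are assumptions for illustration, not Manubot's actual code.

```python
def get_authors(metadata):
    """Return the author list, treating a missing or null value as empty.

    Hypothetical helper: `metadata` stands in for the parsed metadata.yaml.
    """
    authors = metadata.get("authors")
    if authors is None:
        return []
    return list(authors)


# An "authors:" key with no value in YAML is loaded as None.
null_case = get_authors({"title": "Example manuscript", "authors": None})
filled_case = get_authors({"authors": [{"name": "Jane Roe"}]})
```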
The codebase has been updated to avoid deprecation warnings in Pandas v0.24 (see GH95).
Manubot version 0.2.2 contains citation and web request enhancements.
This release adds citation support for two additional types of identifiers (isbn and wikidata).
ISBNs are the primary persistent identifier for many books, so many books no longer need to be cited by URL (see GH79 and GH14). However, ISBN metadata is sometimes missing or erroneous.
Users may still need to set manual CSL JSON, but Manubot can at least produce a reasonable starting template. Try for example manubot cite isbn:9780062316097.
Wikidata is a free and open knowledge base that contains many records of scholarly works. Wikidata can store metadata on records that do not have their own persistent identifiers, and thus can help Manubot users assign a stable identifier to works that otherwise would not have one (see GH67 and GH86). Try for example manubot cite wikidata:Q50051684.
Manubot now uses Zotero's translation-server infrastructure to provide metadata for wikidata, ISBN, and URL citations (see GH70 and GH84). Manubot now hosts its own instance of translation-server at https://translate.manubot.org (see GH82). As such, Manubot users can benefit from Zotero's impressive collection of translators for retrieving metadata from different webpages. Manubot's ISBN and URL citation metadata retrievers now first attempt to generate metadata using translation-server, and fall back to other methods if that fails.
NCBI E-Utility requests are now rate limited to 2 per second (see GH83). Previously, certain situations that caused rapid E-Utility requests would return status code 429 for "too many requests".
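To illustrate what limiting requests to 2 per second involves, here is a minimal sliding-window limiter written with the standard library only. It is a sketch under stated assumptions: the class RateLimiter and its parameters are hypothetical and do not describe how Manubot itself implements the limit.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window).

    Hypothetical class for illustration. `clock` and `sleep` are injectable
    so the behavior can be checked without real waiting.
    """

    def __init__(self, max_calls, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of calls inside the current window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have fallen out of the window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)


# Usage: call limiter.wait() immediately before each outgoing request.
limiter = RateLimiter(max_calls=2, period=1.0)
```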
Outgoing web requests made by Manubot now set the User-Agent header (see GH83). These headers provide high-level information about a user's system, as shown in the following examples:
manubot/0.2.2 (Linux; Python/3.6) <[email protected]>
manubot/0.2.2 (Windows; Python/3.7) <[email protected]>
Setting the header will help upstream resources contact the Manubot developers should our requests be problematic or should downtime be anticipated. Furthermore, it will allow Manubot's translation-server to monitor Manubot usage, including which operating system, Python version, and package version are being used.
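A header in the format shown above could be assembled roughly as follows. This is a sketch, not Manubot's implementation: the function build_user_agent is a hypothetical name, and contact@example.org is a placeholder rather than the project's real contact address.

```python
import platform


def build_user_agent(package_version, contact_email):
    """Build a User-Agent string with package, OS, and Python version.

    Hypothetical helper for illustration only.
    """
    system = platform.system()  # e.g. "Linux" or "Windows"
    major, minor, _patch = platform.python_version_tuple()
    return (
        f"manubot/{package_version} "
        f"({system}; Python/{major}.{minor}) <{contact_email}>"
    )


# Placeholder contact address, assumed for the example.
user_agent = build_user_agent("0.2.2", "contact@example.org")
headers = {"User-Agent": user_agent}
```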
Manubot's test suite has been reorganized such that testing modules correspond one-to-one with package modules (see GH87).
Manubot version 0.2.1 contains several improvements to the package's citation infrastructure.
Support has been added for raw citations for references without supported persistent identifiers (see GH62 and GH74). Raw citations require the user to manually specify the corresponding CSL JSON.
Error messages for invalid citations have been improved (see GH76 and GH71). More types of incorrect citations are now caught internally before any external APIs are queried to retrieve metadata.
The manubot cite command has been updated to generate metadata for all valid citations, while logging error messages for invalid citations (see GH77). Previously, a single invalid citation would cause the program to exit before outputting references for valid citations.
Previously, metadata for pmcid citations was retrieved from the NCBI Citation Exporter. This service was taken offline without notice, causing citation retrieval to fail. NCBI replaced the previous service with the Literature Citation Exporter. The manubot.cite.pubmed.get_pmc_citeproc function has been changed to use the new service (see GH80).
Previously, CSL JSON Items were being generated with empty date-parts arrays, which would cause pandoc-citeproc to fail. Manubot's CSL JSON pruning infrastructure has been updated to delete empty date-parts arrays (see GH66 and GH65).
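The pruning step described above might look like the following sketch. The function remove_empty_date_parts is a hypothetical name, and treating a date-parts array that contains only empty lists as empty is an assumption of this example, not something stated in the release notes.

```python
def remove_empty_date_parts(csl_item):
    """Return a copy of `csl_item` without empty "date-parts" arrays.

    Hypothetical helper: only looks one level down, at CSL date variables
    such as "issued" or "accessed".
    """
    cleaned = {}
    for key, value in csl_item.items():
        if isinstance(value, dict):
            # Drop "date-parts" when it is [] or holds only empty lists.
            value = {
                k: v
                for k, v in value.items()
                if not (k == "date-parts" and not any(v))
            }
        cleaned[key] = value
    return cleaned


item = {
    "id": "example",
    "issued": {"date-parts": []},
    "accessed": {"date-parts": [[2018, 6, 1]]},
}
cleaned_item = remove_empty_date_parts(item)
```

The sketch returns a new dictionary rather than modifying its input, so the original item remains available for comparison or logging.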
Entrez E-Utils returned integer-encoded months for certain pmid citations, causing citeproc_from_pubmed_article to fail. Both integer and character month encodings are now supported (see GH72).
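Accepting both encodings can be sketched as below. The helper parse_month and the exact inputs it accepts are assumptions for illustration, not the code used inside citeproc_from_pubmed_article.

```python
MONTH_ABBREVIATIONS = [
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
]


def parse_month(value):
    """Convert a month such as "3", "03", "Mar", or "March" to an int 1-12.

    Hypothetical helper: raises ValueError for anything it cannot interpret.
    """
    text = str(value).strip()
    if text.isdigit():
        month = int(text)
    else:
        prefix = text[:3].lower()
        if prefix not in MONTH_ABBREVIATIONS:
            raise ValueError(f"unrecognized month: {value!r}")
        month = MONTH_ABBREVIATIONS.index(prefix) + 1
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value!r}")
    return month


integer_encoded = parse_month("3")
character_encoded = parse_month("Mar")
```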
Manubot version 0.2.0 introduces subcommands to the command line interface. The previous command manubot to process manuscript content for Pandoc is now manubot process. New functionality has been added via the manubot cite subcommand to retrieve bibliographic metadata for citations (see GH37). The cite subcommand can either return CSL JSON (default) or formatted references (--render, requires Pandoc, see GH48).
This release adds support for removing invalid fields from CSL JSON Data, which is enabled by default (see GH47). Previously, certain citeproc APIs returned CSL JSON with extra fields or fields with invalid values according to the CSL JSON Schema. Now CSL JSON is validated against the schema, with invalid fields removed.
Now PMID & PMCID fields are automatically populated when generating CSL data for DOIs (see GH45). CSL for DOIs now uses shortDOIs in the URL field.
As this package now supports more varied use cases and workflows, the code has been refactored to use lazy imports (see GH56). Most functions directly under manubot.cite and manubot.process have been moved to util submodules. manubot.cite.citation_to_citeproc and manubot.cite.standardize_citation remain for backward compatibility.