Manubot Versions

Python utilities for Manubot: Manuscripts, open and automated

v0.4.0

4 years ago

Includes backwards-incompatible changes to the manubot.cite API. Major enhancements to the flexibility of citation processing.

Commits

12d4537 reduce travis and appveyor CI duties (#232)
3196819 one-time reset of the CSL JSON Schema requests cache (#235)
1c3c74b Update CSL JSON Schema processing & replace OrderedDicts (#234)
2c5fb7a Update README badges
47b03e0 apply flexible class-based citation pipeline (#225)
a2c54c8 Adopt BSD-2-Clause-Patent license
20ba7e3 move linting, docs, and releases to github actions (#228)
f891508 Improve maintainability of cite command tests (#227)
04cbe9f fix test_web_query_returns_single_result_pubmed_url
7055bcc Citation refactor: CiteKey & Handle classes, CURIE support (#223)
4d3a97f actions: restore python-version functionality (#224)
730527f style: ignore F541: f-string without any placeholders
7261220 cite.curie module for compact identifiers via identifiers.org (#219)
dc5ad40 relax test_unpaywall_from_csl_item_with_doi
890b768 webpage_command.py: fix bug in subprocess usage
9c33b7a metadata: convert author.funders to list (#221)
a4c826c get_continuous_integration_parameters: fix warning (#215)
b4a6238 subprocess calls: do not accidentally write to stdout (#212)
ea11343 CI: use r-lib/actions/setup-pandoc to install pandoc
10fa21b fix typo in cite.citeproc._remove_error warning message
a022e71 manubot process: require --skip-citations (#210)
bc79746 Fix test_unpaywall_from_citekey
e912856 DOI metadata: use DataCite Content Negotiation (#206)
9a8c080 Fix test_get_arxiv_csl_item_oai: new OpenBioLink title
68ae7cf yamllint failing YAML files (#205)
e18fed8 Fix get_rootstock_commit on pull requests
e88055f Update release instruction [skip ci]

v0.3.1

4 years ago

Manubot version 0.3.1 includes the pandoc-manubot-cite command, which is a Pandoc filter for citation by identifier. manubot process has a --skip-citations option to leave citation processing for pandoc-manubot-cite. This option may become required in the future.

See commits for additional enhancements in this release.

Commits

95ec387 Do not warn about citation-tags.tsv when missing (#194)
6a35918 Fix black formatting error
54ed780 get_continuous_integration_parameters: support GitHub actions (#195)
b807720 Version citaiton in testing input.md
5ac49d7 pandoc-manubot-cite filter (#99)
92ad55b CI: do not use pytest --verbose
8b6e177 Remove python 3.8 from GH actions test
9ce0c62 Add GitHub workflow for tests (#136)
87f6314 Configure & enforce flake8 linting (#193)
7d6bb21 readme: add documentation badge
11d189d process.util: more modular get_citekeys_df & generate_csl_items
36bc1c4 CSL_Item: add date IO functionality (#189)
b00e49f CI: Update pandoc to version 2.8 (#178)
9c58b4e process: YAML metadata use 'authors' not 'author_info' (#188)
82c3d40 Expand url_to_citekey support for Sci-Hub links (#185)
f1772c1 CSL Item: Remove newlines in arXiv abstract (#184)
a59e73d manubot.cite.citekey.url_to_citekey URL parser (#182)
d223302 Fix minor linter errors
b3f6a23 zotero search_or_web_query deduces whether URL or ID (#183)

v0.3.0

4 years ago

Manubot version 0.3.0 updates the schema of output variables & metadata for the manubot process command. Now, Pandoc's header-includes metadata field is set to provide manuscript-specific metadata that improves indexing by bibliographic databases and assists sharing on social media.

The terminology around citations has been updated. We now refer to identifiers for specific references as "citekeys" or "citation keys". The following external-facing functions have updated names: manubot.cite.citekey_to_csl_item and manubot.cite.standardize_citekey.

There is a new subcommand manubot webpage for managing creation of a webpage directory for manuscripts.

Commits

a9fc2ea Log user-provided variables at debug level (#179)
2ffcec1 don't encapsulate header-includes in code block
e892999 Set Pandoc's header-includes with HTML <meta> (#138)
23e687a Revise metadata.yaml processing (#175)
74f7a85 Update copyright assertion statement
98de2bd miscellaneous edits from gitter reviews
ebac7ab process: combined pandoc & manubot metadata via load_variables (#173)
49343c7 readme: improve development commands (#171)
785b4aa move is_http_url to manubot.util (#174)
9023b7f process: set / detect manuscript thumbnail image (#169)
0b7c14a Apply black style to Python codebase (#164)
43be195 webpage: fix process referenced before assignment (#166)
ddd0995 Function to get doi metadata via translation-server
d5f9871 Travis CI: test on Python 3.8 (#162)
2f494fe skip test_get_continuous_integration_parameters_* on forks (#160)
94e3f4c Migrate standardize ids & notes to CSL_Item class (#157)
04a3574 Generate docs and deploy to gh-pages (#153)
065ae7f Migrate csl_item_passthrough to CSL_Item.clean (#156)
40e5b7a test_cite_command_render_stdout: version expected files
1aa21dc CSL_Item class for bibliographic metadata of a single reference
75b259e TST: pytest option to cache requests (#151)
0478ce2 Switch pytest to use verbose & colored output (#154)
cf9564d split cite.util into csl_item & citekey submods
307b339 TST: read files as utf-8-sig (#143)
71d4d39 Travis: use API token for PyPI authorization
df480ae Fix webpage warning during ots upgrade (#134)
23a0841 Adopt "citekeys" terminology (#129)
efb0adb webpage redirect-template.html: remove blank line
00405aa manubot webpage subcommand (#132)
1b885c6 process.ci: generalize CI variables & support AppVeyor

v0.2.4

4 years ago

Manubot version 0.2.4 contains various enhancements and improvements to the citation processing workflow.

New features

Create the new manubot.pandoc submodule with code for interacting with the system Pandoc installation (see GH103). This module provides an organized location for Pandoc-related code, which will help with development of a Pandoc filter for citation-by-identifier (see GH99).

Manual references can now be supplied in formats other than CSL JSON. Formats supported by pandoc-citeproc --bib2json can now be supplied to manubot process. See GH100 and GH104.

Changes

Additional refactoring of the manubot.cite submodule has moved the package closer to a well-defined processing pipeline for citations (GH113 and GH114). The column names in citations.tsv changed to [manuscript_id, detagged_id, standard_id, short_id].

Make any missing parent directories for the --output-directory and --cache-directory arguments of manubot process. See GH102 and GH115.
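The directory behavior described above can be sketched with the standard library. This is a minimal illustration, not Manubot's actual code; the helper name ensure_directory is hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_directory(path):
    """Create `path` and any missing parents; do nothing if it already exists.

    Hypothetical helper for illustration only.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Usage: a nested directory under a temporary location is created in one call.
with tempfile.TemporaryDirectory() as tmp:
    output_dir = ensure_directory(Path(tmp) / "output" / "nested")
    created = output_dir.is_dir()
```

Passing exist_ok=True makes the call safe to repeat, so the same helper works for both fresh and existing directories.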

Read text files using the utf-8-sig encoding (to strip BOMs if present). Write text files using utf-8 encoding. UTF-8 ensures compatibility with Pandoc, which uses it for I/O. It also keeps behavior consistent across files and platforms. See GH125 and GH127.
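The effect of the two encodings can be seen in a short sketch. The helper names read_text and write_text and the file name are assumptions for illustration, not Manubot functions.

```python
import tempfile
from pathlib import Path


def read_text(path):
    # utf-8-sig decodes UTF-8 and drops a leading byte order mark if present.
    return Path(path).read_text(encoding="utf-8-sig")


def write_text(path, text):
    # Plain utf-8 never writes a byte order mark.
    Path(path).write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "input.md"
    path.write_bytes(b"\xef\xbb\xbf# Title\n")  # simulate a file saved with a BOM
    text = read_text(path)  # the BOM is not part of the returned string
    write_text(path, text)
    rewritten = path.read_bytes()  # the rewritten file has no BOM
```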

The README has been updated with improved installation instructions and the Manubot software paper citation. See GH118 and GH121.

v0.2.3

5 years ago

Manubot version 0.2.3 contains various enhancements. In addition, the source code location has moved from https://github.com/greenelab/manubot to https://github.com/manubot/manubot (see GH94).

New features

Citations of shortDOIs are now supported (see GH92 and GH93). shortDOIs, which start with 10/ rather than 10., can now be cited just like a DOI. For example, @doi:10/gddkhn is a supported citation. Manubot expands shortDOI citations to their regular DOIs, e.g. @doi:10.1098/rsif.2017.0387, such that manubot process will treat both the short and regular form as the same citation.
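The prefix rule above (10/ versus 10.) can be sketched as follows. The function classify_doi is hypothetical, and expanding a shortDOI to its regular DOI requires a lookup against an external service, which is not shown here.

```python
def classify_doi(identifier):
    """Classify a DOI-like string by its prefix.

    Hypothetical helper: returns "shortDOI" for identifiers starting with
    "10/", "DOI" for identifiers starting with "10.", and "invalid" otherwise.
    """
    if identifier.startswith("10/"):
        return "shortDOI"
    if identifier.startswith("10."):
        return "DOI"
    return "invalid"


short_form = classify_doi("10/gddkhn")
regular_form = classify_doi("10.1098/rsif.2017.0387")
```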

Bug fixes

Queries to Manubot's translation-server now specify single=1 to enforce returning a single record per persistent identifier (see GH90). Previously, multiple results were sometimes returned, causing Manubot's CSL JSON retrieval to fail. Furthermore, Zotero child notes are now ignored, fixing another failure mode for CSL export of Zotero metadata.

Null authors are now allowed in metadata.yaml and do not crash Manubot with a TypeError (see GH91).

The codebase has been updated to avoid deprecation warnings in Pandas v0.24 (see GH95).

v0.2.2

5 years ago

Manubot version 0.2.2 contains citation and web request enhancements.

New features

This release adds citation support for two additional types of identifiers (isbn and wikidata).

ISBNs are the primary persistent identifier for many books, so many books no longer need to be cited by URL (see GH79 and GH14). However, ISBN metadata is sometimes missing or erroneous. Users may still need to set manual CSL JSON, but Manubot can at least produce a reasonable starting template. Try for example manubot cite isbn:9780062316097.

Wikidata is a free and open knowledge base that contains many records of scholarly works. Wikidata can store metadata on records that do not have their own persistent identifiers, and thus can help Manubot users assign a stable identifier to works that otherwise would not have one (see GH67 and GH86). Try for example manubot cite wikidata:Q50051684.

Manubot now uses Zotero's translation-server infrastructure to provide metadata for wikidata, ISBN, and URL citations (see GH70 and GH84). Manubot now hosts its own instance of translation-server at https://translate.manubot.org (see GH82). As such, Manubot users can benefit from Zotero's impressive collection of translators for retrieving metadata from different webpages. Manubot's ISBN and URL citation metadata retrievers now first attempt to generate metadata using translation-server, and fall back to other methods if that fails.
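The try-first-then-fall-back pattern described above might look like the following sketch. It is not Manubot's implementation: retrieve_with_fallback and the two stand-in retrievers are hypothetical, and no network requests are made.

```python
import logging


def retrieve_with_fallback(identifier, retrievers):
    """Return metadata from the first retriever that succeeds.

    `retrievers` is an ordered list of callables (hypothetical stand-ins for
    a translation-server query followed by other methods).
    """
    for retriever in retrievers:
        try:
            return retriever(identifier)
        except Exception as error:
            logging.warning("%s failed for %r: %s", retriever.__name__, identifier, error)
    raise ValueError(f"no retriever produced metadata for {identifier!r}")


def failing_primary(identifier):
    # Stand-in for a primary source that is unavailable.
    raise RuntimeError("primary source unavailable")


def simple_fallback(identifier):
    # Stand-in for a secondary method that returns minimal metadata.
    return {"id": identifier, "source": "fallback"}


metadata = retrieve_with_fallback("isbn:9780062316097", [failing_primary, simple_fallback])
```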

Bug fixes

NCBI E-Utility requests are now rate limited to 2 per second (see GH83). Previously, certain situations that caused rapid E-Utility requests would return status code 429 for "too many requests".
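A limit of 2 requests per second amounts to keeping at least 0.5 seconds between calls. The sketch below shows one simple way to do that; the RateLimiter class is hypothetical and is not the mechanism Manubot uses.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls (illustrative sketch)."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self._clock = clock  # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# At most 2 calls per second: call limiter.wait() before each request.
limiter = RateLimiter(calls_per_second=2)
```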

Internal changes

Outgoing web requests made by Manubot now set the User-Agent header (see GH83). These headers provide high-level information about a user's system, as shown in the following examples:

manubot/0.2.2 (Linux; Python/3.6) <[email protected]>
manubot/0.2.2 (Windows; Python/3.7) <[email protected]>

Setting the header will help upstream resources contact the Manubot developers should our requests be problematic or should downtime be anticipated. Furthermore, it will allow Manubot's translation-server to monitor Manubot usage, including which operating system, Python version, and package version are being used.
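A header in the format shown above could be assembled roughly as follows. This is a sketch only: build_user_agent is a hypothetical function, and the contact address is a placeholder, since the real address is not recoverable from this page.

```python
import platform
import sys


def build_user_agent(package_version, contact_email):
    """Build a User-Agent string of the form
    manubot/<version> (<OS>; Python/<major.minor>) <contact>.

    Hypothetical helper for illustration only.
    """
    system = platform.system()  # e.g. "Linux" or "Windows"
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"manubot/{package_version} ({system}; Python/{python_version}) <{contact_email}>"


# contact@example.org is a placeholder address, not the project's real contact.
user_agent = build_user_agent("0.2.2", "contact@example.org")
headers = {"User-Agent": user_agent}
```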

Manubot's test suite has been reorganized such that testing modules correspond one-to-one with package modules (see GH87).

v0.2.1

5 years ago

Manubot version 0.2.1 contains several improvements to the package's citation infrastructure.

New features

Support has been added for raw citations for references without supported persistent identifiers (see GH62 and GH74). Raw citations require the user to manually specify the corresponding CSL JSON.

Error messages for invalid citations have been improved (see GH76 and GH71). More types of incorrect citations are now caught internally before any external APIs are queried to retrieve metadata.

The manubot cite command has been updated to generate metadata for all valid citations, while logging error messages for invalid citations (see GH77). Previously, a single invalid citation would cause the program to exit before outputting references for valid citations.
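The behavior change can be illustrated with a loop that logs failures instead of exiting. This is a sketch under assumptions: collect_metadata and fake_retrieve are hypothetical names, not part of the manubot cite command.

```python
import logging


def collect_metadata(citations, retrieve):
    """Return metadata for every citation that `retrieve` can handle.

    Invalid citations are logged and skipped rather than aborting the run.
    """
    items = []
    for citation in citations:
        try:
            items.append(retrieve(citation))
        except Exception as error:
            logging.error("skipping invalid citation %r: %s", citation, error)
    return items


def fake_retrieve(citation):
    # Stand-in retriever: accepts only citations with a "doi:" prefix.
    if not citation.startswith("doi:"):
        raise ValueError("unsupported citation source")
    return {"id": citation}


results = collect_metadata(["doi:10.1098/rsif.2017.0387", "not-a-citation"], fake_retrieve)
```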

Bug fixes

Previously, metadata for pmcid citations was retrieved from the NCBI Citation Exporter. This service was taken offline without notice, causing citation retrieval to fail. NCBI replaced the previous service with the Literature Citation Exporter. The manubot.cite.pubmed.get_pmc_citeproc function has been changed to use the new service (see GH80).

Previously, CSL JSON Items were being generated with empty date-parts arrays, which would cause pandoc-citeproc to fail. Manubot's CSL JSON pruning infrastructure has been updated to delete empty date-parts arrays (see GH66 and GH65).
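One way to prune empty date-parts arrays is sketched below. The function remove_empty_date_parts is hypothetical, and dropping a date field entirely when nothing else remains in it is an assumption of this sketch, not a documented Manubot behavior.

```python
def remove_empty_date_parts(csl_item):
    """Return a copy of `csl_item` without empty date-parts arrays.

    Assumption for illustration: a date field left with no other keys
    is removed from the item altogether.
    """
    cleaned = {}
    for key, value in csl_item.items():
        if isinstance(value, dict) and "date-parts" in value:
            parts = [part for part in value["date-parts"] if part]
            if parts:
                value = {**value, "date-parts": parts}
            else:
                value = {k: v for k, v in value.items() if k != "date-parts"}
                if not value:
                    continue  # nothing left in this date field, so drop it
        cleaned[key] = value
    return cleaned


item = {
    "title": "Example",
    "issued": {"date-parts": []},
    "accessed": {"date-parts": [[2018, 5, 1]]},
}
pruned = remove_empty_date_parts(item)
```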

Entrez E-Utils returned integer-encoded months for certain pmid citations, causing citeproc_from_pubmed_article to fail. Both integer and character month encodings are now supported (see GH72).
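Accepting both encodings can be sketched as a small parser. The function parse_month and its abbreviation table are assumptions for illustration and do not reproduce the logic in citeproc_from_pubmed_article.

```python
MONTH_ABBREVIATIONS = [
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
]


def parse_month(value):
    """Return a month number (1-12) from an integer-like or English name value.

    Hypothetical helper: accepts 7, "7", "07", "Jul", or "July" and
    returns None when the value is not recognized.
    """
    text = str(value).strip().lower()
    if text.isdecimal():
        month = int(text)
        return month if 1 <= month <= 12 else None
    prefix = text[:3]
    if prefix in MONTH_ABBREVIATIONS:
        return MONTH_ABBREVIATIONS.index(prefix) + 1
    return None


from_name = parse_month("Jul")
from_integer = parse_month("07")
```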

v0.2.0

5 years ago

Manubot version 0.2.0 introduces subcommands to the command line interface. The previous command manubot to process manuscript content for Pandoc is now manubot process. New functionality has been added via the manubot cite subcommand to retrieve bibliographic metadata for citations (see GH37). The cite subcommand can either return CSL JSON (default) or formatted references (--render, requires Pandoc, see GH48).

This release adds support for removing invalid fields from CSL JSON Data, which is enabled by default (see GH47). Previously, certain citeproc APIs returned CSL JSON with extra fields or fields with invalid values according to the CSL JSON Schema. Now CSL JSON is validated against the schema, with invalid fields removed.

Now PMID & PMCID fields are automatically populated when generating CSL data for DOIs (see GH45). CSL for DOIs now uses shortDOIs in the URL field.

As this package now supports more varied use cases and workflows, the code has been refactored to use lazy imports (see GH56). Most functions directly under manubot.cite and manubot.process have been moved to util submodules. manubot.cite.citation_to_citeproc and manubot.cite.standardize_citation remain for backward compatibility.
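A lazy import defers loading a module until the function that needs it is first called. The sketch below uses the standard json module as a stand-in for a heavier dependency; render_csl_json is a hypothetical function, not part of manubot.

```python
def render_csl_json(csl_items):
    """Serialize CSL items to a JSON string (hypothetical helper)."""
    # Deferred import: the module is loaded on the first call to this function
    # rather than when the enclosing module is imported.
    import json

    return json.dumps(csl_items, indent=2)


rendered = render_csl_json([{"id": "doi:10.1098/rsif.2017.0387"}])
```

With this pattern, importing the enclosing module stays cheap for workflows that never call the function.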