A high-throughput sequence read toolset using a streaming approach facilitated by Linux pipes
Bugfix release with the following changes:
Fixes a bug in which hts_LengthFilter did not count reads correctly.
Changes include:
A major refactor by @joe-angell to make the code cleaner and reduce repetition. Maintaining existing apps and contributing new apps to HTStream will now be easier and involve less coding than ever before.
GitHub Continuous Integration support added by @joe-angell.
Many changes by @msettles to add new statistics and standardize statistics collection for better reporting. These changes almost certainly break compatibility with any tools that parsed information from the old JSON log files, so please test before updating. These changes are part of ongoing work to add MultiQC support.
Code and repository cleanup by @msettles.
Appending to the log file no longer requires both -L and -A; -A alone is now sufficient. @msettles
Added a Code of Conduct and guidelines for contributing. @msettles
Moved the project to a new organization and repository to better represent the contributors associated with it (https://github.com/s4hts/HTStream). @msettles
Major modifications to the way read filtering is done, including a new hts_LengthFilter application by @keithgmitchell. Previously, several tools had their own length filtering options, which made it difficult to compile accurate logging information about the number of reads discarded due to length. The new approach allows length filtering only in this one app. Note that other apps can still discard reads for other reasons.
Fixed a number of small bugs discovered by a regression testing system added by @joe-angell.
Refactored and modified how reads are processed and stored in order to remove dynamic casting. @joe-angell
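The idea behind hts_LengthFilter (length filtering happens in exactly one stage, so every discarded read is counted once) can be illustrated with a minimal Python sketch. This is not HTStream's code: it assumes single-end FASTQ on stdin, and the names `read_fastq`, `length_filter`, `min_length`, and the stats keys are hypothetical, not HTStream's options or JSON log fields.

```python
import json
import sys
from typing import Dict, Iterable, Iterator, Tuple

# A FASTQ record: (header, sequence, plus-line, quality)
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group raw lines into 4-line FASTQ records (assumes well-formed input)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip stray blank lines between records
        seq, plus, qual = next(it), next(it), next(it)
        yield header, seq, plus, qual


def length_filter(records: Iterable[Record], min_length: int,
                  stats: Dict[str, int]) -> Iterator[Record]:
    """Stream records through, keeping reads with len(seq) >= min_length.

    Every read is counted exactly once as either kept or discarded.
    """
    for rec in records:
        stats["reads_in"] += 1
        if len(rec[1]) >= min_length:
            stats["reads_out"] += 1
            yield rec
        else:
            stats["discarded"] += 1


if __name__ == "__main__":
    counts = {"reads_in": 0, "reads_out": 0, "discarded": 0}
    # Hypothetical cutoff of 50 bp; reads go to stdout, counts to stderr.
    for record in length_filter(read_fastq(sys.stdin), min_length=50, stats=counts):
        sys.stdout.write("\n".join(record) + "\n")
    json.dump(counts, sys.stderr)
```

Because the filter is a generator reading stdin and writing stdout, it can sit anywhere in a pipe-based chain, and the counts reflect only the reads this one stage removed.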
This release adds a new app for primer trimming. The trimmer can also orient reads based on primer match, making it useful for certain types of amplicon libraries. Currently, the app adds the name of the trimmed primer to the read ID. Please provide feedback if you have suggestions for better behavior.
Additionally, a performance regression introduced in mid-2019 has been identified and corrected. All HTS tools should run much faster now!
Input and output file options were changed: stdin/stdout are now the defaults, gzip output is now the default, and the output prefix parameter was removed in favor of including the prefix in the output file format option. The unmapped SAM file is now output properly and can be written to stdout with -z stdout.
The following parameters are deprecated:
-p [ --prefix ]
-g [ --gzip-output ]
-O [ --to-stdout ]
-S [ --from-stdin ]
New algorithm for polyAT trimming: for each read, a sliding window of length window_size is shifted along the left or right end of the fragment sequence, and the fraction of A’s (or T’s, depending on the strandedness of sequencing) is calculated within each window. At least perfect_windows windows (default 1) must contain 6 (default) A’s/T’s, and the remaining windows must have a fraction of A’s > 0.3; a region meeting these criteria is used as a candidate poly(A) site. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5279776/
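The window criteria above can be sketched in Python as follows. This is an illustrative reading of the description, not HTStream's implementation: it assumes the first `perfect_windows` windows from the 3' end must be "perfect" (at least `perfect_count` matching bases), that later windows are accepted while their base fraction exceeds `min_fraction`, and that the site starts at the first matching base of the left-most accepted window. The function name, `perfect_count`, and all defaults are hypothetical.

```python
from typing import Optional


def find_polya_start(seq: str, base: str = "A", window_size: int = 6,
                     perfect_windows: int = 1, perfect_count: int = 6,
                     min_fraction: float = 0.3) -> Optional[int]:
    """Return the index where a candidate 3' poly-`base` tail starts, or None.

    Windows of length window_size slide one base at a time from the 3' end
    toward the 5' end. The first `perfect_windows` windows must each contain
    at least `perfect_count` copies of `base`; later windows are accepted
    while their fraction of `base` exceeds `min_fraction`.
    """
    n = len(seq)
    if n < window_size:
        return None
    start = None
    windows_seen = 0
    # Window start positions, from the 3'-most window inward.
    for pos in range(n - window_size, -1, -1):
        count = seq[pos:pos + window_size].count(base)
        if windows_seen < perfect_windows:
            if count < perfect_count:
                return None  # not enough perfect windows: no candidate site
        elif count / window_size <= min_fraction:
            break  # the tail ends where the base fraction drops too low
        start = pos
        windows_seen += 1
    if start is None:
        return None
    # Snap forward to the first `base` inside the left-most accepted window.
    while start < n and seq[start] != base:
        start += 1
    return start if start < n else None
```

For the left-end poly-T case, one option under the same assumptions is to call the function on the reversed read with `base="T"`; the number of leading bases to trim is then `len(seq)` minus the returned index.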
Binaries tested with Ubuntu 18.04 and CentOS 7.6.1810 (64-bit).
After months of testing and bug fixes, HTStream is ready for a first release.