Lczech Genesis Versions

A library for working with phylogenetic and population genetic data.

v0.31.1

4 weeks ago

This release mainly fixes the window averaging approach for FST, which was accidentally statistically nonsensical in the previous release.

Notable Changes

  • Redesign FST window averaging implementation
  • Refine diversity denominator for low read depths
  • Add Window Stream begin and end callbacks

v0.31.0

1 month ago

This release is a major clean-up of the population classes and functions. In particular: (1) "Iterator" classes have been renamed to the more appropriate "Stream", and (2) the Variant filtering approach has been completely redesigned to use tags instead of fully removing positions from the stream, allowing us to properly compute per-window averages of statistics.

Notable Changes

  • General
    • Rename all "Visitor" instances to "Observer" to follow the observer pattern
    • Refactor observers to have on-enter and on-leave functionality
  • population
    • Rename Base Counts class to Sample Counts
    • Rename all Variant and Window "Iterator" classes to "Stream"
    • Rename Sliding Entries Window Stream to Queue Window Stream
    • Rename usage of "coverage" to "read depth" in function names
    • Major refactor of Variant and Sample Counts filter to use tagging filters
    • Add filter categories and summaries, to simplify user output
    • Refactor file formats and streams to use tagging filters
    • Refactor statistics computations to use tagging filters
    • Add proper window averaging support for statistics using tagging filters
    • Refactor Queue Window Stream to use tagging filters
    • Outsource Genome Stream from Chromosome Stream
    • Add Position Window Stream
    • Add Variant Gapless Input Stream
    • Add Variant Input Stream that merges sample groups
    • Add Diversity Processor helper class
    • Add re-scaling and re-sampling functions for Sample Counts
  • utils
    • Add Kendall's Tau correlation functions
    • Add multinomial and multivariate hypergeometric distribution functions
    • Add betas and intercept coefficients estimation functions for GLM
    • Refactor Thread Pool using Proactive Future for nested tasks
    • Add thread-safe random engines
    • Refine guess thread number functions
    • Add auto waiting to parallel for loop functions
    • Use global thread pool in gzip block compression
    • Disable the local build of htslib if HTSLIB_DIR is provided
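
As a reference point for the Kendall's Tau entry above, the following is a minimal sketch of the tau-b variant using direct O(n^2) pair counting. The function name `kendalls_tau_b` is hypothetical and this is not the genesis API; the library may well use a faster algorithm and a different interface.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Kendall's tau-b by counting concordant and discordant pairs directly.
// Hypothetical helper for illustration; quadratic in the number of values.
double kendalls_tau_b(std::vector<double> const& x, std::vector<double> const& y)
{
    if (x.size() != y.size()) {
        throw std::invalid_argument("Input vectors must have the same size.");
    }
    long long concordant = 0;
    long long discordant = 0;
    long long tied_only_x = 0;
    long long tied_only_y = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = i + 1; j < x.size(); ++j) {
            double const dx = x[i] - x[j];
            double const dy = y[i] - y[j];
            if (dx == 0.0 && dy == 0.0) {
                continue; // tied in both variables: counts toward neither term
            } else if (dx == 0.0) {
                ++tied_only_x;
            } else if (dy == 0.0) {
                ++tied_only_y;
            } else if ((dx > 0.0) == (dy > 0.0)) {
                ++concordant;
            } else {
                ++discordant;
            }
        }
    }
    // Pairs not tied in x, and pairs not tied in y, respectively.
    double const not_tied_x = static_cast<double>(concordant + discordant + tied_only_y);
    double const not_tied_y = static_cast<double>(concordant + discordant + tied_only_x);
    double const denom = std::sqrt(not_tied_x * not_tied_y);
    if (denom == 0.0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return static_cast<double>(concordant - discordant) / denom;
}
```

Without ties, this reduces to the plain (concordant - discordant) / (n choose 2) definition.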

Bug fixes

  • Fix end of iteration bug in Lambda Iterator
  • Fix cmake clang htslib incompatibility
  • Fix virtual override destructors
  • Fix Matrix output stream operator for char types
  • Fix htslib lib64 issue lczech/grenedalf#12
  • Add regression interaction test for lczech/gappa#29

v0.30.0

9 months ago

Notable Changes

  • population
    • Add improved Tajima D empirical pool size estimators
    • Add cathedral plot functions with efficient algorithm
    • Add support for sample name header row in Sync Reader
    • Improve Fst Pool Calculator classes
    • Allow multiallelic SNPs in pool VCF and Karlsson Fst
    • Refine automatic sample naming for formats without names
    • Refine sample filter and numerical filter functions
    • Improve input order and chromosome length check functionality
  • utils
    • Add Matrix inplace transpose function
    • Add advanced compensated summation algorithms
    • Add begin and end callbacks to Lambda Iterator
    • Refine text join functions
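
For readers unfamiliar with compensated summation, here is a minimal sketch of one such algorithm, the Neumaier variant of Kahan summation. The function name `neumaier_sum` is hypothetical; it is not taken from the genesis API, and the release notes do not state which algorithms the library implements.

```cpp
#include <cmath>
#include <vector>

// Neumaier summation: keeps a running compensation term that collects the
// low-order bits lost when adding values of very different magnitude.
// Hypothetical helper for illustration.
double neumaier_sum(std::vector<double> const& values)
{
    double sum = 0.0;
    double compensation = 0.0;
    for (double const v : values) {
        double const t = sum + v;
        if (std::abs(sum) >= std::abs(v)) {
            // Low-order digits of v were lost in the addition.
            compensation += (sum - t) + v;
        } else {
            // Low-order digits of sum were lost in the addition.
            compensation += (v - t) + sum;
        }
        sum = t;
    }
    return sum + compensation;
}
```

For the input `{1.0, 1e100, 1.0, -1e100}`, plain left-to-right summation in double precision loses both ones, whereas the compensated version recovers them. Note that aggressive floating point optimizations such as `-ffast-math` can remove the compensation.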

Bug fixes

  • Fix Matrix output operator for char types
  • Fix missing return statements in Lambda Iterator and Base Window Iterator
  • Fix htslib check in Variant Input Iterator test case
  • Fix Dataframe and Matrix string to double conversions
  • Fix generic convert function
  • Fix thread collision for cache in Reference Genome class
  • Fix backslash escape bug

v0.29.0

1 year ago

Notable Changes

  • population
    • Add Reference Genome based ref and alt handling
    • Add Chromosome/Genome Iterator classes
    • Add Window, WindowView, and Iterator abstractions and helpers
    • Add diversity pool calculator, refactor diversity functions
    • Refine diversity measures, for speed and robustness for large coverages
    • Refactor FST pool functions into classes, for streaming
    • Refactor Variant filters and transformations
    • Add Kapun-style missing data entries to Sync Reader
  • sequence
    • Add Reference Genome class and functions
    • Add Sequence Dict class and functions
  • utils
    • Refine binomial functions for larger values, increase speed
    • Add visitor functions to Lambda Iterator
    • Add ranged pop count function to Bitvector
    • Move exceptions back to utils namespace
  • build
    • Export the cmake include targets so that genesis can be used as a subproject

Bug fixes

  • Fix MRU Cache copy constructor
  • Fix thread pool nested deadlocks and seg faults
  • Fix date time sprintf function and gcc macro test
  • Fix string split default argument overload

v0.28.1

1 year ago

Notable Changes

  • population
    • Add Sliding Entries Window Iterator
    • Add user-provided column names to Frequency Table Reader
  • utils
    • Compute proper bounding boxes for SVG Path objects
    • Compute proper SVG bounding boxes with transformations
    • Add pie chart SVG helper function
    • Add cache stats to MRU Cache
  • build
    • Update htslib version to fix autoconf issues
    • Add LTO/IPO support with CMake build
    • Add GitHub Actions CI
    • Deactivate OpenMP on MacOS by default, too much trouble
    • Change CMakeLists to use an object library to speed up compilation

Bug fixes

  • Fix various minor compiler warnings found due to CI
  • Fix clang issue with std::tm initialization
  • Proper linking against OpenMP for tests and apps
  • Remove deprecated std dependency
  • Fix Base Window Iterator categories

v0.28.0

1 year ago

Notable Changes

  • Add generic Frequency Table Input Iterator
  • Add generic Genome Region Reader
  • Add Genome Locus Set for fast position queries
  • Add whole chromosome coverage functionality to Genome Region List
  • Add Genome Region Window Iterator
  • Add Map/Bim Reader
  • Make Fst functions more lenient for small pool sizes
  • Add global thread pool for eliminating core oversubscription

Bug fixes

  • Fix memory leak in Base Window Iterator
  • Fix sliding window iterator for empty input

v0.27.0

2 years ago

Notable Changes

  • Add SAM/BAM/CRAM Input Iterator, with RG read group splitting and filtering, and SAM flags filters
  • Refactor Variant Input Iterators for ease of use
  • Add Variant Input Iterator for Parallel Input
  • Refactor Genome Region List to use Interval Tree, and add surrounding functionality
  • Rename and refactor Kofler and Karlsson F_ST pool functions for clarity
  • Add our unbiased F_ST estimators for pool sequencing data
  • Refactor and refine diversity measure settings
  • Refactor Window Iterator
    • Non-virtual iterator interface
    • Base class abstraction for SlidingWindowIterator
    • Deprecate SlidingWindowGenerator, use SlidingWindowIterator instead
  • Deprecate Vcf Window Generator function
  • Add BED Reader
  • Add Genome Region List reader for GFF
  • Speed improvements and async block buffering for Lambda Iterator
  • Refine CMake setup for htslib, improve autotools compatibility

v0.26.1

2 years ago

Notable Changes

  • This is mostly a version bump because the version.hpp file did not get updated properly with genesis v0.26.0 due to the new year, but also:
  • Add pendant length filters for placements

v0.26.0

2 years ago

Notable Changes

Population:

  • Add Genome Locus class and comparison operators
  • Add Variant Parallel Input Iterator
  • Add Variant Input Iterator
  • Refine Pileup Reader to allow parsing Variants directly
  • Change Window Iterator start position to 1
  • Switch to always using local htslib
  • Disable htslib libcurl requirement

Tree:

  • Fix EMD computation with zero branch lengths
  • Improve tree diameter function for speed and memory efficiency
  • Add Simple Newick Reader and Writer

Utils:

  • Add Interval Tree implementation
  • Add bare Optional class
  • Refactor Lambda Iterator
  • Refine Options guess number of threads
  • Relax binomial coefficient behaviour to allow for larger numbers
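
One common way to make binomial computations work for large numbers is to stay in log space. The sketch below shows that general technique using `std::lgamma`; it is an assumption for illustration, not a description of how genesis implements this change, and the function names are hypothetical.

```cpp
#include <cmath>
#include <stdexcept>

// Natural logarithm of (n choose k), computed via the log-gamma function so
// that large n do not overflow. Hypothetical helper for illustration.
double log_binomial_coefficient(unsigned long n, unsigned long k)
{
    if (k > n) {
        throw std::invalid_argument("k must not exceed n.");
    }
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma((n - k) + 1.0);
}

// Binomial probability mass function P(X = k) for n trials with success
// probability p, evaluated in log space and exponentiated at the end.
double binomial_pmf(unsigned long k, unsigned long n, double p)
{
    if (p < 0.0 || p > 1.0) {
        throw std::invalid_argument("p must be in [0, 1].");
    }
    if (k > n) {
        return 0.0;
    }
    // Handle the boundaries explicitly to avoid log(0).
    if (p == 0.0) {
        return k == 0 ? 1.0 : 0.0;
    }
    if (p == 1.0) {
        return k == n ? 1.0 : 0.0;
    }
    double const log_prob = log_binomial_coefficient(n, k)
        + static_cast<double>(k) * std::log(p)
        + static_cast<double>(n - k) * std::log1p(-p);
    return std::exp(log_prob);
}
```

The result is a floating point approximation, so exact integer coefficients for small inputs are better served by an integer-based routine.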

v0.25.0

3 years ago

This is a long overdue release that adds a lot of new features, and in particular introduces support for population genetics data, methods, and file formats, and adds an (optional) dependency on htslib.

Important Changes

  • Add (optional) support for VCF files by wrapping htslib (which now is an optional dependency)
  • Add reading support for (m)pileup, GFF/GTF, and PoPoolation2 sync files
  • Add tools to work with allelic variants (SNPs), genome regions, and sliding windows
  • Add pool-sequencing variants of population genetic statistics, such as heterozygosity, Theta Watterson, Theta Pi, Tajima's D, and variants of F_ST, by re-implementing methods of PoPoolation and PoPoolation2

Notable Changes

  • Add filtering and transforming iterator classes
  • Add harmonic mean functions
  • Add binomial distribution function and binomial coefficient (n choose k) functions
  • Add base64 encoding and decoding functions
  • Add natural sorting function
  • Add simple pure function cache class
  • Add svg image embedding and rendering options
  • Add simple thread pool class
  • Add GzipBlockOStream class
  • Refactor gzip input stream to work on concatenated gzip streams
  • Adapt BmpWriter and SequencePrinter to new OutputTarget classes
  • Fix placement sample PqueryName filtering functions
  • Fix tree bipartition find subtree function
  • Fix bug in tuple hash function
  • Fix undefined behaviour in GzipStream destructor
  • Bug fixes and speed improvements