Releases: tskit-dev/tskit

Python 0.6.4

21 May 18:14

Breaking changes

  • TreeSequence.write_vcf now filters non-sample nodes from individuals
    by default, instead of raising an error. These nodes can be included using the
    new include_non_sample_nodes argument.
    By default, individual names (sample IDs) in VCF output are now of the form
    tsk_{individual.id}. Previously these were always
    "tsk_{j}" for j in range(num_individuals). This may break some downstream
    code if individuals are specified. To fix, manually set individual_names
    to the required pattern.
    (@benjeffery, #3163)
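
The naming change above can be illustrated with a minimal sketch. The helper names and the individual IDs are hypothetical and not part of tskit; only the individual_names argument and the two name patterns come from the note above.

```python
def legacy_individual_names(num_individuals):
    """Names in the previous pattern: tsk_0, tsk_1, ..."""
    return [f"tsk_{j}" for j in range(num_individuals)]


def default_individual_names(individual_ids):
    """Names in the new default pattern: tsk_{individual.id}."""
    return [f"tsk_{i}" for i in individual_ids]


# Hypothetical IDs of the individuals that end up in the VCF output.
ids_in_vcf = [0, 2, 5]

new_names = default_individual_names(ids_in_vcf)
old_names = legacy_individual_names(len(ids_in_vcf))

# To restore the previous names, something along these lines should work,
# assuming `ts` is a TreeSequence and `output` is an open text file:
#     ts.write_vcf(output, individual_names=old_names)
```

The two patterns only differ when the individual IDs written to the VCF are not 0, 1, 2, and so on, which is the case downstream code would need to handle.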

Features

  • Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes
    in a tree sequence, grouped by a ploidy value.
    (@benjeffery, #3157)

  • Add TreeSequence.individuals_nodes attribute to return the nodes
    associated with each individual as a numpy array.
    (@benjeffery, #3153)

  • Add shift method to both TableCollection and TreeSequence classes
    allowing the coordinate system to be shifted, and TreeSequence.concatenate
    so a set of tree sequences can be added to the right of an existing one.
    (@hyanwong, #3165, #3164)

  • Add TreeSequence.map_to_vcf_model method to return a mapping of
    the tree sequence to the VCF model.
    (@benjeffery, #3163)

  • Use a thin space as the thousands separator in HTML output,
    and a comma in CLI output.
    (@hossam26644, #3167, #2951)
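
As a rough illustration of the two separator styles in the last item above, the sketch below formats an integer both ways. It assumes the thin space is U+2009; the exact character and the function names are assumptions, not taken from tskit.

```python
THIN_SPACE = "\u2009"  # assumed code point for the HTML separator


def format_for_cli(n):
    """Group thousands with a comma, e.g. for terminal output."""
    return f"{n:,}"


def format_for_html(n):
    """Group thousands with a thin space, e.g. for notebook output."""
    return f"{n:,}".replace(",", THIN_SPACE)


cli_text = format_for_cli(1234567)
html_text = format_for_html(1234567)
```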

Fixes

Python 0.6.3

28 Apr 16:12

Bugfixes

  • TreeSequence.draw_svg(path=...) was failing due to a missing
    import xml.dom.minidom (@petrelharp, #3144, #3145)

Python 0.6.2

01 Apr 16:55

Bugfixes

  • Metadata.schema was returning a modified schema; this is fixed to return a copy of
    the original schema instead (@benjeffery, #3129, #3130)

Python 0.6.1

31 Mar 15:59

Bugfixes

  • Fix to TreeSequence.pair_coalescence_counts output dimension when
    provided with time windows containing no nodes (@nspope,
    #3046, #3058)

  • Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing
    span if span_normalise=True. This resolves a bug where
    TreeSequence.pair_coalescence_rates would return incorrect values for
    intervals with missing trees. (@natep, #3053, #3059)

  • Fix to TreeSequence.pair_coalescence_rates causing an
    assertion to be triggered by floating point error, when all
    coalescence events are inside a single time window
    (@natep, #3035, #3038)

Features

  • Add support for fixed-length arrays in metadata struct codec using the length property.
    (@benjeffery, #3088, #3090)

  • Add a new TreeSequence.pca method that uses randomized linear algebra
    to find the top eigenvectors/values of the genetic relatedness matrix
    (@hanbin973, @petrelharp, #3008)

  • Add methods on TreeSequence to efficiently get table metadata as a
    numpy structured array. (@benjeffery, #3098)

  • Add Python 3.13 support (@benjeffery, #3107)

  • Add a preamble argument to draw_svg() methods to allow adding arbitrary extra
    graphics (e.g. legends) to SVG plots (@hyanwong, #3086, #3121)
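
For the fixed-length array support listed above, the following is a sketch of what such a schema might look like and why a fixed length can save space. The field name "coords" is made up, and the byte layouts are assumptions modelled with the standard struct module rather than tskit's own encoder.

```python
import struct

# Hypothetical struct-codec schema with a fixed-length array of three doubles.
# "length" is the property named in the release note; "coords" is made up.
schema = {
    "codec": "struct",
    "type": "object",
    "properties": {
        "coords": {
            "type": "array",
            "length": 3,
            "items": {"type": "number", "binaryFormat": "d"},
        },
    },
}

values = (1.0, 2.5, -4.0)

# Variable-length layout (assumed): a 4-byte item count, then the items.
variable = struct.pack("<L3d", len(values), *values)

# Fixed-length layout (assumed): the count is known from the schema,
# so only the items themselves are stored.
n = schema["properties"]["coords"]["length"]
fixed = struct.pack(f"<{n}d", *values)

# Decoding the fixed layout needs the length from the schema.
decoded = struct.unpack(f"<{n}d", fixed)
```

Under these assumptions the fixed layout is four bytes smaller per row, which adds up over tables with many rows.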

C API C_1.1.4

31 Mar 15:59

Changes

  • Added the TSK_TRACE_ERRORS macro to enable tracing of errors in the C library.
    This is useful for debugging as errors will print to stderr when set.
    (@jeromekelleher, #3095).

Python 0.6.0

16 Oct 15:31
8342e74

Breaking Changes

  • The definitions of TreeSequence.genetic_relatedness and
    TreeSequence.genetic_relatedness_weighted have changed
    to average over sample sets, rather than summing over them.
    For computation with diploid sample sets, this will change the result
    by a factor of four; for larger sample sets it will now produce
    sensible values that are comparable between sample sets of different sizes.
    The default for these methods is also changed to polarised=True,
    but the output is unchanged for centre=True (the default).
    See the documentation for these methods for more discussion.
    (@petrelharp, @mmosmond, #1623)
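
The factor of four for diploid sample sets follows from simple counting: two sets of two samples give four sample pairs. The sketch below is a simplified illustration with made-up pairwise values, not the statistic tskit actually computes.

```python
# Hypothetical pairwise values between the two samples of set A (rows)
# and the two samples of set B (columns).
pairwise = [
    [1.0, 2.0],
    [3.0, 2.0],
]


def summed(matrix):
    """Old-style combination: sum over all sample pairs."""
    return sum(sum(row) for row in matrix)


def averaged(matrix):
    """New-style combination: mean over all sample pairs."""
    num_pairs = sum(len(row) for row in matrix)
    return summed(matrix) / num_pairs


old_value = summed(pairwise)
new_value = averaged(pairwise)
ratio = old_value / new_value  # equals the number of pairs, 2 * 2
```

With larger sample sets the sum grows with the number of pairs, while the average stays on a comparable scale.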

Bugfixes

  • Fix to TreeSequence.genetic_relatedness with indexes=None and
    proportion=True. (@petrelharp, #2984, #1623)

  • Fix to TreeSequence.general_stat when using non-strict summary functions
    in the presence of non-ancestral material (very rare).
    (@petrelharp, #2983, #1623)

  • Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather
    than None, to avoid confusion (@hyanwong, #2720)

  • Limit output HTML when a tree sequence is displayed that has a large amount of metadata.
    (@benjeffery, #2999)

  • Fix warning in draw_svg to use correct warnings module.
    (@duncanMR, #2870, #2871)

Features

  • Add the centre option to TreeSequence.genetic_relatedness and
    TreeSequence.genetic_relatedness_weighted.
    (@petrelharp, @mmosmond, #1623)

  • Edges now have an .interval attribute returning a tskit.Interval object.
    (@hyanwong, #2531)

  • Variants now have a states() method that returns the genotypes as an
    (inefficient) array of strings, rather than integer indexes, to
    aid comparison of genetic variation (@hyanwong, #2617)

  • Added distance_between that calculates the total distance between two nodes in a tree.
    (@Billyzhang1229, #2771)

  • Added genetic_relatedness_matrix method to compute
    pairwise genetic relatedness between sample sets.
    (@jeromekelleher, @petrelharp, #2823)

  • Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes
    using recombination information, leading to unary nodes in many trees and
    fewer edges. (@petrelharp, @hfr1tz3, @nspope,
    @avabamf, #2651, #2938)

  • Add Table.drop_metadata to make clearing metadata from tables easy.
    (@jeromekelleher, #2944)

  • Add Interval.mid and Tree.mid properties to return the midpoint of the interval.
    (@currocam, #2960)

  • Added genetic_relatedness_vector method to compute product of genetic relatedness
    matrix and weight vector.
    (@petrelharp, #2980)

  • Added pair_coalescence_counts method to calculate coalescence events per node or time
    interval, pair_coalescence_quantiles method to estimate quantiles of pair
    coalescence times using empirical CDF inversion, and pair_coalescence_rates method to
    estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF.
    (@nspope, #2915, #2976, #2985)

  • Add provenance information to the HTML notebook representation of a tree sequence.
    (@benjeffery, #3001)

  • The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the
    x-axis. (@hyanwong, #3002)

  • Added a node_titles and a mutation_titles parameter to .draw_svg() methods
    which assigns a string to node and mutation symbols, commonly shown on mouseover. This
    can reduce label clutter while retaining useful info (@hyanwong, #3007)

  • Added (currently undocumented) use of the order parameter in Tree.draw_svg() to
    pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option
    pack_untracked_polytomies allows large polytomies involving untracked samples to
    be summarised as a dotted line (@hyanwong, #3011, #3010, #3012)

  • Added a title parameter to .draw_svg() methods (@hyanwong, #3015)

  • Add comma separation to all display numbers. (@benjeffery, #3017, #3018)

  • Added Tree.ancestors(u) method. (@hyanwong, #2706, #3021)

  • Add resources section to provenance schema. (@benjeffery, #3016)

  • Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance
    between two trees. (@Billyzhang1229, #995, #2643, #3032)
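
To make the Robinson-Foulds idea concrete, here is a minimal sketch for rooted trees written as nested tuples. It counts the clades (sets of leaves below an internal node) found in one tree but not the other. The tuple representation and function names are assumptions for illustration and this is not tskit's implementation.

```python
def clades(tree):
    """Return the set of leaf sets below each internal node of a nested tuple."""
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf contributes only itself
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    leaves_below(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted distance: number of clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


balanced = ((0, 1), (2, 3))
regrouped = ((0, 2), (1, 3))
comb = (((0, 1), 2), 3)

d_same = rf_distance(balanced, balanced)
d_regrouped = rf_distance(balanced, regrouped)
d_comb = rf_distance(balanced, comb)
```

Identical trees share every clade and score zero; each clade that appears in only one of the two trees adds one to the distance.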

C API C_1.1.3

16 Oct 15:12
8342e74

Features

  • Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence
    by extending edges into adjacent trees and thus creating unary nodes in those
    trees (@petrelharp, @hfr1tz3, @avabamf, #2651, #2938).

Python 0.5.8

27 Jun 13:53

Python 0.5.7

17 Jun 17:26

Breaking Changes

  • The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with
    position zero is encountered. The VCF spec does not allow zero position sites.
    Suppress this error with the allow_position_zero argument.
    (@benjeffery, #2901, #2838)

Bugfixes

  • Fix to the folded, expected allele frequency spectrum (i.e.,
    TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)),
    which was half as big as it should have been. (@petrelharp,
    @nspope, #2933)

Python 0.5.6

10 Oct 10:55

Breaking Changes

  • tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27

Features

Bugfixes

  • Fix incompatibility with jsonschema>4.18.6 which caused
    AttributeError: module jsonschema has no attribute _validators
    (@benjeffery, #2844, #2840)