Releases: tskit-dev/tskit
Python 0.6.4
Breaking changes
- TreeSequence.write_vcf now filters non-sample nodes from individuals
by default, instead of raising an error. These nodes can be included using the
new include_non_sample_nodes argument.
- By default, individual names (sample IDs) in VCF output are now of the form
tsk_{individual.id}. Previously these were always
"tsk_{j}" for j in range(num_individuals). This may break some downstream
code if individuals are specified. To fix, manually set individual_names
to the required pattern.
(@benjeffery, #3163)
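The naming change can be sketched in plain Python. This is a minimal illustration that does not call tskit: the list of individual IDs is made up, and the commented-out call assumes individual_names is passed together with the individuals argument of write_vcf, as described above.

```python
# Hypothetical subset of individual IDs selected for VCF output.
individuals = [0, 2, 5]

# New default: names follow the individual IDs.
new_default_names = [f"tsk_{ind_id}" for ind_id in individuals]

# Previous behaviour: names were numbered 0..n-1 regardless of ID.
legacy_names = [f"tsk_{j}" for j in range(len(individuals))]

print(new_default_names)
print(legacy_names)

# To restore the old names, the list would be passed explicitly, e.g.
# ts.write_vcf(output, individuals=individuals, individual_names=legacy_names)
# (ts and output are placeholders here, so the call is left commented out).
```

The two lists only differ when a subset of individuals is specified, which is why code that writes all individuals is unlikely to notice the change.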
Features
- Add TreeSequence.sample_nodes_by_ploidy method to return the sample nodes
in a tree sequence, grouped by a ploidy value.
(@benjeffery, #3157)
- Add TreeSequence.individuals_nodes attribute to return the nodes
associated with each individual as a numpy array.
(@benjeffery, #3153)
- Add shift method to both TableCollection and TreeSequence classes,
allowing the coordinate system to be shifted, and TreeSequence.concatenate
so a set of tree sequences can be added to the right of an existing one.
(@hyanwong, #3165, #3164)
- Add TreeSequence.map_to_vcf_model method to return a mapping of
the tree sequence to the VCF model.
(@benjeffery, #3163)
- Use a thin space as the thousands separator in HTML output,
and a comma in CLI output.
(@hossam26644, #3167, #2951)
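The two separator styles in the last item can be illustrated as follows. This is a standalone sketch, not tskit's own formatting code; format_count and THIN_SPACE are hypothetical names introduced for the example.

```python
THIN_SPACE = "\u2009"  # Unicode THIN SPACE character


def format_count(value, separator=","):
    """Group the digits of an integer in threes using the given separator."""
    return f"{value:,}".replace(",", separator)


print(format_count(1234567))              # comma, as in CLI output
print(format_count(1234567, THIN_SPACE))  # thin space, as in HTML output
```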
Fixes
- Correct assertion message when tables are compared with metadata ignored.
(@benjeffery, #3162, #3161)
Python 0.6.3
Bugfixes
- TreeSequence.draw_svg(path=...) was failing due to a missing
import xml.dom.minidom
(@petrelharp, #3144, #3145)
Python 0.6.2
Bugfixes
- Metadata.schema was returning a modified schema; this is fixed to return a copy of
the original schema instead (@benjeffery, #3129, #3130)
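The general pattern behind this kind of fix is a defensive copy. The sketch below is an assumption about the idea only and is not tskit's implementation; SchemaHolder is a hypothetical class.

```python
import copy


class SchemaHolder:
    """Hypothetical stand-in for an object that stores a schema dict."""

    def __init__(self, schema):
        # Keep a private copy so later edits by the caller do not leak in.
        self._schema = copy.deepcopy(schema)

    @property
    def schema(self):
        # Hand out a fresh copy so callers cannot modify the stored schema.
        return copy.deepcopy(self._schema)


holder = SchemaHolder({"codec": "json", "properties": {}})
view = holder.schema
view["properties"]["extra"] = {"type": "number"}  # edit the returned copy
print(holder.schema)  # the stored schema is unaffected
```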
Python 0.6.1
Bugfixes
- Fix to TreeSequence.pair_coalescence_counts output dimension when
provided with time windows containing no nodes (@nspope, #3046, #3058)
- Fix to TreeSequence.pair_coalescence_counts to normalise by non-missing
span if span_normalise=True. This resolves a bug where
TreeSequence.pair_coalescence_rates would return incorrect values for
intervals with missing trees. (@natep, #3053, #3059)
- Fix to TreeSequence.pair_coalescence_rates causing an
assertion to be triggered by floating point error when all coalescence
events are inside a single time window (@natep, #3035, #3038)
Features
- Add support for fixed-length arrays in metadata struct codec using the
length property.
(@benjeffery, #3088, #3090)
- Add a new TreeSequence.pca method that uses randomized linear algebra
to find the top eigenvectors/values of the genetic relatedness matrix
(@hanbin973, @petrelharp, #3008)
- Add methods on TreeSequence to efficiently get table metadata as a
numpy structured array. (@benjeffery, #3098)
- Add Python 3.13 support (@benjeffery, #3107)
- Add a preamble argument to draw_svg() methods to allow adding arbitrary extra
graphics (e.g. legends) to SVG plots (@hyanwong, #3086, #3121)
C API C_1.1.4
Changes
- Added the TSK_TRACE_ERRORS macro to enable tracing of errors in the C library.
This is useful for debugging as errors will print to stderr when set.
(@jeromekelleher, #3095).
Python 0.6.0
Breaking Changes
- The definitions of TreeSequence.genetic_relatedness and
TreeSequence.genetic_relatedness_weighted have changed
to average over sample sets, rather than summing over them.
For computation with diploid sample sets, this will change the result
by a factor of four; for larger sample sets it will now produce
sensible values that are comparable between sample sets of different sizes.
The default for these methods is also changed to polarised=True,
but the output is unchanged for centre=True (the default).
See the documentation for these methods for more discussion.
(@petrelharp, @mmosmond, #1623)
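The factor of four for diploid sample sets follows from the number of sample pairs (2 x 2). The snippet below is a simplified arithmetic illustration with made-up pairwise values; it shows only the change in normalisation and is not the statistic tskit computes.

```python
# Made-up pairwise values between the two samples of diploid set A (rows)
# and the two samples of diploid set B (columns).
pairwise = [
    [1.0, 2.0],
    [3.0, 4.0],
]

values = [v for row in pairwise for v in row]
summed = sum(values)             # old-style: sum over all sample pairs
averaged = summed / len(values)  # new-style: average over all sample pairs

print(summed, averaged, summed / averaged)
```

With larger sample sets the number of pairs grows, so the summed value scales with set size while the averaged value stays comparable.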
Bugfixes
- Fix to TreeSequence.genetic_relatedness with indexes=None and
proportion=True. (@petrelharp, #2984, #1623)
- Fix to TreeSequence.general_stat when using non-strict summary functions
in the presence of non-ancestral material (very rare).
(@petrelharp, #2983, #1623)
- Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather
than None, to avoid confusion (@hyanwong, #2720)
- Limit output HTML when a tree sequence is displayed that has a large amount of metadata.
(@benjeffery, #2999)
- Fix warning in draw_svg to use correct warnings module.
(@duncanMR, #2870, #2871)
Features
- Add the centre option to TreeSequence.genetic_relatedness and
TreeSequence.genetic_relatedness_weighted.
(@petrelharp, @mmosmond, #1623)
- Edges now have an .interval attribute returning a tskit.Interval object.
(@hyanwong, #2531)
- Variants now have a states() method that returns the genotypes as an
(inefficient) array of strings, rather than integer indexes, to
aid comparison of genetic variation (@hyanwong, #2617)
- Added distance_between that calculates the total distance between two nodes in a tree.
(@Billyzhang1229, #2771)
- Added genetic_relatedness_matrix method to compute
pairwise genetic relatedness between sample sets.
(@jeromekelleher, @petrelharp, #2823)
- Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes
using recombination information, leading to unary nodes in many trees and
fewer edges. (@petrelharp, @hfr1tz3, @nspope, @avabamf, #2651, #2938)
- Add Table.drop_metadata to make clearing metadata from tables easy.
(@jeromekelleher, #2944)
- Add Interval.mid and Tree.mid properties to return the midpoint of the interval.
(@currocam, #2960)
- Added genetic_relatedness_vector method to compute product of genetic relatedness
matrix and weight vector.
(@petrelharp, #2980)
- Added pair_coalescence_counts method to calculate coalescence events per node or time
interval, pair_coalescence_quantiles method to estimate quantiles of pair
coalescence times using empirical CDF inversion, and pair_coalescence_rates method to
estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF.
(@nspope, #2915, #2976, #2985)
- Add provenance information to the HTML notebook representation of a tree sequence.
(@benjeffery, #3001)
- The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the
x-axis. (@hyanwong, #3002)
- Added a node_titles and a mutation_titles parameter to .draw_svg() methods
which assigns a string to node and mutation symbols, commonly shown on mouseover. This
can reduce label clutter while retaining useful info (@hyanwong, #3007)
- Added (currently undocumented) use of the order parameter in Tree.draw_svg() to
pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option
pack_untracked_polytomies allows large polytomies involving untracked samples to
be summarised as a dotted line (@hyanwong, #3011, #3010, #3012)
- Added a title parameter to .draw_svg() methods (@hyanwong, #3015)
- Add comma separation to all display numbers. (@benjeffery, #3017, #3018)
- Add resources section to provenance schema. (@benjeffery, #3016)
- Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance
between two trees. (@Billyzhang1229, #995, #2643, #3032)
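For readers unfamiliar with the unweighted Robinson-Foulds distance in the last item, the sketch below computes it for rooted trees as the number of clades found in one tree but not the other. It is a standalone illustration on nested tuples, not tskit's implementation, and the function names are hypothetical; Tree.rf_distance may impose extra requirements on its inputs.

```python
def clades(tree):
    """Return every clade of a nested-tuple tree as a frozenset of leaf labels."""
    found = set()

    def leaves_below(node):
        if isinstance(node, tuple):
            # Internal node: its clade is the union of its children's clades.
            leaves = frozenset().union(*(leaves_below(child) for child in node))
        else:
            # Leaf: a clade containing just itself.
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    leaves_below(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted rooted RF distance: size of the symmetric difference of clade sets."""
    return len(clades(tree_a) ^ clades(tree_b))


balanced = ((0, 1), (2, 3))
regrouped = ((0, 2), (1, 3))
print(rf_distance(balanced, balanced))
print(rf_distance(balanced, regrouped))
```

Here the two trees share the leaf clades and the root clade, and differ in both of their two-leaf clades, so four clades are unmatched.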
C API C_1.1.3
Features
- Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence
by extending edges into adjacent trees and thus creating unary nodes in those
trees (@petrelharp, @hfr1tze, @avabamf, #2651, #2938).
Python 0.5.8
- Add support for numpy 2 (@jeromekelleher, @benjeffery, #2964)
Python 0.5.7
Breaking Changes
- The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with
position zero is encountered. The VCF spec does not allow zero position sites.
Suppress this error with the allow_position_zero argument.
(@benjeffery, #2901, #2838)
Bugfixes
- Fix to the folded, expected allele frequency spectrum (i.e.,
TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)),
which was half as big as it should have been. (@petrelharp,
@nspope, #2933)
Python 0.5.6
Breaking Changes
- tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27
Features
- Tree.tmrca now accepts >2 nodes and returns nicer errors
(@hyanwong, #2808, #2801, #2070, #2611)
- Add TreeSequence.genetic_relatedness_weighted stats method.
(@petrelharp, @brieuclehmann, @jeromekelleher, #2785, #1246)
- Add TreeSequence.impute_unknown_mutations_time method to return an
array of mutation times based on the times of associated nodes
(@duncanMR, #2760, #2758)
- Add asdict to all dataclasses. These are returned when you access a row or
other tree sequence object. (@benjeffery, #2759, #2719)
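The asdict item above can be pictured with the standard library. This is a sketch under assumptions: ExampleRow is a hypothetical dataclass, not one of tskit's row classes, and the method simply wraps dataclasses.asdict.

```python
import dataclasses


@dataclasses.dataclass
class ExampleRow:
    """Hypothetical row type used only for illustration."""

    flags: int
    time: float
    metadata: bytes

    def asdict(self):
        # Convert the dataclass fields into a plain dictionary.
        return dataclasses.asdict(self)


row = ExampleRow(flags=1, time=0.0, metadata=b"")
print(row.asdict())
```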
Bugfixes
- Fix incompatibility with jsonschema>4.18.6 which caused
AttributeError: module jsonschema has no attribute _validators
(@benjeffery, #2844, #2840)