Releases: tskit-dev/tskit
Python 0.4.1
Bugfix release
Changes
- TableCollection.name_map has been deprecated in favour of table_name_map.
(@benjeffery, #1981, #2086)
Fixes
- TreeSequence.dump_text now prints decoded metadata if there is a schema.
(@benjeffery, #1860, #1527)
- Add missing ReferenceSequence.__eq__ method.
(@benjeffery, #2063, #2085)
Python 0.4.0
Major Python release
Breaking changes
- The Tree.num_nodes method is now deprecated with a warning, because it confusingly
returns the number of nodes in the entire tree sequence, rather than in the tree. Text
summaries of trees (e.g. str(tree)) now return the number of nodes in the tree,
not in the entire tree sequence (@hyanwong, #1966, #1968)
- The CLI info command now gives more detailed information on the tree sequence
(@benjeffery, #1611)
- 64 bits are now used to store the sizes of ragged table columns such as metadata,
allowing them to hold more data. This change is fully backwards and forwards compatible
for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with
large offset arrays that require 64 bits will fail to load in previous versions with
error _tskit.FileFormatError: An incompatible type for a column was found in the file.
(@jeromekelleher, #343, #1527, #1528, #1530,
#1554, #1573, #1589, #1598, #1628, #1571,
#1579, #1585, #1590, #1602, #1618, #1620, #1652).
- The Tree class now conceptually has an extra node, the "virtual root", whose
children are the roots of the tree. The quintuply linked tree arrays
(parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array)
all have one extra element.
(@jeromekelleher, #1691, #1704).
- Tree traversal orders returned by the nodes method have changed when there
are multiple roots. Previously orders were defined locally for each root, but
are now defined globally across all roots. (@jeromekelleher, #1704).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
TableCollection.sort no longer sorts individuals.
(@benjeffery, #1774, #1789)
- Metadata encoding errors now raise MetadataEncodingError
(@benjeffery, #1505, #1827).
- For TreeSequence.samples all arguments after population are now keyword only
(@benjeffery, #1715, #1831).
- Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus.
As the old method was not generating standards-compliant output, it seems unlikely
that it was used by anyone. Calls to to_nexus will result in a
NotImplementedError, informing users of the change. See below for details on
as_nexus.
- Change default value for missing_data_char in the TreeSequence.haplotypes
method from "-" to "N". This is a more idiomatic usage to indicate
missing data rather than a gap in an alignment. (@jeromekelleher,
#1893, #1894)
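
The virtual root item in the list above can be pictured with a small sketch. It uses plain Python lists rather than tskit itself, so it is only an illustration of the layout, not the library's implementation. The helper names build_arrays and preorder are hypothetical, and treating every parentless node as a root is a simplifying assumption of this sketch.

```python
# Illustrative sketch only: plain lists stand in for the tree arrays.
NULL = -1  # sentinel for "no node"


def build_arrays(num_nodes, parent_of):
    """Build the five linked arrays, each with one extra slot for a virtual root."""
    virtual_root = num_nodes  # the extra element sits at the end of every array
    size = num_nodes + 1
    parent = [NULL] * size
    left_child = [NULL] * size
    right_child = [NULL] * size
    left_sib = [NULL] * size
    right_sib = [NULL] * size

    def append_child(p, c):
        # Link c as the new rightmost child of p.
        last = right_child[p]
        if last == NULL:
            left_child[p] = c
        else:
            right_sib[last] = c
            left_sib[c] = last
        right_child[p] = c

    for child in range(num_nodes):
        p = parent_of.get(child, NULL)
        parent[child] = p
        # Roots become children of the virtual root, but their own
        # parent entry stays NULL (assumption: parentless means root).
        append_child(virtual_root if p == NULL else p, child)
    return parent, left_child, right_child, left_sib, right_sib, virtual_root


def preorder(left_child, right_sib, virtual_root):
    """Preorder over all roots in one pass, starting from the virtual root."""
    order = []
    stack = [virtual_root]
    while stack:
        u = stack.pop()
        if u != virtual_root:
            order.append(u)
        children = []
        c = left_child[u]
        while c != NULL:
            children.append(c)
            c = right_sib[c]
        # Push right-to-left so the leftmost child is visited first.
        stack.extend(reversed(children))
    return order


# Two roots (3 and 4): 3 has children 0 and 1, 4 has child 2.
arrays = build_arrays(5, {0: 3, 1: 3, 2: 4})
parent, left_child, right_child, left_sib, right_sib, virtual_root = arrays
order = preorder(left_child, right_sib, virtual_root)
print(len(parent), order)
```

Starting the traversal at the virtual root removes the need for a separate loop over roots, which is one way to read the note that orders are now defined across all roots rather than per root.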
Features
- Add the ibd_segments method and associated classes to compute, summarise
and store segments of identity by descent from a tree sequence
(@gtsambos, @jeromekelleher).
- Allow skipping of site and mutation tables in TableCollection.sort
(@benjeffery, #1475, #1826).
- Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the
default sort (@benjeffery, #1774, #1789).
- Add __setitem__ to all tables allowing single rows to be updated. For example
tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)
(@jeromekelleher, @benjeffery, #1545, #1600).
- Added a new parameter time to TreeSequence.samples() allowing selection of
samples at a specific time point or time interval.
(@mufernando, @petrelharp, #1692, #1700)
- Add table.metadata_vector to all table classes to allow easy extraction of a single
metadata key into an array
(@petrelharp, #1676, #1690).
- Add time_units to TreeSequence to describe the units of the time dimension of the
tree sequence. This is then used to generate an error if time_units is uncalibrated when
using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
- Add the virtual_root property to the Tree class (@jeromekelleher, #1704).
- Add the num_edges property to the Tree class (@jeromekelleher, #1704).
- Improved performance for tree traversal methods in the nodes iterator.
Roughly a 10X performance increase for "preorder", "postorder", "timeasc"
and "timedesc" (@jeromekelleher, #1704).
- Substantial performance improvement for Tree.total_branch_length
(@jeromekelleher, #1794, #1799)
- Add the discrete_genome property to the TreeSequence class which is true if
all coordinates are discrete (@jeromekelleher, #1144, #1819)
- Add a random_nucleotides function. (@jeromekelleher, #1825)
- Add the TreeSequence.alignments method. (@jeromekelleher, #1825)
- Add alignment export in the FASTA and nexus formats using the
TreeSequence.write_nexus and TreeSequence.write_fasta methods.
(@jeromekelleher, @hyanwong, #1894)
- Add the discrete_time property to the TreeSequence class which is true if
all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
- Add the skip_tables option to load to support only loading
top-level information from a file. Also add the ignore_tables option to
TableCollection.equals and TableCollection.assert_equals to
compare only top-level information. (@clwgg, #1882, #1854).
- Add the skip_reference_sequence option to load. Also add the
ignore_reference_sequence option to equals to compare two table
collections without comparing their reference sequence. (@clwgg,
#2019, #1971).
- tskit now supports Python 3.10 (@benjeffery, #1895, #1949)
Fixes
- dump_tables omitted individual parents. (@benjeffery, #1828, #1884)
- Add the Tree.as_newick method and deprecate ...
Python 0.4.0 BETA 1
BETA RELEASE
- Install with pip install --pre tskit
- Please report any issues.
Breaking changes
- The Tree.num_nodes method is now deprecated with a warning, because it confusingly
returns the number of nodes in the entire tree sequence, rather than in the tree. Text
summaries of trees (e.g. str(tree)) now return the number of nodes in the tree,
not in the entire tree sequence (@hyanwong, #1966, #1968)
- The CLI info command now gives more detailed information on the tree sequence
(@benjeffery, #1611)
- 64 bits are now used to store the sizes of ragged table columns such as metadata,
allowing them to hold more data. This change is fully backwards and forwards compatible
for all tree-sequences whose ragged column sizes fit into 32 bits. New tree-sequences with
large offset arrays that require 64 bits will fail to load in previous versions with
error _tskit.FileFormatError: An incompatible type for a column was found in the file.
(@jeromekelleher, #343, #1527, #1528, #1530,
#1554, #1573, #1589, #1598, #1628, #1571,
#1579, #1585, #1590, #1602, #1618, #1620, #1652).
- The Tree class now conceptually has an extra node, the "virtual root", whose
children are the roots of the tree. The quintuply linked tree arrays
(parent_array, left_child_array, right_child_array, left_sib_array and right_sib_array)
all have one extra element.
(@jeromekelleher, #1691, #1704).
- Tree traversal orders returned by the nodes method have changed when there
are multiple roots. Previously orders were defined locally for each root, but
are now defined globally across all roots. (@jeromekelleher, #1704).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
TableCollection.sort no longer sorts individuals.
(@benjeffery, #1774, #1789)
- Metadata encoding errors now raise MetadataEncodingError
(@benjeffery, #1505, #1827).
- For TreeSequence.samples all arguments after population are now keyword only
(@benjeffery, #1715, #1831).
- Remove the method TreeSequence.to_nexus and replace with TreeSequence.as_nexus.
As the old method was not generating standards-compliant output, it seems unlikely
that it was used by anyone. Calls to to_nexus will result in a
NotImplementedError, informing users of the change. See below for details on
as_nexus.
- Change default value for missing_data_char in the TreeSequence.haplotypes
method from "-" to "N". This is a more idiomatic usage to indicate
missing data rather than a gap in an alignment. (@jeromekelleher,
#1893, #1894)
Features
- Allow skipping of site and mutation tables in TableCollection.sort
(@benjeffery, #1475, #1826).
- Add TableCollection.sort_individuals to sort the individuals as this is no longer done by the
default sort (@benjeffery, #1774, #1789).
- Add __setitem__ to all tables allowing single rows to be updated. For example
tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)
(@jeromekelleher, @benjeffery, #1545, #1600).
- Added a new parameter time to TreeSequence.samples() allowing selection of
samples at a specific time point or time interval.
(@mufernando, @petrelharp, #1692, #1700)
- Add table.metadata_vector to all table classes to allow easy extraction of a single
metadata key into an array
(@petrelharp, #1676, #1690).
- Add time_units to TreeSequence to describe the units of the time dimension of the
tree sequence. This is then used to generate an error if time_units is uncalibrated when
using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
- Add the virtual_root property to the Tree class (@jeromekelleher, #1704).
- Add the num_edges property to the Tree class (@jeromekelleher, #1704).
- Improved performance for tree traversal methods in the nodes iterator.
Roughly a 10X performance increase for "preorder", "postorder", "timeasc"
and "timedesc" (@jeromekelleher, #1704).
- Substantial performance improvement for Tree.total_branch_length
(@jeromekelleher, #1794, #1799)
- Add the discrete_genome property to the TreeSequence class which is true if
all coordinates are discrete (@jeromekelleher, #1144, #1819)
- Add a random_nucleotides function. (@jeromekelleher, #1825)
- Add the TreeSequence.alignments method. (@jeromekelleher, #1825)
- Add alignment export in the FASTA and nexus formats using the
TreeSequence.write_nexus and TreeSequence.write_fasta methods.
(@jeromekelleher, @hyanwong, #1894)
- Add the discrete_time property to the TreeSequence class which is true if
all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
- Add the skip_tables option to load to support only loading
top-level information from a file. Also add the ignore_tables option to
TableCollection.equals and TableCollection.assert_equals to
compare only top-level information. (@clwgg, #1882, #1854).
- Add the skip_reference_sequence option to load. Also add the
ignore_reference_sequence option to equals to compare two table
collections without comparing their reference sequence. (@clwgg,
#2019, #1971).
- tskit now supports Python 3.10 (@benjeffery, #1895, #1949)
Fixes
- dump_tables omitted individual parents. (@benjeffery, #1828, #1884)
- Add the Tree.as_newick method and deprecate Tree.newick. The
as_newick method by default labels samples with the pattern "n{node_id}"
which is much more useful than the behaviour of Tree.newick (which mimics
...
C API 0.99.15
Breaking changes
- The tables argument to tsk_treeseq_init is no longer const, to allow for future
no-copy tree sequence creation.
(@benjeffery, #1718, #1719)
- Additional consistency checks for mutation tables are now run by
tsk_table_collection_check_integrity even when TSK_CHECK_MUTATION_ORDERING
is not passed in. (@petrelharp, #1713, #1722)
- num_tracked_samples and num_samples in tsk_tree_t are now typed as tsk_size_t
(@benjeffery, #1723, #1727)
- The previously deprecated option TSK_SAMPLE_COUNTS has been removed.
(@benjeffery, #1744, #1761).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
tsk_table_collection_sort no longer sorts individuals.
(@benjeffery, #1774, #1789)
- The tsk_tree_t.left_root member has been removed. Client code can be updated
most easily by using the equivalent tsk_tree_get_left_root function. However,
it may be worth considering updating code to use either the standard traversal
functions (which automatically iterate over roots) or to use the virtual_root
member (which may lead to more concise code). (@jeromekelleher, #1796,
#1862)
- Rename tsk_tree_t.left and tsk_tree_t.right members to
tsk_tree_t.interval.left and tsk_tree_t.interval.right respectively.
(@jeromekelleher, #1686, #1913)
- kastore is now vendored into this repo instead of being a git submodule. Developers need to run
git submodule update. (@jeromekelleher, #1687, #1973)
- Tree arrays such as left_sib, right_child etc. now have an additional
"virtual root" node at the end. (@jeromekelleher, #1691, #1704)
- num_samples, num_tracked_samples, marked and mark have been removed from
tsk_tree_t. (@jeromekelleher, #1936)
Features
- Add tsk_table_collection_individual_topological_sort to sort the individuals as this is
no longer done by the default sort. (@benjeffery, #1774, #1789)
- The default behaviour for table size growth is now to double the current size of the table,
up to a threshold. To keep the previous behaviour, use (e.g.)
tsk_edge_table_set_max_rows_increment(tables->edges, 1024), which results in adding
space for 1024 additional rows each time we run out of space in the edge table.
(@benjeffery, #5, #1683)
- tsk_table_collection_check_integrity now has a TSK_CHECK_MIGRATION_ORDERING
flag. (@petrelharp, #1722)
- The default behaviour for ragged column growth is now to double the current size of the column,
up to a threshold. To keep the previous behaviour, use (e.g.)
tsk_node_table_set_max_metadata_length_increment(tables->nodes, 1024), which results in adding
space for 1024 additional entries each time we run out of space in the ragged column.
(@benjeffery, #1703, #1709)
- Support for compiling the C library on Windows using msys2 (@jeromekelleher,
#1742).
- Add time_units to tsk_table_collection_t to describe the units of the time dimension of the
tree sequence. This is then used to generate an error if time_units is uncalibrated when
using the branch lengths in statistics. (@benjeffery, #1644, #1760)
- Add the TSK_LOAD_SKIP_TABLES option to load just the top-level information from a
file. Also add the TSK_CMP_IGNORE_TABLES option to compare only the top-level
information in two table collections. (@clwgg, #1882, #1854).
- Add reference sequence.
(@jeromekelleher, @benjeffery, #146, #1911, #1944, #1911)
- Add the TSK_LOAD_SKIP_REFERENCE_SEQUENCE option to load a table collection
without the reference sequence. Also add the TSK_CMP_IGNORE_REFERENCE_SEQUENCE
option to compare two table collections without comparing their reference
sequence. (@clwgg, #2019, #1971).
- Add a "virtual root" to Tree arrays such as left_sib, right_child etc.
The virtual root is appended to each array, has all real roots as its children,
but is not the parent of any node. Simplifies traversal algorithms.
(@jeromekelleher, #1691, #1704)
- Add num_edges to tsk_tree_t to count the edges that define the topology of
the tree. (@jeromekelleher, #1704)
- Add the tsk_tree_get_size_bound function which returns an upper bound on the number of
nodes reachable from the roots of a tree. Useful for tree stack allocations
(@jeromekelleher, #1704).
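
The difference between the new doubling growth and the previous fixed-increment growth described in the list above can be sketched as follows. This is a Python model of the idea, not the C library's code. The function names, the starting capacity of 1024 rows and the threshold value are all hypothetical, and reading "up to a threshold" as "the increment is capped at the threshold" is an assumption of this sketch.

```python
def next_capacity(current, max_rows_increment=0, threshold=2**21):
    """Return the capacity after one reallocation.

    max_rows_increment > 0 models the fixed-increment behaviour;
    0 models doubling, with the added amount capped at a
    hypothetical threshold.
    """
    if max_rows_increment > 0:
        return current + max_rows_increment
    return current + min(current, threshold)


def count_reallocations(target_rows, start=1024, **kwargs):
    """Count how many times the table must grow to hold target_rows."""
    capacity, count = start, 0
    while capacity < target_rows:
        capacity = next_capacity(capacity, **kwargs)
        count += 1
    return count


fixed = count_reallocations(1_000_000, max_rows_increment=1024)
doubling = count_reallocations(1_000_000)
print(fixed, doubling)
```

Under these assumptions, filling a table with a million rows takes hundreds of reallocations with a fixed increment of 1024 but only a handful with doubling, at the cost of possibly over-allocating.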
C API 0.99.14
Breaking changes
- 64 bits are now used to store the sizes of ragged table columns such as metadata,
allowing them to hold more data. As such tsk_size_t is now 64 bits wide.
This change is fully backwards and forwards compatible for all tree-sequences whose
ragged column sizes fit into 32 bits. New tree-sequences with
large offset arrays that require 64 bits will fail to load in previous versions with
error TSK_ERR_BAD_COLUMN_TYPE.
(@jeromekelleher, #343, #1527, #1528, #1530,
#1554, #1573, #1589, #1598, #1628, #1571,
#1579, #1585, #1590, #1602, #1618, #1620, #1652).
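
As a rough illustration of why the offset width matters, the sketch below models a ragged column as a flat byte buffer plus an offset array, and reports whether the offsets would fit in 32 bits. It is written in Python for brevity and is not the C library's code. The function names are hypothetical, and choosing the narrowest width that fits is an assumption suggested by the compatibility note above.

```python
UINT32_MAX = 2**32 - 1


def pack_ragged(rows):
    """Flatten variable-length byte rows into one buffer plus offsets."""
    data = b"".join(rows)
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))
    return data, offsets


def unpack_row(data, offsets, j):
    """Recover row j from the flat buffer."""
    return data[offsets[j]:offsets[j + 1]]


def offset_width_bits(offsets):
    """Narrowest of 32 or 64 bits that can hold the largest offset."""
    return 32 if offsets[-1] <= UINT32_MAX else 64


data, offsets = pack_ragged([b'{"a": 1}', b"", b"xyz"])
print(offsets, unpack_row(data, offsets, 2), offset_width_bits(offsets))
```

The last offset equals the total length of the column, so a column holding 4 GiB of metadata or more is the case that needs 64-bit offsets.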
Features
- Add tsk_X_table_update_row methods which allow modifying single rows of tables
(@jeromekelleher, #1545, #1552).
Python 0.3.7
Minor release
Features
- map_mutations now allows the ancestral state to be specified
(@hyanwong, @jeromekelleher, #1542, #1550)
Fixes
- Fix segfault when very large columns overflow
(@bhaller, @benjeffery, #1509, #1511).
C API 0.99.13
Bugfix release
Fixes
- Fix segfault when very large columns overflow
(@bhaller, @benjeffery, #1509, #1511).
Python 0.3.6
Minor feature release
Notebook detailing the new features here: https://gist.github.com/benjeffery/aff619fd8da6799bccab81655c391965
Breaking changes
- Mutation.position and Mutation.index which were deprecated in 0.2.2 (Sep '19) have
been removed.
Features
- Add direct, copy-free access to the arrays representing the quintuply-linked structure
of Tree (e.g. left_child_array). Allows performant algorithms over the tree
structure using, for example, numba
(@jeromekelleher, #1299, #1320).
- Add fancy indexing to tables. E.g. table[6:86] returns a new table with the
specified rows. Supports slices, index arrays and boolean masks
(@benjeffery, #1221, #1348, #1342).
- Add Table.append method for adding rows from classes such as SiteTableRow and
Site (@benjeffery, #1111, #1254).
- SVG visualization of a tree sequence can be restricted to displaying between left
and right genomic coordinates using the x_lim parameter. The default settings
now mean that if the left or right flanks of a tree sequence are entirely empty,
these regions will not be plotted in the SVG (@hyanwong, #1288).
- SVG visualization of a single tree allows all mutations on an edge to be plotted
via the all_edge_mutations param (@hyanwong, #1253, #1258).
- Entity classes such as Mutation, Node are now python dataclasses
(@benjeffery, #1261).
- Metadata decoding for table row access is now lazy (@benjeffery, #1261).
- Add html notebook representation for Tree and change Tree.__str__ from dict
representation to info table. (@benjeffery, #1269, #1304).
- Improve display of tables when printed, limiting lines set via
tskit.set_print_options (@benjeffery, #1270, #1300).
- Add Table.assert_equals and TableCollection.assert_equals which give an exact
report of any differences. (@benjeffery, #1076, #1328)
Changes
- In drawing methods max_tree_height and tree_height_scale have been deprecated
in favour of max_time and time_scale (@benjeffery, #1262, #1331).
Fixes
- Tree sequences were not properly init'd after unpickling
(@benjeffery, #1297, #1298)
C API 0.99.12
Minor feature release
Breaking changes
- Removed TSK_NO_BUILD_INDEXES.
Not building indexes is now the default behaviour of tsk_table_collection_dump
and related functions.
(@molpopgen, #1327, #1337).
Features
- Add tsk_*_table_extend methods to append to a table from another
(@benjeffery, #1271, #1287)
Python 0.3.5
Breaking changes
- tskit now requires Python 3.7 (@benjeffery, #1235)
Features
- SVG visualization plots mutations at the correct time, if it exists, and a y-axis
with a label can be drawn. Both x- and y-axes can be plotted on trees as well as
tree sequences (@hyanwong, #840, #580, #1236)
- SVG visualization now uses squares for sample nodes and red crosses for mutations,
with the site/mutation positions marked on the x-axis. Additionally, an x-axis
label can be set (@hyanwong, #1155, #1194, #1182, #1213)
- Add parents column to the individual table to allow recording of pedigrees
(@ivan-krukov, @benjeffery, #852, #1125, #866, #1153, #1177, #1192, #1199).
- Added Tree.generate_random_binary static method to create random
binary trees (@hyanwong, @jeromekelleher, #1037).
- Change the default behaviour of Tree.split_polytomies to generate
the shortest possible branch lengths instead of a fixed epsilon of
1e-10. (@jeromekelleher, #1089, #1090)
- Default value metadata in add_row functions is now schema-dependent, so that
metadata={} is no longer needed as an argument when a schema is present
(@benjeffery, #1084).
- default in metadata schemas is used to fill in missing values when encoding for
the struct codec. (@benjeffery, #1073, #1116).
- Added canonical option to table collection sorting (@mufernando,
@petrelharp, #705)
- Added various arguments to TreeSequence.subset, to allow for stable
population indexing and lossless node reordering with subset.
(@petrelharp, #1097)
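
One way to picture the Tree.split_polytomies change in the list above is to compare a fixed epsilon with the nearest representable floating point value below a parent's time. The sketch below is an illustration under that reading, not the library's implementation: both function names are hypothetical, and math.nextafter needs Python 3.9 or later.

```python
import math


def child_time_fixed_epsilon(parent_time, epsilon=1e-10):
    """Place a new node a fixed epsilon below its parent."""
    return parent_time - epsilon


def child_time_shortest(parent_time):
    """Place a new node at the closest representable time below its parent."""
    return math.nextafter(parent_time, -math.inf)


for t in (1.0, 1e7):
    fixed = child_time_fixed_epsilon(t)
    shortest = child_time_shortest(t)
    # A zero-length branch (fixed == t) means the epsilon was lost to rounding.
    print(t, fixed < t, shortest < t)
```

For a parent time of 1e7 the spacing between adjacent doubles is larger than 1e-10, so subtracting the fixed epsilon leaves the time unchanged, whereas the nextafter approach always yields a strictly smaller value.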
Changes
- Allow mutations that have the same derived state as their parent mutation.
(@benjeffery, #1180, #1233)
- File minor version change to support individual parents