Releases: Eventual-Inc/Daft
v0.4.8
What's Changed 🚀
✨ Features
- feat: syntactic sugar for Python list and struct gets @kevinzwang (#4027)
- feat: Add a memory-efficient iterator for Series @desmondcheongzx (#4006)
- feat(catalog): adds s3tables iceberg rest endpoint @rchowell (#4018)
- feat: adds gz as gzip alias for encode, decode methods @rchowell (#4020)
- feat: Functions: sign, signum, negative, negate @petern48 (#3941)
- feat(sql): namespace support with in-memory catalog @rchowell (#4013)
- feat(sql): adds show tables statement and documentation @rchowell (#4011)
- feat(catalog): adds native s3tables read and catalog apis @rchowell (#3929)
- feat: offset indices in sparse tensor @itzhakstern (#3725)
- feat: Flight shuffle @colin-ho (#3904)
- feat: daft.range function @universalmind303 (#3956)
- feat: cast using a string type @universalmind303 (#3951)
🐛 Bug Fixes
- fix: Fix join condition swaps when left/right sides swap @desmondcheongzx (#4028)
- fix: Fix boolean expression simplifier @desmondcheongzx (#4016)
- fix: Fix list sort with groupby @desmondcheongzx (#3990)
- fix: datetime deprecation @universalmind303 (#3987)
- fix: Fix incorrect numeric identity optimizations @desmondcheongzx (#3988)
- fix: tutorial code @kevinzwang (#3972)
- fix: allow decimal precision equal to scale @rchowell (#3973)
- fix: Add more retries to sql server connection in test @colin-ho (#3953)
- fix(ci): distributed tpch benchmark @kevinzwang (#3967)
- fix: depend on pylance instead of lancedb @kevinzwang (#3962)
- fix(ci): slack failure notification parameters @kevinzwang (#3952)
- fix: fix error when casting monotonically_increasing_id directly @f4t4nt (#3950)
- fix: Add target dialect when making subquery in read_sql @colin-ho (#3948)
- fix: Count bytes read correctly for local WARC reads @desmondcheongzx (#3946)
- fix: Pass CommitProperties object custom metadata in deltalake @tkauf15k (#3914)
- fix: iceberg table name is a method @rchowell (#3949)
🚀 Performance
- perf: Enable join reordering @colin-ho (#4029)
- perf: Favor smaller relations on the left for join ordering @desmondcheongzx (#4003)
- perf: Refactor selectivity estimates @colin-ho (#4010)
📖 Documentation
- docs: update install instructions for daft-lts and nightly @kevinzwang (#4026)
- docs: Fix s3 tables docs @colin-ho (#4025)
- docs: change all mentions of getdaft -> daft @jaychia (#3986)
- docs: fix docs examples and add missing docs @kevinzwang (#3974)
- docs: initializes sql and data type documentation @rchowell (#3959)
👷 CI
- ci: update distributed tpch benchmark @kevinzwang (#3971)
- ci: fix typo in nightly workflow @kevinzwang (#3968)
- ci: distributed TPC-H benchmarks @kevinzwang (#3961)
🔧 Maintenance
- chore: Track imports on scarf @colin-ho (#4024)
- chore: Upgrade kanal to 0.1 @colin-ho (#4017)
- chore: create in-memory scans using rust arrow arrays @rchowell (#4005)
- chore(dashboard): update Next.js dependency to version 15.2.2 @universalmind303 (#3999)
- chore: add pr template @ccmao1130 (#3981)
- chore: dashboard build cleanup @universalmind303 (#3931)
- chore: fix slack link in readme @kevinzwang (#3966)
- chore: Favor OnceLock over lazy_static for WARC column sizes @desmondcheongzx (#3939)
Full Changelog: v0.4.7...v0.4.8
v0.4.7
What's Changed 🚀
- build: build and publish daft package @kevinzwang (#3913)
- build: bump rust toolchain version @kevinzwang (#3910)
✨ Features
- feat: adds encode and decode for deflate, gzip, zlib @rchowell (#3907)
- feat(catalog): adds catalog ddl actions like create_table and create_namespace @rchowell (#3902)
- feat(sql): adds the 'use' sql session statement @rchowell (#3912)
- feat(catalog): adds append and overwrite to table apis @rchowell (#3889)
- feat(catalog): adds additional table sources for Catalog.from_pydict @rchowell (#3901)
- feat: functions sinh, cosh, tanh @petern48 (#3903)
- feat: Functions log1p and expm1 @petern48 (#3887)
- feat: trig functions csc and sec @petern48 (#3884)
🐛 Bug Fixes
- fix: nightly build and local tpch benchmark workflow @kevinzwang (#3898)
- fix: add retry to getting GCS client config @kevinzwang (#3930)
- fix: bun install in build-wheel.yml @kevinzwang (#3932)
- fix: allow resolving tables at catalog root @rchowell (#3928)
- fix: Don't use _position_to_field_name @Fokko (#3917)
- fix: write_lance append mode when storage_options required @ascillitoe (#3924)
- fix(dashboard): get dashboard working again @universalmind303 (#3918)
- fix: coalesce panics, supertype handling, and null handling bugs @rchowell (#3908)
- fix: small fix for pyspark+ray. @universalmind303 (#3899)
- fix: map.get on empty dataset @universalmind303 (#3892)
- fix: remove dashboard imports and dep @samster25 (#3888)
🚀 Performance
- perf: Reduce memory consumption for WARC reads and improve estimates @desmondcheongzx (#3935)
📖 Documentation
- docs: adds additional catalog and session documentation @rchowell (#3926)
- docs: add spark connect doc page @universalmind303 (#3919)
- docs: adds a usage doc for catalogs @rchowell (#3878)
- docs: Add documentation for functions module @f4t4nt (#3880)
- docs: remove cairo @ccmao1130 (#3900)
👷 CI
- ci: update all --release workflows @universalmind303 (#3915)
- ci: replace build-artifact-s3 with new workflow, add local tpch benches @kevinzwang (#3864)
🔧 Maintenance
- chore: use ref name instead of ref in tpch bench metadata @kevinzwang (#3937)
- chore: use stdlib importlib.metadata for python>3.9 @kevinzwang (#3916)
- chore: move dashboard into main project @universalmind303 (#3909)
- chore: make dashboard assets part of build process. @universalmind303 (#3905)
Full Changelog: v0.4.6...v0.4.7
v0.4.6
What's Changed 🚀
✨ Features
- feat: Add WARC reader @desmondcheongzx (#3871)
- feat(functions): add monotonically_increasing_id expression function @f4t4nt (#3838)
- feat: union ops @universalmind303 (#3872)
- feat: Enable capturing and broadcasting logs when running on the Native runner @raunakab (#3875)
- feat(connect): joins @universalmind303 (#3849)
🐛 Bug Fixes
- fix: Add check for numpy in from_pylist @colin-ho (#3881)
- fix: Fix ray data link @colin-ho (#3874)
- fix: arrow to Series for nested map array @kevinzwang (#3870)
- fix: Add metadata to subgraph options in python @colin-ho (#3869)
- fix: Update dashboard import @raunakab (#3865)
🚀 Performance
- perf: Clear task inputs upon dispatch @colin-ho (#3877)
- perf: Fix join cost estimates @desmondcheongzx (#3831)
Full Changelog: v0.4.5...v0.4.6
v0.4.5
What's Changed 🚀
💥 Breaking Changes
- refactor!: split column expression into unresolved and resolved types @kevinzwang (#3804)
✨ Features
- feat(connect): daft.pyspark module @universalmind303 (#3861)
- feat: Emit children of join before shuffle + add stats to explain analyze @colin-ho (#3852)
- feat: Stageify plan on shuffle boundaries @colin-ho (#3781)
- feat(sql): adds session sql for leveraging attached catalogs @rchowell (#3860)
- feat(catalog): Cutover deprecated APIs to use session, catalog, table abstractions [3/3] @rchowell (#3830)
- feat(connect): read csv/parquet/json options @universalmind303 (#3791)
- feat(sql): select from multiple joins @kevinzwang (#3842)
- feat(catalog): Integrate session and catalog actions alongside existing APIs [2/3] @rchowell (#3825)
- feat(catalog): Prepare existing catalog APIs for integration [1/3] @rchowell (#3820)
- feat(sql): supports schemas in read_json, read_csv, read_parquet @rchowell (#3836)
- feat(sql): supports array of paths in read_ table-value functions @rchowell (#3835)
- feat: Add a daft dashboard to display query plans and stats @raunakab (#3790)
🐛 Bug Fixes
- fix: sql round without precision @universalmind303 (#3863)
- fix: pypi publish workflow @kevinzwang (#3862)
- fix: build wheel Github action inputs @kevinzwang (#3858)
- fix: protocol in iceberg writes @colin-ho (#3851)
- fix: LogicalPlan::get_schema_for_alias should stop when it hits any alias @kevinzwang (#3848)
- fix: Reduce number of nodes in random join graph test @desmondcheongzx (#3839)
- fix: Add excludes to broken link checker @colin-ho (#3834)
- fix: Grab Daft config from environment variables for new contexts @desmondcheongzx (#3832)
- fix: create series of np.datetime64['D'] @rchowell (#3829)
🚀 Performance
- perf(optimizer): Infer additional join graph edges during join reordering @desmondcheongzx (#3807)
♻️ Refactor
- refactor!: split column expression into unresolved and resolved types @kevinzwang (#3804)
📖 Documentation
- docs: respect daft analytics env var @ccmao1130 (#3856)
- docs: Update configuration docs to show set_runner_native @colin-ho (#3833)
🔧 Maintenance
- chore: replace anaconda with S3 for nightly build publish @kevinzwang (#3857)
- chore: minor cleanup to table-value functions @rchowell (#3854)
- chore: remove accidental printlns @universalmind303 (#3845)
Full Changelog: v0.4.4...v0.4.5
v0.4.4
What's Changed 🚀
- build: update python-publish workflow @ccmao1130 (#3797)
- build(docs): fix docgen failed workflow @ccmao1130 (#3766)
✨ Features
- feat: Adds .summarize() to compute statistics @rchowell (#3810)
- feat(sql): SELECT without FROM @rchowell (#3814)
- feat: Simplify is_in to an OR chain of eqs @colin-ho (#3800)
- feat(session): Adds session class to python @rchowell (#3809)
- feat(session): Replaces direct usage of DaftCatalog with Session @rchowell (#3794)
- feat: Sequentially materialize left and right sides during hash join @colin-ho (#3735)
- feat(connect): add temporal functions @universalmind303 (#3799)
- feat: nulls first kernels @universalmind303 (#3789)
- feat(table): implement list_unique and Set aggregation @f4t4nt (#3710)
- feat: add functions to daft-connect @universalmind303 (#3780)
- feat(catalog): Defines a session for connection state @rchowell (#3782)
- feat: implement bool_and and bool_or @f4t4nt (#3754)
- feat(catalog): Defines an identifier for use across catalogs @rchowell (#3763)
- feat(optimizer): Brute force join ordering @desmondcheongzx (#3688)
- feat(swordfish): Properly buffer unordered scan tasks @colin-ho (#3751)
- feat: better sql datatype support @universalmind303 (#3750)
- feat: Adds list constructor to Expression and SQL APIs @rchowell (#3737)
- feat: spark connect set operations @universalmind303 (#3739)
- feat: add spark explain @universalmind303 (#3741)
🐛 Bug Fixes
- fix: unity managed table reads @pmogren (#3806)
- fix: boolean casts to strings and null propagation @rchowell (#3770)
- fix: catalog table names @universalmind303 (#3760)
🚀 Performance
- perf(swordfish): Parallel expression evaluation @colin-ho (#3593)
- perf: Use parquet metadata from schema inference for accurate scan task statistics @desmondcheongzx (#3784)
♻️ Refactor
- refactor: rename table to recordbatch @universalmind303 (#3771)
- refactor: port DaftContext to rust side @universalmind303 (#3767)
- refactor: renames to_struct to just struct @rchowell (#3755)
📖 Documentation
- docs: fix readthedocs build @ccmao1130 (#3824)
- docs: add scarf analytics @ccmao1130 (#3773)
- docs: Update distributed docs to add byoc mode, change name to daft cli @jessie-young (#3768)
- docs: update README.rst diagram @ccmao1130 (#3803)
- docs: update links in readme @ccmao1130 (#3779)
- docs: add footer and update broken links @ccmao1130 (#3764)
👷 CI
- ci: Allow TPCH benchmarks to use ARM cluster profile @desmondcheongzx (#3777)
- ci: Record info for TPCH benchmarks @desmondcheongzx (#3729)
- ci: send slack notification for broken links @ccmao1130 (#3742)
Full Changelog: v0.4.3...v0.4.4
v0.4.3
What's Changed 🚀
✨ Features
- feat: Add a new dashboard UI to Daft @raunakab (#3738)
- feat(shuffles): Determination logic for pre shuffle merge @colin-ho (#3674)
- feat: Limit number of sources in merged scan task @colin-ho (#3695)
- feat: Expose parquet chunk size to swordfish reads @colin-ho (#3714)
- feat: add LiteralValue::Int8 and Int16 @ugoa (#3736)
- feat: with_column(s)_renamed expression for DataFrame @jessie-young (#3732)
- feat: Explain for swordfish @colin-ho (#3667)
- feat: Adds .describe() to DataFrame and DESCRIBE to SQL @rchowell (#3720)
- feat: Add column format option to iter rows @colin-ho (#3681)
- feat(core): make micropartition streamable over tables @universalmind303 (#3709)
- feat(iceberg): Adds support for read_iceberg with metadata_location to Daft-SQL @rchowell (#3701)
- feat: Overwrite partitions mode @colin-ho (#3687)
- feat(docs): Adds copy-to-clipboard to code samples @rchowell (#3702)
- feat(connect): sql @universalmind303 (#3696)
- feat: add binary string operations (length and concatenation) @f4t4nt (#3646)
- feat(sql): Adds url_download and url_upload to daft-sql @rchowell (#3690)
- feat(connect): distinct + sort @universalmind303 (#3677)
- feat(core): Implement null-safe equality operator @f4t4nt (#3663)
- feat(sql): Adds JsonScanBuilder to daft-scan and read_json to daft-sql @rchowell (#3683)
- feat: support using S3Config.credentials_provider for writes @kevinzwang (#3648)
- feat(sql): Adds FROM source check for string paths @rchowell (#3679)
- feat(connect): Rust ray exec @universalmind303 (#3666)
🐛 Bug Fixes
- fix: Set filter selectivity estimate lower bound @colin-ho (#3694)
- fix(join): joining on different types @kevinzwang (#3716)
- fix: to_cnf and to_dnf functions @kevinzwang (#3728)
- fix: pushdowns for unpivot @universalmind303 (#3724)
- fix(optimizer): Fix issues with join graph construction @desmondcheongzx (#3668)
- fix: Run filter null join key optimization once @colin-ho (#3657)
🚀 Performance
- perf: Track accumulated selectivity in logical plan to improve probe side decisions @desmondcheongzx (#3734)
- perf: simplify boolean expression rules @kevinzwang (#3731)
- perf(shuffles): Incrementally retrieve metadata in reduce @colin-ho (#3545)
- perf: Improve stats for join side determination @colin-ho (#3655)
♻️ Refactor
- refactor: remove eyre from daft-connect @universalmind303 (#3719)
- refactor(execution): NativeExecutor refactor @universalmind303 (#3689)
- refactor: logical op constructor+builder boundary @kevinzwang (#3684)
- refactor(connect): internal refactoring to make connect code more organized & extensible @universalmind303 (#3680)
📖 Documentation
- docs: linked mkdocs & api docs @ccmao1130 (#3703)
- docs: higher quality daft diagram for readme @ccmao1130 (#3697)
- docs: add daft launcher docs to docs v2 @ccmao1130 (#3678)
👷 CI
- ci: skip tests during publishing of release @jaychia (#3744)
- ci: Allow upstream git refs to be used for benchmarking @desmondcheongzx (#3730)
- ci: Remove daft tracing @raunakab (#3692)
- ci: Add new benchmarking cluster profile @raunakab (#3665)
🔧 Maintenance
- chore: Pin sql server version in docker compose @colin-ho (#3715)
- chore(connect): better error propagation & handling @universalmind303 (#3675)
- chore(connect): consolidate multiple files in tests/connect @universalmind303 (#3676)
Full Changelog: v0.4.2...v0.4.3
v0.4.2
What's Changed 🚀
- build: Publish A Long Term Support CPU Release of Daft @samster25 (#3650)
✨ Features
- feat(connect): printSchema @andrewgazelka (#3617)
- feat: Allow building probe table for either side of anti semi joins @colin-ho (#3643)
- feat(optimizer): Add join reordering as an optimizer rule @desmondcheongzx (#3642)
- feat(swordfish): Memory manager @colin-ho (#3599)
- feat(scantask-2): Implement new module for splitting Parquet ScanTask @jaychia (#3628)
- feat(scantask-1): add a config flag for new scantask splitting algorithm @jaychia (#3615)
- feat: Support intersect all and except distinct/all in DataFrame API @advancedxy (#3537)
- feat: support new PyIceberg IO properties and custom IOConfig in write_iceberg @kevinzwang (#3633)
- feat(expressions): Extend Expression.url.upload() to support row-specific URLs @desmondcheongzx (#3518)
🐛 Bug Fixes
- fix: special characters in GCS urls @kevinzwang (#3651)
- fix(swordfish): Track future poll times for explain analyze @colin-ho (#3511)
👷 CI
🔧 Maintenance
- chore: update PyO3 version to 0.23 @kevinzwang (#3647)
- chore: Fix parquet benchmark test @colin-ho (#3632)
- chore: Clean up join order iteration @desmondcheongzx (#3638)
⬆️ Dependencies
- build(deps-dev): bump moto[s3,server] from 5.0.21 to 5.0.26 @dependabot (#3640)
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed 🚀
✨ Features
- feat(optimizer): Implement naive join ordering @desmondcheongzx (#3616)
- feat(connect): add more unresolved functions @andrewgazelka (#3618)
- feat(connect): with_columns_renamed @andrewgazelka (#3386)
- feat(connect): read/write → csv, write → json @andrewgazelka (#3361)
🐛 Bug Fixes
🚀 Performance
- perf(optimizer): convert filter predicate to CNF to push through join @kevinzwang (#3623)
📖 Documentation
- docs: daft documentation v2 @ccmao1130 (#3595)
✅ Tests
- test(connect): verify show() output @andrewgazelka (#3610)
👷 CI
- ci: Output results in a CSV format @raunakab (#3625)
- ci: Add build step to run-cluster @raunakab (#3606)
🔧 Maintenance
- chore: Build progress bar only on first update @colin-ho (#3626)
- chore: Fix csv benchmark test @colin-ho (#3631)
Full Changelog: v0.4.0...v0.4.1
v0.4.0
What's Changed 🚀
💥 Breaking Changes
- feat: Default native runner @colin-ho (#3608)
- chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
- chore!: drop support for Python 3.8 @kevinzwang (#3592)
- chore!: remove pyarrow-based file reader @kevinzwang (#3587)
✨ Features
- feat: Default native runner @colin-ho (#3608)
- feat(swordfish): Progress Bar @colin-ho (#3571)
- feat(connect): df.show @universalmind303 (#3560)
- feat(connect): support DdlParse @andrewgazelka (#3580)
- feat(swordfish): Optimize grouped aggregations @colin-ho (#3534)
- feat(swordfish): Enable left/right joins to build probe table on either side @colin-ho (#3548)
- feat: Add DataType inference from Python types @jaychia (#3555)
- feat(shuffles): Locality aware pre shuffle merge @colin-ho (#3505)
- feat: Implement count-distinct for sql @raunakab (#3553)
- feat(connect): add drop support @andrewgazelka (#3345)
- feat: support for basic subquery execution @kevinzwang (#3536)
- feat(connect): add df.filter @andrewgazelka (#3346)
- feat: Make serialization code not unwrap and panic on failures @raunakab (#3546)
- feat: Unity Catalog writes using daft.DataFrame.write_deltalake() @anilmenon14 (#3522)
- feat(connect): add parquet support @andrewgazelka (#3360)
- feat: Add iterators to more types @raunakab (#3539)
- feat(optimizer): Add scaffolding to create join graphs from logical plans @desmondcheongzx (#3501)
- feat(tpcds-benchmarking): Add basic tpcds benchmarking for local testing @raunakab (#3509)
- feat(list): add fixed-size list support for value_counts @andrewgazelka (#3521)
- feat(parquet): Limit parallel tasks in remote parquet reader @colin-ho (#3490)
- feat(parquet): Target parquet writes by size bytes instead of rows @colin-ho (#3457)
- feat: cross join @kevinzwang (#3437)
- [FEAT] connect: remove excessive warnings from spark connect @universalmind303 (#3499)
- [CHORE] connect, test: df.withColumn @andrewgazelka (#3359)
- [FEAT]: expr simplifier @universalmind303 (#3393)
- [FEAT] shuffle testing @raunakab (#3492)
- [FEAT]: add coalesce to dataframe and SQL @universalmind303 (#3482)
- [FEAT] add register-table helper to sql-catalog @chuanlei-coding (#2837)
- [FEAT] Respect resource request for projections in swordfish @colin-ho (#3460)
- [FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)
- [FEAT] connect: add modulus operator and withColumns support @andrewgazelka (#3351)
- [FEAT] connect: createDataFrame @andrewgazelka (#3363)
- [FEAT] Support parquet RLE decoding for booleans @desmondcheongzx (#3477)
- [FEAT] Cap parallelism on local parquet reader @colin-ho (#3310)
- [FEAT] connect: add binary operators @andrewgazelka (#3350)
- [FEAT] connect: support basic column operations @andrewgazelka (#3362)
- [FEAT] extend build-commit workflow to support different compile-archs @raunakab (#3459)
- [FEAT] Add count-distinct aggregation @raunakab (#3455)
🐛 Bug Fixes
- fix(udf): udf call with empty table and batch size @kevinzwang (#3604)
- fix: use arrow's schema instead of spark's for local rel @universalmind303 (#3602)
- fix: guard concurrent extension datatype setting with a lock @jaychia (#3589)
- fix(parquet): Fix parquet reads of required fields nested within optional fields @desmondcheongzx (#3598)
- fix: boolean and/or expressions with null @kevinzwang (#3544)
- fix(run-cluster-workflow): Add null check when parsing metadata @raunakab (#3507)
- fix(tpcds): fix bugs in tpcds datagen script @universalmind303 (#3495)
- [BUG] Fix build commit workflow @raunakab (#3487)
- [BUG]: don't panic on count(distinct) @universalmind303 (#3481)
- [BUG] Block on parquet schema future in estimate_size_bytes @colin-ho (#3484)
🚀 Performance
- perf: filter null join key optimization rule @kevinzwang (#3583)
- perf: lazily import pyiceberg and unity catalog if available @jaychia (#3565)
♻️ Refactor
- refactor: allow InMemory to take in non python based entries @universalmind303 (#3554)
- refactor: create a rust based PartitionSet @universalmind303 (#3515)
- refactor(swordfish): Generic broadcast state bridge @colin-ho (#3508)
📖 Documentation
- docs: update tpch benchmark link @ccmao1130 (#3542)
- docs: Enable Linting of docstrings @samster25 (#3506)
- [FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)
✅ Tests
- test(connect): add more tests for createDataFrame @andrewgazelka (#3607)
- test: Add more size estimation tests from our s3 bucket @jaychia (#3514)
👷 CI
- ci: Always download logs @jaychia (#3588)
- ci: Add ability to array-ify args and run multiple jobs @raunakab (#3584)
- ci: Add "build" label type to accepted PR titles @raunakab (#3541)
- ci: add a tool to launch workloads on cluster @jaychia (#3516)
- ci(release-drafter): use conventional commit labels @andrewgazelka (#3503)
🔧 Maintenance
- chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
- chore: add warning for native runner @jaychia (#3613)
- chore!: drop support for Python 3.8 @kevinzwang (#3592)
- chore!: remove pyarrow-based file reader @kevinzwang (#3587)
- chore: Fix ordering in sql tests + pin docker images in read_sql tests @colin-ho (#3596)
- chore: move symbolic and boolean algebra code into new crate @kevinzwang (#3570)
- [CHORE] use conventional commits @andrewgazelka (#3493)
- [CHORE] connect, test: df.withColumn @andrewgazelka (#3359)
- [CHORE] Add tests for parquet size estimations @jaychia (#3405)
- [CHORE] Move all python wrapping logic to separate module @raunakab (#3458)
Full Changelog: v0.3.15...v0.3.16
v0.3.15
Changes
✨ New Features
- [FEAT] run cluster on commit @raunakab (#3461)
- [FEAT]: Support .clip function @conradsoon (#3136)
- [FEAT] Add cluster profiles @raunakab (#3426)
- [FEAT] add pyiceberg 0.8.0 support @rongfengliang (#3448)
- [FEAT] migrate schema inference → async, block at py boundary @andrewgazelka (#3432)
- [CHORE] connect: df.schema @andrewgazelka (#3353)
- [CHORE] connect test: df.get_attr @andrewgazelka (#3349)
- [FEAT] Get native execution enablement from DAFT_RUNNER @desmondcheongzx (#3409)
- [FEAT] Add ability to download log files from ray-cluster @raunakab (#3406)
- [FEAT] Add ability to run arbitrary command on a set working directory @raunakab (#3404)
- [FEAT] Add steps to spin up, submit job, and spin down ray clusters @raunakab (#3403)
- [CHORE] connect: add tests for df.take() method @andrewgazelka (#3385)
- [FEAT] Create new run workflow @raunakab (#3402)
- [FEAT] Enable group by keys in aggregation expressions @kevinzwang (#3399)
- [FEAT] Build release python wheels and upload to AWS S3 @raunakab (#3398)
- [FEAT] connect: Add support for select @andrewgazelka (#3344)
- [FEAT] connect: add df.limit and df.first @andrewgazelka (#3309)
- [FEAT] connect: to_daft_* use ref instead of value @andrewgazelka (#3355)
- [FEAT] connect: add alias support @andrewgazelka (#3342)
- [FEAT] Filter predicates in SQL join @kevinzwang (#3371)
- [FEAT] connect: collect @andrewgazelka (#3326)
🚀 Performance Improvements
- [PERF] Improve hash table probe side decisions for Swordfish @desmondcheongzx (#3327)
👾 Bug Fixes
- [BUG] Fix extension type display @jaychia (#3456)
- [BUG] Remove enum imports from match statements @raunakab (#3436)
- [BUG] Explicitly set IO config in unity catalog load table @colin-ho (#3453)
- [BUG] Include storage options in lance write commit @colin-ho (#3451)
- [BUG] Replace semicolons in filenames with underscore @raunakab (#3430)
- [BUG] Terminate nodes instead of stopping them @raunakab (#3427)
- [BUG] Fix run-cluster passing in environment variables wrongly @jaychia (#3422)
📖 Documentation
- [FEAT]: Support .clip function @conradsoon (#3136)
- [DOCS] Shorten union of Literals @desmondcheongzx (#3449)
- [DOCS] Add missing list expression entries @desmondcheongzx (#3428)
🧰 Maintenance
- [CHORE] Add warning in PyRunner to switch to Native @colin-ho (#3472)
- [CHORE] Address comments on previous PR @raunakab (#3473)
- [CHORE] Write tpch parquet files one at a time @colin-ho (#3396)
- [CHORE] Remove CountMode and ResourceRequest from public API @desmondcheongzx (#3429)
- [CHORE] Add schemas for remaining local plan ops @colin-ho (#3446)
- [CHORE] Put empty table when building probe table @colin-ho (#3445)
- [CHORE] Explain block_on function in common-runtime @colin-ho (#3442)
- [CHORE] connect: df.schema @andrewgazelka (#3353)
- [CHORE] Update execution config to turn on Ray tracing @jaychia (#3431)
- [CHORE] connect test: df.get_attr @andrewgazelka (#3349)
- [CHORE] Cleanup ExprResolver @kevinzwang (#3401)
- [CHORE] connect: add tests for df.take() method @andrewgazelka (#3385)
- [CHORE] Change IOConfig to be serialized into binary instead of JSON @kevinzwang (#3400)
- [CHORE] Pin PyIceberg version to <0.8 @kevinzwang (#3391)
- [CHORE] Add TPC-H queries in SQL @kevinzwang (#3392)
- [CHORE] connect: Optimize plans in connect @colin-ho (#3378)
- [CHORE] delete empty file xyz @andrewgazelka (#3370)
⬆️ Dependencies
- Bump orjson from 3.10.11 to 3.10.12 @dependabot (#3464)
- Bump grpcio from 1.67.0 to 1.68.1 @dependabot (#3465)
- Bump arrow-buffer from 51.0.0 to 53.3.0 @dependabot (#3467)
- Bump regex-syntax from 0.7.5 to 0.8.4 @dependabot (#3468)
- Bump memmap2 from 0.9.4 to 0.9.5 @dependabot (#3470)
- Bump image from 0.25.4 to 0.25.5 @dependabot (#3471)
- Bump bytes from 1.7.1 to 1.8.0 @dependabot (#3411)
- Bump astral-sh/setup-uv from 3 to 4 @dependabot (#3410)
- Bump serde_json from 1.0.124 to 1.0.133 @dependabot (#3413)
- Bump sample-arrow2 from 0.1.0 to 0.17.2 @dependabot (#3414)
- Bump chrono-tz from 0.8.6 to 0.10.0 @dependabot (#3415)
- Bump azure-storage-blob from 12.17.0 to 12.24.0 @dependabot (#3416)
- Bump opencv-python from 4.8.1.78 to 4.10.0.84 @dependabot (#3417)
- Bump sqlalchemy from 2.0.25 to 2.0.36 @dependabot (#3418)