Releases · Eventual-Inc/Daft

21 Mar 03:56

github-actions

v0.4.8

3a75f24

Latest

What's Changed 🚀

✨ Features

feat: syntactic sugar for Python list and struct gets @kevinzwang (#4027)
feat: Add a memory-efficient iterator for Series @desmondcheongzx (#4006)
feat(catalog): adds s3tables iceberg rest endpoint @rchowell (#4018)
feat: adds gz as gzip alias for encode, decode methods @rchowell (#4020)
feat: Functions: sign, signum, negative, negate @petern48 (#3941)
feat(sql): namespace support with in-memory catalog @rchowell (#4013)
feat(sql): adds show tables statement and documentation @rchowell (#4011)
feat(catalog): adds native s3tables read and catalog apis @rchowell (#3929)
feat: offset indices in sparse tensor @itzhakstern (#3725)
feat: Flight shuffle @colin-ho (#3904)
feat: daft.range function @universalmind303 (#3956)
feat: cast using a string type @universalmind303 (#3951)
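The `gz` alias (#4020) applies to the encode/decode methods, which gained deflate, gzip and zlib support in v0.4.7 (#3907). Below is a standard-library-only sketch of what an alias like this amounts to: a lookup table that resolves `gz` to the same codec as `gzip` before dispatching. The `encode`/`decode` helpers and the `_CODECS`/`_ALIASES` tables are hypothetical. They are not Daft's expression API.

```python
import gzip
import zlib
from typing import Callable, Dict, Tuple

# (compress, decompress) pair for one codec.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def _deflate(data: bytes) -> bytes:
    # Raw DEFLATE: negative wbits omits the zlib header and checksum.
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def _inflate(data: bytes) -> bytes:
    return zlib.decompress(data, wbits=-zlib.MAX_WBITS)


# Hypothetical registry keyed by canonical codec name.
_CODECS: Dict[str, Codec] = {
    "deflate": (_deflate, _inflate),
    "gzip": (gzip.compress, gzip.decompress),
    "zlib": (zlib.compress, zlib.decompress),
}

# Aliases are resolved to a canonical name before the registry lookup.
_ALIASES: Dict[str, str] = {"gz": "gzip"}


def _resolve(codec: str) -> Codec:
    name = _ALIASES.get(codec.lower(), codec.lower())
    if name not in _CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    return _CODECS[name]


def encode(data: bytes, codec: str) -> bytes:
    compress, _ = _resolve(codec)
    return compress(data)


def decode(data: bytes, codec: str) -> bytes:
    _, decompress = _resolve(codec)
    return decompress(data)
```

Because the alias resolves before lookup, data written with `gz` can be read back with `gzip` and vice versa.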

🐛 Bug Fixes

fix: Fix join condition swaps when left/right sides swap @desmondcheongzx (#4028)
fix: Fix boolean expression simplifier @desmondcheongzx (#4016)
fix: Fix list sort with groupby @desmondcheongzx (#3990)
fix: datetime deprecation @universalmind303 (#3987)
fix: Fix incorrect numeric identity optimizations @desmondcheongzx (#3988)
fix: tutorial code @kevinzwang (#3972)
fix: allow decimal precision equal to scale @rchowell (#3973)
fix: Add more retries to sql server connection in test @colin-ho (#3953)
fix(ci): distributed tpch benchmark @kevinzwang (#3967)
fix: depend on pylance instead of lancedb @kevinzwang (#3962)
fix(ci): slack failure notification parameters @kevinzwang (#3952)
fix: fix error when casting monotonically_increasing_id directly @f4t4nt (#3950)
fix: Add target dialect when making subquery in read_sql @colin-ho (#3948)
fix: Count bytes read correctly for local WARC reads @desmondcheongzx (#3946)
fix: Pass CommitProperties object custom metadata in deltalake @tkauf15k (#3914)
fix: iceberg table name is a method @rchowell (#3949)

🚀 Performance

perf: Enable join reordering @colin-ho (#4029)
perf: Favor smaller relations on the left for join ordering @desmondcheongzx (#4003)
perf: Refactor selectivity estimates @colin-ho (#4010)
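The join-ordering entries above (#4029, #4003, #4010) choose an order from cardinality and selectivity estimates. Daft added brute-force join ordering in v0.4.4 (#3688). The sketch below is a textbook brute-force search over left-deep orders, not Daft's optimizer. Each candidate order is costed as the sum of its estimated intermediate result sizes, and a pair of relations with no join predicate between them is treated as a cross join (selectivity 1.0). The relation names, row counts and selectivities are made up for illustration.

```python
import math
from itertools import permutations
from typing import Dict, FrozenSet, Optional, Tuple


def best_left_deep_order(
    card: Dict[str, float],
    sel: Dict[FrozenSet[str], float],
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Try every left-deep join order; return the one with the lowest total intermediate size."""
    best_order, best_cost = None, math.inf
    for order in permutations(card):
        rows = card[order[0]]
        joined = {order[0]}
        cost = 0.0
        for rel in order[1:]:
            # Combined selectivity of all join edges between `rel` and the relations
            # joined so far; with no edge at all it stays 1.0, i.e. a cross join.
            s = 1.0
            for other in joined:
                s *= sel.get(frozenset((rel, other)), 1.0)
            rows = rows * card[rel] * s
            cost += rows
            joined.add(rel)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


# Hypothetical statistics: orders joins customer, customer joins nation.
card = {"orders": 1000, "customer": 100, "nation": 25}
sel = {
    frozenset(("orders", "customer")): 1 / 100,
    frozenset(("customer", "nation")): 1 / 25,
}
order, cost = best_left_deep_order(card, sel)
print(order, cost)
```

Here the search starts with the two small relations and joins `orders` last, avoiding the 25,000-row cross product that `orders` followed by `nation` would produce. Enumerating all permutations is O(n!), so this approach only suits a handful of relations.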

📖 Documentation

docs: update install instructions for daft-lts and nightly @kevinzwang (#4026)
docs: Fix s3 tables docs @colin-ho (#4025)
docs: change all mentions of getdaft -> daft @jaychia (#3986)
docs: fix docs examples and add missing docs @kevinzwang (#3974)
docs: initializes sql and data type documentation @rchowell (#3959)

👷 CI

ci: update distributed tpch benchmark @kevinzwang (#3971)
ci: fix typo in nightly workflow @kevinzwang (#3968)
ci: distributed TPC-H benchmarks @kevinzwang (#3961)

🔧 Maintenance

chore: Track imports on scarf @colin-ho (#4024)
chore: Upgrade kanal to 0.1 @colin-ho (#4017)
chore: create in-memory scans using rust arrow arrays @rchowell (#4005)
chore(dashboard): update Next.js dependency to version 15.2.2 @universalmind303 (#3999)
chore: add pr template @ccmao1130 (#3981)
chore: dashboard build cleanup @universalmind303 (#3931)
chore: fix slack link in readme @kevinzwang (#3966)
chore: Favor OnceLock over lazy_static for WARC column sizes @desmondcheongzx (#3939)

Full Changelog: v0.4.7...v0.4.8

Contributors

rchowell, desmondcheongzx, and 9 other contributors

08 Mar 02:19

github-actions

v0.4.7

c7df611

What's Changed 🚀

build: build and publish daft package @kevinzwang (#3913)
build: bump rust toolchain version @kevinzwang (#3910)

✨ Features

feat: adds encode and decode for deflate, gzip, zlib @rchowell (#3907)
feat(catalog): adds catalog ddl actions like create_table and create_namespace @rchowell (#3902)
feat(sql): adds the 'use' sql session statement @rchowell (#3912)
feat(catalog): adds append and overwrite to table apis @rchowell (#3889)
feat(catalog): adds additional table sources for Catalog.from_pydict @rchowell (#3901)
feat: functions sinh, cosh, tanh @petern48 (#3903)
feat: Functions log1p and expm1 @petern48 (#3887)
feat: trig functions csc and sec @petern48 (#3884)
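Several of the new functions (#3884, #3887, #3903) have direct counterparts in Python's standard `math` module, which makes it easy to check what they compute. The sketch below uses `math` only, not Daft's expression API. `csc` and `sec` are defined here as reciprocals because `math` has no built-ins for them. It also shows why `log1p` and `expm1` exist: for tiny `x`, computing `1 + x` (or subtracting 1 from `exp(x)`) first discards most of the significant digits.

```python
import math


def csc(x: float) -> float:
    """Cosecant, 1 / sin(x); the division fails where sin(x) is exactly 0.0."""
    return 1.0 / math.sin(x)


def sec(x: float) -> float:
    """Secant, 1 / cos(x)."""
    return 1.0 / math.cos(x)


def rel_err(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


x = 1e-12
# For tiny x, log(1 + x) and exp(x) - 1 are both almost exactly x.
naive_log = math.log(1.0 + x)  # 1.0 + x is rounded to a double before the log
precise_log = math.log1p(x)
naive_exp = math.exp(x) - 1.0  # subtracting 1.0 cancels most significant digits
precise_exp = math.expm1(x)

print(f"log(1+x) rel. error: {rel_err(naive_log, x):.1e}, log1p: {rel_err(precise_log, x):.1e}")
print(f"exp(x)-1 rel. error: {rel_err(naive_exp, x):.1e}, expm1: {rel_err(precise_exp, x):.1e}")
print(math.sinh(0.5), math.cosh(0.5), math.tanh(0.5))
```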

🐛 Bug Fixes

fix: nightly build and local tpch benchmark workflow @kevinzwang (#3898)
fix: add retry to getting GCS client config @kevinzwang (#3930)
fix: bun install in build-wheel.yml @kevinzwang (#3932)
fix: allow resolving tables at catalog root @rchowell (#3928)
fix: Don't use _position_to_field_name @Fokko (#3917)
fix: write_lance append mode when storage_options required @ascillitoe (#3924)
fix(dashboard): get dashboard working again @universalmind303 (#3918)
fix: coalesce panics, supertype handling, and null handling bugs @rchowell (#3908)
fix: small fix for pyspark+ray. @universalmind303 (#3899)
fix: map.get on empty dataset @universalmind303 (#3892)
fix: remove dashboard imports and dep @samster25 (#3888)

🚀 Performance

perf: Reduce memory consumption for WARC reads and improve estimates @desmondcheongzx (#3935)

📖 Documentation

docs: adds additional catalog and session documentation @rchowell (#3926)
docs: add spark connect doc page @universalmind303 (#3919)
docs: adds a usage doc for catalogs @rchowell (#3878)
docs: Add documentation for functions module @f4t4nt (#3880)
docs: remove cairo @ccmao1130 (#3900)

👷 CI

ci: update all --release workflows @universalmind303 (#3915)
ci: replace build-artifact-s3 with new workflow, add local tpch benches @kevinzwang (#3864)

🔧 Maintenance

chore: use ref name instead of ref in tpch bench metadata @kevinzwang (#3937)
chore: use stdlib importlib.metadata for python>3.9 @kevinzwang (#3916)
chore: move dashboard in to main project @universalmind303 (#3909)
chore: make dashboard assets part of build process. @universalmind303 (#3905)

Full Changelog: v0.4.6...v0.4.7

Contributors

Fokko, samster25, and 8 other contributors

01 Mar 05:59

github-actions

v0.4.6

fb89b2a

What's Changed 🚀

✨ Features

feat: Add WARC reader @desmondcheongzx (#3871)
feat(functions): add monotonically_increasing_id expression function @f4t4nt (#3838)
feat: union ops @universalmind303 (#3872)
feat: Enable capturing and broadcasting logs when running on the Native runner @raunakab (#3875)
feat(connect): joins @universalmind303 (#3849)
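`monotonically_increasing_id` (#3838) produces IDs that are unique and increasing but not necessarily consecutive. Spark's function of the same name gets this without coordination between partitions by putting the partition index in the high bits and the row offset within the partition in the low bits. The sketch below assumes that kind of layout. The 36-bit row field and the `monotonic_ids` helper are assumptions for illustration and do not describe Daft's internals.

```python
from typing import List, Sequence

ROW_BITS = 36  # assumption: low bits reserved for the row offset within a partition


def monotonic_ids(partition_sizes: Sequence[int]) -> List[int]:
    """Assign IDs as (partition_index << ROW_BITS) | row_offset.

    Each partition needs only its own index to compute its IDs, so no
    cross-partition coordination (such as a global row count) is required.
    """
    ids: List[int] = []
    for partition_index, size in enumerate(partition_sizes):
        if size >= 1 << ROW_BITS:
            raise ValueError("partition too large for the row-offset field")
        base = partition_index << ROW_BITS
        ids.extend(base | offset for offset in range(size))
    return ids


ids = monotonic_ids([3, 0, 2])
print(ids)
```

The gap after partition 0 and the skipped empty partition show why the IDs are increasing but not consecutive.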

🐛 Bug Fixes

fix: Add check for numpy in from_pylist @colin-ho (#3881)
fix: Fix ray data link @colin-ho (#3874)
fix: arrow to Series for nested map array @kevinzwang (#3870)
fix: Add metadata to subgraph options in python @colin-ho (#3869)
fix: Update dashboard import @raunakab (#3865)

🚀 Performance

perf: Clear task inputs upon dispatch @colin-ho (#3877)
perf: Fix join cost estimates @desmondcheongzx (#3831)

Full Changelog: v0.4.5...v0.4.6

Contributors

desmondcheongzx, f4t4nt, and 4 other contributors

27 Feb 01:06

github-actions

v0.4.5

d468589

What's Changed 🚀

💥 Breaking Changes

refactor!: split column expression into unresolved and resolved types @kevinzwang (#3804)

✨ Features

feat(connect): daft.pyspark module @universalmind303 (#3861)
feat: Emit children of join before shuffle + add stats to explain analyze @colin-ho (#3852)
feat: Stageify plan on shuffle boundaries @colin-ho (#3781)
feat(sql): adds session sql for leveraging attached catalogs @rchowell (#3860)
feat(catalog): Cutover deprecated APIs to use session, catalog, table abstractions [3/3] @rchowell (#3830)
feat(connect): read csv/parquet/json options @universalmind303 (#3791)
feat(sql): select from multiple joins @kevinzwang (#3842)
feat(catalog): Integrate session and catalog actions alongside existing APIs [2/3] @rchowell (#3825)
feat(catalog): Prepare existing catalog APIs for integration [1/3] @rchowell (#3820)
feat(sql): supports schemas in read_json, read_csv, read_parquet @rchowell (#3836)
feat(sql): supports array of paths in read_ table-value functions @rchowell (#3835)
feat: Add a daft dashboard to display queries plans and stats @raunakab (#3790)

🐛 Bug Fixes

fix: sql round without precision @universalmind303 (#3863)
fix: pypi publish workflow @kevinzwang (#3862)
fix: build wheel Github action inputs @kevinzwang (#3858)
fix: protocol in iceberg writes @colin-ho (#3851)
fix: LogicalPlan::get_schema_for_alias should stop when it hits any alias @kevinzwang (#3848)
fix: Reduce number of nodes in random join graph test @desmondcheongzx (#3839)
fix: Add excludes to broken link checker @colin-ho (#3834)
fix: Grab Daft config from environment variables for new contexts @desmondcheongzx (#3832)
fix: create series of np.datetime64['D'] @rchowell (#3829)

🚀 Performance

perf(optimizer): Infer additional join graph edges during join reordering @desmondcheongzx (#3807)
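Inferring additional join graph edges (#3807) rests on the transitivity of equality: if `a.x = b.y` and `b.y = c.z`, then `a.x = c.z` also holds, which gives the reorderer a usable edge between `a` and `c` instead of a cross join. The sketch below is a generic union-find over `relation.column` names, not Daft's code. It groups columns into equivalence classes and emits every implied equi-join pair between different relations. The column names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def infer_join_edges(equalities: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Given 'rel.col' equality predicates, return all implied equalities across relations."""
    parent: Dict[str, str] = {}

    def find(col: str) -> str:
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    # Union the two sides of every equality predicate.
    for left, right in equalities:
        parent[find(left)] = find(right)

    # Group columns by the root of their equivalence class.
    classes: Dict[str, List[str]] = {}
    for col in list(parent):
        classes.setdefault(find(col), []).append(col)

    edges: Set[Tuple[str, str]] = set()
    for members in classes.values():
        for a, b in combinations(sorted(members), 2):
            if a.split(".")[0] != b.split(".")[0]:  # same-relation pairs are not join edges
                edges.add((a, b))
    return edges


edges = infer_join_edges([("a.x", "b.y"), ("b.y", "c.z")])
print(sorted(edges))
```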

♻️ Refactor

refactor!: split column expression into unresolved and resolved types @kevinzwang (#3804)

📖 Documentation

docs: respect daft analytics env var @ccmao1130 (#3856)
docs: Update configuration docs to show set_runner_native @colin-ho (#3833)

🔧 Maintenance

chore: replace anaconda with S3 for nightly build publish @kevinzwang (#3857)
chore: minor cleanup to table-value functions @rchowell (#3854)
chore: remove accidental printlns @universalmind303 (#3845)

Full Changelog: v0.4.4...v0.4.5

Contributors

rchowell, desmondcheongzx, and 5 other contributors

19 Feb 02:20

github-actions

v0.4.4

47fde8a

What's Changed 🚀

build: update python-publish workflow @ccmao1130 (#3797)
build(docs): fix docgen failed workflow @ccmao1130 (#3766)

✨ Features

feat: Adds .summarize() to compute statistics @rchowell (#3810)
feat(sql): SELECT without FROM @rchowell (#3814)
feat: Simplify is_ins to an OR chain of eqs @colin-ho (#3800)
feat(session): Adds session class to python @rchowell (#3809)
feat(session): Replaces direct usage of DaftCatalog with Session @rchowell (#3794)
feat: Sequentially materialize left and right sides during hash join @colin-ho (#3735)
feat(connect): add temporal functions @universalmind303 (#3799)
feat: nulls first kernels @universalmind303 (#3789)
feat(table): implement list_unique and Set aggregation @f4t4nt (#3710)
feat: add functions to daft-connect @universalmind303 (#3780)
feat(catalog): Defines a session for connection state @rchowell (#3782)
feat: implement bool_and and bool_or @f4t4nt (#3754)
feat(catalog): Defines an identifier for use across catalogs @rchowell (#3763)
feat(optimizer): Brute force join ordering @desmondcheongzx (#3688)
feat(swordfish): Properly buffer unordered scan tasks @colin-ho (#3751)
feat: better sql datatype support @universalmind303 (#3750)
feat: Adds list constructor to Expression and SQL APIs @rchowell (#3737)
feat: spark connect set operations @universalmind303 (#3739)
feat: add spark explain @universalmind303 (#3741)
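The `is_in` simplification (#3800) turns a membership test into a disjunction of equalities. The toy expression classes below are hypothetical and are not Daft's `Expression` API. They only show the shape of the rewrite: `x IN (1, 2, 3)` becomes `(x = 1) OR (x = 2) OR (x = 3)`.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Eq:
    col: Col
    value: Any


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class IsIn:
    col: Col
    values: Tuple[Any, ...]


Expr = Union[Col, Eq, Or, IsIn]


def simplify_is_in(expr: Expr) -> Expr:
    """Rewrite `col IN (v1, ..., vn)` as `(col = v1) OR ... OR (col = vn)`."""
    if not isinstance(expr, IsIn) or not expr.values:
        return expr  # an empty IN-list is left alone in this sketch
    first, *rest = expr.values
    return reduce(lambda acc, v: Or(acc, Eq(expr.col, v)), rest, Eq(expr.col, first))


def render(expr: Expr) -> str:
    if isinstance(expr, Col):
        return expr.name
    if isinstance(expr, Eq):
        return f"({render(expr.col)} = {expr.value!r})"
    if isinstance(expr, Or):
        return f"{render(expr.left)} OR {render(expr.right)}"
    return f"{render(expr.col)} IN {expr.values!r}"


print(render(simplify_is_in(IsIn(Col("x"), (1, 2, 3)))))
# (x = 1) OR (x = 2) OR (x = 3)
```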

🐛 Bug Fixes

fix: unity managed table reads @pmogren (#3806)
fix: boolean casts to strings and null propagation @rchowell (#3770)
fix: catalog table names @universalmind303 (#3760)

🚀 Performance

perf(swordfish): Parallel expression evaluation @colin-ho (#3593)
perf: Use parquet metadata from schema inference for accurate scan task statistics @desmondcheongzx (#3784)

♻️ Refactor

refactor: rename table to recordbatch @universalmind303 (#3771)
refactor: port DaftContext to rust side @universalmind303 (#3767)
refactor: renames to_struct to just struct @rchowell (#3755)

📖 Documentation

docs: fix readthedocs build @ccmao1130 (#3824)
docs: add scarf analytics @ccmao1130 (#3773)
docs: Update distributed docs to add byoc mode, change name to daft cli @jessie-young (#3768)
docs: update README.rst diagram @ccmao1130 (#3803)
docs: update links in readme @ccmao1130 (#3779)
docs: add footer and update broken links @ccmao1130 (#3764)

👷 CI

ci: Allow TPCH benchmarks to use ARM cluster profile @desmondcheongzx (#3777)
ci: Record info for TPCH benchmarks @desmondcheongzx (#3729)
ci: send slack notification for broken links @ccmao1130 (#3742)

Full Changelog: v0.4.3...v0.4.4

Contributors

pmogren, rchowell, and 6 other contributors

31 Jan 03:49

github-actions

v0.4.3

2d0d8b2

What's Changed 🚀

build(docs): Adds make docs phony target @rchowell (#3693)

✨ Features

feat: Add a new dashboard UI to Daft @raunakab (#3738)
feat(shuffles): Determination logic for pre shuffle merge @colin-ho (#3674)
feat: Limit number of sources in merged scan task @colin-ho (#3695)
feat: Expose parquet chunk size to swordfish reads @colin-ho (#3714)
feat: add LiteralValue::Int8 and Int16 @ugoa (#3736)
feat: with_column(s)_renamed expression for DataFrame @jessie-young (#3732)
feat: Explain for swordfish @colin-ho (#3667)
feat: Adds .describe() to DataFrame and DESCRIBE to SQL @rchowell (#3720)
feat: Add column format option to iter rows @colin-ho (#3681)
feat(core): make micropartition streamable over tables @universalmind303 (#3709)
feat(iceberg): Adds support for read_iceberg with metadata_location to Daft-SQL @rchowell (#3701)
feat: Overwrite partitions mode @colin-ho (#3687)
feat(docs): Adds copy-to-clipboard to code samples @rchowell (#3702)
feat(connect): sql @universalmind303 (#3696)
feat: add binary string operations (length and concatenation) @f4t4nt (#3646)
feat(sql): Adds url_download and url_upload to daft-sql @rchowell (#3690)
feat(connect): distinct + sort @universalmind303 (#3677)
feat(core): Implement null-safe equality operator @f4t4nt (#3663)
feat(sql): Adds JsonScanBuilder to daft-scan and read_json to daft-sql @rchowell (#3683)
feat: support using S3Config.credentials_provider for writes @kevinzwang (#3648)
feat(sql): Adds FROM source check for string paths @rchowell (#3679)
feat(connect): Rust ray exec @universalmind303 (#3666)
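The null-safe equality operator (#3663) differs from ordinary SQL `=` only when an operand is null. `=` returns null in that case. The null-safe form, written `<=>` in dialects such as Spark SQL and MySQL, always returns true or false. The sketch below models SQL null as Python `None`. The function names are hypothetical, and the sketch does not show how Daft spells the operator.

```python
from typing import Any, Optional


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """Ordinary SQL '=': any null operand makes the result null (None)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a: Any, b: Any) -> bool:
    """Null-safe equality: two nulls are equal, and a null never equals a non-null."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


for a, b in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(f"{a!r:>4} {b!r:>4}  =: {sql_eq(a, b)!r:>5}  null-safe: {null_safe_eq(a, b)!r}")
```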

🐛 Bug Fixes

fix: Set filter selectivity estimate lower bound @colin-ho (#3694)
fix(join): joining on different types @kevinzwang (#3716)
fix: to_cnf and to_dnf functions @kevinzwang (#3728)
fix: pushdowns for unpivot @universalmind303 (#3724)
fix(optimizer): Fix issues with join graph construction @desmondcheongzx (#3668)
fix: Run filter null join key optimization once @colin-ho (#3657)
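Several entries concern filter selectivity estimates, the fraction of rows a predicate is expected to keep: #3694 here, #3734 in this release's performance list, and the v0.4.8 refactor (#4010). The sketch below uses classic textbook heuristics rather than Daft's actual formulas. An equality predicate keeps 1/NDV of the rows (NDV is the number of distinct values). AND multiplies selectivities as if the predicates were independent. OR uses inclusion-exclusion. The result is then clamped to a lower bound, which keeps long conjunctions from driving the row estimate to essentially zero. The floor value and the default NDV are arbitrary assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Union

MIN_SELECTIVITY = 0.01  # assumption: arbitrary floor, not Daft's value
DEFAULT_NDV = 10  # assumption: distinct-value count used when no statistics exist


@dataclass(frozen=True)
class Eq:
    column: str  # stands for `column = <literal>`


@dataclass(frozen=True)
class And:
    left: Pred
    right: Pred


@dataclass(frozen=True)
class Or:
    left: Pred
    right: Pred


@dataclass(frozen=True)
class Not:
    child: Pred


Pred = Union[Eq, And, Or, Not]


def selectivity(p: Pred, ndv: Dict[str, int]) -> float:
    if isinstance(p, Eq):
        return 1.0 / max(ndv.get(p.column, DEFAULT_NDV), 1)
    if isinstance(p, And):  # independence assumption
        return selectivity(p.left, ndv) * selectivity(p.right, ndv)
    if isinstance(p, Or):  # inclusion-exclusion under the same assumption
        left_sel, right_sel = selectivity(p.left, ndv), selectivity(p.right, ndv)
        return left_sel + right_sel - left_sel * right_sel
    if isinstance(p, Not):
        return 1.0 - selectivity(p.child, ndv)
    raise TypeError(f"unknown predicate: {p!r}")


def estimated_rows(num_rows: int, p: Pred, ndv: Dict[str, int]) -> float:
    """Row estimate after the filter, never below MIN_SELECTIVITY of the input."""
    return num_rows * max(selectivity(p, ndv), MIN_SELECTIVITY)


ndv = {"a": 4, "b": 10}
print(estimated_rows(1000, And(Eq("a"), Eq("b")), ndv))
```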

🚀 Performance

perf: Track accumulated selectivity in logical plan to improve probe side decisions @desmondcheongzx (#3734)
perf: simplify boolean expression rules @kevinzwang (#3731)
perf(shuffles): Incrementally retrieve metadata in reduce @colin-ho (#3545)
perf: Improve stats for join side determination @colin-ho (#3655)

♻️ Refactor

refactor: remove eyre from daft-connect @universalmind303 (#3719)
refactor(execution): NativeExecutor refactor @universalmind303 (#3689)
refactor: logical op constructor+builder boundary @kevinzwang (#3684)
refactor(connect): internal refactoring to make connect code more organized & extensible @universalmind303 (#3680)

📖 Documentation

docs: linked mkdocs & api docs @ccmao1130 (#3703)
docs: higher quality daft diagram for readme @ccmao1130 (#3697)
docs: add daft launcher docs to docs v2 @ccmao1130 (#3678)

👷 CI

ci: skip tests during publishing of release @jaychia (#3744)
ci: Allow upstream git refs to be used for benchmarking @desmondcheongzx (#3730)
ci: Remove daft tracing @raunakab (#3692)
ci: Add new benchmarking cluster profile @raunakab (#3665)

🔧 Maintenance

chore: Pin sql server version in docker compose @colin-ho (#3715)
chore(connect): better error propagation & handling @universalmind303 (#3675)
chore(connect): consolidate multiple files in tests/connect @universalmind303 (#3676)

Full Changelog: v0.4.2...v0.4.3

Contributors

ugoa, rchowell, and 9 other contributors

09 Jan 01:00

github-actions

v0.4.2

43bbbeb

What's Changed 🚀

build: Publish A Long Term Support CPU Release of Daft @samster25 (#3650)

✨ Features

feat(connect): printSchema @andrewgazelka (#3617)
feat: Allow building probe table for either side of anti semi joins @colin-ho (#3643)
feat(optimizer): Add join reordering as an optimizer rule @desmondcheongzx (#3642)
feat(swordfish): Memory manager @colin-ho (#3599)
feat(scantask-2): Implement new module for splitting Parquet ScanTask @jaychia (#3628)
feat(scantask-1): add a config flag for new scantask splitting algorithm @jaychia (#3615)
feat: Support intersect all and except distinct/all in DataFrame API @advancedxy (#3537)
feat: support new PyIceberg IO properties and custom IOConfig in write_iceberg @kevinzwang (#3633)
feat(expressions): Extend Expression.url.upload() to support row-specific URLs @desmondcheongzx (#3518)

🐛 Bug Fixes

fix: special characters in GCS urls @kevinzwang (#3651)
fix(swordfish): Track future poll times for explain analyze @colin-ho (#3511)

👷 CI

ci: Improve visualization of tpcds + tpch benchmarking outputs @raunakab (#3654)

🔧 Maintenance

chore: update PyO3 version to 0.23 @kevinzwang (#3647)
chore: Fix parquet benchmark test @colin-ho (#3632)
chore: Clean up join order iteration @desmondcheongzx (#3638)

⬆️ Dependencies

build(deps-dev): bump moto[s3,server] from 5.0.21 to 5.0.26 @dependabot (#3640)

Full Changelog: v0.4.1...v0.4.2

Contributors

advancedxy, samster25, and 7 other contributors

21 Dec 06:48

github-actions

v0.4.1

1c0f780

What's Changed 🚀

✨ Features

feat(optimizer): Implement naive join ordering @desmondcheongzx (#3616)
feat(connect): add more unresolved functions @andrewgazelka (#3618)
feat(connect): with_columns_renamed @andrewgazelka (#3386)
feat(connect): read/write → csv, write → json @andrewgazelka (#3361)

🐛 Bug Fixes

fix: unity catalog import from write_deltalake @jaychia (#3630)

🚀 Performance

perf(optimizer): convert filter predicate to CNF to push through join @kevinzwang (#3623)
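Converting a filter predicate to conjunctive normal form (#3623) helps because, for an inner join, a filter above the join can be pushed down one AND-ed clause at a time. Each clause that references only one input's columns can move below the join. The sketch below is not Daft's optimizer. It converts a small boolean expression tree to CNF by distributing OR over AND, then splits the resulting clauses by the table they reference. Predicates are opaque atoms tagged with a hypothetical table name, and NOT is omitted for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Atom:
    table: str  # the join input this predicate reads from
    text: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, And, Or]


def to_cnf(e: Expr) -> Expr:
    """Distribute OR over AND until no AND appears beneath an OR."""
    if isinstance(e, And):
        return And(to_cnf(e.left), to_cnf(e.right))
    if isinstance(e, Or):
        left, right = to_cnf(e.left), to_cnf(e.right)
        if isinstance(left, And):  # (p AND q) OR r  ->  (p OR r) AND (q OR r)
            return And(to_cnf(Or(left.left, right)), to_cnf(Or(left.right, right)))
        if isinstance(right, And):  # p OR (q AND r)  ->  (p OR q) AND (p OR r)
            return And(to_cnf(Or(left, right.left)), to_cnf(Or(left, right.right)))
        return Or(left, right)
    return e


def conjuncts(e: Expr) -> List[Expr]:
    if isinstance(e, And):
        return conjuncts(e.left) + conjuncts(e.right)
    return [e]


def tables(e: Expr) -> FrozenSet[str]:
    if isinstance(e, Atom):
        return frozenset({e.table})
    return tables(e.left) | tables(e.right)


def split_for_inner_join(
    pred: Expr, left: str, right: str
) -> Tuple[List[Expr], List[Expr], List[Expr]]:
    """Return (push into left input, push into right input, keep above the join)."""
    to_left: List[Expr] = []
    to_right: List[Expr] = []
    keep: List[Expr] = []
    for clause in conjuncts(to_cnf(pred)):
        used = tables(clause)
        if used == {left}:
            to_left.append(clause)
        elif used == {right}:
            to_right.append(clause)
        else:
            keep.append(clause)
    return to_left, to_right, keep


# Hypothetical filter over a join of tables `a` and `b`:
#   (a.x > 1 AND b.y = 0) OR (a.x < -1 AND b.y = 9)
a1, a2 = Atom("a", "a.x > 1"), Atom("a", "a.x < -1")
b1, b2 = Atom("b", "b.y = 0"), Atom("b", "b.y = 9")
to_left, to_right, keep = split_for_inner_join(Or(And(a1, b1), And(a2, b2)), "a", "b")
```

Before conversion the predicate is a single OR, so nothing can be pushed. After conversion, `(a.x > 1 OR a.x < -1)` can filter `a` and `(b.y = 0 OR b.y = 9)` can filter `b` before the join. CNF conversion can grow the expression exponentially in the worst case, so this naive version only suits small predicates.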

📖 Documentation

docs: daft documentation v2 @ccmao1130 (#3595)

✅ Tests

test(connect): verify show() output @andrewgazelka (#3610)

👷 CI

ci: Output results in a CSV format @raunakab (#3625)
ci: Add build step to run-cluster @raunakab (#3606)

🔧 Maintenance

chore: Build progress bar only on first update @colin-ho (#3626)
chore: Fix csv benchmark test @colin-ho (#3631)

Full Changelog: v0.4.0...v0.4.1

Contributors

andrewgazelka, desmondcheongzx, and 5 other contributors

19 Dec 08:34

github-actions

v0.4.0

a76f800

What's Changed 🚀

build: Use uv for maturin builds instead @raunakab (#3540)

💥 Breaking Changes

feat: Default native runner @colin-ho (#3608)
chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
chore!: drop support for Python 3.8 @kevinzwang (#3592)
chore!: remove pyarrow-based file reader @kevinzwang (#3587)

✨ Features

feat: Default native runner @colin-ho (#3608)
feat(swordfish): Progress Bar @colin-ho (#3571)
feat(connect): df.show @universalmind303 (#3560)
feat(connect): support DdlParse @andrewgazelka (#3580)
feat(swordfish): Optimize grouped aggregations @colin-ho (#3534)
feat(swordfish): Enable left/right joins to build probe table on either side @colin-ho (#3548)
feat: Add DataType inference from Python types @jaychia (#3555)
feat(shuffles): Locality aware pre shuffle merge @colin-ho (#3505)
feat: Implement count-distinct for sql @raunakab (#3553)
feat(connect): add drop support @andrewgazelka (#3345)
feat: support for basic subquery execution @kevinzwang (#3536)
feat(connect): add df.filter @andrewgazelka (#3346)
feat: Make serialization code not unwrap and panic on failures @raunakab (#3546)
feat: Unity Catalog writes using daft.DataFrame.write_deltalake() @anilmenon14 (#3522)
feat(connect): add parquet support @andrewgazelka (#3360)
feat: Add iterators to more types @raunakab (#3539)
feat(optimizer): Add scaffolding to create join graphs from logical plans @desmondcheongzx (#3501)
feat(tpcds-benchmarking): Add basic tpcds benchmarking for local testing @raunakab (#3509)
feat(list): add fixed-size list support for value_counts @andrewgazelka (#3521)
feat(parquet): Limit parallel tasks in remote parquet reader @colin-ho (#3490)
feat(parquet): Target parquet writes by size bytes instead of rows @colin-ho (#3457)
feat: cross join @kevinzwang (#3437)
[FEAT] connect: remove excessive warnings from spark connect @universalmind303 (#3499)
[CHORE] connect, test: df.withColumn @andrewgazelka (#3359)
[FEAT]: expr simplifier @universalmind303 (#3393)
[FEAT] shuffle testing @raunakab (#3492)
[FEAT]: add coalesce to dataframe and SQL @universalmind303 (#3482)
[FEAT] add register-table helper to sql-catalog @chuanlei-coding (#2837)
[FEAT] Respect resource request for projections in swordfish @colin-ho (#3460)
[FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)
[FEAT] connect: add modulus operator and withColumns support @andrewgazelka (#3351)
[FEAT] connect: createDataFrame @andrewgazelka (#3363)
[FEAT] Support parquet RLE decoding for booleans @desmondcheongzx (#3477)
[FEAT] Cap parallelism on local parquet reader @colin-ho (#3310)
[FEAT] connect: add binary operators @andrewgazelka (#3350)
[FEAT] connect: support basic column operations @andrewgazelka (#3362)
[FEAT] extend build-commit workflow to support different compile-archs @raunakab (#3459)
[FEAT] Add count-distinct aggregation @raunakab (#3455)

🐛 Bug Fixes

fix(udf): udf call with empty table and batch size @kevinzwang (#3604)
fix: use arrow's schema instead of spark's for local rel @universalmind303 (#3602)
fix: guard concurrent extension datatype setting with a lock @jaychia (#3589)
fix(parquet): Fix parquet reads of required fields nested within optional fields @desmondcheongzx (#3598)
fix: boolean and/or expressions with null @kevinzwang (#3544)
fix(run-cluster-workflow): Add null check when parsing metadata @raunakab (#3507)
fix(tpcds): fix bugs in tpcds datagen script @universalmind303 (#3495)
[BUG] Fix build commit workflow @raunakab (#3487)
[BUG]: dont panic on count(distinct) @universalmind303 (#3481)
[BUG] Block on parquet schema future in estimate_size_bytes @colin-ho (#3484)

🚀 Performance

perf: filter null join key optimization rule @kevinzwang (#3583)
perf: lazily import pyiceberg and unity catalog if available @jaychia (#3565)
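The filter-null-join-key rule (#3583) relies on the fact that, in an inner join, a row whose join key is null can never match, because `null = null` is not true. Dropping such rows before the join therefore cannot change the result, and it shrinks the join inputs whenever null keys are present. The sketch below checks that equivalence with a naive hash join over Python lists, using `None` for null. The data and helper names are hypothetical, and this is not Daft's implementation.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[Any], str]  # (join key, payload); None stands for SQL null


def inner_hash_join(left: List[Row], right: List[Row]) -> List[Tuple[Any, str, str]]:
    """Naive inner equi-join on the key field; null keys never match, as in SQL."""
    build: Dict[Any, List[str]] = defaultdict(list)
    for key, payload in right:
        if key is not None:
            build[key].append(payload)
    return [
        (key, left_payload, right_payload)
        for key, left_payload in left
        if key is not None
        for right_payload in build.get(key, [])
    ]


def drop_null_keys(rows: List[Row]) -> List[Row]:
    """The extra filter such a rule would place under each join input."""
    return [row for row in rows if row[0] is not None]


left = [(1, "l1"), (None, "l2"), (2, "l3"), (None, "l4")]
right = [(1, "r1"), (None, "r2"), (2, "r3")]

without_rule = inner_hash_join(left, right)
with_rule = inner_hash_join(drop_null_keys(left), drop_null_keys(right))
print(without_rule == with_rule, with_rule)
```

The pre-filter is only safe on join inputs whose unmatched rows are discarded. A left outer join, for example, must keep null-key rows from its left input.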

♻️ Refactor

refactor: allow InMemory to take in non python based entries @universalmind303 (#3554)
refactor: create a rust based PartitionSet @universalmind303 (#3515)
refactor(swordfish): Generic broadcast state bridge @colin-ho (#3508)

📖 Documentation

docs: update tpch benchmark link @ccmao1130 (#3542)
docs: Enable Linting of docstrings @samster25 (#3506)
[FEAT] Enable Actor Pool UDFs by default @kevinzwang (#3488)

✅ Tests

test(connect): add more tests for createDataFrame @andrewgazelka (#3607)
test: Add more size estimation tests from our s3 bucket @jaychia (#3514)

👷 CI

ci: Always download logs @jaychia (#3588)
ci: Add ability to array-ify args and run multiple jobs @raunakab (#3584)
ci: Add "build" label type to accepted PR titles @raunakab (#3541)
ci: add a tool to launch workloads on cluster @jaychia (#3516)
ci(release-drafter): use conventional commit labels @andrewgazelka (#3503)

🔧 Maintenance

chore!: upgrade Ray pins and pyarrow pins @jaychia (#3612)
chore: add warning for native runner @jaychia (#3613)
chore!: drop support for Python 3.8 @kevinzwang (#3592)
chore!: remove pyarrow-based file reader @kevinzwang (#3587)
chore: Fix ordering in sql tests + pin docker images in read_sql tests @colin-ho (#3596)
chore: move symbolic and boolean algebra code into new crate @kevinzwang (#3570)
[CHORE] use conventional commits @andrewgazelka (#3493)
[CHORE] connect, test: df.withColumn @andrewgazelka (#3359)
[CHORE] Add tests for parquet size estimations @jaychia (#3405)
[CHORE] Move all python wrapping logic to separate module @raunakab (#3458)

Full Changelog: v0.3.15...v0.4.0

Contributors

samster25, andrewgazelka, and 9 other contributors

02 Dec 20:29

github-actions

v0.3.15

465510f

Changes

✨ New Features

[FEAT] run cluster on commit @raunakab (#3461)
[FEAT]: Support .clip function @conradsoon (#3136)
[FEAT] Add cluster profiles @raunakab (#3426)
[FEAT] add pyiceberg 0.8.0 support @rongfengliang (#3448)
[FEAT] migrate schema inference → async, block at py boundary @andrewgazelka (#3432)
[CHORE] connect: df.schema @andrewgazelka (#3353)
[CHORE] connect test: df.get_attr @andrewgazelka (#3349)
[FEAT] Get native execution enablement from DAFT_RUNNER @desmondcheongzx (#3409)
[FEAT] Add ability to download log files from ray-cluster @raunakab (#3406)
[FEAT] Add ability to run arbitrary command on a set working directory @raunakab (#3404)
[FEAT] Add steps to spin up, submit job, and spin down ray clusters @raunakab (#3403)
[CHORE] connect: add tests for df.take() method @andrewgazelka (#3385)
[FEAT] Create new run workflow @raunakab (#3402)
[FEAT] Enable group by keys in aggregation expressions @kevinzwang (#3399)
[FEAT] Build release python wheels and upload to AWS S3 @raunakab (#3398)
[FEAT] connect: Add support for select @andrewgazelka (#3344)
[FEAT] connect: add df.limit and df.first @andrewgazelka (#3309)
[FEAT] connect: to_daft_* use ref instead of value @andrewgazelka (#3355)
[FEAT] connect: add alias support @andrewgazelka (#3342)
[FEAT] Filter predicates in SQL join @kevinzwang (#3371)
[FEAT] connect: collect @andrewgazelka (#3326)

🚀 Performance Improvements

[PERF] Improve hash table probe side decisions for Swordfish @desmondcheongzx (#3327)

👾 Bug Fixes

[BUG] Fix extension type display @jaychia (#3456)
[BUG] Remove enum imports from match statements @raunakab (#3436)
[BUG] Explicitly set IO config in unity catalog load table @colin-ho (#3453)
[BUG] Include storage options in lance write commit @colin-ho (#3451)
[BUG] Replace semicolons in filenames with underscore @raunakab (#3430)
[BUG] Terminate nodes instead of stopping them @raunakab (#3427)
[BUG] Fix run-cluster passing in environment variables wrongly @jaychia (#3422)

📖 Documentation

[FEAT]: Support .clip function @conradsoon (#3136)
[DOCS] Shorten union of Literals @desmondcheongzx (#3449)
[DOCS] Add missing list expression entries @desmondcheongzx (#3428)

🧰 Maintenance

[CHORE] Add warning in PyRunner to switch to Native @colin-ho (#3472)
[CHORE] Address comments on previous PR @raunakab (#3473)
[CHORE] Write tpch parquet files one at a time @colin-ho (#3396)
[CHORE] Remove CountMode and ResourceRequest from public API @desmondcheongzx (#3429)
[CHORE] Add schemas for remaining local plan ops @colin-ho (#3446)
[CHORE] Put empty table when building probe table @colin-ho (#3445)
[CHORE] Explain block_on function in common-runtime @colin-ho (#3442)
[CHORE] connect: df.schema @andrewgazelka (#3353)
[CHORE] Update execution config to turn on Ray tracing @jaychia (#3431)
[CHORE] connect test: df.get_attr @andrewgazelka (#3349)
[CHORE] Cleanup ExprResolver @kevinzwang (#3401)
[CHORE] connect: add tests for df.take() method @andrewgazelka (#3385)
[CHORE] Change IOConfig to be serialized into binary instead of JSON @kevinzwang (#3400)
[CHORE] Pin PyIceberg version to <0.8 @kevinzwang (#3391)
[CHORE] Add TPC-H queries in SQL @kevinzwang (#3392)
[CHORE] connect: Optimize plans in connect @colin-ho (#3378)
[CHORE] delete empty file xyz @andrewgazelka (#3370)

⬆️ Dependencies

14 changes

Bump orjson from 3.10.11 to 3.10.12 @dependabot (#3464)
Bump grpcio from 1.67.0 to 1.68.1 @dependabot (#3465)
Bump arrow-buffer from 51.0.0 to 53.3.0 @dependabot (#3467)
Bump regex-syntax from 0.7.5 to 0.8.4 @dependabot (#3468)
Bump memmap2 from 0.9.4 to 0.9.5 @dependabot (#3470)
Bump image from 0.25.4 to 0.25.5 @dependabot (#3471)
Bump bytes from 1.7.1 to 1.8.0 @dependabot (#3411)
Bump astral-sh/setup-uv from 3 to 4 @dependabot (#3410)
Bump serde_json from 1.0.124 to 1.0.133 @dependabot (#3413)
Bump sample-arrow2 from 0.1.0 to 0.17.2 @dependabot (#3414)
Bump chrono-tz from 0.8.6 to 0.10.0 @dependabot (#3415)
Bump azure-storage-blob from 12.17.0 to 12.24.0 @dependabot (#3416)
Bump opencv-python from 4.8.1.78 to 4.10.0.84 @dependabot (#3417)
Bump sqlalchemy from 2.0.25 to 2.0.36 @dependabot (#3418)

Contributors

rongfengliang, andrewgazelka, and 7 other contributors
