Comparing changes

base repository: sodadata/soda-core
base: v3.3.18
...
head repository: sodadata/soda-core
compare: main

Commits on Sep 5, 2024

  1. 9147255 (Verified)

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
  2. Bump to 3.3.19

    tombaeyens committed Sep 5, 2024
    5cf12cb

Commits on Sep 9, 2024

  1. Fixing the lacking data source error message on contract build (#2158)

    * Fixing the lacking data source error message on contract build
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    tombaeyens and pre-commit-ci[bot] authored Sep 9, 2024
    8adf017
  2. Bump to 3.3.20

    tombaeyens committed Sep 9, 2024
    8c42941

Commits on Sep 11, 2024

  1. Fixing Spark session API (#2159)

    * Fixing Spark session API
    
    * Fixed duckdb version to 1.0.0
    tombaeyens authored Sep 11, 2024
    0937aa0
  2. Bump to 3.3.21

    tombaeyens committed Sep 11, 2024
    a6f85fe
  3. Removing the data source name lower case requirement (#2161)

    * Removing the data source name lower case requirement
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    tombaeyens and pre-commit-ci[bot] authored Sep 11, 2024
    39ff346
  4. Bump to 3.3.22

    tombaeyens committed Sep 11, 2024
    2c8c5bd

Commits on Sep 23, 2024

  1. 8586ed4

Commits on Oct 15, 2024

  1. Add hiring banner to README (#2179)

    * Update README.md to include Job Description
    
    * Add source
    dirkgroenen authored Oct 15, 2024
    52dc476

Commits on Oct 21, 2024

  1. Add support for Azure SQL, Synapse, and Microsoft Fabric and extend support for SQL Server (#2160)
    
    * working fabric data source inheriting from sqlserver
    
    * fix failing tests
    
    * fix table creation in fabric
    
    * restore dev-reqs
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add email check for sqlserver and fabric
    
    * add test for email format
    
    * remove useless line
    
    * remove useless line
    
    * remove extra deps
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * enable auth with mssparkutils
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add fabric spark auth
    
    * Update tbump+version
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    Co-authored-by: Milan Lukac <milan@lukac.online>
    4 people authored Oct 21, 2024
    a08bbcc
  2. Bump to 3.4.0

    m1n0 committed Oct 21, 2024
    c70fb78

Commits on Oct 22, 2024

  1. a59292a
  2. [pre-commit.ci] pre-commit autoupdate (#2177)

    updates:
    - [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.6.0...v5.0.0)
    - [github.com/asottile/pyupgrade: v3.17.0 → v3.18.0](asottile/pyupgrade@v3.17.0...v3.18.0)
    - [github.com/psf/black: 24.8.0 → 24.10.0](psf/black@24.8.0...24.10.0)
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    pre-commit-ci[bot] and m1n0 authored Oct 22, 2024
    fb60b83
  3. Comparison row count check secondary datasource filter fix (#2165)

    * Comparison row count check secondary datasource filter fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    3 people authored Oct 22, 2024
    b91b1a9
  4. Bump to 3.4.1

    m1n0 committed Oct 22, 2024
    bbe338b

Commits on Nov 4, 2024

  1. 0ecbec4

Commits on Nov 14, 2024

  1. Chore: use jinja sandbox for templates (#2185)

    * Chore: use jinja sandbox for templates
    
    * add test
    
    * fix feedback
    m1n0 authored Nov 14, 2024
    a8c7d34

Commits on Nov 28, 2024

  1. Bump to 3.4.2

    m1n0 committed Nov 28, 2024
    b605610
  2. [pre-commit.ci] pre-commit autoupdate (#2182)

    updates:
    - [github.com/asottile/pyupgrade: v3.18.0 → v3.19.0](asottile/pyupgrade@v3.18.0...v3.19.0)
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    pre-commit-ci[bot] and m1n0 authored Nov 28, 2024
    067c535
  3. Add include null to valid_count and invalid_count and percentage version. (#2186)
    
    * Add include null to valid_count and invalid_count and percentage version.
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Fix missing configuration.
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    3 people authored Nov 28, 2024
    d030911
  4. 9d4b349

Commits on Dec 2, 2024

  1. Bump to 3.4.3

    m1n0 committed Dec 2, 2024
    5021b74

Commits on Jan 6, 2025

  1. e319d37
  2. 5c41aa3

Commits on Jan 7, 2025

  1. Bump to 3.4.4

    jzalucki committed Jan 7, 2025
    6f169d4

Commits on Feb 4, 2025

  1. 53686db
  2. f1ec392
  3. ISO 8601 date should accept 24-hr times (#2133)

    * ISO 8601 date should accept 24-hr times
    
    * Adjust mysql iso 8601 date regex
    
    ---------
    
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    pholser and m1n0 authored Feb 4, 2025
    7df8e84
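
The ISO 8601 change in #2133 concerns how time-of-day is validated. As a rough illustration only, the sketch below uses a simplified, hypothetical pattern (not the regex that soda-core ships, and not the MySQL variant mentioned in the commit) and assumes the intent is that every hour from 00 through 23 validates. The names `ISO_8601_DATETIME` and `is_iso_8601` are made up for this example.

```python
import re

# Hypothetical, simplified pattern for illustration; NOT the soda-core regex.
ISO_8601_DATETIME = re.compile(
    r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"  # date: YYYY-MM-DD
    r"T([01]\d|2[0-3]):[0-5]\d:[0-5]\d"             # time: 00:00:00 to 23:59:59
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)?"             # optional UTC designator or offset
)


def is_iso_8601(value: str) -> bool:
    """Return True when the whole string matches the simplified pattern."""
    return ISO_8601_DATETIME.fullmatch(value) is not None
```

With this pattern, afternoon and evening values such as `2025-02-04T13:05:00+01:00` or `2025-02-04T23:59:59Z` validate, while an out-of-range hour such as `25` does not.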

Commits on Feb 20, 2025

  1. Fix dask COUNT queries on windows (#2210)

    This commit fixes an issue in the dask cursor that would cause it to
    return `None` instead of `0` when performing a COUNT() query on Windows.
    `dask-sql` returns an empty data frame when there are no matches. This
    is normally translated to 0 in the DaskCursor class when using fetchone
    with a numeric result.
    
    Unfortunately the logic to detect a numeric result did not behave
    correctly on Windows. The dtype used is int64, which on linux appears to
    compare equal to int but does not compare equal on Windows.
    
    This commit uses `np.issubdtype` to have a more robust check for integer
    types.
    mivds authored Feb 20, 2025
    08e9ef8
  2. bcbf959
  3. Add support for Apache Impala (#2191)

    * add impala
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Update setup.py
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    3 people authored Feb 20, 2025
    b224ff8
  4. Remove unused MarkupSafe dependency (#2103) (#2200)

    MarkupSafe doesn't seem to be used anywhere in the
    code and the required version is outdated and
    creates dependency conflicts with other libraries.
    
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    ghjklw and m1n0 authored Feb 20, 2025
    eacd1e3
  5. Release opentelemetry version constraint (#2192) (#2199)

    Release soda-core version constraint for
    opentelemetry as there is no reason to support
    Python 3.7 and this is creating dependency
    conflicts with many other libraries.
    
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    ghjklw and m1n0 authored Feb 20, 2025
    17e2c38
  6. [pre-commit.ci] pre-commit autoupdate (#2195)

    updates:
    - [github.com/asottile/pyupgrade: v3.19.0 → v3.19.1](asottile/pyupgrade@v3.19.0...v3.19.1)
    - [github.com/PyCQA/isort: 5.13.2 → 6.0.0](PyCQA/isort@5.13.2...6.0.0)
    - [github.com/psf/black: 24.10.0 → 25.1.0](psf/black@24.10.0...25.1.0)
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    pre-commit-ci[bot] authored Feb 20, 2025
    11756bf
  7. 3a019a2
  8. Add session token parameter to athena data source connection (#2095)

    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    ruixuantan and m1n0 authored Feb 20, 2025
    405054b
  9. Postgres: retry connection if unsupported options parameter (#2176)

    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    RekunDzmitry and m1n0 authored Feb 20, 2025
    b4be075
  10. Duckdb: loosen duckdb to < 1.1.0 (#2162)

    * Loosening duckdb dependency to less than 1.1.0
    
    * Update setup.py
    
    * Update setup.py
    
    * Update setup.py
    
    ---------
    
    Co-authored-by: Milan Lukac <m1n0@users.noreply.github.com>
    tombaeyens and m1n0 authored Feb 20, 2025
    f29c312
  11. Bump to 3.5.0

    m1n0 committed Feb 20, 2025
    1a631fc
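
The dask fix in #2210 replaces an equality comparison against `int` with `np.issubdtype`, because an `int64` dtype does not compare equal to `int` on every platform. The sketch below illustrates that check in isolation. The helper names `is_integer_dtype` and `count_or_zero` are hypothetical and are not taken from the `DaskCursor` code; only `np.issubdtype` and `np.integer` are real NumPy names.

```python
import numpy as np


def is_integer_dtype(dtype) -> bool:
    """Platform-independent integer check: True for any signed or unsigned integer dtype."""
    return bool(np.issubdtype(dtype, np.integer))


def count_or_zero(rows, dtype):
    """Hypothetical stand-in for the fetchone logic described in the commit message.

    An empty result for an integer-typed column is reported as 0 instead of None.
    """
    if len(rows) == 0:
        return 0 if is_integer_dtype(dtype) else None
    return rows[0]
```

Because `np.issubdtype` tests membership in NumPy's type hierarchy rather than equality with the platform's default integer, the same result is obtained for `int32` and `int64` regardless of operating system.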
Showing with 2,257 additions and 449 deletions.
  1. +3 −0 .env.example
  2. +4 −4 .pre-commit-config.yaml
  3. +1 −1 CONTRIBUTING.md
  4. +6 −1 README.md
  5. +1 −1 dev-requirements.txt
  6. +16 −1 docs/configuration.md
  7. +485 −0 docs/data-contracts-language.md
  8. +9 −5 docs/installation.md
  9. +2 −0 pytest.ini
  10. +2 −0 requirements.txt
  11. +1 −1 soda/athena/setup.py
  12. +1 −0 soda/athena/soda/data_sources/athena_data_source.py
  13. +1 −1 soda/atlan/setup.py
  14. +1 −1 soda/bigquery/setup.py
  15. +1 −1 soda/contracts/setup.py
  16. +1 −3 soda/contracts/soda/contracts/contract_verification.py
  17. +0 −5 soda/contracts/soda/contracts/impl/contract_data_source.py
  18. +0 −47 soda/contracts/soda/contracts/impl/sql_dialect.py
  19. +53 −3 soda/contracts/tests/contracts/helpers/contract_data_source_test_helper.py
  20. +3 −4 soda/core/setup.py
  21. +1 −1 soda/core/soda/__version__.py
  22. +6 −3 soda/core/soda/common/jinja.py
  23. +10 −10 soda/core/soda/common/yaml_helper.py
  24. +2 −2 soda/core/soda/execution/check/metric_check.py
  25. +11 −0 soda/core/soda/execution/check/row_count_comparison_check.py
  26. +3 −1 soda/core/soda/execution/check/user_defined_failed_rows_check.py
  27. +1 −1 soda/core/soda/execution/data_source.py
  28. +1 −1 soda/core/soda/execution/metric/derived_metric.py
  29. +19 −4 soda/core/soda/execution/metric/numeric_query_metric.py
  30. +18 −7 soda/core/soda/sodacl/antlr/SodaCLAntlr.g4
  31. +8 −8 soda/core/soda/sodacl/antlr/SodaCLAntlr.interp
  32. +13 −13 soda/core/soda/sodacl/antlr/SodaCLAntlr.tokens
  33. +12 −12 soda/core/soda/sodacl/antlr/SodaCLAntlrLexer.interp
  34. +179 −173 soda/core/soda/sodacl/antlr/SodaCLAntlrLexer.py
  35. +13 −13 soda/core/soda/sodacl/antlr/SodaCLAntlrLexer.tokens
  36. +1 −1 soda/core/soda/sodacl/antlr/SodaCLAntlrListener.py
  37. +36 −36 soda/core/soda/sodacl/antlr/SodaCLAntlrParser.py
  38. +1 −1 soda/core/soda/sodacl/antlr/SodaCLAntlrVisitor.py
  39. +2 −0 soda/core/soda/sodacl/check_cfg.py
  40. +10 −0 soda/core/soda/sodacl/missing_and_valid_cfg.py
  41. +10 −2 soda/core/soda/sodacl/sodacl_parser.py
  42. +1 −1 soda/core/tests/data_source/test_bug_double_metric_computation.py
  43. +1 −1 soda/core/tests/data_source/test_data_source_specific_aggregation_functions.py
  44. +2 −2 soda/core/tests/data_source/test_distribution_check.py
  45. +26 −0 soda/core/tests/data_source/test_for_each_dataset.py
  46. +14 −2 soda/core/tests/data_source/test_formats.py
  47. +4 −4 soda/core/tests/data_source/test_freshness.py
  48. +25 −0 soda/core/tests/data_source/test_group_evolution.py
  49. +38 −2 soda/core/tests/data_source/test_invalid.py
  50. +2 −2 soda/core/tests/data_source/test_metric_check_filter.py
  51. +2 −2 soda/core/tests/data_source/test_numerical_metric_checks_on_text_columns.py
  52. +1 −1 soda/core/tests/data_source/test_percentage_metrics.py
  53. +18 −0 soda/core/tests/data_source/test_reference_check.py
  54. +31 −0 soda/core/tests/data_source/test_row_count_comparison.py
  55. +2 −2 soda/core/tests/data_source/test_table_filter.py
  56. +18 −0 soda/core/tests/data_source/test_templates.py
  57. +28 −1 soda/core/tests/data_source/test_user_defined_metric_checks.py
  58. +7 −0 soda/core/tests/helpers/test_scan.py
  59. +1 −1 soda/dask/setup.py
  60. +1 −1 soda/dask/soda/data_sources/dask_cursor.py
  61. +19 −0 soda/dask/tests/test_dask.py
  62. +1 −1 soda/db2/setup.py
  63. +1 −1 soda/dbt/setup.py
  64. +1 −1 soda/denodo/setup.py
  65. +1 −1 soda/dremio/setup.py
  66. +2 −2 soda/duckdb/setup.py
  67. +201 −0 soda/fabric/LICENSE
  68. +16 −0 soda/fabric/setup.py
  69. +43 −0 soda/fabric/soda/data_sources/fabric_data_source.py
  70. +20 −0 soda/fabric/tests/fabric_data_source_fixture.py
  71. +2 −0 soda/fabric/tests/test_fabric.py
  72. +201 −0 soda/impala/LICENSE
  73. +16 −0 soda/impala/setup.py
  74. +125 −0 soda/impala/soda/data_sources/impala_data_source.py
  75. +74 −0 soda/impala/tests/docker/config/hive-site.xml
  76. +91 −0 soda/impala/tests/docker/docker-compose.yaml
  77. +32 −0 soda/impala/tests/impala_data_source_fixture.py
  78. +2 −0 soda/impala/tests/test_impala.py
  79. +29 −0 soda/impala/tests/test_impala_connection.py
  80. +1 −1 soda/mysql/setup.py
  81. +1 −1 soda/mysql/soda/data_sources/mysql_data_source.py
  82. +1 −1 soda/oracle/setup.py
  83. +1 −1 soda/postgres/setup.py
  84. +24 −10 soda/postgres/soda/data_sources/postgres_data_source.py
  85. +1 −1 soda/redshift/setup.py
  86. +1 −1 soda/scientific/setup.py
  87. +1 −1 soda/snowflake/setup.py
  88. +1 −1 soda/spark/setup.py
  89. +1 −1 soda/spark_df/setup.py
  90. +2 −5 soda/sqlserver/setup.py
  91. +163 −25 soda/sqlserver/soda/data_sources/sqlserver_data_source.py
  92. +1 −1 soda/teradata/setup.py
  93. +1 −1 soda/trino/setup.py
  94. +1 −1 soda/vertica/setup.py
  95. +9 −1 tbump.toml
3 changes: 3 additions & 0 deletions .env.example
@@ -51,3 +51,6 @@ CONTRACTS_POSTGRES_PASSWORD=***
 CONTRACTS_POSTGRES_DATABASE=***

 ATLAN_API_KEY=***
+
+FABRIC_ENDPOINT=***
+FABRIC_DWH=***
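
The two `FABRIC_*` variables added here pair naturally with the `type: fabric` data source block that this comparison adds to `docs/configuration.md`. The sketch below shows one way such values might be rendered into that block. It rests on several assumptions: that `FABRIC_ENDPOINT` maps to `host` and `FABRIC_DWH` to `database`, and that `FABRIC_CLIENT_ID` and `FABRIC_CLIENT_SECRET` exist (those two names are hypothetical and do not appear in `.env.example`). The function `render_fabric_config` is likewise made up for illustration and is not part of soda-core.

```python
from string import Template
from typing import Mapping

# Mirrors the keys of the Fabric block in docs/configuration.md; values are placeholders.
FABRIC_CONFIG_TEMPLATE = Template("""\
data_source $name:
  type: fabric
  host: $host
  database: $database
  schema: $schema
  driver: ODBC Driver 18 for SQL Server
  client_id: $client_id
  client_secret: $client_secret
  encrypt: True
  authentication: $authentication
""")


def render_fabric_config(
    env: Mapping[str, str], *, name: str, schema: str, authentication: str
) -> str:
    """Render a configuration.yml fragment; raises KeyError if a variable is missing."""
    return FABRIC_CONFIG_TEMPLATE.substitute(
        name=name,
        host=env["FABRIC_ENDPOINT"],              # assumed mapping: endpoint -> host
        database=env["FABRIC_DWH"],               # assumed mapping: warehouse -> database
        schema=schema,
        client_id=env["FABRIC_CLIENT_ID"],        # hypothetical variable name
        client_secret=env["FABRIC_CLIENT_SECRET"],  # hypothetical variable name
        authentication=authentication,
    )
```

Calling `render_fabric_config(os.environ, name="my_data_source_name", schema="xxx", authentication="xxx")` would produce text suitable for a `configuration.yml` file. The configuration docs also describe passing sensitive values as system variables, which avoids writing secrets to disk at all.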
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -4,7 +4,7 @@ files: ^soda/
 exclude: antlr/
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
+    rev: v5.0.0
     hooks:
       - id: trailing-whitespace
       - id: check-added-large-files
@@ -24,19 +24,19 @@ repos:
       - id: autoflake
         args: ["--in-place", "--remove-all-unused-imports"]
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.17.0
+    rev: v3.19.1
     hooks:
       - id: pyupgrade
         exclude: _models?\.py$
         args: [--py38-plus, --keep-runtime-typing]
   - repo: https://github.com/PyCQA/isort
-    rev: 5.13.2
+    rev: 6.0.0
     hooks:
       - id: isort
         additional_dependencies: [toml]
         name: Sort imports using isort
   - repo: https://github.com/psf/black
-    rev: 24.8.0
+    rev: 25.1.0
     hooks:
       - id: black
         name: Run black formatter
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -162,7 +162,7 @@ I believe this will launch separate test container(s), even if you already have

 ## CI

-CI is configured in `.github/workflows/workflow.yml`
+CI is configured in `.github/workflows/pr.workflow.yml`

 The secrets used in that file are configured in GitHub: [https://github.com/sodadata/soda-core/settings/secrets/actions](https://github.com/sodadata/soda-core/settings/secrets/actions)

7 changes: 6 additions & 1 deletion README.md
@@ -7,8 +7,13 @@
<a href="https://join.slack.com/t/soda-community/shared_invite/zt-m77gajo1-nXJF7JtbbRht2zwaiLb9pg"><img alt="Slack" src="https://img.shields.io/badge/chat-slack-green.svg"></a>
<a href="#"><img src="https://static.pepy.tech/personalized-badge/soda-core?period=total&units=international_system&left_color=black&right_color=green&left_text=Downloads"></a>
</p>
<br />

<hr />

> [!IMPORTANT]
> **🚀 We're hiring! Are you passionate about open-source and love working on projects like Soda Core? Join our team as a Software Engineer and help shape the future of data quality tools. [Apply now!](https://careers.soda.io/o/software-engineer-data-testing-python-data-engineering-mediorsenior?source=gh-core)**
<hr />

&#10004; An open-source, CLI tool and Python library for data quality testing<br />
&#10004; Compatible with the <a href="https://docs.soda.io/soda-cl/soda-cl-overview.html" target="_blank">Soda Checks Language (SodaCL)</a> <br />
2 changes: 1 addition & 1 deletion dev-requirements.txt
@@ -175,4 +175,4 @@ zipp==3.19.2

 # The following packages are considered to be unsafe in a requirements file:
 # pip
-# setuptools
+# setuptools
17 changes: 16 additions & 1 deletion docs/configuration.md
@@ -15,7 +15,7 @@ Alternatively, you can provide data source connection configurations in the cont
 1. Soda Core connects with Spark DataFrames in a unique way, using programmtic scans.
 * If you are using Spark DataFrames, follow the configuration details in [Connect to Spark DataFrames](https://docs.soda.io/soda/connect-spark.html#connect-to-spark-dataframes).
 * If you are *not* using Spark DataFrames, continue to step 2.
-2. Create a `configuration.yml` file. This file stores connection details for your data sources. Use the data source-specific connection configurations listed below to copy+paste the connection syntax into your file, then adjust the values to correspond with your data source's details. You can use [system variables](#provide-credentials-as-system-variables) to pass sensitive values, if you wish. Access connection details in [Connect a data source](https://docs.soda.io/soda/connect-athena.html) section of Soda documentation.
+2. Create a `configuration.yml` file. This file stores connection details for your data sources. Use the data source-specific connection configurations listed below to copy+paste the connection syntax into your file, then adjust the values to correspond with your data source's details. You can use [system variables](#provide-credentials-as-system-variables) to pass sensitive values, if you wish. Access connection details in [Connect a data source](https://docs.soda.io/soda/connect-athena.html) section of Soda documentation; see below for MS Fabric connection config as it is only supported in Soda Core.
 3. Save the `configuration.yml` file, then create another new YAML file named `checks.yml`.
 4. A Soda Check is a test that Soda Core performs when it scans a dataset in your data source. The checks YAML file stores the Soda Checks you write using [SodaCL](https://docs.soda.io/soda-cl/soda-cl-overview.html). Copy+paste the following basic check syntax in your file, then adjust the value for `dataset_name` to correspond with the name of one of the datasets in your data source.
 ```yaml
@@ -25,6 +25,21 @@ Alternatively, you can provide data source connection configurations in the cont
 5. Save the changes to the `checks.yml` file.
 6. Next: [run a scan](/docs/scan-core.md) of the data in your data source.

+#### MS Fabric connection configuration
+
+To your `configuration.yml` file, add the following.
+```yaml
+data_source my_data_source_name:
+  type: fabric
+  host: xxx
+  database: xxx
+  schema: xxx
+  driver: ODBC Driver 18 for SQL Server
+  client_id: xxx
+  client_secret: xxx
+  encrypt: True
+  authentication: xxx
+```

 ## Provide credentials as system variables
