0.21.0 - 2022-03-03
- Add MDC to the `LoggingMdcFilter` to include API method, path, and request ID @fm100
- Add Postgres sub-chart to Helm deployment for an easier installation option @KevinMellott91
- GitHub Action workflow to validate changes to Helm chart @KevinMellott91
- Upgrade from Java 11 to Java 17 @ucg8j
- Switch JDK image from `alpine` to `temurin`, enabling Marquez to run on multiple CPU architectures @ucg8j
- Fixed an error when running Marquez on Apple M1 @ucg8j
- Removed the `/api/v1-beta/lineage` endpoint @wslulciuc
- The `marquez-airflow` lib. has been removed. Please use the `openlineage-airflow` library instead. To migrate to using `openlineage-airflow`, make the following changes @wslulciuc:

      # Update the import in your DAG definitions
      -from marquez_airflow import DAG
      +from openlineage.airflow import DAG

      # Update the following environment variables in your Airflow instance
      -MARQUEZ_URL
      +OPENLINEAGE_URL
      -MARQUEZ_NAMESPACE
      +OPENLINEAGE_NAMESPACE

- The `marquez-spark` lib. has been removed. Please use the `openlineage-spark` library instead. To migrate to using `openlineage-spark`, make the following changes @wslulciuc:

        SparkSession.builder()
      -   .config("spark.jars.packages", "io.github.marquezproject:marquez-spark:0.20.+")
      +   .config("spark.jars.packages", "io.openlineage:openlineage-spark:0.2.+")
      -   .config("spark.extraListeners", "marquez.spark.agent.SparkListener")
      +   .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
          .config("spark.openlineage.host", "https://api.demo.datakin.com")
          .config("spark.openlineage.apiKey", "your datakin api key")
          .config("spark.openlineage.namespace", "<NAMESPACE_NAME>")
          .getOrCreate()

0.20.0 - 2021-12-13
- Add deploy docs for running Marquez on AWS @wslulciuc @merobi-hub
- Clarify docs on using OpenLineage for metadata collection @fm100
- Upgrade to gradle `7.x` @wslulciuc
- Use `eclipse-temurin` for the Marquez API base docker image @fm100
- The following endpoints have been deprecated and are scheduled to be removed in `0.25.0`. Please use the `/lineage` endpoint when collecting source, dataset, and job metadata @wslulciuc.
- Validation of OpenLineage events on write @collado-mike
- Increase `name` column size for tables `namespaces` and `sources` @mmeasic
0.19.1 - 2021-11-05
- URI and URL DB mapper should handle empty string as null @OleksandrDvornik
- Fix `NodeId` parsing when dataset name contains `struct<>` @fm100
- Add encoding for dataset names in URL construction @collado-mike
0.19.0 - 2021-10-21
- Add simple python client example @wslulciuc
- Display dataset versions in web UI 🎉 @phixMe
- Display runs and run facets in web UI 🎉 @phixMe
- Facet formatting and highlighting as JSON in web UI @phixMe
- Add option for `docker/up.sh` to run in the background @rossturk
- Return `totalCount` in lists of jobs and datasets @phixMe
- Change type column in `dataset_fields` table to `TEXT` @wslulciuc
- Set `ZonedDateTime` parsing to support optional offsets and default to server timezone @collado-mike
- `Job.location` and `Source.connectionUrl` should be in URI format on write @OleksandrDvornik
- Z-index fix for nodes and edges in lineage graph @phixMe
- Fix format of the index files for web UI @phixMe
- Fix OpenLineage API to return correct response codes for exceptions propagated from async calls @collado-mike
- Stopped overwriting nominal time information with nulls @mobuchowski
- Removed `WriteOnly` clients for `java` and `python`. Before OpenLineage, we added a `WriteOnly` implementation to our clients to emit calls to a backend. A `backend` enabled collecting raw HTTP requests to an HTTP endpoint, console, or file. This was our way of capturing lineage events that could then be used to automatically create resources on the Marquez backend. We soon worked on a standard that eventually became OpenLineage. That is, OpenLineage removes the need to make individual calls to create a namespace, a source, a dataset, etc.; instead, the backend accepts a single event with metadata that it can process. @wslulciuc
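For context only, a minimal sketch (not the official client API) of what metadata collection looks like in the OpenLineage model: a single run event posted to Marquez's `/api/v1/lineage` endpoint instead of separate namespace/source/dataset calls. The base URL, namespace, job, and dataset names below are hypothetical placeholders.

```python
# Sketch only: emit one OpenLineage RunEvent to Marquez instead of individual
# create-namespace/create-source/create-dataset calls. All names and the URL
# below are hypothetical placeholders.
import uuid
from datetime import datetime, timezone

import requests

MARQUEZ_URL = "http://localhost:5000"  # assumption: adjust to your deployment

event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-scheduler",          # hypothetical producer URI
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "my-namespace", "name": "my-job"},  # hypothetical names
    "inputs": [{"namespace": "my-namespace", "name": "raw_table"}],
    "outputs": [{"namespace": "my-namespace", "name": "clean_table"}],
}

resp = requests.post(f"{MARQUEZ_URL}/api/v1/lineage", json=event, timeout=10)
resp.raise_for_status()
```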
0.18.0 - 2021-09-14
- New: Add Search API 🎉 @wslulciuc
- Add `.env.example` to override variables defined in docker-compose files @wslulciuc
- Add `openlineage-java` as a dependency @OleksandrDvornik
- Move class `SentryConfig` from `marquez` to `marquez.tracing` pkg
- Major UI improvements; the UI now uses the Search and Lineage APIs 🎉 @phixMe
- Set default API port to `8080` when running the Marquez shadow `jar` @wslulciuc
- Update `examples/airflow` to use `openlineage-airflow` and fix the SQL in the DAG troubleshooting step @wslulciuc
- Drop `job_versions_io_mapping_inputs` and `job_versions_io_mapping_outputs` tables @OleksandrDvornik
0.17.0 - 2021-08-20
- Update Lineage runs query to improve performance and add tests @collado-mike
- Add POST `/api/v1/lineage` endpoint to docs and deprecate run endpoints @wslulciuc
- Drop `FieldType` enum @wslulciuc
- Run API endpoints that create or modify a job run are deprecated (scheduled to be removed in `0.19.0`). Please use the POST `/api/v1/lineage` endpoint when collecting job run metadata. @wslulciuc
- The Airflow integration is deprecated; please use the `openlineage-airflow` library instead. @wslulciuc
- The Spark integration is deprecated; please use the `openlineage-spark` library instead. @wslulciuc
- Write-only clients for `java` and `python` are deprecated (scheduled to be removed in `0.19.0`) @wslulciuc
- Dbt integration lib. @wslulciuc
- Common integration lib. @wslulciuc
0.16.1 - 2021-07-13
- dbt packages should look for namespace packages @mobuchowski
- Add common integration dependency to dbt plugins @mobuchowski
- `DatasetVersionDao` queries missing input and output facets @dominiquetipton
- (De)serialization issue for `Run` and `JobData` models @collado-mike
- Prefix spark `openlineage.*` configuration parameters with `spark.*` @collado-mike
- Parse multi-statement SQL in class `SqlParser` used in Airflow integration @wslulciuc
- URL-encode namespace on calls to API backend @phixMe
0.16.0 - 2021-07-01
- New: Add JobVersion API 🎉 @collado-mike
- New: Add dbt integrations for BigQuery and Snowflake 🎉 @mobuchowski
- Reverted delete of BigQueryNodeVisitor to work with vanilla SparkListener @collado-mike
- Promote Lineage API out of beta @OleksandrDvornik
- Display job SQL in UI @phixMe
- Allow upsert of tags @hanbei
- Allow potentially ambiguous URIs with encoded path segments @mobuchowski
- Use source naming convention defined by OpenLineage @mobuchowski
- Return dataset facets @collado-mike
- BigQuery source naming in integrations @mobuchowski
0.15.2 - 2021-06-17
- Add endpoint to create tags @hanbei
- Fixed build & release process for python marquez-integration-common package @collado-mike
- Fixed snowflake and bigquery errors when connector libraries not loaded @collado-mike
- Fixed OpenLineage API not setting dataset `current_version_uuid` #1361 @collado-mike
0.15.1 - 2021-06-11
- Factored out common functionality in Python airflow integration @mobuchowski
- Added Airflow task run macro to expose task run id @collado-mike
- Refactored ValuesAverageExpectationParser to ValuesSumExpectationParser and ValuesCountExpectationParser @collado-mike
- Updated SparkListener to extend Spark's SparkListener abstract class @collado-mike
- Use current project version in spark openlineage client @mobuchowski
- Rewrote LineageDao queries and LineageService for performance @collado-mike
- Updated lineage query to include new jobs that have no job version yet @collado-mike
0.15.0 - 2021-05-24
- Add tracing visibility @julienledem
- New: Add Snowflake extractor 🎉 @mobuchowski
- Add SSLContext to MarquezClient @lewiesnyder
- Add support for LogicalRDDs in spark plan visitors @collado-mike
- New: Add Great Expectations-based data quality facet support 🎉 @mobuchowski
- Augment tutorial instructions & screenshots for Airflow example @rossturk
- Rewrite correlated subqueries when querying the lineage_events table @collado-mike
- Web time formatting display fix @kachontep
0.14.2 - 2021-05-06
- Unpin `requests` dep in `marquez-airflow` integration @wslulciuc
- Unpin `attrs` dep in `marquez-airflow` integration @wslulciuc
0.14.1 - 2021-05-05
- Updated dataset lineage query to find most recent job that wrote to it @collado-mike
- Pin http-proxy-middleware to 0.20.0 @wslulciuc
0.14.0 - 2021-05-03
- GA tag for website tracking @rossturk
- Basic CTE support in `marquez-airflow` @mobuchowski
- Airflow custom facets, BigQuery statistics facets @mobuchowski
- Unit tests for class `JobVersionDao` @wslulciuc
- Sentry tracing support @julienledem
- OpenLineage facets support in API response models 🎉 @wslulciuc
- Added `BigQueryRelationTransformer` and deleted `BigQueryNodeVisitor` @collado-mike
- Bump postgres to `12.1.0` @wslulciuc
- Update spark job name to reflect spark application name and execution node @collado-mike
- Update `marquez-airflow` integration to use OpenLineage 🎉 @mobuchowski
- Migrate tests to JUnit 5 @mobuchowski
- Rewrite lineage IO SQL queries to avoid `job_versions_io_mapping_*` tables @collado-mike
- Updated OpenLineage impl to only update dataset version on run completion @collado-mike
0.13.1 - 2021-04-01
- Remove unused implementation of SQL parser in `marquez-airflow` @mobuchowski
- Add inputs and outputs to lineage graph @henneberger
- Updated `NodeId` regex to support URIs with scheme and ports @collado-mike
0.13.0 - 2021-03-30
- Secret support for helm chart @KevinMellott91
- New `seed` cmd to populate the `marquez` database with source, dataset, and job metadata, allowing users to try out features of Marquez (data lineage, viewing job run history, etc.) 🎉
- Docs on applying db migrations manually
- New Lineage API to support data lineage queries 🎉
- Support for logging errors via sentry
- New Airflow example with Marquez 🎉
- Update `OpenLineageDao` to stop converting URI structures to contain underscores instead of colons and slashes @collado-mike
- Bump testcontainers dependency to `v1.15.2` @ShakirzyanovArsen
- Register output datasets for a run lazily @henneberger
- Refactor spark plan traversal to find input/output datasets from datasources @collado-mike
- Web UI project settings and default marquez port @phixMe
- Associate dataset inputs on run start @henneberger
- Dataset description is not overwritten on update @henneberger
- Latest tags are returned from dataset @henneberger
- Airflow integration tests on forked PRs @mobuchowski
- Empty nominal end time support @henneberger
- Ensure valid dataset fields for OpenLineage @henneberger
- Ingress context templating for helm chart @KulykDmytro
0.12.2 - 2021-03-16
- Use alpine image for `marquez`, reducing image size by `+50%` @KevinMellott91
- Use alpine image for `marquez-web`, reducing image size by `+50%` @KevinMellott91
- Ensure `marquez.DAG` is (de)serializable
0.12.0 - 2021-02-08
- Modules: `api`, `web`, `clients`, `chart`, and `integrations`
- Working airflow example
- `runs` table indices for columns `created_at` and `current_run_state` @phixMe
- New `/lineage` endpoint for OpenLineage support @henneberger
- New GraphQL endpoint @henneberger
- New spark integration @henneberger
- New API to list versions for a dataset
- Drop `Source.type` enum (now a string type)
- Replace `jdbi.getHandle()` with `jdbi.withHandle()` to free DB connections from pool @henneberger
- Fix `RunListener` when registering outside of the `MarquezContext` builder @henneberger
0.11.3 - 2020-11-02
- Add support for external ID on run creation @julienledem
- Throw `RunAlreadyExistsException` when a run ID already exists
- Add BigQuery, Pulsar, and Oracle source types @sreev
- Add run ID support in job meta; the optional run ID will be used to link a newly created job version to an existing job run, while supporting updating the run state and avoiding having to create another run
- Use `postgres` instead of `db` in `marquez.dev.yml`
- Allow multiple postgres containers in test suite @phixMe
0.11.2 - 2020-08-21
- Always migrate db schema on app start in development config
- Update default db username / password
- Use `marquez.dev.yml` on docker compose `up`
0.11.1 - 2020-08-19
- Use shortened names for namespaces in version IDs
- Add namespace to Dataset and Job models
- Add ability to deserialize `int` type to columns @phixMe
- Add `SqlLogger` for SQL profiling
- Add `DatasetVersionId.asDatasetId()` and `JobVersionId.asJobId()`
- Add `DatasetService.getBy(DatasetVersionId): Dataset`
- Add `JobService.getBy(JobVersionId): Job`
- Allow for run transition override via `at=<TIMESTAMP>`, where `TIMESTAMP` is an ISO 8601 timestamp representing the date/time of the state transition. For example: `POST /jobs/runs/{id}/start?at=<TIMESTAMP>`
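For illustration, a minimal sketch of such a request (hypothetical base URL and run ID; assumes the standard `/api/v1` base path and the Python `requests` library):

```python
# Sketch only: start a run at an explicit transition time by passing `at=<TIMESTAMP>`.
# The base URL and run ID are hypothetical placeholders.
from datetime import datetime, timezone

import requests

MARQUEZ_URL = "http://localhost:5000"            # assumption: adjust to your deployment
RUN_ID = "3f1a7c1e-0000-0000-0000-000000000000"  # hypothetical run ID

# ISO 8601 timestamp for the state transition, passed as the `at` query parameter.
at = datetime.now(timezone.utc).isoformat()

resp = requests.post(f"{MARQUEZ_URL}/api/v1/jobs/runs/{RUN_ID}/start", params={"at": at}, timeout=10)
resp.raise_for_status()
```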
- `config.yml` -> `marquez.yml`
- Fix dataset version column mappings
0.11.0 - 2020-05-27
- `Run.startedAt`, `Run.endedAt`, `Run.duration` @julienledem
- class `MarquezContext` @julienledem
- class `RunTransitionListener` @julienledem
- Unique identifier class `DatasetId` for datasets @julienledem
- Unique identifier class `JobId` for jobs @julienledem
- class `RunId` @ravikamaraj
- enum `RunState` @ravikamaraj
- class `Version` @ravikamaraj
- Job inputs / outputs are defined as `DatasetId`
- Bump to JDK 11
- Use of API models under `marquez.api.models` pkg
- API docs example to show correct `SQL` key in job context @frankcash
0.10.4 - 2020-01-17
- Fix `RunState.isComplete()`
0.10.3 - 2020-01-17
- Add new logo
- Add `JobResource.locationFor()`
- Fix dataset field versioning
- Fix list job runs
0.10.2 - 2020-01-16
- Added Location header to run creation @nkijak
0.10.1 - 2020-01-11
- Rename `datasets.last_modified`
0.10.0 - 2020-01-08
- Rename table `dataset_tag_mapping`
0.9.2 - 2020-01-07
- Add `Flyway.baselineOnMigrate` flag
0.9.1 - 2020-01-06
- Add redshift data types
- Add links to dropwizard overrides in `config.yml`
0.9.0 - 2020-01-05
- Validate `runID` when linked to dataset change
- Add `Utils.toUuid()`
- Add tests for class `TagDao`
- Add default tags to config
- Add tagging support for dataset fields
- Add `docker/config.dev.yml`
- Add flyway config support
- Replace deprecated `App.onFatalError()`
- Fix error when tag already exists
- Fix malformed SQL in `RunDao.findAll()`
0.8.0 - 2019-12-12
- Add `Dataset.lastModified`
- Add `tags` table schema
- Add `GET /tags`
- Use new Flyway version to fix migration with custom roles
- Modify `args` column in table `run_args`
0.7.0 - 2019-12-05
- Link dataset versions with run inputs
- Add schema required by tagging
- More tests for class `common.Utils`
- Add `ColumnsTest`
- Add `RunDao.insert()`
- Add `RunStateDao.insert()`
- Add `METRICS.md`
- Add prometheus dep and expose `GET /metrics`
- Fix dataset field serialization
0.6.0 - 2019-11-29
- Add `Job.latestRun`
- Add debug logging
- Adjust class RunResponse property ordering on serialization
- Update logging on default namespace creation
0.5.1 - 2019-11-20
- Add dataset field versioning support
- Add link to web UI
- Add `Job.context`
- Update semver regex in build-and-push.sh
- Minor updates to job and dataset versioning functions
- Make `Job.location` optional
0.5.0 - 2019-11-04
- Add `lombok.config`
- Add code review guidelines
- Add `JobType`
- Add limit and offset support to NamespaceAPI
- Add Development section to `CONTRIBUTING.md`
- Add class `DatasetMeta`
- Add class `MorePreconditions`
- Added install instructions for docker
- Rename guid column to uuid
- Use admin ping and health
- Update `owner` to `ownerName`
- Remove experimental db table versioning code
- Fix `marquez.jar` rename on `COPY`
0.4.0 - 2019-06-04
- Add quickstart
- Add `GET /namespaces/{namespace}/jobs/{job}/runs`
0.3.4 - 2019-05-17
- Change `Datasetdao.findAll()` to order by `Dataset.name`
0.3.3 - 2019-05-14
- Set timestamps to `CURRENT_TIMESTAMP`
0.3.2 - 2019-05-14
- Set `job_versions.updated_at` to `CURRENT_TIMESTAMP`
0.3.1 - 2019-05-14
- Handle `Flyway.repair()` error
0.3.0 - 2019-05-14
- Add `JobResponse.updatedAt`
- Return timestamp strings as ISO format
- Remove unused tables in db schema
0.2.1 - 2019-04-22
- Support dashes (`-`) in namespace
0.2.0 - 2019-04-15
- Add `@NoArgsConstructor` to exceptions
- Add license to `*.java`
- Add column constants
- Add response/error metrics to API endpoints
- Add build info to jar manifest
- Add release steps and plugin
- Add `/jobs/runs/{id}/run`
- Add gitter link
- Add column constants
- Add `MarquezServiceException`
- Add `-parameters` compiler flag
- Add JSON logging support
- Minor pkg restructuring
- Throw `NamespaceNotFoundException` on `NamespaceResource.get()`
- Fix dataset list error
0.1.0 - 2018-12-18
- Marquez initial public release.