docs: Add content on validation and additional checks #65
Open: duncandewhurst wants to merge 2 commits into live from 27-validation-and-additional-checks
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,36 @@ | ||
# Validator and Quality Tool | ||
|
||
## Summary | ||
A data validator and quality tool checks that data conforms to a standard, providing both pass/fail validation against the standard's schema and codelists, and additional checks on data quality, coverage, and adherence to best practices. | ||
|
||
A validator provides a report on the technical validity of data against the schema, and feedback on the content of datasets based on a set of machine- and human-readable data quality rules. | ||
Implementers can use a validator to get feedback on the quality of their draft and published data. They can also integrate validation into their data publication pipelines. Data users can use a validator to identify data quality issues that might impact their analysis. Similarly, data registries can incorporate validation results to provide a summary of quality issues in each dataset. Furthermore, support staff can use a validator to provide feedback and guidance to implementers. | ||
|
||
## Description | ||
To cater to different audiences, validators can offer various interfaces. For example, a validator can provide a user-friendly web application for implementers to upload data and receive immediate feedback, a command-line tool for developers to run local checks, and a software library that developers can embed within their data pipelines. | ||
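
The relationship between the library and command-line interfaces can be sketched as one validation function with a thin wrapper around it. This is a minimal illustration, not the design of any tool named in this document: the names `validate` and `main`, the placeholder rule (a required `id` field) and the assumption that the input file holds a JSON array are all hypothetical.

```python
import argparse
import json


def validate(data):
    """Library entry point: return a list of issue strings (empty if valid).

    The single check here (a required "id" field) is a placeholder rule.
    """
    issues = []
    for index, item in enumerate(data):
        if not isinstance(item, dict) or "id" not in item:
            issues.append(f"item {index}: missing required field 'id'")
    return issues


def main(argv=None):
    """Command-line entry point: wrap validate() and return an exit code."""
    parser = argparse.ArgumentParser(description="Validate a JSON file.")
    parser.add_argument("path", help="path to a JSON file containing an array")
    args = parser.parse_args(argv)

    with open(args.path, encoding="utf-8") as f:
        data = json.load(f)

    issues = validate(data)
    for issue in issues:
        print(issue)
    # A non-zero exit code lets a publication pipeline stop on invalid data.
    return 1 if issues else 0
```

A pipeline can import `validate` directly, while a script can pass the return value of `main()` to `sys.exit`. A web application could call the same function, so that all three interfaces report identical results.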
|
||
A standard often includes a schema. Reporting on technical validity against the schema is a way of programmatically checking that data conforms to it and can be used by other tools that expect conforming data. By providing validation as an online service, implementers can validate their data without | ||
For more information about how schema validation relates to additional checks, see [author your schema, codelists and additional rules](../development/schema.md#author-your-schema-codelists-and-additional-rules). | ||
|
||
## Prioritisation Factors | ||
|
||
* Specific error reporting and user experience: If implementers need context-specific error messages and guidance, targeted feedback or multiple output formats (e.g. human-readable reports and machine-readable JSON for integration with other tools). | ||
* Complexity beyond the schema language: If the standard involves additional rules that cannot be expressed in its schema language, validation of codelists specified outside the schema, or semantic validation of the data beyond its structure and format. | ||
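
The two factors above can be illustrated together with a minimal sketch: a codelist maintained outside the schema, a cross-field rule that many schema languages cannot express, and the same results rendered for both people and machines. Everything here is an assumption made for illustration, including the field names (`currency`, `startDate`, `endDate`), the codelist contents and the function names.

```python
import csv
import io
import json

# Hypothetical codelist, maintained as CSV outside the schema.
CURRENCY_CODELIST_CSV = "code,title\nGBP,Pound sterling\nUSD,US dollar\n"


def load_codes(csv_text):
    """Return the set of values in the 'code' column of a codelist CSV."""
    return {row["code"] for row in csv.DictReader(io.StringIO(csv_text))}


def check_record(record, currency_codes):
    """Run additional checks on one record and return a list of issue dicts."""
    issues = []

    currency = record.get("currency")
    if currency not in currency_codes:
        issues.append({
            "path": "currency",
            "rule": "codelist",
            "message": f"'{currency}' is not in the currency codelist.",
        })

    start, end = record.get("startDate"), record.get("endDate")
    # ISO 8601 dates (YYYY-MM-DD) sort correctly when compared as strings.
    if start and end and end < start:
        issues.append({
            "path": "endDate",
            "rule": "date_order",
            "message": f"endDate ({end}) is before startDate ({start}).",
        })

    return issues


def as_text(issues):
    """Human-readable report: one line per issue."""
    return "\n".join(f"{i['path']}: {i['message']}" for i in issues)


def as_json(issues):
    """Machine-readable report for integration with other tools."""
    return json.dumps(issues, indent=2)
```

Keeping each issue as a plain dictionary with a path, a rule identifier and a message means both report formats are derived from one result, so they cannot disagree.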
|
||
## Deprioritisation Factors | ||
|
||
* Simplicity: If the standard is purely structural, can be fully expressed in a schema language, and can be validated by existing tooling, an 'off-the-shelf' validator might be sufficient. | ||
* Technical audience: If the standard's audience is developers with experience of standardising data, existing validation libraries or command-line tools might be sufficient. | ||
|
||
## Examples | ||
|
||
The Open Contracting Data Standard (OCDS) provides a web-based validator (the [OCDS Data Review Tool](https://review.standard.open-contracting.org/)) and a command-line tool and Python library ([Lib CoVE OCDS](https://github.com/open-contracting/lib-cove-ocds)). | ||
|
||
360Giving provides a web-based [Data Quality Tool](https://dataquality.threesixtygiving.org/). | ||
|
||
## Related components | ||
|
||
* [Schema](schema) | ||
* [Required fields](required_fields) | ||
* [Codelists](codelists) | ||
* [Registry of datasets](registry_of_datasets) | ||
|
||
## Related patterns | ||
|
||
* [Permissive schema](../patterns/schema.md#permissive-schema) |
Any reason to remove the bullets here? I think the structure is good for this longer list