(Preview of the nycflights table, backed by DuckDB: 336,776 rows, 18 columns.)

The nycflights dataset is a large, real-world dataset with 336,776 rows and 18 columns, providing information about flights originating from New York City airports in 2013.
Examples
Suppose we have a table with columns name, id_old, new_identifier, and pay_2021 and we'd like to validate that text values in columns having "id" or "identifier" in the name have a specific syntax. We can use the matches() column selector function to specify the columns that match the pattern.
import pointblank as pb
import polars as pl
From the results of the validation table we get two validation steps, one for id_old and one for new_identifier. The values in both columns all match the pattern "ID\d{4}".

We can also use the matches() function in combination with other column selectors (within col()) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select columns that contain "pay" and match the text "2023" or "2024", we can use the & operator to combine column selectors.
tbl = pl.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"],
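The name-matching logic behind combining matches() selectors with & can be sketched in plain Python (the column names here are illustrative assumptions, not part of the package's API):

```python
import re

# Hypothetical column names, echoing the example above
columns = ["name", "pay_2021", "pay_2023", "pay_2024"]

# Combining two selectors with & keeps only columns whose names
# satisfy both patterns: contain "pay" AND match "2023" or "2024"
selected = [
    c for c in columns
    if re.search("pay", c) and re.search("2023|2024", c)
]
print(selected)  # ['pay_2023', 'pay_2024']
```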
missing_vals_tbl
missing_vals_tbl(data)
Display a table that shows the missing values in the input table.
The missing_vals_tbl() function generates a table that shows the missing values in the input table. The table is displayed using the Great Tables (GT) API, which allows for further customization of the table's appearance if so desired.
The Missing Values Table
The missing values table shows the proportion of missing values in each column of the input table. The table is divided into sectors, with each sector representing a range of rows in the table, and the proportion of missing values in each sector is calculated for each column.
To ensure that the table can scale to tables with many columns, each row in the reporting table represents a column in the input table. There are 10 sectors shown in the table, where the first sector represents the first 10% of the rows, the second sector represents the next 10% of the rows, and so on. Any sectors that are light blue indicate that there are no missing values in that sector. If there are missing values, the proportion of missing values is shown by a gray color (light gray for low proportions, dark gray to black for very high proportions).
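The sector arithmetic described above can be sketched in plain Python (a simplified stand-in with illustrative data, not the package's implementation):

```python
# A column's values, split into 10 equal row sectors; for each sector
# we compute the proportion of missing (None) values
values = [1, None, 3, None] * 5  # 20 rows, half missing

n_sectors = 10
size = len(values) // n_sectors
proportions = [
    sum(v is None for v in values[i * size:(i + 1) * size]) / size
    for i in range(n_sectors)
]
print(proportions)  # every sector is half missing: [0.5, 0.5, ..., 0.5]
```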
Examples

The missing_vals_tbl() function is useful for quickly identifying columns with missing values in a table. Here's an example using the nycflights dataset (loaded using the load_dataset() function as a Polars DataFrame):

import pointblank as pb

nycflights = pb.load_dataset("nycflights", tbl_type="polars")

pb.missing_vals_tbl(nycflights)
(Missing values table for nycflights: 46,595 missing values in total; one row per column of the 336,776-row, 18-column Polars table, with proportions shown across 10 row sectors. Legend: NO MISSING VALUES; PROPORTION MISSING, 0% to 100%; ROW SECTORS.)
The table shows the proportion of missing values in each column of the nycflights dataset. The table is divided into sectors, with each sector representing a range of rows in the table (with around 34,000 rows per sector). The proportion of missing values in each sector is calculated for each column. The various shades of gray indicate the proportion of missing values in each sector. Many columns have no missing values at all, and those sectors are colored light blue.
preview

Examples

It's easy to preview a table using the preview() function. Here's an example using the small_table dataset (itself loaded using the load_dataset() function):
import pointblank as pb
small_table_polars = pb.load_dataset("small_table")

pb.preview(small_table_polars)
This table is a Polars DataFrame, but the preview() function works with any table supported by pointblank, including Pandas DataFrames and Ibis backend tables. Here's an example using a DuckDB table handled by Ibis:
small_table_duckdb = pb.load_dataset("small_table", tbl_type="duckdb")

pb.preview(small_table_duckdb)
The blue dividing line marks the end of the first n_head= rows and the start of the last n_tail= rows.

We can adjust the number of rows shown from the start and end of the table by setting the n_head= and n_tail= parameters. Let's enlarge each of these to 10:
pb.preview(small_table_polars, n_head=10, n_tail=10)
In the above case, the entire dataset is shown since the sum of n_head= and n_tail= is greater than the number of rows in the table (which is 13).
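The head/tail selection that preview() performs can be sketched on a plain list of rows (an illustrative stand-in for the n_head=/n_tail= semantics only):

```python
rows = list(range(13))  # stand-in for small_table's 13 rows
n_head, n_tail = 10, 10

if n_head + n_tail >= len(rows):
    # head and tail overlap, so the whole table is shown
    shown = rows
else:
    shown = rows[:n_head] + rows[-n_tail:]
print(len(shown))  # 13
```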
The columns_subset= parameter can be used to show only specific columns in the table. You can provide a list of column names to make the selection. Let's try that with the "game_revenue" dataset as a Pandas DataFrame:
game_revenue_pandas = pb.load_dataset("game_revenue", tbl_type="pandas")

pb.preview(game_revenue_pandas, columns_subset=["player_id", "item_name", "item_revenue"])
Alternatively, we can use column selector functions like starts_with() and matches() to select columns based on text or patterns:
pb.preview(game_revenue_pandas, n_head=2, n_tail=2, columns_subset=pb.starts_with("session"))
Multiple column selector functions can be combined within col() using operators like | and &:
pb.preview(
    game_revenue_pandas,
    n_head=2,
starts_with
Examples
Suppose we have a table with columns name, paid_2021, paid_2022, and person_id and we'd like to validate that the values in columns that start with "paid" are greater than 10. We can use the starts_with() column selector function to specify the columns that start with "paid" as the columns to validate.
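The prefix selection that starts_with() performs can be sketched in plain Python (the column names here echo the example and are illustrative, not the package's implementation):

```python
# Hypothetical column names from the example above
columns = ["name", "paid_2021", "paid_2022", "person_id"]

# starts_with() selects columns by name prefix; str.startswith()
# sketches that behavior
selected = [c for c in columns if c.startswith("paid")]
print(selected)  # ['paid_2021', 'paid_2022']
```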
import pointblank as pb
import polars as pl
From the results of the validation table we get two validation steps, one for paid_2021 and one for paid_2022. The values in both columns were all greater than 10.

We can also use the starts_with() function in combination with other column selectors (within col()) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select columns that start with "paid" and match the text "2023" or "2024", we can use the & operator to combine column selectors.
tbl = pl.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"],
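The combined condition can be sketched in plain Python (column names are illustrative assumptions, not the package's internals):

```python
import re

# Hypothetical column names for the combined-selector example
columns = ["name", "paid_2021", "paid_2023", "paid_2024"]

# A column passes the combined selector only if it starts with "paid"
# AND its name matches "2023" or "2024"
selected = [
    c for c in columns
    if c.startswith("paid") and re.search("2023|2024", c)
]
print(selected)  # ['paid_2023', 'paid_2024']
```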
Validate.get_sundered_data

Examples

Let's create a Validate object with two validation steps and then interrogate the data.

import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [7, 6, 9, 7, 3, 2],
        "b": [9, 8, 10, 5, 10, 6],
        "c": ["c", "d", "a", "b", "a", "b"]
    }
)

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=5)
    .col_vals_in_set(columns="c", set=["a", "b"])
    .interrogate()
)

validation

(Validation report: both steps evaluated over 6 test units, each with 4 passing and 2 failing.)

From the validation table, we can see that the first and second steps each had 4 passing test units. A failing test unit will mark the entire row as failing in the context of the get_sundered_data() method. We can use this method to get the rows of data that passed during interrogation.

pb.preview(validation.get_sundered_data())

(Preview of the result: 2 rows, with a = 9, b = 10, c = "a" and a = 7, b = 5, c = "b".)

The returned DataFrame contains the rows that passed all validation steps (we passed this object to pb.preview() to show it in an HTML view). From the six-row input DataFrame, the first two rows and the last two rows had test units that failed validation. Thus the middle two rows are the only ones that passed all validation steps and that's what we see in the returned DataFrame.
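The row-sundering logic can be sketched in plain Python (a simplified stand-in for get_sundered_data(), using the example's data):

```python
# The six rows of the example table, reduced to the validated columns
rows = [
    {"a": 7, "c": "c"},
    {"a": 6, "c": "d"},
    {"a": 9, "c": "a"},
    {"a": 7, "c": "b"},
    {"a": 3, "c": "a"},
    {"a": 2, "c": "b"},
]

# A row survives only if it passes every validation step
passed = [r for r in rows if r["a"] > 5 and r["c"] in {"a", "b"}]
print(len(passed))  # 2  (the middle two rows)
```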
Validate.warn

Examples

In the example below, we'll use a simple Polars DataFrame with three columns (a, b, and c). There will be three validation steps; the first step will have some failing test units and the rest will be completely passing. We've set thresholds here for each of the steps by using thresholds=(2, 4, 5), which means:

the warn threshold is 2 failing test units
the stop threshold is 4 failing test units
the notify threshold is 5 failing test units

After interrogation, the warn() method is used to determine the warn status for each validation step.

import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [7, 4, 9, 7, 12, 3, 10],
        "b": [9, 8, 10, 5, 10, 6, 2],
        "c": ["a", "b", "a", "a", "b", "b", "a"]
    }
)

validation = (
    pb.Validate(data=tbl, thresholds=(2, 4, 5))
    .col_vals_gt(columns="a", value=5)
    .col_vals_lt(columns="b", value=15)
    .col_vals_in_set(columns="c", set=["a", "b"])
    .interrogate()
)

validation.warn()

{1: True, 2: False, 3: False}

The returned dictionary provides the warn status for each validation step. The first step has a True value since the number of failing test units meets the threshold for the warn level. The second and third steps have False values since the number of failing test units was 0, which is below the threshold for the warn level.

We can also visually inspect the warn status across all steps by viewing the validation table:

validation

(Validation report: step 1 has 5 of 7 passing test units and a filled warn indicator; steps 2 and 3 pass completely.)

We can see that there's a filled yellow circle in the first step (far right side, in the W column) indicating that the warn threshold was met. The other steps have empty yellow circles. This means that thresholds were 'set but not met' in those steps.

If we wanted to check the warn status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting scalar=True (ensuring that i= is a scalar).

validation.warn(i=1)

{1: True}

The returned value is True for step 1, indicating that the first validation step had the warn threshold met.
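The threshold check behind warn() can be sketched in plain Python (a simplified stand-in for the package's logic, using the failing counts from the example above):

```python
# Failing test units per step, as reported in the example above
failing = {1: 2, 2: 0, 3: 0}
warn_at = 2  # the warn threshold from thresholds=(2, 4, 5)

# A step triggers the warn state when its failing count
# reaches the threshold
warn_status = {step: n >= warn_at for step, n in failing.items()}
print(warn_status)  # {1: True, 2: False, 3: False}
```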
Validate

Creating a validation plan and interrogating

Let's walk through a data quality analysis of an extremely small table. It's actually called small_table and it's accessible through the load_dataset() function.

import pointblank as pb

# Load the small_table dataset
small_table = pb.load_dataset()

# Preview the table
pb.preview(small_table)

(Preview of small_table: a Polars table with 13 rows and 8 columns: date_time, date, a, b, c, d, e, f.)

We ought to think about what's tolerable in terms of data quality, so let's designate proportional failure thresholds for the warn, stop, and notify states. This can be done by using the Thresholds class.

thresholds = pb.Thresholds(warn_at=0.10, stop_at=0.25, notify_at=0.35)

Now, we use the Validate class and give it the thresholds object (which serves as a default for all validation steps but can be overridden). The static thresholds provided in thresholds will make the reporting a bit more useful. We also need to provide a target table and we'll use small_table for this.

validation = (
    pb.Validate(
        data=small_table,
        tbl_name="small_table",
        label="`Validate` example.",
        thresholds=thresholds
    )
)

Then, as with any Validate object, we can add steps to the validation plan by using as many validation methods as we want. To conclude the process (and actually query the data table), we use the interrogate() method.

validation = (
    validation
    .col_vals_gt(columns="d", value=100)
    .col_vals_le(columns="c", value=5)
    .col_vals_between(columns="c", left=3, right=10, na_pass=True)
    .col_vals_regex(columns="b", pattern=r"[0-9]-[a-z]{3}-[0-9]{3}")
    .col_exists(columns=["date", "date_time"])
    .interrogate()
)

The validation object can be printed as a reporting table.

validation

(Validation report: six steps; step 2, col_vals_le() on c, crosses all three thresholds with 8 of 13 failing test units; step 3, col_vals_between() on c, has 1 failing test unit; the remaining steps pass completely.)

The report could be further customized by using the get_tabular_report() method, which contains options for modifying the display of the table.

Furthermore, post-interrogation methods such as get_step_report(), get_data_extracts(), and get_sundered_data() allow you to generate additional reporting or extract useful data for downstream analysis from a Validate object.
Validate.interrogate

Examples

Let's use a built-in dataset ("game_revenue") to demonstrate some of the options of the interrogation process. A series of validation steps will populate our validation plan. After setting up the plan, the next step is to interrogate the table and see how well it aligns with our expectations. We'll use the get_first_n= option so that any extracts of failing rows are limited to the first n rows.

import pointblank as pb
import polars as pl

validation = (
    pb.Validate(data=pb.load_dataset(dataset="game_revenue"))
    .col_vals_lt(columns="item_revenue", value=200)
    .col_vals_gt(columns="item_revenue", value=0)
    .col_vals_gt(columns="session_duration", value=5)
    .col_vals_in_set(columns="item_type", set=["iap", "ad"])
    .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}")
)

validation.interrogate(get_first_n=10)

(Validation report: five steps, each over 2000 test units; step 3, col_vals_gt() on session_duration, has 18 failing test units; all other steps pass completely.)

The validation table shows that step 3 (checking for session_duration greater than 5) has 18 failing test units. This means that 18 rows in the table are problematic. We'd like to see the rows that failed this validation step and we can do that with the get_data_extracts() method.

pb.preview(validation.get_data_extracts(i=3, frame=True))

(Preview of the extract: 10 rows by 12 columns of game_revenue records whose session_duration is 5.0 or less.)

The get_data_extracts() method will return a Polars DataFrame with the first 10 rows that failed the validation step (we passed that into the preview() function for a better display). There are actually 18 rows that failed but we limited the collection of extracts with get_first_n=10.
},
{
"objectID": "reference/Validate.n_failed.html",
@@ -732,7 +732,7 @@
"href": "reference/load_dataset.html#included-datasets",
"title": "load_dataset",
"section": "Included Datasets",
- "text": "Included Datasets\nThere are two included datasets that can be loaded using the load_dataset() function:\n\nsmall_table: A small dataset with 13 rows and 8 columns. This dataset is useful for testing and demonstration purposes.\ngame_revenue: A dataset with 2000 rows and 11 columns. Provides revenue data for a game development company. For the particular game, there are records of player sessions, the items they purchased, ads viewed, and the revenue generated."
+ "text": "Included Datasets\nThere are two included datasets that can be loaded using the load_dataset() function:\n\nsmall_table: A small dataset with 13 rows and 8 columns. This dataset is useful for testing and demonstration purposes.\ngame_revenue: A dataset with 2000 rows and 11 columns. Provides revenue data for a game development company. For the particular game, there are records of player sessions, the items they purchased, ads viewed, and the revenue generated.\nnycflights: A dataset with 336,776 rows and 18 columns. This dataset provides information about flights departing from New York City airports (JFK, LGA, or EWR) in 2013."
},
{
"objectID": "reference/load_dataset.html#supported-dataframe-types",
@@ -746,7 +746,7 @@
"href": "reference/load_dataset.html#examples",
"title": "load_dataset",
"section": "Examples",
- "text": "Examples\nLoad the small_table dataset as a Polars DataFrame by calling load_dataset() with its defaults:\n\nimport pointblank as pb\n\nsmall_table = pb.load_dataset()\n\npb.preview(small_table)\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high\n \n\n\n\n\n\n\n \n\n\nNote that the small_table dataset is a simple Polars DataFrame and using the preview() function will display the table in an HTML viewing environment.\nThe game_revenue dataset can be loaded as a Pandas DataFrame by specifying the dataset name and setting tbl_type=\"pandas\":\n\nimport pointblank as pb\n\ngame_revenue = pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"pandas\")\n\npb.preview(game_revenue)\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows2000Columns11\n \n\n \n player_idobject\n session_idobject\n session_startdatetime64[ns, UTC]\n timedatetime64[ns, UTC]\n item_typeobject\n item_nameobject\n item_revenuefloat64\n session_durationfloat64\n 
start_daydatetime64[ns]\n acquisitionobject\n countryobject\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11 00:00:00\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11 00:00:00\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10 00:00:00\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10 00:00:00\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14 00:00:00\n organic\n United States\n \n\n\n\n\n\n\n \n\n\nThe game_revenue dataset is a more real-world dataset with a 
mix of data types, and it’s significantly larger than the small_table dataset at 2000 rows and 11 columns."
+ "text": "Examples\nLoad the small_table dataset as a Polars DataFrame by calling load_dataset() with its defaults:\n\nimport pointblank as pb\n\nsmall_table = pb.load_dataset()\n\npb.preview(small_table)\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high\n \n\n\n\n\n\n\n \n\n\nNote that the small_table dataset is a simple Polars DataFrame and using the preview() function will display the table in an HTML viewing environment.\nThe game_revenue dataset can be loaded as a Pandas DataFrame by specifying the dataset name and setting tbl_type=\"pandas\":\n\nimport pointblank as pb\n\ngame_revenue = pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"pandas\")\n\npb.preview(game_revenue)\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows2000Columns11\n \n\n \n player_idobject\n session_idobject\n session_startdatetime64[ns, UTC]\n timedatetime64[ns, UTC]\n item_typeobject\n item_nameobject\n item_revenuefloat64\n session_durationfloat64\n 
start_daydatetime64[ns]\n acquisitionobject\n countryobject\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01 00:00:00\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11 00:00:00\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11 00:00:00\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10 00:00:00\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10 00:00:00\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14 00:00:00\n organic\n United States\n \n\n\n\n\n\n\n \n\n\nThe game_revenue dataset is a more real-world dataset with a 
mix of data types, and it’s significantly larger than the small_table dataset at 2000 rows and 11 columns.\nThe nycflights dataset can be loaded as a DuckDB table by specifying the dataset name and setting tbl_type=\"duckdb\":\n\nimport pointblank as pb\n\nnycflights = pb.load_dataset(dataset=\"nycflights\", tbl_type=\"duckdb\")\n\npb.preview(nycflights)\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows336776Columns18\n \n\n \n yearint64\n monthint64\n dayint64\n dep_timeint64\n sched_dep_timeint64\n dep_delayint64\n arr_timeint64\n sched_arr_timeint64\n arr_delayint64\n carrierstring\n flightint64\n tailnumstring\n originstring\n deststring\n air_timeint64\n distanceint64\n hourint64\n minuteint64\n\n\n\n \n 1\n 2013\n 1\n 1\n 517\n 515\n 2\n 830\n 819\n 11\n UA\n 1545\n N14228\n EWR\n IAH\n 227\n 1400\n 5\n 15\n \n \n 2\n 2013\n 1\n 1\n 533\n 529\n 4\n 850\n 830\n 20\n UA\n 1714\n N24211\n LGA\n IAH\n 227\n 1416\n 5\n 29\n \n \n 3\n 2013\n 1\n 1\n 542\n 540\n 2\n 923\n 850\n 33\n AA\n 1141\n N619AA\n JFK\n MIA\n 160\n 1089\n 5\n 40\n \n \n 4\n 2013\n 1\n 1\n 544\n 545\n -1\n 1004\n 1022\n -18\n B6\n 725\n N804JB\n JFK\n BQN\n 183\n 1576\n 5\n 45\n \n \n 5\n 2013\n 1\n 1\n 554\n 600\n -6\n 812\n 837\n -25\n DL\n 461\n N668DN\n LGA\n ATL\n 116\n 762\n 6\n 0\n \n \n 336772\n 2013\n 9\n 30\n NULL\n 1455\n NULL\n NULL\n 1634\n NULL\n 9E\n 3393\n NULL\n JFK\n DCA\n NULL\n 213\n 14\n 55\n \n \n 336773\n 2013\n 9\n 30\n NULL\n 2200\n NULL\n NULL\n 2312\n NULL\n 9E\n 3525\n NULL\n LGA\n SYR\n NULL\n 198\n 22\n 0\n \n \n 336774\n 2013\n 9\n 30\n NULL\n 1210\n NULL\n NULL\n 1330\n NULL\n MQ\n 3461\n N535MQ\n LGA\n BNA\n NULL\n 764\n 12\n 10\n \n \n 336775\n 2013\n 9\n 30\n NULL\n 1159\n NULL\n NULL\n 1344\n NULL\n MQ\n 3572\n N511MQ\n LGA\n CLE\n NULL\n 419\n 11\n 59\n \n \n 336776\n 2013\n 9\n 30\n NULL\n 840\n NULL\n NULL\n 1020\n NULL\n MQ\n 3531\n N839MQ\n LGA\n RDU\n NULL\n 431\n 8\n 40\n \n\n\n\n\n\n\n \n\n\nThe nycflights 
dataset is a large, real-world dataset with 336,776 rows and 18 columns, providing information about flights originating from New York City airports in 2013."
},
{
"objectID": "reference/Validate.all_passed.html",
@@ -858,7 +858,7 @@
"href": "get-started/thresholds.html",
"title": "Thresholds",
"section": "",
- "text": "This is a work in progress. And some of this article is just an outline for now.\nThresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions. For example, when testing a column for Null/missing values with col_vals_not_null() you might want to warn on any missing values and stop where there are 10% missing values in the column.\nimport pointblank as pb\n\nvalidation_1 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_not_null(columns=\"c\", thresholds=(1, 0.1))\n .interrogate()\n)\n\nvalidation_1\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:24Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #CF142B\n 1\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n c\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n ●\n ●\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:09:24 UTC< 1 s2025-02-10 22:09:24 UTC\nThe code uses thresholds=(1, 0.1) to set a warn threshold of 1 and a stop threshold of 0.1 (which is 10%) failing test units. Notice these pieces in the validation table:\nThe one final threshold, N (notify), wasn’t set so appears on the validation table as a dash."
+ "text": "This is a work in progress. And some of this article is just an outline for now.\nThresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions. For example, when testing a column for Null/missing values with col_vals_not_null() you might want to warn on any missing values and stop where there are 10% missing values in the column.\nimport pointblank as pb\n\nvalidation_1 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_not_null(columns=\"c\", thresholds=(1, 0.1))\n .interrogate()\n)\n\nvalidation_1\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:10Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #CF142B\n 1\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n c\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n ●\n ●\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:27:10 UTC< 1 s2025-02-11 05:27:10 UTC\nThe code uses thresholds=(1, 0.1) to set a warn threshold of 1 and a stop threshold of 0.1 (which is 10%) failing test units. Notice these pieces in the validation table:\nThe one final threshold, N (notify), wasn’t set so appears on the validation table as a dash."
},
{
"objectID": "get-started/thresholds.html#using-the-validationthreshold-argument",
@@ -907,126 +907,126 @@
"href": "demos/set-membership/index.html",
"title": "pointblank",
"section": "",
- "text": "Set Membership\nPerform validations that check whether values are part of a set (or not part of one).\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:19Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n f\n low, mid, high\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_not_in_set\n \n \n \n \n \n \n \n \n\n \n col_vals_not_in_set()\n \n f\n zero, infinity\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:19 UTC< 1 s2025-02-10 22:09:19 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_in_set(columns=\"f\", set=[\"low\", \"mid\", \"high\"]) # part of this set\n .col_vals_not_in_set(columns=\"f\", set=[\"zero\", \"infinity\"]) # not part of this set\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 
11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Set Membership\nPerform validations that check whether values are part of a set (or not part of one).\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:04Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n f\n low, mid, high\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_not_in_set\n \n \n \n \n \n \n \n \n\n \n col_vals_not_in_set()\n \n f\n zero, infinity\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:27:04 UTC< 1 s2025-02-11 05:27:04 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_in_set(columns=\"f\", set=[\"low\", \"mid\", \"high\"]) # part of this set\n .col_vals_not_in_set(columns=\"f\", set=[\"zero\", \"infinity\"]) # not part of this set\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 
11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/comparisons-across-columns/index.html",
"href": "demos/comparisons-across-columns/index.html",
"title": "pointblank",
"section": "",
- "text": "Comparison Checks Across Columns\nPerform comparisons of values in columns to values in other columns.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:13Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n c\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 60.46\n 70.54\n —\n —\n —\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n d\n [c, 12000]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:13 UTC< 1 s2025-02-10 22:09:13 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_lt(columns=\"a\", value=pb.col(\"c\")) # values in 'a' > values in 'c'\n .col_vals_between(\n columns=\"d\", # values in 'd' are between values\n left=pb.col(\"c\"), # in 'c' and the fixed value of 12,000;\n right=12000, # any missing values encountered result\n na_pass=True # in a passing test unit\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 
2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Comparison Checks Across Columns\nPerform comparisons of values in columns to values in other columns.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:58Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n c\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 60.46\n 70.54\n —\n —\n —\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n d\n [c, 12000]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:58 UTC< 1 s2025-02-11 05:26:58 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_lt(columns=\"a\", value=pb.col(\"c\")) # values in 'a' < values in 'c'\n .col_vals_between(\n columns=\"d\", # values in 'd' are between values\n left=pb.col(\"c\"), # in 'c' and the fixed value of 12,000;\n right=12000, # any missing values encountered result\n na_pass=True # in a passing test unit\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 
2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/failure-thresholds/index.html",
"href": "demos/failure-thresholds/index.html",
"title": "pointblank",
"section": "",
- "text": "Set Failure Threshold Levels\nSet threshold levels to better gauge adverse data quality.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:07DuckDBWARN0.05STOP0.1NOTIFY0.15\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n item_type\n iap, ad\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n [A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #CF142B\n 3\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n item_revenue\n 0.05\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 17010.85\n 2990.15\n ●\n ●\n ○\n —\n \n \n #FFBF00\n 4\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n session_duration\n 4\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19931.00\n 70.00\n ●\n ○\n ○\n —\n \n \n #CF142B\n 5\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n end_day\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 00.00\n 11.00\n ●\n ●\n ●\n —\n \n\n \n \n \n 2025-02-10 22:09:07 UTC< 1 s2025-02-10 22:09:07 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"duckdb\"),\n thresholds=pb.Thresholds( # setting relative threshold defaults for all steps\n warn_at=0.05, # 5% failing test units: warn threshold (yellow)\n stop_at=0.10, # 10% failed test units: stop threshold (red)\n notify_at=0.15 # 15% failed test units: notify threshold (blue)\n ),\n )\n .col_vals_in_set(columns=\"item_type\", set=[\"iap\", \"ad\"])\n .col_vals_regex(columns=\"player_id\", pattern=r\"[A-Z]{12}\\d{3}\")\n .col_vals_gt(columns=\"item_revenue\", value=0.05)\n .col_vals_gt(\n 
columns=\"session_duration\",\n value=4,\n thresholds=(5, 10, 20) # setting absolute thresholds for *this* step (warn, stop, notify)\n )\n .col_exists(columns=\"end_day\")\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows2000Columns11\n \n\n \n player_idstring\n session_idstring\n session_starttimestamp\n timetimestamp\n item_typestring\n item_namestring\n item_revenuefloat64\n session_durationfloat64\n start_daydate\n acquisitionstring\n countrystring\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n 
RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
+ "text": "Set Failure Threshold Levels\nSet threshold levels to better gauge adverse data quality.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:52DuckDBWARN0.05STOP0.1NOTIFY0.15\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n item_type\n iap, ad\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n [A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #CF142B\n 3\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n item_revenue\n 0.05\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 17010.85\n 2990.15\n ●\n ●\n ○\n —\n \n \n #FFBF00\n 4\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n session_duration\n 4\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19931.00\n 70.00\n ●\n ○\n ○\n —\n \n \n #CF142B\n 5\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n end_day\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 00.00\n 11.00\n ●\n ●\n ●\n —\n \n\n \n \n \n 2025-02-11 05:26:52 UTC< 1 s2025-02-11 05:26:52 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"duckdb\"),\n thresholds=pb.Thresholds( # setting relative threshold defaults for all steps\n warn_at=0.05, # 5% failing test units: warn threshold (yellow)\n stop_at=0.10, # 10% failed test units: stop threshold (red)\n notify_at=0.15 # 15% failed test units: notify threshold (blue)\n ),\n )\n .col_vals_in_set(columns=\"item_type\", set=[\"iap\", \"ad\"])\n .col_vals_regex(columns=\"player_id\", pattern=r\"[A-Z]{12}\\d{3}\")\n .col_vals_gt(columns=\"item_revenue\", value=0.05)\n .col_vals_gt(\n 
columns=\"session_duration\",\n value=4,\n thresholds=(5, 10, 20) # setting absolute thresholds for *this* step (warn, stop, notify)\n )\n .col_exists(columns=\"end_day\")\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows2000Columns11\n \n\n \n player_idstring\n session_idstring\n session_starttimestamp\n timetimestamp\n item_typestring\n item_namestring\n item_revenuefloat64\n session_durationfloat64\n start_daydate\n acquisitionstring\n countrystring\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n 
RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
},
{
"objectID": "demos/apply-checks-to-several-columns/index.html",
"href": "demos/apply-checks-to-several-columns/index.html",
"title": "pointblank",
"section": "",
- "text": "Apply Validation Rules to Multiple Columns\nCreate multiple validation steps by using a list of column names with columns=.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:04Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n a\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n c\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n d\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date_time\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 5\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:04 UTC< 1 s2025-02-10 22:09:04 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_ge(columns=[\"a\", \"c\", \"d\"], value=0) # check values in 'a', 'c', and 'd'\n .col_exists(columns=[\"date_time\", \"date\"]) # check for the existence of two columns\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n 
low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Apply Validation Rules to Multiple Columns\nCreate multiple validation steps by using a list of column names with columns=.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:48Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n a\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n c\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n d\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date_time\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 5\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:48 UTC< 1 s2025-02-11 05:26:48 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_ge(columns=[\"a\", \"c\", \"d\"], value=0) # check values in 'a', 'c', and 'd'\n .col_exists(columns=[\"date_time\", \"date\"]) # check for the existence of two columns\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n 
low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/mutate-table-in-step/index.html",
"href": "demos/mutate-table-in-step/index.html",
"title": "pointblank",
"section": "",
- "text": "Mutate the Table in a Validation Step\nFor far more specialized validations, modify the table with the pre= argument before checking it.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:58Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n a\n [3, 6]\n \n \n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_equal\n \n \n \n \n \n \n\n \n col_vals_eq()\n \n b_len\n 9\n \n \n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:58 UTC< 1 s2025-02-10 22:08:58 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\nimport narwhals as nw\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_between(\n columns=\"a\",\n left=3, right=6,\n pre=lambda df: df.select(pl.median(\"a\")) # Use a Polars expression to aggregate\n )\n .col_vals_eq(\n columns=\"b_len\",\n value=9,\n pre=lambda dfn: dfn.with_columns( # Use a Narwhals expression, identified\n b_len=nw.col(\"b\").str.len_chars() # by the 'dfn' here\n )\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n 
True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Mutate the Table in a Validation Step\nFor far more specialized validations, modify the table with the pre= argument before checking it.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:42Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n a\n [3, 6]\n \n \n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_equal\n \n \n \n \n \n \n\n \n col_vals_eq()\n \n b_len\n 9\n \n \n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:42 UTC< 1 s2025-02-11 05:26:42 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\nimport narwhals as nw\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_between(\n columns=\"a\",\n left=3, right=6,\n pre=lambda df: df.select(pl.median(\"a\")) # Use a Polars expression to aggregate\n )\n .col_vals_eq(\n columns=\"b_len\",\n value=9,\n pre=lambda dfn: dfn.with_columns( # Use a Narwhals expression, identified\n b_len=nw.col(\"b\").str.len_chars() # by the 'dfn' here\n )\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n 
True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/01-starter/index.html",
"href": "demos/01-starter/index.html",
"title": "pointblank",
"section": "",
- "text": "Starter Validation\nA validation with the basics.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:52Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n d\n 1000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 70.54\n 60.46\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n c\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 50.38\n 80.62\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date_time\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:52 UTC< 1 s2025-02-10 22:08:52 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate( # Use pb.Validate to start\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_gt(columns=\"d\", value=1000) # STEP 1 |\n .col_vals_le(columns=\"c\", value=5) # STEP 2 | <-- Build up a validation plan\n .col_exists(columns=[\"date\", \"date_time\"]) # STEP 3 |\n .interrogate() # This will execute all validation steps and collect intel\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 
2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Starter Validation\nA validation with the basics.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:36Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n d\n 1000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 70.54\n 60.46\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n c\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 50.38\n 80.62\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n date_time\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:36 UTC< 1 s2025-02-11 05:26:36 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate( # Use pb.Validate to start\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_gt(columns=\"d\", value=1000) # STEP 1 |\n .col_vals_le(columns=\"c\", value=5) # STEP 2 | <-- Build up a validation plan\n .col_exists(columns=[\"date\", \"date_time\"]) # STEP 3 |\n .interrogate() # This will execute all validation steps and collect intel\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 
2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/expect-text-pattern/index.html",
"href": "demos/expect-text-pattern/index.html",
"title": "pointblank",
"section": "",
- "text": "Expectations with a Text Pattern\nWith the col_vals_regex(), check for conformance to a regular expression.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:46Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n ^\\d-[a-z]{3}-\\d{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n f\n high|low|mid\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:46 UTC< 1 s2025-02-10 22:08:46 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_regex(columns=\"b\", pattern=r\"^\\d-[a-z]{3}-\\d{3}$\") # check pattern in 'b'\n .col_vals_regex(columns=\"f\", pattern=r\"high|low|mid\") # check pattern in 'f'\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 
11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Expectations with a Text Pattern\nWith the col_vals_regex(), check for conformance to a regular expression.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:29Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n ^\\d-[a-z]{3}-\\d{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n f\n high|low|mid\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:29 UTC< 1 s2025-02-11 05:26:29 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_regex(columns=\"b\", pattern=r\"^\\d-[a-z]{3}-\\d{3}$\") # check pattern in 'b'\n .col_vals_regex(columns=\"f\", pattern=r\"high|low|mid\") # check pattern in 'f'\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 
11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/check-row-column-counts/index.html",
"href": "demos/check-row-column-counts/index.html",
"title": "pointblank",
"section": "",
- "text": "Verifying Row and Column Counts\nCheck the dimensions of the table with the *_count_match() validation methods.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:40DuckDB\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_count_match()\n \n —\n 11\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n row_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n row_count_match()\n \n —\n 2000\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 3\n \n \n \n\n row_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n row_count_match()\n \n —\n ≠ 0\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_count_match()\n \n —\n 11\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:40 UTC< 1 s2025-02-10 22:08:40 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"duckdb\")\n )\n .col_count_match(count=11) # expect 11 columns in the table\n .row_count_match(count=2000) # expect 2,000 rows in the table\n .row_count_match(count=0, inverse=True) # expect that the table has rows\n .col_count_match( # compare column count against\n count=pb.load_dataset( # that of another table\n dataset=\"game_revenue\", tbl_type=\"pandas\"\n )\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows2000Columns11\n \n\n \n player_idstring\n session_idstring\n session_starttimestamp\n timetimestamp\n item_typestring\n item_namestring\n item_revenuefloat64\n session_durationfloat64\n start_daydate\n 
acquisitionstring\n countrystring\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
+ "text": "Verifying Row and Column Counts\nCheck the dimensions of the table with the *_count_match() validation methods.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:23DuckDB\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_count_match()\n \n —\n 11\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n row_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n row_count_match()\n \n —\n 2000\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 3\n \n \n \n\n row_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n row_count_match()\n \n —\n ≠ 0\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_count_match()\n \n —\n 11\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:23 UTC< 1 s2025-02-11 05:26:23 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"duckdb\")\n )\n .col_count_match(count=11) # expect 11 columns in the table\n .row_count_match(count=2000) # expect 2,000 rows in the table\n .row_count_match(count=0, inverse=True) # expect that the table has rows\n .col_count_match( # compare column count against\n count=pb.load_dataset( # that of another table\n dataset=\"game_revenue\", tbl_type=\"pandas\"\n )\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows2000Columns11\n \n\n \n player_idstring\n session_idstring\n session_starttimestamp\n timetimestamp\n item_typestring\n item_namestring\n item_revenuefloat64\n session_durationfloat64\n start_daydate\n 
acquisitionstring\n countrystring\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
},
{
"objectID": "demos/expect-no-duplicate-values/index.html",
"href": "demos/expect-no-duplicate-values/index.html",
"title": "pointblank",
"section": "",
- "text": "Checking for Duplicate Values\nTo check for duplicate values down a column, use rows_distinct() with a columns_subset= value.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:34Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n rows_distinct\n \n \n \n \n \n \n \n \n \n \n\n \n rows_distinct()\n \n b\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:34 UTC< 1 s2025-02-10 22:08:34 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .rows_distinct(columns_subset=\"b\") # expect no duplicate values in 'b'\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 
2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Checking for Duplicate Values\nTo check for duplicate values down a column, use rows_distinct() with a columns_subset= value.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:17Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n rows_distinct\n \n \n \n \n \n \n \n \n \n \n\n \n rows_distinct()\n \n b\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:17 UTC< 1 s2025-02-11 05:26:17 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .rows_distinct(columns_subset=\"b\") # expect no duplicate values in 'b'\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 
2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/column-selector-functions/index.html",
"href": "demos/column-selector-functions/index.html",
"title": "pointblank",
"section": "",
- "text": "Column Selector Functions: Easily Pick Columns\nUse column selector functions in the columns= argument to conveniently choose columns.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:28Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n item_revenue\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n session_duration\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n ^[A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n session_id\n ^[A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 5\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n acquisition\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 6\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n country\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 7\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 8\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n session_id\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 9\n \n \n \n\n col_vals_regex\n \n \n \n \n 
\n \n \n \n \n\n \n col_vals_regex()\n \n item_type\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 10\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n item_name\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 11\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n acquisition\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 12\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n country\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:28 UTC< 1 s2025-02-10 22:08:28 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport narwhals.selectors as ncs\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"polars\")\n )\n .col_vals_ge(\n columns=pb.matches(\"rev|dur\"), # check values in columns having 'rev' or 'dur' in name\n value=0\n )\n .col_vals_regex(\n columns=pb.ends_with(\"_id\"), # check values in columns with names ending in '_id'\n pattern=r\"^[A-Z]{12}\\d{3}\"\n )\n .col_vals_not_null(\n columns=pb.last_n(2) # check that the last two columns don't have Null values\n )\n .col_vals_regex(\n columns=ncs.string(), # check that all string columns are non-empty strings\n pattern=r\"(.|\\s)*\\S(.|\\s)*\"\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows2000Columns11\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 
2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
+ "text": "Column Selector Functions: Easily Pick Columns\nUse column selector functions in the columns= argument to conveniently choose columns.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:10Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n item_revenue\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n session_duration\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n ^[A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n session_id\n ^[A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 5\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n acquisition\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 6\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n country\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 7\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 8\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n session_id\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 9\n \n \n \n\n col_vals_regex\n \n \n \n \n 
\n \n \n \n \n\n \n col_vals_regex()\n \n item_type\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 10\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n item_name\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 11\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n acquisition\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 12\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n country\n (.|\\s)*\\S(.|\\s)*\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:10 UTC< 1 s2025-02-11 05:26:10 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport narwhals.selectors as ncs\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"polars\")\n )\n .col_vals_ge(\n columns=pb.matches(\"rev|dur\"), # check values in columns having 'rev' or 'dur' in name\n value=0\n )\n .col_vals_regex(\n columns=pb.ends_with(\"_id\"), # check values in columns with names ending in '_id'\n pattern=r\"^[A-Z]{12}\\d{3}\"\n )\n .col_vals_not_null(\n columns=pb.last_n(2) # check that the last two columns don't have Null values\n )\n .col_vals_regex(\n columns=ncs.string(), # check that all string columns are non-empty strings\n pattern=r\"(.|\\s)*\\S(.|\\s)*\"\n )\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows2000Columns11\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 
2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
},
{
"objectID": "demos/numeric-comparisons/index.html",
"href": "demos/numeric-comparisons/index.html",
"title": "pointblank",
"section": "",
- "text": "Numeric Comparisons\nPerform comparisons of values in columns to fixed values.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:21Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n d\n 1000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 70.54\n 60.46\n —\n —\n —\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n d\n 10000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n a\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 4\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n c\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 50.38\n 80.62\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 5\n \n \n \n\n col_vals_not_equal\n \n \n \n \n \n \n\n \n col_vals_ne()\n \n a\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 120.92\n 10.08\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 6\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n c\n [0, 15]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:08:21 UTC< 1 s2025-02-10 22:08:21 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_gt(columns=\"d\", value=1000) # values in 'd' > 1000\n .col_vals_lt(columns=\"d\", value=10000) # values in 'd' < 10000\n .col_vals_ge(columns=\"a\", value=1) # values in 'a' >= 1\n .col_vals_le(columns=\"c\", value=5) # values in 'c' <= 5\n .col_vals_ne(columns=\"a\", value=7) # values in 'a' not equal to 7\n .col_vals_between(columns=\"c\", left=0, right=15) # 0 <= 'c' values <= 15\n .interrogate()\n)\n\nvalidation\n\n\nPreview of 
Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Numeric Comparisons\nPerform comparisons of values in columns to fixed values.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:03Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n d\n 1000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 70.54\n 60.46\n —\n —\n —\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n d\n 10000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n a\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 4\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n c\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 50.38\n 80.62\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 5\n \n \n \n\n col_vals_not_equal\n \n \n \n \n \n \n\n \n col_vals_ne()\n \n a\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 120.92\n 10.08\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 6\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n c\n [0, 15]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:26:03 UTC< 1 s2025-02-11 05:26:03 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_gt(columns=\"d\", value=1000) # values in 'd' > 1000\n .col_vals_lt(columns=\"d\", value=10000) # values in 'd' < 10000\n .col_vals_ge(columns=\"a\", value=1) # values in 'a' >= 1\n .col_vals_le(columns=\"c\", value=5) # values in 'c' <= 5\n .col_vals_ne(columns=\"a\", value=7) # values in 'a' not equal to 7\n .col_vals_between(columns=\"c\", left=0, right=15) # 0 <= 'c' values <= 15\n .interrogate()\n)\n\nvalidation\n\n\nPreview of 
Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/schema-check/index.html",
"href": "demos/schema-check/index.html",
"title": "pointblank",
"section": "",
- "text": "Check the Schema of a Table\nThe schema of a table can be flexibly defined with Schema and verified with col_schema_match().\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:25Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_schema_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_schema_match()\n \n —\n SCHEMA\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:25 UTC< 1 s2025-02-10 22:08:25 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\n\ntbl = pl.DataFrame(\n {\n \"a\": [\"apple\", \"banana\", \"cherry\", \"date\"],\n \"b\": [1, 6, 3, 5],\n \"c\": [1.1, 2.2, 3.3, 4.4],\n }\n)\n\n# Use the Schema class to define the column schema as loosely or rigorously as required\nschema = pb.Schema(\n columns=[\n (\"a\", \"String\"), # Column 'a' has dtype 'String'\n (\"b\", [\"Int\", \"Int64\"]), # Column 'b' has dtype 'Int' or 'Int64'\n (\"c\", ) # Column 'c' follows 'b' but we don't specify a dtype here\n ]\n)\n\n# Use the `col_schema_match()` validation method to perform the schema check\nvalidation = (\n pb.Validate(data=tbl)\n .col_schema_match(schema=schema)\n .interrogate()\n)\n\nvalidation"
+ "text": "Check the Schema of a Table\nThe schema of a table can be flexibly defined with Schema and verified with col_schema_match().\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:07Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_schema_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_schema_match()\n \n —\n SCHEMA\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:07 UTC< 1 s2025-02-11 05:26:07 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\n\ntbl = pl.DataFrame(\n {\n \"a\": [\"apple\", \"banana\", \"cherry\", \"date\"],\n \"b\": [1, 6, 3, 5],\n \"c\": [1.1, 2.2, 3.3, 4.4],\n }\n)\n\n# Use the Schema class to define the column schema as loosely or rigorously as required\nschema = pb.Schema(\n columns=[\n (\"a\", \"String\"), # Column 'a' has dtype 'String'\n (\"b\", [\"Int\", \"Int64\"]), # Column 'b' has dtype 'Int' or 'Int64'\n (\"c\", ) # Column 'c' follows 'b' but we don't specify a dtype here\n ]\n)\n\n# Use the `col_schema_match()` validation method to perform the schema check\nvalidation = (\n pb.Validate(data=tbl)\n .col_schema_match(schema=schema)\n .interrogate()\n)\n\nvalidation"
},
{
"objectID": "demos/using-parquet-data/index.html",
"href": "demos/using-parquet-data/index.html",
"title": "pointblank",
"section": "",
- "text": "Using Parquet Data\nA Parquet dataset can be used for data validation, thanks to Ibis.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n Example using a Parquet dataset.Parquet\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n item_revenue\n 200\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n item_revenue\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 3\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n session_duration\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19820.99\n 180.01\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n item_type\n iap, ad\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 5\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n [A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:31 UTC< 1 s2025-02-10 22:08:31 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport ibis\n\ngame_revenue = ibis.read_parquet(\"data/game_revenue.parquet\")\n\nvalidation = (\n pb.Validate(data=game_revenue, label=\"Example using a Parquet dataset.\")\n .col_vals_lt(columns=\"item_revenue\", value=200)\n .col_vals_gt(columns=\"item_revenue\", value=0)\n .col_vals_gt(columns=\"session_duration\", value=5)\n .col_vals_in_set(columns=\"item_type\", set=[\"iap\", \"ad\"])\n .col_vals_regex(columns=\"player_id\", pattern=r\"[A-Z]{12}\\d{3}\")\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n ParquetRows2000Columns11\n 
\n\n \n player_idstring\n session_idstring\n session_starttimestamp\n timetimestamp\n item_typestring\n item_namestring\n item_revenuefloat64\n session_durationfloat64\n start_daydate\n acquisitionstring\n countrystring\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
+ "text": "Using Parquet Data\nA Parquet dataset can be used for data validation, thanks to Ibis.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n Example using a Parquet dataset.Parquet\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n item_revenue\n 200\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n item_revenue\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 3\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n session_duration\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19820.99\n 180.01\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n item_type\n iap, ad\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 5\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n [A-Z]{12}\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:14 UTC< 1 s2025-02-11 05:26:14 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport ibis\n\ngame_revenue = ibis.read_parquet(\"data/game_revenue.parquet\")\n\nvalidation = (\n pb.Validate(data=game_revenue, label=\"Example using a Parquet dataset.\")\n .col_vals_lt(columns=\"item_revenue\", value=200)\n .col_vals_gt(columns=\"item_revenue\", value=0)\n .col_vals_gt(columns=\"session_duration\", value=5)\n .col_vals_in_set(columns=\"item_type\", set=[\"iap\", \"ad\"])\n .col_vals_regex(columns=\"player_id\", pattern=r\"[A-Z]{12}\\d{3}\")\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n ParquetRows2000Columns11\n 
\n\n \n player_idstring\n session_idstring\n session_starttimestamp\n timetimestamp\n item_typestring\n item_namestring\n item_revenuefloat64\n session_durationfloat64\n start_daydate\n acquisitionstring\n countrystring\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
},
{
"objectID": "demos/04-sundered-data/index.html",
"href": "demos/04-sundered-data/index.html",
"title": "pointblank",
"section": "",
- "text": "Sundered Data\nSplitting your data into ‘pass’ and ‘fail’ subsets.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:37Pandas\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n d\n 1000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 70.54\n 60.46\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n c\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 50.38\n 80.62\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:08:37 UTC< 1 s2025-02-10 22:08:37 UTC\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows4Columns8\n \n\n \n date_timedatetime64[ns]\n datedatetime64[ns]\n aint64\n bobject\n cfloat64\n dfloat64\n ebool\n fobject\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04 00:00:00\n 2\n 1-bcd-345\n 3.0\n 3423.29\n True\n high\n \n \n 2\n 2016-01-05 13:32:00\n 2016-01-05 00:00:00\n 6\n 8-kdg-938\n 3.0\n 2343.23\n True\n high\n \n \n 3\n 2016-01-11 06:15:00\n 2016-01-11 00:00:00\n 4\n 2-dhe-923\n 4.0\n 3291.03\n True\n mid\n \n \n 4\n 2016-01-17 11:27:00\n 2016-01-17 00:00:00\n 4\n 5-boe-639\n 2.0\n 1035.64\n False\n low\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\n\nvalidation = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"pandas\"))\n .col_vals_gt(columns=\"d\", value=1000)\n .col_vals_le(columns=\"c\", value=5)\n .interrogate()\n)\n\nvalidation\npb.preview(validation.get_sundered_data(type=\"pass\"))\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows13Columns8\n \n\n \n date_timedatetime64[ns]\n datedatetime64[ns]\n aint64\n bobject\n cfloat64\n dfloat64\n ebool\n fobject\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04 00:00:00\n 2\n 1-bcd-345\n 3.0\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 
00:32:00\n 2016-01-04 00:00:00\n 3\n 5-egh-163\n 8.0\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05 00:00:00\n 6\n 8-kdg-938\n 3.0\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06 00:00:00\n 2\n 5-jdo-903\n NA\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09 00:00:00\n 8\n 3-ldm-038\n 7.0\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11 00:00:00\n 4\n 2-dhe-923\n 4.0\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15 00:00:00\n 7\n 1-knw-093\n 3.0\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17 00:00:00\n 4\n 5-boe-639\n 2.0\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26 00:00:00\n 4\n 2-dmx-010\n 7.0\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28 00:00:00\n 2\n 7-dmx-010\n 8.0\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30 00:00:00\n 1\n 3-dka-303\n NA\n 2230.09\n True\n high"
+ "text": "Sundered Data\nSplitting your data into ‘pass’ and ‘fail’ subsets.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:20Pandas\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n d\n 1000\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 70.54\n 60.46\n —\n —\n —\n CSV\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n c\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 50.38\n 80.62\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:26:20 UTC< 1 s2025-02-11 05:26:20 UTC\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows4Columns8\n \n\n \n date_timedatetime64[ns]\n datedatetime64[ns]\n aint64\n bobject\n cfloat64\n dfloat64\n ebool\n fobject\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04 00:00:00\n 2\n 1-bcd-345\n 3.0\n 3423.29\n True\n high\n \n \n 2\n 2016-01-05 13:32:00\n 2016-01-05 00:00:00\n 6\n 8-kdg-938\n 3.0\n 2343.23\n True\n high\n \n \n 3\n 2016-01-11 06:15:00\n 2016-01-11 00:00:00\n 4\n 2-dhe-923\n 4.0\n 3291.03\n True\n mid\n \n \n 4\n 2016-01-17 11:27:00\n 2016-01-17 00:00:00\n 4\n 5-boe-639\n 2.0\n 1035.64\n False\n low\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\n\nvalidation = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"pandas\"))\n .col_vals_gt(columns=\"d\", value=1000)\n .col_vals_le(columns=\"c\", value=5)\n .interrogate()\n)\n\nvalidation\npb.preview(validation.get_sundered_data(type=\"pass\"))\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows13Columns8\n \n\n \n date_timedatetime64[ns]\n datedatetime64[ns]\n aint64\n bobject\n cfloat64\n dfloat64\n ebool\n fobject\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04 00:00:00\n 2\n 1-bcd-345\n 3.0\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 
00:32:00\n 2016-01-04 00:00:00\n 3\n 5-egh-163\n 8.0\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05 00:00:00\n 6\n 8-kdg-938\n 3.0\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06 00:00:00\n 2\n 5-jdo-903\n NA\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09 00:00:00\n 8\n 3-ldm-038\n 7.0\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11 00:00:00\n 4\n 2-dhe-923\n 4.0\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15 00:00:00\n 7\n 1-knw-093\n 3.0\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17 00:00:00\n 4\n 5-boe-639\n 2.0\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26 00:00:00\n 4\n 2-dmx-010\n 7.0\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28 00:00:00\n 2\n 7-dmx-010\n 8.0\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30 00:00:00\n 1\n 3-dka-303\n NA\n 2230.09\n True\n high"
},
{
"objectID": "demos/05-step-report-column-check/index.html",
"href": "demos/05-step-report-column-check/index.html",
"title": "pointblank",
"section": "",
- "text": "Step Report: Column Data Checks\nA step report for column checks shows what went wrong.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:43Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n c\n 4\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 90.69\n 40.31\n —\n —\n —\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n \\d-[a-z]{3}-\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:43 UTC< 1 s2025-02-10 22:08:43 UTC\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Validation Step 1\n \n \n ASSERTION c ≥ 44 / 13 TEST UNIT FAILURES IN COLUMN 5EXTRACT OF 4 ROWS WITH TEST UNIT FAILURES IN RED:\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Validation Step 2 ✓\n \n \n ASSERTION b matches regex \\d-[a-z]{3}-\\d{3}13 TEST UNITS ALL PASSED IN COLUMN 4PREVIEW OF TARGET TABLE:\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n 
high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_ge(columns=\"c\", value=4, na_pass=True) # has failing test units\n .col_vals_regex(columns=\"b\", pattern=r\"\\d-[a-z]{3}-\\d{3}\") # no failing test units\n .interrogate()\n)\n\nvalidation\nvalidation.get_step_report(i=1)\nvalidation.get_step_report(i=2)\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 
5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Step Report: Column Data Checks\nA step report for column checks shows what went wrong.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:26Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n c\n 4\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 90.69\n 40.31\n —\n —\n —\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n \\d-[a-z]{3}-\\d{3}\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:26 UTC< 1 s2025-02-11 05:26:26 UTC\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Validation Step 1\n \n \n ASSERTION c ≥ 44 / 13 TEST UNIT FAILURES IN COLUMN 5EXTRACT OF 4 ROWS WITH TEST UNIT FAILURES IN RED:\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Validation Step 2 ✓\n \n \n ASSERTION b matches regex \\d-[a-z]{3}-\\d{3}13 TEST UNITS ALL PASSED IN COLUMN 4PREVIEW OF TARGET TABLE:\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n 
high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_ge(columns=\"c\", value=4, na_pass=True) # has failing test units\n .col_vals_regex(columns=\"b\", pattern=r\"\\d-[a-z]{3}-\\d{3}\") # no failing test units\n .interrogate()\n)\n\nvalidation\nvalidation.get_step_report(i=1)\nvalidation.get_step_report(i=2)\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 
5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/02-advanced/index.html",
"href": "demos/02-advanced/index.html",
"title": "pointblank",
"section": "",
- "text": "Advanced Validation\nA validation with a comprehensive set of rules.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n Comprehensive validation examplePolarsgame_revenueWARN0.1STOP0.25NOTIFY0.35\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n ^[A-Z]{12}[0-9]{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n session_duration\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19820.99\n 180.01\n ○\n ○\n ○\n CSV\n \n \n #4CA64C66\n 3\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n item_revenue\n 0.02\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19410.97\n 590.03\n ○\n ○\n ○\n CSV\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n item_type\n iap, ad\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C66\n 5\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n acquisition\n google, facebook, organic, crosspromo, other_campaign\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19750.99\n 250.01\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 6\n \n \n \n\n col_vals_not_in_set\n \n \n \n \n \n \n \n \n\n \n col_vals_not_in_set()\n \n country\n Mongolia, Germany\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 17750.89\n 2250.11\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 7\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n session_duration\n [10, 50]\n \n \n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C66\n 8\n \n \n \n\n rows_distinct\n \n \n \n \n \n \n \n \n \n \n\n \n rows_distinct()\n \n player_id, session_id, time\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19780.99\n 220.01\n ○\n ○\n ○\n —\n \n \n 
#4CA64C\n 9\n \n \n \n\n row_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n row_count_match()\n \n —\n 2000\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 10\n \n \n \n\n col_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_count_match()\n \n —\n 11\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 11\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n item_type\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 12\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n item_name\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 13\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n item_revenue\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 14\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n start_day\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-10 22:08:49 UTC< 1 s2025-02-10 22:08:49 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\nimport narwhals as nw\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"polars\"),\n tbl_name=\"game_revenue\",\n label=\"Comprehensive validation example\",\n thresholds=pb.Thresholds(warn_at=0.10, stop_at=0.25, notify_at=0.35),\n )\n .col_vals_regex(columns=\"player_id\", pattern=r\"^[A-Z]{12}[0-9]{3}$\") # STEP 1\n .col_vals_gt(columns=\"session_duration\", value=5) # STEP 2\n .col_vals_ge(columns=\"item_revenue\", value=0.02) # STEP 3\n .col_vals_in_set(columns=\"item_type\", set=[\"iap\", \"ad\"]) # STEP 4\n .col_vals_in_set( # STEP 5\n columns=\"acquisition\",\n set=[\"google\", \"facebook\", \"organic\", \"crosspromo\", \"other_campaign\"]\n )\n 
.col_vals_not_in_set(columns=\"country\", set=[\"Mongolia\", \"Germany\"]) # STEP 6\n .col_vals_between( # STEP 7\n columns=\"session_duration\",\n left=10, right=50,\n pre = lambda df: df.select(pl.median(\"session_duration\"))\n )\n .rows_distinct(columns_subset=[\"player_id\", \"session_id\", \"time\"]) # STEP 8\n .row_count_match(count=2000) # STEP 9\n .col_count_match(count=11) # STEP 10\n .col_vals_not_null(columns=pb.starts_with(\"item\")) # STEPS 11-13\n .col_exists(columns=\"start_day\") # STEP 14\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows2000Columns11\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 6\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 12:08:56+00:00\n ad\n ad_10sec\n 0.07\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 7\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 
11:50:02+00:00\n 2015-01-01 12:14:08+00:00\n ad\n ad_10sec\n 0.08\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 8\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 12:21:44+00:00\n ad\n ad_30sec\n 1.17\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 9\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 12:24:20+00:00\n ad\n ad_10sec\n 0.14\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 10\n FXWUORGYNJAE271\n FXWUORGYNJAE271-et7bs639\n 2015-01-01 15:17:18+00:00\n 2015-01-01 15:19:36+00:00\n ad\n ad_5sec\n 0.08\n 30.7\n 2015-01-01\n organic\n Canada\n \n \n 1991\n VPNRYLMBKJGT925\n VPNRYLMBKJGT925-vt26q9gb\n 2015-01-21 01:07:24+00:00\n 2015-01-21 01:26:12+00:00\n ad\n ad_survey\n 0.72\n 24.9\n 2015-01-21\n other_campaign\n Germany\n \n \n 1992\n JVBZCPKXHFMU491\n JVBZCPKXHFMU491-wvi6hs2t\n 2015-01-21 01:49:36+00:00\n 2015-01-21 01:53:36+00:00\n iap\n gold6\n 41.99\n 7.1\n 2015-01-07\n organic\n United States\n \n \n 1993\n JVBZCPKXHFMU491\n JVBZCPKXHFMU491-wvi6hs2t\n 2015-01-21 01:49:36+00:00\n 2015-01-21 01:55:42+00:00\n iap\n gems3\n 17.49\n 7.1\n 2015-01-07\n organic\n United States\n \n \n 1994\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:01:20+00:00\n ad\n ad_playable\n 1.116\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1995\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:14+00:00\n ad\n ad_15sec\n 0.225\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n 
ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
+ "text": "Advanced Validation\nA validation with a comprehensive set of rules.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n Comprehensive validation examplePolarsgame_revenueWARN0.1STOP0.25NOTIFY0.35\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n player_id\n ^[A-Z]{12}[0-9]{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n session_duration\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19820.99\n 180.01\n ○\n ○\n ○\n CSV\n \n \n #4CA64C66\n 3\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n item_revenue\n 0.02\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19410.97\n 590.03\n ○\n ○\n ○\n CSV\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n item_type\n iap, ad\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C66\n 5\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n acquisition\n google, facebook, organic, crosspromo, other_campaign\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19750.99\n 250.01\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 6\n \n \n \n\n col_vals_not_in_set\n \n \n \n \n \n \n \n \n\n \n col_vals_not_in_set()\n \n country\n Mongolia, Germany\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 17750.89\n 2250.11\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 7\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n session_duration\n [10, 50]\n \n \n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C66\n 8\n \n \n \n\n rows_distinct\n \n \n \n \n \n \n \n \n \n \n\n \n rows_distinct()\n \n player_id, session_id, time\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19780.99\n 220.01\n ○\n ○\n ○\n —\n \n \n 
#4CA64C\n 9\n \n \n \n\n row_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n row_count_match()\n \n —\n 2000\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 10\n \n \n \n\n col_count_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_count_match()\n \n —\n 11\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 11\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n item_type\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 12\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n item_name\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 13\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n item_revenue\n —\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 14\n \n \n \n\n col_exists\n \n \n \n \n \n \n \n\n \n col_exists()\n \n start_day\n —\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 11.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-11 05:26:32 UTC< 1 s2025-02-11 05:26:32 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\nimport polars as pl\nimport narwhals as nw\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\", tbl_type=\"polars\"),\n tbl_name=\"game_revenue\",\n label=\"Comprehensive validation example\",\n thresholds=pb.Thresholds(warn_at=0.10, stop_at=0.25, notify_at=0.35),\n )\n .col_vals_regex(columns=\"player_id\", pattern=r\"^[A-Z]{12}[0-9]{3}$\") # STEP 1\n .col_vals_gt(columns=\"session_duration\", value=5) # STEP 2\n .col_vals_ge(columns=\"item_revenue\", value=0.02) # STEP 3\n .col_vals_in_set(columns=\"item_type\", set=[\"iap\", \"ad\"]) # STEP 4\n .col_vals_in_set( # STEP 5\n columns=\"acquisition\",\n set=[\"google\", \"facebook\", \"organic\", \"crosspromo\", \"other_campaign\"]\n )\n 
.col_vals_not_in_set(columns=\"country\", set=[\"Mongolia\", \"Germany\"]) # STEP 6\n .col_vals_between( # STEP 7\n columns=\"session_duration\",\n left=10, right=50,\n pre = lambda df: df.select(pl.median(\"session_duration\"))\n )\n .rows_distinct(columns_subset=[\"player_id\", \"session_id\", \"time\"]) # STEP 8\n .row_count_match(count=2000) # STEP 9\n .col_count_match(count=11) # STEP 10\n .col_vals_not_null(columns=pb.starts_with(\"item\")) # STEPS 11-13\n .col_exists(columns=\"start_day\") # STEP 14\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows2000Columns11\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 6\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 12:08:56+00:00\n ad\n ad_10sec\n 0.07\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 7\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 
11:50:02+00:00\n 2015-01-01 12:14:08+00:00\n ad\n ad_10sec\n 0.08\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 8\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 12:21:44+00:00\n ad\n ad_30sec\n 1.17\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 9\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 12:24:20+00:00\n ad\n ad_10sec\n 0.14\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 10\n FXWUORGYNJAE271\n FXWUORGYNJAE271-et7bs639\n 2015-01-01 15:17:18+00:00\n 2015-01-01 15:19:36+00:00\n ad\n ad_5sec\n 0.08\n 30.7\n 2015-01-01\n organic\n Canada\n \n \n 1991\n VPNRYLMBKJGT925\n VPNRYLMBKJGT925-vt26q9gb\n 2015-01-21 01:07:24+00:00\n 2015-01-21 01:26:12+00:00\n ad\n ad_survey\n 0.72\n 24.9\n 2015-01-21\n other_campaign\n Germany\n \n \n 1992\n JVBZCPKXHFMU491\n JVBZCPKXHFMU491-wvi6hs2t\n 2015-01-21 01:49:36+00:00\n 2015-01-21 01:53:36+00:00\n iap\n gold6\n 41.99\n 7.1\n 2015-01-07\n organic\n United States\n \n \n 1993\n JVBZCPKXHFMU491\n JVBZCPKXHFMU491-wvi6hs2t\n 2015-01-21 01:49:36+00:00\n 2015-01-21 01:55:42+00:00\n iap\n gems3\n 17.49\n 7.1\n 2015-01-07\n organic\n United States\n \n \n 1994\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:01:20+00:00\n ad\n ad_playable\n 1.116\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1995\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:14+00:00\n ad\n ad_15sec\n 0.225\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n 
ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
},
{
"objectID": "demos/06-step-report-schema-check/index.html",
"href": "demos/06-step-report-schema-check/index.html",
"title": "pointblank",
"section": "",
- "text": "Step Report: Schema Check\nWhen a schema doesn’t match, a step report gives you the details.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:08:55DuckDB\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_schema_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_schema_match()\n \n —\n SCHEMA\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 00.00\n 11.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:08:55 UTC< 1 s2025-02-10 22:08:55 UTC\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Validation Step 1 ✗\n \n \n COLUMN SCHEMA MATCHCOMPLETEIN ORDERCOLUMN ≠ columnDTYPE ≠ dtypefloat ≠ float64\n \n\n \n TARGET\n \n \n EXPECTED\n \n\n\n \n COLUMN\n DTYPE\n \n COLUMN\n \n DTYPE\n \n\n\n\n \n 1\n date_time\n timestamp(6)\n 1\n date_time\n ✓\n timestamp\n ✗\n \n \n 2\n date\n date\n 2\n dates\n ✗\n date\n —\n \n \n 3\n a\n int64\n 3\n a\n ✓\n int64\n ✓\n \n \n 4\n b\n string\n 4\n b\n ✓\n —\n \n \n \n 5\n c\n int64\n 5\n c\n ✓\n —\n \n \n \n 6\n d\n float64\n 6\n d\n ✓\n float64\n ✓\n \n \n 7\n e\n boolean\n 7\n e\n ✓\n bool | boolean\n ✓\n \n \n 8\n f\n string\n 8\n f\n ✓\n str\n ✗\n \n\n \n \n \n Supplied Column Schema:[('date_time', 'timestamp'), ('dates', 'date'), ('a', 'int64'), ('b',), ('c',), ('d', 'float64'), ('e', ['bool', 'boolean']), ('f', 'str')]\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\n# Create a schema for the target table (`small_table` as a DuckDB table)\nschema = pb.Schema(\n columns=[\n (\"date_time\", \"timestamp\"), # this dtype doesn't match\n (\"dates\", \"date\"), # this column name doesn't match\n (\"a\", \"int64\"),\n (\"b\",), # omit dtype to not check for it\n (\"c\",), # \"\" \"\" \"\" \"\"\n (\"d\", \"float64\"),\n (\"e\", [\"bool\", \"boolean\"]), # try several dtypes (second one matches)\n (\"f\", \"str\"), # this dtype doesn't match\n ]\n)\n\n# Use the 
`col_schema_match()` validation method to perform a schema check\nvalidation = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"duckdb\"))\n .col_schema_match(schema=schema)\n .interrogate()\n)\n\nvalidation\nvalidation.get_step_report(i=1)\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows13Columns8\n \n\n \n date_timetimestamp\n datedate\n aint64\n bstring\n cint64\n dfloat64\n eboolean\n fstring\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n NULL\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n NULL\n 2230.09\n True\n high"
+ "text": "Step Report: Schema Check\nWhen a schema doesn’t match, a step report gives you the details.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:39DuckDB\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_schema_match\n \n \n \n \n \n \n \n \n \n \n \n\n \n col_schema_match()\n \n —\n SCHEMA\n \n \n \n \n \n \n \n \n\n ✓\n 1\n 00.00\n 11.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:26:39 UTC< 1 s2025-02-11 05:26:39 UTC\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Validation Step 1 ✗\n \n \n COLUMN SCHEMA MATCHCOMPLETEIN ORDERCOLUMN ≠ columnDTYPE ≠ dtypefloat ≠ float64\n \n\n \n TARGET\n \n \n EXPECTED\n \n\n\n \n COLUMN\n DTYPE\n \n COLUMN\n \n DTYPE\n \n\n\n\n \n 1\n date_time\n timestamp(6)\n 1\n date_time\n ✓\n timestamp\n ✗\n \n \n 2\n date\n date\n 2\n dates\n ✗\n date\n —\n \n \n 3\n a\n int64\n 3\n a\n ✓\n int64\n ✓\n \n \n 4\n b\n string\n 4\n b\n ✓\n —\n \n \n \n 5\n c\n int64\n 5\n c\n ✓\n —\n \n \n \n 6\n d\n float64\n 6\n d\n ✓\n float64\n ✓\n \n \n 7\n e\n boolean\n 7\n e\n ✓\n bool | boolean\n ✓\n \n \n 8\n f\n string\n 8\n f\n ✓\n str\n ✗\n \n\n \n \n \n Supplied Column Schema:[('date_time', 'timestamp'), ('dates', 'date'), ('a', 'int64'), ('b',), ('c',), ('d', 'float64'), ('e', ['bool', 'boolean']), ('f', 'str')]\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\n# Create a schema for the target table (`small_table` as a DuckDB table)\nschema = pb.Schema(\n columns=[\n (\"date_time\", \"timestamp\"), # this dtype doesn't match\n (\"dates\", \"date\"), # this column name doesn't match\n (\"a\", \"int64\"),\n (\"b\",), # omit dtype to not check for it\n (\"c\",), # \"\" \"\" \"\" \"\"\n (\"d\", \"float64\"),\n (\"e\", [\"bool\", \"boolean\"]), # try several dtypes (second one matches)\n (\"f\", \"str\"), # this dtype doesn't match\n ]\n)\n\n# Use the 
`col_schema_match()` validation method to perform a schema check\nvalidation = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"duckdb\"))\n .col_schema_match(schema=schema)\n .interrogate()\n)\n\nvalidation\nvalidation.get_step_report(i=1)\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n DuckDBRows13Columns8\n \n\n \n date_timetimestamp\n datedate\n aint64\n bstring\n cint64\n dfloat64\n eboolean\n fstring\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n NULL\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n NULL\n 2230.09\n True\n high"
},
{
"objectID": "demos/checks-for-missing/index.html",
"href": "demos/checks-for-missing/index.html",
"title": "pointblank",
"section": "",
- "text": "Checks for Missing Values\nPerform validations that check whether missing/NA/Null values are present.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:01Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n a\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n b\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 3\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n c\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n CSV\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n d\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 5\n \n \n \n\n col_vals_null\n \n \n \n \n \n \n\n \n col_vals_null()\n \n a\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 00.00\n 131.00\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:09:01 UTC< 1 s2025-02-10 22:09:01 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_not_null(columns=\"a\") # expect no Null values\n .col_vals_not_null(columns=\"b\") # \"\" \"\"\n .col_vals_not_null(columns=\"c\") # \"\" \"\"\n .col_vals_not_null(columns=\"d\") # \"\" \"\"\n .col_vals_null(columns=\"a\") # expect all values to be Null\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 
2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Checks for Missing Values\nPerform validations that check whether missing/NA/Null values are present.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:26:45Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n a\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n b\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 3\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n c\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n CSV\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n d\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 5\n \n \n \n\n col_vals_null\n \n \n \n \n \n \n\n \n col_vals_null()\n \n a\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 00.00\n 131.00\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:26:45 UTC< 1 s2025-02-11 05:26:45 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .col_vals_not_null(columns=\"a\") # expect no Null values\n .col_vals_not_null(columns=\"b\") # \"\" \"\"\n .col_vals_not_null(columns=\"c\") # \"\" \"\"\n .col_vals_not_null(columns=\"d\") # \"\" \"\"\n .col_vals_null(columns=\"a\") # expect all values to be Null\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 
2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/index.html",
@@ -1040,21 +1040,21 @@
"href": "demos/03-data-extracts/index.html",
"title": "pointblank",
"section": "",
- "text": "Data Extracts\nPulling out data extracts that highlight rows with validation failures.\n\nValidation with failures at Step 2:\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n Validation with test unit failures available as an extractPolarsgame_revenue\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n item_revenue\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n session_duration\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19860.99\n 140.01\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:09:10 UTC< 1 s2025-02-10 22:09:10 UTC\n \n\n\n\n\n\n\n \n\n\n\n\nExtract from Step 2 (which has 14 failing test units):\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows14Columns12\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 549\n QNLVRDEOXFYJ892\n QNLVRDEOXFYJ892-lz5fmr6k\n 2015-01-10 16:44:17+00:00\n 2015-01-10 16:45:29+00:00\n iap\n gold3\n 3.49\n 3.7\n 2015-01-09\n crosspromo\n Australia\n \n \n 663\n GFLYJHAPMZWD631\n GFLYJHAPMZWD631-i2v1bl7a\n 2015-01-11 16:13:24+00:00\n 2015-01-11 16:14:54+00:00\n iap\n gems2\n 3.99\n 3.6\n 2015-01-09\n organic\n India\n \n \n 772\n BFNLURISJXTH647\n BFNLURISJXTH647-6o5hx27z\n 2015-01-12 17:37:39+00:00\n 2015-01-12 17:39:27+00:00\n iap\n offer5\n 11.59\n 4.1\n 2015-01-10\n organic\n India\n \n \n 773\n BFNLURISJXTH647\n BFNLURISJXTH647-6o5hx27z\n 2015-01-12 17:37:39+00:00\n 2015-01-12 17:41:45+00:00\n iap\n gems3\n 9.99\n 4.1\n 2015-01-10\n organic\n India\n \n \n 908\n KILWZYHRSJEG316\n KILWZYHRSJEG316-uke7dhqj\n 2015-01-13 
22:16:29+00:00\n 2015-01-13 22:17:35+00:00\n iap\n offer2\n 10.99\n 3.2\n 2015-01-04\n organic\n Denmark\n \n \n 1037\n JUBDVFHCNQWT198\n JUBDVFHCNQWT198-9h4xs2pb\n 2015-01-14 16:08:25+00:00\n 2015-01-14 16:08:43+00:00\n iap\n offer5\n 8.69\n 3.3\n 2015-01-14\n organic\n Philippines\n \n \n 1038\n JUBDVFHCNQWT198\n JUBDVFHCNQWT198-9h4xs2pb\n 2015-01-14 16:08:25+00:00\n 2015-01-14 16:11:01+00:00\n iap\n offer4\n 5.99\n 3.3\n 2015-01-14\n organic\n Philippines\n \n \n 1455\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-46cdjzy7\n 2015-01-17 11:25:25+00:00\n 2015-01-17 11:28:01+00:00\n iap\n offer4\n 13.99\n 4.6\n 2015-01-14\n organic\n United States\n \n \n 1516\n OMCVUAIKSDTR651\n OMCVUAIKSDTR651-yso9e1b2\n 2015-01-17 20:58:34+00:00\n 2015-01-17 21:01:34+00:00\n iap\n offer3\n 10.49\n 4.2\n 2015-01-07\n other_campaign\n United States\n \n \n 1517\n OMCVUAIKSDTR651\n OMCVUAIKSDTR651-yso9e1b2\n 2015-01-17 20:58:34+00:00\n 2015-01-17 21:02:34+00:00\n iap\n offer5\n 20.29\n 4.2\n 2015-01-07\n other_campaign\n United States\n \n \n 1913\n MTCIWKOVASYP925\n MTCIWKOVASYP925-1q3xvfmp\n 2015-01-20 12:34:43+00:00\n 2015-01-20 12:35:37+00:00\n iap\n offer5\n 26.09\n 3.9\n 2015-01-14\n organic\n Germany\n \n \n 1914\n MTCIWKOVASYP925\n MTCIWKOVASYP925-1q3xvfmp\n 2015-01-20 12:34:43+00:00\n 2015-01-20 12:37:25+00:00\n iap\n gold2\n 1.79\n 3.9\n 2015-01-14\n organic\n Germany\n \n \n 1919\n BFNLURISJXTH647\n BFNLURISJXTH647-len6vujd\n 2015-01-20 14:09:51+00:00\n 2015-01-20 14:10:03+00:00\n iap\n gold7\n 47.99\n 4.5\n 2015-01-10\n organic\n India\n \n \n 1920\n BFNLURISJXTH647\n BFNLURISJXTH647-len6vujd\n 2015-01-20 14:09:51+00:00\n 2015-01-20 14:14:21+00:00\n iap\n gold6\n 23.99\n 4.5\n 2015-01-10\n organic\n India\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\"),\n tbl_name=\"game_revenue\",\n label=\"Validation with test unit failures available as an extract\"\n )\n .col_vals_gt(columns=\"item_revenue\", 
value=0) # STEP 1: no test unit failures\n .col_vals_ge(columns=\"session_duration\", value=5) # STEP 2: 14 test unit failures -> extract\n .interrogate()\n)\npb.preview(validation.get_data_extracts(i=2, frame=True), n_head=20, n_tail=20)\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows2000Columns11\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n 
RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
+ "text": "Data Extracts\nPulling out data extracts that highlight rows with validation failures.\n\nValidation with failures at Step 2:\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n Validation with test unit failures available as an extractPolarsgame_revenue\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n item_revenue\n 0\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 20001.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_gte\n \n \n \n \n \n \n\n \n col_vals_ge()\n \n session_duration\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 2000\n 19860.99\n 140.01\n —\n —\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:26:55 UTC< 1 s2025-02-11 05:26:55 UTC\n \n\n\n\n\n\n\n \n\n\n\n\nExtract from Step 2 (which has 14 failing test units):\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows14Columns12\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 549\n QNLVRDEOXFYJ892\n QNLVRDEOXFYJ892-lz5fmr6k\n 2015-01-10 16:44:17+00:00\n 2015-01-10 16:45:29+00:00\n iap\n gold3\n 3.49\n 3.7\n 2015-01-09\n crosspromo\n Australia\n \n \n 663\n GFLYJHAPMZWD631\n GFLYJHAPMZWD631-i2v1bl7a\n 2015-01-11 16:13:24+00:00\n 2015-01-11 16:14:54+00:00\n iap\n gems2\n 3.99\n 3.6\n 2015-01-09\n organic\n India\n \n \n 772\n BFNLURISJXTH647\n BFNLURISJXTH647-6o5hx27z\n 2015-01-12 17:37:39+00:00\n 2015-01-12 17:39:27+00:00\n iap\n offer5\n 11.59\n 4.1\n 2015-01-10\n organic\n India\n \n \n 773\n BFNLURISJXTH647\n BFNLURISJXTH647-6o5hx27z\n 2015-01-12 17:37:39+00:00\n 2015-01-12 17:41:45+00:00\n iap\n gems3\n 9.99\n 4.1\n 2015-01-10\n organic\n India\n \n \n 908\n KILWZYHRSJEG316\n KILWZYHRSJEG316-uke7dhqj\n 2015-01-13 
22:16:29+00:00\n 2015-01-13 22:17:35+00:00\n iap\n offer2\n 10.99\n 3.2\n 2015-01-04\n organic\n Denmark\n \n \n 1037\n JUBDVFHCNQWT198\n JUBDVFHCNQWT198-9h4xs2pb\n 2015-01-14 16:08:25+00:00\n 2015-01-14 16:08:43+00:00\n iap\n offer5\n 8.69\n 3.3\n 2015-01-14\n organic\n Philippines\n \n \n 1038\n JUBDVFHCNQWT198\n JUBDVFHCNQWT198-9h4xs2pb\n 2015-01-14 16:08:25+00:00\n 2015-01-14 16:11:01+00:00\n iap\n offer4\n 5.99\n 3.3\n 2015-01-14\n organic\n Philippines\n \n \n 1455\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-46cdjzy7\n 2015-01-17 11:25:25+00:00\n 2015-01-17 11:28:01+00:00\n iap\n offer4\n 13.99\n 4.6\n 2015-01-14\n organic\n United States\n \n \n 1516\n OMCVUAIKSDTR651\n OMCVUAIKSDTR651-yso9e1b2\n 2015-01-17 20:58:34+00:00\n 2015-01-17 21:01:34+00:00\n iap\n offer3\n 10.49\n 4.2\n 2015-01-07\n other_campaign\n United States\n \n \n 1517\n OMCVUAIKSDTR651\n OMCVUAIKSDTR651-yso9e1b2\n 2015-01-17 20:58:34+00:00\n 2015-01-17 21:02:34+00:00\n iap\n offer5\n 20.29\n 4.2\n 2015-01-07\n other_campaign\n United States\n \n \n 1913\n MTCIWKOVASYP925\n MTCIWKOVASYP925-1q3xvfmp\n 2015-01-20 12:34:43+00:00\n 2015-01-20 12:35:37+00:00\n iap\n offer5\n 26.09\n 3.9\n 2015-01-14\n organic\n Germany\n \n \n 1914\n MTCIWKOVASYP925\n MTCIWKOVASYP925-1q3xvfmp\n 2015-01-20 12:34:43+00:00\n 2015-01-20 12:37:25+00:00\n iap\n gold2\n 1.79\n 3.9\n 2015-01-14\n organic\n Germany\n \n \n 1919\n BFNLURISJXTH647\n BFNLURISJXTH647-len6vujd\n 2015-01-20 14:09:51+00:00\n 2015-01-20 14:10:03+00:00\n iap\n gold7\n 47.99\n 4.5\n 2015-01-10\n organic\n India\n \n \n 1920\n BFNLURISJXTH647\n BFNLURISJXTH647-len6vujd\n 2015-01-20 14:09:51+00:00\n 2015-01-20 14:14:21+00:00\n iap\n gold6\n 23.99\n 4.5\n 2015-01-10\n organic\n India\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"game_revenue\"),\n tbl_name=\"game_revenue\",\n label=\"Validation with test unit failures available as an extract\"\n )\n .col_vals_gt(columns=\"item_revenue\", 
value=0) # STEP 1: no test unit failures\n .col_vals_ge(columns=\"session_duration\", value=5) # STEP 2: 14 test unit failures -> extract\n .interrogate()\n)\npb.preview(validation.get_data_extracts(i=2, frame=True), n_head=20, n_tail=20)\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows2000Columns11\n \n\n \n player_idString\n session_idString\n session_startDatetime\n timeDatetime\n item_typeString\n item_nameString\n item_revenueFloat64\n session_durationFloat64\n start_dayDate\n acquisitionString\n countryString\n\n\n\n \n 1\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:31:27+00:00\n iap\n offer2\n 8.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 2\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:36:57+00:00\n iap\n gems3\n 22.49\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 3\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:37:45+00:00\n iap\n gold7\n 107.99\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 4\n ECPANOIXLZHF896\n ECPANOIXLZHF896-eol2j8bs\n 2015-01-01 01:31:03+00:00\n 2015-01-01 01:42:33+00:00\n ad\n ad_20sec\n 0.76\n 16.3\n 2015-01-01\n google\n Germany\n \n \n 5\n ECPANOIXLZHF896\n ECPANOIXLZHF896-hdu9jkls\n 2015-01-01 11:50:02+00:00\n 2015-01-01 11:55:20+00:00\n ad\n ad_5sec\n 0.03\n 35.2\n 2015-01-01\n google\n Germany\n \n \n 1996\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:02:50+00:00\n ad\n ad_survey\n 1.332\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1997\n NAOJRDMCSEBI281\n NAOJRDMCSEBI281-j2vs9ilp\n 2015-01-21 01:57:50+00:00\n 2015-01-21 02:22:14+00:00\n ad\n ad_survey\n 1.35\n 25.8\n 2015-01-11\n organic\n Norway\n \n \n 1998\n RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:40:00+00:00\n ad\n ad_5sec\n 0.03\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 1999\n 
RMOSWHJGELCI675\n RMOSWHJGELCI675-vbhcsmtr\n 2015-01-21 02:39:48+00:00\n 2015-01-21 02:47:12+00:00\n iap\n offer5\n 26.09\n 8.4\n 2015-01-10\n other_campaign\n France\n \n \n 2000\n GJCXNTWEBIPQ369\n GJCXNTWEBIPQ369-9elq67md\n 2015-01-21 03:59:23+00:00\n 2015-01-21 04:06:29+00:00\n ad\n ad_5sec\n 0.12\n 18.5\n 2015-01-14\n organic\n United States"
},
{
"objectID": "demos/expect-no-duplicate-rows/index.html",
"href": "demos/expect-no-duplicate-rows/index.html",
"title": "pointblank",
"section": "",
- "text": "Expect No Duplicate Rows\nWe can check for duplicate rows in the table with rows_distinct().\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:16Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n rows_distinct\n \n \n \n \n \n \n \n \n \n \n\n \n rows_distinct()\n \n ALL COLUMNS\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:16 UTC< 1 s2025-02-10 22:09:16 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .rows_distinct() # expect no duplicate rows\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n 
\n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
+ "text": "Expect No Duplicate Rows\nWe can check for duplicate rows in the table with rows_distinct().\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:01Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n rows_distinct\n \n \n \n \n \n \n \n \n \n \n\n \n rows_distinct()\n \n ALL COLUMNS\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:27:01 UTC< 1 s2025-02-11 05:27:01 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"polars\")\n )\n .rows_distinct() # expect no duplicate rows\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PolarsRows13Columns8\n \n\n \n date_timeDatetime\n dateDate\n aInt64\n bString\n cInt64\n dFloat64\n eBoolean\n fString\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04\n 2\n 1-bcd-345\n 3\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04\n 3\n 5-egh-163\n 8\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05\n 6\n 8-kdg-938\n 3\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06\n 2\n 5-jdo-903\n None\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09\n 8\n 3-ldm-038\n 7\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11\n 4\n 2-dhe-923\n 4\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15\n 7\n 1-knw-093\n 3\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17\n 4\n 5-boe-639\n 2\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20\n 3\n 5-bce-642\n 9\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26\n 4\n 2-dmx-010\n 7\n 833.98\n True\n low\n \n 
\n 12\n 2016-01-28 02:51:00\n 2016-01-28\n 2\n 7-dmx-010\n 8\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30\n 1\n 3-dka-303\n None\n 2230.09\n True\n high"
},
{
"objectID": "demos/col-vals-custom-expr/index.html",
"href": "demos/col-vals-custom-expr/index.html",
"title": "pointblank",
"section": "",
- "text": "Custom Expression for Checking Column Values\nA column expression can be used to check column values. Just use col_vals_expr() for this.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:21Pandas\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_expr\n \n \n \n \n \n \n\n \n col_vals_expr()\n \n —\n COLUMN EXPR\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:21 UTC< 1 s2025-02-10 22:09:21 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"pandas\")\n )\n .col_vals_expr(expr=lambda df: (df[\"d\"] % 1 != 0) & (df[\"a\"] < 10)) # Pandas column expr\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows13Columns8\n \n\n \n date_timedatetime64[ns]\n datedatetime64[ns]\n aint64\n bobject\n cfloat64\n dfloat64\n ebool\n fobject\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04 00:00:00\n 2\n 1-bcd-345\n 3.0\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04 00:00:00\n 3\n 5-egh-163\n 8.0\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05 00:00:00\n 6\n 8-kdg-938\n 3.0\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06 00:00:00\n 2\n 5-jdo-903\n NA\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09 00:00:00\n 8\n 3-ldm-038\n 7.0\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11 00:00:00\n 4\n 2-dhe-923\n 4.0\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15 00:00:00\n 7\n 1-knw-093\n 3.0\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17 00:00:00\n 4\n 5-boe-639\n 2.0\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n 
high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26 00:00:00\n 4\n 2-dmx-010\n 7.0\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28 00:00:00\n 2\n 7-dmx-010\n 8.0\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30 00:00:00\n 1\n 3-dka-303\n NA\n 2230.09\n True\n high"
+ "text": "Custom Expression for Checking Column Values\nA column expression can be used to check column values. Just use col_vals_expr() for this.\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:07Pandas\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_expr\n \n \n \n \n \n \n\n \n col_vals_expr()\n \n —\n COLUMN EXPR\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:27:07 UTC< 1 s2025-02-11 05:27:07 UTC\n \n\n\n\n\n\n\n \n\n\nimport pointblank as pb\n\nvalidation = (\n pb.Validate(\n data=pb.load_dataset(dataset=\"small_table\", tbl_type=\"pandas\")\n )\n .col_vals_expr(expr=lambda df: (df[\"d\"] % 1 != 0) & (df[\"a\"] < 10)) # Pandas column expr\n .interrogate()\n)\n\nvalidation\n\n\nPreview of Input Table\n\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n\n\n\n\n \n PandasRows13Columns8\n \n\n \n date_timedatetime64[ns]\n datedatetime64[ns]\n aint64\n bobject\n cfloat64\n dfloat64\n ebool\n fobject\n\n\n\n \n 1\n 2016-01-04 11:00:00\n 2016-01-04 00:00:00\n 2\n 1-bcd-345\n 3.0\n 3423.29\n True\n high\n \n \n 2\n 2016-01-04 00:32:00\n 2016-01-04 00:00:00\n 3\n 5-egh-163\n 8.0\n 9999.99\n True\n low\n \n \n 3\n 2016-01-05 13:32:00\n 2016-01-05 00:00:00\n 6\n 8-kdg-938\n 3.0\n 2343.23\n True\n high\n \n \n 4\n 2016-01-06 17:23:00\n 2016-01-06 00:00:00\n 2\n 5-jdo-903\n NA\n 3892.4\n False\n mid\n \n \n 5\n 2016-01-09 12:36:00\n 2016-01-09 00:00:00\n 8\n 3-ldm-038\n 7.0\n 283.94\n True\n low\n \n \n 6\n 2016-01-11 06:15:00\n 2016-01-11 00:00:00\n 4\n 2-dhe-923\n 4.0\n 3291.03\n True\n mid\n \n \n 7\n 2016-01-15 18:46:00\n 2016-01-15 00:00:00\n 7\n 1-knw-093\n 3.0\n 843.34\n True\n high\n \n \n 8\n 2016-01-17 11:27:00\n 2016-01-17 00:00:00\n 4\n 5-boe-639\n 2.0\n 1035.64\n False\n low\n \n \n 9\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n 
high\n \n \n 10\n 2016-01-20 04:30:00\n 2016-01-20 00:00:00\n 3\n 5-bce-642\n 9.0\n 837.93\n False\n high\n \n \n 11\n 2016-01-26 20:07:00\n 2016-01-26 00:00:00\n 4\n 2-dmx-010\n 7.0\n 833.98\n True\n low\n \n \n 12\n 2016-01-28 02:51:00\n 2016-01-28 00:00:00\n 2\n 7-dmx-010\n 8.0\n 108.34\n False\n low\n \n \n 13\n 2016-01-30 11:23:00\n 2016-01-30 00:00:00\n 1\n 3-dka-303\n NA\n 2230.09\n True\n high"
},
{
"objectID": "get-started/index.html",
@@ -1068,28 +1068,28 @@
"href": "get-started/index.html#a-simple-validation-table",
"title": "Intro",
"section": "A Simple Validation Table",
- "text": "A Simple Validation Table\nThis is a validation report table that is produced from a validation of a Polars DataFrame:\n\n\nCode\nimport pointblank as pb\n\nvalidation_1 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_lt(columns=\"a\", value=10)\n .col_vals_between(columns=\"d\", left=0, right=5000)\n .col_vals_in_set(columns=\"f\", set=[\"low\", \"mid\", \"high\"])\n .col_vals_regex(columns=\"b\", pattern=r\"^[0-9]-[a-z]{3}-[0-9]{3}$\")\n .interrogate()\n)\n\nvalidation_1\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:27Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n 10\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n d\n [0, 5000]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 120.92\n 10.08\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n f\n low, mid, high\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n ^[0-9]-[a-z]{3}-[0-9]{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:27 UTC< 1 s2025-02-10 22:09:27 UTC\n \n\n\n\n\n\n\n \n\n\nEach row in this reporting table constitutes a single validation step. Roughly, the left-hand side outlines the validation rules and the right-hand side provides the results of each validation step. 
While simple in principle, there’s a lot of useful information packed into this validation table.\nHere’s a diagram that describes a few of the important parts of the validation table:\n\nThere are three things that should be noted here:\n\nvalidation steps: each step is a separate test on the table, focused on a certain aspect of the table\nvalidation rules: the validation type is provided here along with key constraints\nvalidation results: interrogation results are provided here, with a breakdown of test units (total, passing, and failing), threshold states, and more\n\nThe intent is to provide the key information in one place, and have it be interpretable by data stakeholders."
+ "text": "A Simple Validation Table\nThis is a validation report table that is produced from a validation of a Polars DataFrame:\n\n\nCode\nimport pointblank as pb\n\nvalidation_1 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_lt(columns=\"a\", value=10)\n .col_vals_between(columns=\"d\", left=0, right=5000)\n .col_vals_in_set(columns=\"f\", set=[\"low\", \"mid\", \"high\"])\n .col_vals_regex(columns=\"b\", pattern=r\"^[0-9]-[a-z]{3}-[0-9]{3}$\")\n .interrogate()\n)\n\nvalidation_1\n\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:12Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n 10\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n d\n [0, 5000]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 120.92\n 10.08\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n f\n low, mid, high\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n ^[0-9]-[a-z]{3}-[0-9]{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:27:12 UTC< 1 s2025-02-11 05:27:12 UTC\n \n\n\n\n\n\n\n \n\n\nEach row in this reporting table constitutes a single validation step. Roughly, the left-hand side outlines the validation rules and the right-hand side provides the results of each validation step. 
While simple in principle, there’s a lot of useful information packed into this validation table.\nHere’s a diagram that describes a few of the important parts of the validation table:\n\nThere are three things that should be noted here:\n\nvalidation steps: each step is a separate test on the table, focused on a certain aspect of the table\nvalidation rules: the validation type is provided here along with key constraints\nvalidation results: interrogation results are provided here, with a breakdown of test units (total, passing, and failing), threshold states, and more\n\nThe intent is to provide the key information in one place, and have it be interpretable by data stakeholders."
},
{
"objectID": "get-started/index.html#example-code-step-by-step",
"href": "get-started/index.html#example-code-step-by-step",
"title": "Intro",
"section": "Example Code, Step-by-Step",
- "text": "Example Code, Step-by-Step\nHere’s the code that performs the validation on the Polars table.\n\nimport pointblank as pb\n\nvalidation_2 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_lt(columns=\"a\", value=10)\n .col_vals_between(columns=\"d\", left=0, right=5000)\n .col_vals_in_set(columns=\"f\", set=[\"low\", \"mid\", \"high\"])\n .col_vals_regex(columns=\"b\", pattern=r\"^[0-9]-[a-z]{3}-[0-9]{3}$\")\n .interrogate()\n)\n\nvalidation_2\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:27Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n 10\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n d\n [0, 5000]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 120.92\n 10.08\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n f\n low, mid, high\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n ^[0-9]-[a-z]{3}-[0-9]{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-10 22:09:27 UTC< 1 s2025-02-10 22:09:27 UTC\n \n\n\n\n\n\n\n \n\n\nNote these three key pieces in the code:\n\nthe Validate(data=...) argument takes a DataFrame or database table that you want to validate\nthe methods starting with col_* specify validation steps that run on specific columns\nthe interrogate() method executes the validation plan on the table\n\nThis common pattern is used in a validation workflow, where Validate() and interrogate() bookend a validation plan generated through calling validation methods. 
And that’s data validation with Pointblank in a nutshell! In the next section we’ll go a bit further by introducing a means to gauge data quality with failure thresholds."
+ "text": "Example Code, Step-by-Step\nHere’s the code that performs the validation on the Polars table.\n\nimport pointblank as pb\n\nvalidation_2 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_lt(columns=\"a\", value=10)\n .col_vals_between(columns=\"d\", left=0, right=5000)\n .col_vals_in_set(columns=\"f\", set=[\"low\", \"mid\", \"high\"])\n .col_vals_regex(columns=\"b\", pattern=r\"^[0-9]-[a-z]{3}-[0-9]{3}$\")\n .interrogate()\n)\n\nvalidation_2\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:13Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n 10\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C66\n 2\n \n \n \n\n col_vals_between\n \n \n \n \n \n \n\n \n col_vals_between()\n \n d\n [0, 5000]\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 120.92\n 10.08\n —\n —\n —\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n f\n low, mid, high\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n \n #4CA64C\n 4\n \n \n \n\n col_vals_regex\n \n \n \n \n \n \n \n \n \n\n \n col_vals_regex()\n \n b\n ^[0-9]-[a-z]{3}-[0-9]{3}$\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 131.00\n 00.00\n —\n —\n —\n —\n \n\n \n \n \n 2025-02-11 05:27:13 UTC< 1 s2025-02-11 05:27:13 UTC\n \n\n\n\n\n\n\n \n\n\nNote these three key pieces in the code:\n\nthe Validate(data=...) argument takes a DataFrame or database table that you want to validate\nthe methods starting with col_* specify validation steps that run on specific columns\nthe interrogate() method executes the validation plan on the table\n\nThis common pattern is used in a validation workflow, where Validate() and interrogate() bookend a validation plan generated through calling validation methods. 
And that’s data validation with Pointblank in a nutshell! In the next section we’ll go a bit further by introducing a means to gauge data quality with failure thresholds."
},
{
"objectID": "get-started/index.html#understanding-test-units",
"href": "get-started/index.html#understanding-test-units",
"title": "Intro",
"section": "Understanding Test Units",
- "text": "Understanding Test Units\nEach validation step will execute a separate validation test on the target table. For example, the col_vals_lt() validation tests that each value in a column is less than a specified number. A key thing that’s reported is the number of test units that pass or fail a validation step.\nTest units are dependent on the test being run. The col_vals_* tests each value in a column, so each value will be a test unit (and the number of test units is the number of rows in the target table).\nThis matters because you can set thresholds that signal warn, stop, and notify states based the proportion or number of failing test units.\nHere’s a simple example that uses a single col_vals_lt() step along with thresholds set in the thresholds= argument of the validation method.\n\nvalidation_3 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_lt(columns=\"a\", value=7, thresholds=(2, 4))\n .interrogate()\n)\n\nvalidation_3\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:27Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #FFBF00\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n ●\n ○\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:09:27 UTC< 1 s2025-02-10 22:09:27 UTC\n \n\n\n\n\n\n\n \n\n\nThe code uses thresholds=(2, 4) to set a warn threshold of 2 and a stop threshold of 4. 
If you look at the validation report table, we can see:\n\nthe FAIL column shows that 2 tests units have failed\nthe W column (short for warn) shows a filled yellow circle indicating those failing test units reached that threshold value\nthe S column (short for stop) shows an open red circle indicating that the number of failing test units is below that threshold\n\nThe one final threshold, N (for notify), wasn’t set so it appears on the validation table as a long dash.\nThresholds let you take action at different levels of severity. The next section discusses setting and acting on thresholds in detail."
+ "text": "Understanding Test Units\nEach validation step will execute a separate validation test on the target table. For example, the col_vals_lt() validation tests that each value in a column is less than a specified number. A key thing that’s reported is the number of test units that pass or fail a validation step.\nTest units are dependent on the test being run. The col_vals_* tests each value in a column, so each value will be a test unit (and the number of test units is the number of rows in the target table).\nThis matters because you can set thresholds that signal warn, stop, and notify states based the proportion or number of failing test units.\nHere’s a simple example that uses a single col_vals_lt() step along with thresholds set in the thresholds= argument of the validation method.\n\nvalidation_3 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_lt(columns=\"a\", value=7, thresholds=(2, 4))\n .interrogate()\n)\n\nvalidation_3\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:13Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #FFBF00\n 1\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n a\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n ●\n ○\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:27:13 UTC< 1 s2025-02-11 05:27:13 UTC\n \n\n\n\n\n\n\n \n\n\nThe code uses thresholds=(2, 4) to set a warn threshold of 2 and a stop threshold of 4. 
If you look at the validation report table, we can see:\n\nthe FAIL column shows that 2 tests units have failed\nthe W column (short for warn) shows a filled yellow circle indicating those failing test units reached that threshold value\nthe S column (short for stop) shows an open red circle indicating that the number of failing test units is below that threshold\n\nThe one final threshold, N (for notify), wasn’t set so it appears on the validation table as a long dash.\nThresholds let you take action at different levels of severity. The next section discusses setting and acting on thresholds in detail."
},
{
"objectID": "get-started/index.html#using-threshold-levels",
"href": "get-started/index.html#using-threshold-levels",
"title": "Intro",
"section": "Using Threshold Levels",
- "text": "Using Threshold Levels\nThresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions (should those actions be defined).\nHere’s an example where we test if a certain column has Null/missing values with col_vals_not_null(). This is a case where we want to warn on any Null values and stop where there are 20% Nulls in the column.\n\nvalidation_4 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_not_null(columns=\"c\", thresholds=(1, 0.2))\n .interrogate()\n)\n\nvalidation_4\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:09:27Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #FFBF00\n 1\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n c\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n ●\n ○\n —\n CSV\n \n\n \n \n \n 2025-02-10 22:09:27 UTC< 1 s2025-02-10 22:09:27 UTC\n \n\n\n\n\n\n\n \n\n\nIn this case, the thresholds= argument in the cols_vals_not_null() step was set to (1, 0.2), indicating 1 failing test unit is set for warn and a 0.2 fraction of all failing test units is set to stop.\nFor more on thresholds, see the Thresholds article."
+ "text": "Using Threshold Levels\nThresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions (should those actions be defined).\nHere’s an example where we test if a certain column has Null/missing values with col_vals_not_null(). This is a case where we want to warn on any Null values and stop where there are 20% Nulls in the column.\n\nvalidation_4 = (\n pb.Validate(data=pb.load_dataset(dataset=\"small_table\"))\n .col_vals_not_null(columns=\"c\", thresholds=(1, 0.2))\n .interrogate()\n)\n\nvalidation_4\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:27:13Polars\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #FFBF00\n 1\n \n \n \n\n col_vals_not_null\n \n \n \n \n \n \n \n \n\n \n col_vals_not_null()\n \n c\n —\n \n \n \n \n \n \n \n \n\n ✓\n 13\n 110.85\n 20.15\n ●\n ○\n —\n CSV\n \n\n \n \n \n 2025-02-11 05:27:13 UTC< 1 s2025-02-11 05:27:13 UTC\n \n\n\n\n\n\n\n \n\n\nIn this case, the thresholds= argument in the cols_vals_not_null() step was set to (1, 0.2), indicating 1 failing test unit is set for warn and a 0.2 fraction of all failing test units is set to stop.\nFor more on thresholds, see the Thresholds article."
},
{
"objectID": "reference/Thresholds.html",
@@ -1159,35 +1159,42 @@
"href": "reference/missing_vals_tbl.html",
"title": "missing_vals_tbl",
"section": "",
- "text": "missing_vals_tbl(data)\nDisplay a table that shows the missing values in the input table.\nThe missing_vals_tbl() function generates a table that shows the missing values in the input table. The table is displayed using the Great Tables (GT) API, which allows for further customization of the table’s appearance if so desired.\n\n\n\ndata : FrameT | Any\n\nThe table for which to display the missing values. This could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types.\n\n\n\n\n\n\n : GT\n\nA GT object that displays the table of missing values in the input table.\n\n\n\n\n\nThe data= parameter can be given any of the following table types:\n\nPolars DataFrame (\"polars\")\nPandas DataFrame (\"pandas\")\nDuckDB table (\"duckdb\")*\nMySQL table (\"mysql\")*\nPostgreSQL table (\"postgresql\")*\nSQLite table (\"sqlite\")*\nParquet table (\"parquet\")*\n\nThe table types marked with an asterisk need to be prepared as Ibis tables (with type of ibis.expr.types.relations.Table). Furthermore, using missing_vals_tbl() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed.\n\n\n\nThe missing values table shows the proportion of missing values in each column of the input table. The table is divided into sectors, with each sector representing a range of rows in the table. The proportion of missing values in each sector is calculated for each column. The table is displayed using the Great Tables API, which allows for further customization of the table’s appearance.\nTo ensure that the table can scale to tables with many columns, each row in the reporting table represents a column in the input table. There are 10 sectors shown in the table, where the first sector represents the first 10% of the rows, the second sector represents the next 10% of the rows, and so on. 
Any sectors that are light blue indicate that there are no missing values in that sector. If there are missing values, the proportion of missing values is shown by a gray color (light gray for low proportions, dark gray to black for very high proportions)."
+ "text": "missing_vals_tbl(data)\nDisplay a table that shows the missing values in the input table.\nThe missing_vals_tbl() function generates a table that shows the missing values in the input table. The table is displayed using the Great Tables (GT) API, which allows for further customization of the table’s appearance if so desired."
},
{
"objectID": "reference/missing_vals_tbl.html#parameters",
"href": "reference/missing_vals_tbl.html#parameters",
"title": "missing_vals_tbl",
- "section": "",
- "text": "data : FrameT | Any\n\nThe table for which to display the missing values. This could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types."
+ "section": "Parameters",
+ "text": "Parameters\n\ndata : FrameT | Any\n\nThe table for which to display the missing values. This could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types."
},
{
"objectID": "reference/missing_vals_tbl.html#returns",
"href": "reference/missing_vals_tbl.html#returns",
"title": "missing_vals_tbl",
- "section": "",
- "text": ": GT\n\nA GT object that displays the table of missing values in the input table."
+ "section": "Returns",
+ "text": "Returns\n\n : GT\n\nA GT object that displays the table of missing values in the input table."
},
{
"objectID": "reference/missing_vals_tbl.html#supported-input-table-types",
"href": "reference/missing_vals_tbl.html#supported-input-table-types",
"title": "missing_vals_tbl",
- "section": "",
- "text": "The data= parameter can be given any of the following table types:\n\nPolars DataFrame (\"polars\")\nPandas DataFrame (\"pandas\")\nDuckDB table (\"duckdb\")*\nMySQL table (\"mysql\")*\nPostgreSQL table (\"postgresql\")*\nSQLite table (\"sqlite\")*\nParquet table (\"parquet\")*\n\nThe table types marked with an asterisk need to be prepared as Ibis tables (with type of ibis.expr.types.relations.Table). Furthermore, using missing_vals_tbl() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed."
+ "section": "Supported Input Table Types",
+ "text": "Supported Input Table Types\nThe data= parameter can be given any of the following table types:\n\nPolars DataFrame (\"polars\")\nPandas DataFrame (\"pandas\")\nDuckDB table (\"duckdb\")*\nMySQL table (\"mysql\")*\nPostgreSQL table (\"postgresql\")*\nSQLite table (\"sqlite\")*\nParquet table (\"parquet\")*\n\nThe table types marked with an asterisk need to be prepared as Ibis tables (with type of ibis.expr.types.relations.Table). Furthermore, using missing_vals_tbl() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed."
},
{
"objectID": "reference/missing_vals_tbl.html#the-missing-values-table",
"href": "reference/missing_vals_tbl.html#the-missing-values-table",
"title": "missing_vals_tbl",
- "section": "",
- "text": "The missing values table shows the proportion of missing values in each column of the input table. The table is divided into sectors, with each sector representing a range of rows in the table. The proportion of missing values in each sector is calculated for each column. The table is displayed using the Great Tables API, which allows for further customization of the table’s appearance.\nTo ensure that the table can scale to tables with many columns, each row in the reporting table represents a column in the input table. There are 10 sectors shown in the table, where the first sector represents the first 10% of the rows, the second sector represents the next 10% of the rows, and so on. Any sectors that are light blue indicate that there are no missing values in that sector. If there are missing values, the proportion of missing values is shown by a gray color (light gray for low proportions, dark gray to black for very high proportions)."
+ "section": "The Missing Values Table",
+ "text": "The Missing Values Table\nThe missing values table shows the proportion of missing values in each column of the input table. The table is divided into sectors, with each sector representing a range of rows in the table. The proportion of missing values in each sector is calculated for each column. The table is displayed using the Great Tables API, which allows for further customization of the table’s appearance.\nTo ensure that the table can scale to tables with many columns, each row in the reporting table represents a column in the input table. There are 10 sectors shown in the table, where the first sector represents the first 10% of the rows, the second sector represents the next 10% of the rows, and so on. Any sectors that are light blue indicate that there are no missing values in that sector. If there are missing values, the proportion of missing values is shown by a gray color (light gray for low proportions, dark gray to black for very high proportions)."
+ },
+ {
+ "objectID": "reference/missing_vals_tbl.html#examples",
+ "href": "reference/missing_vals_tbl.html#examples",
+ "title": "missing_vals_tbl",
+ "section": "Examples",
+ "text": "Examples\nThe missing_vals_tbl() function is useful for quickly identifying columns with missing values in a table. Here’s an example using the nycflights dataset (loaded using the load_dataset() function as a Polars DataFrame):\n\nimport pointblank as pb\n\nnycflights = pb.load_dataset(\"nycflights\", tbl_type=\"polars\")\n\npb.missing_vals_tbl(nycflights)\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Missing Values 46595 in total\n \n \n PolarsRows336776Columns18\n \n\n Column\n \n Row Sector\n \n\n\n 1\n 2\n 3\n 4\n 5\n 6\n 7\n 8\n 9\n 10\n\n\n\n \n year\n \n \n \n \n \n \n \n \n \n \n \n \n month\n \n \n \n \n \n \n \n \n \n \n \n \n day\n \n \n \n \n \n \n \n \n \n \n \n \n dep_time\n \n \n \n \n \n \n \n \n \n \n \n \n sched_dep_time\n \n \n \n \n \n \n \n \n \n \n \n \n dep_delay\n \n \n \n \n \n \n \n \n \n \n \n \n arr_time\n \n \n \n \n \n \n \n \n \n \n \n \n sched_arr_time\n \n \n \n \n \n \n \n \n \n \n \n \n arr_delay\n \n \n \n \n \n \n \n \n \n \n \n \n carrier\n \n \n \n \n \n \n \n \n \n \n \n \n flight\n \n \n \n \n \n \n \n \n \n \n \n \n tailnum\n \n \n \n \n \n \n \n \n \n \n \n \n origin\n \n \n \n \n \n \n \n \n \n \n \n \n dest\n \n \n \n \n \n \n \n \n \n \n \n \n air_time\n \n \n \n \n \n \n \n \n \n \n \n \n distance\n \n \n \n \n \n \n \n \n \n \n \n \n hour\n \n \n \n \n \n \n \n \n \n \n \n \n minute\n \n \n \n \n \n \n \n \n \n \n \n\n \n \n \n NO MISSING VALUES PROPORTION MISSING: 0%100%ROW SECTORS1 – 3367733678 – 6735467355 – 101031101032 – 134708134709 – 168385168386 – 202062202063 – 235739235740 – 269416269417 – 303093303094 – 336776\n \n\n\n\n\n\n\n \n\n\nThe table shows the proportion of missing values in each column of the nycflights dataset. The table is divided into sectors, with each sector representing a range of rows in the table (with around 34,000 rows per sector). The proportion of missing values in each sector is calculated for each column. 
The various shades of gray indicate the proportion of missing values in each sector. Many columns have no missing values at all, and those sectors are colored light blue."
},
{
"objectID": "reference/Validate.col_vals_outside.html",
@@ -1397,7 +1404,7 @@
"href": "reference/Validate.notify.html#examples",
"title": "Validate.notify",
"section": "Examples",
- "text": "Examples\nIn the example below, we’ll use a simple Polars DataFrame with three columns (a, b, and c). There will be three validation steps, and the first step will have many failing test units, the rest will be completely passing. We’ve set thresholds here for each of the steps by using thresholds=(2, 4, 5), which means:\n\nthe warn threshold is 2 failing test units\nthe stop threshold is 4 failing test units\nthe notify threshold is 5 failing test units\n\nAfter interrogation, the notify() method is used to determine the notify status for each validation step.\n\nimport pointblank as pb\nimport polars as pl\n\ntbl = pl.DataFrame(\n {\n \"a\": [2, 4, 4, 7, 2, 3, 8],\n \"b\": [9, 8, 10, 5, 10, 6, 2],\n \"c\": [\"a\", \"b\", \"a\", \"a\", \"b\", \"b\", \"a\"]\n }\n)\n\nvalidation = (\n pb.Validate(data=tbl, thresholds=(2, 4, 5))\n .col_vals_gt(columns=\"a\", value=5)\n .col_vals_lt(columns=\"b\", value=15)\n .col_vals_in_set(columns=\"c\", set=[\"a\", \"b\"])\n .interrogate()\n)\n\nvalidation.notify()\n\n{1: True, 2: False, 3: False}\n\n\nThe returned dictionary provides the notify status for each validation step. The first step has a True value since the number of failing test units meets the threshold for the notify level. 
The second and third steps have False values since the number of failing test units was 0, which is below the threshold for the notify level.\nWe can also visually inspect the notify status across all steps by viewing the validation table:\n\nvalidation\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:10:21PolarsWARN2STOP4NOTIFY5\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #CF142B\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n a\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 20.29\n 50.71\n ●\n ●\n ●\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n b\n 15\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n c\n a, b\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-10 22:10:21 UTC< 1 s2025-02-10 22:10:21 UTC\n \n\n\n\n\n\n\n \n\n\nWe can see that there are filled yellow, red, and blue circles in the first step (far right side, in the W, S, and N columns) indicating that the warn, stop, and notify thresholds were met. The other steps have empty yellow, red, and blue circles. This means that thresholds were ‘set but not met’ in those steps.\nIf we wanted to check the notify status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting scalar=True (ensuring that i= is a scalar).\n\nvalidation.notify(i=1)\n\n{1: True}\n\n\nThe returned value is True, indicating that the first validation step had the notify threshold met."
+ "text": "Examples\nIn the example below, we’ll use a simple Polars DataFrame with three columns (a, b, and c). There will be three validation steps, and the first step will have many failing test units, the rest will be completely passing. We’ve set thresholds here for each of the steps by using thresholds=(2, 4, 5), which means:\n\nthe warn threshold is 2 failing test units\nthe stop threshold is 4 failing test units\nthe notify threshold is 5 failing test units\n\nAfter interrogation, the notify() method is used to determine the notify status for each validation step.\n\nimport pointblank as pb\nimport polars as pl\n\ntbl = pl.DataFrame(\n {\n \"a\": [2, 4, 4, 7, 2, 3, 8],\n \"b\": [9, 8, 10, 5, 10, 6, 2],\n \"c\": [\"a\", \"b\", \"a\", \"a\", \"b\", \"b\", \"a\"]\n }\n)\n\nvalidation = (\n pb.Validate(data=tbl, thresholds=(2, 4, 5))\n .col_vals_gt(columns=\"a\", value=5)\n .col_vals_lt(columns=\"b\", value=15)\n .col_vals_in_set(columns=\"c\", set=[\"a\", \"b\"])\n .interrogate()\n)\n\nvalidation.notify()\n\n{1: True, 2: False, 3: False}\n\n\nThe returned dictionary provides the notify status for each validation step. The first step has a True value since the number of failing test units meets the threshold for the notify level. 
The second and third steps have False values since the number of failing test units was 0, which is below the threshold for the notify level.\nWe can also visually inspect the notify status across all steps by viewing the validation table:\n\nvalidation\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:28:13PolarsWARN2STOP4NOTIFY5\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #CF142B\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n a\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 20.29\n 50.71\n ●\n ●\n ●\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n b\n 15\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n c\n a, b\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-11 05:28:13 UTC< 1 s2025-02-11 05:28:13 UTC\n \n\n\n\n\n\n\n \n\n\nWe can see that there are filled yellow, red, and blue circles in the first step (far right side, in the W, S, and N columns) indicating that the warn, stop, and notify thresholds were met. The other steps have empty yellow, red, and blue circles. This means that thresholds were ‘set but not met’ in those steps.\nIf we wanted to check the notify status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting scalar=True (ensuring that i= is a scalar).\n\nvalidation.notify(i=1)\n\n{1: True}\n\n\nThe returned value is True, indicating that the first validation step had the notify threshold met."
},
{
"objectID": "reference/Validate.stop.html",
@@ -1425,7 +1432,7 @@
"href": "reference/Validate.stop.html#examples",
"title": "Validate.stop",
"section": "Examples",
- "text": "Examples\nIn the example below, we’ll use a simple Polars DataFrame with three columns (a, b, and c). There will be three validation steps, and the first step will have some failing test units, the rest will be completely passing. We’ve set thresholds here for each of the steps by using thresholds=(2, 4, 5), which means:\n\nthe warn threshold is 2 failing test units\nthe stop threshold is 4 failing test units\nthe notify threshold is 5 failing test units\n\nAfter interrogation, the stop() method is used to determine the stop status for each validation step.\n\nimport pointblank as pb\nimport polars as pl\n\ntbl = pl.DataFrame(\n {\n \"a\": [3, 4, 9, 7, 2, 3, 8],\n \"b\": [9, 8, 10, 5, 10, 6, 2],\n \"c\": [\"a\", \"b\", \"a\", \"a\", \"b\", \"b\", \"a\"]\n }\n)\n\nvalidation = (\n pb.Validate(data=tbl, thresholds=(2, 4, 5))\n .col_vals_gt(columns=\"a\", value=5)\n .col_vals_lt(columns=\"b\", value=15)\n .col_vals_in_set(columns=\"c\", set=[\"a\", \"b\"])\n .interrogate()\n)\n\nvalidation.stop()\n\n{1: True, 2: False, 3: False}\n\n\nThe returned dictionary provides the stop status for each validation step. The first step has a True value since the number of failing test units meets the threshold for the stop level. 
The second and third steps have False values since the number of failing test units was 0, which is below the threshold for the stop level.\nWe can also visually inspect the stop status across all steps by viewing the validation table:\n\nvalidation\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:10:26PolarsWARN2STOP4NOTIFY5\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #CF142B\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n a\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 30.43\n 40.57\n ●\n ●\n ○\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n b\n 15\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n c\n a, b\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-10 22:10:26 UTC< 1 s2025-02-10 22:10:26 UTC\n \n\n\n\n\n\n\n \n\n\nWe can see that there are filled yellow and red circles in the first step (far right side, in the W and S columns) indicating that the warn and stop thresholds were met. The other steps have empty yellow and red circles. This means that thresholds were ‘set but not met’ in those steps.\nIf we wanted to check the stop status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting scalar=True (ensuring that i= is a scalar).\n\nvalidation.stop(i=1)\n\n{1: True}\n\n\nThe returned value is True, indicating that the first validation step had the stop threshold met."
+ "text": "Examples\nIn the example below, we’ll use a simple Polars DataFrame with three columns (a, b, and c). There will be three validation steps, and the first step will have some failing test units, the rest will be completely passing. We’ve set thresholds here for each of the steps by using thresholds=(2, 4, 5), which means:\n\nthe warn threshold is 2 failing test units\nthe stop threshold is 4 failing test units\nthe notify threshold is 5 failing test units\n\nAfter interrogation, the stop() method is used to determine the stop status for each validation step.\n\nimport pointblank as pb\nimport polars as pl\n\ntbl = pl.DataFrame(\n {\n \"a\": [3, 4, 9, 7, 2, 3, 8],\n \"b\": [9, 8, 10, 5, 10, 6, 2],\n \"c\": [\"a\", \"b\", \"a\", \"a\", \"b\", \"b\", \"a\"]\n }\n)\n\nvalidation = (\n pb.Validate(data=tbl, thresholds=(2, 4, 5))\n .col_vals_gt(columns=\"a\", value=5)\n .col_vals_lt(columns=\"b\", value=15)\n .col_vals_in_set(columns=\"c\", set=[\"a\", \"b\"])\n .interrogate()\n)\n\nvalidation.stop()\n\n{1: True, 2: False, 3: False}\n\n\nThe returned dictionary provides the stop status for each validation step. The first step has a True value since the number of failing test units meets the threshold for the stop level. 
The second and third steps have False values since the number of failing test units was 0, which is below the threshold for the stop level.\nWe can also visually inspect the stop status across all steps by viewing the validation table:\n\nvalidation\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:28:19PolarsWARN2STOP4NOTIFY5\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #CF142B\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n a\n 5\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 30.43\n 40.57\n ●\n ●\n ○\n CSV\n \n \n #4CA64C\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n b\n 15\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_in_set\n \n \n \n \n \n \n\n \n col_vals_in_set()\n \n c\n a, b\n \n \n \n \n \n \n \n \n\n ✓\n 7\n 71.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-11 05:28:19 UTC< 1 s2025-02-11 05:28:19 UTC\n \n\n\n\n\n\n\n \n\n\nWe can see that there are filled yellow and red circles in the first step (far right side, in the W and S columns) indicating that the warn and stop thresholds were met. The other steps have empty yellow and red circles. This means that thresholds were ‘set but not met’ in those steps.\nIf we wanted to check the stop status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting scalar=True (ensuring that i= is a scalar).\n\nvalidation.stop(i=1)\n\n{1: True}\n\n\nThe returned value is True, indicating that the first validation step had the stop threshold met."
},
{
"objectID": "reference/col.html",
@@ -1901,7 +1908,7 @@
"href": "reference/Validate.get_tabular_report.html#examples",
"title": "Validate.get_tabular_report",
"section": "Examples",
- "text": "Examples\nLet’s create a Validate object with a few validation steps and then interrogate the data table to see how it performs against the validation plan. We can then generate a tabular report to get a summary of the results.\n\nimport pointblank as pb\nimport polars as pl\n\n# Create a Polars DataFrame\ntbl_pl = pl.DataFrame({\"x\": [1, 2, 3, 4], \"y\": [4, 5, 6, 7]})\n\n# Validate data using Polars DataFrame\nvalidation = (\n pb.Validate(data=tbl_pl, tbl_name=\"tbl_xy\", thresholds=(2, 3, 4))\n .col_vals_gt(columns=\"x\", value=1)\n .col_vals_lt(columns=\"x\", value=3)\n .col_vals_le(columns=\"y\", value=7)\n .interrogate()\n)\n\n# Look at the validation table\nvalidation\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-10|22:11:38Polarstbl_xyWARN2STOP3NOTIFY4\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n x\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 30.75\n 10.25\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n x\n 3\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 20.50\n 20.50\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n y\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 41.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-10 22:11:38 UTC< 1 s2025-02-10 22:11:38 UTC\n \n\n\n\n\n\n\n \n\n\nThe validation table is displayed with a default title (‘Validation Report’). We can use the get_tabular_report() method to customize the title of the report. For example, we can set the title to the name of the table by using the title=\":tbl_name:\" option. 
This will use the string provided in the tbl_name= argument of the Validate object.\n\nvalidation.get_tabular_report(title=\":tbl_name:\")\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n tbl_xy\n \n \n 2025-02-10|22:11:38Polarstbl_xyWARN2STOP3NOTIFY4\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n x\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 30.75\n 10.25\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n x\n 3\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 20.50\n 20.50\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n y\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 41.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-10 22:11:38 UTC< 1 s2025-02-10 22:11:38 UTC\n \n\n\n\n\n\n\n \n\n\nThe title of the report is now set to the name of the table, which is ‘tbl_xy’. 
This can be useful if you have multiple tables and want to keep track of which table the validation report is for.\nAlternatively, you can provide your own title for the report.\n\nvalidation.get_tabular_report(title=\"Report for Table XY\")\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Table XY\n\n \n \n 2025-02-10|22:11:38Polarstbl_xyWARN2STOP3NOTIFY4\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n x\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 30.75\n 10.25\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n x\n 3\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 20.50\n 20.50\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n y\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 41.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-10 22:11:38 UTC< 1 s2025-02-10 22:11:38 UTC\n \n\n\n\n\n\n\n \n\n\nThe title of the report is now set to ‘Report for Table XY’. This can be useful if you want to provide a more descriptive title for the report."
+ "text": "Examples\nLet’s create a Validate object with a few validation steps and then interrogate the data table to see how it performs against the validation plan. We can then generate a tabular report to get a summary of the results.\n\nimport pointblank as pb\nimport polars as pl\n\n# Create a Polars DataFrame\ntbl_pl = pl.DataFrame({\"x\": [1, 2, 3, 4], \"y\": [4, 5, 6, 7]})\n\n# Validate data using Polars DataFrame\nvalidation = (\n pb.Validate(data=tbl_pl, tbl_name=\"tbl_xy\", thresholds=(2, 3, 4))\n .col_vals_gt(columns=\"x\", value=1)\n .col_vals_lt(columns=\"x\", value=3)\n .col_vals_le(columns=\"y\", value=7)\n .interrogate()\n)\n\n# Look at the validation table\nvalidation\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Pointblank Validation\n \n \n 2025-02-11|05:29:35Polarstbl_xyWARN2STOP3NOTIFY4\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n x\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 30.75\n 10.25\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n x\n 3\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 20.50\n 20.50\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n y\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 41.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-11 05:29:35 UTC< 1 s2025-02-11 05:29:35 UTC\n \n\n\n\n\n\n\n \n\n\nThe validation table is displayed with a default title (‘Validation Report’). We can use the get_tabular_report() method to customize the title of the report. For example, we can set the title to the name of the table by using the title=\":tbl_name:\" option. 
This will use the string provided in the tbl_name= argument of the Validate object.\n\nvalidation.get_tabular_report(title=\":tbl_name:\")\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n tbl_xy\n \n \n 2025-02-11|05:29:35Polarstbl_xyWARN2STOP3NOTIFY4\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n x\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 30.75\n 10.25\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n x\n 3\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 20.50\n 20.50\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n y\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 41.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-11 05:29:35 UTC< 1 s2025-02-11 05:29:35 UTC\n \n\n\n\n\n\n\n \n\n\nThe title of the report is now set to the name of the table, which is ‘tbl_xy’. 
This can be useful if you have multiple tables and want to keep track of which table the validation report is for.\nAlternatively, you can provide your own title for the report.\n\nvalidation.get_tabular_report(title=\"Report for Table XY\")\n\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n\n\n\n \n Report for Table XY\n\n \n \n 2025-02-11|05:29:35Polarstbl_xyWARN2STOP3NOTIFY4\n \n\n \n \n STEP\n COLUMNS\n VALUES\n TBL\n EVAL\n UNITS\n PASS\n FAIL\n W\n S\n N\n EXT\n\n\n\n \n #4CA64C66\n 1\n \n \n \n\n col_vals_gt\n \n \n \n \n \n \n\n \n col_vals_gt()\n \n x\n 1\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 30.75\n 10.25\n ○\n ○\n ○\n CSV\n \n \n #FFBF00\n 2\n \n \n \n\n col_vals_lt\n \n \n \n \n \n \n\n \n col_vals_lt()\n \n x\n 3\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 20.50\n 20.50\n ●\n ○\n ○\n CSV\n \n \n #4CA64C\n 3\n \n \n \n\n col_vals_lte\n \n \n \n \n \n \n\n \n col_vals_le()\n \n y\n 7\n \n \n \n \n \n \n \n \n\n ✓\n 4\n 41.00\n 00.00\n ○\n ○\n ○\n —\n \n\n \n \n \n 2025-02-11 05:29:35 UTC< 1 s2025-02-11 05:29:35 UTC\n \n\n\n\n\n\n\n \n\n\nThe title of the report is now set to ‘Report for Table XY’. This can be useful if you want to provide a more descriptive title for the report."
},
{
"objectID": "reference/Validate.get_json_report.html",
diff --git a/sitemap.xml b/sitemap.xml
index 9fbfe92a..9f014e0d 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,310 +2,310 @@
   <url>
     <loc>https://posit-dev.github.io/pointblank/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/get_row_count.html</loc>
-    <lastmod>2025-02-10T22:08:18.017Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.840Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_count_match.html</loc>
-    <lastmod>2025-02-10T22:08:17.858Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.680Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.get_sundered_data.html</loc>
-    <lastmod>2025-02-10T22:08:17.932Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.755Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.n_passed.html</loc>
-    <lastmod>2025-02-10T22:08:17.953Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.776Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.warn.html</loc>
-    <lastmod>2025-02-10T22:08:17.975Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.798Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/everything.html</loc>
-    <lastmod>2025-02-10T22:08:17.893Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.714Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.html</loc>
-    <lastmod>2025-02-10T22:08:17.694Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.516Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_not_null.html</loc>
-    <lastmod>2025-02-10T22:08:17.806Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.627Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_schema_match.html</loc>
-    <lastmod>2025-02-10T22:08:17.843Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.665Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.n.html</loc>
-    <lastmod>2025-02-10T22:08:17.948Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.771Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.interrogate.html</loc>
-    <lastmod>2025-02-10T22:08:17.912Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.734Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.n_failed.html</loc>
-    <lastmod>2025-02-10T22:08:17.959Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.782Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_not_in_set.html</loc>
-    <lastmod>2025-02-10T22:08:17.792Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.613Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_gt.html</loc>
-    <lastmod>2025-02-10T22:08:17.716Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.538Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.f_failed.html</loc>
-    <lastmod>2025-02-10T22:08:17.970Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.793Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_expr.html</loc>
-    <lastmod>2025-02-10T22:08:17.820Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.642Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.rows_distinct.html</loc>
-    <lastmod>2025-02-10T22:08:17.834Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.655Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.get_data_extracts.html</loc>
-    <lastmod>2025-02-10T22:08:17.938Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.762Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_null.html</loc>
-    <lastmod>2025-02-10T22:08:17.799Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.620Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/matches.html</loc>
-    <lastmod>2025-02-10T22:08:17.888Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.709Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/preview.html</loc>
-    <lastmod>2025-02-10T22:08:18.002Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.825Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/load_dataset.html</loc>
-    <lastmod>2025-02-10T22:08:17.992Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.815Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.all_passed.html</loc>
-    <lastmod>2025-02-10T22:08:17.942Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.765Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_lt.html</loc>
-    <lastmod>2025-02-10T22:08:17.724Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.546Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_le.html</loc>
-    <lastmod>2025-02-10T22:08:17.741Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.563Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.row_count_match.html</loc>
-    <lastmod>2025-02-10T22:08:17.851Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.672Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/get-started/thresholds.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/set-membership/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/comparisons-across-columns/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/failure-thresholds/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.722Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/apply-checks-to-several-columns/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/mutate-table-in-step/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/01-starter/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/expect-text-pattern/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.722Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/check-row-column-counts/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/expect-no-duplicate-values/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.722Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/column-selector-functions/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/numeric-comparisons/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/schema-check/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/using-parquet-data/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/04-sundered-data/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/05-step-report-column-check/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/02-advanced/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/06-step-report-schema-check/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/checks-for-missing/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/03-data-extracts/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.658Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/expect-no-duplicate-rows/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.722Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/demos/col-vals-custom-expr/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.721Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.659Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/get-started/index.html</loc>
-    <lastmod>2025-02-10T22:07:32.727Z</lastmod>
+    <lastmod>2025-02-11T05:25:11.664Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Thresholds.html</loc>
-    <lastmod>2025-02-10T22:08:17.701Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.523Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/get_column_count.html</loc>
-    <lastmod>2025-02-10T22:08:18.012Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.835Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/missing_vals_tbl.html</loc>
-    <lastmod>2025-02-10T22:08:18.007Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.830Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_outside.html</loc>
-    <lastmod>2025-02-10T22:08:17.776Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.598Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.get_step_report.html</loc>
-    <lastmod>2025-02-10T22:08:17.922Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.745Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/contains.html</loc>
-    <lastmod>2025-02-10T22:08:17.882Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.703Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_exists.html</loc>
-    <lastmod>2025-02-10T22:08:17.827Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.648Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_between.html</loc>
-    <lastmod>2025-02-10T22:08:17.767Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.588Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Schema.html</loc>
-    <lastmod>2025-02-10T22:08:17.708Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.530Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.notify.html</loc>
-    <lastmod>2025-02-10T22:08:17.986Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.809Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.stop.html</loc>
-    <lastmod>2025-02-10T22:08:17.981Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.804Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/col.html</loc>
-    <lastmod>2025-02-10T22:08:17.864Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.685Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_ge.html</loc>
-    <lastmod>2025-02-10T22:08:17.732Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.554Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.f_passed.html</loc>
-    <lastmod>2025-02-10T22:08:17.964Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.787Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_eq.html</loc>
-    <lastmod>2025-02-10T22:08:17.749Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.571Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/config.html</loc>
-    <lastmod>2025-02-10T22:08:18.022Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.847Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_in_set.html</loc>
-    <lastmod>2025-02-10T22:08:17.784Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.605Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/index.html</loc>
-    <lastmod>2025-02-10T22:08:17.672Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.493Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_ne.html</loc>
-    <lastmod>2025-02-10T22:08:17.757Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.579Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/starts_with.html</loc>
-    <lastmod>2025-02-10T22:08:17.870Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.691Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/last_n.html</loc>
-    <lastmod>2025-02-10T22:08:17.905Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.726Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/ends_with.html</loc>
-    <lastmod>2025-02-10T22:08:17.876Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.697Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.col_vals_regex.html</loc>
-    <lastmod>2025-02-10T22:08:17.814Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.635Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/first_n.html</loc>
-    <lastmod>2025-02-10T22:08:17.899Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.720Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.get_tabular_report.html</loc>
-    <lastmod>2025-02-10T22:08:17.918Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.741Z</lastmod>
   </url>
   <url>
     <loc>https://posit-dev.github.io/pointblank/reference/Validate.get_json_report.html</loc>
-    <lastmod>2025-02-10T22:08:17.927Z</lastmod>
+    <lastmod>2025-02-11T05:25:59.750Z</lastmod>