Commit 36d23ef

asl3 authored and cloud-fan committed
[SPARK-50541] Describe Table As JSON
### What changes were proposed in this pull request?

Support `DESCRIBE TABLE ... [AS JSON]` to optionally display table metadata in JSON format.

**SQL Ref Spec:**

```sql
{ DESC | DESCRIBE } [ TABLE ] [ EXTENDED | FORMATTED ] table_name { [ PARTITION clause ] | [ column_name ] } [ AS JSON ]
```

Output: `json_metadata: String`

### Why are the changes needed?

The Spark SQL command `DESCRIBE TABLE` displays table metadata in a DataFrame format geared toward human consumption. That format is hard to parse reliably, e.g. when fields contain special characters, and it can change as new features are added. The new `AS JSON` option returns the table metadata as a JSON string that can be parsed programmatically and extended with minimal risk of breaking changes. It is not meant to be human-readable.

### Does this PR introduce _any_ user-facing change?

Yes, this provides a new option to display `DESCRIBE TABLE` metadata in JSON format. See below (and the updated golden files) for the JSON output schema:

```
{
  "table_name": "<table_name>",
  "catalog_name": "<catalog_name>",
  "schema_name": "<innermost_schema_name>",
  "namespace": ["<innermost_schema_name>"],
  "type": "<table_type>",
  "provider": "<provider>",
  "columns": [
    {
      "name": "<name>",
      "type": <type_json>,
      "comment": "<comment>",
      "nullable": <boolean>,
      "default": "<default_val>"
    }
  ],
  "partition_values": {
    "<col_name>": "<val>"
  },
  "location": "<path>",
  "view_text": "<view_text>",
  "view_original_text": "<view_original_text>",
  "view_schema_mode": "<view_schema_mode>",
  "view_catalog_and_namespace": "<view_catalog_and_namespace>",
  "view_query_output_columns": ["col1", "col2"],
  "owner": "<owner>",
  "comment": "<comment>",
  "table_properties": {
    "property1": "<property1>",
    "property2": "<property2>"
  },
  "storage_properties": {
    "property1": "<property1>",
    "property2": "<property2>"
  },
  "serde_library": "<serde_library>",
  "input_format": "<input_format>",
  "output_format": "<output_format>",
  "num_buckets": <num_buckets>,
  "bucket_columns": ["<col_name>"],
  "sort_columns": ["<col_name>"],
  "created_time": "<timestamp_ISO-8601>",
  "last_access": "<timestamp_ISO-8601>",
  "partition_provider": "<partition_provider>"
}
```

### How was this patch tested?

- Updated golden files for `describe.sql`
- Added tests in `DescribeTableParserSuite.scala`, `DescribeTableSuite.scala`, `PlanResolutionSuite.scala`

### Was this patch authored or co-authored using generative AI tooling?

Closes #49139 from asl3/asl3/describetableasjson.

Authored-by: Amanda Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
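The machine-readability motivation above can be illustrated with a short sketch. This assumes the single `json_metadata` string has already been fetched from the query result; the sample string is abridged from the docs example in this commit and is not the full output:

```python
import json

# Abridged sample of what `DESC FORMATTED customer AS JSON` returns in its
# single `json_metadata` column (in Spark this string would come from, e.g.,
# the first row of the DataFrame returned by the DESCRIBE statement).
json_metadata = (
    '{"table_name":"customer","catalog_name":"spark_catalog",'
    '"schema_name":"default","namespace":["default"],'
    '"columns":['
    '{"name":"cust_id","type":{"name":"integer"},"nullable":true},'
    '{"name":"state","type":{"name":"varchar","length":20},"nullable":true}],'
    '"type":"MANAGED","provider":"parquet"}'
)

meta = json.loads(json_metadata)
# Plain fields and structured types, with no DataFrame text scraping.
columns = {c["name"]: c["type"] for c in meta["columns"]}
print(meta["table_name"], meta["type"])
print(columns["state"])
```

Compare this with scraping the human-oriented tabular output, which breaks whenever a field contains a `|` or the layout changes.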
1 parent 22cbb96 commit 36d23ef

File tree

26 files changed: +1313 −101 lines changed


common/utils/src/main/resources/error/error-conditions.json

Lines changed: 12 additions & 0 deletions
```diff
@@ -1155,6 +1155,13 @@
     ],
     "sqlState" : "42623"
   },
+  "DESCRIBE_JSON_NOT_EXTENDED" : {
+    "message" : [
+      "DESCRIBE TABLE ... AS JSON only supported when [EXTENDED|FORMATTED] is specified.",
+      "For example: DESCRIBE EXTENDED <tableName> AS JSON is supported but DESCRIBE <tableName> AS JSON is not."
+    ],
+    "sqlState" : "0A000"
+  },
   "DISTINCT_WINDOW_FUNCTION_UNSUPPORTED" : {
     "message" : [
       "Distinct window functions are not supported: <windowExpr>."
@@ -5283,6 +5290,11 @@
       "Attach a comment to the namespace <namespace>."
     ]
   },
+  "DESC_TABLE_COLUMN_JSON" : {
+    "message" : [
+      "DESC TABLE COLUMN AS JSON not supported for individual columns."
+    ]
+  },
   "DESC_TABLE_COLUMN_PARTITION" : {
     "message" : [
       "DESC TABLE COLUMN for a specific partition."
```
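The message templates in these error conditions use `<param>` placeholders that Spark fills from the `messageParameters` map supplied when the error is raised. A minimal sketch of that substitution (the helper name is mine, not Spark's):

```python
import re

def format_error_message(template, params):
    """Substitute <name> placeholders in an error-condition message template.

    Unknown placeholders are left intact, mirroring template-style filling.
    """
    return re.sub(r"<(\w+)>", lambda m: params.get(m.group(1), m.group(0)), template)

msg = format_error_message(
    "For example: DESCRIBE EXTENDED <tableName> AS JSON is supported "
    "but DESCRIBE <tableName> AS JSON is not.",
    {"tableName": "customer"},
)
print(msg)
```

This is why the new `describeJsonNotExtendedError` helper below passes `Map("tableName" -> tableName)`: the parameter key must match the placeholder name in the template.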

docs/sql-ref-ansi-compliance.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -568,6 +568,7 @@ Below is a list of all the keywords in Spark SQL.
 |ITEMS|non-reserved|non-reserved|non-reserved|
 |ITERATE|non-reserved|non-reserved|non-reserved|
 |JOIN|reserved|strict-non-reserved|reserved|
+|JSON|non-reserved|non-reserved|non-reserved|
 |KEYS|non-reserved|non-reserved|non-reserved|
 |LANGUAGE|non-reserved|non-reserved|reserved|
 |LAST|non-reserved|non-reserved|non-reserved|
```

docs/sql-ref-syntax-aux-describe-table.md

Lines changed: 96 additions & 3 deletions
````diff
@@ -29,16 +29,17 @@ to return the metadata pertaining to a partition or column respectively.
 ### Syntax
 
 ```sql
-{ DESC | DESCRIBE } [ TABLE ] [ format ] table_identifier [ partition_spec ] [ col_name ]
+{ DESC | DESCRIBE } [ TABLE ] [ format ] table_identifier [ partition_spec ] [ col_name ] [ AS JSON ]
 ```
 
 ### Parameters
 
 * **format**
 
-    Specifies the optional format of describe output. If `EXTENDED` is specified
+    Specifies the optional format of describe output. If `EXTENDED` or `FORMATTED` is specified
     then additional metadata information (such as parent database, owner, and access time)
-    is returned.
+    is returned. Also if `EXTENDED` or `FORMATTED` is specified, then the metadata can be returned
+    in JSON format by specifying `AS JSON` at the end of the statement.
 
 * **table_identifier**
 
@@ -60,8 +61,96 @@ to return the metadata pertaining to a partition or column respectively.
     and `col_name` are mutually exclusive and can not be specified together. Currently
     nested columns are not allowed to be specified.
 
+    JSON format is not currently supported for individual columns.
+
     **Syntax:** `[ database_name. ] [ table_name. ] column_name`
 
+* **AS JSON**
+
+    An optional parameter to return the table metadata in JSON format. Only supported when `EXTENDED`
+    or `FORMATTED` format is specified (both produce equivalent JSON).
+
+    **Syntax:** `[ AS JSON ]`
+
+    **Schema:**
+
+    Below is the full JSON schema.
+    In actual output, null fields are omitted and the JSON is not pretty-printed (see Examples).
+
+    ```sql
+    {
+      "table_name": "<table_name>",
+      "catalog_name": "<catalog_name>",
+      "schema_name": "<innermost_namespace_name>",
+      "namespace": ["<namespace_names>"],
+      "type": "<table_type>",
+      "provider": "<provider>",
+      "columns": [
+        {
+          "name": "<name>",
+          "type": <type_json>,
+          "comment": "<comment>",
+          "nullable": <boolean>,
+          "default": "<default_val>"
+        }
+      ],
+      "partition_values": {
+        "<col_name>": "<val>"
+      },
+      "location": "<path>",
+      "view_text": "<view_text>",
+      "view_original_text": "<view_original_text>",
+      "view_schema_mode": "<view_schema_mode>",
+      "view_catalog_and_namespace": "<view_catalog_and_namespace>",
+      "view_query_output_columns": ["col1", "col2"],
+      "comment": "<comment>",
+      "table_properties": {
+        "property1": "<property1>",
+        "property2": "<property2>"
+      },
+      "storage_properties": {
+        "property1": "<property1>",
+        "property2": "<property2>"
+      },
+      "serde_library": "<serde_library>",
+      "input_format": "<input_format>",
+      "output_format": "<output_format>",
+      "num_buckets": <num_buckets>,
+      "bucket_columns": ["<col_name>"],
+      "sort_columns": ["<col_name>"],
+      "created_time": "<timestamp_ISO-8601>",
+      "created_by": "<created_by>",
+      "last_access": "<timestamp_ISO-8601>",
+      "partition_provider": "<partition_provider>"
+    }
+    ```
+
+    Below are the schema definitions for `<type_json>`:
+
+    | Spark SQL Data Types | JSON Representation |
+    |----------------------|---------------------|
+    | ByteType | `{ "name" : "tinyint" }` |
+    | ShortType | `{ "name" : "smallint" }` |
+    | IntegerType | `{ "name" : "int" }` |
+    | LongType | `{ "name" : "bigint" }` |
+    | FloatType | `{ "name" : "float" }` |
+    | DoubleType | `{ "name" : "double" }` |
+    | DecimalType | `{ "name" : "decimal", "precision": p, "scale": s }` |
+    | StringType | `{ "name" : "string" }` |
+    | VarCharType | `{ "name" : "varchar", "length": n }` |
+    | CharType | `{ "name" : "char", "length": n }` |
+    | BinaryType | `{ "name" : "binary" }` |
+    | BooleanType | `{ "name" : "boolean" }` |
+    | DateType | `{ "name" : "date" }` |
+    | VariantType | `{ "name" : "variant" }` |
+    | TimestampType | `{ "name" : "timestamp_ltz" }` |
+    | TimestampNTZType | `{ "name" : "timestamp_ntz" }` |
+    | YearMonthIntervalType | `{ "name" : "interval", "start_unit": "<start_unit>", "end_unit": "<end_unit>" }` |
+    | DayTimeIntervalType | `{ "name" : "interval", "start_unit": "<start_unit>", "end_unit": "<end_unit>" }` |
+    | ArrayType | `{ "name" : "array", "element_type": <type_json>, "element_nullable": <boolean> }` |
+    | MapType | `{ "name" : "map", "key_type": <type_json>, "value_type": <type_json>, "value_nullable": <boolean> }` |
+    | StructType | `{ "name" : "struct", "fields": [ {"name" : "field1", "type" : <type_json>, "nullable": <boolean>, "comment": "<comment>", "default": "<default_val>"}, ... ] }` |
+
 ### Examples
 
 ```sql
@@ -173,6 +262,10 @@ DESCRIBE customer salesdb.customer.name;
 |data_type| string|
 | comment|Short name|
 +---------+----------+
+
+-- Returns the table metadata in JSON format.
+DESC FORMATTED customer AS JSON;
+{"table_name":"customer","catalog_name":"spark_catalog","schema_name":"default","namespace":["default"],"columns":[{"name":"cust_id","type":{"name":"integer"},"nullable":true},{"name":"name","type":{"name":"string"},"comment":"Short name","nullable":true},{"name":"state","type":{"name":"varchar","length":20},"nullable":true}],"location": "file:/tmp/salesdb.db/custom...","created_time":"2020-04-07T14:05:43Z","last_access":"UNKNOWN","created_by":"None","type":"MANAGED","provider":"parquet","partition_provider":"Catalog","partition_columns":["state"]}
 ```
 
 ### Related Statements
````
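The recursive `<type_json>` encoding documented above lends itself to a small walker. The sketch below is illustrative only (the function name and the DDL-style rendering are my own, not part of the PR); it covers the primitive, parameterized, and nested cases from the table:

```python
def render_type(t):
    """Render a <type_json> value as a readable DDL-style type string."""
    name = t["name"]
    if name == "decimal":
        return f"decimal({t['precision']},{t['scale']})"
    if name in ("varchar", "char"):
        return f"{name}({t['length']})"
    if name == "array":
        return f"array<{render_type(t['element_type'])}>"
    if name == "map":
        return f"map<{render_type(t['key_type'])},{render_type(t['value_type'])}>"
    if name == "struct":
        fields = ",".join(
            f"{f['name']}:{render_type(f['type'])}" for f in t["fields"]
        )
        return f"struct<{fields}>"
    return name  # primitive types: int, string, date, ...

# Nested types compose recursively, e.g. array<varchar(20)>.
print(render_type({"name": "array",
                   "element_type": {"name": "varchar", "length": 20},
                   "element_nullable": True}))  # array<varchar(20)>
```

Nullability flags (`element_nullable`, `value_nullable`, per-field `nullable`) travel alongside the type rather than inside it, so a walker that only needs the shape can ignore them, as above.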

sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4

Lines changed: 1 addition & 0 deletions
```diff
@@ -283,6 +283,7 @@ IS: 'IS';
 ITEMS: 'ITEMS';
 ITERATE: 'ITERATE';
 JOIN: 'JOIN';
+JSON: 'JSON';
 KEYS: 'KEYS';
 LANGUAGE: 'LANGUAGE';
 LAST: 'LAST';
```

sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4

Lines changed: 3 additions & 1 deletion
```diff
@@ -287,7 +287,7 @@ statement
     | (DESC | DESCRIBE) namespace EXTENDED?
         identifierReference                                            #describeNamespace
     | (DESC | DESCRIBE) TABLE? option=(EXTENDED | FORMATTED)?
-        identifierReference partitionSpec? describeColName?            #describeRelation
+        identifierReference partitionSpec? describeColName? (AS JSON)? #describeRelation
     | (DESC | DESCRIBE) QUERY? query                                   #describeQuery
     | COMMENT ON namespace identifierReference IS
         comment                                                        #commentNamespace
@@ -1680,6 +1680,7 @@ ansiNonReserved
     | INVOKER
     | ITEMS
     | ITERATE
+    | JSON
     | KEYS
     | LANGUAGE
     | LAST
@@ -2039,6 +2040,7 @@ nonReserved
     | IS
     | ITEMS
     | ITERATE
+    | JSON
     | KEYS
     | LANGUAGE
     | LAST
```

sql/api/src/main/scala/org/apache/spark/sql/errors/CompilationErrors.scala

Lines changed: 12 additions & 0 deletions
```diff
@@ -41,6 +41,18 @@ private[sql] trait CompilationErrors extends DataTypeErrorsBase {
       cause = Option(cause))
   }
 
+  def describeJsonNotExtendedError(tableName: String): AnalysisException = {
+    new AnalysisException(
+      errorClass = "DESCRIBE_JSON_NOT_EXTENDED",
+      messageParameters = Map("tableName" -> tableName))
+  }
+
+  def describeColJsonUnsupportedError(): AnalysisException = {
+    new AnalysisException(
+      errorClass = "UNSUPPORTED_FEATURE.DESC_TABLE_COLUMN_JSON",
+      messageParameters = Map.empty)
+  }
+
   def cannotFindDescriptorFileError(filePath: String, cause: Throwable): AnalysisException = {
     new AnalysisException(
       errorClass = "PROTOBUF_DESCRIPTOR_FILE_NOT_FOUND",
```
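Together, the two helpers above back a validation step that gates `AS JSON` on the describe options. A minimal Python sketch of that decision logic follows; the function and parameter names are hypothetical (the actual checks live in Spark's analysis code), but the two failure modes match the error conditions added earlier in this commit:

```python
def validate_describe_as_json(as_json, is_extended_or_formatted,
                              column_specified, table_name):
    """Illustrative mirror of the checks the new error helpers support."""
    if not as_json:
        return  # nothing to validate without AS JSON
    if not is_extended_or_formatted:
        # corresponds to DESCRIBE_JSON_NOT_EXTENDED (sqlState 0A000)
        raise ValueError(
            f"DESCRIBE TABLE ... AS JSON only supported when "
            f"[EXTENDED|FORMATTED] is specified. For example: "
            f"DESCRIBE EXTENDED {table_name} AS JSON is supported "
            f"but DESCRIBE {table_name} AS JSON is not.")
    if column_specified:
        # corresponds to UNSUPPORTED_FEATURE.DESC_TABLE_COLUMN_JSON
        raise ValueError(
            "DESC TABLE COLUMN AS JSON not supported for individual columns.")
```

Under this sketch, `DESC EXTENDED t AS JSON` passes while `DESC t AS JSON` and `DESC FORMATTED t col AS JSON` are rejected.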
