Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(aws-glue): what format to use w/ cloudfront #16695

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion packages/@aws-cdk/aws-glue/lib/data-format.ts
Original file line number Diff line number Diff line change
Expand Up @@ -208,7 +208,7 @@ export interface DataFormatProps {
*/
export class DataFormat {
/**
* DataFormat for Apache Web Server Logs. Also works for CloudFront logs
* DataFormat for Apache Web Server Logs.
*
* @see https://docs.aws.amazon.com/athena/latest/ug/apache.html
*/
Expand Down Expand Up @@ -304,6 +304,10 @@ export class DataFormat {
/**
* DataFormat for TSV (Tab-Separated Values)
*
* This works for CloudFront logs when supplied with
* additional SerdeInfo parameters (`field.delim` and
* `serialization.format` set to "\t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear to me where you are suggesting to put the additional SerdeInfo parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! My next question is, where does field.delim and serialization.format come from? How can I supply the additional parameters? I don't have a ton of context in this space so you're going to have to make sure your documentation makes sense to those who are learning how to use glue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, there is no way of adding additional Serde parameters into the Glue L2 construct, am I missing something?

serdeInfo: {
serializationLibrary: props.dataFormat.serializationLibrary.className,
},

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-serdeinfo.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! My next question is, where does field.delim and serialization.format come from? How can I supply the additional parameters? I don't have a ton of context in this space so you're going to have to make sure your documentation makes sense to those who are learning how to use glue.

I noticed these when I created the table via the console UI and then queried it. They must come from:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, there is no way of adding additional Serde parameters into the Glue L2 construct, am I missing something?

serdeInfo: {
serializationLibrary: props.dataFormat.serializationLibrary.className,
},

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-serdeinfo.html

workaround:

  // escape hatch for missing config option in glue.Table
  // ref.: https://github.com/aws/aws-cdk/issues/16660
  propertyOverrides.forEach(({ propertyPath, value }) => {
    (table.node.defaultChild as glue.CfnTable).addPropertyOverride(
      propertyPath,
      value
    );
  });

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. I meant to say that there is no native support for adding additional Serde parameters. We can't add documentation to our L2 Glue Construct that requires an escape hatch. Seems related to #14159, though I'm not sure if you are suggesting that the additional parameters needed belong in serdeInfo.parameters or tableInput.parameters.

Do you have working code for this feature using the Glue L2 w/ escape hatches? And have you confirmed that the originally documented method using APACHE_LOGS does not work? I can/will investigate this as well but since this is marked p2 it isn't something I can immediately prioritize.

At any rate, if these parameters are necessary, we need native support for parameters before we support it in DataFormat.

Copy link
Contributor Author

@naseemkullah naseemkullah Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have working code for this feature using the Glue L2 w/ escape hatches? And have you confirmed that the originally documented method using APACHE_LOGS does not work? I can/will investigate this as well but since this is marked p2 it isn't something I can immediately prioritize.

Yes ans yes.

At any rate, if these parameters are necessary, we need native support for parameters before we support it in DataFormat.

OK, we then depend on #14159

BTW this will be useful for awslabs/aws-solutions-constructs#399

*
* @see https://docs.aws.amazon.com/athena/latest/ug/lazy-simple-serde.html
*/
public static readonly TSV = new DataFormat({
Expand Down