Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(catalog): Add pagination for list table operation across different catalog types #306

Merged
merged 3 commits into from
Feb 20, 2025

Conversation

dttung2905
Copy link
Contributor

Goal

To support pagination for ListTables method. Similar to ListViews operation in PR #290. We need to change the method interface in catalog.go , cascading changes into glue.go and sql.go. I'm not sure about the pagination in sql.go so I write a wrapper to convert the existing function to iter.Seq2[] type

TODO:

  • Get alignment on the wrapper of ListTables method in sql.go
  • Add more test for sql.go and glue.go once the first point is aligned

@zeroshade zeroshade changed the title Add pagination for list table operation across different catalog types feat(catalog): Add pagination for list table operation across different catalog types Feb 18, 2025
}
}

func (c *Catalog) ListTablesPage(ctx context.Context, namespace table.Identifier) ([]table.Identifier, *string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to export this?

Also I'm not seeing how the next page token is being passed back to it for it to be used?

In addition, any reason to use a pointer to the string rather than just using an empty string to indicate there are no further pages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review. I really appreciate it. 🙏

Any particular reason to export this?

No just typo on my side. It should be private method

Also I'm not seeing how the next page token is being passed back to it for it to be used?

Cmiiw, from the current code, the c.glueSvc.GetTables does not use any next page token. Mainly I think it is meant for checking the end of the page in ListTables

@@ -20,6 +20,7 @@ package glue
import (
"context"
"errors"
"github.com/apache/iceberg-go/table"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use gofmt to reorder the imports. We should probably add a linter step to identify and fix this....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me handle that linter step. I will do some time later this week

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be tmr if I can squeeze in some time to do it 🙏


func (r *Catalog) ListTablesPage(ctx context.Context, namespace table.Identifier, pageToken string, pageSize int) ([]table.Identifier, string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here, is there any reason to export this?

TestToken = "some_jwt_token"
TestCreds = "client:secret"
TestToken = "some_jwt_token"
defaultPageToken = 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be defaultPageSize?

tbls := make([]table.Identifier, 0)
iter := c.ListTables(ctx, namespace)

for tbl, err := range iter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to gather all the tables, as so as you get 1 table you can break and return the error below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Thank you for pointing it out 🙏

}
}

func (c *Catalog) ListTablesAll(ctx context.Context, namespace table.Identifier) ([]table.Identifier, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as before, why export this?

Comment on lines 180 to 199
func (c *Catalog) listTablesPage(ctx context.Context, namespace table.Identifier) ([]table.Identifier, string, error) {
database, err := identifierToGlueDatabase(namespace)
if err != nil {
return nil, nil, err
return nil, "", err
}
params := &glue.GetTablesInput{CatalogId: c.catalogId, DatabaseName: aws.String(database)}
tblsRes, err := c.glueSvc.GetTables(ctx, params)
if err != nil {
return nil, nil, fmt.Errorf("failed to list tables in namespace %s: %w", database, err)
return nil, "", fmt.Errorf("failed to list tables in namespace %s: %w", database, err)
}
var icebergTables []table.Identifier
icebergTables = append(icebergTables,
filterTableListByType(database, tblsRes.TableList, glueTypeIceberg)...)

return icebergTables, tblsRes.NextToken, nil
var nextToken string
if tblsRes.NextToken != nil {
nextToken = *tblsRes.NextToken
}
return icebergTables, nextToken, nil
}
Copy link
Member

@zeroshade zeroshade Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the docs for GetTablesInput it looks like the nextToken should be being passed back to listTablesPage and put into the GetTablesInput.

Personally, it might make more sense to just use NewGetTablesPaginator instead, rather than implementing the pagination ourselves 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zeroshade ,
Nice. That's something that I have not known about. Thanks for pointing it out. I have tried to create a different version with NewGetTablesPaginator and creating few extra test cases in glue catalog to verify its functionality 🙏

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks so much for this!

@zeroshade zeroshade merged commit 6c4e87b into apache:main Feb 20, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants