Skip to content

Proposal: Support for Parameterized Views #301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
trueleo opened this issue Apr 3, 2025 · 2 comments
Open

Proposal: Support for Parameterized Views #301

trueleo opened this issue Apr 3, 2025 · 2 comments

Comments

@trueleo
Copy link
Contributor

trueleo commented Apr 3, 2025

This crate currently supports registering remote tables from different providers. However, it currently lacks support for registering remote views and procedures, making it difficult to query from views and parametrized view. This feature would be particularly useful for many people who want datafusion to only have partial access to a table ( for ex. in multi-tenancy ) or hiding a very complicated vendor specific logic abstracted behind a view.

Problems

  • Currently, there is no support for passing table function arguments to a SqlTable or any of vendor-specific implementation.
  • Many vendor implementations rely on specific query for fetching schema when a table is created. These queries are tailored for fetching table schema only and do not work for fetching schema of a view for example.
  • Additional changes need to happen in the datafusion-federation crate to support this functionality.

Proposed Solution:

Based on some experimenting that I have done till now, I want to propose following changes to bring this feature.
Either extending SqlTable ( and any similar vendor specific Table Provider ) to store optional arguments as sqlpaser::ast::TableFunctionArgs. Or create a new type similar to SqlTable called SqlView

Extending SyncDbConnection and AsyncDbConnection with new method get_view_schema for retrieving schema for a view.

When creating a SQL string for SqlExec , First create a basic logical plan with an alias set, unparse the logical plan to a Statement. Then use the following visitor to append the arguments if any.

pub struct TableArgReplace {
	pub(crate) args: TableFunctionArgs,
}

impl ast::VisitorMut for TableArgReplace {
	type Break = ();
	fn pre_visit_table_factor(&mut self, table_factor: &mut ast::TableFactor) -> ControlFlow<Self::Break> {
		if let ast::TableFactor::Table { args, .. } = table_factor {
			*args = Some(self.args.clone())
		}
		ControlFlow::Continue(())
	}
}

Similar approach can done in datafusion-federation sql module.

Other Benefits

This will also allow for sort of curried table functions. Essentially allowing for defining a function like read_from(path), which can the internally create a new table provider that calls a remote view read_from(path, cluster)

Open Questions:

  • Should this functionality be a part of SqlTable or a new type will be better for it?
  • What TableFactory for creating views will look like ?
  • Which vendors support fetching query for a view and how. ( I have some ideas for PostgreSQL and MySQL implementation )
  • Can we automatically discover arguments required for querying a view and register it for them as a table function in datafusion ?

We’d love feedback from the community on this proposal. Are there use cases we might be overlooking? Would this fit well with scope of this project.

Note

I will update this thread and provide a PR or reference to my forked branch that people can look at and see if this seems okay.

@trueleo trueleo changed the title Proposal: Support for Parameterized Views in DataFusion TableProvider Proposal: Support for Parameterized Views Apr 3, 2025
@trueleo
Copy link
Contributor Author

trueleo commented Apr 24, 2025

Since support for parameterized views is merged into datafusion-federation. It is now possible to move forward on this feature, although I am not sure yet about the right approach.

Awaiting this to be merged first #324

SqlTable is very overloaded and as it implements both TableProvider and datafusion-federation::SqlExecutor. This is okay for ease of use but I'd prefer to have types that implement SqlExecutor for each remote table. Then if needed SqlTable can just convert into the type that implements SqlExecutor for federation.

@phillipleblanc
Copy link
Collaborator

SqlTable is very overloaded and as it implements both TableProvider and datafusion-federation::SqlExecutor. This is okay for ease of use but I'd prefer to have types that implement SqlExecutor for each remote table. Then if needed SqlTable can just convert into the type that implements SqlExecutor for federation.

We've already started going down this route (see MySQL: https://github.com/datafusion-contrib/datafusion-table-providers/blob/main/core/src/mysql/federation.rs).

In general I think the SqlTable abstraction hasn't worked out as well as I hoped it would (I was very new to Rust when I wrote that). If I were to redo it now, I would probably just have a set of common components and specific types that implement both TableProvider and SqlExecutor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants