
feat(ibis): postgreSQL, PostGIS support for IBIS Server. #1188


Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

eren-ture

@eren-ture eren-ture commented May 6, 2025

Inclusions

  • Added an ExtensionHandler class to handle database extensions. Created the postgis_handler method to define columns as geometry or geography.

  • Added new column types GEOMETRY and GEOGRAPHY to the RustWrenEngineColumnType enum to represent PostGIS-specific data types.

Proof of Work

[two screenshots]

Issue

When the frontend is asked to display the values, it fails. Unfortunately, I don't know much about frontend development and cannot fix this issue.
[screenshot]

Summary by CodeRabbit

  • New Features
    • Added support for recognizing and displaying PostGIS extension column types (GEOMETRY and GEOGRAPHY) in PostgreSQL metadata.
    • Metadata extraction now includes extension-specific columns, providing more accurate information for tables using supported PostgreSQL extensions.
    • Introduced spatial data handling and validation through PostGIS-enabled testing, ensuring accurate spatial query support.


coderabbitai bot commented May 6, 2025

"""

Walkthrough

The changes introduce support for PostgreSQL extensions, specifically PostGIS, in the metadata extraction process. This is achieved by extending the column type enum to include geometry types and adding an ExtensionHandler class that augments table metadata with extension-specific information during table list retrieval. Additionally, tests were added to validate PostGIS spatial queries.

Changes

| File(s) | Change Summary |
|---|---|
| ibis-server/app/model/metadata/dto.py | Extended the RustWrenEngineColumnType enum by adding two new members: GEOMETRY and GEOGRAPHY, representing PostGIS extension types. |
| ibis-server/app/model/metadata/postgres.py | Added the ExtensionHandler class to handle PostgreSQL extensions and augment table metadata. Integrated this handler into PostgresMetadata.get_table_list to support PostGIS by updating column types for geometry and geography columns. |
| ibis-server/tests/routers/v2/connector/test_postgres.py | Added a pytest fixture to start a PostGIS-enabled PostgreSQL container and load spatial data. Added an async test to query PostGIS geometry data and verify spatial calculations through the API. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PostgresMetadata
    participant ExtensionHandler
    participant Database

    Client->>PostgresMetadata: get_table_list()
    PostgresMetadata->>Database: Query standard table metadata
    Database-->>PostgresMetadata: Return tables and columns
    PostgresMetadata->>ExtensionHandler: augment(tables)
    ExtensionHandler->>Database: get_extensions()
    Database-->>ExtensionHandler: Return installed extensions
    ExtensionHandler->>Database: Query PostGIS columns (if present)
    Database-->>ExtensionHandler: Return geometry/geography columns
    ExtensionHandler->>PostgresMetadata: Return augmented tables
    PostgresMetadata-->>Client: Return final table list with extension types
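The flow in the sequence diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FakeDatabase`, its two methods, and the simplified `Column`/`Table` dataclasses are hypothetical stand-ins for the real connection and DTOs, and the row keys are assumed to follow the PostGIS `geometry_columns` view plus the `column_type` key seen in the review.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    type: str


@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)


class FakeDatabase:
    """Hypothetical stand-in for a live PostgreSQL connection."""

    def installed_extensions(self) -> dict[str, str]:
        # Maps extension name -> schema the extension is installed in.
        return {"postgis": "public", "plpgsql": "pg_catalog"}

    def postgis_columns(self, schema: str) -> list[dict[str, str]]:
        # Rows shaped like the PostGIS geometry_columns view (assumed).
        return [
            {
                "f_table_schema": "public",
                "f_table_name": "cities",
                "f_geometry_column": "geom",
                "column_type": "GEOMETRY",
            }
        ]


class ExtensionHandler:
    def __init__(self, db: FakeDatabase) -> None:
        self.db = db
        # Extension name -> handler; extensions without an entry are skipped.
        self.handlers = {"postgis": self.postgis_handler}

    def augment(self, tables: dict[str, Table]) -> dict[str, Table]:
        for extension, schema in self.db.installed_extensions().items():
            handler = self.handlers.get(extension)
            if handler is not None:
                tables = handler(tables, schema)
        return tables

    def postgis_handler(
        self, tables: dict[str, Table], schema: str
    ) -> dict[str, Table]:
        for row in self.db.postgis_columns(schema):
            key = f"{row['f_table_schema']}.{row['f_table_name']}"
            table = tables.get(key)
            if table is None:
                continue  # Spatial column on a table we did not list.
            for column in table.columns:
                if column.name == row["f_geometry_column"]:
                    column.type = row["column_type"]
        return tables


tables = {
    "public.cities": Table(
        "public.cities", [Column("id", "INTEGER"), Column("geom", "UNKNOWN")]
    )
}
augmented = ExtensionHandler(FakeDatabase()).augment(tables)
```

Keeping the handlers in a dictionary means an extension without a registered handler (here `plpgsql`) is simply ignored, which matches the "if present" branch in the diagram.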

Poem

In the warren of code, new tunnels appear,
PostGIS now whispers, "Geometry is here!"
Columns grow clever, with types fresh and bright,
Extensions now handled, metadata’s delight.
With paws on the schema and nose to the ground,
This rabbit ensures all the columns are found!
🐇✨
"""



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc8002 and dea0da1.

📒 Files selected for processing (2)
  • ibis-server/app/model/metadata/postgres.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ibis-server/app/model/metadata/postgres.py
  • ibis-server/tests/routers/v2/connector/test_postgres.py

@github-actions github-actions bot added the ibis and python (Pull requests that update Python code) labels May 6, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
ibis-server/app/model/metadata/postgres.py (2)

53-86: Consider parameterizing SQL query and improving table name handling

While the implementation is functional, there are a few improvements that could be made:

  1. The SQL query is built with an f-string. Although schema_name comes from the database itself and is unlikely to be a security risk, a parameterized query is generally the safer choice.

  2. As noted in your comment on line 76, the table name formatting appears in multiple places. Consider refactoring to use the existing _format_postgres_compact_table_name method from the PostgresMetadata class.

-            table_name = f"{row['f_table_schema']}.{row['f_table_name']}"  # ? Might want to use a global `_format_postgres_compact_table_name` function.
+            table_name = self._format_postgres_compact_table_name(row['f_table_schema'], row['f_table_name'])

# Add the method to the ExtensionHandler class:
+    def _format_postgres_compact_table_name(self, schema: str, table: str):
+        return f"{schema}.{table}"
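As a rough illustration of the parameterization point, the sketch below binds the schema name as a query parameter instead of interpolating it. It uses the standard-library `sqlite3` module purely as a runnable stand-in; the PR's actual connection object and its placeholder syntax may differ (for example, `sqlite3` uses `?` while psycopg uses `%s`). The `geometry_columns` table here is a toy copy, and `format_compact_table_name` and `geometry_tables` are hypothetical helper names.

```python
import sqlite3

# In-memory database standing in for the real PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geometry_columns "
    "(f_table_schema TEXT, f_table_name TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO geometry_columns VALUES (?, ?, ?)",
    [("public", "cities", "POINT"), ("other", "roads", "LINESTRING")],
)


def format_compact_table_name(schema: str, table: str) -> str:
    # Single place that defines the "schema.table" key format.
    return f"{schema}.{table}"


def geometry_tables(connection: sqlite3.Connection, schema_name: str) -> list[str]:
    # The schema name is bound as a parameter, never spliced into the SQL text.
    rows = connection.execute(
        "SELECT f_table_schema, f_table_name "
        "FROM geometry_columns WHERE f_table_schema = ?",
        (schema_name,),
    ).fetchall()
    return [format_compact_table_name(schema, table) for schema, table in rows]


result = geometry_tables(conn, "public")
# A value containing quotes is treated as data and matches nothing.
hostile = geometry_tables(conn, "public' OR '1'='1")
```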

87-98: Consider consolidating type transformation methods

There are now two methods with similar functionality:

  1. ExtensionHandler._transform_postgres_column_type (new)
  2. PostgresMetadata._transform_postgres_column_type (existing)

Consider refactoring to avoid duplication, perhaps by:

  1. Moving the extension type mapping to the main transformation method, or
  2. Creating a shared utility function that both classes can use.
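One way to read the second option is a single module-level function backed by a merged lookup table. The sketch below is only an assumption about how that could look: `ColumnType` is a reduced stand-in for RustWrenEngineColumnType, and the mapping entries and the `UNKNOWN` fallback are illustrative rather than taken from the PR.

```python
from enum import Enum


class ColumnType(Enum):
    # Reduced, hypothetical stand-in for RustWrenEngineColumnType.
    INTEGER = "INTEGER"
    VARCHAR = "VARCHAR"
    GEOMETRY = "GEOMETRY"
    GEOGRAPHY = "GEOGRAPHY"
    UNKNOWN = "UNKNOWN"


# Core PostgreSQL types (illustrative subset).
_BASE_TYPES = {
    "integer": ColumnType.INTEGER,
    "character varying": ColumnType.VARCHAR,
}

# Types contributed by extensions, kept separate so new ones are easy to add.
_EXTENSION_TYPES = {
    "geometry": ColumnType.GEOMETRY,
    "geography": ColumnType.GEOGRAPHY,
}

_ALL_TYPES = {**_BASE_TYPES, **_EXTENSION_TYPES}


def transform_postgres_column_type(data_type: str) -> ColumnType:
    # Both PostgresMetadata and ExtensionHandler could call this one function.
    return _ALL_TYPES.get(data_type.strip().lower(), ColumnType.UNKNOWN)
```

With this shape, adding another extension type only touches `_EXTENSION_TYPES`, and neither class needs its own copy of the lookup logic.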
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0f888d and 1da65b7.

📒 Files selected for processing (2)
  • ibis-server/app/model/metadata/dto.py (1 hunks)
  • ibis-server/app/model/metadata/postgres.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
ibis-server/app/model/metadata/postgres.py (1)
ibis-server/app/model/metadata/dto.py (2)
  • Table (79-84)
  • RustWrenEngineColumnType (13-57)
🔇 Additional comments (4)
ibis-server/app/model/metadata/dto.py (1)

54-57: Well-organized extension of RustWrenEngineColumnType enum

The addition of GEOMETRY and GEOGRAPHY types under a clear "Extension types > PostGIS" comment block is clean and follows the established pattern. This structure allows for future extension types to be added in an organized manner.

ibis-server/app/model/metadata/postgres.py (3)

14-21: Well-designed ExtensionHandler class structure

The class is designed with extensibility in mind using a handler dictionary that maps extension names to methods. This approach makes it easy to add support for additional PostgreSQL extensions in the future.


22-36: Clear extension handling flow

The augment method has a clean implementation that retrieves installed extensions and applies the appropriate handlers. The method correctly returns the modified tables list.


37-52: Effective extension detection query

The SQL query to retrieve extensions and their schemas is well-structured. The error handling for empty dataframes is a good practice.

Comment on lines +169 to 171
extension_handler = ExtensionHandler(self.connection)
unique_tables = extension_handler.augment(unique_tables)
return list(unique_tables.values())

💡 Verification agent

🧩 Analysis chain

Verify extension handler augmentation behavior

The augment method is annotated as returning a list of tables, but here it is called with the unique_tables dictionary and its result is assigned back to that variable. Because the handlers mutate the dictionary in place, this likely works as intended, but it would be clearer if the annotations described a dictionary or the reassignment were removed.


🏁 Script executed:

#!/bin/bash
# Check if the augment method modifies the tables in place

echo "Examining ExtensionHandler.augment method:"
rg -A 5 "def augment" --type py
echo "------------------------------------------------"
echo "Checking reference to unique_tables before and after augment call:"
rg -A 5 -B 2 "extension_handler.augment" --type py

Length of output: 1566


🏁 Script executed:

#!/bin/bash
# Show full implementation of the augment method including its return behavior
rg -n "def augment" -A 50 ibis-server/app/model/metadata/postgres.py

# Show context around unique_tables before and after its definition and usage
rg -n "unique_tables" -C 5 ibis-server/app/model/metadata/postgres.py

Length of output: 3870


🏁 Script executed:

#!/bin/bash
# Show complete ExtensionHandler class, including handler registrations
rg -n "class ExtensionHandler" -A 200 ibis-server/app/model/metadata/postgres.py

echo "---- handlers mapping ----"
rg -n "self.handlers" -A 20 ibis-server/app/model/metadata/postgres.py

echo "---- postgis_handler implementation ----"
rg -n "def postgis_handler" -A 200 ibis-server/app/model/metadata/postgres.py

Length of output: 19056


Fix type annotations for ExtensionHandler.augment and handlers

The augment and related handler methods actually accept and return a mapping of table names to Table objects (a dict[str, Table]), not a list[Table]. Update the type signatures to reflect that, or else change the implementation to work with lists:

• In ibis-server/app/model/metadata/postgres.py, adjust:

 class ExtensionHandler:
-    def augment(self, tables: list[Table]) -> list[Table]:
+    def augment(self, tables: dict[str, Table]) -> dict[str, Table]:
         # …
-    def postgis_handler(self, tables: list[Table], schema_name: str) -> list[Table]:
+    def postgis_handler(self, tables: dict[str, Table], schema_name: str) -> dict[str, Table]:
         # …

• Optionally remove the assignment
Since the handlers mutate the dict in place and then return it, you could also call:

extension_handler.augment(unique_tables)
return list(unique_tables.values())

without re-assigning unique_tables.

These changes will make the code’s behavior and its annotations consistent.
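To make the in-place behaviour concrete, here is a small sketch under simplified assumptions: `Table` is a hypothetical reduced dataclass, and `augment` is a toy function with the dictionary-based signature proposed above, not the PR's method.

```python
from dataclasses import dataclass


@dataclass
class Table:
    # Hypothetical reduced Table: column name -> type name.
    name: str
    column_types: dict[str, str]


def augment(tables: dict[str, Table]) -> dict[str, Table]:
    # Mutates the Table objects held by the dict, then returns the same dict.
    for table in tables.values():
        if "geom" in table.column_types:
            table.column_types["geom"] = "GEOMETRY"
    return tables


unique_tables = {
    "public.cities": Table("public.cities", {"id": "INTEGER", "geom": "UNKNOWN"})
}
returned = augment(unique_tables)

# The caller's dict and the returned dict are the same object,
# so the reassignment in get_table_list is optional.
same_object = returned is unique_tables
final_list = list(unique_tables.values())
```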

@chilijung
Member

This is a very interesting use case! @wwwy3y3 @andreashimin, please take a closer look!

@goldmedal goldmedal changed the title from "PostgreSQL, PostGIS support for IBIS Server." to "feat(ibis): postgreSQL, PostGIS support for IBIS Server." May 6, 2025
@goldmedal
Contributor

Thanks @eren-ture for the contribution. It's awesome. 👍
However, it's better to add the corresponding test case for it.
You can add the test case in ibis-server/tests/routers/v2/connector/test_postgres.py

async def test_metadata_list_tables(client, postgres: PostgresContainer):

Maybe add a test case called test_metadata_list_tables_with_postgis.
I think you need to prepare the testing data by initializing the testing Postgres with PostGIS and try to list tables from it.
We use testcontainer to create the testing PG when testing. You can refer to postgres() to see how we initialize a testing database.

def postgres(request) -> PostgresContainer:

I expected you might implement another method called postgres_gis() for another instance.

Contributor

@goldmedal goldmedal left a comment


Thanks @eren-ture, I think this PR is pretty close. I left some comments on it. As I mentioned previously, it would be good to add a test case for it. We can use the PostGIS image to build the test cases. I can help with this if you don't mind me pushing some commits to your branch.

Comment on lines 80 to 82
column.type = self._transform_postgres_column_type(
row["column_type"]
)
Contributor


Suggested change
-column.type = self._transform_postgres_column_type(
-    row["column_type"]
-)
+column.type = str(self._transform_postgres_column_type(
+    row["column_type"]
+))

I got the following message when testing:

  Expected `str` but got `RustWrenEngineColumnType` with value `<RustWrenEngineColumnType.GEOMETRY: 'GEOMETRY'>` - serialized value may not be as expected

I think it's better to cast to str explicitly to avoid the warning.
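What `str()` produces depends on how the enum is declared, which this thread does not show. The sketch below uses a hypothetical plain `Enum` stand-in (`ColumnType`, not the real RustWrenEngineColumnType) to illustrate the difference between `str()` and `.value`, so it may be worth checking which form the serializer should receive.

```python
from enum import Enum


class ColumnType(Enum):
    # Hypothetical plain-Enum stand-in; the real class may be declared differently.
    GEOMETRY = "GEOMETRY"


# For a plain Enum, str() includes the class name.
as_str = str(ColumnType.GEOMETRY)

# .value always yields the underlying string.
as_value = ColumnType.GEOMETRY.value
```

If the real enum is a `StrEnum` or overrides `__str__`, `str()` would return the bare value instead, in which case the suggested cast already gives the expected string.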


# Update tables
for row in response:
table_name = f"{row['f_table_schema']}.{row['f_table_name']}" # ? Might want to use a global `_format_postgres_compact_table_name` function.
Contributor


Suggested change
-table_name = f"{row['f_table_schema']}.{row['f_table_name']}" # ? Might want to use a global `_format_postgres_compact_table_name` function.
+# TODO: Might want to use a global `_format_postgres_compact_table_name` function.
+table_name = f"{row['f_table_schema']}.{row['f_table_name']}"

The comment reads like a to-do enhancement, so it's better to mark it with the word TODO.

@eren-ture
Author

Hi @goldmedal,
I'm currently working on writing the tests. I'm more on the data side than software, but I'll probably get some commits out soon.

@goldmedal
Contributor

Hi @goldmedal, I'm currently working on writing the tests. I'm more on the data side than software, but I'll probably get some commits out soon.

Thanks. I'm looking forward to them. 👍


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
ibis-server/tests/routers/v2/connector/test_postgres.py (1)

1074-1078: Consider adding more comprehensive PostGIS tests.

While this test validates basic spatial query functionality, consider adding more tests that cover other PostGIS operations like:

  • Spatial joins
  • Area/perimeter calculations
  • Spatial containment queries
  • Testing both geometry and geography column types

Adding more comprehensive tests would ensure that all aspects of the PostGIS integration work correctly.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1da65b7 and 5c8957e.

⛔ Files ignored due to path filters (1)
  • ibis-server/tests/resource/tpch/data/cities_geometry.parquet is excluded by !**/*.parquet
📒 Files selected for processing (1)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)
ibis-server/tests/conftest.py (2)
  • file_path (10-11)
  • client (18-23)
ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1)
  • manifest_str (29-30)
🔇 Additional comments (3)
ibis-server/tests/routers/v2/connector/test_postgres.py (3)

4-4: Appropriate geopandas import for PostGIS tests.

The addition of the geopandas library as a dependency is necessary for handling geospatial data and loading it into PostGIS tables.


153-163: Well-structured PostGIS fixture.

The postgis fixture correctly follows the same pattern as other fixtures in the file, with appropriate scope and cleanup handling. The fixture properly:

  1. Creates a PostGIS-enabled container using the correct image
  2. Initializes the PostGIS extension
  3. Loads geospatial data from a Parquet file into a table for testing

This provides a good foundation for testing spatial functionality.


159-161:

✅ Verification successful

Verify test data file exists.

The test loads data from a Parquet file at resource/tpch/data/cities_geometry.parquet. Let's verify that this file exists in the repository.


🏁 Script executed:

#!/bin/bash
# Check if the cities_geometry.parquet file exists
if [ -f "tests/resource/tpch/data/cities_geometry.parquet" ]; then
  echo "File exists"
  # Preview first few rows of the file structure if possible
  python -c "import pyarrow.parquet as pq; print(pq.read_metadata('tests/resource/tpch/data/cities_geometry.parquet'))"
else
  echo "File not found: tests/resource/tpch/data/cities_geometry.parquet"
fi

Length of output: 200


🏁 Script executed:

#!/bin/bash
# Search for the cities_geometry.parquet file anywhere in the repository
echo "Searching for cities_geometry.parquet via ripgrep..."
rg -l "cities_geometry\.parquet" || echo "No matches found with rg"

echo ""
echo "Searching for cities_geometry.parquet via find..."
find . -type f -name "cities_geometry.parquet" || echo "No file found with find"

Length of output: 435


File Verified: cities_geometry.parquet Present

The test data file ibis-server/tests/resource/tpch/data/cities_geometry.parquet exists at the expected location. No changes are needed here.

Labels
ibis, python (Pull requests that update Python code)

3 participants