

YenFuChen

SYSTEMDS-3857: Set/Get Column Names

Brief Description

This PR adds two new DML builtin functions, getNames() and setNames(), to manage column names in Frame objects, addressing the need for programmatic column name manipulation in DML scripts.

Detailed Description

This contribution enhances flexibility for working with Frame objects in DML scripts by providing programmatic access to their column names.

The following changes have been implemented:

  1. FrameBlock Class:

    • Added a public FrameBlock getNames() method: returns a single-row FrameBlock with the current column names (the defaults "C1", "C2", ... or user-defined names), based on getColumnNamesAsFrame().
    • Added a public void setNames(FrameBlock names) method: sets custom column names for the FrameBlock and validates that the input is non-null, has exactly one row, and has the same number of columns as the target frame.
  2. DML Builtin Functions Integration:

    • Registered getNames and setNames as new builtin functions in BuiltinFunctionExpression. Validation (e.g., type checking and argument count) runs during compilation to ensure correct usage in DML scripts.
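The following is a minimal sketch of the column-name semantics described above. It assumes names are modelled as a plain String[][] rather than a real FrameBlock. MiniFrame is a hypothetical stand-in, not SystemDS code. It only mirrors the behaviour listed for getNames() (one row, defaults "C1", "C2", ... until names are set) and setNames() (rejects null input, more than one row, or a column-count mismatch). The exception type and messages are illustrative assumptions.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for FrameBlock that models only the column-name
 * logic described in this PR. Not part of the SystemDS API.
 */
public class MiniFrame {
    private final int numCols;
    private String[] colNames; // null means "use default names C1, C2, ..."

    public MiniFrame(int numCols) {
        this.numCols = numCols;
    }

    /** Returns a 1 x numCols table holding the current column names. */
    public String[][] getNames() {
        String[] row = new String[numCols];
        for (int j = 0; j < numCols; j++)
            row[j] = (colNames != null) ? colNames[j] : "C" + (j + 1);
        return new String[][] { row };
    }

    /** Replaces the column names after validating the shape of the input. */
    public void setNames(String[][] names) {
        if (names == null)
            throw new IllegalArgumentException("names must not be null");
        if (names.length != 1)
            throw new IllegalArgumentException(
                "names must have exactly one row, got " + names.length);
        if (names[0].length != numCols)
            throw new IllegalArgumentException(
                "expected " + numCols + " names, got " + names[0].length);
        // Copy so later changes to the caller's array do not alter this frame.
        colNames = Arrays.copyOf(names[0], numCols);
    }

    public static void main(String[] args) {
        MiniFrame f = new MiniFrame(2);
        System.out.println(Arrays.toString(f.getNames()[0])); // [C1, C2]
        f.setNames(new String[][] { { "name", "age" } });
        System.out.println(Arrays.toString(f.getNames()[0])); // [name, age]
    }
}
```

The sketch copies the input row in setNames(), which is a design choice of the sketch. The PR does not state whether the actual FrameBlock implementation copies or aliases the names.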

Testing

Unit tests in org.apache.sysds.test.functions.builtin.BuiltinGetSetNamesTest.java validate correctness, covering:

  • Default Names: Ensures getNames() returns default names ("C1", "C2", ...).
  • Custom Name Setting & Getting: Verifies that setNames() applies custom column names (e.g., 2 columns named "name" and "age") and that getNames() retrieves them as expected.
  • Error Handling for setNames():
    • Handles null input to setNames().
    • Rejects a FrameBlock with an incorrect row count (e.g., 2 rows with 2 columns).
    • Detects a mismatch in column count between the input and the target Frame (e.g., 3 vs. 2 columns).

The tests validate the core functionality and pass.
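The three error-handling cases can be expressed as a table-driven check. This is a plain-Java sketch with no JUnit dependency. NamesValidationCases and validate() are hypothetical names. The sketch assumes the rules are exactly the three listed above: non-null input, exactly one row, and a matching column count, checked against a 2-column target frame as in the tests.

```java
/**
 * Hypothetical table-driven sketch of the setNames() error cases.
 * Names are modelled as String[][]; this is not the actual test class.
 */
public class NamesValidationCases {
    /** Returns null if the names are valid, otherwise the violated rule. */
    static String validate(String[][] names, int targetCols) {
        if (names == null)
            return "null input";
        if (names.length != 1)
            return "expected 1 row, got " + names.length;
        if (names[0].length != targetCols)
            return "expected " + targetCols + " columns, got " + names[0].length;
        return null;
    }

    public static void main(String[] args) {
        int targetCols = 2;
        // Each case: label, candidate names, whether they should be accepted.
        Object[][] cases = {
            { "valid names", new String[][] { { "name", "age" } }, true },
            { "null input", null, false },
            { "two rows", new String[][] { { "a", "b" }, { "c", "d" } }, false },
            { "three columns", new String[][] { { "a", "b", "c" } }, false },
        };
        for (Object[] c : cases) {
            String label = (String) c[0];
            String[][] names = (String[][]) c[1];
            boolean expectValid = (Boolean) c[2];
            String error = validate(names, targetCols);
            boolean ok = (error == null) == expectValid;
            System.out.println((ok ? "PASS " : "FAIL ") + label
                + (error == null ? "" : " (" + error + ")"));
        }
    }
}
```

With one table, a new rejection rule costs one more row instead of a new test method. A JUnit version would typically use the framework's exception assertions instead.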

github-project-automation bot moved this to In Progress in SystemDS PR Queue Jun 29, 2025
YenFuChen changed the title from "Implement getNames and setNames builtin functions" to "[SYSTEMDS-3857]: Set/Get Column Names" Jun 30, 2025
YenFuChen changed the title from "[SYSTEMDS-3857]: Set/Get Column Names" to "[SYSTEMDS-3857] Set/Get Column Names" Jun 30, 2025
@phaniarnab
Contributor

Thanks, @YenFuChen, for the changes. Can you please add a test with a DML script that calls setNames and getNames?

YenFuChen changed the title from "[SYSTEMDS-3857] Set/Get Column Names" to "[SYSTEMDS-3857] Set/GetNames on Data Frames" Jul 10, 2025
@YenFuChen
Author

Hi,
I've encountered a NullPointerException during the validation phase, specifically in FunctionCallIdentifier.validateExpression, when trying to call the functions from a DML script.

The cause appears to be that the parser doesn't recognize these as builtin functions. It instead tries to resolve them as user-defined functions, which fails.

Can you confirm whether I need to explicitly register these new builtins in something like DmlSyntacticValidator.isBuiltinFunction(), or in another part of the parser?

Thanks for your assistance.
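A toy model of the suspected resolution order may help frame the question. A call name is first checked against the set of registered builtins and only then looked up among user-defined functions. An unregistered builtin therefore falls through to a lookup that finds nothing. CallResolution and resolve() are hypothetical and do not reproduce the SystemDS parser. Where the real builtin registry lives is exactly the open question above.

```java
import java.util.Map;
import java.util.Set;

/** Toy model (not SystemDS code) of builtin-vs-user function resolution. */
public class CallResolution {
    static String resolve(String name, Set<String> builtins,
                          Map<String, String> userFunctions) {
        // 1. Registered builtins take precedence.
        if (builtins.contains(name))
            return "builtin:" + name;
        // 2. Otherwise look for a user-defined function; null if the script has none.
        String definition = userFunctions.get(name);
        if (definition == null)
            throw new IllegalStateException("'" + name
                + "' is neither a registered builtin nor a user-defined function");
        return "user:" + name;
    }

    public static void main(String[] args) {
        Map<String, String> userFns = Map.of(); // script defines no functions
        Set<String> before = Set.of("nrow", "ncol");
        try {
            resolve("getNames", before, userFns);
        } catch (IllegalStateException e) {
            System.out.println("before registration: " + e.getMessage());
        }
        Set<String> after = Set.of("nrow", "ncol", "getNames", "setNames");
        System.out.println("after registration: " + resolve("getNames", after, userFns));
    }
}
```

The model's explicit null check turns the failed lookup into a descriptive error. Code that dereferenced the lookup result directly would instead fail with a NullPointerException, which is consistent with the symptom described in this comment.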
