
SDV does not support dtype string in a constraint (boolean value of NA is ambiguous) #1835

Open
@npatki


I'm filing this issue on behalf of a user.

Environment Details

  • SDV version: 1.10.0 (latest)

Error Description

I tried to use convert_dtypes on my DataFrame to optimize performance (minimize space, improve processing speed, etc.). After doing this, the columns that were dtype object are converted to dtype string. Unfortunately, SDV crashes on this dtype when applying a constraint.
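For context, here is a minimal pandas-only sketch of the dtype change described above. It assumes pandas 2.x behavior; the variable names (converted, dtypes_before, dtypes_after, missing_value) are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

# Same values as the report, with the object dtype made explicit
data = pd.DataFrame({
    'A': pd.Series(['2022-08-12', '2020-06-23', '2022-03-04'], dtype=object),
    'B': pd.Series(['2022-10-13', '2020-10-31', np.nan], dtype=object),
})

# convert_dtypes switches columns to pandas' nullable extension dtypes
converted = data.convert_dtypes()

dtypes_before = data.dtypes.astype(str).to_dict()
dtypes_after = converted.dtypes.astype(str).to_dict()

# The missing entry in column B after conversion
# (assumption: this is pd.NA rather than np.nan on pandas 2.x)
missing_value = converted.loc[2, 'B']

print(dtypes_before)
print(dtypes_after)
print(repr(missing_value))
```

The point of interest is the missing entry in column B: after conversion it is no longer a float NaN, which matters for any code that evaluates comparisons on that column in a boolean context.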

Steps to reproduce

import pandas as pd
import numpy as np

from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

data = pd.DataFrame(data={
    'A': ['2022-08-12', '2020-06-23', '2022-03-04'],
    'B': ['2022-10-13', '2020-10-31', np.nan]
})

# convert dtypes to optimize memory/performance
# this converts from object to string
data = data.convert_dtypes()

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' },
        'B': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' }
    }
})

# both of these pass
metadata.validate()
metadata.validate_data(data)

synth = GaussianCopulaSynthesizer(metadata)

inequality_cons = {
    'constraint_class': 'Inequality',
    'constraint_parameters': {
        'low_column_name': 'A',
        'high_column_name': 'B',
        'strict_boundaries': False
    }
}

synth.add_constraints([inequality_cons])
synth.fit(data)

Output:

InvalidDataError: The provided data does not match the metadata:

boolean value of NA is ambiguous
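The message itself comes from pandas: it is the TypeError raised when pd.NA is used in a boolean context. The sketch below reproduces that message without SDV and shows one possible workaround, which is to cast string columns back to object (with np.nan for missing values) before calling fit. The helper cast_string_columns_to_object is hypothetical, and whether this avoids the SDV error has not been confirmed in this issue.

```python
import numpy as np
import pandas as pd


def na_truthiness_error():
    """Return the message pandas raises when pd.NA is used as a boolean."""
    try:
        bool(pd.NA)
    except TypeError as error:
        return str(error)
    return None


def cast_string_columns_to_object(df):
    """Hypothetical helper: copy df, casting `string` columns back to object.

    Missing values (pd.NA) in those columns are replaced with np.nan.
    """
    result = df.copy()
    for column in result.columns:
        if isinstance(result[column].dtype, pd.StringDtype):
            as_object = result[column].astype(object)
            # Keep values where present, otherwise substitute np.nan
            result[column] = as_object.where(result[column].notna(), np.nan)
    return result


print(na_truthiness_error())
```

Under that assumption, the reproduction script would call synth.fit(cast_string_columns_to_object(data)) instead of synth.fit(data).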

stack_trace.txt


Labels

  • bug: Something isn't working
  • feature:constraints: Related to inputting rules or business logic
  • feature:data-connectors: Related to loading the data into SDV or exporting it out
