Description
I'm filing this issue on behalf of a user.
Environment Details
- SDV version: 1.10.0 (latest)
Error Description
I tried to use convert_dtypes on my DataFrame to optimize performance (reduce memory usage, speed up processing, etc.). After doing this, the columns that had dtype object are converted to the pandas string dtype. Unfortunately, SDV crashes on this dtype when a constraint is applied (an Inequality constraint in the example below).
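The dtype change can be checked with pandas alone. The sketch below reuses the values from the reproduction and only inspects dtypes; the commented outputs assume pandas 1.x/2.x behavior for convert_dtypes. Besides switching the columns to the string dtype, the conversion also turns the np.nan in column B into pd.NA, which is relevant to the error message in the output.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(data={
    'A': ['2022-08-12', '2020-06-23', '2022-03-04'],
    'B': ['2022-10-13', '2020-10-31', np.nan],
})
print(data.dtypes)  # both columns: object (pandas 1.x/2.x)

converted = data.convert_dtypes()
print(converted.dtypes)  # both columns: string

# The missing entry in B is no longer np.nan but the pd.NA singleton
print(converted['B'].iloc[2] is pd.NA)  # True
```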
Steps to reproduce
import pandas as pd
import numpy as np
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
data = pd.DataFrame(data={
    'A': ['2022-08-12', '2020-06-23', '2022-03-04'],
    'B': ['2022-10-13', '2020-10-31', np.nan]
})
# convert dtypes to optimize memory/performance
# this converts the columns from object to the pandas string dtype (and np.nan to pd.NA)
data = data.convert_dtypes()
metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' },
        'B': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' }
    }
})
# both of these pass
metadata.validate()
metadata.validate_data(data)
synth = GaussianCopulaSynthesizer(metadata)
inequality_cons = {
    'constraint_class': 'Inequality',
    'constraint_parameters': {
        'low_column_name': 'A',
        'high_column_name': 'B',
        'strict_boundaries': False
    }
}
synth.add_constraints([inequality_cons])
synth.fit(data)
Output:
InvalidDataError: The provided data does not match the metadata:
boolean value of NA is ambiguous
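The last line of the output is the message pandas attaches to the TypeError it raises whenever pd.NA is coerced to a bool. The snippet below uses only pandas and numpy to show how np.nan and pd.NA differ in comparisons. That the Inequality constraint hits this path because the missing value in B became pd.NA is a plausible reading, not one traced through SDV's source. The last part sketches a possible workaround, untested against SDV 1.10.0: converting the frame back to object columns with np.nan gaps before calling fit.

```python
import numpy as np
import pandas as pd

# A comparison against np.nan evaluates to a plain False...
print(np.nan < 1.0)  # False

# ...but a comparison against pd.NA propagates NA instead of returning a bool
print((pd.NA < 1.0) is pd.NA)  # True

# Coercing pd.NA to bool (as `if`, `and`, `or`, `not` do) raises the error from the report
try:
    bool(pd.NA)
except TypeError as exc:
    print(exc)  # boolean value of NA is ambiguous

# Possible workaround (an assumption, untested with SDV): undo convert_dtypes so that,
# on pandas 2.x, the columns are object dtype again and missing values are np.nan
data = pd.DataFrame(data={
    'A': ['2022-08-12', '2020-06-23', '2022-03-04'],
    'B': ['2022-10-13', '2020-10-31', np.nan],
}).convert_dtypes()

restored = pd.DataFrame(
    data.to_numpy(dtype=object, na_value=np.nan),
    columns=data.columns,
    index=data.index,
)
print(restored['B'].iloc[2] is pd.NA)  # False
```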