[ON HOLD] feat: auto detection of model encoding types #288

shuangwu5 · 2025-02-28T09:43:16Z

Main changes:

Introduce new model encoding types: TABULAR_AUTO, LANGUAGE_AUTO
Change the behavior of the existing model encoding type AUTO: before this PR it meant the same as TABULAR_AUTO; now it behaves like the union of TABULAR_AUTO and LANGUAGE_AUTO
Extend the auto detection logic:
- LANGUAGE_AUTO -> auto detect LANGUAGE_NUMERIC, LANGUAGE_DATETIME, LANGUAGE_TEXT
- TABULAR_AUTO -> "highly-unique categorical columns" (categorical columns that have >100 rows and in which more than 5% of rows contain unique values) will be detected as TABULAR_CHARACTER
- AUTO -> "highly-unique categorical columns" will be detected as TABULAR_CHARACTER when all values have the same length; otherwise they will be detected as LANGUAGE_TEXT
Do auto detection before creating a generator (if columns are not defined in the config at all or have one of the *_AUTO encoding types)
Skip validation of *_model_configuration in SourceTableConfig before auto detection
Revalidate SourceTableConfig strictly (i.e., including *_model_configuration) after auto detection is done
Ensure that SourceTable always has strict validation
New unit tests: test_auto_detect_encoding_types_and_pk, test__auto_detect_encoding_type, test__auto_detect_primary_key
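The rules above for "highly-unique categorical columns" can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper names `is_highly_unique` and `resolve_categorical` are hypothetical, "5% of rows contain unique values" is read here as distinct values divided by row count, and the TABULAR_CATEGORICAL fallback for columns that are not highly unique is an assumption the description does not state.

```python
from collections.abc import Sequence

MIN_ROWS = 100           # the column must have more than this many rows
MIN_UNIQUE_RATIO = 0.05  # ...and more than this share of distinct values


def is_highly_unique(values: Sequence[str]) -> bool:
    """Return True for a 'highly-unique categorical column' (assumed reading)."""
    n_rows = len(values)
    if n_rows <= MIN_ROWS:
        return False
    return len(set(values)) / n_rows > MIN_UNIQUE_RATIO


def resolve_categorical(values: Sequence[str], requested: str) -> str:
    """Resolve TABULAR_AUTO or AUTO for a string column (hypothetical helper)."""
    if not is_highly_unique(values):
        # Assumption: the PR description does not say what happens here.
        return "TABULAR_CATEGORICAL"
    if requested == "TABULAR_AUTO":
        return "TABULAR_CHARACTER"
    if requested == "AUTO":
        # AUTO only picks TABULAR_CHARACTER when every value has the same length.
        same_length = len({len(v) for v in values}) == 1
        return "TABULAR_CHARACTER" if same_length else "LANGUAGE_TEXT"
    raise ValueError(f"unsupported encoding type: {requested}")


codes = [f"ID{i:04d}" for i in range(200)]  # all distinct, fixed length
notes = [f"note {i}" for i in range(200)]   # all distinct, varying length
print(resolve_categorical(codes, "AUTO"))
print(resolve_categorical(notes, "AUTO"))
```

Under these assumptions, `codes` resolves to TABULAR_CHARACTER under both TABULAR_AUTO and AUTO, while `notes` resolves to LANGUAGE_TEXT under AUTO because its values differ in length.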

Code refactoring:

Move main logic for detecting location schema from Core to SDK
Move the auto detection logic from an isolated script into the DataTable class

Misc / Changes that are not directly related to the requirements:

Update CustomBaseModel in mostlyai/sdk/_data/metadata_objects.py so that the classes can be initialized without using aliases/camelCase field names.
Fix a bug in MostlyAI.train(): allow data to be fed as the first unnamed argument
Fix one incorrect test case in test_execution_plan.py
Add .DS_Store to .gitignore


shuangwu5 added 30 commits February 28, 2025 09:49

move main logic for location_schema into SDK

bfb6f77

assign default model encoding type

7f0f049

ruff

6b0c46f

Merge branch 'main' into auto-detect-datetime

9ec2eea

Merge branch 'main' into auto-detect-datetime

8a150d9

Merge branch 'main' into auto-detect-datetime

a6fc730

sync API

a4addf9

wip

75d28ab

Merge branch 'main' into auto-detect-datetime

d7314b9

WIP: only validate model configuration after auto detection

05c0e15

refine validation of model configuration

d343c4b

still update model encoding type of detected PK

635fd38

rich print auto detection results

d8da91e

do strict validation before instantiating a Generator

389b854

fix primary key

30596ed

wip: move auto detection logic to DataTable

0c62c94

revert

d413ebf

fix code that got accidentally deleted by Cursor

b0e3fda

fix

17dda7f

move tests from test_auto_detect.py to test_base.py

bbee2fc

fix fk issue

5a1491e

fix test_simple_flat

9a9fe0e

fix test (do not add language model config for AUTO for now)

2defa50

fix test

42eb5bb

fix test

a96c629

always validate model configurations for SourceTable

0795d34

fix tests

a2749c2

apply PR comment from Cursor (claude-3.7-sonnet)

e484af3

resolve AUTO before creating generator; auto detect for non-upload tables

bcf91bd

allow populate_by_name for classes in metadata_objects.py

c09d134

shuangwu5 added 7 commits March 19, 2025 12:07

raise error when unable to do auto detection for non-upload tables

ab2ffa7

revert Makefile

2442c69

Merge branch 'main' into auto-detect-datetime

ee23069

Merge branch 'main' into auto-detect-datetime

11ffe46

Merge branch 'main' into auto-detect-datetime

5ffd419

Merge branch 'main' into auto-detect-datetime

c906082

Merge branch 'main' into auto-detect-datetime

3535971

shuangwu5 linked an issue May 20, 2025 that may be closed by this pull request

[FEATURE]: attempt to auto-detect DATETIME if AUTO has been configured, also for local mode #208

Open

shuangwu5 changed the title ~~feat: auto detection of model encoding types~~ [ON HOLD] feat: auto detection of model encoding types May 20, 2025

shuangwu5 removed a link to an issue Jul 8, 2025

[FEATURE]: attempt to auto-detect DATETIME if AUTO has been configured, also for local mode #208

Open

shuangwu5 closed this Jul 8, 2025
