Add documented CSV data formats and examples by choutkaj · Pull Request #17 · choutkaj/bindcurve

choutkaj · 2026-05-11T15:42:00Z

Summary

This PR adds the two official user-facing CSV input formats for DoseResponseData:

long: the canonical long-form format used internally by BindCurve.
replicate_wide: a spreadsheet-friendly format where each row is one compound, independent experiment, and concentration, with technical replicates stored in response_* columns.
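
To make the difference between the two layouts concrete, here is a minimal sketch, using only the stdlib csv module, of how one replicate_wide row expands into long-format rows. Only compound_id and the response_* prefix come from this PR. The experiment_id, concentration, replicate and response column names and the replicate_wide_to_long function are assumptions for illustration, not BindCurve's implementation.

```python
import csv
import io

# Assumed column names: compound_id and the response_* prefix come from the PR;
# experiment_id, concentration, replicate and response are hypothetical.
ID_COLUMNS = ("compound_id", "experiment_id")
RESPONSE_PREFIX = "response_"


def replicate_wide_to_long(text):
    """Expand each replicate_wide row into one long-format row per replicate."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    response_cols = [c for c in fieldnames if c.startswith(RESPONSE_PREFIX)]
    if not response_cols:
        raise ValueError("replicate_wide input needs at least one response_* column")
    long_rows = []
    for row in reader:
        for col in response_cols:
            value = (row.get(col) or "").strip()
            if not value:
                continue  # an empty cell is a missing replicate, not a zero response
            record = {key: row[key] for key in ID_COLUMNS}
            record["concentration"] = float(row["concentration"])
            record["replicate"] = col[len(RESPONSE_PREFIX):]
            record["response"] = float(value)
            long_rows.append(record)
    return long_rows


# Usage: two concentrations, the second with only one technical replicate.
wide_csv = """compound_id,experiment_id,concentration,response_1,response_2
cpdA,exp1,0.1,12.0,14.0
cpdA,exp1,1.0,48.5,
"""
rows = replicate_wide_to_long(wide_csv)
```

Skipping empty response_* cells lets a concentration with fewer technical replicates share a file with fully replicated ones; how BindCurve itself treats blank cells should be checked against data_formats.md.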

Changes

Adds data_formats.md in the repository root with documentation for both formats.
Extends DoseResponseData.from_csv() with an explicit format argument supporting long and replicate_wide.
Keeps format="long" as the default so existing from_csv(path) behavior remains compatible.
Adds four synthetic example CSV files in the repository root:
- synthetic_direct_binding_long.csv
- synthetic_direct_binding_replicate_wide.csv
- synthetic_competitive_binding_long.csv
- synthetic_competitive_binding_replicate_wide.csv
Adds tests for loading both formats, checking normalization, and validating error handling.
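
The API shape described above (an explicit format argument, with "long" as the default so that a bare call keeps its old behavior) can be sketched as a small dispatch table. This is a hypothetical stdlib-only illustration: load_csv_text, PARSERS and the column handling are assumptions, not the signature or internals of DoseResponseData.from_csv().

```python
import csv
import io

# Hypothetical loader mirroring the PR's API shape: an explicit `format`
# argument defaulting to "long". Names here are illustrative assumptions.
RESPONSE_PREFIX = "response_"


def _parse_long(reader):
    # Long format is already one row per measurement; keep rows as-is.
    return [dict(row) for row in reader]


def _parse_replicate_wide(reader):
    # Emit one long row per non-empty response_* cell.
    fieldnames = reader.fieldnames or []
    response_cols = [c for c in fieldnames if c.startswith(RESPONSE_PREFIX)]
    id_cols = [c for c in fieldnames if c not in response_cols]
    out = []
    for row in reader:
        for col in response_cols:
            if row.get(col):
                record = {c: row[c] for c in id_cols}
                record["replicate"] = col[len(RESPONSE_PREFIX):]
                record["response"] = row[col]
                out.append(record)
    return out


PARSERS = {"long": _parse_long, "replicate_wide": _parse_replicate_wide}


def load_csv_text(text, format="long"):
    """Parse CSV text; omitting `format` keeps the long-format behavior."""
    parser = PARSERS.get(format)
    if parser is None:
        raise ValueError(
            f"unsupported format {format!r}; expected one of {sorted(PARSERS)}"
        )
    return parser(csv.DictReader(io.StringIO(text)))
```

Keeping the supported names in a single dict means the dispatch and the error message cannot drift apart. The parameter is named format only to mirror the PR, even though it shadows the Python builtin inside the function.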

Notes

The feature branch is currently behind main because main moved after the branch point, but the diff is limited to the data-format implementation, docs, examples, and tests.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2a88b3614


chatgpt-codex-connector · 2026-05-11T15:46:26Z

+            columns={
+                source: target
+                for source, target in rename_map.items()
+                if source != target
+            }


Prevent duplicate canonical columns when normalizing long CSVs

When a caller provides custom long-format column names that map onto canonical names already present in the CSV (for example compound_col="cmpd" while compound_id also exists), this rename step can create duplicate labels like two compound_id columns. Downstream normalization/validation then operates on ambiguous columns and can raise runtime errors (e.g., grouping on compound_id becomes non-1D) or use unintended data. This is a regression in the new from_csv(..., format="long") path and should be handled by detecting collisions before rename and failing with a clear error (or resolving precedence explicitly).
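
One way to act on this suggestion, sketched under assumptions: compute the label each column would carry after the rename and fail before renaming if any label repeats. check_rename_collisions is a hypothetical helper, and rename_map is assumed to map source column names to canonical names, as in the diff excerpt above. This sketch takes the "fail with a clear error" option rather than resolving precedence.

```python
# Hypothetical helper: rename_map maps source column names to canonical names,
# e.g. {"cmpd": "compound_id"} when a caller passes compound_col="cmpd".


def check_rename_collisions(columns, rename_map):
    """Return the effective renames, or raise ValueError if labels would repeat."""
    # Same filtering as the diff: identity mappings are not renames.
    renames = {src: dst for src, dst in rename_map.items() if src != dst}
    # Group original columns by the label they would carry after renaming.
    sources_by_label = {}
    for col in columns:
        sources_by_label.setdefault(renames.get(col, col), []).append(col)
    collisions = {
        label: sources
        for label, sources in sources_by_label.items()
        if len(sources) > 1
    }
    if collisions:
        details = "; ".join(
            f"{label!r} <- {sources}" for label, sources in collisions.items()
        )
        raise ValueError(f"renaming would create duplicate columns: {details}")
    return renames
```

With pandas, columns would be list(df.columns), and the returned mapping would be passed to df.rename(columns=...) only after the check passes. Because it groups by final label, the same check also flags two custom column arguments that point at the same canonical name.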


choutkaj added 8 commits May 11, 2026 17:19

- Add CSV data format support to DoseResponseData (1e5835c)
- Restore and add CSV data format support (d238b5f)
- Document supported data formats (e6e048b)
- Add synthetic direct binding long CSV (1c02815)
- Add synthetic direct binding replicate-wide CSV (9eee9c5)
- Add synthetic competitive binding long CSV (3b69a11)
- Add synthetic competitive binding replicate-wide CSV (4b91218)
- Add tests for supported CSV data formats (f2a88b3)

chatgpt-codex-connector Bot reviewed May 11, 2026


- Merge origin/main into feature/data-format-imports (9399bfc)

choutkaj merged commit 8a585be into main May 11, 2026
0 of 5 checks passed

choutkaj merged 9 commits into main from feature/data-format-imports