Add documented CSV data formats and examples #17
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2a88b3614
    columns={
        source: target
        for source, target in rename_map.items()
        if source != target
    }
Prevent duplicate canonical columns when normalizing long CSVs
When a caller provides custom long-format column names that map onto canonical names already present in the CSV (for example, compound_col="cmpd" while compound_id also exists), this rename step can create duplicate labels, such as two compound_id columns. Downstream normalization and validation then operate on ambiguous columns and can raise runtime errors (e.g., grouping on compound_id becomes non-1D) or silently use unintended data. This is a regression in the new from_csv(..., format="long") path. It should be handled by detecting collisions before the rename and failing with a clear error, or by resolving precedence explicitly.
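A minimal sketch of the suggested pre-rename check, assuming the column labels are available as a plain list and rename_map has the same shape as in the reviewed snippet. The helper names find_rename_collisions and check_rename are hypothetical and not part of the PR.

```python
def find_rename_collisions(columns, rename_map):
    """Return the labels that would appear more than once after renaming.

    `columns` is the list of column labels read from the CSV and
    `rename_map` maps caller-supplied names to canonical names.
    Both helper names here are hypothetical, for illustration only.
    """
    # Mirror the reviewed snippet: identity mappings are skipped.
    effective = {src: dst for src, dst in rename_map.items() if src != dst}
    # Labels as they would look after the rename; unmapped labels are unchanged.
    renamed = [effective.get(col, col) for col in columns]

    seen = set()
    collisions = []
    for name in renamed:
        if name in seen and name not in collisions:
            collisions.append(name)
        seen.add(name)
    return collisions


def check_rename(columns, rename_map):
    """Raise a clear error instead of producing duplicate labels."""
    collisions = find_rename_collisions(columns, rename_map)
    if collisions:
        raise ValueError(
            "Renaming would create duplicate columns: "
            + ", ".join(collisions)
            + ". Drop or rename the existing columns, or adjust the column arguments."
        )


# Example from the comment: "cmpd" maps onto an existing "compound_id".
print(find_rename_collisions(
    ["cmpd", "compound_id", "concentration"],
    {"cmpd": "compound_id"},
))
```

This version also reports labels that were already duplicated in the input, which may or may not be desirable. Resolving precedence explicitly instead of raising is the alternative mentioned above.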
Summary
This PR adds the two official user-facing CSV input formats for DoseResponseData:
- long: the canonical long-form format used internally by BindCurve.
- replicate_wide: a spreadsheet-friendly format where each row is one compound, independent experiment, and concentration, with technical replicates stored in response_* columns.
Changes
- data_formats.md in the repository root with documentation for both formats.
- DoseResponseData.from_csv() with an explicit format argument supporting long and replicate_wide.
- format="long" as the default so existing from_csv(path) behavior remains compatible.
- synthetic_direct_binding_long.csv
- synthetic_direct_binding_replicate_wide.csv
- synthetic_competitive_binding_long.csv
- synthetic_competitive_binding_replicate_wide.csv
Notes
The feature branch is currently behind main because main moved after the branch point, but the diff is limited to the data-format implementation, docs, examples, and tests.
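To illustrate how the replicate_wide layout described in the summary relates to the long layout, here is a rough standard-library sketch that expands each response_* cell into its own row. It is not the PR's implementation. The column names experiment_id, concentration, replicate, and response, and the function replicate_wide_to_long, are assumptions made for the example; only compound_id and the response_* prefix come from the description above.

```python
import csv
import io


def replicate_wide_to_long(text, response_prefix="response_"):
    """Expand each non-empty response_* cell into one long-format row.

    Hypothetical helper: the output column names "replicate" and
    "response" are assumptions, not BindCurve's documented schema.
    """
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Columns without the response prefix identify the measurement.
        keys = {
            name: value
            for name, value in row.items()
            if name is not None and not name.startswith(response_prefix)
        }
        for name, value in row.items():
            if name is None or not name.startswith(response_prefix):
                continue
            if value is None or value.strip() == "":
                continue  # a missing technical replicate is simply skipped
            long_rows.append({
                **keys,
                "replicate": name[len(response_prefix):],
                "response": float(value),
            })
    return long_rows


# Made-up values, for illustration only.
example = (
    "compound_id,experiment_id,concentration,response_1,response_2\n"
    "cmpd_A,exp1,0.1,0.12,0.15\n"
    "cmpd_A,exp1,1.0,0.48,\n"
)

rows = replicate_wide_to_long(example)
for r in rows:
    print(r)
```

Skipping empty cells is a design choice made for this sketch. Whether missing replicates are dropped or rejected in the actual from_csv(..., format="replicate_wide") path is not stated in this PR description.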