Commit

Split model YAML schema into reusable subtypes (#206)
* Split model YAML schema into reusable chunks

New subtypes: pred_files, person, PhononMetrics, GeoOptMetrics, DiscoveryMetricsSet, DiscoveryMetrics

- Rename `write_geo_opt_metrics_to_yaml()` to `write_metrics_to_yaml()`
- Add `analysis_file_path` parameter to track source of metrics
- Add geo-opt analysis to DPA3-v1-(MPtrj|Openlam) YAML files
- Make YAML schema slightly stricter for geo-opt metrics and model author fields

* upload_model_preds_to_figshare.py: add `--file-type` CLI arg to selectively upload analysis or prediction files
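
The `--file-type` filter described above can be sketched as a standalone script. `should_process_file` mirrors the helper added in this commit; the module-level `find_file_keys` (which takes `file_type` as an explicit argument instead of closing over it) and the sample `metrics` dict are simplified assumptions for illustration, not the script's actual structure.

```python
from typing import Any, Literal

FileType = Literal["all", "analysis", "pred"]


def should_process_file(key: str, file_type: FileType) -> bool:
    """Keep a key if no filter is set or it ends with '<file_type>_file'."""
    return file_type == "all" or key.endswith(f"{file_type}_file")


def find_file_keys(
    data: dict[str, Any], file_type: FileType = "all", prefix: str = ""
) -> dict[str, str]:
    """Recursively collect '*_file' entries, keyed by their dotted path."""
    result: dict[str, str] = {}
    for key, value in data.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            result |= find_file_keys(value, file_type, full_key)
        elif (
            isinstance(value, str)
            and key.endswith("_file")
            and should_process_file(key, file_type)
        ):
            result[full_key] = value
    return result


# Hypothetical metadata shaped like the 'metrics' section of a model YAML file
metrics = {
    "geo_opt": {
        "pred_file": "preds.json.gz",
        "pred_file_url": "https://example.com/preds",
        "symprec=1e-5": {"analysis_file": "analysis.csv.gz", "rmsd": 0.0172},
    }
}

analysis_only = find_file_keys(metrics, "analysis")
pred_only = find_file_keys(metrics, "pred")
everything = find_file_keys(metrics, "all")
```

Keys ending in `_file_url` are never collected because they do not end with `_file`, so only local paths are considered for upload.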
janosh authored Feb 12, 2025
1 parent 1990171 commit 6de4d21
Showing 14 changed files with 389 additions and 251 deletions.
7 changes: 5 additions & 2 deletions matbench_discovery/metrics/geo_opt.py
@@ -8,8 +8,8 @@
from matbench_discovery.enums import MbdKey, Model


def write_geo_opt_metrics_to_yaml(
df_geo_opt: pd.DataFrame, model: Model, symprec: float
def write_metrics_to_yaml(
df_geo_opt: pd.DataFrame, model: Model, symprec: float, analysis_file_path: str
) -> None:
"""Write geometry optimization metrics to model YAML metadata files.
@@ -24,6 +24,7 @@ def write_geo_opt_metrics_to_yaml(
- n_structs: Number of structures evaluated
model (Model): Instance of Model enum that was analyzed in df_geo_opt.
symprec (float): symmetry precision for comparing ML and DFT relaxed structures.
analysis_file_path (str): Path to the CSV file containing the analysis results.
"""
# Load existing metadata
with open(model.yaml_path) as file:
@@ -39,6 +40,8 @@ def write_geo_opt_metrics_to_yaml(
str(Key.symmetry_match): float(round(df_geo_opt[Key.symmetry_match], 4)),
str(Key.symmetry_increase): float(round(df_geo_opt[Key.symmetry_increase], 4)),
str(Key.n_structures): int(df_geo_opt[Key.n_structures]),
"analysis_file": analysis_file_path,
"analysis_file_url": None, # to be filled after uploading to figshare
}
symprec_key = f"{symprec=:.0e}".replace("e-0", "e-")
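
The key format built on the line above can be checked in isolation. This is a minimal sketch: the wrapper name `make_symprec_key` is hypothetical, but the expression inside it is the one from the diff. The `=` specifier in the f-string emits the variable name and an equals sign, `.0e` formats the value in scientific notation with no decimals, and the `replace` call drops the zero-padding of the exponent.

```python
def make_symprec_key(symprec: float) -> str:
    """Build the YAML key for a symmetry precision, e.g. 'symprec=1e-5'."""
    # f"{symprec=:.0e}" yields e.g. 'symprec=1e-05'; strip the padded zero
    return f"{symprec=:.0e}".replace("e-0", "e-")


key_tight = make_symprec_key(1e-5)
key_loose = make_symprec_key(1e-2)
print(key_tight, key_loose)
```

The resulting strings match the `symprec=1e-5` and `symprec=1e-2` keys used in the model YAML files and in the `GeoOptMetrics` type.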

2 changes: 1 addition & 1 deletion models/chgnet/chgnet-0.3.0.yml
@@ -1,5 +1,5 @@
model_name: CHGNet
model_key: chgnet
model_key: chgnet-030
model_version: v0.3.0
matbench_discovery_version: 1.0.0
date_added: "2023-03-03"
21 changes: 18 additions & 3 deletions models/deepmd/dpa3-v1-mptrj.yml
@@ -97,11 +97,26 @@ metrics:
pred_file_url: https://figshare.com/ndownloader/files/52134860
geo_opt:
pred_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-geo-opt.json.gz
struct_col: dp_structure
struct_col: dpa2_structure
pred_file_url: https://figshare.com/ndownloader/files/52134974
symprec=1e-5:
analysis_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-geo-opt-symprec=1e-5.csv.gz
analysis_file_url: https://figshare.com/ndownloader/files/52059431
analysis_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-geo-opt-symprec=1e-5-moyo=0.3.3.csv.gz
analysis_file_url: https://figshare.com/ndownloader/files/52291967
rmsd: 0.0172 # Å
n_sym_ops_mae: 2.1671 # unitless
symmetry_decrease: 0.0804 # fraction
symmetry_match: 0.7128 # fraction
symmetry_increase: 0.2001 # fraction
n_structures: 256955 # count
symprec=1e-2:
analysis_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-geo-opt-symprec=1e-2-moyo=0.3.3.csv.gz
analysis_file_url: https://figshare.com/ndownloader/files/52291970
rmsd: 0.0172 # Å
n_sym_ops_mae: 1.992 # unitless
symmetry_decrease: 0.0609 # fraction
symmetry_match: 0.8034 # fraction
symmetry_increase: 0.1282 # fraction
n_structures: 256955 # count
discovery:
pred_file: models/deepmd/dpa3-v1-mptrj/2025-01-10-wbm-IS2RE.csv.gz
pred_file_url: https://figshare.com/ndownloader/files/52057529
19 changes: 17 additions & 2 deletions models/deepmd/dpa3-v1-openlam.yml
@@ -112,8 +112,23 @@ metrics:
struct_col: dp_structure
pred_file_url: https://figshare.com/ndownloader/files/52135358
symprec=1e-5:
analysis_file: models/deepmd/dpa3-v1-openlam/2025-01-10-wbm-geo-opt-symprec=1e-5.csv.gz
analysis_file_url: https://figshare.com/ndownloader/files/52059434
analysis_file: models/deepmd/dpa3-v1-openlam/2025-01-10-wbm-geo-opt-symprec=1e-5-moyo=0.3.3.csv.gz
analysis_file_url: https://figshare.com/ndownloader/files/52291973
rmsd: 0.0128 # Å
n_sym_ops_mae: 2.1477 # unitless
symmetry_decrease: 0.0657 # fraction
symmetry_match: 0.7188 # fraction
symmetry_increase: 0.2094 # fraction
n_structures: 256963 # count
symprec=1e-2:
analysis_file: models/deepmd/dpa3-v1-openlam/2025-01-10-wbm-geo-opt-symprec=1e-2-moyo=0.3.3.csv.gz
analysis_file_url: https://figshare.com/ndownloader/files/52291976
rmsd: 0.0128 # Å
n_sym_ops_mae: 1.8912 # unitless
symmetry_decrease: 0.0515 # fraction
symmetry_match: 0.8097 # fraction
symmetry_increase: 0.1314 # fraction
n_structures: 256963 # count
discovery:
pred_file: models/deepmd/dpa3-v1-openlam/2025-01-10-wbm-IS2RE.csv.gz
pred_file_url: https://figshare.com/ndownloader/files/52057532
3 changes: 2 additions & 1 deletion models/m3gnet/m3gnet.yml
@@ -7,12 +7,13 @@ date_published: "2022-02-05"
authors:
- name: Chi Chen
affiliation: UC San Diego
role: Model
orcid: https://orcid.org/0000-0001-8008-7043
github: https://github.com/chc273
- name: Shyue Ping Ong
affiliation: UC San Diego
orcid: https://orcid.org/0000-0001-5726-2587
email: [email protected]
github: https://github.com/shyuep

repo: https://github.com/materialsvirtuallab/m3gnet
url: https://materialsvirtuallab.github.io/m3gnet
1 change: 0 additions & 1 deletion models/voronoi_rf/voronoi-rf.yml
@@ -9,7 +9,6 @@ authors:
affiliation: Argonne National Laboratory
email: [email protected]
orcid: https://orcid.org/0000-0002-1323-5939
twitter: WardLT2
- name: Chris Wolverton
affiliation: Northwestern University
email: [email protected]
2 changes: 1 addition & 1 deletion scripts/analyze_geo_opt.py
@@ -138,7 +138,7 @@ def analyze_model_symprec(
# Calculate metrics and write to YAML
df_metrics = geo_opt.calc_geo_opt_metrics(df_ml_geo_analysis)
print(f"\nCalculated metrics: {df_metrics}") # Debug print
geo_opt.write_geo_opt_metrics_to_yaml(df_metrics, model, symprec)
geo_opt.write_metrics_to_yaml(df_metrics, model, symprec, geo_opt_csv_path)


if __name__ == "__main__":
36 changes: 29 additions & 7 deletions scripts/upload_model_preds_to_figshare.py
@@ -10,7 +10,7 @@
import os
import tomllib
from collections.abc import Sequence
from typing import Any, Final
from typing import Any, Final, Literal

import yaml
from tqdm import tqdm
@@ -46,16 +46,20 @@ def parse_args(args: Sequence[str] | None = None) -> argparse.Namespace:
nargs="+",
choices=list(MODELING_TASKS),
default=list(MODELING_TASKS),
help=(
"Space-separated list of modeling tasks to update. Defaults to all tasks."
),
help="Space-separated list of modeling tasks to update. Defaults to all tasks.",
)
parser.add_argument(
"-n",
"--dry-run",
action="store_true",
help="Print what would be uploaded without actually uploading",
)
parser.add_argument(
"--file-type",
choices=["all", "analysis", "pred"],
default="all",
help="Type of files to upload: analysis, pred or all (default)",
)

return parser.parse_args(args)

@@ -80,8 +84,19 @@ def get_article_metadata(task: str) -> dict[str, Sequence[object]]:
}


def should_process_file(
key: str, file_type: Literal["all", "analysis", "pred"]
) -> bool:
"""Filter files by type."""
return file_type == "all" or key.endswith(f"{file_type}_file")


def update_one_modeling_task_article(
task: str, models: list[Model], *, dry_run: bool = False
task: str,
models: list[Model],
*,
dry_run: bool = False,
file_type: Literal["all", "analysis", "pred"] = "all",
) -> None:
"""Update or create a Figshare article for a modeling task."""
article_id = figshare.ARTICLE_IDS[f"model_preds_{task}"]
@@ -143,7 +158,11 @@ def find_file_keys(data: dict[str, Any], prefix: str = "") -> dict[str, str]:
full_key = f"{prefix}.{key}" if prefix else key
if isinstance(value, dict):
result |= find_file_keys(value, full_key)
elif isinstance(value, str) and key.endswith("_file"):
elif (
isinstance(value, str)
and key.endswith("_file")
and should_process_file(key, file_type)
):
result[full_key] = value
return result

@@ -236,10 +255,13 @@ def main(args: Sequence[str] | None = None) -> int:
print("\nDry run mode - no files will be uploaded")
print(f"Updating {len(models_to_update)} models: {', '.join(models_to_update)}")
print(f"Updating {len(tasks_to_update)} tasks: {', '.join(tasks_to_update)}")
print(f"File type filter: {parsed_args.file_type}")

for task in tasks_to_update:
try:
update_one_modeling_task_article(task, models_to_update, dry_run=dry_run)
update_one_modeling_task_article(
task, models_to_update, dry_run=dry_run, file_type=parsed_args.file_type
)
except Exception as exc: # prompt to delete article if something went wrong
state = {
key: locals().get(key)
143 changes: 93 additions & 50 deletions site/src/lib/model-schema.d.ts
@@ -1,9 +1,42 @@
/* eslint-disable */
/**
* This file was automatically generated by json-schema-to-typescript.
* DO NOT MODIFY IT BY HAND. Instead, modify the source JSONSchema file,
* and run json-schema-to-typescript to regenerate this file.
*/
// This file is auto-generated from model-schema.yml. Do not edit directly.

export type GeoOptMetrics = {
[k: string]: unknown
} & {
struct_col: string
pred_file?: string | null
pred_file_url?: string
'symprec=1e-5'?: {
rmsd: number
n_sym_ops_mae: number
symmetry_decrease: number
symmetry_match: number
symmetry_increase: number
n_structures: number
analysis_file?: string | null
analysis_file_url?: string
}
'symprec=1e-2'?: {
rmsd: number
n_sym_ops_mae: number
symmetry_decrease: number
symmetry_match: number
symmetry_increase: number
n_structures: number
analysis_file?: string | null
analysis_file_url?: string
}
}
export type DiscoveryMetrics = {
[k: string]: unknown
} & {
pred_col: string
pred_file?: string | null
pred_file_url?: string
full_test_set?: DiscoveryMetricsSet
most_stable_10k?: DiscoveryMetricsSet
unique_prototypes?: DiscoveryMetricsSet
}

export interface ModelMetadata {
model_name: string
@@ -12,20 +45,8 @@ export interface ModelMetadata {
matbench_discovery_version: string
date_added: string
date_published: string
authors: {
name: string
affiliation?: string
email?: string
orcid?: string
[k: string]: unknown
}[]
trained_by?: {
name: string
affiliation?: string
orcid?: string
github?: string
[k: string]: unknown
}[]
authors: Person[]
trained_by?: Person[]
repo: string
doi: string
paper: string
@@ -41,15 +62,15 @@ export interface ModelMetadata {
}
trained_for_benchmark: boolean
training_set: (
| 'MP 2022'
| 'MPtrj'
| 'MPF'
| 'MP Graphs'
| 'GNoME'
| 'MatterSim'
| 'Alex'
| 'OMat24'
| 'sAlex'
| `MP 2022`
| `MPtrj`
| `MPF`
| `MP Graphs`
| `GNoME`
| `MatterSim`
| `Alex`
| `OMat24`
| `sAlex`
)[]
hyperparams?: {
max_force?: number
@@ -74,27 +95,49 @@ export interface ModelMetadata {
}
model_params: number
n_estimators: number
train_task: 'RP2RE' | 'RS2RE' | 'S2E' | 'S2RE' | 'S2EF' | 'S2EFS' | 'S2EFSM'
test_task: 'IP2E' | 'IS2E' | 'IS2RE' | 'IS2RE-SR' | 'IS2RE-BO'
model_type: 'GNN' | 'UIP' | 'BO-GNN' | 'Fingerprint' | 'Transformer' | 'RF'
targets: 'E' | 'EF_G' | 'EF_D' | 'EFS_G' | 'EFS_D' | 'EFS_GM' | 'EFS_DM'
openness?: 'OSOD' | 'OSCD' | 'CSOD' | 'CSCD'
status?: 'aborted' | 'complete'
train_task: `RP2RE` | `RS2RE` | `S2E` | `S2RE` | `S2EF` | `S2EFS` | `S2EFSM`
test_task: `IP2E` | `IS2E` | `IS2RE` | `IS2RE-SR` | `IS2RE-BO`
model_type: `GNN` | `UIP` | `BO-GNN` | `Fingerprint` | `Transformer` | `RF`
targets: `E` | `EF_G` | `EF_D` | `EFS_G` | `EFS_D` | `EFS_GM` | `EFS_DM`
openness?: `OSOD` | `OSCD` | `CSOD` | `CSCD`
status?: `aborted` | `complete`
metrics?: {
phonons?:
| {
kappa_103?: {
[k: string]: unknown
}
}
| ('not applicable' | 'not available')
geo_opt?:
| {
[k: string]: unknown
}
| ('not applicable' | 'not available')
discovery?: {
[k: string]: unknown
}
phonons?: PhononMetrics | (`not applicable` | `not available`)
geo_opt?: GeoOptMetrics | (`not applicable` | `not available`)
discovery?: DiscoveryMetrics
}
}
export interface Person {
name: string
affiliation?: string
email?: string
url?: string
orcid?: string
github?: string
corresponding?: boolean
}
export interface PhononMetrics {
kappa_103?: {
[k: string]: unknown
}
}
export interface DiscoveryMetricsSet {
F1?: number
DAF?: number
Precision?: number
Recall?: number
Accuracy?: number
TPR?: number
FPR?: number
TNR?: number
FNR?: number
TP?: number
FP?: number
TN?: number
FN?: number
MAE?: number
RMSE?: number
R2?: number
missing_preds?: number
missing_percent?: string
}
2 changes: 1 addition & 1 deletion site/src/routes/+layout.server.ts
@@ -7,7 +7,7 @@ import { compile as json_to_ts } from 'json-schema-to-typescript'
// i.e. use json-schema-to-typescript to auto-convert YAML schema to TypeScript interface
const model_metadata_ts = await json_to_ts(model_schema, `ModelMetadata`, {
style: prettier_config,
bannerComment: `// This file is auto-generated from model-schema.yml. Do not edit directly.`,
})
// prettier format model_md_type
const dts_out_file = `src/lib/model-schema.d.ts`
fs.writeFileSync(dts_out_file, model_metadata_ts)
2 changes: 1 addition & 1 deletion site/src/routes/+page.svelte
@@ -56,7 +56,7 @@
>
{title}
{#if link}
<a href={link} target="_blank">
<a href={link} target="_blank" rel="noopener noreferrer">
<Icon icon="octicon:info" inline />
</a>
{/if}
2 changes: 1 addition & 1 deletion site/src/routes/tasks/discovery/+page.svelte
@@ -44,7 +44,7 @@
>
{title}
{#if link}
<a href={link} target="_blank">
<a href={link} target="_blank" rel="noopener noreferrer">
<Icon icon="octicon:info" inline />
</a>
{/if}