Levels of theory
The level of theory describes the method used to generate a dataset (training or benchmarking),
for example, a DFT functional, post-Hartree-Fock method or experimental measurement. The level_of_theory
field appears in two places in ML-PEG:
- models.yml: the functional or method of the dataset the MLIP was trained on (or of the head being used).
- metrics.yml: the reference method used to generate the benchmark data for a given metric.
Flagging level of theory mismatches
When a model’s level_of_theory differs from the level_of_theory of at least one metric within
a benchmark, the app displays a coloured warning triangle in the model’s table cell. These warnings
draw attention to potential mismatches so that results can be interpreted in the
correct context. There are three warning types, colour-coded by the nature of the mismatch:
| Icon | Warning | Meaning |
|---|---|---|
| | DFT functional mismatch | The model and benchmark use different DFT functionals (e.g. model trained on PBE, benchmark computed with r2SCAN). |
| | High-level theory mismatch | The benchmark uses a high-level reference method (e.g. CCSD(T), MP2, DMC) but the model was trained on DFT data. |
| | Experimental reference mismatch | The benchmark uses experimental reference data. |
The icon colour is currently determined by the mismatched benchmark metric’s level_of_theory, not the
model’s. When mismatches of different types occur within the same benchmark, the highest-priority
warning is shown: DFT functional mismatch > High-level theory mismatch > Experimental reference mismatch.
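The priority rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the app's actual code: the names `Mismatch`, `classify` and `benchmark_warning` are hypothetical, and the substring test used to recognise high-level methods is an assumption made for the example. It also assumes that a YAML `null` is loaded as Python `None` and is skipped.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Mismatch(IntEnum):
    """Warning types; a larger value means a higher display priority."""

    EXPERIMENTAL = 1
    HIGH_LEVEL = 2
    DFT_FUNCTIONAL = 3


# Hypothetical markers for high-level reference methods (assumption).
HIGH_LEVEL_MARKERS = ("CCSD", "MP2", "DMC", "AFQMC")


def classify(metric_lot: str) -> Mismatch:
    """Pick the warning type from the metric's level_of_theory, not the model's."""
    if metric_lot == "Experimental":
        return Mismatch.EXPERIMENTAL
    if any(marker in metric_lot for marker in HIGH_LEVEL_MARKERS):
        return Mismatch.HIGH_LEVEL
    return Mismatch.DFT_FUNCTIONAL


def benchmark_warning(
    model_lot: str, metric_lots: Iterable[Optional[str]]
) -> Optional[Mismatch]:
    """Return the highest-priority mismatch in a benchmark, or None if there is none."""
    mismatches = [
        classify(lot)
        for lot in metric_lots
        if lot is not None and lot != model_lot  # exact string comparison
    ]
    return max(mismatches, default=None)


worst = benchmark_warning("PBE", ["PBE", "CCSD(T)/CBS", "Experimental"])
print(worst.name if worst else "no warning")  # HIGH_LEVEL
```

Encoding the priority as the enum value keeps the selection to a single `max` call; a different ordering would only require renumbering the members.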
Note
A warning does not indicate that a model is unsuitable, but it is a prompt to consider whether the level-of-theory difference is relevant for your use case.
Checking your assigned level of theory is correct
Launch the app and inspect the benchmark tables in each category (summary tables do not show flags). For example, a PBE benchmark should show no warning for models trained on PBE datasets (e.g. MACE-MP-0a), but a “DFT functional mismatch” warning for models trained on an r2SCAN dataset (e.g. MACE-MATPES-r2SCAN).
A model trained on PBE should show no warning for the PBE phonon benchmark (Bulk Crystals), a “High-level theory mismatch” for the DLPNO-CCSD(T)/CBS Wiggle150 benchmark (Molecular Systems), and an “Experimental reference mismatch” for the experimental lattice constants metric (Bulk Crystals).
Naming conventions
Warning
Before introducing a new level_of_theory string, check whether an equivalent string already
exists in models.yml or any metrics.yml file (full list below). The app compares these strings
exactly to decide whether to display a level-of-theory warning badge on a model’s table cell.
If two strings for the same method differ in any way (case, spacing, punctuation), the app treats
them as a mismatch and shows an incorrect warning. When adding a new benchmark or model, check
that the expected flags appear.
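Because the comparison is exact, a quick check for near-duplicates can catch spelling variants before they cause a spurious warning. The following is a minimal sketch, assuming the existing strings have already been collected into a set; `normalise`, `near_duplicates` and the `KNOWN` set are hypothetical names for illustration and are not part of ML-PEG.

```python
from typing import Iterable, List


def normalise(label: str) -> str:
    """Drop all whitespace and ignore case, for fuzzy comparison only."""
    return "".join(label.split()).casefold()


def near_duplicates(candidate: str, known: Iterable[str]) -> List[str]:
    """Return known strings that differ from candidate only by case or spacing."""
    key = normalise(candidate)
    return sorted(k for k in known if k != candidate and normalise(k) == key)


# A few strings taken from this page, used here as example data.
KNOWN = {"PBE", "PBEsol", "r2SCAN", "PBE+D3", "CCSD(T)/CBS", "Experimental"}

print(near_duplicates("PBE + D3", KNOWN))  # ['PBE+D3']
print(near_duplicates("PBE", KNOWN))       # []
```

A non-empty result suggests reusing the existing spelling rather than introducing a new string.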
Standard methods
For standard DFT functionals and post-Hartree-Fock methods, write the name as it would appear in a paper:
level_of_theory: PBE
level_of_theory: r2SCAN
level_of_theory: CCSD(T)/CBS
Dispersion corrections
Append the dispersion correction to the base functional using a + separator, with no spaces:
level_of_theory: PBE+D3
The same convention applies on both sides: if a model was trained on PBE+D3 data, its models.yml
entry should read PBE+D3, and any benchmark metric derived from PBE+D3 reference data should
also read PBE+D3.
Note
Dispersion corrections added at inference time (via trained_on_dispersion: false in
models.yml) are handled separately by the add_d3_calculator mechanism. Only set
level_of_theory to include +D3 if the training data itself was generated with
dispersion corrections.
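As a sketch of this rule, the helper below builds the level_of_theory string from a base functional and a flag that mirrors the trained_on_dispersion field. The function name `training_level_of_theory` and its signature are hypothetical and only illustrate the convention; they are not part of ML-PEG.

```python
def training_level_of_theory(
    base: str, trained_on_dispersion: bool, correction: str = "D3"
) -> str:
    """Return the level_of_theory string for a model's training data.

    The correction is appended with a '+' separator and no spaces, and only
    when the training data itself included dispersion corrections.
    """
    base = base.strip()
    if not trained_on_dispersion:
        # Dispersion added at inference time does not change the label.
        return base
    return f"{base}+{correction.strip()}"


print(training_level_of_theory("PBE", trained_on_dispersion=True))   # PBE+D3
print(training_level_of_theory("PBE", trained_on_dispersion=False))  # PBE
```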
Special values
Some benchmark metrics use reference data that is experimental or does not correspond to a standard QM method. The following special strings are used:
Warning
These strings must be written exactly as shown (case sensitive).
| String | Meaning |
|---|---|
| Experimental | Reference values taken from experiment (e.g. measured lattice constants, enthalpies). |
| null | No level-of-theory comparison is made; no warning badge will be shown for this metric. Used in physicality benchmarks where the metric is, for example, a number of energy minima. |
Reference table
The following strings are currently in use across the codebase. If your string is not listed, please add it to this table when making a pull request for your benchmark or model.
Models (ml_peg/models/models.yml)
- PBE
- PBEsol
- r2SCAN
- ωB97M-V
- ωB97M-V/def2-TZVPD
Benchmarks (ml_peg/analysis/<category>/<benchmark>/metrics.yml)
- PBE
- r2SCAN
- r2SCAN-3c
- CCSD(T)
- CCSD(T)/CBS
- CCSD(T)-F12/cc-pVDZ-F12
- CCSDT(Q)/CBS
- DLPNO-CCSD(T)/CBS
- DLPNO-CCSD(T)/cc-pVTZ
- PNO-LCCSD(T)-F12/AVQZ
- DMC
- ph-AFQMC
- wb97md3 + 1b CCSD(T)
- Experimental (see special values)
- null (see special values)
See also
- Adding benchmarks: where metrics.yml is configured.
- Adding models: where models.yml is configured.
- Benchmark scoring: how metric configuration is used.