Levels of theory

The level of theory describes the method used to generate a dataset (training or benchmarking), for example, a DFT functional, post-Hartree-Fock method or experimental measurement. The level_of_theory field appears in two places in ML-PEG:

  • models.yml — the functional or method of the dataset the MLIP was trained on (or of the head being used).

  • metrics.yml — the reference method used to generate the benchmark data for a given metric.

Flagging level of theory mismatches

When a model’s level_of_theory does not match the level_of_theory of any metric within a benchmark, the app displays a coloured warning triangle in the model’s table cell. These warnings are designed to draw attention to any potential mismatch so that results can be interpreted in the correct context. There are three warning types, colour-coded by the nature of the mismatch:

Icon

Warning

Meaning

DFT functional mismatch

The model and benchmark use different DFT functionals (e.g. model trained on PBE, benchmark computed with r2SCAN).

High-level theory mismatch

The benchmark uses a high-level reference method (e.g. CCSD(T), MP2, DMC) but the model was trained on DFT data.

Experimental reference mismatch

The benchmark uses experimental reference data.

The icon colour is currently determined by the mismatched benchmark metric’s level_of_theory, not the model’s. When mismatches of different types occur within the same benchmark, the highest-priority warning is shown: DFT functional mismatch > High-level theory mismatch > Experimental reference mismatch.

Note

A warning does not indicate that a model is unsuitable, but it is a prompt to consider whether the level-of-theory difference is relevant for your use case.

Checking your assigned level of theory is correct

Load up the app and inspect the benchmark tables in each category (summary tables don’t show flags). For example, a PBE benchmark should have no warning for models trained on PBE datasets (e.g. MACE-MP-0a), but a DFT functional mismatch warning for models trained on an r2SCAN dataset (e.g. MACE-MATPES-r2SCAN). A model trained on PBE should have no warnings for the PBE phonon benchmark (Bulk Crystals), a “high-level theory mismatch” for the DLPNO-CCSD(T)/CBS Wiggle150 benchmark (Molecular Systems), and an “Experimental reference mismatch” for the experimental lattice constants metric (Bulk Crystals).

Naming conventions

Warning

Before introducing a new level_of_theory string, check whether an equivalent string already exists in models.yml or any metrics.yml file (full list below). The app compares these strings exactly to decide whether to display a level-of-theory warning badge on a model’s table cell. If the strings do not match, this could lead to an incorrect warning. When adding a new benchmark or model, it is recommended to check that you see the expected flags.

Standard methods

For standard DFT functionals and post-Hartree-Fock methods, write the name as it would appear in a paper:

level_of_theory: PBE
level_of_theory: r2SCAN
level_of_theory: CCSD(T)/CBS

Dispersion corrections

Append the dispersion correction to the base functional using a + separator, with no spaces:

level_of_theory: PBE+D3

The same convention applies on both sides: if a model was trained on PBE+D3 data, its models.yml entry should read PBE+D3, and any benchmark metric derived from PBE+D3 reference data should also read PBE+D3.

Note

Dispersion corrections added at inference time (via trained_on_dispersion: false in models.yml) are handled separately by the add_d3_calculator mechanism. Only set level_of_theory to include +D3 if the training data itself was generated with dispersion corrections.

Special values

Some benchmark metrics use reference data that is experimental or does not correspond to a standard QM method. The following special strings are used:

Warning

These strings must be written exactly as shown (case sensitive).

String

Meaning

Experimental

Reference values taken from experiment (e.g. measured lattice constants, enthalpies).

null

No level-of-theory comparison is made; no warning badge will be shown for this metric. Seen in physicality benchmarks where a metric is e.g. number of energy minima.

Reference table

The following strings are currently in use across the codebase. If your string is not listed, please add it to this table when making a pull request for your benchmark or model.

Models (ml_peg/models/models.yml)

  • PBE

  • PBEsol

  • r2SCAN

  • ωB97M-V

  • ωB97M-V/def2-TZVPD

Benchmarks (ml_peg/analysis/<category>/<benchmark>/metrics.yml)

  • PBE

  • r2SCAN

  • r2SCAN-3c

  • CCSD(T)

  • CCSD(T)/CBS

  • CCSD(T)-F12/cc-pVDZ-F12

  • CCSDT(Q)/CBS

  • DLPNO-CCSD(T)/CBS

  • DLPNO-CCSD(T)/cc-pVTZ

  • PNO-LCCSD(T)-F12/AVQZ

  • DMC

  • ph-AFQMC

  • wb97md3 + 1b CCSD(T)

  • Experimental (see special values)

  • null (see special values)

See also