# physical_validation package

Physical validation suite for MD simulations

## physical_validation.kinetic_energy module

The kinetic_energy module is part of the physical_validation package, and consists of checks of the kinetic energy distribution and its equipartition.

physical_validation.kinetic_energy.distribution(data: physical_validation.data.simulation_data.SimulationData, strict: bool = False, verbosity: int = 2, screen: bool = False, filename: Optional[str] = None, bs_repetitions: int = 200, bootstrap_seed: Optional[int] = None, data_is_uncorrelated: bool = False) Union[float, Tuple[float, float]][source]

Checks the distribution of a kinetic energy trajectory.

Parameters
• data – Simulation data object

• strict – If True, check full kinetic energy distribution via K-S test. Otherwise, check mean and width of kinetic energy distribution. Default: False

• verbosity – Verbosity level, where 0 is quiet and 3 shows full details. Default: 2.

• screen – Plot distributions on screen. Default: False.

• filename – Plot distributions to filename. Default: None, no plotting to file.

• bs_repetitions – Number of bootstrap samples used for error estimate (if strict=False). Default: 200.

• bootstrap_seed – Sets the random number seed for bootstrapping (if strict=False). If set, bootstrapping will be reproducible. Default: None, bootstrapping is non-reproducible.

• data_is_uncorrelated – Whether the provided data is uncorrelated. If this option is set, the equilibration, decorrelation and tail pruning of the trajectory is skipped. This can speed up the analysis, but note that if the provided data is correlated, the results of the physical validation checks might be invalid.

Returns

If strict=True: The p value of the test. If strict=False: Distance of the estimated T(mu) and T(sigma) from the expected temperature, measured in standard deviations of the respective estimate.

Return type

result

Notes

Non-strict test

If strict = False (the default), this function will estimate the mean and the standard deviation of the data. Analytically, a gamma distribution with shape $$k = N / 2$$ (with $$N$$ the number of degrees of freedom) and scale $$\theta = k_B T$$ (with $$T$$ the target temperature) is expected. The mean and the standard deviation of a gamma distribution are given by $$\mu = k\theta$$ and $$\sigma = \sqrt k \theta$$.

The standard error of the mean and standard deviation are estimated via bootstrap resampling. The function prints the analytically expected mean and variance as well as the fitted values and their error estimates. It also prints T(mu) and T(sigma), which are defined as the temperatures to which the estimated mean and standard deviation correspond, given the number of degrees of freedom $$N$$ in the system:

$T(\mu') = \frac{2 \mu'}{N k_B}$
$T(\sigma') = \frac{\sqrt 2 \sigma'}{\sqrt N k_B}$

The return value is a tuple containing the distance of the estimated T(mu) and T(sigma) from the expected temperature, measured in standard deviations of the respective estimates.

Strict test

If strict = True, this function tests the hypothesis that a sample of kinetic energies is Maxwell-Boltzmann distributed given a specific target temperature and the number of degrees of freedom in the system,

$P(K) \sim K^{N/2-1} e^{-\beta K} \, .$

The test is performed using the Kolmogorov-Smirnov test provided by scipy.stats.kstest. It returns the $$p$$-value, measuring the likelihood that a sample at least as extreme as the one given is originating from the expected distribution.

Note

The Kolmogorov-Smirnov test is known to have two weaknesses.

1. The test is more sensitive towards deviations around the center of the distribution than at its tails. We deem this to be acceptable for most MD applications, but be wary if yours is sensible to the kinetic distribution tails.

2. The test is not valid if its parameters are guessed from the data set. Using the target temperature of the MD simulation as an input is therefore perfectly valid, but using the average temperature over the trajectory as an input to the test can potentially invalidate it.

physical_validation.kinetic_energy.equipartition(data: physical_validation.data.simulation_data.SimulationData, strict: bool = False, molec_groups: Optional[List[numpy.ndarray]] = None, random_divisions: int = 0, random_groups: int = 0, random_division_seed: Optional[int] = None, verbosity: int = 2, screen: bool = False, filename: Optional[str] = None, bootstrap_seed: Optional[int] = None, data_is_uncorrelated: bool = False) Union[List[float], List[Tuple[float, float]]][source]

Checks the equipartition of a simulation trajectory.

Parameters
• data – Simulation data object

• strict – If True, check full kinetic energy distribution via K-S test. Otherwise, check mean and width of kinetic energy distribution. Default: False

• molec_groups

List of 1d arrays containing molecule indices defining groups. Useful to pre-define groups of molecules (e.g. solute / solvent, liquid mixture species, …). If None, no pre-defined molecule groups will be tested. Default: None.

Note: If an empty 1d array is found as last element in the list, the remaining molecules are collected in this array. This allows, for example, to only specify the solute, and indicate the solvent by giving an empty array.

• random_divisions – Number of random division tests attempted. Default: 0 (random division tests off).

• random_groups – Number of groups the system is randomly divided in. Default: 2.

• random_division_seed – Seed making the random divisions reproducible. Default: None, random divisions not reproducible

• verbosity – Verbosity level, where 0 is quiet and 3 very chatty. Default: 2.

• screen – Plot distributions on screen. Default: False.

• filename – Plot distributions to filename. Default: None, no plotting to file

• bootstrap_seed – Sets the random number seed for bootstrapping (if strict=False). If set, bootstrapping will be reproducible. Default: None, bootstrapping is non-reproducible.

• data_is_uncorrelated – Whether the provided data is uncorrelated. If this option is set, the equilibration, decorrelation and tail pruning of the trajectory is skipped. This can speed up the analysis, but note that if the provided data is correlated, the results of the physical validation checks might be invalid.

Returns

If strict=True: The p value for every tests. If strict=False: Distance of the estimated T(mu) and T(sigma) from the expected temperature, measured in standard deviations of the respective estimate, for every test.

Return type

result

Notes

This function compares the kinetic energy between groups of degrees of freedom. Theoretically, the kinetic energy is expected (via the equipartition theorem) to be equally distributed over all degrees of freedom. In practice, deviations of temperature between groups of degrees of freedom up to several degrees K are routinely observed. Larger deviations can, however, hint to misbehaving simulations, such as, e.g., frozen degrees of freedom, lack of energy exchange between degrees of freedom, and transfer of heat from faster to slower degrees of freedom.

Splitting of degrees of freedom is done both on a sub-molecular and on a molecular level. On a sub-molecular level, the degrees of freedom of a molecule can be partitioned into rigid-body contributions (translation of the center-of-mass, rotation around the center-of-mass) and intra-molecular contributions. On a molecular level, the single molecules of the system can be divided in groups, either by function (solute / solvent, different species of liquid mixtures, …) or randomly.

check_equipartition() partitions the kinetic energy of the entire system and, optionally, of predefined or randomly separated groups. It then computes either the mean and the standard deviation of each partition and compares them to the theoretically expected value (strict=True, the default), or it performs a Kolmogorov-Smirnov test of the distribution. See physical_validation.kinetic_energy.distribution for more detail on the checks.

## physical_validation.ensemble module

This software allows users to perform statistical test to determine if a given molecular simulation is consistent with the thermodynamic ensemble it is performed in.

Users should cite the JCTC paper: Shirts, M. R. “Simple Quantitative Tests to Validate Sampling from Thermodynamic Ensembles”, J. Chem. Theory Comput., 2013, 9 (2), pp 909-926, https://dx.doi.org/10.1021/ct300688p

physical_validation.ensemble.check(data_sim_one: physical_validation.data.simulation_data.SimulationData, data_sim_two: physical_validation.data.simulation_data.SimulationData, total_energy: bool = False, bootstrap_error: bool = False, bootstrap_repetitions: int = 200, bootstrap_seed: Optional[int] = None, screen: bool = False, filename: Optional[str] = None, verbosity: int = 1, data_is_uncorrelated: bool = False) List[float][source]

Check the ensemble. The correct check is inferred from the simulation data given.

Parameters
• data_sim_one – Simulation data object of first simulation

• data_sim_two – Simulation data object of second simulation differing in its state point from the first

• total_energy – Whether to use the total energy for the calculation Default: False, use potential energy only

• bootstrap_error – Calculate the standard error via bootstrap resampling Default: False

• bootstrap_repetitions – Number of bootstrap repetitions drawn Default: 200

• bootstrap_seed – Sets the random number seed for bootstrapping. If set, bootstrapping will be reproducible. Default: None, bootstrapping is non-reproducible.

• screen – Plot distributions on screen. Default: False.

• filename – Plot distributions to filename. Default: None, no plotting to file.

• verbosity – Level of verbosity, from 0 (quiet) to 3 (very verbose). Default: 1

• data_is_uncorrelated – Whether the provided data is uncorrelated. If this option is set, the equilibration, decorrelation and tail pruning of the trajectory is skipped. This can speed up the analysis, but note that if the provided data is correlated, the results of the physical validation checks might be invalid.

Returns

The number of quantiles the computed result is off the analytical one.

Return type

quantiles

physical_validation.ensemble.estimate_interval(data: physical_validation.data.simulation_data.SimulationData, verbosity: int = 1, total_energy: bool = False, data_is_uncorrelated: bool = False) Dict[str, float][source]

In order to perform an ensemble check, two simulations at distinct state point are needed. Choosing two state points too far apart will result in poor or zero overlap between the distributions, leading to very noisy results (due to sample errors in the tails) or a breakdown of the method, respectively. Choosing two state points very close to each others, on the other hand, makes it difficult to distinguish the slope from statistical error in the samples.

This function implements a rule of thumb based on the standard deviations of distributions. It takes a single simulation and suggests appropriate intervals for a second simulation to be used for ensemble checking.

Parameters
• data – The performed simulation.

• verbosity – If 0, no output is printed on screen. If 1, estimated intervals are printed. If larger, additional information during calculation are printed. Default: 1

• total_energy – Use total energy instead of potential energy only. Default: False

• data_is_uncorrelated – Whether the provided data is uncorrelated. If this option is set, the equilibration, decorrelation and tail pruning of the trajectory is skipped. This can speed up the analysis, but note that if the provided data is correlated, the results of the physical validation checks might be invalid. Default: False

Returns

If data was performed under NVT conditions, intervals contains only one entry:

• ’dT’, containing the suggested temperature interval.

If data was performed under NPT conditions, intervals contains three entries:

• ’dT’: Suggested temperature interval at constant pressure

• ’dP’: Suggested pressure interval at constant temperature

• ’dTdP’: Suggested combined temperature and pressure interval

Return type

intervals

## physical_validation.integrator module

The integrator.convergence module is part of the physical_validation package, and consists of checks of the convergence of the MD integrator.

physical_validation.integrator.convergence(simulations: , convergence_test: str = 'max_deviation', verbose: bool = True, screen: bool = False, filename: Optional[str] = None) float[source]

Compares the convergence of the fluctuations of conserved quantities with decreasing simulation time step to theoretical expectations.

Parameters
• simulations – The (otherwise identical) simulations performed using different time steps

• convergence_test – A function defining the convergence test. Currently, only one test is implemented: max_deviation, which is chosen by default

• verbose – If True, print more detailed output. Default: False.

• screen – Plot convergence on screen. Default: False.

• filename – Plot convergence to filename. Default: None, no plotting to file.

Returns

Return type

The largest deviation from the expected ratio of fluctuations.

Notes

For a simplectic integration algorithm, the fluctuations $$\delta E$$ of a constant of motion $$E$$ (such as, for example, the total energy in a NVE simulations) are theoretically expected to scale like the squared timestep of the integration. When comparing two otherwise identical simulations performed at different time step $$\Delta t$$, the following equality is hence expected to hold:

$\frac{\Delta t_1^{2}}{\Delta t_2^{2}} = \frac{\delta E_1}{\delta E_2}$

This function calculates the ratio of the fluctuation for simulations performed at different timesteps and compares it to the analytically expected value. If the deviation is larger than tol, the test is considered failed.