# Development of computational design for reliable prediction of dielectric strengths of perfluorocarbon compounds

The computational protocol for predicting the gas-phase dielectric strengths of organic compounds was developed in two stages (Fig. 1). In the first stage, two fundamental variables, polarizability and ionization energy, were computed for diverse sets of organic compounds (Figs. S1 and S2 in the Supporting Information, respectively) using DFT with four distinct functionals. The computed values were compared with the experimental values to identify the most reliable DFT-based protocol. In the second stage, the two core variables computed with the optimized protocol were correlated with dielectric strength through an appropriate equation. The primary goal of this stage was to optimize the correlation by comparing the equation-predicted and experimental dielectric strengths for a given set of organic compounds (Fig. S3 in the Supporting Information). To this end, a series of candidate equations (Eqs. 1, 2, and 3) was examined.

### Stage 1: Validation of computational protocol for polarizability and ionization energy

#### Polarizability

The polarizabilities of 54 organic compounds computed with four DFT functionals, viz. B3LYP, PBE1PBE, M062X, and M11, are shown in Fig. 2. Irrespective of the functional, the computed values track the experimental values along trend lines *y* = (~1.22–1.44)*x* + (~0.92–0.96), which are close to *y* = *x*, with least-squares values of 0.952–0.953. The least-squares values with respect to *y* = *x* additionally indicate how close the DFT-calculated values are to the experimental ones. The root-mean-square deviation (RMSD) values with respect to *y* = *x* imply that all four functionals predict the polarizabilities of the organic compounds reliably, with acceptable errors of ~7.46–8.98; B3LYP gives the lowest error (Table 1).
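The two agreement measures discussed above, a fitted trend line and the RMSD with respect to *y* = *x*, can be sketched as follows. This is a minimal illustration and not the authors' script: the function name `agreement_metrics` and the sample numbers are hypothetical placeholders, the reported least-squares value is assumed here to be a coefficient of determination, and the RMSD is returned in the units of the input, which may differ from the normalization used in Table 1.

```python
import numpy as np

def agreement_metrics(experimental, computed):
    """Return (slope, intercept, r_squared, rmsd) for computed vs. experimental values.

    slope, intercept: least-squares trend line computed = slope * experimental + intercept
    r_squared:        coefficient of determination of that trend line
    rmsd:             root-mean-square deviation from the ideal line y = x
    """
    x = np.asarray(experimental, dtype=float)
    y = np.asarray(computed, dtype=float)

    # First-degree polynomial fit; polyfit returns the highest power first.
    slope, intercept = np.polyfit(x, y, 1)

    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot

    # Deviation from y = x, i.e. from perfect agreement with experiment.
    rmsd = np.sqrt(np.mean((y - x) ** 2))
    return float(slope), float(intercept), float(r_squared), float(rmsd)

# Hypothetical placeholder values, not data from this study.
exp_values = [2.0, 4.0, 6.0, 8.0]
dft_values = [2.1, 3.9, 6.2, 7.8]
slope, intercept, r_squared, rmsd = agreement_metrics(exp_values, dft_values)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.3f}, RMSD = {rmsd:.3f}")
```

Keeping the trend-line fit and the RMSD separate matters because a trend line can have a high coefficient of determination while still lying far from *y* = *x*; only the RMSD measures the distance from perfect agreement directly.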

#### Ionization energy

The same approach was used to compute the ionization energies of 48 organic compounds with the four DFT functionals employed for the polarizability validation. The dataset used to validate the ionization energy is not necessarily identical to that used for the polarizability, primarily because of differences in the availability of experimental data. Regardless of the functional, the computed ionization energies agree well with the experimental values, exhibiting trend lines close to *y* = *x* (Fig. 3). The B3LYP-computed values in particular give the trend line closest to *y* = *x*, with a slope of almost unity and a y-intercept approaching zero. This observation is further supported by the low RMSD values of 1.30–2.52 (Table 1).

#### Error distributions of polarizability and ionization energy

The core variables were further examined through the distributions of the organic compounds over the errors of the DFT-computed values relative to the experimental ones (Fig. 4). For polarizability, the averaged relative errors (with the fractions of organic compounds having relative errors below 10% in parentheses) are 6.47% (87.03%), 7.71% (75.93%), 8.91% (72.22%), and 8.31% (75.93%) at the B3LYP, PBE1PBE, M062X, and M11 levels of theory, respectively. The B3LYP-based protocol also places more compounds (26 of 54) below a relative error of 5% than any other functional. In comparison, the accuracy of the DFT-calculated ionization energy is largely insensitive to the functional, with relative errors typically below 5% for the majority of organic compounds and averaged relative errors of 1.29–2.67%. These findings imply that B3LYP is the optimal functional for accurately predicting both polarizability and ionization energy. All subsequent analyses are therefore based on B3LYP computations.
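The error statistics quoted above (an averaged relative error and the fraction of compounds below a threshold) can be reproduced with a few lines of code. The sketch below is illustrative only: `relative_error_summary` is a hypothetical helper name and the input values are placeholders, not data from this work.

```python
import numpy as np

def relative_error_summary(experimental, computed, threshold=10.0):
    """Return (mean relative error in %, % of compounds with error below threshold)."""
    x = np.asarray(experimental, dtype=float)
    y = np.asarray(computed, dtype=float)

    # Unsigned relative error of each computed value, in percent.
    rel_err = 100.0 * np.abs(y - x) / np.abs(x)

    mean_err = rel_err.mean()
    # The mean of a boolean mask is the fraction of True entries.
    fraction_below = 100.0 * np.mean(rel_err < threshold)
    return float(mean_err), float(fraction_below)

# Hypothetical placeholder values: relative errors of 5%, 5%, 15%, and 0%.
exp_values = [10.0, 20.0, 40.0, 50.0]
dft_values = [10.5, 19.0, 46.0, 50.0]
mean_err, below_10 = relative_error_summary(exp_values, dft_values)
print(f"mean relative error = {mean_err:.2f}%, below 10%: {below_10:.2f}%")
```

As a consistency check on the percentages in the text, a fraction of 75.93% of a 54-compound set corresponds to 41 compounds (41/54 ≈ 0.7593).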

### Stage 2: Development of novel computational protocols for dielectric strength

The DFT-predicted variables, viz. polarizability and ionization energy, were then combined through an equation that correlates them with dielectric strength to predict the dielectric strengths of organic compounds. Two classes of equation were used: (i) a referenced equation and (ii) parameterized equations. The referenced equation was adopted from a previous study^{13}, in which the correlation was established for a database of 75 organic compounds with experimental dielectric strengths of 0.445–1.959 relative to SF₆. The parameterized equations were developed by extending or revising the referenced equation to better describe the correlation between dielectric strength and the core variables. A new dataset of 137 organic compounds was introduced to analyze their dielectric strengths from the B3LYP-computed polarizabilities and ionization energies. As with the ionization energy dataset, differences in the availability of experimental data mean that the dataset used to validate the dielectric strength is not necessarily identical to those used to validate the polarizability and ionization energy.

#### Referenced equation (Eq. 1)

Zhang et al. investigated the relationship between dielectric strength, polarizability, and ionization energy for a given set of organic compounds^{13}. This correlation was applied to our dataset to test its applicability to our organic compounds (Fig. 5). Note that the dielectric strength of an organic compound is generally reported relative to that of the representative insulating gas, SF₆. The computed values frequently underestimate the dielectric strengths of the 137 organic compounds in our dataset (Fig. 5a). They follow a trend line of *y* = 1.553*x* − 0.037, corresponding to an average underestimation of approximately 30–40% relative to the experimental values (Fig. 5b). The RMSD of the dielectric strengths predicted for the 137 organic compounds is 9.01, indicating the limited predictive ability of the referenced equation (Table 1).

The structures of the organic compounds with the lowest and highest errors in the computed dielectric strength were further examined to identify the physical basis for the underestimation (Fig. 5c). Although no clear structural difference separates the two groups, organic compounds with simpler structures tend to exhibit lower errors. This may be because the organic compounds used in the previous study to develop the referenced equation have relatively simple structures^{13}.

#### Reparameterized equation (Eq. 2)

As a first approach to improving the referenced equation, its coefficients (*x*, *y*, and *z* in Eq. 2) were reparameterized. Specifically, the coefficients were fitted independently to four fitting datasets of randomly selected organic compounds (30, 60, 90, and 137 compounds) (Fig. S4). As expected, the parameterized equations (Eq. 2 in conjunction with Table 2) predict the dielectric strength more accurately, with trend lines *y* = (~0.924–1.241)*x* + (~0.014–0.171) that are close to *y* = *x* (Fig. 6). The equation parameterized on the largest fitting dataset has the best prediction ability: the averaged relative error is lowest (15.49%) for the fitting dataset of 137 organic compounds and highest (20.43%) for that of 30 organic compounds (Fig. 7). The RMSD values of the dielectric strengths predicted for the 137 organic compounds are 6.09, 5.95, 7.28, and 4.13 for the fitting datasets containing 30, 60, 90, and 137 organic compounds, respectively (Table 1). The RMSD thus does not decrease monotonically with the size of the fitting dataset, but it is clearly lowest for the full set. These analyses indicate that the full set of 137 organic compounds is the most suitable dataset for parameterizing the equation coefficients to develop a robust protocol for dielectric strength prediction (Fig. 7).
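The procedure described above (fit the coefficients on a random subset, then score the prediction on all 137 compounds) can be sketched as follows. Because the functional form of Eq. 2 is not reproduced in this section, the code assumes a purely hypothetical power law, strength ≈ *x* · α^*y* · IE^*z*, and uses synthetic data; neither the form, the coefficient values, nor the resulting RMSD values should be read as those of this study.

```python
import numpy as np

def fit_power_law(alpha, ie, strength):
    """Fit strength ~ x * alpha**y * ie**z by linear least squares in log space.

    NOTE: this functional form is an assumption made for illustration only.
    """
    alpha = np.asarray(alpha, dtype=float)
    ie = np.asarray(ie, dtype=float)
    strength = np.asarray(strength, dtype=float)
    # log(strength) = log(x) + y * log(alpha) + z * log(ie)
    design = np.column_stack([np.ones(alpha.size), np.log(alpha), np.log(ie)])
    coeffs, *_ = np.linalg.lstsq(design, np.log(strength), rcond=None)
    return float(np.exp(coeffs[0])), float(coeffs[1]), float(coeffs[2])

def predict(alpha, ie, x, y, z):
    """Evaluate the assumed power law for given coefficients."""
    return x * np.asarray(alpha) ** y * np.asarray(ie) ** z

# Synthetic stand-in for the 137-compound dataset (not real data).
rng = np.random.default_rng(0)
n_total = 137
alpha = rng.uniform(2.0, 20.0, n_total)   # placeholder polarizabilities
ie = rng.uniform(9.0, 16.0, n_total)      # placeholder ionization energies
true_strength = 0.05 * alpha ** 0.8 * ie ** 0.6
strength = true_strength * np.exp(rng.normal(0.0, 0.1, n_total))  # add noise

# Fit on random subsets of increasing size, always evaluate on the full set.
results = {}
for n_fit in (30, 60, 90, 137):
    idx = rng.choice(n_total, size=n_fit, replace=False)
    x, y, z = fit_power_law(alpha[idx], ie[idx], strength[idx])
    predicted = predict(alpha, ie, x, y, z)
    results[n_fit] = float(np.sqrt(np.mean((predicted - strength) ** 2)))

for n_fit, rmsd in results.items():
    print(f"fitting set of {n_fit:3d}: RMSD on all {n_total} = {rmsd:.4f}")
```

Evaluating every fit on the same full set, rather than on its own fitting subset, is what makes the RMSD values for different subset sizes comparable. With a single random draw per size the trend need not be monotonic, which is one reason to repeat the random selection several times in practice.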

#### New equation (Eq. 3)

A molecule composed of heavier atoms generally has a higher polarizability, because its outer electrons lie farther from the nucleus and are more loosely bound, and are therefore polarized more easily. Likewise, the ionization energy is often reported as the energy required to ionize one mole of atoms or molecules, highlighting the close relationship between ionization energy and molecular weight. This implies that the dielectric strength, which is governed by the two key factors polarizability and ionization energy, is expected to be significantly affected by the molecular weight. Therefore, molecular weight was introduced as a new variable to further improve the ability of the above-discussed equation to predict the dielectric strengths of the 137 organic compounds. As before, the coefficients were fitted independently to four fitting datasets of randomly selected organic compounds (30, 60, 90, and 137 compounds) (Fig. S5). As expected, all four equations (Eq. 3 in conjunction with Table 3) outperform the referenced equation in prediction ability, with trend lines *y* = (~0.897–1.153)*x* + (~0.018–0.236) (Fig. 8). The averaged relative errors are 17.07, 14.87, 14.77, and 14.69%, with 40, 49, 47, and 57 organic compounds having relative errors below 10%, for the fitting datasets containing 30, 60, 90, and 137 organic compounds, respectively (Fig. 9). The corresponding RMSD values are 5.01, 4.15, 4.08, and 3.98, so the RMSD differs negligibly among the latter three fitting datasets (Table 1).
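The effect of adding molecular weight as a third variable can be illustrated by fitting two nested models to the same data and comparing their residuals. As Eq. 3 is not written out in this section, the sketch below assumes a hypothetical multiplicative power-law form and synthetic data in which molecular weight genuinely matters; the helper names and all numbers are placeholders rather than results of this study.

```python
import numpy as np

def log_design(features):
    """Design matrix [1, log(f1), log(f2), ...] for a multiplicative power law."""
    logs = [np.log(np.asarray(f, dtype=float)) for f in features]
    return np.column_stack([np.ones(logs[0].size)] + logs)

def fit_log_linear(features, target):
    """Least-squares coefficients of log(target) on the log-transformed features."""
    log_target = np.log(np.asarray(target, dtype=float))
    coeffs, *_ = np.linalg.lstsq(log_design(features), log_target, rcond=None)
    return coeffs

def log_rmsd(features, target, coeffs):
    """RMSD between fitted and actual log(target)."""
    log_target = np.log(np.asarray(target, dtype=float))
    return float(np.sqrt(np.mean((log_design(features) @ coeffs - log_target) ** 2)))

# Synthetic stand-in data (assumed form and values, not from the paper).
rng = np.random.default_rng(1)
n = 137
alpha = rng.uniform(2.0, 20.0, n)    # placeholder polarizabilities
ie = rng.uniform(9.0, 16.0, n)       # placeholder ionization energies
mw = rng.uniform(30.0, 400.0, n)     # placeholder molecular weights
exact = 0.02 * alpha ** 0.5 * ie ** 0.6 * mw ** 0.3
strength = exact * np.exp(rng.normal(0.0, 0.05, n))

# Two-variable model (polarizability, ionization energy) vs. three-variable model.
coeffs_two = fit_log_linear([alpha, ie], strength)
coeffs_three = fit_log_linear([alpha, ie, mw], strength)
rmsd_two = log_rmsd([alpha, ie], strength, coeffs_two)
rmsd_three = log_rmsd([alpha, ie, mw], strength, coeffs_three)
print(f"log-space RMSD without MW: {rmsd_two:.4f}, with MW: {rmsd_three:.4f}")
```

Because the two-variable model is nested in the three-variable one, the fitted residual on the same data can only stay equal or decrease when the extra term is added. The more informative test, as in the text, is whether the three-variable model fitted on a small subset still predicts the compounds outside that subset well.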

Taken together, these results mark an important advance in the development of a computational protocol for the reliable and accurate prediction of dielectric strengths. Introducing the molecular weight variable significantly improves the prediction ability: a fitting dataset of 60 organic compounds is sufficient for reliable parameterization of the coefficients, giving an RMSD of 4.15 that is comparable to the 4.13 obtained with the parameterized Eq. (2) and the full fitting dataset of 137 compounds, and even 30 compounds give an RMSD of 5.01. Incorporating molecular weight in the equation therefore reduces the size of the fitting dataset required to accurately predict the dielectric strengths of all 137 organic compounds. This suggests that the parameterized Eq. (3) retains its prediction ability not only for the 137 organic compounds but also for extended datasets containing a larger number of organic compounds. In contrast, the parameterized Eq. (2) requires all 137 compounds in the fitting dataset for reliable prediction of dielectric strengths. Consequently, it is not guaranteed to predict accurately the dielectric strengths of datasets larger than the current one.

### Structure–property relationship

The intrinsic properties discussed above, namely polarizability, ionization energy, and dielectric strength, can be further correlated with structural properties (Figs. 10 and 11). As the figures show, whereas the ionization energy is insensitive to the structural properties, the polarizability increases linearly with the backbone length (the number of carbon atoms) and the molecular weight. These contrasting behaviors of the two core variables lead to linear correlations of the dielectric strength with the backbone length and molecular weight, which hold qualitatively for both the experimental and the predicted values. This suggests that the dielectric strength of an organic compound relative to SF₆ depends mainly on the difference in polarizability between the organic compound and SF₆, emphasizing the critical role of polarizability in determining the order of the dielectric strength.