Generalised deep-learning workflow for the prediction of hydration layers over surfaces

Atomic force microscopy (AFM) is paving the way for understanding solid–liquid interfaces at the nanoscale. These AFM studies are complemented with molecular dynamics (MD) simulations of hydration layers over candidate surfaces for a comprehensive characterisation. We earlier proposed, in Ranawat et al. (2021), a deep-learning (DL) network to predict hydration layers over candidate surfaces much more rapidly than computationally intensive MD. However, the proposed elements-as-channels network is bound to the elements present in the training surfaces. Here, we develop a generalised descriptor of the surface to train element-agnostic networks. We demonstrate the descriptor's efficacy by predicting the hydration layers over a dolomite surface using a network trained on the calcite and magnesite surfaces. We also demonstrate the transfer-learning capability of such a descriptor by incorporating mica into the training surfaces, and predict the pyrophyllite and boehmite surfaces. Further, we propose an energy-based DL framework to gauge the possible prediction accuracy of a network on surfaces hitherto unseen. We combine these advanced techniques into a generalised workflow to complement AFM studies. © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The nature of the complex hydration structures formed at solid-liquid interfaces plays a key role in many macroscopic surface phenomena [1,2] that drive various natural and technological processes [3]. Amongst the many complementary techniques to characterise solid-liquid interfaces at the nanoscale, atomic force microscopy (AFM) [4] has emerged as a leading method to image systems in solution at high speed with molecular resolution [5][6][7][8][9][10][11][12][13][14]. However, the convolution of tip-surface-solvent interactions and entropic effects in the measured force at high resolution means that it remains a challenge to provide a comprehensive understanding and interpretation of the results, requiring extensive free energy simulations [15][16][17][18].
This challenge was bypassed, to some degree, by the introduction of methods that linked experimental AFM force data directly to water densities [19,20]. These approaches have recently been expanded to include the influence of the hydration features around the tip [21] and the radius of the tip [17]. Ultimately, even these approaches necessitate systematic molecular dynamics (MD) simulations of water on the surface, which are accessible only to experts and only if good classical forcefield parameterisations are available. As a result, high-resolution AFM imaging in liquids remains limited in application, especially considering the rapid progress in molecular characterisation offered by functional-tip AFM at low temperature and in ultra-high vacuum [22,23]. This is particularly troubling when we consider the breakthroughs seen in the application of AFM to biomolecular systems [24].
In our previous work, we proposed a deep-learning (DL) U-Net network with elements-as-channels input [25]. This network demonstrated a rapid, robust and reliable prediction of hydration layers over surfaces, and defects, of polymorphs of calcite, directly from their atomic structures. However, binding the network inputs to certain elements (calcium, carbon, and oxygen in that case) posed a significant disadvantage when surfaces with other elements were studied. In the present work, we introduce a novel four-channel descriptor to depict the surface structure, based on the MD forcefield terms, namely, their van der Waals and Coulombic interactions (see Fig. 1). This element-agnostic design is shown to be advantageous in predicting various surfaces with a wide range of elemental composition. Further, this design is shown to improve the performance of the network on newer surfaces through transfer learning [26], which passes on the knowledge learned from related surfaces by progressively augmenting the training set.
Notwithstanding the advantages, the proposed DL networks are overconfident in their predictions due to their deterministic nature [28]. Given a surface input, whether seen or unseen by the network during training, the hydration density is predicted with 100% certainty, even when the prediction is erroneous. To address this issue, a secondary model, called an energy-based model, can provide out-of-distribution detection [29]. We adapted this energy-based DL framework to predict an "energy", or score, of the four-channel descriptor of the surface. This score is a robust figure-of-merit to gauge the representation (log-likelihood) of a new surface in the given training set. We thereby demonstrate a framework that estimates the possible accuracy with which a network can predict the hydration layers over the given surface.
However, no earlier studies, beyond ours, have applied DL to scanning probe microscopy (SPM) at solid-liquid interfaces [47]. Furthermore, this work takes an entirely different workflow approach from past studies of DL applied to SPM, owing to the system-agnostic descriptor, the out-of-distribution detection technique, the training data, and the DL model.
In the following sections, we present the complete workflow for the generalised tools to predict hydration layers over surfaces. This workflow comprises three parts. Firstly, a DL network that can rapidly predict hydration layers and screen candidate surfaces to characterise surface structures with hydration layers imaged from AFM, cf. Fig. 1. Secondly, a transfer-learning based training surface augmentation, afforded by the element-agnostic surface descriptor design, to reliably include new surfaces and enhance the network capability. Lastly, an energy-based secondary DL model that benchmarks the possible accuracy of a network, given a set of training surfaces. Cumulatively, these methods aim to provide rapid, robust, and reliable AFM image analysis of hydration layers over varied surfaces.

MD and training data
We simulated $(10\bar{1}4)$ surfaces of calcite, magnesite, and dolomite using the inter-atomic potentials derived by Raiteri et al. [48], an extensively used forcefield for AFM imaging of carbonate minerals in liquids [16,14,49]. We chose the SPC/Fw flexible model for water [50]. For the (001) mica-water interactions, we used the CLAYFF forcefield [51] with the TIP3P [52] water model, as used in mica-solvation studies in the context of AFM imaging [53]. We performed the MD simulations using the LAMMPS MD code [54].
These forcefields are functionally made of bonded (bonds, angles, and dihedrals) and non-bonded interactions. In the forcefields used, the non-bonded interactions, which are primarily responsible for surface-water interactions, comprised pairwise and electrostatic interactions, given by Eq. 1 and Eq. 2, respectively.
Hence, we designed a four-channel surface descriptor that encapsulates these non-bonded interactions. The pairwise interactions were taken from the pairwise interaction potentials between the surface atoms and the water molecule, which were of either Lennard-Jones or Buckingham type, cf. Eq. 1. To better capture the repulsive and the attractive parts of the non-bonded forcefield in a normalised input, as required for effective network training, these fields were split into separate channels. Secondly, the electrostatic interactions were represented by the product of the surface atom's charge and the charge on the oxygen of the water, cf. Eq. 2. This accommodated the different water models used across the training surfaces. The product charges were smeared with a Gaussian atomic density to obtain an equivalent electrostatic forcefield. Similar to the non-bonded forcefield, the electrostatic forcefield was also split into positive and negative parts.
As in our previous approach [25], we created a large dataset of surface defects by removing permutations of the top surface atoms in the calcite, the magnesite, and the mica surfaces (see Supplementary Information for more details). We generated 39,708, 27,504, and 49,872 cases of the calcite, the magnesite, and the mica surfaces, respectively. Each case comprised a four-channel descriptor of the surface over a 10 × 10 × 20 Å³ volume as the input and the hydration layer density over the surface as the target, cf. Fig. 1 (b, e).
Finally, these cases were distributed into training, validation, and test sets. The training set was used to optimise the network parameters. The validation set was used to monitor the progress of training and to avoid over-fitting. Lastly, the test set was used to gauge the unbiased performance of the trained network. In this work, an approximate split of 70%, 20%, and 10% of the dataset was used for training, validation, and testing, respectively. Given the large dataset in this work, a smaller test set was considered to be sufficient [55]. Moreover, the work also includes tests on newer surfaces with reasonable results, which show consistent trends in test errors and give us confidence in the split used.
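The construction of the four-channel descriptor described in this section can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a Lennard-Jones 12-6 form for the pairwise term, ignores periodic images, and uses a simple sign split for the repulsive and attractive parts (Fig. 1 suggests the actual channels are the full non-bonded field and a zoomed-in attractive region). The function name, the grid spacing, the Gaussian width, the short-range cap `r_min`, and all numerical values are hypothetical.

```python
import numpy as np


def four_channel_descriptor(positions, epsilon, sigma, charges, q_water_oxygen,
                            box=(10.0, 10.0, 20.0), spacing=0.5,
                            smear_width=0.5, r_min=1.0):
    """Build a (4, nx, ny, nz) descriptor on a regular grid (lengths in Å).

    All parameter names and default values are illustrative assumptions.
    Periodic images of the surface atoms are ignored for brevity.
    """
    axes = [np.arange(0.0, length, spacing) + 0.5 * spacing for length in box]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (nx, ny, nz, 3)

    pairwise = np.zeros(grid.shape[:-1])
    electrostatic = np.zeros(grid.shape[:-1])
    for pos, eps, sig, q in zip(np.asarray(positions, dtype=float),
                                epsilon, sigma, charges):
        r = np.linalg.norm(grid - pos, axis=-1)
        r_safe = np.maximum(r, r_min)  # cap the divergence at the atom centre
        sr6 = (sig / r_safe) ** 6
        pairwise += 4.0 * eps * (sr6 ** 2 - sr6)  # assumed Lennard-Jones 12-6
        # Product of surface and water-oxygen charges, smeared by a Gaussian.
        electrostatic += q * q_water_oxygen * np.exp(-0.5 * (r / smear_width) ** 2)

    channels = np.stack([
        np.clip(pairwise, 0.0, None),        # repulsive part
        np.clip(-pairwise, 0.0, None),       # attractive part
        np.clip(electrostatic, 0.0, None),   # positive product charges
        np.clip(-electrostatic, 0.0, None),  # negative product charges
    ])
    # Scale each channel to [0, 1]; all-zero channels are left untouched.
    peaks = channels.reshape(4, -1).max(axis=1)
    peaks[peaks == 0.0] = 1.0
    return channels / peaks[:, None, None, None]


# Hypothetical single-cation example; the numbers are placeholders,
# not forcefield parameters from this work.
descriptor = four_channel_descriptor(
    positions=[[5.0, 5.0, 2.0]], epsilon=[0.1], sigma=[3.0],
    charges=[2.0], q_water_oxygen=-0.8)
print(descriptor.shape)  # (4, 20, 20, 40)
```

Splitting by sign before normalising keeps the shallow attractive well visible: if both parts shared one channel, the steep repulsive wall near the atoms would dominate the scale after normalisation.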

Machine learning
For training, we used the DL U-Net architecture, comprising three pooling scales and skip connections [56], as described in our previous work [25], cf. Fig. 1 (d). However, the model was repurposed to take four-channel inputs. Although the network had to be trained from scratch, the element-agnostic approach of the four-channel descriptor led to simpler integration of surfaces with varied surface elements. Newer surfaces were accommodated without the need to re-train a new network with random initial weights. We first trained our network on the $(10\bar{1}4)$ surfaces of calcite. Then we progressively augmented the training data set, one surface at a time, with the $(10\bar{1}4)$ magnesite and (001) mica surfaces, to demonstrate the transfer-learning aspect of the network. Further, we predicted increasingly challenging surfaces that were not represented in the training sets of the networks, to gauge the generalisability of the network with progressive data augmentation.
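The progressive augmentation described above, where each stage starts from the parameters of the previous stage rather than from random weights, can be sketched with a toy example. The linear model below is only a stand-in for the U-Net, and the three data sets are synthetic; the names `train`, `make_set`, and `stages` are hypothetical and the surface labels merely mirror the order used in the text.

```python
import numpy as np


def train(params, inputs, targets, steps=200, learning_rate=0.1):
    """Gradient descent on half the mean-squared error of a linear stand-in model."""
    weights = params.copy()
    for _ in range(steps):
        residual = inputs @ weights - targets
        weights -= learning_rate * inputs.T @ residual / len(targets)
    return weights


rng = np.random.default_rng(0)
true_weights = np.array([1.0, -2.0, 0.5, 3.0])  # synthetic ground truth


def make_set(n_cases):
    """Synthetic (input, target) pairs standing in for one surface family."""
    x = rng.normal(size=(n_cases, 4))
    return x, x @ true_weights


# Synthetic stand-ins, added in the same order as in the text.
stages = {"calcite": make_set(50), "magnesite": make_set(50), "mica": make_set(50)}

params = np.zeros(4)             # only the first stage starts from scratch
train_x = np.empty((0, 4))
train_y = np.empty(0)
training_error = {}
for name, (x, y) in stages.items():
    train_x = np.vstack([train_x, x])            # augment, keeping earlier data
    train_y = np.concatenate([train_y, y])
    params = train(params, train_x, train_y)     # warm start from the last stage
    training_error[name] = float(np.mean(np.abs(train_x @ params - train_y)))
```

The point of the sketch is the loop structure: the training set grows cumulatively and `params` is carried forward, so later stages refine an already useful model instead of relearning it.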
For out-of-distribution detection, an energy-based DL framework [57,29] was designed that follows the encoder part of the U-Net architecture, cf. Fig. 1 (c). The network takes in the four-channel descriptor as input, and predicts a scalar in the form of a score associated with the log-likelihood of the input with respect to the training data. A contrastive divergence scheme [58] was used during training of the network, which maximised the log-likelihood (or the energy score) for in-distribution surfaces while minimising that for out-of-distribution surfaces (see Supplementary Information for more details). In other words, a higher energy score from the network is better, as it indicates that a surface is well represented in the training set. This is a convention for energy networks in machine learning and is opposite to the intuitive understanding of energy from physics.
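The sign convention above can be made explicit with a short derivation. This is the generic maximum-likelihood gradient of an energy-based model written in terms of a score, and not necessarily the exact loss used in this work (which is given in the Supplementary Information). The symbols $s_\theta$ (network score), $Z_\theta$ (normalisation constant), and $\tilde{x}$ (a sample drawn from the model by a short Markov chain started at the data) are introduced here for illustration only.

```latex
\begin{align}
p_\theta(x) &= \frac{\exp\big(s_\theta(x)\big)}{Z_\theta},
\qquad Z_\theta = \int \exp\big(s_\theta(x')\big)\,\mathrm{d}x', \\
\nabla_\theta \log p_\theta(x)
&= \nabla_\theta s_\theta(x)
 - \mathbb{E}_{x' \sim p_\theta}\big[\nabla_\theta s_\theta(x')\big] \\
&\approx \nabla_\theta s_\theta(x) - \nabla_\theta s_\theta(\tilde{x}).
\end{align}
```

Following this gradient raises the score of training surfaces and lowers the score of model samples, which is why a high score indicates a well-represented surface. Because $Z_\theta$ is never evaluated, the absolute scale of the score is left undetermined.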

Results
We used the hydration layer densities over the $(10\bar{1}4)$ dolomite, (010) boehmite, and (010) pyrophyllite surfaces, in addition to the test cases of $(10\bar{1}4)$ calcite, $(10\bar{1}4)$ magnesite, and (001) mica, to benchmark the four-channel descriptor, see Fig. 2. We trained the network on an extensive set of the calcite surfaces and progressively added magnesite and mica surfaces to monitor the training. We compare the simulated and predicted densities by plotting the 1D mean density along the z direction for the dolomite surface. We also plot the 2D mean density, and xy slices of the hydration layers along the z direction. Further, we plot the prediction errors to gauge the transfer-learning ability of the element-agnostic approach, along with the score of the energy-based DL model, with the progressive inclusion of several surfaces in the training set. We use matplotlib [59] to plot the graphs.
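The reductions behind these plots are simple averages over a 3D density grid. The sketch below assumes the density is stored as an array indexed as (x, y, z) and uses a synthetic two-layer density; the function names and the grid size are hypothetical.

```python
import numpy as np


def density_profiles(density):
    """Return the 1D mean density along z and the 2D mean density in the xz plane."""
    density = np.asarray(density, dtype=float)
    profile_z = density.mean(axis=(0, 1))  # average over x and y
    mean_xz = density.mean(axis=1)         # average over y only
    return profile_z, mean_xz


def xy_slice_at_peak(density, profile_z):
    """Return the z index of the highest peak and the xy slice at that height."""
    k = int(np.argmax(profile_z))
    return k, np.asarray(density)[:, :, k]


# Synthetic density with two hydration layers centred at z indices 10 and 18.
z = np.arange(40)
layers = (np.exp(-0.5 * ((z - 10) / 1.5) ** 2)
          + 0.5 * np.exp(-0.5 * ((z - 18) / 1.5) ** 2))
density = np.broadcast_to(layers, (20, 20, 40)).copy()

profile_z, mean_xz = density_profiles(density)
peak_index, first_layer = xy_slice_at_peak(density, profile_z)
```

The resulting arrays can be passed directly to matplotlib, for example `plot` for `profile_z` and `imshow` for `mean_xz` and `first_layer`.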
In the first stage of training, the DL U-Net model was trained on the calcite and magnesite surfaces. The training data did not contain any dolomite surface (which has both calcium and magnesium atoms, Fig. 2). Hence, the network was not trained to understand the interplay of the hydration layers over the magnesium and the calcium atoms. Regardless, similar to the previous network [25], the network predicts the location of the hydration layers well, albeit with slightly offset magnitudes, cf. Fig. 3 (c, d). On comparing the 2D slices in Fig. 3 (a), the hydration density peaks clearly affect the neighbouring peaks. Moreover, the predicted hydration-density-peak features over the calcium and the magnesium layers are similar to those in the simulated case. The network is distinctly able to exhibit the difference in the hydration layers over the calcium and the magnesium atoms [60], and deduces the interplay between the corresponding hydration peaks reasonably well. This justifies the choice of the descriptor, especially given that magnesium and calcium differ only slightly in their pair-coefficient forcefield terms with the oxygen of the water model, and have the same charge in the forcefield parameterised by Raiteri et al. [48].
In the second stage of training, three versions of the model were trained with cascading training sets. The first network was trained on the calcite surfaces, while the second network continued from the parameters of the first network and was trained on the calcite and magnesite surfaces. Similarly, the third network started from the parameters of the second network and was trained on the calcite, magnesite, and mica surfaces. The mean absolute error (MAE, or L1 error) of the three networks in predicting the hydration layers of the dolomite, pyrophyllite, and boehmite surfaces, and the test sets of the calcite, magnesite, and mica surfaces, is plotted in Fig. 5 (a). It is clear that the progressive inclusion of varied surfaces in the training sets results in a decrease in the prediction error. To test the generality of the trained networks, hydration layers were predicted on the surfaces of (010) pyrophyllite and (010) boehmite. The MAE of these predictions is higher, cf. Fig. 5 (a). We attribute this to the fact that the Al-O-H surface structure, present in pyrophyllite and boehmite, is not well represented in the mica-surface training set. Furthermore, the predicted hydration-layer peaks have a slight offset, cf. Fig. 4 (d, h). For the hydration layers over the boehmite surface, the features in the xy slice of the first peak are similar, as seen in Fig. 4 (e), while the DL network predicts faint patterned features in the second and third peaks which are non-existent in the simulated images. In the case of pyrophyllite, Fig. 4 (a), the network highlights some features in the xy slices more than others when compared to the simulated images. Moreover, in the slice corresponding to the second peak, the network fills more water into the void seen in the simulated images, while capturing the light feature in the middle of the void. However, we note that our z-slice comparison is quite thorough, and such differences will not be noticed in AFM images, where the amplitude of the tip oscillation convolves several z-slices [16].
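The MAE (L1 error) used in this comparison is the voxel-wise mean of the absolute difference between the predicted and the simulated densities. A minimal sketch follows, assuming both densities are sampled on the same grid; the function name and the toy arrays are hypothetical.

```python
import numpy as np


def mean_absolute_error(predicted, simulated):
    """Voxel-wise mean absolute (L1) error between two densities on the same grid."""
    predicted = np.asarray(predicted, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    if predicted.shape != simulated.shape:
        raise ValueError("densities must be sampled on the same grid")
    return float(np.mean(np.abs(predicted - simulated)))


# Toy check: a uniform offset of 0.5 gives an MAE of exactly 0.5.
simulated = np.zeros((2, 2, 2))
predicted = simulated + 0.5
mae = mean_absolute_error(predicted, simulated)
```

Because the error is averaged over every voxel, a small z offset in a sharp hydration peak is penalised on both sides of the peak, which is consistent with the remark above that a slice-by-slice comparison is a strict test.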
In the final stage of the training, similar to the last step, three versions of the energy-based DL model were trained with cascading training sets. The energy values, or scores, are plotted for the dolomite, pyrophyllite, and boehmite surfaces, along with the test sets of the calcite, magnesite, and mica surfaces, using the three networks, as seen in Fig. 5 (b). A similar trend is observed: with the progressive inclusion of varied surfaces in the training sets, the predicted energy, or score, of newer surfaces increases. However, due to the contrastive divergence method, the magnitude of the energy values is not normalised. Therefore, an external observable is needed to interpret the energy values. Using the MAE of the DL models, we can infer that energy values above ∼5.1 indicate a reasonable prediction from our DL U-Net model. This implies that the DL prediction can be used for surfaces with an energy score above ∼5.1; otherwise, an MD simulation should be performed, cf. Fig. 1 (c).
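The resulting decision rule can be written as a short sketch. The threshold is the approximate value quoted above; the function name, the surface labels, and the example scores are hypothetical and are not results from this work.

```python
ENERGY_THRESHOLD = 5.1  # approximate value quoted in the text


def choose_method(energy_score, threshold=ENERGY_THRESHOLD):
    """Use the DL prediction for well-represented surfaces, otherwise fall back to MD."""
    return "dl_prediction" if energy_score > threshold else "md_simulation"


# Hypothetical scores for two unnamed candidate surfaces.
scores = {"surface_a": 6.3, "surface_b": 4.2}
plan = {name: choose_method(score) for name, score in scores.items()}
```

Since the scores are not normalised, the threshold is tied to a particular trained energy model and would need to be recalibrated against the MAE whenever the training set changes.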

Conclusion
The DL U-Net architecture is a robust network suitable for machine translation tasks. The U-Net, combined with the novel four-channel forcefield-based descriptor introduced in this work, can predict the hydration layers over surfaces with varied atomic structure. This element-agnostic approach is rapid, robust, and reliable. Additionally, the transfer-learning technique is demonstrated during the training of such a network. With the ability to reuse the training sets and the pre-trained model, the process of augmenting the training surfaces with newer surfaces is made straightforward. The energy-based DL network is shown to identify out-of-distribution surfaces with its score as a figure-of-merit. A possible workflow is identified in which new surfaces are tested with the energy-based DL network, and the DL network is used to predict the hydration layers over the surface if the score is above the threshold; otherwise, an MD simulation is performed. Moreover, this MD simulation of the out-of-distribution surface can be included in the training data to make the DL network more robust. This makes the workflow highly scalable and applicable to other surfaces in general.
Further work can involve using the solvent tip approximation (STA) [19], or its successor extended-STA [21,17], to simulate hydration layers as seen by AFM.A network trained on such target hydration layers can circumvent an additional step of performing such simulations, and get closer to direct AFM characterisation with a comprehensive DL workflow.Ultimately, as new materials are introduced to the training workflow, the DL network should gradually develop into a general tool for the rapid prediction of hydration structures on any interface without the need for direct MD simulations.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Schematic of the workflow for hydration layer prediction over a surface (here, (010) boehmite with aluminium, hydrogen and oxygen atoms represented by grey, green, and red circles, respectively): (a) The atomic structure of the surface. (b) The four-channel density descriptor of the surface, comprising non-bonded interactions (NB), zoomed-in attractive region of the non-bonded interactions (NB attractive), the positive and the negative parts of the electrostatic interactions (Pos and Neg, respectively). (c) The energy network. (d) The deep U-Net neural network, adapted from [25] (CC-BY). (e) Simulated hydration layers over the surface. (f) Predicted hydration layers over the surface using the DL network.

Fig. 3. Hydration layer prediction over the $(10\bar{1}4)$ dolomite surface using the U-Net trained on the $(10\bar{1}4)$ magnesite and calcite surfaces. (a) 2D slice comparison of the hydration layers in the simulated and predicted hydration layer density. (b) 2D xz plane mean hydration layer density. (c) 1D z direction hydration layer density. (d) 1D hydration layers, along the z axis, averaged over the calcium and the magnesium surface atoms. The blue, orange, red, and brown balls represent calcium, magnesium, oxygen, and carbon atoms, respectively.

Fig. 5. (a) Mean absolute error in the prediction of hydration layers with networks trained on different training surfaces. (b) Energy values of surfaces for energy networks trained on different training surfaces. The "Ca", "Mg", and "Mica" labels represent the training sets of $(10\bar{1}4)$ calcite, $(10\bar{1}4)$ magnesite, and (001) mica surfaces, respectively. The predictions are shown on the calcite test set, the magnesite test set, the mica test set, and the surfaces of dolomite, pyrophyllite, and boehmite.