At its most fundamental level, machine learning consists of a set of statistical methods that help make sense of complex, high-dimensional data. These methods can be divided into two main categories: supervised and unsupervised learning.
In many applications we want to find how the properties of a material correlate with its descriptors. When the relationship between the two is complex, simple fitting schemes can no longer be applied and we turn to machine learning. In its supervised flavor, we train a model that approximates the descriptor-property relation from what is already known. The model's internal parameters are adjusted to reproduce the properties of known materials as closely as possible while keeping the model's complexity low, which helps it generalize to materials it has not seen. Afterwards, we can estimate the properties of new materials by feeding their descriptors to the model; the prediction is much faster and cheaper than measuring the properties experimentally or computing them with accurate quantum chemistry methods.
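As an illustration of the supervised workflow described above, here is a minimal sketch of kernel ridge regression in plain Python. It is an assumed example, not the model used in the project: the Gaussian kernel, the function names (`fit_krr`, `predict_krr`), the parameter values and the toy data (a sine curve standing in for a real descriptor-property relation) are all hypothetical. The `regularization` term is what keeps the model's complexity low while the weights are fitted to the known data.

```python
import math


def gaussian_kernel(a, b, length_scale):
    """Similarity between two descriptor vectors (1 if identical, towards 0 if far apart)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x


def fit_krr(descriptors, properties, regularization=1e-3, length_scale=0.5):
    """Training: find weights that reproduce the known properties.

    The regularization added to the diagonal penalizes large weights,
    i.e. it keeps the model simple.
    """
    n = len(descriptors)
    kernel_matrix = [
        [
            gaussian_kernel(descriptors[i], descriptors[j], length_scale)
            + (regularization if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return solve_linear_system(kernel_matrix, properties)


def predict_krr(new_descriptor, descriptors, weights, length_scale=0.5):
    """Prediction: weighted sum of similarities to the training systems."""
    return sum(
        w * gaussian_kernel(new_descriptor, d, length_scale)
        for w, d in zip(weights, descriptors)
    )


# Toy data (hypothetical): one-dimensional descriptors and a made-up property.
train_descriptors = [[0.0], [0.5], [1.0], [1.5], [2.0]]
train_properties = [math.sin(d[0]) for d in train_descriptors]

weights = fit_krr(train_descriptors, train_properties)
prediction = predict_krr([0.75], train_descriptors, weights)
print(f"predicted: {prediction:.3f}, reference: {math.sin(0.75):.3f}")
```

Once the weights are known, each prediction costs only one kernel evaluation per training system, which is why it is so much cheaper than a new measurement or calculation.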
Machine-learning methods can be used to avoid large portions of the immense materials space and to perform searches and optimisations much faster than conventional methods. However, there are a few challenging points to address. The descriptor has to be carefully defined so that it contains the physics we hope to machine-learn with it. Ideally it is invariant with respect to translations, rotations and atomic permutations. It also has to be complete and unique, so that each descriptor corresponds to only one system. Smoothness is also important, i.e. the descriptors of two similar systems should not be too different. Nanolayers is actively developing such descriptors together with novel machine-learning methods tailored for catalyst systems. The other main issue is data: more complex ML models contain more internal parameters, which demand more data to train. Large numbers of quantum chemistry calculations are therefore necessary to build a suitable database of catalysts, and these are currently being performed at TUT and Aalto.
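The invariance requirements on the descriptor can be made concrete with a small sketch. The example below uses the sorted list of interatomic distances as a deliberately simple, hypothetical descriptor; it is not one of the descriptors developed in the project. The atomic coordinates and the helper names (`sorted_distance_descriptor`, `rotate_about_z`, `translate`) are assumptions made for illustration. This descriptor is invariant to translations, rotations and atom reordering, but it ignores the chemical species and is not guaranteed to be complete or unique, which is exactly why descriptor design is hard.

```python
import math
from itertools import combinations


def sorted_distance_descriptor(positions):
    """Sorted list of all pairwise interatomic distances.

    Distances do not change under translation or rotation, and sorting
    removes any dependence on the order in which atoms are listed.
    """
    return sorted(math.dist(a, b) for a, b in combinations(positions, 2))


def rotate_about_z(positions, angle):
    """Rotate all atoms by `angle` (radians) around the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


def translate(positions, shift):
    """Shift all atoms by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in positions]


# Hypothetical three-atom system (coordinates chosen for illustration only).
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Same system after a rotation, a translation and a reordering of the atoms.
moved = translate(rotate_about_z(atoms, 0.7), (1.5, -2.0, 0.3))
moved = [moved[2], moved[0], moved[1]]

reference = sorted_distance_descriptor(atoms)
transformed = sorted_distance_descriptor(moved)
is_invariant = all(
    math.isclose(r, t, abs_tol=1e-9) for r, t in zip(reference, transformed)
)
print("descriptor unchanged:", is_invariant)
```

Feeding raw Cartesian coordinates to a model instead would force it to learn these symmetries from data, which increases the amount of training data required.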