Machine learning for science and scientific data management

Session 1A: 10:00 AM – 12:10 PM Pacific Time on Friday, August 21.

Stephen Whitelam*, Mary Scott, Edward Barnard

This symposium will survey recent applications of machine learning and artificial intelligence relevant to the operation of a nanoscience user facility, now and in the future. We will highlight exciting work on a diverse set of topics, ranging from robotics to public health to data search and storage, with a view to stimulating ideas and discussion about possible future initiatives at the Molecular Foundry.

Symposium Schedule:

10:00 am

Introduction

10:05 am

Invited: Controlling the thermostat with reinforcement learning

Dr. Isaac Tamblyn, National Research Council of Canada

10:32 am

ScienceSearch: Enabling Search through Automatic Metadata Generation for NCEM Data

Dr. Lavanya Ramakrishnan, Computational Research Division, Berkeley Lab

10:42 am

Accelerating the Development of Tight Binding Model Hamiltonians Using Machine Learning

Prof. Wissam Saidi, Mechanical Engineering & Materials Science, University of Pittsburgh

10:52 am

Invited: Contextualized Data for Materials Informatics

Dr. Julia Ling, Citrine Informatics

11:19 am

Elucidating reaction networks in Cs-Pb-Br nanocrystals with high-throughput synthesis and transformation reactions

Jakob Dahl, Chemistry, UC Berkeley // Materials Sciences Division and Molecular Foundry, Berkeley Lab

11:29 am

Robot-Accelerated Perovskite Investigation and Discovery

Dr. Zhi Li, Molecular Foundry, Berkeley Lab

11:39 am

Invited: Compressed sensing in optical microscopy

Prof. Andrea Bassi, Physics, Politecnico di Milano

12:06 pm

Questions & Conclusion

Symposium Abstracts:

10:05 AM

Controlling the thermostat with reinforcement learning

Dr. Isaac Tamblyn
National Research Council of Canada

Reinforcement learning (RL) agents can automatically develop complex control policies through interactive exploration and experimentation. Many RL algorithms exist. Here I will present two case studies of RL in the physical sciences, both of which are forms of non-equilibrium optimization. In the first example, an agent learned a simulated experimental control scheme for the guided self-assembly of a model molecular system. The second example demonstrates an agent guiding the annealing process of a collection of interacting spins. In both cases, RL algorithms were provided with a simple definition of success (e.g. finding particular structural motifs or low-energy structures) and sought to find the sequential set of decisions needed to achieve their objective. The success of RL across this seemingly disparate set of application domains suggests its potential as a powerful and flexible tool for exploratory science.
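As a rough illustration of the second case study, the sketch below trains a tabular Q-learning agent to choose a temperature at each step of a short annealing run on a small ring of interacting spins. It is not the speaker's method: the system size, the temperature set TEMPS, the random couplings J, the learning parameters, and the choice to let the agent observe only the step index are all assumptions made to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SPINS = 12                         # hypothetical system size
N_STEPS = 6                          # decisions per annealing episode
TEMPS = np.array([0.2, 1.0, 3.0])    # hypothetical action set (temperatures)
J = rng.choice([-1.0, 1.0], size=N_SPINS)  # J[i] couples spins i and i+1 on a ring

def energy(s):
    """Energy of the ring: E = -sum_i J[i] * s[i] * s[i+1]."""
    return -np.sum(J * s * np.roll(s, -1))

def metropolis_sweep(s, T):
    """Attempt N_SPINS single-spin flips at temperature T."""
    for _ in range(N_SPINS):
        i = rng.integers(N_SPINS)
        # Energy change from flipping spin i (only its two bonds change).
        dE = 2.0 * s[i] * (J[i] * s[(i + 1) % N_SPINS] + J[i - 1] * s[i - 1])
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
    return s

# Tabular Q-learning. The "state" is just the step index, a drastic
# simplification: the agent never observes the spins themselves.
Q = np.zeros((N_STEPS, len(TEMPS)))
alpha, eps = 0.1, 0.2                # learning rate, exploration probability
for episode in range(500):
    s = rng.choice([-1.0, 1.0], size=N_SPINS)
    for t in range(N_STEPS):
        if rng.random() < eps:
            a = int(rng.integers(len(TEMPS)))   # explore
        else:
            a = int(np.argmax(Q[t]))            # exploit
        s = metropolis_sweep(s, TEMPS[a])
        last = t == N_STEPS - 1
        # Simple definition of success: low final energy, rewarded only at the end.
        target = -energy(s) if last else np.max(Q[t + 1])
        Q[t, a] += alpha * (target - Q[t, a])

# Temperature the trained agent would pick at each step.
greedy_schedule = TEMPS[np.argmax(Q, axis=1)]
```

The reward is given only at the end of an episode, mirroring the "simple definition of success" described in the abstract; a more realistic agent would condition its decisions on observations of the system.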

10:32 AM

ScienceSearch: Enabling Search through Automatic Metadata Generation for NCEM Data

Dr. Lavanya Ramakrishnan
Computational Research Division, Berkeley Lab

Scientific facilities are increasingly generating and handling large amounts of data. Search capabilities are critical to enable scientists to discover datasets of interest. However, scientific datasets often lack the signals or metadata required for effective searches. In this talk, we will present ScienceSearch, a system infrastructure that uses machine learning techniques to capture and learn the knowledge, context, and surrounding artifacts from data to generate metadata that enable search. Our current implementation is focused on the TEAM I microscope dataset from the National Center for Electron Microscopy (NCEM). I will outline work to date and future directions for the project.

10:42 AM

Accelerating the Development of Tight Binding Model Hamiltonians Using Machine Learning

Prof. Wissam Saidi
Mechanical Engineering & Materials Science, University of Pittsburgh
Coauthors: David Abramovitch¹, Liang Tan¹
¹Molecular Foundry

The development of statistical tools based on machine learning and deep networks is actively sought for materials design problems. While tight binding model Hamiltonians based on quantum mechanical methods enable orders-of-magnitude speedup of first-principles simulations, the development of these Hamiltonian models is a laborious and user-intensive process. Further, traditional fitting approaches rely on fitting the parameters to privileged functional forms to describe physical interactions in the system. Herein we use convolutional neural networks to develop a predictive model for the parameters of tight binding Hamiltonian models based on a maximally localized Wannier function representation. We apply our approach to metal halide perovskites and show that the model describes well the electron-phonon coupling and the electronic properties of the system at finite temperature.

10:52 AM

Contextualized Data for Materials Informatics

Dr. Julia Ling
Citrine Informatics

The utility of machine learning and other data-driven approaches depends on the underlying data quality. Materials and chemicals data are particularly challenging to represent because of the complexity of materials processing, the heterogeneity of data, the primacy of uncertainty estimates, and the difference between intention and reality in any given materials run. This talk will explore a data model, GEMD, which has been developed to capture the complexity of materials data for use in data-driven analysis.

11:19 AM

Elucidating reaction networks in Cs-Pb-Br nanocrystals with high-throughput synthesis and transformation reactions

Jakob Dahl
Chemistry, UC Berkeley // Materials Sciences Division and Molecular Foundry, Berkeley Lab

Advances in automation and data analytics can aid exploration of the complex chemistry of nanoparticles. Lead halide perovskite colloidal nanocrystals provide an interesting proving ground: there are reports of many different phases and transformations, which has made it hard to form a coherent conceptual framework for their controlled formation through traditional methods. In this work, we systematically explore the portion of Cs-Pb-Br synthesis space in which many optically distinguishable species are formed using high-throughput robotic synthesis to understand their formation reactions. We deploy an automated method that allows us to determine the relative amount of absorbance that can be attributed to each species in order to create maps of the synthetic space. Based on these maps, we test potential transformation routes between perovskite nanocrystals of different shapes and phases. We find that shape is determined kinetically, but many reactions between different phases show equilibrium behavior. We demonstrate a dynamic equilibrium between complexes, monolayers and nanocrystals of lead bromide, with substantial impact on the reaction outcomes. This allows us to construct a chemical reaction network that qualitatively explains our results as well as previous reports and can serve as a guide for those seeking to prepare a particular composition and shape.
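One common way to attribute a measured absorbance spectrum to individual species is to fit it as a non-negative combination of reference spectra. The sketch below shows that idea on synthetic data; it is not necessarily the automated method used in this work. The Gaussian band positions and widths are made-up placeholders rather than real Cs-Pb-Br spectra, and the simple projected-gradient solver stands in for any non-negative least-squares routine.

```python
import numpy as np

wavelengths = np.linspace(300.0, 550.0, 251)   # nm, hypothetical measurement grid

def gaussian_band(center, width=15.0):
    """Placeholder reference spectrum: one Gaussian absorbance band."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# One column per hypothetical species (band centers are illustrative only).
refs = np.column_stack([gaussian_band(c) for c in (315.0, 400.0, 430.0, 510.0)])

# Synthetic "measured" spectrum: a known mixture plus a little noise.
true_weights = np.array([0.5, 0.0, 1.5, 1.0])
rng = np.random.default_rng(1)
measured = refs @ true_weights + rng.normal(scale=0.01, size=wavelengths.size)

def nnls_projected_gradient(A, b, n_iter=500):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = np.maximum(x - step * (A.T @ (A @ x - b)), 0.0)
    return x

weights = nnls_projected_gradient(refs, measured)
fractions = weights / weights.sum()            # relative share attributed to each species
```

Repeating such a fit for every well in a high-throughput plate would give, per reaction condition, the relative contribution of each species, which is the kind of quantity a map of synthetic space can be built from.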

11:29 AM

Robot-Accelerated Perovskite Investigation and Discovery

Dr. Zhi Li
Molecular Foundry, Berkeley Lab
Coauthors: Mansoor Ani Najeeb¹, Emory Chan², Alexander Norquist¹, Joshua Schrier³
¹Haverford College; ²Molecular Foundry; ³Fordham University

Metal halide perovskites are a promising class of materials for next-generation photovoltaic and optoelectronic devices. The discovery and full characterization of new perovskite-derived materials are limited by the difficulty of growing the high-quality crystals needed for single-crystal X-ray diffraction studies. We present an automated, high-throughput approach for metal halide perovskite single-crystal discovery based on inverse temperature crystallization (ITC) as a means to rapidly identify and optimize synthesis conditions for the formation of high-quality single crystals. Using this automated approach, a total of 8172 metal halide perovskite synthesis reactions were conducted using 45 organic ammonium cations. This robotic screening increased the number of metal halide perovskite materials accessible by an ITC synthesis route more than fivefold and resulted in the formation of two new phases, [C₂H₇N₂][PbI₃] and [C₇H₁₆N]₂[PbI₄]. This comprehensive dataset allows for a statistical quantification of the total experimental space and of the likelihood of large single-crystal formation. Moreover, this dataset enables the construction and evaluation of machine learning models for predicting crystal formation conditions. This work is a proof of concept that combining high-throughput experimentation and machine learning accelerates and enhances the study of metal halide perovskite crystallization. This approach is designed to be generalizable to different synthetic routes for the acceleration of materials discovery.

11:39 AM

Compressed sensing in optical microscopy

Prof. Andrea Bassi
Physics, Politecnico di Milano
Coauthors: Andrea Farina, Cosimo D’Andrea, Gianmaria Calisesi

Compressed sensing (CS) is an approach for solving ill-posed inverse problems from data that are undersampled with respect to the Nyquist criterion. CS exploits sparsity constraints based on prior knowledge of the structure of the object in the spatial or other domains. CS is commonly used in image and video compression as well as in a number of imaging applications, including computed tomography and magnetic resonance imaging. In the field of optical microscopy, it has proven valuable for single-molecule localization, super-resolution, and light-sheet microscopy, as well as for conventional wide-field imaging. The presentation will illustrate the working principles of CS, reviewing its applications in optical microscopy and focusing on some possible uses of CS in microscopy for materials science.
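The working principle can be sketched in a few lines: a sparse signal is recovered from fewer linear measurements than unknowns by solving an L1-regularized least-squares problem. The example below uses iterative soft thresholding (ISTA), one of many possible solvers, on synthetic data. The random Gaussian sensing matrix, the problem sizes, and the regularization weight lam are assumptions chosen for illustration, not details from the talk.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.01, n_iter=2000):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)           # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                       # unknowns, measurements, nonzeros (m < n)

# Sparse ground truth: k nonzero entries with magnitudes between 1 and 2.
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# Undersampled measurements through a random sensing matrix (assumed here).
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true

x_hat = ista(A, y)
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Here the sparsity prior is imposed directly on the signal; in imaging applications it is usually imposed in a transform domain (for example wavelets or image gradients), and the matrix A models the optics and the sampling pattern.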