Generating $\pi$-Functional Molecules Using STGG+ with Active Learning

Type: Preprint
Publication Date: 2025-02-20
Citations: 0
DOI: https://doi.org/10.48550/arxiv.2502.14842

Abstract

Generating novel molecules with out-of-distribution properties is a major challenge in molecular discovery. While supervised learning methods generate high-quality molecules similar to those in a dataset, they struggle to generalize to out-of-distribution properties. Reinforcement learning can explore new chemical spaces but often conducts 'reward-hacking' and generates non-synthesizable molecules. In this work, we address this problem by integrating a state-of-the-art supervised learning method, STGG+, in an active learning loop. Our approach iteratively generates, evaluates, and fine-tunes STGG+ to continuously expand its knowledge. We denote this approach STGG+AL. We apply STGG+AL to the design of organic $\pi$-functional materials, specifically two challenging tasks: 1) generating highly absorptive molecules characterized by high oscillator strength and 2) designing absorptive molecules with reasonable oscillator strength in the near-infrared (NIR) range. The generated molecules are validated and rationalized in-silico with time-dependent density functional theory. Our results demonstrate that our method is highly effective in generating novel molecules with high oscillator strength, contrary to existing methods such as reinforcement learning (RL) methods. We open-source our active-learning code along with our Conjugated-xTB dataset containing 2.9 million $\pi$-conjugated molecules and the function for approximating the oscillator strength and absorption wavelength (based on sTDA-xTB).

Locations

  • arXiv (Cornell University)

Summary

This paper introduces STGG+AL, which places the supervised generative model STGG+ inside an active learning loop to generate out-of-distribution (OOD) molecules with desired optoelectronic properties, specifically π-conjugated molecules. It addresses two limitations of existing methods: supervised learning struggles to generalize beyond the training data, and reinforcement learning (RL) often leads to ‘reward hacking’ and non-synthesizable molecules.

Key innovations:
1. Integration of STGG+ in an active learning loop: STGG+ (Spanning Tree-based Graph Generation) serves as the supervised molecule generator. The loop iteratively generates molecules, evaluates their properties, and fine-tunes STGG+ on the results to expand its knowledge continuously. The generated molecules are then validated in silico with time-dependent density functional theory (TD-DFT); the released function for approximating oscillator strength and absorption wavelength is based on sTDA-xTB.

2. Targeted design of π-functional materials: The method is applied to two tasks: generating highly absorptive molecules with high oscillator strength (fosc), and designing absorptive molecules with reasonable oscillator strength in the near-infrared (NIR) range.
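The generate, evaluate, fine-tune cycle described above can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: `ToyGenerator`, `toy_oracle`, and `active_learning` are hypothetical names, candidates are plain tuples of floats standing in for molecules, and the oracle is an arbitrary function standing in for a property evaluator.

```python
import random
from typing import Callable, List, Tuple

Candidate = Tuple[float, ...]          # hypothetical stand-in for a molecule
Labeled = Tuple[Candidate, float]      # (candidate, property value)


class ToyGenerator:
    """Hypothetical stand-in for a generative model such as STGG+."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        self.center = [0.0] * dim
        self.rng = rng

    def fit(self, data: List[Labeled]) -> None:
        # "Fine-tune" by moving toward the mean of the top-scoring candidates.
        top = sorted(data, key=lambda pair: pair[1], reverse=True)[:10]
        self.center = [
            sum(cand[i] for cand, _ in top) / len(top)
            for i in range(len(self.center))
        ]

    def sample(self, n: int) -> List[Candidate]:
        # Draw new candidates around the current center.
        return [
            tuple(x + self.rng.gauss(0.0, 0.5) for x in self.center)
            for _ in range(n)
        ]


def toy_oracle(candidate: Candidate) -> float:
    # Assumed toy property: largest when every coordinate equals 3.
    return -sum((x - 3.0) ** 2 for x in candidate)


def active_learning(
    generator: ToyGenerator,
    oracle: Callable[[Candidate], float],
    seed_data: List[Labeled],
    rounds: int,
    batch: int,
) -> Tuple[List[Labeled], List[float]]:
    data = list(seed_data)
    best_per_round = []
    for _ in range(rounds):
        generator.fit(data)                          # fine-tune on all labels so far
        candidates = generator.sample(batch)         # generate
        data.extend((c, oracle(c)) for c in candidates)  # evaluate and store
        best_per_round.append(max(score for _, score in data))
    return data, best_per_round


rng = random.Random(0)
generator = ToyGenerator(dim=2, rng=rng)
seed = [(c, toy_oracle(c)) for c in generator.sample(20)]
data, history = active_learning(generator, toy_oracle, seed, rounds=5, batch=50)
print(history)
```

Because every round adds newly labeled candidates to the training pool, the best score seen so far can only stay the same or improve; whether a real generator extrapolates depends on the model and the property evaluator, which this toy does not capture.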

Prior Ingredients:
1. STGG+: A state-of-the-art autoregressive generative model based on spanning-tree graph generation, trained in a supervised manner and usable for both in-distribution and OOD property targets.

2. Active Learning: An iterative process in which the model generates new molecules, the molecules are labeled with their computed properties, and the model is retrained on the new data.

3. Time-Dependent Density Functional Theory (TD-DFT): A computational method used to validate and rationalize the generated molecules in silico by evaluating their optoelectronic properties.

4. π-conjugated molecules: Molecules with delocalized π-electrons, which enable functionalities such as those required in organic light-emitting diodes (OLEDs) and short-wave infrared (SWIR) absorbers.
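Excited-state calculations usually report excitation energies, whereas the NIR task is stated in terms of wavelength. The helper below is a small sketch of the conversion between the two using E = hc/λ. The function names are hypothetical, and the NIR window of 700 to 2500 nm is an assumption chosen for illustration; the paper may define the range differently.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm

# Assumed NIR window for illustration only; conventions vary.
NIR_MIN_NM = 700.0
NIR_MAX_NM = 2500.0


def wavelength_nm_to_ev(wavelength_nm: float) -> float:
    """Convert an absorption wavelength in nm to an excitation energy in eV."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def ev_to_wavelength_nm(energy_ev: float) -> float:
    """Convert an excitation energy in eV to a wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("energy must be positive")
    return HC_EV_NM / energy_ev


def is_nir(wavelength_nm: float) -> bool:
    """Return True if the wavelength falls inside the assumed NIR window."""
    return NIR_MIN_NM <= wavelength_nm <= NIR_MAX_NM


print(round(wavelength_nm_to_ev(800.0), 3), is_nir(800.0))
```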

Similar Works

Title | Date | Authors
Active Learning Enables Extrapolation in Molecular Generative Models | 2025-01-03 | Evan R. Antoniuk, Peggy Li, Nathan Keilbart, Stephen E. Weitzner, Bhavya Kailkhura, Anna M. Hiszpanski
High-throughput property-driven generative design of functional organic molecules | 2022-01-01 | Julia Westermayr, Joe Gilkes, Rhyan Barrett, Reinhard J. Maurer
STRIDE: Structure-guided Generation for Inverse Design of Molecules | 2023-01-01 | Shehtab Zaman, Denis Akhiyarov, Mauricio Araya‐Polo, Kenneth Chiu
It Takes Two to Tango: Directly Optimizing for Constrained Synthesizability in Generative Molecular Design | 2024-10-15 | Jeff Guo, Philippe Schwaller
Less is more: Sampling chemical space with active learning | 2018-05-22 | Justin S. Smith, Benjamin Nebgen, Nicholas Lubbers, Olexandr Isayev, Adrián E. Roitberg
Distributed Reinforcement Learning for Molecular Design: Antioxidant case | 2023-01-01 | Huanyi Qin, Denis Akhiyarov, Sophie Loehlé, Kenneth Chiu, Mauricio Araya‐Polo
Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions | 2024-04-05 | Zachary Fox, Ayana Ghosh
Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions | 2024-08-15 | Zachary Fox, Ayana Ghosh
Materials Discovery with Extreme Properties via AI-Driven Combinatorial Chemistry | 2023-01-01 | Hyunseung Kim, Hae-Yeon Choi, Dongju Kang, Won Bo Lee, Jonggeol Na
Materials discovery with extreme properties via reinforcement learning-guided combinatorial chemistry | 2024-01-01 | Hyunseung Kim, Haeyeon Choi, Dongju Kang, Won Bo Lee, Jonggeol Na
Faster and more diverse de novo molecular optimization with double-loop reinforcement learning using augmented SMILES | 2022-01-01 | Esben Jannik Bjerrum, Christian Margreitter, Thomas Blaschke, Raquel Lopez-Rios de Castro
Active Learning Exploration of Transition Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores | 2022-09-15 | Chenru Duan, Aditya Nandy, Gianmarco Terrones, David W. Kastner, Heather J. Kulik
Active Learning Exploration of Transition Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores | 2022-01-01 | Chenru Duan, Aditya Nandy, Gianmarco Terrones, David W. Kastner, Heather J. Kulik
Active Learning Exploration of Transition-Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores | 2022-12-01 | Chenru Duan, Aditya Nandy, Gianmarco Terrones, David W. Kastner, Heather J. Kulik
Deep Reinforcement Learning for Inverse Inorganic Materials Design | 2022-01-01 | Elton Pan, Christopher Karpovich, Elsa Olivetti
Active-Learning-Based Generative Design for the Discovery of Wide-Band-Gap Materials | 2021-07-20 | Rui Xin, Edirisuriya M. Dilanga Siriwardane, Yuqi Song, Yong Zhao, Steph-Yves Louis, Alireza Nasiri, Jianjun Hu
Analysis of training and seed bias in small molecules generated with a conditional graph-based variational autoencoder -- Insights for practical AI-driven molecule generation | 2021-01-01 | Seung-Gu Kang, Joseph A. Morrone, Jeffrey K. Weber, Wendy D. Cornell
Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation | 2024-03-29 | J.H Park, Jaegyoon Ahn, Jong-Hwan Choi, Jibum Kim
Inverse molecular design from first principles: Tailoring organic chromophore spectra for optoelectronic applications | 2022-04-22 | James D. Green, Eric G. Fuemmeler, Timothy J. H. Hele
Improving Molecular Design by Stochastic Iterative Target Augmentation | 2020-02-12 | Kevin Yang, Wengong Jin, Kyle Swanson, Regina Barzilay, Tommi Jaakkola
