Generating novel molecules with out-of-distribution properties is a major challenge in molecular discovery. While supervised learning methods generate high-quality molecules similar to those in a dataset, they struggle to generalize to out-of-distribution properties. Reinforcement learning can explore new chemical spaces but often engages in 'reward hacking' and generates non-synthesizable molecules. In this work, we address this problem by integrating a state-of-the-art supervised learning method, STGG+, in an active learning loop. Our approach iteratively generates molecules, evaluates them, and fine-tunes STGG+ to continuously expand its knowledge. We denote this approach STGG+AL. We apply STGG+AL to the design of organic $\pi$-functional materials, specifically two challenging tasks: 1) generating highly absorptive molecules characterized by high oscillator strength and 2) designing absorptive molecules with reasonable oscillator strength in the near-infrared (NIR) range. The generated molecules are validated and rationalized in-silico with time-dependent density functional theory. Our results demonstrate that our method is highly effective in generating novel molecules with high oscillator strength, in contrast to existing approaches such as reinforcement learning (RL). We open-source our active-learning code along with our Conjugated-xTB dataset containing 2.9 million $\pi$-conjugated molecules and the function for approximating the oscillator strength and absorption wavelength (based on sTDA-xTB).
This paper introduces STGG+AL, a novel active learning approach that leverages the strengths of supervised learning and active learning to tackle the challenge of generating out-of-distribution (OOD) molecules with desired optoelectronic properties, specifically targeting π-conjugated molecules. The significance lies in addressing the limitations of existing methods like supervised learning, which struggles with generalization beyond the training data, and reinforcement learning (RL), which often leads to ‘reward hacking’ and non-synthesizable molecules.
Key innovations:
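1. Integration of STGG+ in an active learning loop: STGG+ (Spanning Tree Graph Generation) serves as a powerful supervised learning method for molecule generation. The active learning component iteratively generates molecules, evaluates their properties, and fine-tunes STGG+ to expand its knowledge continuously. According to the abstract, oscillator strength and absorption wavelength are approximated with a function based on sTDA-xTB, while time-dependent density functional theory (TD-DFT) is used to validate and rationalize the generated molecules in-silico.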
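The generate, evaluate, fine-tune cycle can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: `ToyGenerator`, `toy_oracle`, and the top-k selection rule are hypothetical stand-ins introduced here for illustration. A "molecule" is reduced to a single number, the property evaluator to a toy function that peaks at 5.0, and fine-tuning to refitting the generator's mean on the best candidates.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyGenerator:
    """Hypothetical stand-in for a generative model such as STGG+.

    It samples plain numbers instead of molecules.
    """
    mean: float = 0.0
    std: float = 1.0

    def sample(self, n, rng):
        # Draw n candidates from the current model distribution.
        return [rng.gauss(self.mean, self.std) for _ in range(n)]

    def fine_tune(self, selected):
        # Stand-in for gradient-based fine-tuning: refit the mean
        # to the selected high-scoring candidates.
        self.mean = sum(selected) / len(selected)


def toy_oracle(x):
    """Hypothetical property evaluator; the score is highest at x = 5.0,
    which lies outside the initial sampling distribution."""
    return -(x - 5.0) ** 2


def active_learning_loop(generator, oracle, rounds=20,
                         n_samples=200, top_k=20, seed=0):
    """Generate, evaluate, select, and fine-tune for a fixed number of rounds.

    The top-k selection rule is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    best_per_round = []
    for _ in range(rounds):
        candidates = generator.sample(n_samples, rng)        # generate
        ranked = sorted(candidates, key=oracle, reverse=True)  # evaluate
        generator.fine_tune(ranked[:top_k])                  # fine-tune
        best_per_round.append(oracle(ranked[0]))
    return generator, best_per_round


if __name__ == "__main__":
    gen, history = active_learning_loop(ToyGenerator(), toy_oracle)
    print(f"final mean: {gen.mean:.3f}")
    print(f"best score, first vs last round: {history[0]:.3f} vs {history[-1]:.3f}")
```

The generator starts centered at 0.0, far from the optimum, and each round shifts it toward the region the evaluator scores highly. This mirrors, in miniature, how repeated fine-tuning on newly evaluated samples can move a supervised model beyond its original training distribution.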
Prior Ingredients:
1. STGG+: A state-of-the-art autoregressive generative model that uses spanning tree-based graph generation, trained in a supervised manner, with both in-distribution and OOD generation capabilities.
2. Active Learning: An iterative process in which the model generates new molecules, the molecules are labeled with their evaluated properties, and the model is retrained on the new data.
3. Time-Dependent Density Functional Theory (TD-DFT): A computational method used to validate and rationalize the generated molecules in-silico, providing a means to evaluate their optoelectronic properties.
4. π-conjugated molecules: Molecules with delocalized π-electrons, which enable functionalities such as those required in organic light-emitting diodes (OLEDs) and short-wave infrared (SWIR) absorbers.