Researchers from Xi’an Jiaotong University have developed DrugGPT, an autoregressive model based on the GPT framework and designed specifically for drug development. DrugGPT addresses the challenge of exploring the vast chemical space that drug discovery requires. Because it tokenizes its inputs with the Byte Pair Encoding (BPE) algorithm, a finite vocabulary can represent an effectively unlimited number of potential drug candidates. The model was trained on ligand and protein datasets, which lets it design ligands for a given protein sequence, design ligands that fulfill specific criteria, or generate ligand designs autonomously.
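To make the tokenization idea concrete, here is a minimal sketch of Byte Pair Encoding applied to a handful of SMILES strings. It is an illustration of the general algorithm only, not DrugGPT's actual tokenizer: the toy corpus, the number of merges, and the function names `learn_bpe`, `apply_merge`, and `tokenize` are all assumptions made for this example.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(corpus, num_merges):
    """Learn an ordered list of merges, starting from single characters."""
    sequences = [list(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def tokenize(text, merges):
    """Tokenize a new string by replaying the learned merges in order."""
    seq = list(text)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


# Hypothetical toy corpus of SMILES strings (not DrugGPT's training data).
corpus = ["CCO", "CCN", "CCOC", "c1ccccc1", "CC(=O)O"]
merges = learn_bpe(corpus, num_merges=5)

# A string that was not in the corpus is still covered by the same vocabulary.
tokens = tokenize("CCOCC(=O)N", merges)
print(tokens)
```

The vocabulary is just the base characters plus one new token per merge, so it stays finite, yet any string built from those characters can be tokenized and reassembled exactly.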

DrugGPT designs ligands in three ways. It can take a protein sequence as input and design ligands for it, design ligands that fulfill specific criteria, or generate ligand designs autonomously when no input is given. The generated ligands then go through screening and optimization, and the selected compounds are ultimately subjected to experimental validation.
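The difference between generating from a protein sequence and generating with no input can be sketched with a generic autoregressive decoding loop: the model repeatedly predicts the next token from everything produced so far, and any input is simply a prefix of that sequence. Everything in this example is hypothetical, including the special tokens `<start>`, `<protein>`, `<ligand>`, and `<end>`, the helper names, and the fixed rule standing in for a trained model. It does not reproduce DrugGPT's actual prompt format or weights.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[str]], str],
             prompt: List[str],
             end_token: str = "<end>",
             max_new_tokens: int = 20) -> List[str]:
    """Autoregressive decoding: append one predicted token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        tokens.append(tok)
        if tok == end_token:
            break
    return tokens


def toy_next_token(context: List[str]) -> str:
    """Stand-in for a trained model: a fixed rule, not a learned distribution."""
    if "<ligand>" not in context:
        return "<ligand>"
    generated = context[context.index("<ligand>") + 1:]
    template = ["C", "C", "O", "<end>"]  # always spells out the SMILES "CCO"
    return template[min(len(generated), len(template) - 1)]


# No input: the prompt holds only the (hypothetical) start token.
unconditional = generate(toy_next_token, ["<start>"])

# Protein-conditioned: the protein sequence is placed before the ligand marker.
protein_prompt = ["<start>", "<protein>", "M", "K", "V", "<ligand>"]
conditional = generate(toy_next_token, protein_prompt)

print(unconditional)
print(conditional)
```

In a real model the next-token function would be a neural network sampling from a learned probability distribution, so the protein prefix would actually change which ligand tokens follow; the toy rule here ignores it and only shows the control flow.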

DrugGPT’s autoregressive approach improves its accuracy in capturing the relationship between chemical structure and activity. The model is stable, easy to optimize, and adaptable, and it avoids the mode collapse problem, in which a generative model produces only a narrow range of outputs. By streamlining and accelerating drug development, DrugGPT shows great potential to make the process faster and more efficient.

Article written by Neegar Naushaba Iqbal