🧬 Made With Bio
Learn how to create with biology. WIP: survey on the intersection of genomics + machine learning.
Current survey topics:
- Instructions (DNA, RNA)
- Parts of DNA (regulatory motifs, gene expression, etc.)
- Processes (transcription, translation)
- Machines (proteins, types of proteins, enzymes)
- Properties (toxicity, solubility, heat resistance, inhibition, stability, etc.)
- Pathways
- Cell components (mitochondria, nucleus, etc.)
- Synthesis (plasmid maps)
- Microbes
- Techniques (PCR, imaging, CRISPR, etc.)
- Synthetic biology process (design, build, test)
- Data sources (proteins [UniProt], enzymes, compounds [ZINC], Drug Repurposing Hub, evolutionary multiple sequence alignments (used in AlphaFold), ChEMBL, NCATS, PubChem, DrugBank, etc.)
  - ZINC, a library of commercially available compounds
  - PubChem, molecules with biological relevance
  - ChEMBL, molecules with bioactivity data
  - DrugBank, approved or experimental therapeutic molecules
- Parts of the process (Benchling, cloud labs, etc.)
- Experimental settings (in silico, in vitro, in vivo, etc.)
- Drug discovery process (phases)
- Relevant ML algorithms (GNNs, representation learning for graph encoding, and deep generative models such as GANs and VAEs for graph generation)
- Experimental evaluation: DNA-encoded libraries (DELs) and phage display (more info in BB’s paper)
- Benchmarks (MOSES)
- Types of interventions once a target is identified:
  - Genetic (CRISPR)
  - Chemical (hit discovery, molecular generation, etc.)
  - Phenotypic SAR
- Molecular representations for virtual screening (a.k.a. molecular property prediction)
  - Virtual screening is faster than experimental screening, and cheap, since it is all done digitally
  - It is restricted to commercially available compounds (e.g., the ZINC library)
  - De novo drug design (the inverse of virtual screening): directly generate a compound with the desired properties
  - The two work hand in hand (e.g., virtual screening to generate training data where confidence is low)
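The Instructions and Processes topics (DNA, RNA, transcription, translation) can be made concrete with a small sketch. It assumes the input is the DNA coding (sense) strand, so transcription is just T → U, and it uses the standard genetic code; the helper names `transcribe` and `translate` are illustrative, not from any particular library.

```python
# Standard genetic code, built from the compact 64-letter table ordered U, C, A, G.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (assumption: sense strand given, so T -> U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG start codon until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    # Step through complete codons only (pos + 3 <= len(mrna)).
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("TTATGGCCAAGTAAGC"))` returns `"MAK"`: the reading frame starts at AUG and stops at UAA. Real gene expression also involves promoters, splicing, and alternative start codons, which this sketch ignores.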
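One common form of the virtual screening idea in the last topic is similarity search: rank a compound library by Tanimoto similarity (|A ∩ B| / |A ∪ B|) between molecular fingerprints and a known active. Real pipelines compute fingerprints with a cheminformatics toolkit (for example, RDKit Morgan fingerprints). Here `toy_fingerprint` is a hypothetical stand-in that uses character 3-grams of the SMILES string, which is not chemically meaningful, so the scores are illustrative only; the function names are likewise assumptions.

```python
from __future__ import annotations


def toy_fingerprint(smiles: str, k: int = 3) -> set[str]:
    """Hypothetical stand-in for a real fingerprint: the set of length-k substrings of a SMILES string."""
    return {smiles[i:i + k] for i in range(len(smiles) - k + 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(query: str, library: dict[str, str], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank library compounds (name -> SMILES) by similarity to the query SMILES."""
    query_fp = toy_fingerprint(query)
    scores = [(name, tanimoto(query_fp, toy_fingerprint(smi))) for name, smi in library.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scores[:top_n]
```

Screening a small library such as `{"aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O", "ethanol": "CCO"}` with aspirin's own SMILES as the query ranks aspirin first with a score of 1.0. Swapping `toy_fingerprint` for a real fingerprint function leaves the ranking logic unchanged.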
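For the Techniques topic, the arithmetic behind PCR amplification fits in a few lines. In the idealized exponential phase, each cycle multiplies the target by (1 + E), where E is the per-cycle efficiency (E = 1 means perfect doubling), so after n cycles N = N0 · (1 + E)^n. The function name and parameters below are illustrative; real reactions plateau as reagents run out, which this sketch ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of PCR: N0 * (1 + E) ** n (no plateau)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles
```

For example, `pcr_copies(100, 10)` returns `102400.0` (100 × 2^10), while a lower efficiency such as 0.9 gives noticeably fewer copies over the same number of cycles.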