Imagine for a second that instead of asking an AI to write a poem or a block of Python code, you ask it to write the genetic blueprint for a completely new organism. Not just a slight tweak to a bacterium we already know, but an entirely novel biological system designed from the ground up.
It sounds like science fiction, but we are effectively crossing that Rubicon right now. Today’s prompt from Daniel is about Evo, the foundation model from the Arc Institute that is essentially doing for the language of DNA what GPT did for human language.
And by the way, just a quick bit of meta-context for the listeners—today’s episode is actually being powered by Google Gemini three Flash. It’s writing our script today, which is fitting since we’re talking about one AI model’s ability to decode the most complex "code" in existence.
I’m Herman Poppleberry, and honestly Corn, this is the stuff that keeps me up at night in the best way possible. Evo isn’t just another protein folder like AlphaFold. It’s a generative model. We’ve moved from reading and predicting biology to actually authoring it.
That’s the distinction that really jumped out at me. AlphaFold was about "Given this sequence, what shape does it take?" Evo is asking, "Given this desired function, what sequence do I need to invent to make it happen?" It’s a shift from analysis to synthesis.
It really is. And the scale here is staggering. We’re talking about a model trained on roughly three hundred billion tokens of genomic data. But these aren’t words; they’re nucleotides—the A, C, T, and G of the genetic code.
So, let’s frame this for everyone. What exactly is Evo? If I’m a researcher at a biotech firm, is this a tool I’m using to design a better detergent enzyme, or are we talking about something much more foundational?
It’s both, but the "foundation" part is the key. The original Evo is a seven billion parameter model trained on bacterial, archaeal, and phage genomes, and its successor scales to forty billion parameters and reaches across every domain of life, eukaryotes included. That’s a massive amount of "brain power" dedicated solely to understanding the patterns of life. The Arc Institute team, with folks like Patrick Hsu and Brian Hie, basically asked, "If we treat the entire tree of life as a single dataset, can an AI learn the grammar of existence?"
I love that phrase, "the grammar of existence." Because DNA really is just a long string of characters, right? But the context window matters. If you only look at ten characters at a time, you see nothing. You need to see the whole paragraph to understand the "meaning" of a gene.
You’ve hit on the biggest technical hurdle they had to clear. Standard transformers, the kind used for chatbots, have a hard time with really long sequences because the computational cost grows quadratically with sequence length. But DNA sequences can be millions of base pairs long. Evo uses an architecture called StripedHyena, which handles much longer contexts with near-linear scaling: roughly 131,000 nucleotides in the first release, and up to a megabase, or one million nucleotides, in Evo 2.
A million nucleotides. That’s enough to capture entire genomes of many bacteria in one go.
And that’s why it can "see" how a promoter at one end of a sequence affects the expression of a protein at the other end. It understands the architecture of a cell, not just the spelling of a single gene.
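A quick back-of-the-envelope sketch of that scaling point. The `n * log2(n)` cost is only an illustrative stand-in for "near-linear" and is not a measurement of StripedHyena itself; both function names are made up for this example.

```python
import math

def attention_cost(n: int) -> int:
    """Pairwise comparisons in full self-attention: grows as n squared."""
    return n * n

def near_linear_cost(n: int) -> float:
    """Illustrative n * log2(n) cost, an assumed stand-in for near-linear scaling."""
    return n * math.log2(n)

# Compare the two growth rates at a gene, a long context, and a megabase.
for n in (1_000, 100_000, 1_000_000):
    ratio = attention_cost(n) / near_linear_cost(n)
    print(f"{n:>9,} nt: quadratic is ~{ratio:,.0f}x the near-linear cost")
```

The gap widens as the sequence grows, which is why the choice of architecture matters far more at a megabase than at the length of a single gene.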
So, how does it actually work under the hood? You mentioned StripedHyena. Is this just a fancy version of "predict the next letter," or is there something more biological happening in the weights of the model?
At its core, it is still self-supervised learning: predicting the next nucleotide. But because it’s trained on genomes from more than a hundred thousand species, it’s learning "evolutionary constraints." It only ever sees sequences from organisms that survived, so a "letter" that almost never changes across species is probably one where a change would kill the organism, and the model learns that those patterns are sacred. It’s capturing the "rules" of biology that took four billion years to write.
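To make "predict the next nucleotide" concrete, here is a toy sketch that simply counts which letter follows each three-letter context. This is a plain count-based Markov model, not Evo’s architecture, and the sequences and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def train_next_nucleotide(seqs, k=3):
    """Count which nucleotide follows each k-length context."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts

def predict_next(counts, context, k=3):
    """Return next-nucleotide probabilities given the last k letters of context."""
    seen = counts.get(context[-k:], Counter())
    total = sum(seen.values())
    # Fall back to a uniform guess for contexts never seen in training.
    return {nt: (seen[nt] / total if total else 0.25) for nt in "ACGT"}

# Two made-up "genomes" that differ at only two positions.
toy_genomes = ["ATGGCCATTGTAATGGGCCGC", "ATGGCCATTGAAATGGGCCGA"]
model = train_next_nucleotide(toy_genomes)
print(predict_next(model, "ATG"))
```

Contexts that are always followed by the same letter in the training data get a confident prediction, which is the toy version of a conserved pattern.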
It’s like learning English by reading every book ever written, but also seeing which books were successful and which ones were nonsense.
Right. And then, once it has that internal map, you can use it for "zero-shot" tasks. You can ask it to predict how a mutation might cause a disease, or you can ask it to generate a sequence for a new CRISPR system.
Wait, did you say it designed a new CRISPR system? Like, the gene-editing tool?
Yes. This is one of the most famous demos from the Arc Institute. They used Evo to design novel CRISPR-Cas molecular complexes from scratch. These aren’t things found in nature. They synthesized them in a lab, and they actually worked. They were functional.
That’s wild. We’ve spent decades hunting through the dirt and the ocean to find natural CRISPR systems in bacteria, and this thing just dreamt one up in a GPU cluster?
It’s a complete paradigm shift. Usually, bioprospecting is like looking for a needle in a haystack. Evo allows us to just build the needle we want.
Okay, but let’s talk about the trade-offs here. If I’m a skeptic, I’m thinking: biology is messy. It’s wet. It’s unpredictable. Can a "digital" model really capture the physical reality of how a protein folds and interacts in a crowded cell?
That’s the million-dollar question. Evo is incredible at the "writing" part, but it doesn’t replace the "testing" part. You still have to synthesize the DNA, put it into a cell, and see if it works. But instead of testing ten thousand random variants, you might test ten Evo-designed candidates that each have a much better chance of working. That alone is a thousandfold cut in lab work.
So it’s a filter for the impossible. It narrows the search space so humans can focus on the viable stuff.
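The "narrow the search space" idea can be sketched as a simple rank-and-shortlist step. The scoring function here is a hypothetical placeholder (closeness to 50% GC content) standing in for whatever score a real model would assign; nothing about it comes from Evo.

```python
import heapq
import random

def placeholder_score(seq: str) -> float:
    """Hypothetical stand-in for a model score: closeness to 50% GC content."""
    gc = sum(nt in "GC" for nt in seq) / len(seq)
    return -abs(gc - 0.5)

def shortlist(candidates, k=10, score=placeholder_score):
    """Keep the k highest-scoring candidates for wet-lab testing."""
    return heapq.nlargest(k, candidates, key=score)

# Ten thousand random 60-letter sequences, seeded for reproducibility.
rng = random.Random(0)
pool = ["".join(rng.choice("ACGT") for _ in range(60)) for _ in range(10_000)]
top = shortlist(pool)
print(len(pool), "->", len(top), "candidates")
```

Swapping in a better `score` function is the whole game: the lab work stays the same, but it is spent on far fewer sequences.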
Precisely. And what’s even more interesting is the "multi-scale" aspect. Evo doesn't just look at DNA; it understands how that DNA becomes RNA, and how that RNA becomes a protein. It’s modeling the entire central dogma of molecular biology in a single latent space.
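For anyone who wants the central dogma spelled out, this is the textbook DNA to RNA to protein mapping using the standard genetic code. Evo is not given this table; the point of the claim above is that it picks up such regularities from sequence alone. The example gene is made up.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: swap T for U."""
    return dna.replace("T", "U")

def translate(dna: str) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

gene = "ATGGCCATTGTAATGGGCCGCTGA"  # made-up example sequence
print(transcribe(gene))
print(translate(gene))
```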
I want to dig into the "organism-level" claims. The prompt mentions designing "entire organisms." Are we talking about AI-generated "Franken-bacteria" running around?
We’re not quite at "designing a pet dragon" yet, Corn. But we are at the stage where we can design "minimal genomes." Think of a simplified bacterium that only does one thing, maybe it just eats plastic or produces insulin, without any of the "extra" genetic baggage that natural organisms have. Evo can help architect those streamlined genomes.
That sounds like a dream for industrial biotech. Instead of trying to coax a stubborn, natural yeast to make a chemical, you just design a biological "factory" that has no other purpose.
That’s the vision. But this brings us to the second-order effects and some of the more... let’s say, spicy implications. If the barrier to entry for designing functional biological systems drops from "decades of PhD research" to "running a prompt on a high-end server," what does that do to biosecurity?
Yeah, that’s the shadow side, isn’t it? If it can design a novel CRISPR system, it can presumably design a novel toxin or a more resilient pathogen.
It’s a huge concern. The researchers at Arc are very aware of this. They’ve actually worked with groups like Goodfire to develop "interpretability" tools. They want to see inside the "neural cathedral" of the model—which we talked about in a different context a while back—to understand why it’s making certain choices. If we can see the "features" it’s using to design a sequence, maybe we can build guardrails to prevent it from generating harmful sequences.
But guardrails on an open-source or widely available model are notoriously leaky. If the weights are out there, someone with a lab and a bit of "bio-hacking" spirit could do some real damage.
True, but we have to weigh that against the upside. Think about the "Red Queen’s Race" we’re in with superbugs—antibiotic resistance. We are losing that race because bacteria evolve faster than we can invent new drugs. An AI like Evo could allow us to design "evolution-proof" antibiotics or even programmable phages that hunt specific bad bacteria.
It’s fighting fire with fire. We use an AI that understands evolution to out-evolve the things that are trying to kill us.
And it’s not just about drugs. Think about climate change. We could design enzymes that pull carbon out of the atmosphere at a rate ten times faster than any tree. Or organisms that can survive in soil that’s currently too salty or too dry for crops.
I’m curious about the "interpretability" part you mentioned. How do you "visualize" what an AI thinks about DNA? Is it just looking at heatmaps of activations?
It’s actually really cool. They can identify specific "neurons" or clusters that respond to certain biological motifs—like a TATA box or a specific protein fold. It’s like finding the "concept" of a "verb" in a language model. Once you know where the "toxic protein" concept is, you can theoretically steer the model away from it.
That’s a fascinating bridge between AI safety and biological safety. It turns out the problems are actually quite similar.
They are. And speaking of similarities, I think we should contrast this with AlphaFold three, which we’ve discussed before. AlphaFold is like the world’s best dictionary—it tells you what every "word" (protein) looks like. Evo is like the world’s best novelist—it’s taking those words and writing entirely new stories.
I love that. So AlphaFold is the reference desk, and Evo is the creative studio.
That’s a great way to put it. And because Evo is trained on such a diverse range of life—over a hundred thousand species—it has "seen" solutions to biological problems that humans haven't even thought of. There are weird deep-sea bacteria that survive in boiling water; there are organisms that live in pure acid. Evo has all of that "knowledge" in its weights.
It’s a library of every trick nature has ever pulled.
And it can mix and match them! It might take a "heat-stability" trick from an extremophile and apply it to a human-designed enzyme for a laundry detergent that works in boiling water. That kind of "cross-species" engineering is incredibly hard for humans to do manually.
It makes me think about the "democratization" aspect. Does this mean the next big biotech breakthrough might come from a garage startup rather than a multi-billion dollar pharma giant?
It certainly lowers the "compute" barrier. You still need the "wet lab" to validate, and that’s still expensive. But the time spent in the "guessing phase" goes from years to days. That’s a huge tailwind for small, agile teams.
What about the regulatory side? How do you even begin to regulate "AI-generated life"? If I design a bacterium that doesn’t exist in nature, does it fall under existing GMO laws, or do we need a whole new category for "Synthetic Generative Biology"?
We are definitely going to need a new category. Current laws are mostly based on "moving" genes from one thing to another—like putting a fish gene in a tomato. But if the gene was "hallucinated" by an AI and doesn't exist anywhere else, existing laws are a bit fuzzy.
"Hallucinated" life. That’s a terrifying and beautiful phrase at the same time.
It really is. And the Arc Institute is being quite radical here by being so open. They are a non-profit, and they are releasing a lot of this work into the wild because they believe the "science" needs to happen in the light. We’ve talked about the Allen Institute for AI before and their "openness" mission—Arc seems to be playing in that same headspace for biology.
It’s a bold move. It forces the conversation into the public square before the tech is fully "locked down" by proprietary interests.
Right. And for our listeners who are in the tech or bio space, the takeaway here is that "Bio-AI" is no longer just about predicting structures. It’s about design. If you’re a developer, you might want to look at the "StripedHyena" architecture. If you’re a biologist, you should probably be looking at "Evo Designer."
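It’s like the early days of LLMs where people were just figuring out what they could do. We’re in the "prompt engineering" phase of biology.
"Prompt: Design a protein that binds to this specific spike protein but is stable at room temperature." That’s the spirit of it, anyway. In practice you prompt Evo with DNA sequence rather than English, and it continues the sequence, but goal-directed design like that is becoming real.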
So, what’s the "aha" moment for someone who isn't a biologist? Why should they care that a donkey and a sloth are talking about nucleotides?
Because this is how we solve the "irreducible" problems. We can’t "code" our way out of cancer with traditional software. We can’t "policy" our way out of the next pandemic if we don't have the tools to respond instantly. Evo represents the moment biology becomes a "programmable" discipline.
It’s the "software-ization" of the physical world.
Yes! And that has massive implications for everything from the food we eat to the materials our phones are made of. Imagine a phone case that was "grown" by an AI-designed fungus that is stronger than plastic and fully biodegradable.
I’m still waiting for my bioluminescent trees to replace streetlights. Is Evo going to give me my Avatar forest?
It’s closer than you think! But seriously, the practical, near-term stuff is more like "enzymes for plastic degradation." We have a massive plastic problem; nature hasn't had time to evolve a way to eat it efficiently. We can use Evo to "accelerate" that evolution in a computer and then deploy it in the real world.
That feels like a very "pro-human" application of the tech. Using our intelligence to fix the mess our previous intelligence made.
It’s the ultimate "second-order effect." Our industrial age created a biological mismatch with our environment. Our AI age might be the only way to re-align them.
I like that. It’s a bit more optimistic than the "AI-generated plague" scenario.
You have to have both in mind. But the potential for "generative biology" to fix things like "forever chemicals" or carbon capture is just too big to ignore.
So, if I’m an investor or a founder, what’s the "new category" of startup here? Is it "Model-as-a-Service" for bio?
I think it’s "Vertical Bio-Design." Startups that don't try to be "the next Evo," but instead use Evo to dominate a specific niche—like "the best agricultural enzymes" or "the most efficient carbon-sequestering algae." The "foundational" work is being done by Arc; the "application" layer is wide open.
It’s the "app store" for biology.
Precisely. And the "SDK" is the genetic code.
Okay, let's pivot slightly to the "Evo two" developments. I saw some news recently about them scaling this up even further. What’s the delta between the first version and what’s coming next?
Evo two is basically Evo one on steroids. They’ve scaled the training data roughly thirtyfold, to around nine trillion nucleotides. And the architecture has been refined to be even more efficient. They’re calling it a "generalist" model, and it’s getting better at handling complex eukaryotic genomes: plants, animals, and us.
That’s the big jump, right? Moving from simple bacteria to complex multicellular life.
It is. The "grammar" of a human genome is vastly more complex than that of an E. coli bacterium. There’s so much "non-coding" DNA that we used to call "junk," but models like Evo are starting to show it’s not junk at all. It’s the "operating system" that tells the "apps" (the genes) when to run.
So we might finally understand what all that "junk" DNA is actually doing?
We’re already seeing it! Evo two can identify disease-causing mutations in those non-coding regions that traditional tools just ignored. It’s like finding a bug in the "config file" of a program rather than in the "code" itself.
That’s a great analogy—which I know we’re supposed to keep to a minimum, but it works! It’s the difference between a typo in a sentence and a mistake in the page layout that makes the sentence unreadable.
And the "zero-shot" performance on these tasks is what’s blowing everyone’s minds. The model isn't "trained" to find heart disease mutations; it just "knows" what a healthy genome looks like so well that it can spot the "wrongness" of a mutation automatically.
It has a "sense" for biological "truth."
In a way, yes. It’s learned the underlying "manifold" of viable life. Anything that falls too far off that manifold is flagged as a potential problem.
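One common way to turn "spotting wrongness" into a number is to compare how likely a model thinks the mutated sequence is versus the reference. The sketch below does that with a tiny count-based model and add-one smoothing; it is a toy under those assumptions, not Evo’s scoring code, and the sequences and names are invented.

```python
import math
from collections import Counter, defaultdict

K = 2  # context length of the toy model (arbitrary choice for illustration)

def fit(seqs):
    """Count which nucleotide follows each K-letter context."""
    counts = defaultdict(Counter)
    for s in seqs:
        for i in range(len(s) - K):
            counts[s[i:i + K]][s[i + K]] += 1
    return counts

def log_likelihood(counts, seq):
    """Sum of log P(next nucleotide | previous K letters), add-one smoothed."""
    total = 0.0
    for i in range(len(seq) - K):
        ctx = counts.get(seq[i:i + K], Counter())
        p = (ctx[seq[i + K]] + 1) / (sum(ctx.values()) + 4)
        total += math.log(p)
    return total

def variant_score(counts, reference, mutant):
    """Negative values mean the mutant looks less 'natural' than the reference."""
    return log_likelihood(counts, mutant) - log_likelihood(counts, reference)

model = fit(["ATGGCCATGGCCATGGCC"] * 5)   # made-up "healthy" training data
ref = "ATGGCCATGGCC"
mut = "ATGGCTATGGCC"                      # single C -> T substitution
print(variant_score(model, ref, mut))
```

Nothing in the toy model was told which mutations are harmful. The score falls out of how well each sequence fits the patterns it has already seen, which is the sense in which the task is "zero-shot."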
I’m thinking about the "dual-use" concerns again. If we can identify "wrongness," can we also identify "optimal lethality"? This is where the "biosecurity" people get really nervous.
And they should be. There’s a lot of talk right now about "screening" DNA synthesis orders. If you try to order a sequence that looks like smallpox, the synthesis company should flag it. But if an AI designs a sequence that is "functionally" like smallpox but "looks" totally different to a simple screening tool... that’s a problem.
So the "security" tools also need to be powered by models like Evo to understand "functional" risk, not just "keyword" risk.
It’s an arms race, Corn. A literal biological arms race. But the good news is that the same models that can design the "threat" are our best hope for designing the "shield."
It’s the "Cybersecurity" model applied to our actual cells. We need "Bio-Firewalls" and "Bio-Antivirus."
We really do. And that’s a whole new industry in itself. I can see a future where every hospital has a "Bio-Sequencer" that is constantly scanning for any "AI-generated" patterns in the local environment.
It sounds intense, but honestly, it’s probably inevitable. We’ve opened the box.
We have. And the box is full of incredible potential. Let’s talk about some of the more "practical" takeaways for people who want to follow this. First, keep an eye on the Arc Institute’s publications in "Nature" and elsewhere. They are the gold standard for this right now.
And if you’re a dev, check out the "StripedHyena" repo. Even if you don't care about biology, the way they handle long-context sequences is a masterclass in modern AI architecture.
Also, for the investors out there: look at the companies that are specializing in "in-silico" design. The days of "spray and pray" drug discovery are numbered. The future is "Design, then Build."
"Design, then Build." It sounds so simple, but it’s taken us four billion years to get here.
It really has. And I think we should wrap up with the "big open question." As we start to "write" the code of life, what does it mean to be "natural"? If a bacterium is designed by an AI to clean up an oil spill, is it any less "part of nature" than a bacterium that evolved in a swamp?
That’s a deep one. We’re essentially becoming "co-authors" of the biosphere. It’s a lot of responsibility for a species that still can't agree on how to handle a basic virus.
It’s the ultimate test of our maturity as a civilization. We’ve grabbed the steering wheel of evolution. Now we have to learn how to drive.
Well, hopefully we don't drive it into a ditch. Or a swamp.
I’m cautiously optimistic. The people I see working on this—at Arc, at Anthropic, even at Google—are deeply concerned with the ethics. It’s not just "move fast and break things" when "things" are living organisms.
Let’s hope that "caution" stays as high as the "optimism."
Agreed. This has been a wild one, Daniel. Thanks for sending this prompt in. It’s definitely one of the most consequential topics we’ve ever covered.
And a big thanks to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes.
Also, thanks to Modal for providing the GPU credits that power the generation of this show. We couldn't do these deep dives without that kind of compute.
This has been "My Weird Prompts." If you’re enjoying these deep dives into the future of tech and biology, do us a favor and leave a review on your podcast app. It really does help other curious minds find the show.
You can find all our episodes and more at myweirdprompts dot com.
Catch you in the next one.
See ya.