#1607: NVIDIA’s $26 Billion Pivot: From Chips to AI Models

NVIDIA is moving beyond chips to build the "brains" of AI. Explore the $26B shift into models, robotics, and the new Rubin platform.

Episode Details

Duration: 18:35
Pipeline: V5
TTS Engine: chatterbox-regular
AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Evolution of the AI Factory

NVIDIA is undergoing a fundamental transformation. Long regarded as the primary provider of the "picks and shovels" for the AI gold rush, the company is now moving vertically to own the entire stack. This shift represents a move from being a pure hardware manufacturer to becoming a comprehensive AI powerhouse that designs the models, the software, and the silicon in tandem.

The centerpiece of this strategy is the concept of the "AI Factory." Rather than treating AI as software to be installed, NVIDIA is positioning it as a utility. By co-designing the new Vera CPU with the Rubin GPU platform, the company has created a unified memory architecture that eliminates traditional data bottlenecks. The result is near-instantaneous processing: speech AI latency drops below 25 milliseconds, well under the roughly 200 milliseconds the human brain needs to process a single word.
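To make that 25-millisecond figure concrete, here is a minimal sketch of how a deployment might be benchmarked against such a real-time budget. The `transcribe_chunk` function is a hypothetical stand-in for whatever speech endpoint is actually deployed; nothing here is an NVIDIA API.

```python
# Minimal latency-benchmark sketch. All names are illustrative; swap in
# a real streaming ASR call where indicated.
import time
import statistics

BUDGET_MS = 25.0  # the sub-25 ms target discussed above

def transcribe_chunk(audio_chunk: bytes) -> str:
    # Hypothetical placeholder: in practice this would call a locally
    # served speech model (e.g. over gRPC or an HTTP endpoint).
    return ""

def measure(chunks: list[bytes]) -> None:
    latencies_ms = []
    for chunk in chunks:
        start = time.perf_counter()
        transcribe_chunk(chunk)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]  # 95th percentile
    over = sum(1 for ms in latencies_ms if ms > BUDGET_MS)
    print(f"p95: {p95:.2f} ms, {over}/{len(latencies_ms)} chunks over budget")
```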

Specialized Intelligence: Nemotron and Cosmos

The release of the Nemotron-Nano series marks a pivot toward the "inference era." While competitors focus on trillion-parameter models for general conversation, NVIDIA is prioritizing lean, high-speed models optimized for the edge. These models are designed to power autonomous agents and industrial machines that require local, real-time decision-making without relying on the cloud.
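As a rough illustration of what "lean" means in practice, a 12-billion-parameter open-weight model can be loaded with standard tooling on a single workstation GPU. A minimal sketch, assuming the weights are published under a repository ID like the one below (taken from the naming in this episode, not verified here):

```python
# Sketch: loading a small open-weight model with Hugging Face transformers.
# The repo id is assumed from the episode's naming; check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Nano-12B-v2"  # assumption; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize these sensor readings and recommend the next action:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```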

Beyond text and speech, the Cosmos series introduces "world foundation models." These are designed for physical AI, allowing robots to understand the laws of physics and interact with their environments. By building the "brain" for the machines they already power, NVIDIA is creating a feedback loop that makes their hardware indispensable for the next generation of robotics and automotive technology.

The $26 Billion Open-Weight Gamble

Perhaps the most disruptive move is NVIDIA’s massive $26 billion investment in open-weight models. By releasing high-quality models that anyone can download and run, NVIDIA is effectively commoditizing intelligence. This strategy serves a dual purpose: it lowers the barrier for enterprises to adopt AI and increases the demand for the high-end hardware required to run these models locally.
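The mechanics of "anyone can download and run" are mundane by design. A hedged sketch, assuming the weights live on the Hugging Face Hub under an ID like the one used above:

```python
# Sketch: pulling open weights to local disk. The repo id is illustrative.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="nvidia/Nemotron-Nano-12B-v2")
print(f"Weights cached at: {local_dir}")
```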

This move creates a "co-opetition" dynamic with major customers like OpenAI and Microsoft. As NVIDIA begins to compete in the model space, software labs are increasingly looking to diversify their hardware providers. However, NVIDIA’s deep integration offers a level of performance and security—exemplified by the NemoClaw sandbox environment—that software-only companies struggle to match.

Security and the Agentic Future

As AI moves from simple chatbots to autonomous agents capable of executing code and managing databases, security has become the primary concern for the enterprise. NVIDIA’s response is a hardware-level security approach. By utilizing the Secure Enclave features within the Rubin chips, they can wrap AI agents in a protective layer that ensures "rogue" logic cannot compromise the broader system.
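The enclave mechanism itself is hardware and cannot be reproduced in a few lines, but the principle (isolate untrusted agent actions from the host) can be sketched at the OS level. A software-only analogy, not NVIDIA's implementation:

```python
# Software-level analogy only: runs agent-generated code in a separate
# process with a hard timeout and an empty environment. This illustrates
# the isolation principle, not the hardware enclave described above.
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 2.0) -> str:
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: Python's isolated mode
        capture_output=True,
        text=True,
        timeout=timeout_s,   # raises TimeoutExpired if the code hangs
        env={},              # no credentials leak in via environment vars
    )
    return result.stdout

print(run_untrusted("print(2 + 2)"))  # -> "4"
```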

Ultimately, NVIDIA is betting that the future of AI belongs to those who own the entire pipeline. By controlling the path from the silicon gate to the neural network weights, they are building an ecosystem that is faster, more secure, and more cost-effective than a fragmented approach.

Downloads

Episode Audio: the full episode as an MP3 file
Transcript (TXT): plain text transcript file
Transcript (PDF): formatted PDF with styling

Episode #1607: NVIDIA’s $26 Billion Pivot: From Chips to AI Models

Daniel's Prompt
Daniel
Custom topic: NVIDIA as an AI model maker - we know them for GPUs, but they also make LLMs and speech models. What are NVIDIA's AI models, and why do they see particular importance in offering full stacks (models + hardware)?
Corn
I was watching the highlights from the latest G T C keynote recently—the one that just wrapped up on March nineteenth—and it occurred to me that we might need to start thinking about NVIDIA as something other than a chip company. It is like watching a master chef who decided that owning the restaurant and the farm was not enough, so now they are designing the actual molecular structure of the ingredients. Today's prompt from Daniel is about this massive shift, specifically how NVIDIA has moved into the world of large language models and speech AI, and why they are suddenly spending twenty-six billion dollars to challenge the very labs they helped build.
Herman
It is a massive pivot, Corn. My name is Herman Poppleberry, and I have been obsessed with the technical specs coming out of that conference last week. For years, we have seen NVIDIA—that is en-VID-ee-uh for the uninitiated—as the hardware foundation, the people who sell the picks and shovels for the AI gold rush. But what we saw this month, especially with the unveiling of the Rubin platform and the Vera C P U, is a company that is no longer content just providing the infrastructure. They are building the intelligence that runs on it. Daniel is right to point this out because the landscape of twenty twenty-six looks very different than it did even eighteen months ago. We are seeing a vertical integration that should probably make every software-only AI lab a little bit nervous.
Corn
It is funny you say that because I remember when people thought NVIDIA making their own models was just a side project or a way to show off the hardware. Like a car manufacturer building one high-end supercar just to prove the engine works. But the Nemotron series and this new Cosmos video generation stuff they showed off... it feels like they are moving into the neighborhood and building a house that is twice as big as everyone else's. Let us start with the models themselves because I think most people still associate them strictly with the H one hundred or the Blackwell chips. What are they actually shipping in terms of weights and code right now?
Herman
The lineup has become surprisingly dense, and Kari Briski, their V P of AI Software, has really been the architect behind this push. The flagship right now is the NEE-mo-tron series. We just saw the release of Nemotron-Nano-twelve-B-version-two. Now, twelve billion parameters might sound small compared to the trillion-parameter monsters we hear about from OpenAI or Anthropic, but that is the point. It is optimized for what Jensen Huang calls the inference era. When you are running a model on the edge or trying to power thousands of autonomous agents, you do not want a massive, slow model. You want something lean that can run with incredibly low latency. And that is where NVIDIA is flexing. They also released the Nemotron Speech Realtime Collection this month. We are talking about transcription latency under twenty-five milliseconds. To put that in perspective, the human brain takes about two hundred milliseconds to even process a word. This is effectively faster than human thought.
Corn
Twenty-five milliseconds is basically instantaneous. I can barely decide to disagree with you that fast, Herman. But why does a hardware company care about speech models and things like Canary? I saw that Canary is their new multilingual A S R and translation model. Is this just about making better voice assistants, or is there a deeper play here with their robotics and automotive divisions?
Herman
It is about the feedback loop. If you want a robot to function in the real world, it cannot wait for a round trip to a cloud server in another state to understand a command. It needs to process speech, vision, and logic locally and instantly. That is why they are pushing the Cosmos series too. Cosmos is their play for physical AI and video generation. It is not just about making pretty movies like some other models we have seen. It is about world foundation models. It is about the AI understanding the laws of physics, how objects move, and how a robotic arm should interact with a physical environment. They are building the brain for the machines they are already powering with their G P Us.
Corn
So, instead of just selling the brain's neurons, they are writing the actual thoughts. It is a full-stack strategy, which is the second part of what Daniel wanted us to explore. You mentioned the Vera C P U and the Rubin platform. Explain the "why" behind this. Why spend all this energy building your own C P Us and your own models when you already own eighty percent of the G P U market? Is it just about squeezing more margin, or is there a technical wall they hit with generic hardware?
Herman
There is a massive technical wall called the "idle time" problem. In a traditional data center, you have your G P Us doing the heavy lifting, but they are often waiting on the C P U or the networking to feed them data. It is like having a world-class chef waiting for a slow delivery driver to bring the vegetables. The Vera C P U is an eighty-eight-core A R M-based processor specifically designed to eliminate those bottlenecks. By co-designing the Vera C P U with the Rubin G P U platform, NVIDIA can ensure that the data flows at the maximum possible speed. When you combine that with their own software, like NVIDIA Inference Microservices, or N I Ms—which rhymes with "rims"—you get what they call an AI Factory.
Corn
AI Factory. That sounds like a marketing term Jensen came up with while wearing a particularly shiny leather jacket. But I assume there is a real architectural meaning there.
Herman
The idea is that an AI model should not be treated like a piece of software you install and run. It should be treated like a utility. N I Ms are essentially pre-packaged containers that include the model, the necessary engines like Tensor R T, and the communication protocols. You just drop them into your infrastructure and they work. Because NVIDIA knows exactly how the Rubin chips handle memory and how the Vera C P U manages threads, they can tune the model to perform way better than if you were trying to run a generic open-source model on a generic cloud server. They are claiming this reduces the cost of inference by a factor of ten because you are not wasting any hardware cycles.
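A minimal sketch of what "drop them in and they work" can look like from the client side. NIM containers are documented to expose an OpenAI-compatible HTTP API; the port, model name, and placeholder key below are illustrative assumptions.

```python
# Sketch: querying a locally running NIM-style container through its
# OpenAI-compatible endpoint. Port, model id, and key are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="nvidia/nemotron-nano",  # assumed id; see client.models.list()
    messages=[{"role": "user", "content": "Flag anomalies in this log: ..."}],
)
print(response.choices[0].message.content)
```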
Corn
Which brings us to the big news from March eleventh. That twenty-six billion dollar investment in open-weight models. That is a staggering amount of money, even for a company with a market cap like theirs. It feels like a direct shot across the bow of OpenAI and Anthropic. If I am Sam Altman and I see my primary chip supplier suddenly dumping twenty-six billion dollars into models that anyone can download and run on their own hardware, I am starting to look for new friends.
Herman
And that is exactly what is happening. We are seeing this "co-opetition" dynamic. OpenAI recently started using Cerebras hardware for some of their lighter, specialized models. It is a clear signal that they want to diversify away from NVIDIA's total dominance. But NVIDIA's counter-move is brilliant. By investing in open-weight models, they are making it cheaper and easier for every other company on earth to build AI without needing a massive contract with a closed-lab provider. They are essentially saying, "Why pay a subscription to OpenAI when you can run a world-class Nemotron model on our hardware for a fraction of the cost?" They are commoditizing the intelligence to sell more of the infrastructure.
Corn
It is the classic platform play. Make the complement to your product as cheap as possible. If the models are free or open, the value of the hardware that runs them goes up. But I want to push back on the quality for a second. We know G P T-five-point-three is the king of consumer mindshare right now, and Claude four is the darling of the enterprise for its reasoning. Does a twelve-billion-parameter Nemotron model actually hold a candle to the giants, or is this just NVIDIA trying to be a "jack of all trades" and master of none?
Herman
If you are asking if it can write a screenplay as well as Claude four, maybe not. But if you are asking which model is better at being the "engine" for an autonomous agent, the answer changes. This is where the NemoClaw platform comes in, which they launched on March seventeenth. NemoClaw provides the security and the sandbox environment for agentic AI. These are systems that do not just chat; they act. They browse the web, they execute code, they manage databases. For that, you need high-speed reasoning and rock-solid reliability, not necessarily the ability to write poetry. NVIDIA is positioning their models as the industrial-grade choice. They are the workhorses.
Corn
I like that distinction. OpenAI is building a digital companion; NVIDIA is building a digital industrial worker. It is the difference between a high-end smartphone and a programmable logic controller on a factory floor. One is for fun and general productivity; the other is for keeping the lights on and the assembly line moving. But tell me more about this NemoClaw thing. The name sounds like something out of a spy movie. Is it just a security layer, or is it something more integrated?
Herman
It is a response to the biggest fear enterprises have about agentic AI, which is the "rogue agent" problem. If you give an AI model the ability to move money or change code, you need a way to wrap that agent in a protective layer. NemoClaw is a combination of hardware-level security and software sandboxing. It uses the Secure Enclave features in the new Rubin chips to ensure that even if the model's logic is compromised, it cannot access the rest of the system. It is a level of vertical integration that a software lab just cannot offer. They do not own the silicon, so they cannot guarantee security at the gate level. NVIDIA can.
Corn
So, they are effectively saying to big banks and healthcare companies, "Run your agents on our stack, because we are the only ones who can promise the model won't decide to start its own hedge fund with your data." It is a compelling pitch. But I am curious about the Vera C P U. We have seen A R M-based chips before, like Apple's M-series or Amazon's Graviton. What makes an eighty-eight-core chip from NVIDIA so special in this specific context?
Herman
The magic is in the memory architecture. In most systems, the C P U and G P U have separate pools of memory, and moving data between them is like trying to move a crowd through a single narrow door. Vera uses a unified memory architecture with the Rubin G P Us. They share the same high-bandwidth memory space. When a Nemotron model is processing a massive stream of real-time video, the C P U can handle the pre-processing and the G P U can handle the neural network inference without ever having to "copy" the data. It just sits there, and both chips can see it. That is how you get those sub-twenty-five-millisecond speech latencies. It is not just a faster chip; it is a shorter path.
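Some rough arithmetic on why removing the copy step matters. All numbers below are illustrative assumptions, not measured figures:

```python
# Back-of-the-envelope: host<->device copy cost for one video frame over
# a PCIe link, versus zero in a unified memory space. Numbers assumed.
FRAME_BYTES = 1920 * 1080 * 3      # one uncompressed 1080p RGB frame
PCIE_BYTES_PER_S = 64e9            # roughly PCIe 5.0 x16 (assumed)
COPIES = 2                         # host->device in, device->host out

copy_ms = COPIES * FRAME_BYTES / PCIE_BYTES_PER_S * 1e3
print(f"copy overhead per frame: {copy_ms:.3f} ms")  # ~0.19 ms
# Small per frame, but it scales with resolution, batch size, and every
# intermediate tensor that crosses the bus; within a 25 ms budget those
# copies add up, and a unified memory space removes the term entirely.
```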
Corn
It is the death of the bottleneck as we know it. I can see why Jensen is so confident. But let us talk about the risks here. If NVIDIA becomes the model provider, the hardware provider, and the software provider, they are basically the entire ecosystem. We have seen what happens when one company gets that much power. Does this move into models make them a competitor to their own best customers? If I am Microsoft or Google, and I am buying billions of dollars of chips from you, and then you release a model that competes with my own proprietary AI, things are going to get awkward at the company Christmas party.
Herman
It is already awkward. That is why we are seeing Google and Amazon racing to build their own AI chips, like the T P U and Trainium. The industry is in this weird state of total inter-dependence. Everyone is trying to build what their partners have. NVIDIA is building models because they want to control the end-user experience and ensure their hardware is utilized perfectly. Their customers are building chips because they do not want to be beholden to NVIDIA's pricing and supply chain. It is a high-stakes game of musical chairs, but NVIDIA currently owns most of the chairs.
Corn
And they are charging a lot for people to sit in them. I think the most interesting part of Daniel's prompt is the shift in how we define "AI Factory." We used to think of it as just a big room full of G P Us. Now, it is a co-designed organism. If you look at the Cosmos model for video generation, it is not just about pixels. It is about simulation. We talked about this in episode twelve twenty-four when we looked at C U D A dominance, but this is C U D A on steroids. It is moving from a programming language to a world-operating system.
Herman
That is a great way to put it. A world-operating system. When you look at the Cosmos release from this month, it is clear they are targeting the robotics sector. They want to be the engine for every autonomous vehicle and every humanoid robot. If you can simulate a million hours of robotic training in a virtual world that perfectly obeys the laws of physics, and then deploy that brain onto a chip that is co-designed for that specific model, you have a lead that is almost impossible to overcome. A software-only company cannot simulate the physics as efficiently because they do not control the math libraries at the hardware level.
Corn
It makes me wonder if the era of the general purpose model is ending. We have been obsessed with "one model to rule them all," but NVIDIA seems to be betting on a thousand specialized models, all tuned for specific industrial tasks. The Nemotron-Nano for edge devices, Canary for multilingual translation, Cosmos for physical reasoning. It is a much more fragmented, but perhaps more practical, vision of the future.
Herman
It is definitely more practical for the bottom line. Most businesses do not need an AI that can write a sonnet. They need an AI that can look at a piece of fruit on a conveyor belt and tell if it is bruised, or an AI that can listen to a customer service call and instantly translate it with zero lag. NVIDIA is building the tools for the "boring" AI that actually runs the economy. And by making those models open-weight, they are ensuring that their architecture becomes the global standard. It is the same strategy they used with C U D A. Make the software indispensable, and the hardware becomes a mandatory purchase.
Corn
It is a brilliant, if somewhat ruthless, strategy. I suppose we should look at what this means for the average developer or the small business owner. If you are starting an AI company today, are you better off building on top of a closed API like OpenAI, or should you be looking at this NVIDIA full-stack approach?
Herman
It depends on your scale. If you are building a simple wrapper app, the closed APIs are fine. But if you are building anything that requires real-time interaction, edge deployment, or heavy data processing, you have to look at the N I M approach. The ability to run a model like Nemotron-Nano locally on a Vera-based server gives you a level of control over your latency and your data privacy that you just cannot get from a cloud provider. Plus, with the twenty-six billion dollar investment in open weights, the quality of these "free" models is going to skyrocket. We are approaching a point where the base model is a commodity, and the value is all in the integration.
Corn
So, the takeaway for Daniel and our listeners is that NVIDIA is no longer just the engine room of the AI ship. They are the captain, the navigator, and they are starting to build the actual cargo. The transition to a full-stack architect is complete. It is a bold move, and as much as I like to tease you about your Poppleberry-level enthusiasm for eighty-eight-core C P Us, the technical reality is that they are creating a moat that is getting wider every day.
Herman
It is a moat made of silicon and optimized weights. And with the Rubin platform coming online, that moat is about to get a lot deeper. What I find most impressive is the speed. They are moving from an idea to a full-stack release in months, while traditional hardware cycles used to take years. They are operating at the speed of software but with the weight of hardware.
Corn
Well, I for one am looking forward to our future robotic overlords being powered by twenty-five-millisecond speech models. At least when they tell me to get to work, they will do it with very low latency. It has been a fascinating deep dive into the AI Factory world. We should probably wrap it up before you start reciting the cache hierarchy of the Vera C P U.
Herman
I could do that. It is actually quite interesting how they managed the L-three cache to support the transformer blocks...
Corn
No, Herman. We are ending it there. Save the cache talk for your diary. Thanks as always to our producer Hilbert Flumingtop for keeping the gears turning behind the scenes. And a huge thank you to Modal for sponsoring the show. They provide the serverless G P U credits that allow us to run our own experiments and power the generation of this very podcast. If you are a developer looking for a way to run your own N I Ms or experiment with the latest Nemotron weights without the headache of managing infrastructure, Modal is the place to do it.
Herman
This has been My Weird Prompts. If you are enjoying these deep dives into the changing landscape of AI, please consider leaving us a review on your favorite podcast app. It really does help other people find the show and keeps us motivated to keep digging into Daniel's excellent prompts.
Corn
Find us at myweirdprompts dot com for the full archive and all the ways to subscribe. We will be back soon with another exploration of the weird and wonderful world of AI. Until then, keep an eye on those leather jackets. You never know what Jensen is going to announce next.
Herman
Goodbye, everyone.
Corn
See ya.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.