AI

Artificial intelligence, machine learning, and everything LLM

873 episodes Page 13 of 44

#2178: How to Actually Evaluate AI Agents

Frontier models score 80% on one agent benchmark and 45% on another. The difference isn't the model—it's contamination, scaffolding, and how the te...

ai-agents benchmarks ai-safety

agents-automation

Apr 12

#2177: Skip Fine-Tuning: Shape LLMs With Alignment Alone

Can you build a personalized LLM by skipping traditional fine-tuning and using only post-training alignment methods like DPO and GRPO? We break dow...

fine-tuning ai-alignment gpu-acceleration

inference-training

Apr 12

#2175: Let Your AI Argue With Itself

What happens when you let multiple AI personas debate each other instead of asking one model one question? A deep dive into synthetic perspective e...

prompt-engineering reasoning-models ai-reasoning

agents-automation

Apr 12

#2174: Role-Playing as Orchestration

How a role-playing protocol from NeurIPS 2023 became one of AI's most underrated agent frameworks—and what happens when you scale it to a million a...

ai-agents prompt-engineering ai-orchestration

agents-automation

Apr 12

#2173: Inside MiroFish's Agent Simulation Architecture

MiroFish generates thousands of AI agents with distinct personalities to predict social dynamics. But research reveals a critical flaw: LLM agents ...

ai-agents knowledge-graphs ai-reasoning

agents-automation

Apr 12

#2172: Council of Models: How Karpathy Built AI Peer Review

Andrej Karpathy's llm-council uses anonymized peer review to make language models evaluate each other fairly—but can it really suppress model bias?

large-language-models ai-reasoning ai-alignment

model-architecture

Apr 12

#2171: How IQT Labs Built a Wargaming LLM (Then Archived It)

A deep code review of Snowglobe, IQT Labs' open-source LLM wargaming system that ran real national security simulations before being archived. What...

ai-agents large-language-models military-strategy

agents-automation

Apr 12

#2170: Pricing Agentic AI When Nothing's Predictable

How do you charge fixed prices for systems that operate in fundamental uncertainty? Consultants are discovering frameworks that work—but they requi...

ai-agents ai-safety prompt-engineering

agents-automation

Apr 12

#2169: How Enterprises Are Rethinking Agent Frameworks

Twelve major agentic AI frameworks exist—yet many serious developers avoid them entirely. What patterns emerge in real enterprise adoption?

ai-agents ai-safety software-development

agents-automation

Apr 12

#2168: What Serious Agentic AI Developers Actually Need to Know

Python, TypeScript, LangGraph, and the frameworks reshaping how agents work. A technical map of the skills and concepts that separate prototypes fr...

ai-agents ai-orchestration software-development

agents-automation

Apr 12

#2167: Sync vs. Async: Architecting Agents for Scale

Why most enterprise AI agents fail in production has less to do with models and more to do with whether they're built synchronously or asynchronously.

ai-agents model-context-protocol distributed-systems

agents-automation

Apr 12

#2166: Code vs. Canvas: How Developers Pick Their Tools

LangGraph or Flowise? The honest answer isn't obvious. Developers gain speed and integrations with visual builders—but lose version control, testin...

ai-agents software-development api-integration

agents-automation

Apr 12

#2165: Strip Your Agent to Bash

The frameworks matter less than you think. What separates a working agent from a failing one is the harness—the orchestration, memory, and tool des...

ai-agents ai-orchestration prompt-engineering

agents-automation

Apr 12

#2164: Why Bigger Context Windows Don't Fix Attention

Frontier models have million-token context windows, but attention degrades well before you hit the limit. New research reveals why bigger isn't bet...

context-window ai-reasoning ai-memory

model-architecture

Apr 12

#2163: Designing Autonomy Boundaries for AI Agents

Production data reveals a surprising truth: fully autonomous AI agents waste 98% of their context window on tool descriptions. Here's why the indus...

ai-agents ai-orchestration inference-parameters

agents-automation

Apr 12

#2162: When Knowledge Work Stops Being Safe

The knowledge economy promised safety from automation. Then AI arrived. Here's how we got here—and why the disruption this time is different.

ai-safety workforce-automation future-of-work

business-enterprise

Apr 12

#2160: Claude's Latency Profile and SLA Guarantees

Claude is measurably slower than competitors—and Anthropic's SLA promises are even thinner than the latency numbers suggest. What enterprises actua...

latency ai-inference anthropic

inference-training