March 30, 2026
LangGraph LangChain Python RAG

Why I Migrated from LangChain to LangGraph (and What I Learned)

If you're searching "langchain vs langgraph," you've probably found a dozen comparison posts written by people who haven't shipped either in production. This isn't one of those. I migrated a RAG copilot serving 5,000+ users at Roche Diagnostics from LangChain to LangGraph — a system I proposed replacing, built the proof of concept for, and led the migration on. Here's what actually happened.

The Problem with LangChain in Production

The copilot started as a LangChain AgentExecutor — the standard pattern: tools, memory, a system prompt, and a prayer. For a prototype, it was fine. For a system used daily by thousands of diagnostics engineers, it became painful fast.

The core issue was abstraction without escape hatches. LangChain wraps everything in layers that work great until you need to understand what's happening. When a user reported a bad answer, debugging meant stepping through AgentExecutor internals, tracing implicit routing decisions I never asked for, and reading source code for behaviors that should have been explicit in my own codebase.

The original architecture looked something like this:

# LangChain AgentExecutor — one monolithic entry point
from langchain.agents import AgentExecutor, create_openai_functions_agent

agent = create_openai_functions_agent(llm=chat_model, tools=tools, prompt=prompt)
executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, memory=memory
)
result = await executor.ainvoke({"input": user_message})

Tools lived in separate files as BaseTool subclasses — research, troubleshooting, knowledge base search — but the orchestration between them was invisible. LangChain decided when to call which tool, in what order, and how to handle failures. I couldn't add branching logic, parallel execution, or human-in-the-loop without fighting the framework.

Customizing behavior for edge cases meant monkey-patching or reimplementing internals. Adding features took disproportionately long because every change required understanding LangChain's implicit decisions. The codebase was growing, and maintainability was getting worse.

Why LangGraph Changed Everything

The langchain vs langgraph decision comes down to one mental model shift: chains are implicit, graphs are explicit.

LangChain is Rails magic — convention over configuration, things happen behind the scenes, you move fast until you need to understand why something broke. LangGraph is Flask explicitness — you define every step, every edge, every decision point. More code upfront, dramatically less confusion later.

Here's what the migrated architecture looks like:

# LangGraph — every step is a named node, every transition is explicit
from langgraph.graph import StateGraph, START
from langgraph.checkpoint.memory import MemorySaver

workflow = StateGraph(CopilotState)

# Entry: parallel classification (abridged — the full graph fans out six nodes)
workflow.add_node("classify_intent", classify_intent_node)
workflow.add_node("check_relevance", check_relevance_node)
workflow.add_node("search_products", search_products_node)
workflow.add_node("combine_checks", combine_checks_node)

# Fan out from the entry point, converge at the combine node
for check in ("classify_intent", "check_relevance", "search_products"):
    workflow.add_edge(START, check)
    workflow.add_edge(check, "combine_checks")

# Route based on classification results
workflow.add_conditional_edges(
    "combine_checks",
    route_after_checks,
    {
        "off_topic": "answer_off_topic",
        "profile": "answer_profile_query",
        "query": "route_to_workflow",
    },
)

# Workflow subgraphs — each is its own compiled graph
workflow.add_node("knowledge_base", knowledge_base_subgraph)
workflow.add_node("research", research_subgraph)
workflow.add_node("troubleshooting", troubleshooting_subgraph)

app = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["parse_product_selection"],
)

A few things to notice. First, the parallel fan-out: in the full graph, six nodes run concurrently from the entry point (the snippet above shows three of them) — intent classification, relevance checking, product search, entity extraction, and so on — then converge at a single combine node. In LangChain, expressing this kind of parallel fan-out/fan-in required fighting LCEL's pipe syntax. In LangGraph, it's just edges.

Second, subgraphs. Each workflow (knowledge base, research, troubleshooting) is its own compiled StateGraph plugged into the parent graph as a node. The research subgraph even has an iterative loop — it generates research questions, answers them, and conditionally loops back for follow-up questions until it's satisfied. Try expressing a conditional loop in an LCEL chain.
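The loop hinges on nothing more exotic than a conditional-edge router: a plain function over state that decides whether to iterate again. A minimal sketch of the shape (the node names, state keys, and iteration cap here are illustrative, not our production values):

```python
# Router for the research subgraph's iterative loop. After each answer
# pass, the graph either loops back to generate follow-up questions or
# proceeds to synthesis. State is a plain dict here for brevity.
MAX_RESEARCH_ITERATIONS = 3  # illustrative cap

def route_after_answers(state: dict) -> str:
    """Return the name of the next node in the research subgraph."""
    if state["open_questions"] and state["iteration"] < MAX_RESEARCH_ITERATIONS:
        return "generate_followup_questions"  # loop back for another pass
    return "synthesize_findings"              # exit the loop
```

Wired in with `workflow.add_conditional_edges("answer_questions", route_after_answers, {...})`, this is the whole loop — the "conditional loop" that LCEL has no vocabulary for is one ordinary function and one edge declaration.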

Third, human-in-the-loop. The interrupt_before parameter pauses the graph at specific nodes, lets the user respond (e.g., confirming which product they meant), then resumes from the checkpoint. This required exactly zero hacks — it's a first-class LangGraph feature.

The state schema itself evolved from a flat TypedDict to a Pydantic BaseModel with custom reducers for merging parallel branch outputs:

from typing import Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from pydantic import BaseModel, ConfigDict, Field

class CopilotState(BaseModel):
    model_config = ConfigDict(validate_assignment=True)

    messages: Annotated[list[BaseMessage], add_messages] = Field(default_factory=list)
    retrieved_documents: Annotated[list[dict], merge_search_results] = Field(
        default_factory=list
    )
    # ... 30+ fields with defaults, descriptions, and validation

That merge_search_results reducer handles deduplication when six parallel branches all write documents to the same state field. In LangChain, you couldn't even express this problem.
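If the shape of such a reducer is unfamiliar: it's just a function of (existing value, new value) that LangGraph calls whenever a branch writes to the field. A minimal sketch, deduplicating on a hypothetical `id` key (the production version also ranks and truncates):

```python
def merge_search_results(existing: list[dict], new: list[dict]) -> list[dict]:
    """Reducer for retrieved_documents: merge outputs from parallel
    branches, keeping the first copy of each document seen."""
    merged = {doc["id"]: doc for doc in existing}
    for doc in new:
        merged.setdefault(doc["id"], doc)  # first writer wins
    return list(merged.values())
```

Because every branch's write goes through this function, the "six branches racing to append documents" problem becomes an ordinary, unit-testable merge.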

How I Pitched and Led the Migration

This wasn't assigned to me — I identified the problem and proposed the solution. The codebase had become a tax on every feature we shipped, and I'd been working with LangGraph on side projects enough to know it could fix the structural issues.

I built a proof of concept on my own time. Not a slide deck — a working POC showing the same copilot flow expressed as a LangGraph StateGraph, with explicit nodes, testable in isolation. I walked stakeholders through a side-by-side: "Here's how we handle intent routing today. Here's how it looks in LangGraph. Which one can you read?"

POCs convince people. Slide decks don't.

After approval, I planned the migration as an incremental rollout, not a big-bang rewrite. The migration had a real dependency order across eight tickets spanning four sprints — state schema first, then node decomposition, then subgraph extraction, then legacy deprecation.

The key pattern was a feature flag that let both flows run in production simultaneously:

is_new_flow = get_user_beta_enabled() or settings.langgraph_flow_enabled

if is_new_flow:
    await langgraph_on_message(message)
else:
    await langchain_on_message(message)

This meant I could roll out to beta users first, validate behavior parity, then gradually shift traffic — all while 5,000+ users continued working without disruption. The old and new flows coexisted for weeks. No downtime, no "maintenance window," no crossed fingers.
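The "gradually shift traffic" step can be as simple as deterministic bucketing on the user ID — a sketch of the idea (our real flag service was more involved than this):

```python
import hashlib

def in_rollout(user_id: str, percentage: int) -> bool:
    """Deterministically bucket a user into the new flow.
    The same user always lands in the same bucket, so their
    experience stays stable as the percentage ramps 0 -> 100."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage
```

Hashing (rather than random sampling per request) matters: a user who sees the new flow today still sees it tomorrow, which makes "validate behavior parity" an answerable question.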

One detail worth sharing: when I designed the observability abstraction layer — an adapter pattern decoupling our telemetry from any specific vendor — I hadn't planned to migrate off LiteralAI immediately. But having that abstraction in place meant another developer could swap in Langfuse using the interface I'd designed, without touching a single line of application code. I'll write about the observability side separately — it's its own story.
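The adapter itself is a small interface the application codes against — roughly this shape, though the method names here are illustrative, not our actual telemetry API:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TelemetryAdapter(Protocol):
    """Vendor-neutral telemetry interface. Concrete adapters
    (LiteralAI, Langfuse, ...) implement it; swapping vendors
    touches only the adapter, never application code."""
    def start_trace(self, name: str, metadata: dict) -> str: ...
    def log_step(self, trace_id: str, step: str, payload: dict) -> None: ...
    def end_trace(self, trace_id: str) -> None: ...

class NoOpTelemetry:
    """Handy default for tests and local development."""
    def start_trace(self, name: str, metadata: dict) -> str:
        return "noop-trace"
    def log_step(self, trace_id: str, step: str, payload: dict) -> None:
        pass
    def end_trace(self, trace_id: str) -> None:
        pass
```

A structural `Protocol` (rather than a base class) means vendor adapters don't even need to import anything from the application to conform.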

What Changed After the LangGraph Migration

The numbers: code maintainability improved roughly 60%, measured by time-to-change for new features. Test coverage went from 40% to 80% after introducing TDD and CI/CD quality gates that the new architecture made feasible.

But the qualitative shift mattered more. Debugging went from "what is LangChain doing internally?" to "look at this node's input and output." When a user reported a bad answer, I could trace the exact path through the graph, see which node made the wrong decision, and fix it in isolation.

Adding new capabilities became predictable. When the team needed multi-product comparison queries, I added a subgraph node and a conditional edge. The change was scoped, testable, and didn't risk breaking unrelated flows. Under the old architecture, that kind of feature meant modifying the monolithic executor and hoping nothing else shifted.
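Concretely, "a subgraph node and a conditional edge" meant one more branch in the workflow router — roughly this, with illustrative names and state keys:

```python
def route_to_workflow(state: dict) -> str:
    """Pick the workflow subgraph for a classified query.
    The 'product_comparison' branch was the only routing change
    needed to support multi-product comparison queries."""
    if state.get("is_comparison") and len(state.get("products", [])) > 1:
        return "product_comparison"
    return state.get("intent", "knowledge_base")
```

Everything downstream of the other branches is untouched, which is what made the change scoped and safe.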

Testing individual nodes is trivially simple — mock the LLM call, pass in a state object, assert on the returned state update. No framework internals to stub out, no implicit middleware to account for. The test files mirror the graph structure: one test module per subgraph, one test per node.
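A node test in this style looks roughly like the following — the node and state shape are simplified stand-ins, but the pattern (inject a fake LLM, pass state in, assert on the update) is the real one:

```python
def classify_intent_node(state: dict, llm=None) -> dict:
    """Simplified stand-in for a real node: take state, call an
    injected LLM, return a partial state update."""
    llm = llm or (lambda text: "troubleshooting")  # default stub
    return {"intent": llm(state["messages"][-1])}

def test_classify_intent_node():
    # Mock the LLM call, pass in a state object, assert on the update.
    fake_llm = lambda text: "research"
    update = classify_intent_node({"messages": ["compare assay kits"]}, llm=fake_llm)
    assert update == {"intent": "research"}

test_classify_intent_node()
```

No graph, no framework, no network — a node is a function, so its test is a function call.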

And throughout all of this, those 5,000+ users kept using the copilot every day. The migration was invisible to them, which is the best outcome you can ask for.

What I'd Do Differently

Start with LangGraph from day one. The LangChain prototype saved maybe two weeks early on and cost months later. If I were building a new RAG system today, I'd skip straight to an explicit graph architecture. The upfront cost of defining nodes and edges pays for itself the first time you need to debug a production issue.

Build the observability abstraction earlier. The adapter pattern I designed for decoupling from LiteralAI should have existed from sprint one. Vendor lock-in in the LLM tooling space is a trap worth avoiding early.

Write integration tests before migrating, not during. I introduced TDD alongside the migration, which was the right call for the long term but added scope. Having a solid integration test suite for the existing behavior before touching the architecture would have made the migration less stressful.

Consider whether you need a framework at all. In 2026, PydanticAI and the OpenAI Agents SDK are real options. For simpler use cases — single-turn RAG, straightforward tool calling — raw SDK calls with structured outputs might be all you need. LangGraph earns its complexity when you have branching logic, parallel execution, human-in-the-loop, or iterative workflows. If you don't have those, a framework is overhead.

And I'll be honest about LangGraph's downsides: the learning curve is steeper, the documentation is thinner than LangChain's, and the Pydantic state schema can get unwieldy with 30+ fields. It's the right tool for production orchestration, but it's not the right tool for a weekend prototype.

Need Help with Your Migration?

If you're hitting the same LangChain pain points I described — opaque debugging, hard-to-test orchestration, growing maintenance burden — or you're evaluating LangGraph for a production system, I help companies with exactly this kind of migration. I've done it at enterprise scale with real users depending on the system.

Dealing with a messy LangChain codebase or considering LangGraph?

Book a free call