Engineering

Build-in-public dashboard. Real architecture decisions, real progress, real trade-offs. No post-hoc polish — just what we're actually building.

Build Progress

Where each system stands right now. Numbers are rough estimates, not KPIs.

Last updated: April 2026

Infrastructure (Active, 85%)
Neo4j + Qdrant + MongoDB containerized, multi-tenant namespacing live

Core Pipeline (Active, 80%)
LangGraph orchestration, AST parsing (Python + JS/TS), graph ingestion

VS Code Plugin (Active, 95%)
SSE streaming, chat interface, and voice input working end-to-end

API Gateway (In Progress, 75%)
FastAPI multi-domain routing, auth middleware, rate limiting in progress

Payment & Billing (Planned, 10%)
Stripe integration + waitlist-to-paid conversion flow, not yet started

Deployment (In Progress, 40%)
Docker Compose staging environment up, production infra not yet finalized

Architecture Decision Records

The choices that define how Cerebro is built. Context, decision, trade-offs — documented as they happened.

ADR-001 Accepted Jan 2026

Neo4j as primary knowledge graph store

Context

We need to store a code knowledge graph: functions, classes, modules, and their relationships (calls, imports, inherits, implements). The query pattern is traversal-heavy — "find all callers of this function up to depth 3" — not aggregation-heavy.

Decision

Use Neo4j (Community Edition, self-hosted) as the primary store for the code knowledge graph. Each codebase gets a logical namespace via a repo_id property on every node. Cypher is the query language.
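
To make the query pattern concrete, here is a minimal sketch of a depth-bounded "callers of" query scoped by repo_id. The Function label, the CALLS relationship type, and the name property are assumptions for illustration; the actual graph schema is not described on this page. Cypher does not accept a parameter for variable-length bounds, so the sketch validates the depth and formats it in, while all values go through query parameters.

```python
from typing import Any


def callers_query(
    repo_id: str, function_name: str, max_depth: int = 3
) -> tuple[str, dict[str, Any]]:
    """Build a Cypher query for transitive callers, scoped to one repo."""
    # Variable-length bounds cannot be query parameters in Cypher, so the
    # depth is range-checked and formatted in. Values stay parameterized.
    if not isinstance(max_depth, int) or not 1 <= max_depth <= 10:
        raise ValueError("max_depth must be an integer between 1 and 10")
    query = (
        # Hypothetical schema: (:Function)-[:CALLS]->(:Function)
        "MATCH (target:Function {repo_id: $repo_id, name: $name}) "
        f"MATCH (caller:Function {{repo_id: $repo_id}})-[:CALLS*1..{max_depth}]->(target) "
        "RETURN DISTINCT caller.name AS caller"
    )
    return query, {"repo_id": repo_id, "name": function_name}
```

The returned pair can be handed to whichever Neo4j driver call executes parameterized Cypher; the upper bound of 10 is an arbitrary guard, not a measured limit.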

Why

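  • Native graph traversal follows stored relationships instead of re-joining tables at each hop, which is typically far faster than SQL JOINs once queries go deeper than two or three hops (not yet benchmarked on our data)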
  • Cypher is expressive for code relationship queries ("find all functions this module transitively depends on")
  • APOC library gives us path algorithms (shortest path, all paths) out of the box
  • Community Edition is free and self-hosted, which avoids licensing cost and managed-vendor lock-in during the early product phase

Trade-offs

Benefits
  • Traversal queries are natural and fast
  • Schema-flexible — easy to add new node/relationship types
  • Graph visualization tools available
Costs
  • Community Edition has no cluster support — single node only
  • Cypher learning curve for the team
  • Memory-heavy at large repo scale (not yet validated)

Alternatives considered

  • PostgreSQL with pg_graph: Familiar but graph traversal via recursive CTEs is slow and verbose at depth >3
  • Amazon Neptune: Managed but expensive and vendor lock-in at pre-revenue stage
  • Dgraph: Open source, but it has a smaller community and uses its own query language (DQL) rather than Cypher, so Cypher-compatible tooling is limited
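
For comparison with the relational alternative, the sketch below expresses the same "callers up to depth 3" lookup as a recursive CTE. It uses SQLite from the Python standard library as a stand-in for PostgreSQL (both support WITH RECURSIVE); the calls table and its columns are hypothetical.

```python
import sqlite3

# Recursive CTE: start from direct callers, then walk one hop per iteration.
CALLERS_SQL = """
WITH RECURSIVE callers(name, depth) AS (
    SELECT caller, 1 FROM calls
    WHERE repo_id = :repo_id AND callee = :target
    UNION
    SELECT c.caller, callers.depth + 1
    FROM calls AS c
    JOIN callers ON c.callee = callers.name
    WHERE c.repo_id = :repo_id AND callers.depth < :max_depth
)
SELECT DISTINCT name FROM callers ORDER BY name
"""


def find_callers(conn, repo_id, target, max_depth=3):
    """Return the names of all callers of `target` within `max_depth` hops."""
    params = {"repo_id": repo_id, "target": target, "max_depth": max_depth}
    return [name for (name,) in conn.execute(CALLERS_SQL, params)]


def demo_connection():
    """In-memory table with a call chain a -> b -> c -> d -> e in repo r1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (repo_id TEXT, caller TEXT, callee TEXT)")
    conn.executemany(
        "INSERT INTO calls VALUES (?, ?, ?)",
        [("r1", "a", "b"), ("r1", "b", "c"), ("r1", "c", "d"),
         ("r1", "d", "e"), ("r2", "x", "e")],
    )
    return conn
```

Calling find_callers(demo_connection(), "r1", "e") returns the three nearest callers and leaves out both the depth-4 caller and the caller from the other repo. The query works, but the depth bound, tenant filter, and deduplication all have to be spelled out by hand.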

ADR-002 Accepted Jan 2026

BYOK — users bring their own LLM API key

Context

Cerebro needs to call LLM APIs (OpenAI, Anthropic, etc.) to generate code and answer questions. The question is whether we act as a proxy (calling the API with our own keys and billing users for token usage) or require users to supply their own API keys (BYOK).

Decision

Bring Your Own Key (BYOK). Users configure their own API keys in the VS Code plugin settings. Cerebro never stores keys server-side — they are used exclusively within the user's local plugin context and never transmitted to our backend.

Why

  • Eliminates LLM cost as a COGS line item — our pricing covers context infrastructure, not token pass-through
  • Many developers already hold OpenAI or Anthropic API keys from other tools that accept user-supplied keys (Cursor, for example), so setup friction is low for them
  • No regulatory risk around storing API keys or proxying model calls on behalf of users
  • Users get the exact model they want (GPT-4o, Claude Sonnet, etc.) without us managing model upgrades

Trade-offs

Benefits
  • Gross margins are high — no token costs
  • No vendor dependency on a single LLM provider
  • Simpler compliance posture (we handle code context, not model calls)
Costs
  • Higher setup friction — users must have an API key
  • We can't optimize prompt costs on behalf of users
  • Harder to offer a free tier with limited model calls

Alternatives considered

  • Proxy model (our keys, metered billing): Higher operational complexity, token cost management, and PCI/data compliance overhead at pre-revenue stage

ADR-003 Accepted Feb 2026

LangGraph for deterministic AI pipeline orchestration

Context

Cerebro's core intelligence is a multi-step pipeline: parse query → retrieve from Neo4j → retrieve from Qdrant → merge context → call LLM → stream response. This pipeline needs to be deterministic, debuggable, and interruptible (for streaming). Simple sequential code gets messy when steps have conditional branches, retries, and partial state.

Decision

Use LangGraph (from LangChain) to model the pipeline as an explicit state machine. Each step is a node; edges define transitions. State is typed (TypedDict). The graph is compiled once at startup and re-used per request. Streaming is handled via LangGraph's built-in async generator support.

Why

  • Explicit state makes debugging straightforward — you can see exactly what state entered each node
  • Conditional edges model "if vector search returns nothing, expand to graph traversal" cleanly
  • LangGraph's checkpointing allows resumable pipelines for long-running ingestion jobs
  • Async streaming is first-class, not a bolt-on
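
The typed-state and conditional-edge ideas above can be illustrated without any dependencies. The sketch below is a hand-rolled state machine, not LangGraph's API: the node names, the PipelineState fields, and the stubbed search logic are all assumptions made for illustration.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict):
    query: str
    vector_hits: list[str]
    graph_hits: list[str]
    context: str


def vector_search(state: PipelineState) -> PipelineState:
    # Stand-in for a vector lookup: only queries mentioning "parse" hit.
    hits = ["parse_file"] if "parse" in state["query"] else []
    return {**state, "vector_hits": hits}


def graph_traversal(state: PipelineState) -> PipelineState:
    # Stand-in for a graph query used as the fallback path.
    return {**state, "graph_hits": ["module:" + state["query"]]}


def merge_context(state: PipelineState) -> PipelineState:
    merged = state["vector_hits"] + state["graph_hits"]
    return {**state, "context": ", ".join(merged)}


def route_after_vector(state: PipelineState) -> str:
    # Conditional edge: fall back to graph traversal on an empty result.
    return "merge_context" if state["vector_hits"] else "graph_traversal"


NODES: dict[str, Callable[[PipelineState], PipelineState]] = {
    "vector_search": vector_search,
    "graph_traversal": graph_traversal,
    "merge_context": merge_context,
}
EDGES: dict[str, Callable[[PipelineState], str]] = {
    "vector_search": route_after_vector,
    "graph_traversal": lambda state: "merge_context",
    "merge_context": lambda state: "END",
}


def run(query: str) -> tuple[PipelineState, list[str]]:
    """Run the machine and return the final state plus the visited nodes."""
    state: PipelineState = {
        "query": query, "vector_hits": [], "graph_hits": [], "context": "",
    }
    current, visited = "vector_search", []
    while current != "END":
        visited.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    return state, visited
```

The visited list is what makes this style easy to debug: each run records which branch was taken and what state each node received.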

Trade-offs

Benefits
  • Pipeline is a first-class artifact — readable, testable, visualizable
  • State typing lets a static type checker catch state-shape bugs before the pipeline runs
  • Handles retries and fallbacks naturally
Costs
  • LangGraph is a relatively young library — API surface changes between versions
  • Adds a LangChain ecosystem dependency (which is large)
  • Overkill for simple linear pipelines

Alternatives considered

  • Raw async Python: Simpler for linear flows but gets complicated fast with branching and retries
  • Prefect / Airflow: Better for batch data pipelines, not designed for low-latency per-request inference
  • Haystack pipelines: Too opinionated about the RAG pattern, harder to integrate with our custom Neo4j traversal step

ADR-004 Accepted Feb 2026

Logical namespace isolation for multi-tenancy (not physical DB per user)

Context

Cerebro is a multi-tenant product — each user's codebase must be isolated from other users'. The question is whether to use physical isolation (separate database instances per user) or logical isolation (a shared instance with namespace/tenant-ID filtering on all queries).

Decision

Logical namespace isolation. Every Neo4j node and Qdrant vector has a user_id property. Every query is scoped by this property. In Qdrant, this is a filter condition on every collection search. In Neo4j, the repo_id property gates every Cypher traversal. A single shared Neo4j instance and a single Qdrant collection serve all users in the current phase.
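
As a minimal sketch of the Qdrant side, the helper below builds a search request body that always carries the tenant filter. The dictionary mirrors the shape of Qdrant's filter JSON (a must clause with a match condition on a payload key); the function name and defaults are hypothetical, and the exact request fields should be checked against the client version in use.

```python
from typing import Any


def scoped_search_body(
    user_id: str, query_vector: list[float], top_k: int = 10
) -> dict[str, Any]:
    """Build a vector-search request body that is always tenant-scoped."""
    # Refuse to build an unscoped search: a missing tenant is a bug,
    # not a reason to search across all users.
    if not user_id:
        raise ValueError("user_id is required for every search")
    return {
        "vector": query_vector,
        "limit": top_k,
        "filter": {
            "must": [{"key": "user_id", "match": {"value": user_id}}],
        },
        "with_payload": True,
    }
```

Routing every search through one builder like this keeps the filter out of individual call sites, which is where it tends to get forgotten.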

Why

  • Physical isolation (DB per user) is operationally expensive — provisioning, monitoring, backups multiply with user count
  • At early user counts (<1000), logical isolation is secure as long as every query carries the tenant filter, and it is dramatically simpler to operate
  • Migration path to physical isolation exists if a high-value enterprise customer requires it

Trade-offs

Benefits
  • Single infrastructure footprint — lower cost, simpler ops
  • Centralized monitoring and backup
  • Faster to ship — no per-user provisioning logic
Costs
  • Noisy neighbor risk at scale (one heavy user affects others)
  • Query logic must always include tenant filter — missing it is a security bug
  • May not satisfy enterprise compliance requirements without physical isolation option
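
One way to reduce the "missing tenant filter" risk is a guard in front of the Neo4j adapter. The sketch below is an assumption about how such a guard could look, not a description of the existing code; the names are hypothetical. It only checks that the query text references $repo_id, so it is a safety net rather than proof that every node in the pattern is filtered.

```python
import re
from typing import Any

_TENANT_PARAM = re.compile(r"\$repo_id\b")


class MissingTenantFilter(Exception):
    """Raised when a query is not visibly scoped to a tenant."""


def scoped_cypher(
    query: str, params: dict[str, Any], repo_id: str
) -> tuple[str, dict[str, Any]]:
    """Return the query with the session's repo_id forced into its params."""
    if not _TENANT_PARAM.search(query):
        raise MissingTenantFilter("query does not reference $repo_id")
    # Callers may not override the tenant derived from the session.
    if "repo_id" in params and params["repo_id"] != repo_id:
        raise MissingTenantFilter("supplied repo_id does not match the session")
    return query, {**params, "repo_id": repo_id}
```

Because the tenant value is injected here, handlers never need to pass it themselves, and a query written without the filter fails loudly in tests instead of leaking data in production.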

Alternatives considered

  • Physical DB per user: Secure by default but operationally unscalable before we have dedicated infra team
  • Schema-level isolation in PostgreSQL: Viable for relational data, not applicable to Neo4j or Qdrant

ADR-005 Accepted Mar 2026

Clean Architecture + DDD with 4 bounded contexts

Context

The backend spans multiple distinct concerns: parsing source code, managing the knowledge graph, serving the AI pipeline, and handling user accounts. Without explicit architecture boundaries, these tend to collapse into a monolithic service where Neo4j queries appear inside auth handlers and vice versa.

Decision

Apply Clean Architecture (ports and adapters) with Domain-Driven Design bounded contexts. The four contexts are: Ingestion (parse + ingest code into the graph), Intelligence (the LangGraph query pipeline), Identity (auth and user management), and Billing (plans and subscriptions). Each context has its own domain model and communicates with others through defined interfaces — never direct imports across boundaries.

Why

  • Ingestion and Intelligence can evolve independently — swapping the AST parser doesn't affect the query pipeline
  • Testability: each context can be tested with mocked ports, no need to spin up Neo4j for auth tests
  • Billing is isolated from product logic — easier to swap Stripe for another provider later

Trade-offs

Benefits
  • Clear ownership boundaries — no "who owns this code?" ambiguity
  • Changes in one context don't ripple unexpectedly into others
  • Each context is independently deployable as a microservice later if needed
Costs
  • More upfront structure than a simple Flask app would need
  • Cross-context calls require going through defined interfaces, which adds ceremony
  • Higher cognitive overhead for new contributors

Alternatives considered

  • Single FastAPI app, no layers: Faster to start, but we saw this become unmaintainable once the LangGraph pipeline grew complex
  • Full microservices from day one: Too much operational overhead (service discovery, inter-service auth) before product-market fit

Tech Stack

Every tool chosen for a reason. The major trade-offs are documented in the ADRs above.

Neo4j
Knowledge Graph Store

Primary store for the code knowledge graph. Functions, classes, modules and their relationships (calls, imports, inherits) as nodes and edges. Cypher queries for graph traversal.

Qdrant
Vector Search

Semantic code search via embedding vectors. Each function and class is embedded and stored. At query time, the top-K most semantically similar code entities are retrieved before graph traversal.

LangGraph
Pipeline Orchestration

Models the query pipeline as an explicit state machine: parse → retrieve → merge context → stream. Typed state, conditional edges, async streaming, and checkpointing for long ingestion jobs.

FastAPI
API Gateway

Async Python API gateway with automatic OpenAPI docs, Pydantic validation, and Server-Sent Events (SSE) for streaming responses to the VS Code plugin. Multi-domain routing per bounded context.
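
For readers unfamiliar with SSE, the wire format is plain text: optional event: lines, one data: line per line of payload, and a blank line to end each event. The sketch below frames tokens that way; the "token" and "done" event names and the JSON payload shape are assumptions, not the plugin's actual protocol.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event as text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one framed event per token, then a final completion event."""
    for token in tokens:
        yield sse_event(json.dumps({"token": token}), event="token")
    yield sse_event("{}", event="done")
```

A generator like stream_tokens can be returned from a FastAPI route through StreamingResponse with media_type="text/event-stream".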

MongoDB
Document Store

Stores unstructured metadata about code entities — raw file contents, AST snapshots, ingestion job history, and per-user repository configurations. Flexible schema as the data model evolves.

Redis
Cache & Queue

Query result caching to reduce Neo4j and Qdrant load on repeated lookups. Also used as the task queue for background ingestion jobs (via Redis Streams), avoiding a separate message broker.

Supabase
Auth & Billing

Managed PostgreSQL + Auth platform. Handles GitHub OAuth, user sessions, and subscription state. Also powers the engineering page comment system with Row Level Security.

LibCST + Babel
AST Parsers

LibCST for Python (preserves whitespace and comments for lossless round-trips), Babel for JS/TS (handles JSX and decorators). Output is a normalized entity model ingested into Neo4j and Qdrant.
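
To show what a normalized entity model might contain, the sketch below extracts functions and classes from Python source. It uses the standard library ast module as a stand-in, not LibCST, and the Entity fields are hypothetical. Parameters are recorded by name only, so default values do not change the representation.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str                 # "function" or "class"
    qualified_name: str       # dotted path, e.g. "pkg.mod.Class.method"
    params: tuple[str, ...]   # parameter names only, defaults dropped
    lineno: int


def extract_entities(source: str, module: str) -> list[Entity]:
    """Collect functions and classes, including nested ones, in source order."""
    entities: list[Entity] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = f"{prefix}.{child.name}"
                args = child.args
                # *args and **kwargs are omitted to keep the sketch short.
                params = tuple(
                    a.arg for a in args.posonlyargs + args.args + args.kwonlyargs
                )
                entities.append(Entity("function", name, params, child.lineno))
                visit(child, name)
            elif isinstance(child, ast.ClassDef):
                name = f"{prefix}.{child.name}"
                entities.append(Entity("class", name, (), child.lineno))
                visit(child, name)
            else:
                # Keep descending so definitions inside if/try blocks are found.
                visit(child, prefix)

    visit(ast.parse(source), module)
    return entities
```

Each Entity could then become one graph node and one embedded vector, keyed by its qualified name.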

Weekly Devlog

Unfiltered build notes. What shipped, what broke, what changed.

VS Code plugin: SSE streaming + voice input live

  • Shipped SSE-based streaming for chat responses — tokens now stream in real time inside the VS Code sidebar panel.
  • Added voice input via the Web Speech API. Users can describe what they want to build and Cerebro queries the knowledge graph hands-free.
  • Moved plugin to 95% build completion. Main remaining work: polish the auth flow and reconnect logic on SSE drop.
  • Decided to defer graph visualization UI to post-launch — it would add 3–4 weeks with unclear user value at this stage.

API Gateway: multi-domain routing + tenant namespacing

  • Finished the FastAPI multi-domain routing layer. Ingestion, Intelligence, and Identity bounded contexts each have their own router with prefix isolation.
  • Implemented tenant namespace injection middleware — every incoming request gets a user_id header validated against the session token and injected into all downstream Neo4j and Qdrant queries.
  • Hit a latency issue: cold-start for the LangGraph pipeline was adding ~800ms on first request per session. Fixed by keeping a warm graph instance per worker process.
  • Rate limiting is partially implemented (token bucket per user) but needs tuning for the ingestion endpoint, which should have much lower limits than the query endpoint.
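
A per-user token bucket with separate limits per endpoint can be sketched as follows. The capacities and refill rates are placeholder numbers, not tuned values, and the class names are hypothetical; state is kept in process memory here, whereas a multi-worker deployment would need shared storage.

```python
import time
from typing import Callable


class TokenBucket:
    """Classic token bucket: refills continuously, spends one token per call."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Placeholder limits as (capacity, refill per second). Ingestion is far stricter.
LIMITS = {"query": (30, 2.0), "ingest": (2, 0.5)}


class RateLimiter:
    """Keeps one bucket per (user, endpoint) pair."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, user_id: str, endpoint: str) -> bool:
        key = (user_id, endpoint)
        if key not in self._buckets:
            capacity, rate = LIMITS[endpoint]
            self._buckets[key] = TokenBucket(capacity, rate, self._clock)
        return self._buckets[key].allow()
```

Injecting the clock keeps the limiter deterministic in tests: a fake clock can be advanced by hand instead of sleeping.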

Core pipeline: LangGraph integration + first end-to-end query

  • First end-to-end query through the full stack: VS Code plugin → API → LangGraph pipeline → Neo4j traversal + Qdrant vector search → LLM context → streamed response. Took 3 days of plumbing.
  • LangGraph state machine replaced our original sequential async code. Immediate win: the "fallback to graph traversal when vector search returns nothing" conditional edge that was messy in raw Python became a clean 5-line edge definition.
  • Discovered that LibCST's Python AST output needed normalization before ingest — function signatures with default arguments produce inconsistent node representations. Added a normalization pass.
  • Clean Architecture boundaries are enforcing themselves: when I tried to call a Neo4j adapter directly from the Identity context, the import failed by design. That's the point.

Work with us

We're open to collaboration with developers, researchers, and teams who share the same problems. Whether it's contributing to the project, exploring integration opportunities, or future hiring — we want to hear from you.

Open collaboration

Language parser plugins, embedding experiments, graph query optimization. If you've worked on similar problems, let's talk.

Future roles

We're not hiring yet, but when we do, this is where we'll post it first. Backend, infra, DevEx — people who build tools for other developers.
