Engineering — Architecture Decisions, Stack, Weekly Progress

Product Status

Build Progress

Where each system stands right now. Numbers are rough estimates, not KPIs.

Last updated: April 2026

Neo4j + Qdrant + MongoDB containerized, multi-tenant namespacing live (85%)

LangGraph orchestration, AST parsing (Python + JS/TS), graph ingestion (80%)

SSE streaming, chat interface, and voice input working end-to-end (95%)

FastAPI multi-domain routing, auth middleware, rate limiting in progress (75%)

Stripe integration + waitlist-to-paid conversion flow, not yet started (10%)

Docker Compose staging environment up, production infra not yet finalized (40%)

Architecture Decisions

Architecture Decision Records

The choices that define how Cerebro is built. Context, decision, trade-offs — documented as they happened.

ADR-001 Accepted Jan 2026

Neo4j as primary knowledge graph store

Context

We need to store a code knowledge graph: functions, classes, modules, and their relationships (calls, imports, inherits, implements). The query pattern is traversal-heavy — "find all callers of this function up to depth 3" — not aggregation-heavy.

Decision

Use Neo4j (Community Edition, self-hosted) as the primary store for the code knowledge graph. Each codebase gets a logical namespace via a repo_id property on every node. Cypher is the query language.
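
As a minimal sketch of the traversal pattern described above ("find all callers of this function up to depth 3"), the helper below builds a parameterized Cypher query scoped by repo_id. The Function label, the CALLS relationship type, the qualified_name property, and the MAX_DEPTH cap are assumptions made for illustration, not the actual Cerebro schema.

```python
from typing import Any

MAX_DEPTH = 10  # hypothetical safety cap, not taken from the ADR


def callers_query(repo_id: str, qualified_name: str, depth: int = 3) -> tuple[str, dict[str, Any]]:
    """Build a Cypher query that finds callers of a function up to `depth` hops."""
    # To our knowledge Cypher does not accept a query parameter as a
    # variable-length bound, so the depth is validated as a small integer
    # and then interpolated into the pattern.
    if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be an integer in 1..{MAX_DEPTH}")
    query = (
        "MATCH path = (caller:Function)-[:CALLS*1.." + str(depth) + "]->"
        "(target:Function {repo_id: $repo_id, qualified_name: $name})\n"
        # Every node on the path must belong to the same repository namespace.
        "WHERE all(n IN nodes(path) WHERE n.repo_id = $repo_id)\n"
        "RETURN caller.qualified_name AS caller, min(length(path)) AS depth\n"
        "ORDER BY depth"
    )
    return query, {"repo_id": repo_id, "name": qualified_name}
```

The returned query string and parameter dictionary could be handed to whichever Neo4j driver session the backend uses. User-supplied values travel only as parameters, never as interpolated text.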

Why

Native graph traversal follows relationships directly instead of re-joining tables at every hop, so we expect it to outperform SQL JOINs by a wide margin at depth >2
Cypher is expressive for code relationship queries ("find all functions this module transitively depends on")
APOC library gives us path algorithms (shortest path, all paths) out of the box
Community Edition is free, avoiding vendor lock-in for early product phase

Trade-offs

Benefits

Traversal queries are natural and fast
Schema-flexible — easy to add new node/relationship types
Graph visualization tools available

Costs

Community Edition has no cluster support — single node only
Cypher learning curve for the team
Memory-heavy at large repo scale (not yet validated)

Alternatives considered

PostgreSQL with pg_graph: Familiar but graph traversal via recursive CTEs is slow and verbose at depth >3
Amazon Neptune: Managed but expensive and vendor lock-in at pre-revenue stage
Dgraph: Open source but smaller community, and it uses its own query languages (DQL and GraphQL) rather than Cypher, so Cypher-compatible tooling does not carry over

ADR-002 Accepted Jan 2026

BYOK — users bring their own LLM API key

Context

Cerebro needs to call LLM APIs (OpenAI, Anthropic, etc.) to generate code and answer questions. The question is whether we act as a proxy (calling the API with our own keys and billing users for token usage) or require users to supply their own API keys (BYOK).

Decision

Bring Your Own Key (BYOK). Users configure their own API keys in the VS Code plugin settings. Cerebro never stores keys server-side — they are used exclusively within the user's local plugin context and never transmitted to our backend.
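
The separation described above can be sketched as two data shapes: secrets that stay in the plugin process, and the only payload that leaves for the backend. This is an illustration in Python under stated assumptions. The class names, the payload fields, and the Bearer-style header (the scheme OpenAI's API uses; other providers differ) are hypothetical, and the plugin itself would be written in a different language.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LocalSecrets:
    """Held only in the plugin process; never serialized for the backend."""
    llm_api_key: str


@dataclass(frozen=True)
class ContextRequest:
    """Hypothetical shape of the only payload sent to the Cerebro backend."""
    repo_id: str
    question: str


def backend_payload(request: ContextRequest, secrets: LocalSecrets) -> str:
    """Serialize the backend request and verify the key is not in it."""
    body = json.dumps(asdict(request))
    # Defensive check: fail loudly if the key ever leaks into the payload.
    if secrets.llm_api_key in body:
        raise ValueError("API key found in backend payload")
    return body


def llm_headers(secrets: LocalSecrets) -> dict[str, str]:
    """Headers for the direct plugin-to-provider call, built locally."""
    return {"Authorization": f"Bearer {secrets.llm_api_key}"}
```

Keeping the key in a separate type from the request makes it harder to serialize by accident, and the check in backend_payload turns a leak into an immediate error rather than a silent one.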

Why

Eliminates LLM cost as a COGS line item — our pricing covers context infrastructure, not token pass-through
Many users already have API keys from tools like Copilot and Cursor, so setup friction is low for them
Lower regulatory risk: we neither store API keys nor proxy model calls on behalf of users
Users get the exact model they want (GPT-4o, Claude Sonnet, etc.) without us managing model upgrades

Trade-offs

Benefits

Gross margins are high — no token costs
No vendor dependency on a single LLM provider
Simpler compliance posture (we handle code context, not model calls)

Costs

Higher setup friction — users must have an API key
We can't optimize prompt costs on behalf of users
Harder to offer a free tier with limited model calls

Alternatives considered

Proxy model (our keys, metered billing): Higher operational complexity, token cost management, and PCI/data compliance overhead at pre-revenue stage

ADR-003 Accepted Feb 2026

LangGraph for deterministic AI pipeline orchestration

Context

Cerebro's core intelligence is a multi-step pipeline: parse query → retrieve from Neo4j → retrieve from Qdrant → merge context → call LLM → stream response. This pipeline needs to be deterministic, debuggable, and interruptible (for streaming). Simple sequential code gets messy when steps have conditional branches, retries, and partial state.

Decision

Use LangGraph (from LangChain) to model the pipeline as an explicit state machine. Each step is a node; edges define transitions. State is typed (TypedDict). The graph is compiled once at startup and re-used per request. Streaming is handled via LangGraph's built-in async generator support.

Why

Explicit state makes debugging straightforward — you can see exactly what state entered each node
Conditional edges model "if vector search returns nothing, expand to graph traversal" cleanly
LangGraph's checkpointing allows resumable pipelines for long-running ingestion jobs
Async streaming is first-class, not a bolt-on
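
To make the state-machine idea concrete, here is a library-free sketch of typed state, nodes, and a conditional edge that falls back to graph traversal when vector search returns nothing. It is a plain-Python stand-in for the pattern, not LangGraph's API (in LangGraph the equivalent pieces would be a StateGraph with nodes and conditional edges). The node bodies, field names, and sample hits are invented for illustration.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict):
    query: str
    vector_hits: list[str]
    graph_hits: list[str]
    context: str
    trace: list[str]  # names of the nodes visited, useful when debugging


def vector_search(state: PipelineState) -> PipelineState:
    # Stand-in for the Qdrant lookup: only "login" has a match in this toy.
    hits = ["auth.login"] if "login" in state["query"] else []
    return {**state, "vector_hits": hits, "trace": state["trace"] + ["vector_search"]}


def graph_traversal(state: PipelineState) -> PipelineState:
    # Stand-in for the Neo4j traversal used as a fallback.
    return {**state, "graph_hits": ["billing.charge"], "trace": state["trace"] + ["graph_traversal"]}


def merge_context(state: PipelineState) -> PipelineState:
    context = ", ".join(state["vector_hits"] + state["graph_hits"])
    return {**state, "context": context, "trace": state["trace"] + ["merge_context"]}


def after_vector_search(state: PipelineState) -> str:
    # Conditional edge: expand to graph traversal when vector search is empty.
    return "merge_context" if state["vector_hits"] else "graph_traversal"


NODES: dict[str, Callable[[PipelineState], PipelineState]] = {
    "vector_search": vector_search,
    "graph_traversal": graph_traversal,
    "merge_context": merge_context,
}
EDGES: dict[str, Callable[[PipelineState], str]] = {
    "vector_search": after_vector_search,
    "graph_traversal": lambda state: "merge_context",
    "merge_context": lambda state: "END",
}


def run(query: str) -> PipelineState:
    state: PipelineState = {"query": query, "vector_hits": [], "graph_hits": [], "context": "", "trace": []}
    current = "vector_search"
    while current != "END":
        state = NODES[current](state)
        current = EDGES[current](state)
    return state
```

Because each node takes and returns the full state, the trace field shows exactly which path a request took, which is the debugging property the list above refers to.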

Trade-offs

Benefits

Pipeline is a first-class artifact — readable, testable, visualizable
State typing lets a static type checker catch mismatched state keys before the pipeline runs (TypedDict is not enforced at runtime)
Handles retries and fallbacks naturally

Costs

LangGraph is a relatively young library — API surface changes between versions
Adds a LangChain ecosystem dependency (which is large)
Overkill for simple linear pipelines

Alternatives considered

Raw async Python: Simpler for linear flows but gets complicated fast with branching and retries
Prefect / Airflow: Better for batch data pipelines, not designed for low-latency per-request inference
Haystack pipelines: Too opinionated about the RAG pattern, harder to integrate with our custom Neo4j traversal step

ADR-004 Accepted Feb 2026

Logical namespace isolation for multi-tenancy (not physical DB per user)

Context

Cerebro is a multi-tenant product — each user's codebase must be isolated from other users'. The question is whether to use physical isolation (separate database instances per user) or logical isolation (a shared instance with namespace/tenant-ID filtering on all queries).

Decision

Logical namespace isolation. Every Neo4j node and Qdrant vector has a user_id property. Every query is scoped by this property. In Qdrant, this is a filter condition on every collection search. In Neo4j, the repo_id property gates every Cypher traversal. A single shared Neo4j instance and a single Qdrant collection serve all users in the current phase.
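
A minimal sketch of the Qdrant side of this rule: a helper that builds a search request body in which the user_id condition is always present. The dictionary follows the shape of Qdrant's JSON filter syntax (must conditions with key and match); the function name, the limit default, and the extra_must argument are assumptions for illustration.

```python
from typing import Any, Optional


def tenant_search_body(
    user_id: str,
    vector: list[float],
    limit: int = 10,
    extra_must: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Build a vector search body that is always scoped to one tenant."""
    if not user_id:
        raise ValueError("user_id is required for every search")
    # The tenant condition comes first and cannot be omitted by callers.
    must: list[dict[str, Any]] = [{"key": "user_id", "match": {"value": user_id}}]
    # Callers may narrow the search further, but only by adding conditions.
    must.extend(extra_must or [])
    return {"vector": vector, "limit": limit, "filter": {"must": must}}
```

Routing every search through one builder like this means a caller can add conditions but has no way to drop the tenant filter.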

Why

Physical isolation (DB per user) is operationally expensive — provisioning, monitoring, backups multiply with user count
At early user counts (<1000), logical isolation is secure as long as every query carries the tenant filter, and it is dramatically simpler to operate
Migration path to physical isolation exists if a high-value enterprise customer requires it

Trade-offs

Benefits

Single infrastructure footprint — lower cost, simpler ops
Centralized monitoring and backup
Faster to ship — no per-user provisioning logic

Costs

Noisy neighbor risk at scale (one heavy user affects others)
Query logic must always include tenant filter — missing it is a security bug
May not satisfy enterprise compliance requirements without physical isolation option
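
Since a missing tenant filter is a security bug, one inexpensive safeguard is to refuse any Cypher text that never references the tenant parameter. The sketch below is an assumption about how such a guard might look. The scoped function and MissingTenantFilter exception are hypothetical names, and the textual check is deliberately crude: it shows that the parameter is mentioned, not that every node pattern is actually filtered by it.

```python
import re
from typing import Any

TENANT_PARAM = re.compile(r"\$repo_id\b")


class MissingTenantFilter(Exception):
    """Raised when a query does not reference the tenant parameter at all."""


def scoped(query: str, params: dict[str, Any], repo_id: str) -> tuple[str, dict[str, Any]]:
    """Return the query with the tenant parameter injected, or refuse to run it."""
    if not TENANT_PARAM.search(query):
        raise MissingTenantFilter("query does not reference $repo_id")
    if "repo_id" in params and params["repo_id"] != repo_id:
        # A caller tried to supply a different tenant than the session's.
        raise MissingTenantFilter("repo_id parameter does not match the session")
    return query, {**params, "repo_id": repo_id}
```

A guard of this kind complements code review and tests rather than replacing them, because a query can mention $repo_id and still traverse into unfiltered nodes.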

Alternatives considered

Physical DB per user: Secure by default but operationally unscalable before we have a dedicated infra team
Schema-level isolation in PostgreSQL: Viable for relational data, not applicable to Neo4j or Qdrant

ADR-005 Accepted Mar 2026

Clean Architecture + DDD with 4 bounded contexts

Context

The backend spans multiple distinct concerns: parsing source code, managing the knowledge graph, serving the AI pipeline, and handling user accounts. Without explicit architecture boundaries, these tend to collapse into a monolithic service where Neo4j queries appear inside auth handlers and vice versa.

Decision

Apply Clean Architecture (ports and adapters) with Domain-Driven Design bounded contexts. The four contexts are: Ingestion (parse + ingest code into the graph), Intelligence (the LangGraph query pipeline), Identity (auth and user management), and Billing (plans and subscriptions). Each context has its own domain model and communicates with others through defined interfaces — never direct imports across boundaries.

Why

Ingestion and Intelligence can evolve independently — swapping the AST parser doesn't affect the query pipeline
Testability: each context can be tested with mocked ports, no need to spin up Neo4j for auth tests
Billing is isolated from product logic — easier to swap Stripe for another provider later
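
The testability point can be illustrated with a port and two interchangeable adapters. In this sketch the Intelligence context owns a GraphReader port, and a test substitutes an in-memory adapter so no Neo4j instance is needed. All class and method names here are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Protocol


class GraphReader(Protocol):
    """Port owned by the Intelligence context (hypothetical interface)."""

    def callers_of(self, repo_id: str, function_name: str) -> list[str]: ...


@dataclass
class DescribeCallers:
    """Use case that depends on the port, never on a Neo4j driver."""

    graph: GraphReader

    def run(self, repo_id: str, function_name: str) -> str:
        callers = self.graph.callers_of(repo_id, function_name)
        if not callers:
            return f"{function_name} has no known callers"
        return f"{function_name} is called by: " + ", ".join(sorted(callers))


class InMemoryGraph:
    """Test adapter: satisfies GraphReader without any database."""

    def __init__(self, edges: dict[tuple[str, str], list[str]]) -> None:
        self._edges = edges

    def callers_of(self, repo_id: str, function_name: str) -> list[str]:
        return list(self._edges.get((repo_id, function_name), []))
```

A production adapter would implement the same callers_of method on top of Cypher, and the use case would not change. In a test, `DescribeCallers(InMemoryGraph({...})).run(...)` exercises the logic with no infrastructure running.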

Trade-offs

Benefits

Clear ownership boundaries — no "who owns this code?" ambiguity
Changes in one context don't ripple unexpectedly into others
Each context is independently deployable as a microservice later if needed

Costs

More upfront structure than a simple Flask app would need
Cross-context calls require going through defined interfaces, which adds ceremony
Higher cognitive overhead for new contributors

Alternatives considered

Single FastAPI app, no layers: Faster to start but we've seen this become unmaintainable once the LangGraph pipeline grew complex
Full microservices from day one: Too much operational overhead (service discovery, inter-service auth) before product-market fit

Roadmap

✓ AST parsing (Python, JS, TS): LibCST + Babel
✓ Knowledge graph (Neo4j): Entities, relationships, traversal
✓ Semantic search (Qdrant): Code embeddings + vector search
✓ Orchestration pipeline: LangGraph state machines
✓ VS Code plugin: SSE streaming + chat + voice
→ API Gateway: FastAPI + auth + rate limiting
→ Payment & Billing: Stripe + waitlist-to-paid flow
○ Web dashboard: Graph visualization (deferred)
○ Rust & Go support: Language parser plugins
○ CI/CD integration: Auto-sync on push
○ Mobile app: Voice & chat interface for Cerebro

Infrastructure

Tech Stack

Every tool chosen for a reason. Every trade-off documented in the ADRs above.

Neo4j

Knowledge Graph Store

Primary store for the code knowledge graph. Functions, classes, modules and their relationships (calls, imports, inherits) as nodes and edges. Cypher queries for graph traversal.

Qdrant

Vector Search

Semantic code search via embedding vectors. Each function and class is embedded and stored. Query time: retrieve the top-K most semantically similar code entities before graph traversal.
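
To show what "top-K most semantically similar" means in practice, here is a brute-force sketch that ranks stored vectors by cosine similarity to a query vector. It is only an illustration of the ranking step: Qdrant uses an approximate index rather than a full scan, and the choice of cosine similarity, the entity names, and the two-dimensional vectors are assumptions.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], entities: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k entities most similar to the query, best first."""
    scored = ((name, cosine(query, vector)) for name, vector in entities.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

The names returned by a step like this would then serve as starting points for the graph traversal that follows.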

LangGraph

Pipeline Orchestration

Models the query pipeline as an explicit state machine: parse → retrieve → merge context → stream. Typed state, conditional edges, async streaming, and checkpointing for long ingestion jobs.

FastAPI

API Gateway

Async Python API gateway with automatic OpenAPI docs, Pydantic validation, and Server-Sent Events (SSE) for streaming responses to the VS Code plugin. Multi-domain routing per bounded context.

MongoDB

Document Store

Stores unstructured metadata about code entities — raw file contents, AST snapshots, ingestion job history, and per-user repository configurations. Flexible schema as the data model evolves.

Redis

Cache & Queue

Query result caching to reduce Neo4j and Qdrant load on repeated lookups. Also used as the task queue for background ingestion jobs (via Redis Streams), avoiding a separate message broker.

Supabase

Auth & Billing

Managed PostgreSQL + Auth platform. Handles GitHub OAuth, user sessions, and subscription state. Also powers the engineering page comment system with Row Level Security.

LibCST + Babel

AST Parsers

LibCST for Python (preserves whitespace and comments for lossless round-trips), Babel for JS/TS (handles JSX and decorators). Output is a normalized entity model ingested into Neo4j and Qdrant.
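
As a rough sketch of what a normalized entity model might contain, the example below extracts functions and the names they call from Python source. It uses the standard-library ast module as a stand-in for LibCST so that it runs without dependencies; the FunctionEntity shape is hypothetical, and only direct calls by bare name are captured (method calls such as obj.method() are ignored).

```python
import ast
from dataclasses import dataclass, field


@dataclass
class FunctionEntity:
    """Hypothetical normalized record for one function definition."""
    name: str
    lineno: int
    calls: list[str] = field(default_factory=list)


def extract_functions(source: str) -> list[FunctionEntity]:
    """Return one entity per function, with the bare names it calls."""
    tree = ast.parse(source)
    entities = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Calls inside nested functions are also attributed to the
            # enclosing function, since ast.walk descends into them.
            calls = sorted({
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            })
            entities.append(FunctionEntity(node.name, node.lineno, calls))
    return sorted(entities, key=lambda entity: entity.lineno)
```

Each entity maps naturally onto a graph node, and each entry in calls onto a candidate call relationship, once names have been resolved to their definitions.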

Build Log

Weekly Devlog

Unfiltered build notes. What shipped, what broke, what changed.

Week 12 — March 24–28, 2026

VS Code plugin: SSE streaming + voice input live

Shipped SSE-based streaming for chat responses — tokens now stream in real time inside the VS Code sidebar panel.
Added voice input via the Web Speech API. Users can describe what they want to build and Cerebro queries the knowledge graph hands-free.
Moved plugin to 95% build completion. Main remaining work: polish the auth flow and reconnect logic on SSE drop.
Decided to defer graph visualization UI to post-launch — it would add 3–4 weeks with unclear user value at this stage.
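
For the SSE streaming and reconnect work mentioned in this entry, the wire format itself is small enough to sketch. The helper below formats Server-Sent Events frames and an async generator streams tokens with incrementing ids, so a client that reconnects can ask to resume after the last id it saw. The event names and the start_after parameter are assumptions for illustration, not the plugin's actual protocol.

```python
import asyncio
from typing import AsyncIterator, Iterable, Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame, terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines is sent as one "data:" line per line.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"


async def stream_tokens(tokens: Iterable[str], start_after: int = 0) -> AsyncIterator[str]:
    """Yield one frame per token, skipping ids the client already received."""
    for index, token in enumerate(tokens, start=1):
        if index <= start_after:
            continue  # delivered before the connection dropped
        yield format_sse(token, event="token", event_id=str(index))
        await asyncio.sleep(0)  # give other tasks a chance to run
    yield format_sse("", event="done")


async def collect(tokens: Iterable[str], start_after: int = 0) -> list[str]:
    return [frame async for frame in stream_tokens(tokens, start_after)]
```

On reconnect, a client would report the last id it processed (browsers do this with the Last-Event-ID header) and the server would pass it as start_after.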

Week 10 — March 10–14, 2026

API Gateway: multi-domain routing + tenant namespacing

Finished the FastAPI multi-domain routing layer. Ingestion, Intelligence, and Identity bounded contexts each have their own router with prefix isolation.
Implemented tenant namespace injection middleware — every incoming request gets a user_id header validated against the session token and injected into all downstream Neo4j and Qdrant queries.
Hit a latency issue: cold-start for the LangGraph pipeline was adding ~800ms on first request per session. Fixed by keeping a warm graph instance per worker process.
Rate limiting is partially implemented (token bucket per user) but needs tuning for the ingestion endpoint, which should have much lower limits than the query endpoint.
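
The per-user token bucket with different limits per endpoint can be sketched as follows. The capacities and refill rates are placeholder values, not the tuned limits, and time is passed in explicitly so the behavior is easy to test; a real deployment would more likely read a monotonic clock and keep bucket state in a shared store.

```python
class TokenBucket:
    """Classic token bucket: refills continuously, spends one token per request."""

    def __init__(self, capacity: float, refill_per_second: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per (user, endpoint), with limits chosen per endpoint."""

    def __init__(self, limits: dict[str, tuple[float, float]]) -> None:
        self._limits = limits  # endpoint -> (capacity, refill_per_second)
        self._buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, user_id: str, endpoint: str, now: float) -> bool:
        key = (user_id, endpoint)
        if key not in self._buckets:
            capacity, rate = self._limits[endpoint]
            self._buckets[key] = TokenBucket(capacity, rate, now)
        return self._buckets[key].allow(now)


# Placeholder limits: ingestion is far stricter than query.
EXAMPLE_LIMITS = {"query": (3.0, 1.0), "ingestion": (1.0, 0.1)}
```

Keying buckets by both user and endpoint lets the ingestion endpoint carry a much lower limit than the query endpoint without one user's ingestion traffic consuming their query allowance.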

Week 8 — February 24–28, 2026

Core pipeline: LangGraph integration + first end-to-end query

First end-to-end query through the full stack: VS Code plugin → API → LangGraph pipeline → Neo4j traversal + Qdrant vector search → LLM context → streamed response. Took 3 days of plumbing.
LangGraph state machine replaced our original sequential async code. Immediate win: the "fallback to graph traversal when vector search returns nothing" conditional edge that was messy in raw Python became a clean 5-line edge definition.
Discovered that LibCST's Python AST output needed normalization before ingest — function signatures with default arguments produce inconsistent node representations. Added a normalization pass.
Clean Architecture boundaries are enforcing themselves: when I tried to call a Neo4j adapter directly from the Identity context, the import failed by design. That's the point.

Work with us

We're open to collaboration with developers, researchers, and teams who share the same problems. Whether it's contributing to the project, exploring integration opportunities, or future hiring — we want to hear from you.

Open collaboration

Language parser plugins, embedding experiments, graph query optimization. If you've worked on similar problems, let's talk.

Future roles

We're not hiring yet, but when we do, this is where we'll post it first. Backend, infra, DevEx — people who build tools for other developers.

contacto@cerebrolabs.tech
