A real Dribba case study at CityXerpa: support tickets that used to take 4 hours to resolve now close in 8 seconds. No hallucinations — function tools over BigQuery, Claude reasoning, and a Go service on Cloud Run orchestrating it all.
It is not magic. It is classic data engineering with the right piece of AI on top. Here is how we built it, what we measured, and what we would do differently today.
4 h → 8 s
Average resolution time for recurring support tickets
≈70%
Tickets resolved by the agent without human escalation
120 k+
Active users served by the SuperApp
4 wks
From zero to production with 2 senior engineers
The actual problem
CityXerpa already had every piece of information in BigQuery: public-service status, transactions, schedules, fares, transport data, alerts. The problem was not data. It was access: the support team needed minutes per query and end users waited hours for a clear answer.
The classic trap here is to spin up a generic RAG over internal documentation and discover, two months later, that the LLM hallucinates because the actual question lives in a table with millions of rows, not in a PDF.
The right answer is not to vectorise BigQuery. It is to give the LLM the ability to query BigQuery directly through function calling, validating results before generating the response. Save RAG for the data that is genuinely unstructured text.
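For illustration, this is the shape of one such function tool in the Anthropic Messages API tools format. The tool name and fields here are hypothetical placeholders, not CityXerpa's real schema:

```go
package agent

// Illustrative function-tool definition in the Anthropic Messages API
// tools format. Name and fields are placeholders, not the real schema.
var serviceStatusTool = map[string]any{
	"name":        "get_service_status",
	"description": "Current status of a public service, read from BigQuery.",
	"input_schema": map[string]any{
		"type": "object",
		"properties": map[string]any{
			"service_id": map[string]any{
				"type":        "string",
				"description": "Identifier of the public service",
			},
		},
		"required": []string{"service_id"},
	},
}
```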
Stack
No heavy frameworks, no fashion-driven abstractions. An orchestrator, a reasoner and a data warehouse — wired together with function tools and real observability.
Orchestration
A Go service (chi / echo) deployed on Cloud Run with 0→N autoscaling. Goroutine-based concurrency for parallel fan-out to Claude and BigQuery. Exponential-backoff retries, circuit breakers and context propagation come for free. Stable p99 under load, <30 MB footprint per instance.
When to scale: Cloud Run handles anywhere from 0 to millions of requests/day without configuration. If you need long-running workflows with durable state spanning hours or days (week-long retries, sagas), move that path to Temporal.
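A minimal sketch of that parallel fan-out, assuming hypothetical callClaude and runQuery wrappers over the real Anthropic and BigQuery clients:

```go
package agent

import (
	"context"
	"time"
)

// fanOut runs the Claude call and the BigQuery lookup in parallel and
// honours the request context: if the caller cancels, both legs stop.
func fanOut(ctx context.Context, prompt, sql string) (string, []map[string]any, error) {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	type claudeRes struct {
		text string
		err  error
	}
	type bqRes struct {
		rows []map[string]any
		err  error
	}
	claudeCh := make(chan claudeRes, 1)
	bqCh := make(chan bqRes, 1)

	go func() {
		text, err := callClaude(ctx, prompt)
		claudeCh <- claudeRes{text, err}
	}()
	go func() {
		rows, err := runQuery(ctx, sql)
		bqCh <- bqRes{rows, err}
	}()

	c, b := <-claudeCh, <-bqCh
	if c.err != nil {
		return "", nil, c.err
	}
	return c.text, b.rows, b.err
}

// Stubs standing in for the real clients.
func callClaude(ctx context.Context, prompt string) (string, error)       { return "", nil }
func runQuery(ctx context.Context, sql string) ([]map[string]any, error) { return nil, nil }
```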
Reasoning + tool use
Claude Sonnet: more disciplined function calling than GPT-4o on long-horizon queries. Native prompt caching (-90% cost on repeated system prompts). Steadier p50 latency under high throughput.
When to scale: Sonnet 4.x for ~80% of queries. Haiku for fast classification and rate-limited paths. Opus only for complex multi-step reasoning.
Data layer
BigQuery: structured data at scale with plain SQL. Partitioning + clustering keeps most queries under 1 s. BI Engine for hot paths.
When to scale: For unstructured data (PDFs, FAQs, free-text knowledge bases) bolt on a vector layer — BigQuery VECTOR_SEARCH if you want to stay in-stack, pgvector if not.
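A sketch of what that table layout looks like, with placeholder table and column names: partition by day, cluster by the columns the agent's tools filter on.

```go
package agent

// Hypothetical BigQuery DDL. Daily partitions prune scans by date;
// clustering on the filter columns keeps bytes scanned low.
const createEventsTable = `
CREATE TABLE ops.service_events (
  service_id STRING,
  status     STRING,
  updated_at TIMESTAMP
)
PARTITION BY DATE(updated_at)
CLUSTER BY service_id, status
`
```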
Real case
120,000+ active users. An app that covers transport, public services, local commerce, alerts and government services — and a human support team that could not scale at the product's pace.
Before
4 h
Average time to resolve a recurring support ticket. The information lived in BigQuery, but flowed through a human who copy-pasted the query.
After
8 s
The agent classifies the question, calls the right tool against BigQuery, validates the result and returns a verifiable answer. It cites the SQL it ran for internal audit.
Escalation
≈30%
Of tickets still need a human (legal cases, decisions that are not pure data, edge cases). The agent prepares the context before handing the case off.
How we built it
Improvements roadmap
Ten concrete improvements over CityXerpa's current stack, each with its cost, effort and expected metric. No maximalism: only what moves the needle.
The system prompt with the BigQuery schema, CityXerpa's business rules and few-shot examples weighs several thousand tokens. With Anthropic's prompt caching turned on, that part is billed at 10% of the price on every subsequent call within the 5-minute window.
Estimated effort: 1 day
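A minimal sketch of the request shape, built against the public Messages API; the model ID is a placeholder to pin to a real version:

```go
package agent

import (
	"bytes"
	"encoding/json"
	"net/http"
	"os"
)

// cachedRequest builds a Messages API call whose system block (schema,
// business rules, few-shot examples) is marked for caching, so repeat
// calls inside the TTL bill that part at 10%.
func cachedRequest(systemPrompt, userMsg string) (*http.Request, error) {
	body := map[string]any{
		"model":      "claude-sonnet", // placeholder: pin a real model ID
		"max_tokens": 1024,
		"system": []map[string]any{{
			"type":          "text",
			"text":          systemPrompt,
			"cache_control": map[string]string{"type": "ephemeral"},
		}},
		"messages": []map[string]any{{"role": "user", "content": userMsg}},
	}
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "https://api.anthropic.com/v1/messages", bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("content-type", "application/json")
	return req, nil
}
```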
An initial classifier with Claude Haiku decides whether the query is trivial (FAQ-style “when does the town hall open?”), medium (a single-join BigQuery lookup) or complex (multi-step reasoning). Each bucket uses a different model. Most tickets are trivial.
Estimated effort: 3 days
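In the Go service this can be a plain enum plus a switch; the model IDs below are placeholders to pin to evaluated versions:

```go
package agent

// Three-bucket router. classify would itself be a cheap Haiku call
// returning one of these labels.
type bucket string

const (
	bucketTrivial bucket = "trivial" // FAQ-style lookups
	bucketMedium  bucket = "medium"  // single-join BigQuery queries
	bucketComplex bucket = "complex" // multi-step reasoning with tool use
)

// modelFor maps each bucket to the model that handles it. IDs are
// placeholders: pin them to the versions you have evaluated.
func modelFor(b bucket) string {
	switch b {
	case bucketTrivial:
		return "claude-haiku"
	case bucketMedium:
		return "claude-sonnet"
	default:
		return "claude-opus"
	}
}
```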
The most-repeated BigQuery queries (public-service status, schedules, fares) get materialized as views refreshed every N minutes. The agent reads them as a table — no recompute. Structural caching without Redis.
Estimated effort: 1 week
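A sketch of one such view, with placeholder names. BigQuery materialized views only allow a restricted subset of SQL (simple aggregations, no window functions), which is what makes background refresh cheap:

```go
package agent

// Hypothetical materialized view for one hot query. BigQuery refreshes
// it in the background, so the agent's tool reads precomputed rows
// instead of re-scanning the base table.
const createStatusMV = `
CREATE MATERIALIZED VIEW ops.mv_service_status
OPTIONS (enable_refresh = true, refresh_interval_minutes = 5)
AS
SELECT service_id, MAX(updated_at) AS last_event, COUNT(*) AS events
FROM ops.service_events
GROUP BY service_id
`
```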
30% of the tickets that escalate to a human today do so because the answer lives in PDFs, manuals and support articles, not in BigQuery. Adding an embeddings layer (BigQuery VECTOR_SEARCH if you want to stay in-stack, pgvector if not) lets the agent cite documentation through a dedicated function tool.
Estimated effort: 2 weeks
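A sketch of the query such a "search docs" tool could run, with placeholder table and column names; the query embedding is passed as a parameter and would come from an embedding-model call:

```go
package agent

// Hypothetical retrieval query behind a documentation function tool.
// VECTOR_SEARCH returns the base-table row and the distance, so the
// agent can cite the matched chunk in its answer.
const searchDocsSQL = `
SELECT base.doc_id, base.chunk, distance
FROM VECTOR_SEARCH(
  TABLE docs.chunk_embeddings, 'embedding',
  (SELECT @query_embedding AS embedding),
  top_k => 5,
  distance_type => 'COSINE'
)
`
```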
A dataset of 200–500 labelled real queries. Every prompt or model deploy is run against the dataset before reaching production. Catches degradations from model updates (e.g. a more cautious Claude release) or prompt changes before users see them.
Estimated effort: 1 week
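The harness can be a plain Go test over the labelled dataset; runAgent here is a hypothetical hook into the pipeline:

```go
package agent

import "testing"

// Hypothetical labelled case: a real dataset (200-500 entries) would be
// loaded from a versioned file, not hardcoded.
type evalCase struct {
	Query string
	Want  string // expected classification or canonical answer key
}

var dataset = []evalCase{
	{"when does the town hall open?", "faq_hours"},
}

// TestAgentRegression replays the dataset before every prompt or model
// deploy; a degradation fails the build instead of reaching users.
func TestAgentRegression(t *testing.T) {
	for _, c := range dataset {
		if got := runAgent(c.Query); got != c.Want {
			t.Errorf("query %q: got %q, want %q", c.Query, got, c.Want)
		}
	}
}

func runAgent(q string) string { return "faq_hours" } // stub for the sketch
```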
Token streaming from Anthropic propagated to the client over Server-Sent Events. In Go, context.Context flows from the HTTP handler down to the Anthropic call: if the client closes the connection, inference is cancelled and no further tokens are billed. For queries that need tool use before generation, explicit microcopy (“Querying BigQuery…”) instead of a blind spinner.
Estimated effort: 1 week
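A minimal sketch of the SSE path; streamClaude is a hypothetical stand-in for the real streaming client that yields tokens and respects the request context:

```go
package agent

import (
	"context"
	"fmt"
	"net/http"
)

// streamHandler propagates the request context into the model call, so
// a client disconnect cancels inference and stops token billing.
func streamHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// Explicit microcopy while the tool phase runs, instead of a blind spinner.
	fmt.Fprint(w, "data: Querying BigQuery…\n\n")
	flusher.Flush()

	for token := range streamClaude(r.Context(), "prompt") {
		fmt.Fprintf(w, "data: %s\n\n", token)
		flusher.Flush()
	}
}

// streamClaude stands in for the real streaming client: it yields
// tokens on a channel and stops when ctx is cancelled.
func streamClaude(ctx context.Context, prompt string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, t := range []string{"The", " service", " is", " running."} {
			select {
			case <-ctx.Done():
				return
			case ch <- t:
			}
		}
	}()
	return ch
}
```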
Force JSON-schema responses on every tool-use call, deserialize into a tagged Go struct and fail fast with a retry on validation errors. Strict typing at the boundary catches most schema bugs before they ship, with no need for Zod-style validation at runtime.
Estimated effort: 3 days
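A sketch of that boundary, with a hypothetical tool input:

```go
package agent

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Hypothetical tool input: strict decoding rejects hallucinated or
// mistyped parameters before any SQL runs.
type fareQueryArgs struct {
	Line string `json:"line"`
	Zone int    `json:"zone"`
}

func parseArgs(raw []byte) (fareQueryArgs, error) {
	var args fareQueryArgs
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // fail fast on fields outside the schema
	if err := dec.Decode(&args); err != nil {
		return args, fmt.Errorf("invalid tool args: %w", err) // caller retries the tool call
	}
	if args.Line == "" {
		return args, fmt.Errorf("missing required field: line")
	}
	return args, nil
}
```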
Every agent request as an OpenTelemetry trace from the Go handler: a span per Claude call, a span per BigQuery query, attributes for tokens consumed, model used and bytes scanned. Exported to Cloud Trace or Datadog. A support query has a measurable cost — without this, you cannot tell whether to optimise the prompt or the SQL.
Estimated effort: 1 week
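A sketch of the span around one model call; callModel is a hypothetical wrapper, and the attribute keys are our own convention, not an OTel semantic standard:

```go
package agent

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

var tracer = otel.Tracer("support-agent")

// tracedModelCall wraps each Claude call in a span carrying the
// cost-relevant attributes, so per-ticket cost is measurable.
func tracedModelCall(ctx context.Context, prompt string) (string, error) {
	ctx, span := tracer.Start(ctx, "claude.messages")
	defer span.End()

	text, inTok, outTok, err := callModel(ctx, prompt)
	span.SetAttributes(
		attribute.String("llm.model", "claude-sonnet"),
		attribute.Int("llm.tokens.input", inTok),
		attribute.Int("llm.tokens.output", outTok),
	)
	return text, err
}

// Stub standing in for the real client wrapper.
func callModel(ctx context.Context, prompt string) (string, int, int, error) {
	return "", 0, 0, nil
}
```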
For spiky workloads (campaigns, events), split the synchronous path (immediate response) from the asynchronous one (notifications, embeddings, indexing). Pub/Sub as the queue, Go workers with a semaphore-bounded goroutine pool — controlled saturation, no impact on the main service.
Estimated effort: 1 week
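A sketch of the bounded pool, with the message source abstracted as a channel; in production this would sit behind the Pub/Sub subscription:

```go
package agent

import (
	"context"
	"sync"
)

// runWorkers bounds concurrency with a fixed-capacity semaphore: a spike
// saturates the queue, not the service.
func runWorkers(ctx context.Context, msgs <-chan []byte, limit int) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for msg := range msgs {
		select {
		case sem <- struct{}{}: // acquire a slot, or...
		case <-ctx.Done(): // ...stop draining on shutdown
			wg.Wait()
			return
		}
		wg.Add(1)
		go func(m []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			process(ctx, m)
		}(msg)
	}
	wg.Wait()
}

func process(ctx context.Context, m []byte) {} // stub: embeddings, indexing, notifications
```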
When the agent needs to run workflows that last minutes or hours (human escalation with SLA, deferred retries, distributed transactions with compensation), migrate that path to Temporal. The Go SDK is mature and integrates natively with the existing service — no rewrite needed.
Estimated effort: 2–3 weeks (when applicable)
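A sketch of what an escalation workflow looks like with the Temporal Go SDK; activity names and the SLA value are placeholders, and a real version would race the timer against a resolution signal instead of always re-escalating:

```go
package agent

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// EscalationWorkflow hands a ticket to a human with an SLA timer that
// survives restarts and deploys.
func EscalationWorkflow(ctx workflow.Context, ticketID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	// Hand the agent-prepared context to the human queue.
	if err := workflow.ExecuteActivity(ctx, NotifyHuman, ticketID).Get(ctx, nil); err != nil {
		return err
	}

	// Durable timer: the SLA keeps counting even if the worker restarts.
	if err := workflow.Sleep(ctx, 4*time.Hour); err != nil {
		return err
	}

	return workflow.ExecuteActivity(ctx, EscalateToLead, ticketID).Get(ctx, nil)
}

// Stub activities standing in for the real integrations.
func NotifyHuman(ticketID string) error    { return nil }
func EscalateToLead(ticketID string) error { return nil }
```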
Other use cases
If your company has data in BigQuery (or Snowflake, Redshift, ClickHouse) and a team that loses hours pulling answers in natural language, this pattern applies.
Non-technical staff ask in natural language and the agent returns the SQL it ran, the result and the chart. Replaces analysts for repetitive questions.
Agents that combine business rules with competitor data and historical demand in BigQuery to suggest prices in real time.
The agent cross-references warehouse signals, contextualises the suspicion and opens or closes the case with a readable rationale. Cuts noise in risk teams.
Weekly or monthly summaries combining structured data (KPIs in BigQuery) with LLM-generated narrative. Each figure is backed by the query that produced it.
Agent over the HR and operations knowledge base. Employee onboarding, questions about leave, expenses or internal policy without flooding the helpdesk.
The agent compares records across systems, identifies discrepancies and proposes resolutions. For finance, ops and compliance.
Technical FAQ
Is this RAG?
Only partially. Classic RAG uses a vector store (embeddings) to retrieve relevant text before passing it to the LLM. Here the primary source is BigQuery via SQL: the LLM reasons and calls function tools that run queries. It is Tool-Augmented Generation over structured data. When unstructured text matters (PDFs, manuals) we add a vector layer alongside, not instead of.
Why not a vector database from day one?
Because most support queries for a SuperApp like CityXerpa are against structured data (service status, schedules, transactions, fares), not text. For that, a vector DB is the wrong tool: it adds complexity and brings no value. When unstructured text is involved (long FAQs, manuals, regulation), then yes: a vector layer scoped to that subset.
Why Claude and not GPT-4o or Gemini?
Three practical reasons: (1) more disciplined and predictable function calling (Claude rarely invents parameters that do not exist); (2) native prompt caching from Sonnet 3.5 onwards, which makes large system prompts cheap; (3) more stable p50/p99 latency under load. For fast classification, Haiku wins on cost/latency. For projects already on GCP, Gemini Flash is reasonable as a second model in routing.
Why Go and not Python or n8n?
Go gives us three things in production that neither Python nor n8n deliver at the same cost: (1) stable p99 latency under load with no GIL or runtime warmup; (2) native concurrency through goroutines for parallel fan-out between Claude and BigQuery without heavy async frameworks; (3) a single <30 MB binary that boots in <100 ms on Cloud Run, making 0→N autoscaling transparent. Python fits for prototyping and notebooks; n8n fits for visual flows the ops team will touch. For a service on the customer's hot path with a real SLA, Go is the reasonable default.
How do you keep the cost per ticket down?
Three levers: (1) Anthropic prompt caching, which reduces the system-prompt cost by 90% within the 5-minute window; (2) model routing: Haiku for classification, Sonnet for the bulk, Opus only for complex reasoning; (3) BigQuery materialized views that avoid repeated calls for common queries. At CityXerpa, the three combined keep average cost per ticket below €0.01.
Can it run inside our own cloud?
Yes, with caveats. The Go service and BigQuery deploy inside the customer's VPC (Go service on private GKE or Cloud Run with VPC Service Controls; BigQuery via VPC Service Controls too). For Claude, Anthropic offers the model via AWS Bedrock or Google Vertex AI, both with VPC peering: data never leaves your cloud. For customers with full restriction (banking, defence) we use a self-hosted LLM (Llama 3.1 70B on vLLM in GKE) as fallback, accepting that quality drops a notch from Claude Sonnet.
How long does a project like this take?
For a focused agent over a bounded domain (one ticket type, one set of BigQuery tables), 4 to 8 weeks with two senior engineers. The bottleneck is never the code: it is scoping correctly, getting access to the data, and building the eval dataset with real queries. ROI is unclear in month one; from month two onwards it is usually obvious.
How much does it cost?
A well-scoped agent with BigQuery + Claude + a Go service on Cloud Run and a bounded domain starts at €35,000–50,000 in development. With an added vector layer, legacy-system integrations or a mature eval pipeline, it climbs to €60,000–90,000. Monthly support and evolution from €3,500/month depending on volume and SLA.
Got data in BigQuery and a similar case?
Tell us the case, show us the schema and we will tell you whether an agent makes sense, what stack fits and what it costs at your real volume. If it does not fit, you will know on the same call.
100% senior in-house team in Barcelona and Andorra · Reply within 24 h