A real Dribba case study at CityXerpa: support tickets that used to take 4 hours to resolve now close in 8 seconds. No hallucinations — function tools over BigQuery, Claude reasoning, and a Go service on Cloud Run orchestrating it all.
It is not magic. It is classic data engineering with the right piece of AI on top. Here is how we built it, what we measured, and what we would do differently today.
4 h → 8 s
Average resolution time for recurring support tickets
≈70%
Tickets resolved by the agent without human escalation
120 k+
Active users served by the SuperApp
4 wks
From zero to production with 2 senior engineers
The actual problem
CityXerpa already had every piece of information in BigQuery: public-service status, transactions, schedules, fares, transport data, alerts. The problem was not data. It was access: the support team needed minutes per query and end users waited hours for a clear answer.
The classic trap here is to spin up a generic RAG over internal documentation and discover, two months later, that the LLM hallucinates because the actual question lives in a table with millions of rows, not in a PDF.
The right answer is not to vectorise BigQuery. It is to give the LLM the ability to query BigQuery directly through function calling, validating results before generating the response. Save RAG for the data that is genuinely unstructured text.
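For illustration, this is the shape of one such function tool in the Anthropic Messages API tools format. The tool name and fields here are hypothetical placeholders, not CityXerpa's real schema:

```go
package agent

// Illustrative function-tool definition in the Anthropic Messages API
// tools format. Name and fields are placeholders, not the real schema.
var serviceStatusTool = map[string]any{
	"name":        "get_service_status",
	"description": "Current status of a public service, read from BigQuery.",
	"input_schema": map[string]any{
		"type": "object",
		"properties": map[string]any{
			"service_id": map[string]any{
				"type":        "string",
				"description": "Identifier of the public service",
			},
		},
		"required": []string{"service_id"},
	},
}
```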
Stack
No heavy frameworks, no fashion-driven abstractions. An orchestrator, a reasoner and a data warehouse — wired together with function tools and real observability.
Orchestration
A Go service (chi / echo) deployed on Cloud Run with 0→N autoscaling. Goroutine-based concurrency for parallel fan-out to Claude and BigQuery. Exponential-backoff retries, circuit breakers and context propagation come for free. Stable p99 under load, <30 MB footprint per instance.
When to scale: Cloud Run handles anywhere from 0 to millions of requests/day without configuration. If you need long-running workflows with durable state spanning hours or days (week-long retries, sagas), move that path to Temporal.
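A minimal sketch of that parallel fan-out, assuming hypothetical callClaude and runQuery wrappers over the real Anthropic and BigQuery clients:

```go
package agent

import (
	"context"
	"time"
)

// fanOut runs the Claude call and the BigQuery lookup in parallel and
// honours the request context: if the caller cancels, both legs stop.
func fanOut(ctx context.Context, prompt, sql string) (string, []map[string]any, error) {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	type claudeRes struct {
		text string
		err  error
	}
	type bqRes struct {
		rows []map[string]any
		err  error
	}
	claudeCh := make(chan claudeRes, 1)
	bqCh := make(chan bqRes, 1)

	go func() {
		text, err := callClaude(ctx, prompt)
		claudeCh <- claudeRes{text, err}
	}()
	go func() {
		rows, err := runQuery(ctx, sql)
		bqCh <- bqRes{rows, err}
	}()

	c, b := <-claudeCh, <-bqCh
	if c.err != nil {
		return "", nil, c.err
	}
	return c.text, b.rows, b.err
}

// Stubs standing in for the real clients.
func callClaude(ctx context.Context, prompt string) (string, error)       { return "", nil }
func runQuery(ctx context.Context, sql string) ([]map[string]any, error) { return nil, nil }
```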
Reasoning + tool use
Claude Sonnet: more disciplined function calling than GPT-4o on long-horizon queries. Native prompt caching (-90% cost on repeated system prompts). Steadier p50 latency under high throughput.
When to scale: Sonnet 4.x for ~80% of queries. Haiku for fast classification and rate-limited paths. Opus only for complex multi-step reasoning.
Data layer
BigQuery: structured data at scale with plain SQL. Partitioning + clustering keeps most queries under 1 s. BI Engine for hot paths.
When to scale: For unstructured data (PDFs, FAQs, free-text knowledge bases) bolt on a vector layer — BigQuery VECTOR_SEARCH if you want to stay in-stack, pgvector if not.
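A sketch of what that table layout looks like, with placeholder table and column names: partition by day, cluster by the columns the agent's tools filter on.

```go
package agent

// Hypothetical BigQuery DDL. Daily partitions prune scans by date;
// clustering on the filter columns keeps bytes scanned low.
const createEventsTable = `
CREATE TABLE ops.service_events (
  service_id STRING,
  status     STRING,
  updated_at TIMESTAMP
)
PARTITION BY DATE(updated_at)
CLUSTER BY service_id, status
`
```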
Real case
120,000+ active users. An app that covers transport, public services, local commerce, alerts and government services — and a human support team that could not scale at the product's pace.
Before
4 h
Average time to resolve a recurring support ticket. The information lived in BigQuery, but flowed through a human who copy-pasted the query.
After
8 s
The agent classifies the question, calls the right tool against BigQuery, validates the result and returns a verifiable answer. It cites the SQL it ran for internal audit.
Escalation
≈30%
Of tickets still need a human (legal cases, decisions that are not pure data, edge cases). The agent prepares the context before handing the case off.
How we built it
Improvements roadmap
Ten concrete improvements over CityXerpa's current stack, each with its cost, effort and expected metric. No maximalism: only what moves the needle.
The system prompt with the BigQuery schema, CityXerpa's business rules and few-shot examples weighs several thousand tokens. With Anthropic's prompt caching turned on, that part is billed at 10% of the price on every subsequent call within the 5-minute window.
Estimated effort: 1 day
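A minimal sketch of the request shape, built against the public Messages API; the model ID is a placeholder to pin to a real version:

```go
package agent

import (
	"bytes"
	"encoding/json"
	"net/http"
	"os"
)

// cachedRequest builds a Messages API call whose system block (schema,
// business rules, few-shot examples) is marked for caching, so repeat
// calls inside the TTL bill that part at 10%.
func cachedRequest(systemPrompt, userMsg string) (*http.Request, error) {
	body := map[string]any{
		"model":      "claude-sonnet", // placeholder: pin a real model ID
		"max_tokens": 1024,
		"system": []map[string]any{{
			"type":          "text",
			"text":          systemPrompt,
			"cache_control": map[string]string{"type": "ephemeral"},
		}},
		"messages": []map[string]any{{"role": "user", "content": userMsg}},
	}
	raw, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "https://api.anthropic.com/v1/messages", bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("content-type", "application/json")
	return req, nil
}
```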
An initial classifier with Claude Haiku decides whether the query is trivial (FAQ-style “when does the town hall open?”), medium (a single-join BigQuery lookup) or complex (multi-step reasoning). Each bucket uses a different model. Most tickets are trivial.
Estimated effort: 3 days
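In the Go service this can be a plain enum plus a switch; the model IDs below are placeholders to pin to evaluated versions:

```go
package agent

// Three-bucket router. classify would itself be a cheap Haiku call
// returning one of these labels.
type bucket string

const (
	bucketTrivial bucket = "trivial" // FAQ-style lookups
	bucketMedium  bucket = "medium"  // single-join BigQuery queries
	bucketComplex bucket = "complex" // multi-step reasoning with tool use
)

// modelFor maps each bucket to the model that handles it. IDs are
// placeholders: pin them to the versions you have evaluated.
func modelFor(b bucket) string {
	switch b {
	case bucketTrivial:
		return "claude-haiku"
	case bucketMedium:
		return "claude-sonnet"
	default:
		return "claude-opus"
	}
}
```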
The most-repeated BigQuery queries (public-service status, schedules, fares) get materialized as views refreshed every N minutes. The agent reads them as a table — no recompute. Structural caching without Redis.
Estimated effort: 1 week
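A sketch of one such view, with placeholder names. BigQuery materialized views only allow a restricted subset of SQL (simple aggregations, no window functions), which is what makes background refresh cheap:

```go
package agent

// Hypothetical materialized view for one hot query. BigQuery refreshes
// it in the background, so the agent's tool reads precomputed rows
// instead of re-scanning the base table.
const createStatusMV = `
CREATE MATERIALIZED VIEW ops.mv_service_status
OPTIONS (enable_refresh = true, refresh_interval_minutes = 5)
AS
SELECT service_id, MAX(updated_at) AS last_event, COUNT(*) AS events
FROM ops.service_events
GROUP BY service_id
`
```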
30% of the tickets that escalate to a human today do so because the answer lives in PDFs, manuals and support articles, not in BigQuery. Adding an embeddings layer (BigQuery VECTOR_SEARCH if you want to stay in-stack, pgvector if not) lets the agent cite documentation through a dedicated function tool.
Estimated effort: 2 weeks
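A sketch of the query such a "search docs" tool could run, with placeholder table and column names; the query embedding is passed as a parameter and would come from an embedding-model call:

```go
package agent

// Hypothetical retrieval query behind a documentation function tool.
// VECTOR_SEARCH returns the base-table row and the distance, so the
// agent can cite the matched chunk in its answer.
const searchDocsSQL = `
SELECT base.doc_id, base.chunk, distance
FROM VECTOR_SEARCH(
  TABLE docs.chunk_embeddings, 'embedding',
  (SELECT @query_embedding AS embedding),
  top_k => 5,
  distance_type => 'COSINE'
)
`
```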
A dataset of 200–500 labelled real queries. Every prompt or model deploy is run against the dataset before reaching production. Catches degradations from model updates (e.g. a more cautious Claude release) or prompt changes before users see them.
Estimated effort: 1 week
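The harness can be a plain Go test over the labelled dataset; runAgent here is a hypothetical hook into the pipeline:

```go
package agent

import "testing"

// Hypothetical labelled case: a real dataset (200-500 entries) would be
// loaded from a versioned file, not hardcoded.
type evalCase struct {
	Query string
	Want  string // expected classification or canonical answer key
}

var dataset = []evalCase{
	{"when does the town hall open?", "faq_hours"},
}

// TestAgentRegression replays the dataset before every prompt or model
// deploy; a degradation fails the build instead of reaching users.
func TestAgentRegression(t *testing.T) {
	for _, c := range dataset {
		if got := runAgent(c.Query); got != c.Want {
			t.Errorf("query %q: got %q, want %q", c.Query, got, c.Want)
		}
	}
}

func runAgent(q string) string { return "faq_hours" } // stub for the sketch
```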
Token streaming from Anthropic propagated to the client over Server-Sent Events. In Go, context.Context flows from the HTTP handler down to the Anthropic call: if the client closes the connection, inference is cancelled and no further tokens are billed. For queries that need tool use before generation, explicit microcopy (“Querying BigQuery…”) instead of a blind spinner.
Estimated effort: 1 week
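A minimal sketch of the SSE path; streamClaude is a hypothetical stand-in for the real streaming client that yields tokens and respects the request context:

```go
package agent

import (
	"context"
	"fmt"
	"net/http"
)

// streamHandler propagates the request context into the model call, so
// a client disconnect cancels inference and stops token billing.
func streamHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// Explicit microcopy while the tool phase runs, instead of a blind spinner.
	fmt.Fprint(w, "data: Querying BigQuery…\n\n")
	flusher.Flush()

	for token := range streamClaude(r.Context(), "prompt") {
		fmt.Fprintf(w, "data: %s\n\n", token)
		flusher.Flush()
	}
}

// streamClaude stands in for the real streaming client: it yields
// tokens on a channel and stops when ctx is cancelled.
func streamClaude(ctx context.Context, prompt string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, t := range []string{"The", " service", " is", " running."} {
			select {
			case <-ctx.Done():
				return
			case ch <- t:
			}
		}
	}()
	return ch
}
```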
Force JSON-schema responses on every tool-use call, deserialize into a tagged Go struct and fail fast with a retry on validation errors. Strict typing at the boundary catches most schema bugs before they ship, with no need for Zod-style validation at runtime.
Estimated effort: 3 days
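A sketch of that boundary, with a hypothetical tool input:

```go
package agent

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Hypothetical tool input: strict decoding rejects hallucinated or
// mistyped parameters before any SQL runs.
type fareQueryArgs struct {
	Line string `json:"line"`
	Zone int    `json:"zone"`
}

func parseArgs(raw []byte) (fareQueryArgs, error) {
	var args fareQueryArgs
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // fail fast on fields outside the schema
	if err := dec.Decode(&args); err != nil {
		return args, fmt.Errorf("invalid tool args: %w", err) // caller retries the tool call
	}
	if args.Line == "" {
		return args, fmt.Errorf("missing required field: line")
	}
	return args, nil
}
```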
Every agent request as an OpenTelemetry trace from the Go handler: a span per Claude call, a span per BigQuery query, attributes for tokens consumed, model used and bytes scanned. Exported to Cloud Trace or Datadog. A support query has a measurable cost — without this, you cannot tell whether to optimise the prompt or the SQL.
Estimated effort: 1 week
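A sketch of the span around one model call; callModel is a hypothetical wrapper, and the attribute keys are our own convention, not an OTel semantic standard:

```go
package agent

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

var tracer = otel.Tracer("support-agent")

// tracedModelCall wraps each Claude call in a span carrying the
// cost-relevant attributes, so per-ticket cost is measurable.
func tracedModelCall(ctx context.Context, prompt string) (string, error) {
	ctx, span := tracer.Start(ctx, "claude.messages")
	defer span.End()

	text, inTok, outTok, err := callModel(ctx, prompt)
	span.SetAttributes(
		attribute.String("llm.model", "claude-sonnet"),
		attribute.Int("llm.tokens.input", inTok),
		attribute.Int("llm.tokens.output", outTok),
	)
	return text, err
}

// Stub standing in for the real client wrapper.
func callModel(ctx context.Context, prompt string) (string, int, int, error) {
	return "", 0, 0, nil
}
```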
For spiky workloads (campaigns, events), split the synchronous path (immediate response) from the asynchronous one (notifications, embeddings, indexing). Pub/Sub as the queue, Go workers with a semaphore-bounded goroutine pool — controlled saturation, no impact on the main service.
Estimated effort: 1 week
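A sketch of the bounded pool, with the message source abstracted as a channel; in production this would sit behind the Pub/Sub subscription:

```go
package agent

import (
	"context"
	"sync"
)

// runWorkers bounds concurrency with a fixed-capacity semaphore: a spike
// saturates the queue, not the service.
func runWorkers(ctx context.Context, msgs <-chan []byte, limit int) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for msg := range msgs {
		select {
		case sem <- struct{}{}: // acquire a slot, or...
		case <-ctx.Done(): // ...stop draining on shutdown
			wg.Wait()
			return
		}
		wg.Add(1)
		go func(m []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			process(ctx, m)
		}(msg)
	}
	wg.Wait()
}

func process(ctx context.Context, m []byte) {} // stub: embeddings, indexing, notifications
```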
When the agent needs to run workflows that last minutes or hours (human escalation with SLA, deferred retries, distributed transactions with compensation), migrate that path to Temporal. The Go SDK is mature and integrates natively with the existing service — no rewrite needed.
Estimated effort: 2–3 weeks (when applicable)
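A sketch of what an escalation workflow looks like with the Temporal Go SDK; activity names and the SLA value are placeholders, and a real version would race the timer against a resolution signal instead of always re-escalating:

```go
package agent

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// EscalationWorkflow hands a ticket to a human with an SLA timer that
// survives restarts and deploys.
func EscalationWorkflow(ctx workflow.Context, ticketID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	// Hand the agent-prepared context to the human queue.
	if err := workflow.ExecuteActivity(ctx, NotifyHuman, ticketID).Get(ctx, nil); err != nil {
		return err
	}

	// Durable timer: the SLA keeps counting even if the worker restarts.
	if err := workflow.Sleep(ctx, 4*time.Hour); err != nil {
		return err
	}

	return workflow.ExecuteActivity(ctx, EscalateToLead, ticketID).Get(ctx, nil)
}

// Stub activities standing in for the real integrations.
func NotifyHuman(ticketID string) error    { return nil }
func EscalateToLead(ticketID string) error { return nil }
```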
Other use cases
If your company has data in BigQuery (or Snowflake, Redshift, ClickHouse) and a team that loses hours pulling answers in natural language, this pattern applies.
Non-technical staff ask in natural language and the agent returns the SQL it ran, the result and the chart. Replaces analysts for repetitive questions.
Agents that combine business rules with competitor data and historical demand in BigQuery to suggest prices in real time.
The agent cross-references warehouse signals, contextualises the suspicion and opens or closes the case with a readable rationale. Cuts noise in risk teams.
Weekly or monthly summaries combining structured data (KPIs in BigQuery) with LLM-generated narrative. Each figure is backed by the query that produced it.
Agent over the HR and operations knowledge base. Employee onboarding, questions about leave, expenses or internal policy without flooding the helpdesk.
The agent compares records across systems, identifies discrepancies and proposes resolutions. For finance, ops and compliance.
Technical FAQ
Is this RAG?
Only partially. Classic RAG uses a vector store (embeddings) to retrieve relevant text before passing it to the LLM. Here the primary source is BigQuery via SQL: the LLM reasons and calls function tools that run queries. It is Tool-Augmented Generation over structured data. When unstructured text matters (PDFs, manuals) we add a vector layer alongside, not instead of.
Why not a vector database from day one?
Because most support queries for a SuperApp like CityXerpa are against structured data (service status, schedules, transactions, fares), not text. For that, a vector DB is the wrong tool: it adds complexity and brings no value. When unstructured text is involved (long FAQs, manuals, regulation), then yes: a vector layer scoped to that subset.
Why Claude and not GPT-4o or Gemini?
Three practical reasons: (1) more disciplined and predictable function calling (Claude rarely invents parameters that do not exist); (2) native prompt caching from Sonnet 3.5 onwards, which makes large system prompts cheap; (3) more stable p50/p99 latency under load. For fast classification, Haiku wins on cost/latency. For projects already on GCP, Gemini Flash is reasonable as a second model in routing.
Why Go and not Python or n8n?
Go gives us three things in production that neither Python nor n8n deliver at the same cost: (1) stable p99 latency under load with no GIL or runtime warmup; (2) native concurrency through goroutines for parallel fan-out between Claude and BigQuery without heavy async frameworks; (3) a single <30 MB binary that boots in <100 ms on Cloud Run, making 0→N autoscaling transparent. Python fits for prototyping and notebooks; n8n fits for visual flows the ops team will touch. For a service on the customer's hot path with a real SLA, Go is the reasonable default.
How do you keep the cost per ticket down?
Three levers: (1) Anthropic prompt caching, which reduces the system-prompt cost by 90% within the 5-minute window; (2) model routing: Haiku for classification, Sonnet for the bulk, Opus only for complex reasoning; (3) BigQuery materialized views that avoid repeated calls for common queries. At CityXerpa, the three combined keep average cost per ticket below €0.01.
Can it run inside our own cloud?
Yes, with caveats. The Go service and BigQuery deploy inside the customer's VPC (Go service on private GKE or Cloud Run with VPC Service Controls; BigQuery via VPC Service Controls too). For Claude, Anthropic offers the model via AWS Bedrock or Google Vertex AI, both with VPC peering: data never leaves your cloud. For customers with full restriction (banking, defence) we use a self-hosted LLM (Llama 3.1 70B on vLLM in GKE) as fallback, accepting that quality drops a notch from Claude Sonnet.
How long does a project like this take?
For a focused agent over a bounded domain (one ticket type, one set of BigQuery tables), 4 to 8 weeks with two senior engineers. The bottleneck is never the code: it is scoping correctly, getting access to the data, and building the eval dataset with real queries. ROI is unclear in month one; from month two onwards it is usually obvious.
How much does it cost?
A well-scoped agent with BigQuery + Claude + a Go service on Cloud Run and a bounded domain starts at €35,000–50,000 in development. With an added vector layer, legacy-system integrations or a mature eval pipeline, it climbs to €60,000–90,000. Monthly support and evolution from €3,500/month depending on volume and SLA.
Got data in BigQuery and a similar case?
Tell us the case, show us the schema and we will tell you whether an agent makes sense, what stack fits and what it costs at your real volume. If it does not fit, you will know on the same call.
100% senior in-house team in Barcelona and Andorra · Reply within 24 h