Your Django or Flask backend is not keeping up with growth. Latency is rising, the Cloud bill is exploding, and the GIL is blocking what should be concurrent. We migrate to Go with the strangler pattern: no big bang, and the business keeps running every sprint.
Not a blind rewrite, not a language purist's crusade. A domain-by-domain migration with before-and-after metrics, and API compatibility maintained for as long as the change takes.
−40%
Typical Cloud Run cost after migrating the hot path
10×
Throughput per instance with goroutines vs uvicorn workers
<100 ms
Typical cold-start, vs 1–3 s for a Python process with heavy deps
0
Big-bangs. Progressive strangler-pattern migration
When to migrate
Not every Python backend needs migrating. These are the concrete symptoms where Go gives back more than it costs.
The GIL serialises real concurrency. As you push RPS, p99 spikes even with CPU at 30%. Adding uvicorn workers buys time; it does not solve the problem.
To serve the same throughput, Python needs 3–10× more instances than Go. On Cloud Run with autoscaling, that is direct invoice. Migrating the hot path cuts cost without changing logic.
Python + Django/FastAPI + ML deps takes 1–3 s to boot. On Cloud Run with scale-to-zero, that means failed requests or timeouts. A Go binary boots in <100 ms.
asyncio works but its mental model is complex and breaks inside sync libraries. Celery + Redis for async tasks is operationally expensive. In Go, a goroutine + channel covers 90% of those cases.
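For illustration, a minimal sketch of that goroutine + channel pattern replacing a background task queue for simple jobs (the `runTasks` name is ours, not a library API):

```go
package main

import (
	"fmt"
	"sync"
)

// runTasks fans n tasks out to a bounded set of workers over a
// channel: the Go-native stand-in for a Celery task + Redis queue.
func runTasks(n, workers int) int {
	tasks := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range tasks { // drain the queue until it is closed
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}

	for i := 0; i < n; i++ {
		tasks <- i // enqueue: blocks while all workers are busy
	}
	close(tasks)
	wg.Wait()
	return processed
}

func main() {
	fmt.Println("processed:", runTasks(5, 3))
}
```

No broker, no serialisation layer, no separate worker deployment: the queue is a language primitive.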
mypy helps but is not enforced. In Go, the compiler rejects type mismatches before they reach CI. The bug curve per release flattens visibly.
Python Docker images with NumPy, PyTorch and friends easily exceed 1 GB. In Go, a static binary on a scratch or distroless image lands around 30 MB. Faster deploys, smaller attack surface.
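As a sketch, the kind of multi-stage Dockerfile behind that footprint (the `./cmd/api` path is illustrative):

```dockerfile
# Build stage: compile a fully static binary
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /app ./cmd/api

# Final stage: distroless base, only the binary ships
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```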
When NOT to migrate
Honest comparison
Each language wins where it should. This is the comparison we hand to CTOs on the fence — no stack tribalism.
| Dimension | Python | Go | Verdict |
|---|---|---|---|
| p99 latency at high concurrency | GIL-limited · uvicorn workers as workaround | Stable under load · native goroutines | Go |
| Cold-start (Cloud Run / Lambda) | 1–3 s typical with dependencies | <100 ms with optimised binary | Go |
| Image footprint | 300 MB – 2 GB | <30 MB with multi-stage | Go |
| Prototyping speed | Unbeatable · notebook → script → service | More boilerplate · always compiles | Python |
| ML / data science ecosystem | De-facto standard · NumPy, PyTorch, scikit | Limited · ONNX runtime for inference | Python |
| Typing and compile-time error catching | mypy / pyright are optional · runtime errors | Strict compiler · explicit errors | Go |
| Concurrency | asyncio works but breaks in sync libs · GIL | Goroutines + channels · first-class language feature | Go |
| Talent availability | Abundant at every level | More niche · but high average level | Tie |
| Cloud Run deployment | Reasonable · constant Dockerfile tweaks | Idiomatic · scratch/distroless image | Go |
| Monthly cost at equal throughput | Baseline | Typically −40% to −60% | Go |
How we migrate
The new Go service progressively absorbs traffic via proxy or feature flags. The API stays compatible. The Python monolith stays alive until the last endpoint has been migrated.
01
Audit of the existing monolith: endpoints, per-endpoint latency, test coverage, critical dependencies, existing observability. We identify the 3–5 hot endpoints that justify starting the migration. We define the target SLO.
1–2 wks
02
The new Go service exposes exactly the same APIs as the Python monolith. OpenAPI / Protobuf as a signed contract, contract testing in CI. No client changes — the frontend or mobile apps do not touch code.
1 wk
03
We rewrite the hottest endpoint or set of endpoints in Go. Same DB, same Redis, same queue. Integration tests against real fixtures. Initial deployment at 0% traffic — full shadow run to validate behavioural parity.
3–6 wks
04
Routing 1% → 5% → 25% → 100% of traffic to the Go service via proxy (Envoy, Cloud Load Balancer) or per-user feature flags. Metrics compared at every step: latency, error rate, response parity. Rollback in seconds if anything drifts.
2–3 wks per domain
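The per-user routing in step 04 can be sketched as a deterministic hash bucket: a simplified stand-in for what Envoy or a feature-flag service does in production (the `routeToGo` name and the percentage are illustrative).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// goTrafficPercent is the share of users routed to the new Go
// service; bumped 1 -> 5 -> 25 -> 100 as metrics stay green.
var goTrafficPercent uint32 = 25

// routeToGo deterministically buckets a user into [0,100), so the
// same user always hits the same backend during the rollout.
func routeToGo(userID string) bool {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < goTrafficPercent
}

func main() {
	for _, u := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s -> go service: %v\n", u, routeToGo(u))
	}
}
```

Determinism matters: a user must not bounce between backends mid-session, and rollback is a single variable change.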
05
We repeat the cycle per domain. When the last Python endpoint has migrated, we shut down the monolith. Operational documentation, runbooks, Grafana / Cloud Monitoring dashboards handed over to the customer's team.
Scope-dependent
What we migrate
Not everything always migrates. Sometimes only the hot path. These are the most frequent patterns we see in production.
The Django/Flask serving the mobile app or SPA. Usually the highest-traffic path and the easiest to migrate — squeezes the most out of goroutines and immediately reduces Cloud cost.
What Celery + Redis does today gets absorbed by a Go service with Pub/Sub or NATS. Bounded goroutine pool, dead-letter queue, controlled backpressure. 10× throughput per instance without touching the business logic.
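A minimal sketch of that bounded pool + dead-letter pattern, assuming the messages have already been pulled from Pub/Sub or NATS (the `consume` and `handle` names are ours):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// consume processes messages with at most `workers` goroutines in
// flight. Failed messages land in a dead-letter slice instead of
// being lost; the semaphore send provides backpressure.
func consume(msgs []string, workers int) (done, dead []string) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, workers) // bounds in-flight goroutines

	for _, m := range msgs {
		sem <- struct{}{} // backpressure: blocks while the pool is full
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			defer func() { <-sem }()
			err := handle(m)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				dead = append(dead, m) // dead-letter queue
			} else {
				done = append(done, m)
			}
		}(m)
	}
	wg.Wait()
	return done, dead
}

// handle stands in for the real business logic.
func handle(m string) error {
	if m == "poison" {
		return errors.New("unprocessable message")
	}
	return nil
}

func main() {
	done, dead := consume([]string{"a", "b", "poison", "c"}, 2)
	fmt.Println(len(done), "processed,", len(dead), "dead-lettered")
}
```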
The orchestration part (function calling, retries, streaming) in Go; pure ML (training, fine-tuning) stays in Python. Common hybrid pattern — we apply it at CityXerpa.
Splitting a Django monolith by domain (auth, billing, catalogue, notifications). Each domain becomes an independent Go microservice with its own DB. Strangler pattern per domain, leaving the rest untouched.
Where p99 < 100 ms is contractual. Python does not get there; Java/JVM works but weighs more. Go fits exactly: right latency and footprint, cheaper talent than Java.
Endpoints that receive webhooks (Stripe, Twilio, GitHub) and trigger downstream processing. Go makes these endpoints saturation-proof with goroutines + Pub/Sub to decouple the hot path.
Technical FAQ
It depends on the size and fragmentation of the monolith. A medium-sized Python service with 8–15 hot endpoints migrates in 3–5 months with the strangler pattern, with no downtime and the business running. Large monoliths with hundreds of endpoints are 9–18-month projects, but ROI usually shows in month 3 once the first hot domain has migrated.
Unit tests, yes — they are language-specific. Integration and end-to-end tests (the ones that hit the API) NO: those are precisely the tests that validate the new Go service returns the same response as the old Python one. Keeping them running across the entire migration is the main safety net.
Common case. Three options: (1) Dribba runs the migration end-to-end and hands over docs + 2 weeks of pairing sessions; (2) hybrid model — our Go engineers embedded with your Python team to transfer knowledge while migrating; (3) intensive prior training (1–2 weeks) for the team and ongoing accompaniment. Python engineers with experience pick up Go in 4–8 weeks — the learning curve is short.
Fully. The exposed API does not change while the migration runs. Same endpoints, same schemas, same error codes. The frontend or mobile apps notice nothing. If you want to evolve the API to v2 after the migration, we do that as a separate project.
By default, only the application layer. The DB stays (usually Postgres). The Go service uses pgx or sqlc to hit the same tables. If the DB is itself part of the problem (poor model, heavy queries, missing indexes), we flag it during discovery and propose improvements as a separate track.
We do not migrate them. PyTorch, TensorFlow, scikit-learn, NumPy/pandas stay in Python — Go does not compete there. The split is: HTTP service layer in Go, ML layer in Python (minimal FastAPI or gRPC server). The Go service calls Python via internal gRPC. Common, effective pattern.
Four metrics at every traffic step: (1) p50/p95/p99 latency of the new Go service vs the old Python; (2) error rate per endpoint; (3) response parity — byte-for-byte response comparison for 1000 random requests; (4) Cloud Run cost per request. Grafana or Cloud Monitoring dashboards with automatic alerts if any metric drifts >5%.
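Metric (3) can be sketched as a simple drift check over sampled response pairs, a simplified stand-in for the real shadow-comparison tooling (the `parityDrift` name is ours):

```go
package main

import (
	"bytes"
	"fmt"
)

// parityDrift returns the fraction of sampled requests whose Go
// response differs byte-for-byte from the legacy Python response.
func parityDrift(pythonResp, goResp [][]byte) float64 {
	mismatches := 0
	for i := range pythonResp {
		if !bytes.Equal(pythonResp[i], goResp[i]) {
			mismatches++
		}
	}
	return float64(mismatches) / float64(len(pythonResp))
}

func main() {
	py := [][]byte{[]byte(`{"ok":true}`), []byte(`{"n":1}`), []byte(`{"n":2}`)}
	gr := [][]byte{[]byte(`{"ok":true}`), []byte(`{"n":1}`), []byte(`{"n":3}`)}
	drift := parityDrift(py, gr)
	fmt.Printf("drift: %.1f%%, alert: %v\n", drift*100, drift > 0.05)
}
```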
Hot-path migration (3–5 endpoints in a medium Python service) ranges €60,000–120,000 over 3–5 months. Full migration of a monolith to a Go microservices architecture ranges €150,000–400,000 over 9–18 months. Fixed price per domain after Discovery. The bill usually amortises in 6–12 months from Cloud savings alone.
Has your Python monolith stopped scaling?
We deliver a report with: hot endpoints, test parity, strangler plan per domain, cost estimate and ROI projection. The plan is yours to keep even if you never hire us.
100% senior in-house team in Barcelona and Andorra · Reply within 24 h