Graceful Shutdown in Golang - Building Resilient Services
Learn how to implement graceful shutdown in your applications to ensure clean resource management and zero data loss. This post covers practical strategies for handling termination signals, closing connections, and finishing in-flight requests.
Imagine it’s 2 AM and your pager goes off: your phone lights up with 502 Bad Gateway alerts
on your main API. You roll out of bed, groggy, and see that right after your deployment,
every request is failing because the old server process was killed immediately,
dropping all live connections and truncating responses.
You scramble through logs and realize what happened: by hard-killing the process, you severed every in-flight HTTP request and database transaction without giving them a chance to finish. That left some customer orders only half-written in your database, and uncommitted transactions sitting in limbo waiting to be rolled back manually. Now you’re stuck fixing broken data, answering support tickets, and wondering: “How can I stop my Go server in a way that lets each request and transaction complete before shutting down?”
Why Graceful Shutdown Matters
Every production service must eventually stop, whether due to a deployment, an autoscaler, or a manual restart, and handling that moment correctly prevents data loss, errors, and resource leaks. Abrupt termination (a “hard” shutdown) cuts off in-flight requests, can leave database transactions half-committed, and can leak file handles or goroutines, degrading performance over time. By contrast, a graceful shutdown stops accepting new work while allowing ongoing operations to complete, then cleans up resources before exiting.
This approach:
- Prevents Data Corruption: Ongoing writes, whether to a database, file, or external API, are allowed to finish rather than being cut off mid-stream.
- Improves User Experience: Clients see fewer errors and retries when they receive full responses instead of half-written data or abrupt connection closures.
- Simplifies Resource Management: You can close connections, roll back pending transactions, and terminate background goroutines in a controlled fashion.
Graceful Shutdown Fundamentals
Catching Operating System Signals
In Go, termination signals like SIGINT (Ctrl+C) and SIGTERM (default container shutdown
signal) are captured via the os/signal package. Instead of writing your
own signal loop, Go 1.16+ provides signal.NotifyContext, which wraps a parent context
and cancels it when any specified signal arrives:
ctx, stop := signal.NotifyContext(context.Background(),
    syscall.SIGINT,
    syscall.SIGTERM)
defer stop()
This single call sets up listening for both signals, and the returned
ctx is cancelled immediately when the process is told to terminate.
Leveraging Context for Cancellation
Go’s context package provides a robust way to propagate cancellation signals across
API boundaries and goroutines. When you pass a context.Context to long-running operations such
as database queries, HTTP handlers, or background workers, they can listen for ctx.Done()
and return promptly when a shutdown is underway. Combining context with
signal handling ensures that once a signal is received, all parts of
your application share a single source of truth about cancellation.
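For instance, here is a minimal sketch of a context-aware operation (slowOperation is an illustrative name, not part of the example that follows): it returns as soon as the shared context is cancelled instead of finishing its simulated work.

func slowOperation(ctx context.Context) error {
    select {
    case <-time.After(5 * time.Second):
        // Simulated long-running work completed normally.
        return nil
    case <-ctx.Done():
        // Shutdown (or caller cancellation) is underway; stop early.
        return ctx.Err()
    }
}

Any caller that passes in the signal-aware context gets prompt, uniform cancellation for free.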
Shutting Down the HTTP Server
Since Go 1.8, the http.Server type includes a Shutdown(context.Context) error method that performs
a graceful termination: it stops accepting new connections, closes idle connections immediately,
and waits for active connections to finish before shutting down. If the context’s deadline expires
first, Shutdown returns the context’s error instead of waiting forever, so you can log the failure
and, if necessary, force-close the remaining connections with srv.Close(), ensuring your
service does not hang indefinitely.
Building a Graceful Shutdown from Scratch
Let’s combine these primitives into a minimal example that you can build upon for production use.
package main

import (
    "context"
    "log"
    "net/http"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // 1. Create a signal-aware context
    ctx, stop := signal.NotifyContext(
        context.Background(),
        syscall.SIGINT,
        syscall.SIGTERM)
    defer stop()

    // 2. Define HTTP handler
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Simulate work
        time.Sleep(2 * time.Second)
        w.Write([]byte("Hello, world!"))
    })

    // 3. Create the server
    srv := &http.Server{
        Addr:    ":8080",
        Handler: mux,
    }

    // 4. Start the server in a goroutine
    go func() {
        log.Println("Server starting on :8080")
        err := srv.ListenAndServe()
        if err != nil && err != http.ErrServerClosed {
            log.Fatalf("ListenAndServe(): %v", err)
        }
    }()

    // 5. Block until context cancellation (signal received)
    <-ctx.Done()
    log.Println("Shutdown signal received")

    // 6. Create a timeout context for the shutdown process
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // 7. Attempt graceful shutdown
    if err := srv.Shutdown(shutdownCtx); err != nil {
        log.Fatalf("Server Shutdown Failed: %+v", err)
    }
    log.Println("Server exited properly")
}
- We start by wrapping context.Background() with signal.NotifyContext to listen for SIGINT and SIGTERM.
- We configure a basic HTTP multiplexer and handler.
- We launch srv.ListenAndServe() in its own goroutine so the main goroutine can await shutdown signals.
- Upon receiving a signal (<-ctx.Done()), we create a secondary context with a 10-second deadline to bound the shutdown duration.
- Calling srv.Shutdown(shutdownCtx) initiates the graceful shutdown process.
This code, while minimal, embodies the canonical pattern recommended by the Go team.
> [!NOTE]
> A simple example is a great starting point, but real-world services require handling more moving parts and edge cases.
Coordinating Background Goroutines
If your application spawns background workers for metrics collection, message queue consumers, or
scheduled tasks, pass the parent context into them so they can terminate when ctx.Done() fires:
func worker(ctx context.Context, id int) {
    for {
        select {
        case <-ctx.Done():
            log.Printf("Worker %d shutting down", id)
            return
        default:
            // doing work
        }
    }
}
// main program that spawns workers
This prevents goroutines from leaking after shutdown and ensures all in‑flight work is accounted for.
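As a minimal sketch of the “main program that spawns workers” placeholder above (the runWorkers helper is illustrative), one common approach is to track the workers with a sync.WaitGroup so the program can wait for all of them to drain before exiting:

func runWorkers(ctx context.Context, n int) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            worker(ctx, id) // the worker function shown above
        }(i)
    }
    // Blocks until every worker has observed ctx.Done() and returned.
    wg.Wait()
}

Calling wg.Wait() after the HTTP server has shut down (or in a goroutine that main waits on) guarantees no worker outlives the process’s cleanup phase.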
Cleaning Up Database Transactions
Database clients (e.g., database/sql, GORM) typically accept a context for queries and
transactions. Ensure you use tx, err := db.BeginTx(ctx, nil) and propagate ctx
through subsequent operations so that pending transactions can roll back if the
shutdown deadline is exceeded.
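A minimal sketch with database/sql (the saveOrder function and the orders table are illustrative):

func saveOrder(ctx context.Context, db *sql.DB, orderID int) error {
    // BeginTx ties the transaction to ctx: if ctx is cancelled before Commit,
    // the sql package rolls the transaction back automatically.
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once Commit has succeeded

    if _, err := tx.ExecContext(ctx,
        "UPDATE orders SET status = 'confirmed' WHERE id = $1", orderID); err != nil {
        return err
    }
    return tx.Commit()
}

If the shutdown deadline expires while this is in flight, the context error propagates out and the transaction is rolled back rather than left half-committed.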
Managing External Resources
Any connections to external systems, such as Redis clients, Kafka producers, or file handles, should expose a Close()
or similar cleanup method.
Register custom shutdown hooks immediately after catching the signal:
go func() {
    // block until the shutdown signal
    <-ctx.Done()
    // cleanup redis client
    rc.Close()
    // cleanup file
    f.Close()
}()
Ordering these hooks relative to server shutdown depends on your dependencies: typically, you stop accepting new requests first, then close clients once no further calls will be made.
Setting Appropriate Timeouts
Choosing a shutdown timeout is a balance: too short, and slow clients or long-running requests get cut off; too long, and your deployments may hang or delay rollouts. A window of 5–30 seconds is common, but measure your service’s normal request latencies and resource cleanup times to tune this value.
Integrating with Kubernetes
In a Kubernetes environment, pods receive a SIGTERM when scaled down or updated, and Kubernetes waits
up to terminationGracePeriodSeconds before force killing the container.
To align with this lifecycle:
- Set terminationGracePeriodSeconds to slightly exceed your application’s shutdown timeout.
- Ensure your service stops accepting new traffic immediately (Kubernetes sends a SIGTERM and removes the pod from endpoints).
- Let your code call srv.Shutdown(ctx) on SIGTERM and exit before the grace period expires.
This coordination prevents failed readiness/liveness probes and avoids 502/504 errors for in‑flight requests.
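One common way to make the “stop accepting new traffic” step explicit in Go is to fail a readiness endpoint as soon as the signal arrives, so the pod is drained before srv.Shutdown runs. This is only a sketch under stated assumptions: Go 1.19+ for sync/atomic’s atomic.Bool, and a hypothetical /readyz path wired into the mux from the earlier example.

// readinessHandler reports 503 once shutdown has begun so Kubernetes
// stops routing new requests to this pod.
func readinessHandler(shuttingDown *atomic.Bool) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if shuttingDown.Load() {
            http.Error(w, "shutting down", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}

In main, register it with mux.HandleFunc("/readyz", readinessHandler(&flag)); after <-ctx.Done(), call flag.Store(true) (optionally followed by a short sleep) before invoking srv.Shutdown.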
Testing and Observability
Automated Shutdown Tests
Use Go’s testing framework to simulate signals and verify your service shuts down within expected bounds:
func TestGracefulShutdown(t *testing.T) {
    done := make(chan struct{})
    go func() {
        main() // runs until graceful shutdown completes
        close(done)
    }()
    time.Sleep(100 * time.Millisecond) // give the server time to start
    proc, _ := os.FindProcess(os.Getpid())
    proc.Signal(syscall.SIGTERM) // signal.NotifyContext in main catches this
    select {
    case <-done: // main returned: shutdown completed
    case <-time.After(15 * time.Second):
        t.Fatal("server did not shut down in time")
    }
}
Testing this path ensures your shutdown logic doesn’t regress over time.
Logging and Metrics
Emit structured logs at shutdown start, during resource cleanup, and upon exit. Expose Prometheus metrics (e.g., a gauge for “shutdown_in_progress”) to track how often and how long shutdowns take in production.
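As a sketch of what that instrumentation might look like (assuming the Prometheus Go client, github.com/prometheus/client_golang; the metric names and the instrumentedShutdown wrapper are illustrative):

import (
    "log"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    shutdownInProgress = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "shutdown_in_progress",
        Help: "1 while a graceful shutdown is underway, 0 otherwise.",
    })
    shutdownDuration = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "shutdown_duration_seconds",
        Help: "How long graceful shutdowns take.",
    })
)

// instrumentedShutdown wraps the shutdown sequence with logs and metrics.
func instrumentedShutdown(shutdown func() error) error {
    shutdownInProgress.Set(1)
    defer shutdownInProgress.Set(0)

    start := time.Now()
    log.Println("shutdown: started")
    err := shutdown()
    shutdownDuration.Observe(time.Since(start).Seconds())
    log.Printf("shutdown: finished in %s (err=%v)", time.Since(start), err)
    return err
}

In main, the call becomes err := instrumentedShutdown(func() error { return srv.Shutdown(shutdownCtx) }), with the scrape endpoint exposed via promhttp.Handler() as usual.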
A Real‑World Example: Sample Microservice
Imagine a microservice that exposes an HTTP API, consumes messages from Kafka, and writes to PostgreSQL.
func main() {
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    // Set up database with context
    db := setupDB()
    defer db.Close()

    // Start Kafka consumer with ctx
    go startKafkaConsumer(ctx, db)

    // Configure HTTP server
    mux := setupRouter(db)
    srv := &http.Server{
        Addr:    ":8080",
        Handler: mux,
    }

    // Start HTTP server
    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("HTTP server ListenAndServe: %v", err)
        }
    }()

    <-ctx.Done()
    log.Println("Shutdown signal received, commencing graceful shutdown")

    // Close external resources first or in parallel
    db.Close()
    closeKafkaConsumer()

    // Then shutdown HTTP server with timeout
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil {
        log.Fatalf("HTTP server Shutdown: %v", err)
    }
    log.Println("Service shut down gracefully")
}
Key points:
- Database and Kafka clients use ctx to cancel in-flight operations.
- External clients are closed before or concurrently with HTTP shutdown.
- HTTP server shutdown is bounded by a timeout.
- Logs clearly delineate shutdown phases.
Best Practices Checklist
- Use signal.NotifyContext for OS signals, not manual channels.
- Propagate context throughout all layers: HTTP handlers, DB drivers, background goroutines.
- Enforce a shutdown timeout with context.WithTimeout to prevent hanging deployments.
- Clean up all resources: DB, files, external clients, background workers.
- Align with Kubernetes grace periods via terminationGracePeriodSeconds.
- Write automated tests simulating SIGTERM to catch regressions.
- Monitor shutdown metrics and logs to detect issues early.
Conclusion
Implementing graceful shutdown in Go from scratch is straightforward when you leverage
the standard library’s os/signal, context, and http.Server.Shutdown primitives. By following the idioms
and patterns outlined here (capturing signals, propagating context, bounding shutdown time, cleaning up resources,
and integrating with container orchestration), you can ensure your Go services terminate cleanly,
safely, and predictably.
With solid shutdown mechanics in place, you’ll deliver more reliable rollouts and a better experience
for both developers and users.