Designing Rate Limiting APIs: A Practical Guide with Go Examples
Gopher Stories — by Harsh Jha
Rate limiting protects your backend from abuse, unexpected spikes, and overload.
This article covers:
What rate limiting is
Why it matters
Different algorithms
How to architect them
Go implementations for Token Bucket, Sliding Window, and Leaky Bucket
1. What is Rate Limiting?
Rate limiting restricts how many requests a client can make within a given time window.
Example:
- 100 requests per minute per user
If exceeded:
HTTP 429 — Too Many Requests
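A 429 is more useful when it also tells the client how long to back off. The sketch below is one way to do that with the standard Retry-After header. The helper names retryAfterSeconds and rejectTooManyRequests are made up for this example and do not come from any library.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"strconv"
	"time"
)

// retryAfterSeconds (hypothetical helper) rounds the remaining wait up to
// whole seconds, because Retry-After carries an integer number of seconds.
func retryAfterSeconds(wait time.Duration) string {
	if wait <= 0 {
		return "0"
	}
	return strconv.Itoa(int(math.Ceil(wait.Seconds())))
}

// rejectTooManyRequests (hypothetical helper) writes a 429 response that
// includes a Retry-After header.
func rejectTooManyRequests(w http.ResponseWriter, wait time.Duration) {
	w.Header().Set("Retry-After", retryAfterSeconds(wait))
	http.Error(w, "Too many requests", http.StatusTooManyRequests)
}

func main() {
	fmt.Println(retryAfterSeconds(1500 * time.Millisecond)) // 2
}
```

Rounding up rather than down keeps a well-behaved client from retrying a fraction of a second too early.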
2. Why Rate Limiting Matters
Prevents abuse and brute-force attacks
Protects databases and infrastructure
Keeps latency predictable
Ensures fair usage
3. Rate Limiting Algorithms
| Algorithm | Burst Allowed | Accuracy | Complexity |
| Fixed Window | Yes (up to 2x the limit across a window boundary) | Medium | Low |
| Sliding Window | Limited | High | Medium |
| Token Bucket | Yes | High | Medium |
| Leaky Bucket | No | Medium | Medium |
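Fixed Window is the only algorithm in the table without an implementation later in this article, so here is a minimal in-memory sketch. It assumes a single process, takes the current time as a plain Unix-seconds argument to keep the example deterministic, and uses illustrative names (FixedWindow, NewFixedWindow) that are not from any library.

```go
package main

import "fmt"

// FixedWindow counts requests per key in aligned windows of windowSec seconds.
// Single-process sketch: no locking and no eviction of idle keys.
type FixedWindow struct {
	limit     int
	windowSec int64
	start     map[string]int64 // start of the current window per key
	count     map[string]int   // requests admitted in that window
}

func NewFixedWindow(limit int, windowSec int64) *FixedWindow {
	return &FixedWindow{
		limit:     limit,
		windowSec: windowSec,
		start:     map[string]int64{},
		count:     map[string]int{},
	}
}

// Allow takes the current Unix time in seconds.
func (f *FixedWindow) Allow(key string, nowSec int64) bool {
	windowStart := nowSec - nowSec%f.windowSec
	if f.start[key] != windowStart {
		// A new window began: the counter resets to zero.
		f.start[key] = windowStart
		f.count[key] = 0
	}
	if f.count[key] >= f.limit {
		return false
	}
	f.count[key]++
	return true
}

func main() {
	fw := NewFixedWindow(2, 60)
	// Two requests at the end of one window, two at the start of the next.
	for _, t := range []int64{58, 59, 60, 61} {
		fmt.Println(t, fw.Allow("u1", t))
	}
}
```

With a limit of 2 per 60 seconds, all four requests are admitted within four seconds because the counter resets at t=60. That boundary effect is the main reason to prefer a sliding window when accuracy matters.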
4. Architecture
High-Level
Client → Load Balancer → API Gateway → Go Service
↓
Redis
Distributed
Client
↓
Load Balancer
↓
Go API Instances
↓
Redis (shared state)
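The reason for the shared Redis box can be shown with a small simulation. In this sketch a mutex-guarded map stands in for Redis, and two API instances either keep their own counters or share one. All type and function names here are invented for the illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// store stands in for Redis: a mutex-guarded counter map.
type store struct {
	mu sync.Mutex
	n  map[string]int
}

func newStore() *store { return &store{n: map[string]int{}} }

func (s *store) incr(key string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.n[key]++
	return s.n[key]
}

// instance models one API process enforcing limit against some store.
type instance struct {
	st    *store
	limit int
}

func (i instance) allow(key string) bool { return i.st.incr(key) <= i.limit }

// admitted sends requests round-robin across instances, like a load balancer,
// and returns how many were let through.
func admitted(instances []instance, requests int) int {
	ok := 0
	for r := 0; r < requests; r++ {
		if instances[r%len(instances)].allow("user-1") {
			ok++
		}
	}
	return ok
}

func main() {
	local := []instance{{newStore(), 5}, {newStore(), 5}}
	shared := newStore()
	global := []instance{{shared, 5}, {shared, 5}}
	fmt.Println(admitted(local, 20), admitted(global, 20)) // 10 5
}
```

With per-instance counters the effective limit grows with the number of instances (10 admitted for a limit of 5). With a shared store it stays at 5 regardless of how many instances sit behind the load balancer.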
5. Token Bucket Implementation (Go + Redis)
Concept
Tokens refill at a constant rate up to a fixed capacity. Each request consumes one token, and a request that finds the bucket empty is rejected.
Redis Structure
Hash per user:
rate:{user}
tokens
last_refill
Go Code
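// Assumes imports: context, time, github.com/redis/go-redis/v9,
// and a package-level client: var rdb *redis.Client.
type TokenBucket struct {
	Capacity   int     // maximum tokens, which is also the burst size
	RefillRate float64 // tokens added per second
}

// tokenBucketScript runs the read, refill, and consume steps atomically inside
// Redis. Separate read and write pipelines would let two API instances spend
// the same token. Tokens are stored as fractions so that slow refill rates
// (below one token per second) still accumulate between requests.
var tokenBucketScript = redis.NewScript(`
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local state  = redis.call("HMGET", KEYS[1], "tokens", "last_refill")
local tokens = tonumber(state[1]) or capacity
local last   = tonumber(state[2]) or now
tokens = math.min(capacity, tokens + math.max(0, now - last) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call("HSET", KEYS[1], "tokens", tokens, "last_refill", now)
redis.call("EXPIRE", KEYS[1], 3600)
return {allowed, math.floor(tokens)}
`)

// Allow reports whether the request may proceed and how many whole tokens remain.
func (tb *TokenBucket) Allow(ctx context.Context, key string) (bool, int, error) {
	// Fractional seconds from the caller's clock; the script clamps negative
	// elapsed time in case instance clocks disagree.
	now := float64(time.Now().UnixNano()) / 1e9
	res, err := tokenBucketScript.Run(ctx, rdb, []string{key},
		tb.Capacity, tb.RefillRate, now).Int64Slice()
	if err != nil {
		return false, 0, err
	}
	return res[0] == 1, int(res[1]), nil
}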
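The refill arithmetic is easy to get subtly wrong, so it helps to exercise it without Redis. The sketch below is an in-memory bucket for a single client that takes the current time as seconds in a float64, which keeps the example deterministic. The memBucket type is invented for this illustration.

```go
package main

import "fmt"

// memBucket is an in-memory token bucket for one client.
type memBucket struct {
	capacity float64
	rate     float64 // tokens added per second
	tokens   float64 // fractional tokens are kept between calls
	last     float64 // time of the last refill, in seconds
}

func newMemBucket(capacity, rate, now float64) *memBucket {
	return &memBucket{capacity: capacity, rate: rate, tokens: capacity, last: now}
}

func (b *memBucket) allow(now float64) bool {
	if now > b.last {
		// Refill in proportion to elapsed time, capped at capacity.
		b.tokens += (now - b.last) * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := newMemBucket(3, 0.5, 0) // burst of 3, one new token every 2 seconds
	for _, t := range []float64{0, 0, 0, 0, 1, 2} {
		fmt.Printf("t=%v allowed=%v\n", t, b.allow(t))
	}
}
```

The first three requests drain the burst, the fourth is rejected, the request at t=1 is rejected because only half a token has accumulated, and the request at t=2 passes. Truncating the refill to an integer at each call would discard that half token every time and the bucket would never recover at this rate.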
6. Sliding Window Implementation
Concept
Count requests in the last N seconds using timestamps.
Redis Structure
Sorted set:
rate:{user}
score = timestamp
member = a value unique to each request (a bare timestamp collides when two requests share it)
Go Code
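// Assumes imports: context, fmt, math/rand, time, github.com/redis/go-redis/v9,
// and a package-level client: var rdb *redis.Client.
func AllowSlidingWindow(ctx context.Context, key string, limit int, window time.Duration) (bool, int, error) {
	// Microseconds fit exactly in a float64 score; nanoseconds would lose precision.
	now := time.Now().UnixMicro()
	cutoff := now - window.Microseconds()
	// A random suffix keeps members unique when two requests share a timestamp.
	member := fmt.Sprintf("%d-%d", now, rand.Int63())

	pipe := rdb.TxPipeline()
	pipe.ZRemRangeByScore(ctx, key, "0", fmt.Sprintf("%d", cutoff))
	pipe.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: member})
	countCmd := pipe.ZCard(ctx, key)
	pipe.Expire(ctx, key, window)
	if _, err := pipe.Exec(ctx); err != nil {
		return false, 0, err
	}

	count := int(countCmd.Val())
	if count > limit {
		// Remove the rejected request so denied traffic does not keep the window full.
		if err := rdb.ZRem(ctx, key, member).Err(); err != nil {
			return false, count, err
		}
		return false, count, nil
	}
	return true, count, nil
}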
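The same sliding-window logic can be sketched in memory, which makes the window boundary easy to see. This version keeps the timestamps of admitted requests in a slice and takes the current time in milliseconds as an argument. The slidingLog type is invented for this illustration and is not safe for concurrent use.

```go
package main

import "fmt"

// slidingLog keeps the timestamps (in milliseconds) of admitted requests.
type slidingLog struct {
	limit    int
	windowMs int64
	stamps   []int64 // ascending order
}

func (s *slidingLog) allow(nowMs int64) bool {
	cutoff := nowMs - s.windowMs
	// Drop entries that have slid out of the window.
	i := 0
	for i < len(s.stamps) && s.stamps[i] <= cutoff {
		i++
	}
	s.stamps = s.stamps[i:]
	if len(s.stamps) >= s.limit {
		return false // rejected requests are not recorded
	}
	s.stamps = append(s.stamps, nowMs)
	return true
}

func main() {
	s := &slidingLog{limit: 2, windowMs: 1000}
	for _, t := range []int64{0, 400, 900, 1000, 1399, 1400} {
		fmt.Println(t, s.allow(t))
	}
}
```

With a limit of 2 per second the results are true, true, false, true, false, true. A slot frees up exactly one window after the request that occupied it (at 1000 and 1400 here), not at a fixed clock boundary.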
7. Leaky Bucket Implementation
Concept
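Requests enter a queue and are processed at a fixed rate. Excess requests are dropped or delayed. The version below does not delay requests: it admits a request when the queue has room and rejects it otherwise, while a ticker drains the queue at the fixed rate.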
In-memory Leaky Bucket (single instance)
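// Assumes imports: sync, time.
type LeakyBucket struct {
	Capacity int
	Rate     time.Duration
	Queue    chan struct{} // buffered; each entry is one occupied slot
	done     chan struct{} // closed by Stop to end the drain goroutine
	once     sync.Once
}

// NewLeakyBucket drains one slot every rate until Stop is called.
func NewLeakyBucket(capacity int, rate time.Duration) *LeakyBucket {
	lb := &LeakyBucket{
		Capacity: capacity,
		Rate:     rate,
		Queue:    make(chan struct{}, capacity),
		done:     make(chan struct{}),
	}
	go func() {
		ticker := time.NewTicker(rate)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				select {
				case <-lb.Queue: // leak one slot
				default: // bucket is already empty
				}
			case <-lb.done:
				return
			}
		}
	}()
	return lb
}

// Allow claims a slot without blocking; false means the bucket is full.
func (lb *LeakyBucket) Allow() bool {
	select {
	case lb.Queue <- struct{}{}:
		return true
	default:
		return false
	}
}

// Stop ends the drain goroutine. Without it, every bucket created per user
// would leave a goroutine and a ticker running forever.
func (lb *LeakyBucket) Stop() {
	lb.once.Do(func() { close(lb.done) })
}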
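A leaky bucket can also be tracked as a water level instead of a queue, which avoids running a goroutine per bucket. The sketch below is that variant: the level drains in proportion to elapsed time and each request adds one unit. The meter type is invented for this illustration, and time is passed in as seconds to keep it deterministic.

```go
package main

import "fmt"

// meter is a leaky bucket tracked as a level rather than a queue.
type meter struct {
	capacity float64 // how much the bucket holds
	leakRate float64 // units drained per second
	level    float64
	last     float64 // time of the last update, in seconds
}

func (m *meter) allow(now float64) bool {
	if now > m.last {
		// Drain in proportion to elapsed time, never below empty.
		m.level -= (now - m.last) * m.leakRate
		if m.level < 0 {
			m.level = 0
		}
		m.last = now
	}
	if m.level+1 > m.capacity {
		return false // this request would overflow the bucket
	}
	m.level++
	return true
}

func main() {
	m := &meter{capacity: 2, leakRate: 1}
	for _, t := range []float64{0, 0, 0, 1, 1} {
		fmt.Println(t, m.allow(t))
	}
}
```

The output is true, true, false, true, false: the bucket fills after two requests, and one unit leaks out per second. Because the state is just two numbers, this form is also straightforward to store in Redis per user.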
8. Middleware Example
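// Assumes imports: net, net/http, time.
func RateLimitMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// RemoteAddr is "ip:port"; keying on it verbatim would give every
		// connection its own limit. Behind a proxy this is the proxy's address,
		// so key on a trusted header or an authenticated user ID instead.
		user, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			user = r.RemoteAddr
		}
		allowed, _, err := AllowSlidingWindow(r.Context(), "rate:"+user, 100, time.Minute)
		if err != nil {
			http.Error(w, "Internal error", http.StatusInternalServerError)
			return
		}
		if !allowed {
			// The full window length is a conservative upper bound for the wait.
			w.Header().Set("Retry-After", "60")
			http.Error(w, "Too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}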
9. Design Considerations
Use Redis for distributed systems
Use API Gateway for coarse-grained limits
Apply different limits for different user plans
Always set a TTL on rate-limit keys so that idle clients do not accumulate in Redis
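Per-plan limits can be as simple as a lookup table consulted before calling the limiter. In the sketch below the plan names and numbers are made up for illustration, and limitFor is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// Limit is a request budget per window.
type Limit struct {
	Requests int
	Window   time.Duration
}

// planLimits uses made-up plan names and numbers purely for illustration.
var planLimits = map[string]Limit{
	"free":       {Requests: 60, Window: time.Minute},
	"pro":        {Requests: 600, Window: time.Minute},
	"enterprise": {Requests: 6000, Window: time.Minute},
}

// limitFor falls back to the most restrictive plan for unknown plan names.
func limitFor(plan string) Limit {
	if l, ok := planLimits[plan]; ok {
		return l
	}
	return planLimits["free"]
}

func main() {
	fmt.Println(limitFor("pro").Requests, limitFor("unknown").Requests) // 600 60
}
```

Falling back to the strictest plan means a missing or misspelled plan name never grants more capacity than intended.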
10. Common Pitfalls
In-memory counters in multi-instance systems
Clock drift between servers
Overly strict limits
Not excluding internal services
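For the last pitfall, one option is to skip the limiter when the caller's socket address falls in a known internal range. The sketch below uses the standard net/netip package. The listed ranges are assumptions, so substitute the networks your own services use, and apply the check to the connection address rather than to a client-supplied header.

```go
package main

import (
	"fmt"
	"net/netip"
)

// internalNets lists example private ranges (assumptions for this sketch).
var internalNets = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("127.0.0.0/8"),
}

// isInternal reports whether remoteAddr ("ip:port" or a bare "ip") belongs
// to one of the internal ranges.
func isInternal(remoteAddr string) bool {
	addr, err := netip.ParseAddr(remoteAddr)
	if err != nil {
		ap, err2 := netip.ParseAddrPort(remoteAddr)
		if err2 != nil {
			return false // unparseable callers are treated as external
		}
		addr = ap.Addr()
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	for _, p := range internalNets {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isInternal("10.1.2.3:5000"), isInternal("203.0.113.9:5000")) // true false
}
```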
11. Rate Limiting with NGINX
NGINX provides built-in rate limiting using a shared memory zone.
Example: Limit by IP
http {
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/m;
server {
location /api/ {
limit_req zone=api_limit burst=20 nodelay;
proxy_pass http://backend;
}
}
}
Explanation
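limit_req_zone creates a shared memory zone (here 10 MB, keyed by client IP).
rate=100r/m means 100 requests per minute, enforced as roughly one request every 600 ms.
burst=20 allows short spikes of up to 20 requests above that rate.
nodelay forwards burst requests immediately instead of spacing them out.
Requests beyond the burst are rejected with 503 by default; add limit_req_status 429; to return 429 instead.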
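To see how rate and burst interact, here is a simplified Go model of the accounting for a single key. It is a sketch, not NGINX's code: it assumes integer arithmetic in thousandths of a request, models only the nodelay case, and ignores request delaying entirely. The reqZone type is invented for this illustration.

```go
package main

import "fmt"

// reqZone is a simplified model of one limit_req key.
type reqZone struct {
	rate    int64 // thousandths of a request allowed per second
	burst   int64 // thousandths of a request
	excess  int64 // how far the key is currently above the steady rate
	lastMs  int64
	seenAny bool
}

func newReqZone(perMinute, burst int64) *reqZone {
	return &reqZone{rate: perMinute * 1000 / 60, burst: burst * 1000}
}

// allow models nodelay: a request passes immediately unless it would push
// the excess above the burst size.
func (z *reqZone) allow(nowMs int64) bool {
	if !z.seenAny {
		z.seenAny, z.lastMs = true, nowMs
		return true
	}
	excess := z.excess - z.rate*(nowMs-z.lastMs)/1000 + 1000
	if excess < 0 {
		excess = 0
	}
	if excess > z.burst {
		return false // rejected requests do not change the state
	}
	z.excess, z.lastMs = excess, nowMs
	return true
}

func main() {
	z := newReqZone(100, 20) // rate=100r/m burst=20
	passed := 0
	for i := 0; i < 30; i++ {
		if z.allow(0) {
			passed++
		}
	}
	fmt.Println(passed)                       // 21
	fmt.Println(z.allow(1000), z.allow(1000)) // true false
}
```

Under this model, 30 simultaneous requests result in 21 passing (one at the base rate plus the burst of 20). One second later enough excess has drained for a single additional request.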
Example: Limit by API Key
map $http_x_api_key $api_key {
default $http_x_api_key;
}
limit_req_zone $api_key zone=key_limit:10m rate=60r/m;
server {
location /api/ {
limit_req zone=key_limit burst=10;
proxy_pass http://backend;
}
}
12. Rate Limiting with Cloudflare
Cloudflare provides rate limiting at the edge.
Example: API Abuse Protection
Rule logic:
(http.request.uri.path contains "/api/")
Settings:
Threshold: 100 requests
Period: 1 minute
Action: Block or Challenge
Example: Login Endpoint Protection
Rule:
(http.request.uri.path eq "/login")
Limit:
- 10 requests per minute per IP
Action:
- Managed challenge
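Cloudflare JSON Rule Example (simplified)
The fields below are flattened for readability. In Cloudflare's Rulesets API the period and request threshold sit inside a nested ratelimit object, so check the current API reference before sending this payload.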
{
"expression": "(http.request.uri.path contains \"/api/\")",
"threshold": 100,
"period": 60,
"action": "block"
}
13. Where to Use What
| Layer | Tool | Use Case |
| Edge | Cloudflare | Bot & DDoS protection |
| Gateway | NGINX | Global and API-level limits |
| Backend | Redis + Go | User-level business logic |
14. Final Architecture
Client
↓
Cloudflare (Edge rate limit)
↓
NGINX (Gateway rate limit)
↓
Go API (Business rate limit)
↓
Redis
This layered approach gives you:
Early blocking of bad traffic
Protection from abuse
Fine-grained control
15. Conclusion
Rate limiting is essential for protecting your backend and ensuring fair usage.
Token Bucket suits bursty traffic, Sliding Window gives the most accurate per-client counts, and Leaky Bucket smooths traffic to a steady rate.
Use Cloudflare for coarse protection.
Use NGINX for infrastructure-level limits.
Use Go + Redis for business-level control.
Layering the three stops most bad traffic before it reaches application code, while per-user policy stays in the service that understands it.
Choose based on your system’s needs and traffic patterns.