
Designing Rate Limiting APIs: A Practical Guide with Go Examples

Gopher Stories — by Harsh Jha


Rate limiting protects your backend from abuse, unexpected spikes, and overload.
This article covers:

  • What rate limiting is

  • Why it matters

  • Different algorithms

  • How to architect them

  • Go implementations for Token Bucket, Sliding Window, and Leaky Bucket

  • Rate limiting at the gateway (NGINX) and at the edge (Cloudflare)


1. What is Rate Limiting?

Rate limiting restricts how many requests a client can make within a given time window.

Example:

  • 100 requests per minute per user

If the client exceeds the limit, the server rejects further requests with:

HTTP 429 Too Many Requests

2. Why Rate Limiting Matters

  • Prevents abuse and brute-force attacks

  • Protects databases and infrastructure

  • Keeps latency predictable

  • Ensures fair usage


3. Rate Limiting Algorithms

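| Algorithm      | Burst Allowed                                  | Accuracy | Complexity |
|----------------|------------------------------------------------|----------|------------|
| Fixed Window   | Yes, at window boundaries (up to 2× the limit) | Low      | Low        |
| Sliding Window | Limited                                        | High     | Medium     |
| Token Bucket   | Yes (up to bucket capacity)                    | High     | Medium     |
| Leaky Bucket   | No (output is smoothed)                        | Medium   | Medium     |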

4. Architecture

High-Level

Client → Load Balancer → API Gateway → Go Service
                                           ↓
                                         Redis

Distributed

Client
  ↓
Load Balancer
  ↓
Go API Instances
  ↓
Redis (shared state)

5. Token Bucket Implementation (Go + Redis)

Concept

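Each client has a bucket that holds at most Capacity tokens and refills at RefillRate tokens per second. Each request consumes one token, and a request that finds the bucket empty is rejected. A full bucket lets a client burst up to Capacity requests at once.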

Redis Structure

Hash per user:

rate:{user}
  tokens
  last_refill

Go Code

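package ratelimit

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// tokenBucketScript refills and consumes in one atomic step inside Redis,
// so concurrent requests (even from different instances) cannot both spend
// the same token. Tokens are stored as a float so partial refills add up.
var tokenBucketScript = redis.NewScript(`
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2]) -- tokens per second
local now      = tonumber(ARGV[3]) -- milliseconds
local tokens   = tonumber(redis.call("HGET", KEYS[1], "tokens"))
local last     = tonumber(redis.call("HGET", KEYS[1], "last_refill"))
if tokens == nil or last == nil then
  tokens, last = capacity, now -- first request: start with a full bucket
end
tokens = math.min(capacity, tokens + math.max(0, now - last) / 1000 * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call("HSET", KEYS[1], "tokens", tokens, "last_refill", now)
redis.call("EXPIRE", KEYS[1], ARGV[4])
return {allowed, math.floor(tokens)}
`)

type TokenBucket struct {
    Capacity   int     // maximum burst size
    RefillRate float64 // tokens added per second
}

// Allow reports whether the request for key may proceed and how many whole
// tokens remain. rdb is a package-level *redis.Client created at startup
// (not shown).
func (tb *TokenBucket) Allow(ctx context.Context, key string) (bool, int, error) {
    now := time.Now().UnixMilli()
    ttl := int64(time.Hour / time.Second)

    res, err := tokenBucketScript.Run(ctx, rdb, []string{key},
        tb.Capacity, tb.RefillRate, now, ttl).Int64Slice()
    if err != nil {
        return false, 0, err
    }
    return res[0] == 1, int(res[1]), nil
}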

6. Sliding Window Implementation

Concept

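Log a timestamp per request in a sorted set, drop entries older than the window, and compare the remaining count with the limit. The limit then holds over any N-second interval, not only over fixed calendar windows.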

Redis Structure

Sorted set:

rate:{user}
  score = timestamp
  value = timestamp

Go Code

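package ratelimit

import (
    "context"
    "fmt"
    "math/rand"
    "time"

    "github.com/redis/go-redis/v9"
)

// slidingWindowScript drops entries older than the window, then records the
// request only if the client is still under the limit. Running it as one
// script keeps check-then-add atomic, and rejected requests are not logged.
var slidingWindowScript = redis.NewScript(`
redis.call("ZREMRANGEBYSCORE", KEYS[1], "-inf", ARGV[1])
local count = redis.call("ZCARD", KEYS[1])
if count >= tonumber(ARGV[3]) then
  return {0, count}
end
redis.call("ZADD", KEYS[1], ARGV[2], ARGV[4])
redis.call("PEXPIRE", KEYS[1], ARGV[5])
return {1, count + 1}
`)

// AllowSlidingWindow uses rdb, a package-level *redis.Client created at startup (not shown).
func AllowSlidingWindow(ctx context.Context, key string, limit int, window time.Duration) (bool, int, error) {
    now := time.Now().UnixMicro() // microseconds stay exact as a float64 score
    cutoff := now - window.Microseconds()
    member := fmt.Sprintf("%d-%d", now, rand.Int63()) // unique even for simultaneous requests

    res, err := slidingWindowScript.Run(ctx, rdb, []string{key},
        cutoff, now, limit, member, window.Milliseconds()).Int64Slice()
    if err != nil {
        return false, 0, err
    }
    return res[0] == 1, int(res[1]), nil
}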

7. Leaky Bucket Implementation

Concept

Requests enter a queue and are processed at a fixed rate. Excess requests are dropped or delayed.


In-memory Leaky Bucket (single instance)

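package ratelimit

import "time"

// LeakyBucket admits a request only while the bucket has room; a background
// goroutine drains ("leaks") one slot every Rate. Because Allow answers
// immediately, this variant (a leaky bucket used as a meter) lets a burst of
// up to Capacity requests through, then about one request per Rate.
type LeakyBucket struct {
    Capacity int
    Rate     time.Duration // interval between leaks, e.g. 100ms for 10 req/s
    Queue    chan struct{}
    done     chan struct{}
}

func NewLeakyBucket(capacity int, rate time.Duration) *LeakyBucket {
    lb := &LeakyBucket{
        Capacity: capacity,
        Rate:     rate,
        Queue:    make(chan struct{}, capacity),
        done:     make(chan struct{}),
    }

    go func() {
        ticker := time.NewTicker(rate)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                select {
                case <-lb.Queue: // leak one slot, if any is occupied
                default:
                }
            case <-lb.done:
                return
            }
        }
    }()

    return lb
}

// Allow reports whether the request fits in the bucket. It never blocks.
func (lb *LeakyBucket) Allow() bool {
    select {
    case lb.Queue <- struct{}{}:
        return true
    default:
        return false
    }
}

// Stop shuts down the background goroutine once the bucket is no longer needed.
func (lb *LeakyBucket) Stop() {
    close(lb.done)
}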

8. Middleware Example

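package ratelimit

import (
    "net"
    "net/http"
    "strconv"
    "time"
)

// RateLimitMiddleware allows 100 requests per minute per client IP.
func RateLimitMiddleware(next http.Handler) http.Handler {
    const limit, window = 100, time.Minute

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // RemoteAddr is "ip:port"; strip the port, or every new connection
        // gets a fresh limit. Behind a proxy this is the proxy's IP, so key
        // on a trusted client-IP header or an authenticated user ID instead.
        ip, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            ip = r.RemoteAddr
        }

        allowed, _, err := AllowSlidingWindow(r.Context(), "rate:"+ip, limit, window)
        if err != nil {
            http.Error(w, "Internal error", http.StatusInternalServerError)
            return
        }

        if !allowed {
            // Conservative hint: the full window is the longest a client must wait.
            w.Header().Set("Retry-After", strconv.Itoa(int(window.Seconds())))
            http.Error(w, "Too many requests", http.StatusTooManyRequests)
            return
        }

        next.ServeHTTP(w, r)
    })
}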

9. Design Considerations

  • Keep counters in a shared store such as Redis when running more than one instance

  • Use the API gateway for coarse-grained limits

  • Apply different limits for different user plans

  • Always set a TTL on rate-limit keys so idle clients do not accumulate in memory


10. Common Pitfalls

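  • In-memory counters in multi-instance systems (each instance enforces the limit separately, so the effective limit grows with the number of instances)

  • Clock drift between servers (instances that timestamp requests with their own clocks disagree about which requests fall inside the window)

  • Overly strict limits that reject legitimate bursts

  • Not excluding internal service-to-service traffic from user-facing limits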
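The first pitfall is easy to quantify. The simulation below assumes a load balancer that spreads one client's requests round-robin across instances, each with its own in-memory counter (names are hypothetical):

```go
package main

// perInstanceCounter is a deliberately naive in-memory limiter, one per API
// instance, used only to illustrate the pitfall.
type perInstanceCounter struct {
    limit int
    count int
}

func (c *perInstanceCounter) allow() bool {
    if c.count >= c.limit {
        return false
    }
    c.count++
    return true
}

// simulate sends n requests from one client, round-robin across the given
// number of instances, and returns how many were accepted.
func simulate(instances, limit, n int) int {
    counters := make([]perInstanceCounter, instances)
    for i := range counters {
        counters[i].limit = limit
    }
    accepted := 0
    for i := 0; i < n; i++ {
        if counters[i%instances].allow() {
            accepted++
        }
    }
    return accepted
}
```

With three instances and a per-instance limit of 100, the client gets 300 requests through, three times the intended limit.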


11. Rate Limiting with NGINX

NGINX provides built-in rate limiting through the ngx_http_limit_req_module, which applies the leaky bucket method and stores per-key state in a shared memory zone.

Example: Limit by IP

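http {
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/m;

    server {
        location /api/ {
            limit_req zone=api_limit burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}

Explanation

  • limit_req_zone creates a 10 MB shared memory zone named api_limit, keyed by client IP ($binary_remote_addr is the compact binary form of the address).

  • rate=100r/m means 100 requests per minute. NGINX enforces it as a steady rate of one request every 600 ms rather than counting per calendar minute.

  • burst=20 lets up to 20 requests beyond that rate be accepted instead of rejected.

  • nodelay forwards those burst requests immediately instead of spacing them out at the configured rate.

  • limit_req_status 429 returns 429 Too Many Requests; without it, NGINX rejects excess requests with 503.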


Example: Limit by API Key

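# map and limit_req_zone belong in the http { } context.
map $http_x_api_key $api_key {
    ""      $binary_remote_addr;  # no API key: fall back to the client IP
    default $http_x_api_key;
}

limit_req_zone $api_key zone=key_limit:10m rate=60r/m;

server {
    location /api/ {
        # Without nodelay, up to 10 excess requests are queued and released at 1 per second.
        limit_req zone=key_limit burst=10;
        limit_req_status 429;
        proxy_pass http://backend;
    }
}

NGINX does not count requests whose key is empty, so without the "" fallback, clients that omit the X-API-Key header would not be limited at all.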

12. Rate Limiting with Cloudflare

Cloudflare provides rate limiting at the edge.

Example: API Abuse Protection

Rule logic:

(http.request.uri.path contains "/api/")

Settings:

  • Threshold: 100 requests

  • Period: 1 minute

  • Action: Block or Challenge


Example: Login Endpoint Protection

Rule:

(http.request.uri.path eq "/login")

Limit:

  • 10 requests per minute per IP

Action:

  • Managed challenge

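Cloudflare JSON Rule Example

In Cloudflare's Rulesets API, a rate limiting rule keeps its counting settings in a ratelimit object (period and mitigation_timeout are in seconds; the values accepted depend on your plan):

{
  "description": "API abuse protection",
  "expression": "(http.request.uri.path contains \"/api/\")",
  "action": "block",
  "ratelimit": {
    "characteristics": ["cf.colo.id", "ip.src"],
    "period": 60,
    "requests_per_period": 100,
    "mitigation_timeout": 60
  }
}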

13. Where to Use What

| Layer   | Tool       | Use Case                    |
|---------|------------|-----------------------------|
| Edge    | Cloudflare | Bot & DDoS protection       |
| Gateway | NGINX      | Global and API-level limits |
| Backend | Redis + Go | User-level business logic   |

14. Final Architecture

Client
  ↓
Cloudflare (Edge rate limit)
  ↓
NGINX (Gateway rate limit)
  ↓
Go API (Business rate limit)
  ↓
Redis

This layered approach gives you:

  • Early blocking of bad traffic

  • Protection from abuse

  • Fine-grained control


15. Conclusion

Rate limiting is essential for protecting your backend and ensuring fair usage.
Token Bucket handles bursts well, Sliding Window is fair and accurate, and Leaky Bucket smooths traffic to a steady rate.

  • Use Cloudflare for coarse protection.

  • Use NGINX for infrastructure-level limits.

  • Use Go + Redis for business-level control.

Together, these layers stop most abusive traffic early and leave per-user policy to the application, which has the most context about each client.

Choose based on your system’s needs and traffic patterns.

Credits and Sources

This article is based on my learning and understanding from the following resources:

  • Martin Kleppmann, Designing Data-Intensive Applications (O’Reilly, 2017)

  • Google, Site Reliability Engineering (O’Reilly, 2016)

  • Sam Newman, Building Microservices

  • Redis Documentation — Rate Limiting Patterns

  • NGINX Documentation — HTTP Rate Limiting

  • Cloudflare Documentation — Rate Limiting Rules

  • Stripe Engineering Blog — API Rate Limiting

  • Uber Engineering Blog — Protecting APIs

  • Netflix Technology Blog — Resilience Patterns