Designing Rate Limiting APIs in Go: Token Bucket, Sliding Window, and Leaky Bucket

Rate limiting protects your backend from abuse, unexpected spikes, and overload.
This article covers:

What rate limiting is
Why it matters
Different algorithms
How to architect them
Go implementations for Token Bucket, Sliding Window, and Leaky Bucket

1. What is Rate Limiting?

Rate limiting restricts how many requests a client can make within a given time window.

Example:

100 requests per minute per user

If exceeded:

HTTP 429 — Too Many Requests

2. Why Rate Limiting Matters

Prevents abuse and brute-force attacks
Protects databases and infrastructure
Keeps latency predictable
Ensures fair usage

3. Rate Limiting Algorithms

Algorithm	Burst Allowed	Accuracy	Complexity
Fixed Window	At window boundaries (up to 2x the limit)	Medium	Low
Sliding Window	Limited	High	Medium
Token Bucket	Yes (up to bucket capacity)	High	Medium
Leaky Bucket	No	Medium	Medium

4. Architecture

High-Level

Client → Load Balancer → API Gateway → Go Service
                                  ↓
                                Redis

Distributed

Client
  ↓
Load Balancer
  ↓
Go API Instances
  ↓
Redis (shared state)

5. Token Bucket Implementation (Go + Redis)

Concept

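The bucket holds up to Capacity tokens and refills at a constant rate. Each request consumes one token, and a request that finds the bucket empty is rejected. A full bucket lets a client burst up to Capacity requests at once.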

Redis Structure

Hash per user:

rate:{user}
  tokens
  last_refill

Go Code

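// Assumed imports: context, strconv, time, and a go-redis client package
// (for example github.com/redis/go-redis/v9). rdb is assumed to be a
// package-level *redis.Client. The built-in min requires Go 1.21 or later.
type TokenBucket struct {
    Capacity   int     // maximum number of tokens the bucket can hold
    RefillRate float64 // tokens added per second
}

// Allow reports whether the request may proceed and how many tokens remain.
// The read and the write are separate round trips, so concurrent requests for
// the same key can race; moving this logic into a Lua script makes it atomic.
func (tb *TokenBucket) Allow(ctx context.Context, key string) (bool, int, error) {
    now := time.Now().Unix()

    // HMGet returns nil entries for missing fields instead of a redis.Nil error.
    vals, err := rdb.HMGet(ctx, key, "tokens", "last_refill").Result()
    if err != nil {
        return false, 0, err
    }

    tokens, last := tb.Capacity, now // a new key starts with a full bucket
    if s, ok := vals[0].(string); ok {
        tokens, _ = strconv.Atoi(s)
    }
    if s, ok := vals[1].(string); ok {
        last, _ = strconv.ParseInt(s, 10, 64)
    }

    // Only whole tokens are added. last_refill advances only when a token was
    // added, so slow refill rates still accumulate between frequent requests.
    if refill := int(float64(now-last) * tb.RefillRate); refill > 0 {
        tokens = min(tb.Capacity, tokens+refill)
        last = now
    }

    if tokens <= 0 {
        return false, 0, nil
    }
    tokens--

    pipe := rdb.TxPipeline()
    pipe.HSet(ctx, key, "tokens", tokens, "last_refill", last)
    pipe.Expire(ctx, key, time.Hour)
    if _, err := pipe.Exec(ctx); err != nil {
        return false, 0, err
    }

    return true, tokens, nil
}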

6. Sliding Window Implementation

Concept

Count requests in the last N seconds using timestamps.

Redis Structure

Sorted set:

rate:{user}
  score = timestamp
  member = timestamp

Go Code

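// AllowSlidingWindow records the request in a sorted set scored by timestamp and
// reports whether the client is within limit for the trailing window.
// Assumed imports: context, fmt, time, and a go-redis client package;
// rdb is assumed to be a package-level *redis.Client.
func AllowSlidingWindow(ctx context.Context, key string, limit int, window time.Duration) (bool, int, error) {
    now := time.Now().UnixNano()

    // TxPipeline wraps the commands in MULTI/EXEC, so they run as one unit.
    pipe := rdb.TxPipeline()
    // Rejected requests are recorded as well, so a client that keeps
    // retrying while over the limit stays limited.
    pipe.ZAdd(ctx, key, redis.Z{Score: float64(now), Member: now})
    // Drop entries that have fallen out of the window.
    pipe.ZRemRangeByScore(ctx, key, "0", fmt.Sprintf("%d", now-int64(window)))
    countCmd := pipe.ZCard(ctx, key)
    pipe.Expire(ctx, key, window) // idle keys clean themselves up

    if _, err := pipe.Exec(ctx); err != nil {
        return false, 0, err
    }

    count := int(countCmd.Val())
    return count <= limit, count, nil
}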

7. Leaky Bucket Implementation

Concept

Requests enter a queue and are processed at a fixed rate. Excess requests are dropped or delayed.

In-memory Leaky Bucket (single instance)

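// LeakyBucket admits requests into a bounded queue that drains at a fixed rate.
// Admitted requests proceed immediately; the channel only meters admission.
type LeakyBucket struct {
    Capacity int           // maximum number of slots in the bucket
    Rate     time.Duration // interval between leaks (one slot drains per interval)
    Queue    chan struct{}
    stop     chan struct{}
}

func NewLeakyBucket(capacity int, rate time.Duration) *LeakyBucket {
    lb := &LeakyBucket{
        Capacity: capacity,
        Rate:     rate,
        Queue:    make(chan struct{}, capacity),
        stop:     make(chan struct{}),
    }

    // Drain one slot per tick until Stop is called.
    go func() {
        ticker := time.NewTicker(rate)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                select {
                case <-lb.Queue:
                default: // bucket already empty
                }
            case <-lb.stop:
                return
            }
        }
    }()

    return lb
}

// Allow reports whether the request fits in the bucket. It never blocks.
func (lb *LeakyBucket) Allow() bool {
    select {
    case lb.Queue <- struct{}{}:
        return true
    default:
        return false
    }
}

// Stop ends the drain goroutine; without it the goroutine and ticker leak.
// Call it once, when the bucket is no longer needed.
func (lb *LeakyBucket) Stop() { close(lb.stop) }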

8. Middleware Example

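func RateLimitMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // RemoteAddr is "ip:port"; drop the port so every connection from the
        // same client shares one key. Behind a load balancer this is the
        // balancer's address, so take the client IP from a header set by a
        // proxy you trust instead.
        user, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            user = r.RemoteAddr
        }

        allowed, _, err := AllowSlidingWindow(r.Context(), "rate:"+user, 100, time.Minute)
        if err != nil {
            http.Error(w, "Internal error", http.StatusInternalServerError)
            return
        }

        if !allowed {
            w.Header().Set("Retry-After", "60") // seconds; matches the one-minute window
            http.Error(w, "Too many requests", http.StatusTooManyRequests)
            return
        }

        next.ServeHTTP(w, r)
    })
}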

9. Design Considerations

Use Redis for distributed systems
Use API Gateway for coarse-grained limits
Apply different limits for different user plans
Always add expiration to avoid memory leaks

10. Common Pitfalls

In-memory counters in multi-instance systems
Clock drift between servers
Overly strict limits
Not excluding internal services

11. Rate Limiting with NGINX

NGINX provides built-in rate limiting using a shared memory zone.

Example: Limit by IP

http {
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/m;

    server {
        location /api/ {
            limit_req zone=api_limit burst=20 nodelay;
            proxy_pass http://backend;
        }
    }
}

Explanation

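limit_req_zone creates a shared memory zone (10 MB here) keyed by client IP.
rate=100r/m means 100 requests per minute, which NGINX enforces as one request every 600 ms.
burst=20 lets up to 20 requests exceed that spacing before NGINX starts rejecting.
nodelay serves those burst requests immediately instead of spacing them out.
Rejected requests get a 503 by default; add limit_req_status 429; to return 429 instead.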

Example: Limit by API Key

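# map and limit_req_zone belong in the http { } block.
# Requests without an X-API-Key header fall back to the client IP;
# NGINX does not account requests whose key is empty.
map $http_x_api_key $api_key {
    default $http_x_api_key;
    ""      $binary_remote_addr;
}

limit_req_zone $api_key zone=key_limit:10m rate=60r/m;

server {
    location /api/ {
        limit_req zone=key_limit burst=10;
        proxy_pass http://backend;
    }
}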

12. Rate Limiting with Cloudflare

Cloudflare provides rate limiting at the edge.

Example: API Abuse Protection

Rule logic:

(http.request.uri.path contains "/api/")

Settings:

Threshold: 100 requests
Period: 1 minute
Action: Block or Challenge

Example: Login Endpoint Protection

Rule:

(http.request.uri.path eq "/login")

Limit:

10 requests per minute per IP

Action:

Managed challenge

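Cloudflare JSON Rule Example

The JSON below is a simplified illustration of the rule's settings, not the exact schema accepted by the Cloudflare API.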

{
  "expression": "(http.request.uri.path contains \"/api/\")",
  "threshold": 100,
  "period": 60,
  "action": "block"
}

13. Where to Use What

Layer	Tool	Use Case
Edge	Cloudflare	Bot & DDoS protection
Gateway	NGINX	Global and API-level limits
Backend	Redis + Go	User-level business logic

14. Final Architecture

Client
  ↓
Cloudflare (Edge rate limit)
  ↓
NGINX (Gateway rate limit)
  ↓
Go API (Business rate limit)
  ↓
Redis

This layered approach gives you:

Early blocking of bad traffic
Protection from abuse
Fine-grained control

15. Conclusion

Rate limiting is essential for protecting your backend and ensuring fair usage.
Token Bucket is good for bursts, Sliding Window is fair and accurate, Leaky Bucket is useful for smoothing traffic.

Use Cloudflare for coarse protection.
Use NGINX for infrastructure-level limits.
Use Go + Redis for business-level control.

Layered this way, each tier only has to handle the traffic that the tier in front of it let through.

Choose based on your system’s needs and traffic patterns.

Credits and Sources

This article is based on my learning and understanding from the following resources:

Martin Kleppmann, Designing Data-Intensive Applications (O’Reilly, 2017)
Google, Site Reliability Engineering (O’Reilly, 2016)
Sam Newman, Building Microservices
Redis Documentation — Rate Limiting Patterns
NGINX Documentation — HTTP Rate Limiting
Cloudflare Documentation — Rate Limiting Rules
Stripe Engineering Blog — API Rate Limiting
Uber Engineering Blog — Protecting APIs
Netflix Technology Blog — Resilience Patterns
