
Production MCP Endpoint: Security, Rate Limiting, and Observability

May 3, 2026 · 8 min read · by ApexMCP

Running an MCP endpoint in a demo is easy. Running one in production — where real agents make real calls against real data — is a different problem. This post covers what a production MCP endpoint actually requires: authentication, rate limiting, credential security, audit logs, key rotation, and the failure modes that will eventually bite you if you skip them.

Authentication: The First Line

Every MCP tool call starts with an HTTP request to your endpoint. That request needs to prove it's authorized before touching anything. The two common patterns:

API key authentication

The most common pattern. The agent includes a bearer token in the Authorization header:

Authorization: Bearer apx_live_xxxxxxxxxxxxxxxxxxxxxxxx

Your gateway validates the key, resolves which org and permissions it maps to, then routes the call. Keys should be scoped — a key issued to a Cursor integration shouldn't have the same permissions as your admin dashboard key.
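A minimal sketch of that validate-then-resolve step, assuming an in-memory key store and a made-up scope format (`tools:<tool_name>`). The names `KEY_STORE`, `authenticate`, and `authorize` are illustrative, not part of any real gateway API. Only a digest of each key is stored, so a leaked store does not expose usable keys.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    key_id: str
    org_id: str
    scopes: frozenset


def hash_key(raw_key: str) -> str:
    """Store and look up only a SHA-256 digest, never the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical key store (digest -> record). A real gateway would use a database.
KEY_STORE = {
    hash_key("apx_live_cursor_example"): KeyRecord(
        key_id="key_cursor_prod",
        org_id="org_01abc",
        scopes=frozenset({"tools:query_users"}),
    ),
}


def authenticate(authorization_header: Optional[str]) -> Optional[KeyRecord]:
    """Return the key record for a valid 'Bearer <key>' header, else None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return KEY_STORE.get(hash_key(token.strip()))


def authorize(record: KeyRecord, tool_name: str) -> bool:
    """A key may call a tool only if it holds that tool's scope."""
    return f"tools:{tool_name}" in record.scopes
```

With this shape, the Cursor key above can call `query_users` but is rejected for any tool outside its scopes.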

OAuth2 client credentials

Better for machine-to-machine agent flows. The agent exchanges a client_id and client_secret for a short-lived JWT, then presents that JWT on each tool call. Because the token expires quickly, a leaked token is only useful to an attacker for a short time.
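To make the expiry point concrete, here is a rough sketch of issuing and verifying a short-lived HS256 token with only the standard library. It is for illustration: the function names are made up, the shared-secret signing scheme is an assumption, and a production gateway would normally rely on a vetted JWT library and its identity provider's signing keys.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(secret: bytes, client_id: str, ttl_seconds: int, now: float) -> str:
    """Issue a token that expires ttl_seconds after `now` (Unix time)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": client_id, "exp": int(now) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(secret, f'{header}.{payload}')}"


def verify_token(secret: bytes, token: str, now: float):
    """Return the claims if the signature and expiry check out, else None."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        return None
    expected = _sign(secret, f"{header}.{payload}")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    claims = json.loads(_b64url_decode(payload))
    if now >= claims["exp"]:
        return None  # expired: a leaked token stops working here
    return claims
```

A five-minute `ttl_seconds` means a leaked token is useless five minutes later, without anyone having to notice the leak.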

Common mistake: issuing a single API key shared across all agents and integrations. When that key leaks (and it will), you have no way to revoke just the affected integration without breaking everything else. Issue one key per agent or integration from day one.

Credential Security: Where Most MCP Servers Fail

Your MCP endpoint calls downstream services — databases, APIs, SaaS tools. Those connectors have credentials: connection strings, API keys, OAuth tokens. Where those credentials live matters enormously.

The options, ordered from riskiest to most robust:

Credentials in environment variables: readable by anyone with server access, and prone to leaking into crash dumps and logs
Credentials in the database: encrypted at rest (maybe), but queryable if the DB is compromised
Credentials in a secrets manager (AWS Secrets Manager, GCP Secret Manager): better, though rotation typically has to be set up per secret
Credentials in HashiCorp Vault: audit trail on every access, dynamic secrets, automatic rotation, fine-grained policies

ApexMCP uses HashiCorp Vault. Every connector credential is encrypted on write, never stored in the application database, and never logged. The Vault access audit log records every credential fetch, which lets you detect anomalous access patterns before they become incidents.

Rate Limiting: Protecting Against Agent Loops

AI agents can loop. A buggy prompt, an infinite retry loop, or a misconfigured autonomous agent can issue thousands of tool calls in seconds. Without rate limiting, all of that traffic lands directly on your backend.

Effective rate limiting at an MCP gateway requires:

Per-API-key limits — each key has its own bucket, so one runaway agent doesn't affect others
Sliding window algorithm — smoother than fixed windows, prevents burst exploitation at window boundaries
Redis-backed counters — consistent limits across multiple gateway instances
Tier-aware limits — free tier gets 5 rps, paid tiers get more, enterprise gets configurable
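The per-key sliding window from the list above can be sketched as follows. This is an in-memory stand-in: a multi-instance gateway would keep the same timestamps in Redis, as the list says, and the class and method names here are illustrative. The 5-requests-per-second figure is the free-tier limit mentioned above.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log limiter with one independent bucket per API key."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # api_key_id -> timestamps of accepted calls

    def allow(self, api_key_id: str, now: float):
        """Return (allowed, retry_after_seconds) for a call arriving at `now`."""
        hits = self._hits[api_key_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0.0
        # The oldest accepted call leaves the window at hits[0] + window.
        return False, hits[0] + self.window - now


free_tier = SlidingWindowLimiter(limit=5, window_seconds=1.0)
```

Because the window moves with each request instead of resetting on a fixed boundary, a client cannot send a full burst at the end of one window and another at the start of the next.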

When a rate limit is hit, the gateway returns 429 Too Many Requests with a Retry-After header. Well-behaved MCP clients respect this and back off. For clients that don't, a circuit breaker at the gateway prevents cascading load.
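On the client side, "respecting Retry-After" might look like the helper below. It is a sketch under assumptions: the function name, the backoff base, and the cap are made up for illustration. Retry-After may carry either a number of seconds or an HTTP date, so both forms are handled, with capped exponential backoff as the fallback when the header is missing or unparseable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now, attempt, base=0.5, cap=30.0):
    """Seconds to wait before retrying after a 429.

    retry_after: raw Retry-After header value, or None if absent.
    now: timezone-aware current time.
    attempt: zero-based retry count, used only for the fallback backoff.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delta-seconds form, e.g. "7"
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: fall back to capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

The server's value is never shortened, since waiting less than Retry-After defeats the purpose of the header.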

Rate limit design tip: Set limits per API key, not per IP. Agents running in cloud environments share IPs with thousands of other services. IP-based limits are unreliable and will block legitimate traffic.

Audit Logs: The Compliance Layer

For any production MCP endpoint handling real data, you need an audit trail: what was called, by whom, when, with what parameters, and what the outcome was (status, latency, and result size).

A minimal audit log entry for an MCP tool call:

{
  "timestamp": "2026-05-03T14:32:11.421Z",
  "org_id": "org_01abc",
  "api_key_id": "key_cursor_prod",
  "tool_name": "query_users",
  "input": { "filter": "role = 'admin'" },
  "latency_ms": 87,
  "status": "success",
  "rows_returned": 12
}

Note: log the input parameters, not the output data. Logging raw query results creates a secondary data store with its own compliance surface. Log enough to reconstruct what happened without duplicating your database.
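One way to enforce that rule in code is to build the entry in a single place that only ever sees the row count. The helper below is a hypothetical sketch whose field names mirror the sample entry above.

```python
from datetime import datetime, timezone


def build_audit_entry(org_id, api_key_id, tool_name, tool_input, rows,
                      latency_ms, status, at):
    """Build a log entry that records inputs and result size, not result data."""
    timestamp = at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "timestamp": timestamp.replace("+00:00", "Z"),
        "org_id": org_id,
        "api_key_id": api_key_id,
        "tool_name": tool_name,
        "input": tool_input,
        "latency_ms": latency_ms,
        "status": status,
        # Only the count is kept; the rows themselves never reach the log.
        "rows_returned": len(rows),
    }
```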

Audit logs should be:

Append-only — no update or delete
Hash-chained — each entry includes a hash of the previous, making tampering detectable
Exportable — CSV and JSON export for SIEM ingestion or compliance audits
Searchable — filter by org, key, tool, time range, status
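The append-only and hash-chained properties can be illustrated with a small in-memory log. This is a sketch, not a storage design: the class name, the all-zero genesis hash, and the canonical-JSON hashing scheme are assumptions made for the example.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the hash is reproducible.
    body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each record commits to the one before it."""

    def __init__(self):
        self._records = []

    def append(self, entry: dict) -> str:
        prev_hash = self._records[-1]["hash"] if self._records else GENESIS_HASH
        stored = copy.deepcopy(entry)  # later caller mutations cannot alter the log
        record = {
            "entry": stored,
            "prev_hash": prev_hash,
            "hash": _entry_hash(prev_hash, stored),
        }
        self._records.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed record breaks it."""
        prev_hash = GENESIS_HASH
        for record in self._records:
            if record["prev_hash"] != prev_hash:
                return False
            if record["hash"] != _entry_hash(prev_hash, record["entry"]):
                return False
            prev_hash = record["hash"]
        return True
```

Changing any stored entry changes its hash, which no longer matches the `prev_hash` recorded by the next entry, so `verify()` fails from that point on.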

API Key Rotation

Keys leak. An API key committed to a git repo, pasted in a Slack message, or included in a bug report needs to be revocable immediately without downtime.

Key rotation in production requires:

Instant revocation — revoked keys stop working on the next request, not after a cache TTL expires
Parallel validity window — when rotating, the new key works before the old one expires, so agents can be updated without a gap
Rotation audit trail — log who rotated which key and when
Separate keys per integration — you can rotate one key without touching others
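The requirements above fit in a small state machine. The sketch below is illustrative only: `KeyRing` and its methods are made-up names, and it keeps state in memory. To get instant revocation across gateway instances, the revoked flag would have to be checked against shared state on every request instead of a long-lived local cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiKey:
    key_id: str
    integration: str
    expires_at: Optional[float] = None  # None means no scheduled expiry
    revoked: bool = False


class KeyRing:
    def __init__(self):
        self._keys = {}
        self.audit = []  # (timestamp, actor, action, key_id)

    def issue(self, key_id, integration, actor, now):
        self._keys[key_id] = ApiKey(key_id, integration)
        self.audit.append((now, actor, "issue", key_id))

    def rotate(self, old_key_id, new_key_id, actor, now, overlap_seconds):
        """Issue the new key now; let the old one live for the overlap window."""
        old = self._keys[old_key_id]
        self.issue(new_key_id, old.integration, actor, now)
        old.expires_at = now + overlap_seconds
        self.audit.append((now, actor, "rotate", old_key_id))

    def revoke(self, key_id, actor, now):
        """Takes effect on the very next is_valid check."""
        self._keys[key_id].revoked = True
        self.audit.append((now, actor, "revoke", key_id))

    def is_valid(self, key_id, now):
        key = self._keys.get(key_id)
        if key is None or key.revoked:
            return False
        return key.expires_at is None or now < key.expires_at
```

During the overlap window both keys validate, so an agent can be redeployed with the new key before the old one stops working.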

Versioning and Rollback

Your MCP endpoint exposes tools derived from your data model. When that model changes — a table renamed, a column dropped, a new connector added — the tool surface changes. Agents that were working break.

Production MCP endpoints need versioned provisioning: every change to which connectors are enabled and which tools are exposed is recorded as a version. If a deployment breaks an integration, you roll back to the previous version in one click, with no downtime.

This also matters for debugging. When a tool call starts failing, you need to know whether the provisioning changed recently and, if so, exactly what changed. The version history gives you that audit trail.
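A minimal sketch of versioned provisioning, with hypothetical class and field names. Rollback is modeled as publishing a new version that copies an older one, so the history itself is never rewritten and stays usable for debugging.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProvisioningVersion:
    number: int
    connectors: Tuple[str, ...]
    tools: Tuple[str, ...]
    note: str


class ProvisioningHistory:
    def __init__(self):
        self._versions = []

    @property
    def current(self) -> ProvisioningVersion:
        return self._versions[-1]

    def publish(self, connectors, tools, note) -> ProvisioningVersion:
        version = ProvisioningVersion(
            len(self._versions) + 1, tuple(connectors), tuple(tools), note
        )
        self._versions.append(version)
        return version

    def rollback(self, to_number: int) -> ProvisioningVersion:
        """Re-publish an earlier version as the newest one."""
        target = self._versions[to_number - 1]
        return self.publish(target.connectors, target.tools, f"rollback to v{to_number}")

    def diff(self, a: int, b: int) -> dict:
        """Tools added and removed between version a and version b."""
        old = set(self._versions[a - 1].tools)
        new = set(self._versions[b - 1].tools)
        return {"added": sorted(new - old), "removed": sorted(old - new)}
```

When a tool call starts failing, `diff` between the last known-good version and the current one answers the "what changed?" question directly.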

Observability: What to Monitor

The minimum monitoring surface for a production MCP endpoint:

Tool call latency — p50, p95, p99 per tool. Slow tools degrade the agent's reasoning loop.
Error rate per tool — spikes indicate connector issues or schema drift
Rate limit hit rate — sustained hits mean limits are too low or an agent is misbehaving
Auth failure rate — spike = leaked key being probed or misconfigured agent
Connector health — is the downstream database / API reachable? Tool calls will fail if not.
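As a rough illustration of the first two items, the sketch below computes per-tool latency percentiles and error rate from raw samples using the standard library. In practice these numbers usually come from a metrics system, not in-process lists, and `ToolMetrics` is a made-up name. `summary` needs at least two samples for a tool.

```python
from collections import defaultdict
from statistics import quantiles


class ToolMetrics:
    def __init__(self):
        self._latencies = defaultdict(list)  # tool_name -> latency samples (ms)
        self._errors = defaultdict(int)      # tool_name -> failed call count

    def record(self, tool_name: str, latency_ms: float, ok: bool) -> None:
        self._latencies[tool_name].append(latency_ms)
        if not ok:
            self._errors[tool_name] += 1

    def summary(self, tool_name: str) -> dict:
        samples = self._latencies[tool_name]
        # n=100 yields 99 cut points; index 49 is p50, 94 is p95, 98 is p99.
        cuts = quantiles(samples, n=100, method="inclusive")
        return {
            "p50": cuts[49],
            "p95": cuts[94],
            "p99": cuts[98],
            "error_rate": self._errors[tool_name] / len(samples),
        }
```

Tracking p95 and p99 separately from p50 matters because a tool with a fine median can still stall the agent on its slowest calls.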

The Failure Mode Nobody Mentions

The most common production failure with MCP endpoints isn't security or rate limiting. It's schema drift. Your database schema changes, the tool schemas your endpoint advertises become stale, and agents start passing parameters that no longer exist or omitting parameters that are now required. Calls fail silently or return confusing errors.

The fix: trigger schema re-discovery whenever your data model changes, not on a schedule. Make it part of your deployment process. Treat tool schema changes the same way you treat breaking API changes — with versioning and a rollout strategy.
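A deploy-time drift check could be as simple as diffing the advertised input schema of each tool against the freshly discovered one. The sketch below assumes tool inputs are described as JSON-Schema-style objects with `properties` and `required`. The function names and the rule for what counts as breaking are illustrative choices.

```python
def diff_tool_schema(advertised: dict, discovered: dict) -> dict:
    """Compare two JSON-Schema-style input schemas for one tool."""
    old_props = set(advertised.get("properties", {}))
    new_props = set(discovered.get("properties", {}))
    old_required = set(advertised.get("required", []))
    new_required = set(discovered.get("required", []))
    return {
        "removed_params": sorted(old_props - new_props),
        "added_params": sorted(new_props - old_props),
        "newly_required": sorted(new_required - old_required),
    }


def is_breaking(diff: dict) -> bool:
    """Removed or newly required parameters break agents built on the old schema."""
    return bool(diff["removed_params"] or diff["newly_required"])
```

Running this in the deployment pipeline and failing (or cutting a new version) when `is_breaking` returns true treats tool schema changes like any other breaking API change.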

Summary: Production MCP Endpoint Checklist

Per-integration API keys, never shared
Credentials in Vault, never in env vars or the app database
Redis-backed per-key rate limiting with sliding window
Append-only audit logs with hash chain
Instant key revocation with parallel validity window for rotation
Versioned provisioning with one-click rollback
Schema re-discovery on deploy, not on schedule
Latency, error rate, and connector health monitoring

All of this, already built

ApexMCP ships every item on this checklist. Join the waitlist for early access.
