Deep Dive

Production MCP Endpoint: Security, Rate Limiting, and Observability

May 3, 2026 · 8 min read · by ApexMCP

Running an MCP endpoint in a demo is easy. Running one in production — where real agents make real calls against real data — is a different problem. This post covers what a production MCP endpoint actually requires: authentication, rate limiting, credential security, audit logs, key rotation, and the failure modes that will eventually bite you if you skip them.

Authentication: The First Line

Every MCP tool call starts with an HTTP request to your endpoint. That request needs to prove it's authorized before touching anything. The two common patterns:

API key authentication

The most common pattern. The agent includes a bearer token in the Authorization header:

Authorization: Bearer apx_live_xxxxxxxxxxxxxxxxxxxxxxxx

Your gateway validates the key, resolves which org and permissions it maps to, then routes the call. Keys should be scoped — a key issued to a Cursor integration shouldn't have the same permissions as your admin dashboard key.
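
The key check described above can be sketched roughly as follows. This is a minimal illustration, not ApexMCP's implementation: the in-memory `KEYS` table, the `KeyRecord` type, the `authorize` function, and the example key value are all hypothetical, and a real gateway would back the lookup with a database.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class KeyRecord:
    key_id: str
    org_id: str
    scopes: FrozenSet[str]  # tool names this key is allowed to call


def hash_key(raw_key: str) -> str:
    # Store only a hash, so a leaked key table does not expose usable keys.
    return hashlib.sha256(raw_key.encode()).hexdigest()


# Hypothetical key table: one narrowly scoped key per integration.
KEYS = {
    hash_key("apx_live_cursor_example"): KeyRecord(
        key_id="key_cursor_prod",
        org_id="org_01abc",
        scopes=frozenset({"query_users"}),
    ),
}


def authorize(authorization_header: str, tool_name: str) -> KeyRecord:
    """Return the key record if the header carries a known key scoped to tool_name."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("missing or malformed bearer token")
    record = KEYS.get(hash_key(token))
    if record is None:
        raise PermissionError("unknown API key")
    if tool_name not in record.scopes:
        raise PermissionError(f"key {record.key_id} is not scoped for {tool_name}")
    return record
```

Calling `authorize("Bearer apx_live_cursor_example", "query_users")` returns the record for `org_01abc`, while the same key raises `PermissionError` for any tool outside its scope.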

OAuth2 client credentials

Better for machine-to-machine agent flows. The agent exchanges a client_id + client_secret for a short-lived JWT, then uses that JWT for tool calls. JWT expiry limits blast radius if a token is leaked.
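
To make the expiry point concrete, here is a rough sketch of issuing and verifying a short-lived HS256-signed JWT using only the Python standard library. The function names, the five-minute default lifetime, and the shared-secret signing scheme are assumptions made for illustration; a production system would normally use a maintained JWT library and its identity provider's token endpoint.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue_token(secret: bytes, client_id: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Issue a token that stops being valid ttl_seconds after 'now'."""
    now = time.time() if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": client_id, "exp": int(now) + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    now = time.time() if now is None else now
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise PermissionError("malformed token")
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, _sign(secret, header + "." + payload)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    return claims
```

A leaked token from this sketch is only usable until its `exp` claim passes, which is the blast-radius limit described above.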

Common mistake: issuing a single API key shared across all agents and integrations. When that key leaks (and it will), you have no way to revoke just the affected integration without breaking everything else. Issue one key per agent or integration from day one.

Credential Security: Where Most MCP Servers Fail

Your MCP endpoint calls downstream services — databases, APIs, SaaS tools. Those connectors have credentials: connection strings, API keys, OAuth tokens. Where those credentials live matters enormously.

The failure modes in order of severity:

ApexMCP uses HashiCorp Vault. Every connector credential is encrypted on write, never stored in the application database, and never logged. The Vault access audit log records every credential fetch, which lets you detect anomalous access patterns before they become incidents.
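
One small, complementary guard for the "never logged" rule is to wrap credentials in a type that masks itself whenever it is printed or formatted. The `Secret` class below is a hypothetical helper for illustration only; it is not part of Vault or of ApexMCP, and it does not replace a secrets manager.

```python
class Secret:
    """Holds a credential and hides it from repr(), str(), and f-strings."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw value; easy to search for in code review.
        return self._value

    def __repr__(self) -> str:
        return "Secret(****)"

    __str__ = __repr__


db_password = Secret("example-password")
log_line = f"connecting with credentials={db_password}"
```

Here `log_line` contains `Secret(****)` rather than the password, so an accidental log statement does not leak the credential.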

Rate Limiting: Protecting Against Agent Loops

AI agents can loop. A buggy prompt, an infinite retry loop, or a misconfigured autonomous agent can issue thousands of tool calls in seconds. Without rate limiting, that hits your backend directly.

Effective rate limiting at an MCP gateway requires:

When a rate limit is hit, the gateway returns 429 Too Many Requests with a Retry-After header. Well-behaved MCP clients respect this and back off. For clients that don't, a circuit breaker at the gateway rejects their calls for a short period, which prevents cascading load on the backend.

Rate limit design tip: Set limits per API key, not per IP. Agents running in cloud environments share IPs with thousands of other services. IP-based limits are unreliable and will block legitimate traffic.
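
A per-key limiter can be sketched with a token bucket, shown below. The `RateLimiter` class, its parameters, and the choice of a token bucket are assumptions for illustration; the post does not say which algorithm ApexMCP uses.

```python
import math
import time
from typing import Dict, Optional, Tuple


class RateLimiter:
    """Token bucket keyed by API key ID rather than by client IP."""

    def __init__(self, capacity: int, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        # api_key_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def check(self, api_key_id: str, now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, extra headers) for one incoming tool call."""
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(api_key_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_second)
        if tokens >= 1:
            self._buckets[api_key_id] = (tokens - 1, now)
            return 200, {}
        self._buckets[api_key_id] = (tokens, now)
        # Seconds until one full token is available again, rounded up.
        retry_after = math.ceil((1 - tokens) / self.refill_per_second)
        return 429, {"Retry-After": str(retry_after)}
```

Because each key has its own bucket, a looping agent exhausts only its own allowance and other integrations keep working.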

Audit Logs: The Compliance Layer

Any production MCP endpoint handling real data needs an audit trail: what was called, by whom, when, with what parameters, and what came back, summarized as a status and row count rather than the returned data itself.

A minimal audit log entry for an MCP tool call:

{
  "timestamp": "2026-05-03T14:32:11.421Z",
  "org_id": "org_01abc",
  "api_key_id": "key_cursor_prod",
  "tool_name": "query_users",
  "input": { "filter": "role = 'admin'" },
  "latency_ms": 87,
  "status": "success",
  "rows_returned": 12
}

Note: log the input parameters, not the output data. Logging raw query results creates a secondary data store with its own compliance surface. Log enough to reconstruct what happened without duplicating your database.
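
As a rough sketch of that rule, the wrapper below records the input, status, latency, and row count of a tool call but never the returned rows. The `audited_call` function, the `handler` callable, and the `sink` callable are hypothetical names introduced for this example.

```python
import json
import time
from datetime import datetime, timezone


def audited_call(org_id, api_key_id, tool_name, tool_input, handler, sink):
    """Run handler(tool_input) and emit one JSON audit entry through sink."""
    started = time.perf_counter()
    timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    entry = {
        "timestamp": timestamp.replace("+00:00", "Z"),
        "org_id": org_id,
        "api_key_id": api_key_id,
        "tool_name": tool_name,
        "input": tool_input,
    }
    try:
        rows = handler(tool_input)
        entry["status"] = "success"
        # Record how much came back, not what came back.
        entry["rows_returned"] = len(rows)
        return rows
    except Exception:
        entry["status"] = "error"
        raise
    finally:
        entry["latency_ms"] = round((time.perf_counter() - started) * 1000)
        sink(json.dumps(entry))
```

The `finally` block means failed calls are logged too, which matters when the audit trail is used to reconstruct an incident.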

Audit logs should be:

API Key Rotation

Keys leak. An API key committed to a git repo, pasted in a Slack message, or included in a bug report needs to be revocable immediately without downtime.

Key rotation in production requires:

  1. Instant revocation — revoked keys stop working on the next request, not after a cache TTL expires
  2. Parallel validity window — when rotating, the new key works before the old one expires, so agents can be updated without a gap
  3. Rotation audit trail — log who rotated which key and when
  4. Separate keys per integration — you can rotate one key without touching others
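
The four requirements above can be sketched with a small in-memory key store. Everything here is illustrative: the `KeyStore` class, its method names, the one-hour default overlap, and the `apx_test_` prefix are assumptions, and a real store would persist keys and check revocation without a stale cache in front of it.

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple


def _digest(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode()).hexdigest()


class KeyStore:
    def __init__(self) -> None:
        # key digest -> (integration name, expiry time or None for "no expiry")
        self._keys: Dict[str, Tuple[str, Optional[float]]] = {}
        self.audit: List[dict] = []

    def issue(self, integration: str) -> str:
        raw_key = "apx_test_" + secrets.token_urlsafe(24)
        self._keys[_digest(raw_key)] = (integration, None)
        return raw_key

    def rotate(self, integration: str, actor: str, now: float,
               overlap_seconds: float = 3600.0) -> str:
        """Issue a new key; existing keys for this integration expire after the overlap."""
        for digest, (name, expires_at) in list(self._keys.items()):
            if name == integration and expires_at is None:
                self._keys[digest] = (name, now + overlap_seconds)
        new_key = self.issue(integration)
        self.audit.append({"action": "rotate", "integration": integration,
                           "actor": actor, "at": now})
        return new_key

    def revoke(self, raw_key: str, actor: str, now: float) -> None:
        """Remove the key immediately; the very next is_valid call fails."""
        removed = self._keys.pop(_digest(raw_key), None)
        if removed is not None:
            self.audit.append({"action": "revoke", "integration": removed[0],
                               "actor": actor, "at": now})

    def is_valid(self, raw_key: str, now: float) -> bool:
        record = self._keys.get(_digest(raw_key))
        if record is None:
            return False
        return record[1] is None or now < record[1]
```

During the overlap window both the old and new key validate, so agents can be switched over one at a time, and rotating one integration never touches another integration's key.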

Versioning and Rollback

Your MCP endpoint exposes tools derived from your data model. When that model changes — a table renamed, a column dropped, a new connector added — the tool surface changes. Agents that were working break.

Production MCP endpoints need versioned provisioning: every change to which connectors are enabled and which tools are exposed is recorded as a version. If a deployment breaks an integration, you roll back to the previous version in one click, with no downtime.

This also matters for debugging. When a tool call starts failing, you need to know whether the provisioning changed recently and, if so, exactly what changed, including anything changed by accident. The version history gives you that audit trail.
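
A minimal sketch of versioned provisioning, assuming a version is simply the set of exposed tool names: the `ProvisioningHistory` class and its methods are hypothetical. Rollback is modeled as publishing a new version that copies an earlier one, so the history stays append-only.

```python
from typing import Dict, FrozenSet, Iterable, List


class ProvisioningHistory:
    def __init__(self) -> None:
        self._versions: List[FrozenSet[str]] = []

    def publish(self, tools: Iterable[str]) -> int:
        """Record a new tool surface and return its 1-based version number."""
        self._versions.append(frozenset(tools))
        return len(self._versions)

    @property
    def current(self) -> FrozenSet[str]:
        return self._versions[-1]

    def rollback_to(self, version: int) -> int:
        """Re-publish an earlier version; nothing in the history is rewritten."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown version {version}")
        return self.publish(self._versions[version - 1])

    def diff(self, old: int, new: int) -> Dict[str, List[str]]:
        """List which tools were added or removed between two versions."""
        before, after = self._versions[old - 1], self._versions[new - 1]
        return {"added": sorted(after - before), "removed": sorted(before - after)}


history = ProvisioningHistory()
v1 = history.publish({"query_users", "query_orders"})
v2 = history.publish({"query_users"})   # a deployment accidentally drops a tool
changes = history.diff(v1, v2)          # shows query_orders was removed
v3 = history.rollback_to(v1)            # restores the earlier tool surface
```

The `diff` call is the debugging half of the story: it answers "what changed between the version that worked and the one that does not".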

Observability: What to Monitor

The minimum monitoring surface for a production MCP endpoint:

The Failure Mode Nobody Mentions

The most common production failure with MCP endpoints isn't security or rate limiting. It's schema drift. Your database schema changes, the tool schemas your endpoint advertises become stale, and agents start passing parameters that no longer exist or omitting parameters that are now required. Calls fail silently or return confusing errors.

The fix: trigger schema re-discovery whenever your data model changes, not on a schedule. Make it part of your deployment process. Treat tool schema changes the same way you treat breaking API changes — with versioning and a rollout strategy.
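
One way to wire this into a deployment step is to compare the tool schemas currently advertised with the ones freshly discovered from the data model, and fail the deploy or cut a new version when they differ. The sketch below assumes tool schemas are plain JSON-like dictionaries keyed by tool name; `fingerprint` and `detect_drift` are hypothetical helper names.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(schema: dict) -> str:
    """Stable hash of a schema, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def detect_drift(advertised: Dict[str, dict], discovered: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare advertised tool schemas with freshly discovered ones."""
    shared = advertised.keys() & discovered.keys()
    return {
        "added": sorted(discovered.keys() - advertised.keys()),
        "removed": sorted(advertised.keys() - discovered.keys()),
        "changed": sorted(
            name for name in shared
            if fingerprint(advertised[name]) != fingerprint(discovered[name])
        ),
    }


advertised = {
    "query_users": {"params": {"filter": "string"}, "required": []},
    "query_orders": {"params": {"status": "string"}, "required": []},
}
discovered = {
    "query_users": {"params": {"filter": "string", "org": "string"}, "required": ["org"]},
    "query_invoices": {"params": {}, "required": []},
}
drift = detect_drift(advertised, discovered)
```

In this example `drift` reports `query_users` as changed, `query_orders` as removed, and `query_invoices` as added, which is the signal to treat the deploy as a breaking tool-schema change.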

Summary: Production MCP Endpoint Checklist

All of this, already built

ApexMCP ships every item on this checklist. Join the waitlist for early access.
