Production MCP Endpoint: Security, Rate Limiting, and Observability
Running an MCP endpoint in a demo is easy. Running one in production — where real agents make real calls against real data — is a different problem. This post covers what a production MCP endpoint actually requires: authentication, rate limiting, credential security, audit logs, key rotation, and the failure modes that will eventually bite you if you skip them.
Authentication: The First Line
Every MCP tool call starts with an HTTP request to your endpoint. That request needs to prove it's authorized before touching anything. The two common patterns:
API key authentication
The most common pattern. The agent includes a bearer token in the Authorization header:
Authorization: Bearer apx_live_xxxxxxxxxxxxxxxxxxxxxxxx
Your gateway validates the key, resolves which org and permissions it maps to, then routes the call. Keys should be scoped — a key issued to a Cursor integration shouldn't have the same permissions as your admin dashboard key.
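The validate, resolve, and scope-check steps can be sketched roughly as follows. This is a minimal illustration, not ApexMCP's implementation: the in-memory `KEY_STORE`, the `KeyRecord` fields, the scope names, and the example key are all hypothetical, and a real gateway would back this with a database or cache.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyRecord:
    key_id: str
    org_id: str
    scopes: frozenset
    revoked: bool = False


def _digest(raw_key: str) -> str:
    # Store only a hash of each key, so a leaked key table does not expose usable keys.
    return hashlib.sha256(raw_key.encode()).hexdigest()


# Hypothetical in-memory key store, indexed by SHA-256 digest of the raw key.
KEY_STORE = {
    _digest("apx_live_cursor_example"): KeyRecord(
        key_id="key_cursor_prod",
        org_id="org_01abc",
        scopes=frozenset({"tools:read"}),
    ),
}


def authorize(authorization_header: Optional[str], required_scope: str) -> KeyRecord:
    """Return the matching key record, or raise PermissionError."""
    if not authorization_header or not authorization_header.startswith("Bearer "):
        raise PermissionError("missing bearer token")
    raw_key = authorization_header[len("Bearer "):].strip()
    record = KEY_STORE.get(_digest(raw_key))
    if record is None or record.revoked:
        raise PermissionError("unknown or revoked key")
    if required_scope not in record.scopes:
        raise PermissionError(f"key lacks scope {required_scope!r}")
    return record
```

Because each record carries its own scopes, a key issued to one integration can be limited to the tools that integration actually needs.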
OAuth2 client credentials
Better for machine-to-machine agent flows. The agent exchanges a client_id and client_secret for a short-lived access token (typically a JWT), then uses that token for tool calls. The short expiry limits the blast radius if a token leaks.
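To show why expiry limits the damage from a leaked token, here is a minimal standard-library sketch of issuing and checking a short-lived HS256 token. The function names, the 300-second default, and the claim set are assumptions for illustration; in production you would normally rely on your identity provider and a maintained JWT library, and pin the accepted algorithm.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    return _b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue_token(client_id: str, secret: bytes, ttl_seconds: int = 300, now=None) -> str:
    """Issue a signed token that expires ttl_seconds after `now`."""
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": client_id, "iat": now, "exp": now + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header, payload, signature = parts
    expected = _sign(secret, header + "." + payload)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    return claims
```

A token leaked at minute one is useless after its `exp` passes, without anyone having to notice the leak and revoke it.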
Common mistake: issuing a single API key shared across all agents and integrations. When that key leaks (and it will), you have no way to revoke just the affected integration without breaking everything else. Issue one key per agent or integration from day one.
Credential Security: Where Most MCP Servers Fail
Your MCP endpoint calls downstream services — databases, APIs, SaaS tools. Those connectors have credentials: connection strings, API keys, OAuth tokens. Where those credentials live matters enormously.
The storage options, from most to least risky:
- Credentials in environment variables: readable by anyone with server access, and liable to appear in crash dumps and logs
- Credentials in the database: encrypted at rest (maybe), but queryable if the DB is compromised
- Credentials in a managed secrets manager (AWS Secrets Manager, GCP Secret Manager): better, but rotation has to be set up per secret and is often left manual
- Credentials in HashiCorp Vault: audit trail on every access, dynamic secrets, automatic rotation, fine-grained policies
ApexMCP uses HashiCorp Vault. Every connector credential is encrypted on write, never stored in the application database, and never logged. The Vault access audit log records every credential fetch, which lets you detect anomalous access patterns before they become incidents.
Rate Limiting: Protecting Against Agent Loops
AI agents can loop. A buggy prompt, an infinite retry loop, or a misconfigured autonomous agent can issue thousands of tool calls in seconds. Without rate limiting, that hits your backend directly.
Effective rate limiting at an MCP gateway requires:
- Per-API-key limits — each key has its own bucket, so one runaway agent doesn't affect others
- Sliding window algorithm — smoother than fixed windows, prevents burst exploitation at window boundaries
- Redis-backed counters — consistent limits across multiple gateway instances
- Tier-aware limits — free tier gets 5 rps, paid tiers get more, enterprise gets configurable
When a rate limit is hit, the gateway returns 429 Too Many Requests with a Retry-After header. Well-behaved MCP clients respect this and back off. For clients that don't, a circuit breaker at the gateway cuts them off temporarily so the repeated calls never reach your backend.
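A per-key sliding window can be sketched as below. This is an in-memory illustration under stated assumptions: the class name and the `check` return shape are hypothetical, and a multi-instance gateway would keep the same per-key timestamps in Redis rather than in process memory.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log: one deque of request timestamps per API key."""

    def __init__(self, limit: int, window_seconds: float = 1.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def check(self, api_key_id: str, now=None):
        """Return (allowed, retry_after_seconds) for one incoming request."""
        now = time.monotonic() if now is None else now
        hits = self._hits[api_key_id]
        # Discard timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0
        # The oldest hit leaving the window frees the next slot.
        # Retry-After is sent as whole seconds, so round up.
        retry_after = math.ceil(hits[0] + self.window - now)
        return False, max(retry_after, 1)


limiter = SlidingWindowLimiter(limit=5, window_seconds=1.0)  # e.g. a 5 rps tier
allowed, retry_after = limiter.check("key_cursor_prod")
status = 200 if allowed else 429
```

Each key gets its own deque, so a runaway agent exhausts only its own budget.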
Rate limit design tip: Set limits per API key, not per IP. Agents running in cloud environments share IPs with thousands of other services. IP-based limits are unreliable and will block legitimate traffic.
Audit Logs: The Compliance Layer
For any production MCP endpoint handling real data, you need an audit trail that records what was called, by whom, when, with what parameters, and with what outcome.
A minimal audit log entry for an MCP tool call:
{
  "timestamp": "2026-05-03T14:32:11.421Z",
  "org_id": "org_01abc",
  "api_key_id": "key_cursor_prod",
  "tool_name": "query_users",
  "input": { "filter": "role = 'admin'" },
  "latency_ms": 87,
  "status": "success",
  "rows_returned": 12
}
Note: log the input parameters, not the output data. Logging raw query results creates a secondary data store with its own compliance surface. Log enough to reconstruct what happened without duplicating your database.
Audit logs should be:
- Append-only — no update or delete
- Hash-chained — each entry includes a hash of the previous, making tampering detectable
- Exportable — CSV and JSON export for SIEM ingestion or compliance audits
- Searchable — filter by org, key, tool, time range, status
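The append-only and hash-chained properties can be illustrated with a short sketch. The `AuditLog` class, the all-zero genesis hash, and the record layout are assumptions made for this example; a production log would persist records to write-once storage instead of a Python list.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, entry: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


class AuditLog:
    """Append-only log where each record commits to the one before it."""

    def __init__(self):
        self._records = []

    def append(self, entry: dict) -> str:
        prev_hash = self._records[-1]["hash"] if self._records else GENESIS_HASH
        record = {
            "entry": entry,
            "prev_hash": prev_hash,
            "hash": _entry_hash(prev_hash, entry),
        }
        self._records.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed record breaks it."""
        prev_hash = GENESIS_HASH
        for record in self._records:
            if record["prev_hash"] != prev_hash:
                return False
            if record["hash"] != _entry_hash(prev_hash, record["entry"]):
                return False
            prev_hash = record["hash"]
        return True
```

Changing any field of an earlier entry changes its hash, which no longer matches the `prev_hash` stored in the next record, so `verify()` returns False.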
API Key Rotation
Keys leak. An API key committed to a git repo, pasted in a Slack message, or included in a bug report needs to be revocable immediately without downtime.
Key rotation in production requires:
- Instant revocation — revoked keys stop working on the next request, not after a cache TTL expires
- Parallel validity window — when rotating, the new key works before the old one expires, so agents can be updated without a gap
- Rotation audit trail — log who rotated which key and when
- Separate keys per integration — you can rotate one key without touching others
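The requirements above can be sketched as a small key ring. Everything here is hypothetical (the `KeyRing` API, the one-hour default overlap, the log fields); the point is only to show rotation with an overlap window next to revocation that takes effect on the very next check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiKey:
    key_id: str
    integration: str
    expires_at: Optional[float] = None  # None means no scheduled expiry
    revoked: bool = False

    def is_valid(self, now: float) -> bool:
        if self.revoked:
            return False
        return self.expires_at is None or now < self.expires_at


class KeyRing:
    def __init__(self):
        self.keys = {}
        self.rotation_log = []  # who did what to which key, and when

    def issue(self, key_id: str, integration: str) -> ApiKey:
        self.keys[key_id] = ApiKey(key_id, integration)
        return self.keys[key_id]

    def rotate(self, old_key_id, new_key_id, actor, now, overlap_seconds=3600.0):
        """Issue a replacement key; the old key stays valid for the overlap window."""
        old = self.keys[old_key_id]
        new = self.issue(new_key_id, old.integration)
        old.expires_at = now + overlap_seconds
        self.rotation_log.append(
            {"action": "rotate", "old": old_key_id, "new": new_key_id, "actor": actor, "at": now}
        )
        return new

    def revoke(self, key_id, actor, now):
        """Mark the key revoked; validity is checked live, so there is no TTL to wait out."""
        self.keys[key_id].revoked = True
        self.rotation_log.append({"action": "revoke", "key": key_id, "actor": actor, "at": now})

    def is_valid(self, key_id, now) -> bool:
        key = self.keys.get(key_id)
        return key is not None and key.is_valid(now)
```

If validation results are cached at the gateway, revocation also has to invalidate that cache entry, otherwise the "instant" guarantee is lost.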
Versioning and Rollback
Your MCP endpoint exposes tools derived from your data model. When that model changes — a table renamed, a column dropped, a new connector added — the tool surface changes. Agents that were working break.
Production MCP endpoints need versioned provisioning: every change to which connectors are enabled and which tools are exposed is recorded as a version. If a deployment breaks an integration, you roll back to the previous version in one click, with no downtime.
This also matters for debugging. When a tool call starts failing, you need to know whether the cause is a recent provisioning change or something outside the endpoint. The version history tells you exactly what changed and when.
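As a rough sketch of versioned provisioning, the history below records each published configuration as an immutable version and treats rollback as moving the active pointer back one step. The class names, fields, and `tool_diff` helper are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisioningVersion:
    number: int
    connectors: frozenset
    tools: frozenset
    note: str


class ProvisioningHistory:
    def __init__(self):
        self._versions = []
        self._active_index = -1

    def publish(self, connectors, tools, note) -> ProvisioningVersion:
        """Record a new immutable version and make it active."""
        version = ProvisioningVersion(
            len(self._versions) + 1, frozenset(connectors), frozenset(tools), note
        )
        self._versions.append(version)
        self._active_index = len(self._versions) - 1
        return version

    @property
    def active(self) -> ProvisioningVersion:
        if self._active_index < 0:
            raise LookupError("nothing published yet")
        return self._versions[self._active_index]

    def rollback(self) -> ProvisioningVersion:
        """Re-activate the previous version; no version is ever deleted."""
        if self._active_index <= 0:
            raise LookupError("no earlier version to roll back to")
        self._active_index -= 1
        return self.active


def tool_diff(old: ProvisioningVersion, new: ProvisioningVersion) -> dict:
    """Which tools appeared or disappeared between two versions."""
    return {
        "added": sorted(new.tools - old.tools),
        "removed": sorted(old.tools - new.tools),
    }
```

When a tool call starts failing, `tool_diff` between the active version and its predecessor is a quick first check for an accidentally removed tool.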
Observability: What to Monitor
The minimum monitoring surface for a production MCP endpoint:
- Tool call latency: p50, p95, p99 per tool. Slow tools degrade the agent's reasoning loop.
- Error rate per tool: spikes indicate connector issues or schema drift
- Rate limit hit rate: sustained hits mean limits are too low or an agent is misbehaving
- Auth failure rate: a spike suggests a leaked key being probed or a misconfigured agent
- Connector health: is the downstream database or API reachable? Tool calls will fail if not.
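For the latency and error-rate items, a minimal per-tool aggregation might look like this. It uses the nearest-rank percentile definition and keeps raw samples in memory; both are simplifying assumptions, and the `ToolMetrics` name is hypothetical. A real deployment would more likely export histograms to a metrics system.

```python
import math
from collections import defaultdict


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


class ToolMetrics:
    def __init__(self):
        self._latencies = defaultdict(list)
        self._errors = defaultdict(int)

    def record(self, tool_name: str, latency_ms: float, ok: bool) -> None:
        self._latencies[tool_name].append(latency_ms)
        if not ok:
            self._errors[tool_name] += 1

    def summary(self, tool_name: str) -> dict:
        samples = self._latencies[tool_name]
        if not samples:
            raise ValueError(f"no samples recorded for {tool_name!r}")
        return {
            "p50": percentile(samples, 50),
            "p95": percentile(samples, 95),
            "p99": percentile(samples, 99),
            "error_rate": self._errors[tool_name] / len(samples),
        }
```

Tracking these per tool rather than per endpoint matters because one slow connector can hide behind a healthy-looking global average.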
The Failure Mode Nobody Mentions
The most common production failure with MCP endpoints isn't security or rate limiting; it's schema drift. Your database schema changes, the tool schemas your endpoint advertises become stale, and agents start passing parameters that no longer exist or omitting parameters that are now required. Calls fail silently or return confusing errors.
The fix: trigger schema re-discovery whenever your data model changes, not on a schedule. Make it part of your deployment process. Treat tool schema changes the same way you treat breaking API changes — with versioning and a rollout strategy.
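One way to wire this into a deployment step is to compare the tool schemas currently advertised with the ones freshly discovered, and fail the deploy on a breaking difference. The sketch below assumes a simplified schema shape, `{tool_name: {"params": [...], "required": [...]}}`, which is hypothetical and much smaller than a real MCP tool schema.

```python
import hashlib
import json

EMPTY_TOOL = {"params": [], "required": []}


def schema_fingerprint(tool_schemas: dict) -> str:
    """Stable hash of the tool surface; a changed fingerprint means re-check clients."""
    canonical = json.dumps(tool_schemas, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def breaking_changes(advertised: dict, discovered: dict) -> dict:
    """Report removed parameters and newly required ones, per tool."""
    report = {}
    for tool in sorted(set(advertised) | set(discovered)):
        old = advertised.get(tool, EMPTY_TOOL)
        new = discovered.get(tool, EMPTY_TOOL)
        removed = sorted(set(old["params"]) - set(new["params"]))
        newly_required = sorted(set(new["required"]) - set(old["required"]))
        if removed or newly_required:
            report[tool] = {"removed_params": removed, "newly_required": newly_required}
    return report


advertised = {"query_users": {"params": ["filter", "limit"], "required": ["filter"]}}
discovered = {"query_users": {"params": ["filter", "org_id"], "required": ["filter", "org_id"]}}

if schema_fingerprint(advertised) != schema_fingerprint(discovered):
    changes = breaking_changes(advertised, discovered)
    # A deploy script could exit non-zero here and require a new provisioning version.
```

In this example `limit` was removed and `org_id` became required, which are exactly the two changes that make existing agent calls fail.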
Summary: Production MCP Endpoint Checklist
- Per-integration API keys, never shared
- Credentials in Vault, never in env vars or the app database
- Redis-backed per-key rate limiting with sliding window
- Append-only audit logs with hash chain
- Instant key revocation with parallel validity window for rotation
- Versioned provisioning with one-click rollback
- Schema re-discovery on deploy, not on schedule
- Latency, error rate, and connector health monitoring
All of this, already built
ApexMCP ships every item on this checklist. Join the waitlist for early access.