How We Hardened ApexMCP for Production: A Security Sprint Retrospective
Before opening ApexMCP to broader access, we ran a self-audit of the full production stack. We found seven issues serious enough to block launch. This post documents each one — what it was, how it got there, and how we fixed it.
We're publishing this because we think the MCP ecosystem benefits from honest technical writing about what production security actually involves. Security posturing ("we take security seriously") is everywhere. Specific postmortems are rare. Here's ours.
The Audit
We triaged findings into P0 (launch blockers), P1 (first sprint post-launch), and P2 (this quarter). The seven issues below are P0 and P1 — the ones we fixed before writing this post. Each one was real, exploitable in some scenario, and would have been embarrassing to disclose after the fact rather than before.
Finding 1: Vault Running in Dev Mode in Production
HashiCorp Vault launched with the -dev flag in the production compose file
Dev mode stores all secrets in memory. Every restart wipes the vault. No persistence, no Shamir unseal, no audit backend.
ApexMCP uses Vault to store connector credentials — database connection strings, API keys, OAuth tokens. These are the most sensitive data in the system. Running Vault in dev mode meant every server restart silently deleted all stored credentials. We had been re-entering them manually after each deploy without fully understanding why they were disappearing.
More critically: dev mode uses a root token printed to stdout at startup. Anyone with access to container logs had a root Vault token.
File backend + 1-of-1 Shamir unseal
Vault now runs with a persistent file backend. On first deploy, vault-init.sh generates the unseal key and root token. The unseal key is stored as a GitHub Actions secret (VAULT_UNSEAL_KEY). The deploy pipeline runs vault-unseal.sh automatically after each restart. Root token is rotated and stored offline.
The migration was not seamless — dev-mode Vault contents are in-memory and gone on restart, so there were no credentials to migrate. All connector credentials had to be re-entered after the switch. That's the correct behavior; we just hadn't been tracking the data loss.
Finding 2: Hardcoded Admin Password Committed to the Repo
Zitadel bootstrap config contained a hardcoded plaintext password with passwordChangeRequired: false
Anyone with read access to the repository had god-mode credentials on the identity provider.
The initial Zitadel configuration had a human admin account with a fixed password committed directly to the YAML file. The passwordChangeRequired: false flag meant it would never prompt for rotation. This account is the IAM_OWNER for the entire identity layer — it controls who can authenticate, what OAuth clients exist, and which users have access to the platform.
Placeholder in YAML, real value via environment variable
The committed YAML now contains DevOnlyChangeMe1! as a placeholder. The production value is injected via ZITADEL_FIRSTINSTANCE_ORG_HUMAN_PASSWORD environment variable from a GitHub Actions secret. passwordChangeRequired: true is enforced. The live admin account password was rotated via the Zitadel management API through an SSH tunnel — an environment variable change alone doesn't update an already-bootstrapped instance.
Finding 3: Gateway Signing Agent Tokens with a Default Dev Secret
OAUTH_JWT_SECRET was never wired through the production compose file
The gateway was silently falling back to dev-oauth-jwt-secret-change-in-prod to sign all agent JWTs in production. Anyone who knew the default value could forge valid agent tokens.
This one was invisible unless you were specifically looking for it. The gateway started without error, tokens were issued, agents authenticated. Nothing at runtime signaled a problem; the only clue was the fallback value sitting in the source code. We caught it by auditing every secret reference in the production compose file against what was actually injected at runtime.
Secret injected via compose env block and fail-closed on missing value
OAUTH_JWT_SECRET is now injected from a GitHub Actions secret in the deploy workflow, referenced in docker-compose.prod.yml, and the gateway will hard-fail at startup if the value is missing or matches the dev default.
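A minimal sketch of what a fail-closed startup check can look like. The helper name requireProductionSecret and its signature are hypothetical, chosen for illustration; only the variable name OAUTH_JWT_SECRET and the dev default string come from the description above.

```typescript
// The dev fallback value named in this post. A production process must never run with it.
const DEV_DEFAULT_JWT_SECRET = "dev-oauth-jwt-secret-change-in-prod";

// Hypothetical helper: returns the secret, or throws so the process fails at startup.
function requireProductionSecret(
  name: string,
  env: Record<string, string | undefined>,
  devDefault: string,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is not set; refusing to start`);
  }
  if (value === devDefault) {
    throw new Error(`${name} still has the dev default; refusing to start`);
  }
  return value;
}

// Intended use at the top of the gateway entry point (assumed layout):
//   const jwtSecret = requireProductionSecret(
//     "OAUTH_JWT_SECRET", process.env, DEV_DEFAULT_JWT_SECRET);
```

Throwing before the server binds a port means a misconfigured deploy shows up as a crashed container rather than as a running gateway that signs tokens with a guessable key.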
Finding 4: No Rate Limiting at Any Layer
The gateway, nginx, and Zitadel had no rate limiting configured
Credential stuffing, agent loops, and brute-force auth attempts had an unlimited budget.
Rate limiting is one of those things that feels optional until a single misconfigured agent starts hammering your endpoint at 500 requests per second and takes down the service for everyone. We also had hand-rolled CORS middleware that was missing the Vary: Origin header — a CDN cache poisoning risk — and only applied to /api/*, leaving /mcp/* routes uncovered.
Three-layer defense: nginx → gateway → Zitadel
nginx now enforces auth_limit (10 req/s, burst 30) on auth endpoints before requests reach the application. The gateway has @fastify/rate-limit at 300 req/min per IP globally, with health check endpoints exempted so monitors don't get throttled. Zitadel has internal rate limits configured (200 req/min global, 30 on /oauth/v2/, 20 on /ui/login). Hand-rolled CORS was replaced with @fastify/cors + @fastify/helmet registered globally — including on /mcp/* routes.
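To make the gateway layer concrete, here is a self-contained sketch of a per-IP fixed-window limiter with exempt paths. It is not the @fastify/rate-limit implementation, only an illustration of the behavior described above (300 requests per minute per IP, health checks exempt). The class name and the /health path are assumptions.

```typescript
// Fixed-window limiter: at most `limit` requests per `windowMs` for each client IP.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly exemptPaths: ReadonlySet<string>;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number, exemptPaths: ReadonlySet<string>) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.exemptPaths = exemptPaths;
  }

  // Returns true if the request may proceed, false if it should get a 429.
  allow(ip: string, path: string, nowMs: number = Date.now()): boolean {
    if (this.exemptPaths.has(path)) return true; // monitors are never throttled

    const current = this.windows.get(ip);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // First request from this IP, or the previous window has elapsed.
      this.windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// 300 requests per minute per IP, with a hypothetical health check path exempted.
const gatewayLimiter = new FixedWindowLimiter(300, 60_000, new Set(["/health"]));
```

This sketch never evicts idle entries from its map, so a real deployment would need expiry or a shared store, which is one reason to rely on a maintained plugin instead of hand-rolled middleware.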
Finding 5: Application Database Role Had Excessive Privileges
The app database user could INSERT, UPDATE, DELETE, and TRUNCATE the audit log table
A compromised application layer could silently delete its own audit trail.
An audit log is only meaningful for security purposes if the code being audited cannot modify it. Our application database role had full DML permissions on every table, including audit_logs. A SQL injection in the application, or a compromised service container, could have modified or erased audit records.
DML-only app role + SECURITY DEFINER function for audit inserts
The application now connects as apexmcp_app, a Postgres role with SELECT/INSERT/UPDATE/DELETE on application tables but no DDL privileges. INSERT, UPDATE, DELETE, and TRUNCATE on audit_logs are explicitly revoked from apexmcp_app. All audit writes go through insert_audit_log_entry(), a SECURITY DEFINER function owned by the migrator role. Application code cannot touch the audit table directly — only through this function, which enforces the schema.
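The privilege changes can be sketched as a small migration helper that emits the SQL statements. The role, table, and function names come from the description above; the helper itself, the function arguments (p_actor, p_action), the column names, and the public schema are hypothetical, since the audit_logs schema is not shown here.

```typescript
// Reject anything that is not a plain lowercase identifier before interpolating it into SQL.
function assertIdentifier(name: string): string {
  if (!/^[a-z_][a-z0-9_]*$/.test(name)) {
    throw new Error(`unsafe SQL identifier: ${name}`);
  }
  return name;
}

// Builds the lockdown statements. Intended to run as the migrator role, which then
// owns the SECURITY DEFINER function and is the only role able to write the table.
function auditLockdownStatements(appRole: string, auditTable: string): string[] {
  const role = assertIdentifier(appRole);
  const table = assertIdentifier(auditTable);
  return [
    // The app role loses every direct write path to the audit table.
    `REVOKE INSERT, UPDATE, DELETE, TRUNCATE ON ${table} FROM ${role};`,
    // The function runs with its owner's privileges; pinning search_path keeps
    // a caller from shadowing the table with an object in another schema.
    `CREATE OR REPLACE FUNCTION insert_audit_log_entry(p_actor text, p_action text)
  RETURNS void
  LANGUAGE sql
  SECURITY DEFINER
  SET search_path = public, pg_temp
AS $$
  INSERT INTO ${table} (actor, action) VALUES (p_actor, p_action);
$$;`,
    // Functions are executable by PUBLIC by default, so narrow that to the app role.
    `REVOKE ALL ON FUNCTION insert_audit_log_entry(text, text) FROM PUBLIC;`,
    `GRANT EXECUTE ON FUNCTION insert_audit_log_entry(text, text) TO ${role};`,
  ];
}

const lockdownSql = auditLockdownStatements("apexmcp_app", "audit_logs");
```

With this shape, the application can append audit rows only with the columns the function accepts, and has no statement available to rewrite or remove existing rows.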
Finding 6: SSRF via User-Supplied Connector URLs
The connector service fetched user-supplied URLs without validation
An attacker could supply an internal network URL — http://169.254.169.254/, http://vault:8200/, other Docker-internal addresses — and have the connector service proxy requests to them on their behalf.
Server-Side Request Forgery is a common finding in platforms that accept URLs from users. When you're building a connector platform whose job is literally to reach out to external services based on user configuration, SSRF is easy to overlook. The fix needs to be in the right place — blocking happens in the connector service, not at the edge, because the edge doesn't know the semantics of the request.
URL validation with DNS resolution check before any outbound request
All user-supplied URLs are validated before use. The scheme must be https (with a per-connector allowlist for http in dev). The hostname must resolve to a public IP address: RFC 1918, loopback, link-local, and Docker-internal ranges are on a denylist and rejected. The resolved IP is checked again at connection time to prevent DNS rebinding attacks.
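A sketch of the validation step, assuming Node's built-in net.BlockList and dns/promises modules. The function names are hypothetical and the range list is illustrative, not a complete inventory of special-purpose addresses. The per-connector http allowlist is omitted.

```typescript
import { BlockList, isIP } from "node:net";
import { lookup } from "node:dns/promises";

const denylist = new BlockList();
denylist.addSubnet("0.0.0.0", 8, "ipv4");      // "this network"
denylist.addSubnet("10.0.0.0", 8, "ipv4");     // RFC 1918
denylist.addSubnet("172.16.0.0", 12, "ipv4");  // RFC 1918, includes Docker's default 172.17.0.0/16 bridge
denylist.addSubnet("192.168.0.0", 16, "ipv4"); // RFC 1918
denylist.addSubnet("127.0.0.0", 8, "ipv4");    // loopback
denylist.addSubnet("169.254.0.0", 16, "ipv4"); // link-local, includes cloud metadata endpoints
denylist.addAddress("::1", "ipv6");            // IPv6 loopback
denylist.addSubnet("fc00::", 7, "ipv6");       // IPv6 unique local
denylist.addSubnet("fe80::", 10, "ipv6");      // IPv6 link-local

// True if the address must not be contacted. Non-IP input is rejected (fail closed).
function isBlockedAddress(ip: string): boolean {
  const family = isIP(ip);
  if (family === 0) return true;
  return denylist.check(ip, family === 4 ? "ipv4" : "ipv6");
}

// Resolves the hostname and returns its addresses only if every one is public.
// The caller should connect to one of the returned addresses directly rather than
// resolving the hostname a second time, which is what defeats DNS rebinding.
async function resolvePublicTarget(rawUrl: string): Promise<string[]> {
  const url = new URL(rawUrl);
  if (url.protocol !== "https:") {
    throw new Error(`scheme ${url.protocol} is not allowed`);
  }
  const host = url.hostname.replace(/^\[|\]$/g, ""); // strip brackets from IPv6 literals
  const records = await lookup(host, { all: true });
  const addresses = records.map((record) => record.address);
  if (addresses.length === 0 || addresses.some((address) => isBlockedAddress(address))) {
    throw new Error(`${host} resolves to a blocked address`);
  }
  return addresses;
}
```

Rejecting the whole hostname when any resolved address is private is deliberately strict: a name that returns both a public and an internal record is a common rebinding setup.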
Finding 7: Static Bearer Token for Vault Authentication
Services authenticated to Vault using a long-lived static bearer token in an HTTP header
A leaked request log, a misconfigured proxy, or an overly verbose error message could expose a token that never expires.
The X-Vault-Secret header carried a static shared secret configured at deploy time. It was valid until manually rotated. A token that never expires is a liability: it's only a matter of time before it shows up in a log file somewhere.
Short-lived HS256 JWT with 60-second TTL
The static bearer token was replaced with a JWT signed using VAULT_JWT_SECRET (a separate secret from the Vault unseal key). Each token is issued immediately before the request, expires 60 seconds later, and carries an issued-at claim. The signing uses node:crypto only, with no additional JWT library dependency. A token leaked through a log file expires at most 60 seconds after it was issued.
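For illustration, an HS256 token with a 60-second lifetime can be produced and checked with node:crypto alone, roughly as follows. The function names and the exact claim set (iat and exp only) are assumptions; this is a sketch of the technique, not the code running in production.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const toBase64Url = (text: string): string =>
  Buffer.from(text, "utf8").toString("base64url");

// Issues a compact JWT: base64url(header).base64url(payload).base64url(HMAC-SHA256).
function signShortLivedJwt(secret: string, nowMs: number = Date.now(), ttlSeconds: number = 60): string {
  const iat = Math.floor(nowMs / 1000);
  const header = toBase64Url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = toBase64Url(JSON.stringify({ iat, exp: iat + ttlSeconds }));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  return `${header}.${payload}.${signature}`;
}

// Accepts the token only if the signature matches, the algorithm is HS256,
// and the expiry is still in the future.
function verifyShortLivedJwt(token: string, secret: string, nowMs: number = Date.now()): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, payload, signature] = parts as [string, string, string];

  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, "base64url");
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return false;

  const parsedHeader = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (parsedHeader.alg !== "HS256") return false;

  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  return typeof claims.exp === "number" && Math.floor(nowMs / 1000) < claims.exp;
}
```

Verifying the signature before parsing any JSON means untrusted input is never decoded unless it was produced by a holder of the secret.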
What Else We Shipped
Beyond the seven primary findings, the hardening sprint included:
- Content Security Policy + Permissions-Policy headers: CSP with strict source allowlists on the web app; Permissions-Policy blocking camera, microphone, geolocation, and payment APIs we don't use.
- HSTS bumped to 1 year: max-age=31536000; includeSubDomains; preload. nginx add_header inheritance bug fixed: headers must be repeated in every server block, not just the outer config.
- Image tag pinning: production Docker images now use :${IMAGE_TAG} with a git SHA rather than :latest. Deployments are reproducible; a bad push to a registry can't silently change what's running.
- Ephemeral CI tokens for container registry: replaced a long-lived personal access token (CR_PAT) with GitHub Actions' ephemeral GITHUB_TOKEN for pushing images. Token expires with the workflow run.
- Dead Docker socket mount removed: the mcp-manager container had a /var/run/docker.sock mount from an earlier per-org-container architecture that was never actually used. A container with Docker socket access can escape to the host. Mount removed entirely.
What We Learned
The uncomfortable truth in this list is that none of these findings are exotic. Vault in dev mode, default secrets left in place, missing rate limiting — these are well-known pitfalls. We hit them anyway, because the focus during the build phase was on shipping features, and security review was treated as a pre-launch step rather than a continuous practice.
That's a common pattern. It's also a fixable one. Every finding above now has a corresponding check in our deploy pipeline or database migrations: the app will refuse to start with a dev-default secret, Vault is unsealed automatically and fails loudly if it can't reach the backend, and the CI workflow uses ephemeral tokens by default.
The goal isn't a clean audit report — it's a system that fails loudly and early when something is wrong, rather than silently accepting insecure defaults. We're not done, but we're significantly further along than we were a week ago.
Responsible disclosure: If you find a security issue in ApexMCP, please report it to [email protected]. We'll respond within 48 hours. Details on our disclosure process are at app-beta.apexmcp.ai/security.