How We Hardened ApexMCP for Production: A Security Sprint Retrospective

May 14, 2026 · 10 min read · by ApexMCP

Before opening ApexMCP to broader access, we ran a self-audit of the full production stack. We found seven issues serious enough to block launch. This post documents each one — what it was, how it got there, and how we fixed it.

We're publishing this because we think the MCP ecosystem benefits from honest technical writing about what production security actually involves. Security posturing ("we take security seriously") is everywhere. Specific postmortems are rare. Here's ours.

The Audit

We triaged findings into P0 (launch blockers), P1 (first sprint post-launch), and P2 (this quarter). Of the seven issues below, three are P0 and four are P1; we fixed all seven before writing this post. Each one was real, exploitable in some scenario, and would have been embarrassing to disclose after the fact rather than before.

Finding 1: Vault Running in Dev Mode in Production

P0 — Critical

HashiCorp Vault launched with `-dev` flag in the production compose file

Dev mode stores all secrets in memory. Every restart wipes the vault. No persistence, no Shamir unseal, no audit backend.

ApexMCP uses Vault to store connector credentials — database connection strings, API keys, OAuth tokens. These are the most sensitive data in the system. Running Vault in dev mode meant every server restart silently deleted all stored credentials. We had been re-entering them manually after each deploy without fully understanding why they were disappearing.

More critically: dev mode uses a root token printed to stdout at startup. Anyone with access to container logs had a root Vault token.

Fixed — shipped 2026-05-12

File backend + 1-of-1 Shamir unseal

Vault now runs with a persistent file backend. On first deploy, vault-init.sh generates the unseal key and root token. The unseal key is stored as a GitHub Actions secret (VAULT_UNSEAL_KEY). The deploy pipeline runs vault-unseal.sh automatically after each restart. Root token is rotated and stored offline.

The migration was not seamless — dev-mode Vault contents are in-memory and gone on restart, so there were no credentials to migrate. All connector credentials had to be re-entered after the switch. That's the correct behavior; we just hadn't been tracking the data loss.

Finding 2: Hardcoded Admin Password Committed to the Repo

P0 — Critical

Zitadel bootstrap config contained a hardcoded plaintext password with `passwordChangeRequired: false`

Anyone with read access to the repository had god-mode credentials on the identity provider.

The initial Zitadel configuration had a human admin account with a fixed password committed directly to the YAML file. The passwordChangeRequired: false flag meant it would never prompt for rotation. This account is the IAM_OWNER for the entire identity layer — it controls who can authenticate, what OAuth clients exist, and which users have access to the platform.

Fixed — shipped 2026-05-10, password rotated 2026-05-13

Placeholder in YAML, real value via environment variable

The committed YAML now contains DevOnlyChangeMe1! as a placeholder. The production value is injected via ZITADEL_FIRSTINSTANCE_ORG_HUMAN_PASSWORD environment variable from a GitHub Actions secret. passwordChangeRequired: true is enforced. The live admin account password was rotated via the Zitadel management API through an SSH tunnel — an environment variable change alone doesn't update an already-bootstrapped instance.

Finding 3: Gateway Signing Agent Tokens with a Default Dev Secret

P0 — Critical

`OAUTH_JWT_SECRET` was never wired through the production compose file

The gateway was silently falling back to dev-oauth-jwt-secret-change-in-prod to sign all agent JWTs in production. Anyone who knew the default value could forge valid agent tokens.

This one was invisible unless you were specifically looking for it. The gateway started without error, tokens were issued, agents authenticated. The only clue was that the fallback secret is hardcoded in the source, so anyone with read access to the code knew the signing key. We caught it by auditing every secret reference in the production compose file against what was actually injected at runtime.

Fixed — shipped 2026-05-10

Secret injected via compose env block and fail-closed on missing value

OAUTH_JWT_SECRET is now injected from a GitHub Actions secret in the deploy workflow, referenced in docker-compose.prod.yml, and the gateway will hard-fail at startup if the value is missing or matches the dev default.
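A minimal sketch of what a fail-closed startup check can look like. The helper name `requireSecret` and its signature are hypothetical (the gateway's actual startup code is not shown here); only the variable name and the dev default value come from the finding above.

```typescript
// Hypothetical helper: illustrates the fail-closed idea, not the gateway's real code.
const DEV_DEFAULT_SECRET = "dev-oauth-jwt-secret-change-in-prod";

function requireSecret(
  env: Record<string, string | undefined>,
  name: string,
  devDefault: string,
): string {
  const value = env[name];
  // Missing or blank: refuse to start instead of falling back to a default.
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is not set; refusing to start`);
  }
  // Present but still the well-known dev value: equally unsafe in production.
  if (value === devDefault) {
    throw new Error(`${name} is still the dev default; refusing to start`);
  }
  return value;
}

// At startup, before the server begins listening:
//   const jwtSecret = requireSecret(process.env, "OAUTH_JWT_SECRET", DEV_DEFAULT_SECRET);
```

The point of the design is that there is no code path that returns a usable secret when the environment is wrong, so a miswired compose file shows up as a crashed container rather than as silently forgeable tokens.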

Finding 4: No Rate Limiting at Any Layer

P1 — High

The gateway, nginx, and Zitadel had no rate limiting configured

Credential stuffing, agent loops, and brute-force auth attempts had an unlimited budget.

Rate limiting is one of those things that feels optional until a single misconfigured agent starts hammering your endpoint at 500 requests per second and takes down the service for everyone. We also had hand-rolled CORS middleware that was missing the Vary: Origin header — a CDN cache poisoning risk — and only applied to /api/*, leaving /mcp/* routes uncovered.

Fixed — shipped 2026-05-10

Three-layer defense: nginx → gateway → Zitadel

nginx now enforces auth_limit (10 req/s, burst 30) on auth endpoints before requests reach the application. The gateway has @fastify/rate-limit at 300 req/min per IP globally, with health check endpoints exempted so monitors don't get throttled. Zitadel has internal rate limits configured (200 req/min global, 30 req/min on /oauth/v2/, 20 req/min on /ui/login). Hand-rolled CORS was replaced with @fastify/cors + @fastify/helmet registered globally, including on /mcp/* routes.
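To make the gateway-level rule concrete, here is a self-contained sketch of a per-IP fixed-window limiter with an exemption list. It is an illustration of the policy (300 requests per minute per IP, health checks exempt), not how @fastify/rate-limit is implemented internally; the class name and the `/health` path are assumptions.

```typescript
// Illustrative fixed-window limiter; production uses @fastify/rate-limit instead.
class FixedWindowLimiter {
  private readonly max: number;
  private readonly windowMs: number;
  private readonly exemptPaths: ReadonlySet<string>;
  // Per-IP counters. A real limiter would also evict idle entries.
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(max: number, windowMs: number, exemptPaths: Iterable<string> = []) {
    this.max = max;
    this.windowMs = windowMs;
    this.exemptPaths = new Set(exemptPaths);
  }

  // Returns true if the request may proceed, false if it should get a 429.
  allow(ip: string, path: string, nowMs: number = Date.now()): boolean {
    if (this.exemptPaths.has(path)) return true; // monitors are never throttled
    const current = this.windows.get(ip);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // First request from this IP, or the previous window has elapsed.
      this.windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= this.max;
  }
}

// Policy from the text: 300 requests per minute per IP, health checks exempt.
const gatewayLimiter = new FixedWindowLimiter(300, 60_000, ["/health"]);
```

A fixed window is the simplest variant and allows short bursts at window boundaries; the nginx layer in front, with its own rate and burst settings, is what absorbs those.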

Finding 5: Application Database Role Had Excessive Privileges

P1 — High

The app database user could INSERT, UPDATE, DELETE, and TRUNCATE the audit log table

A compromised application layer could silently delete its own audit trail.

An audit log is only meaningful for security purposes if the code being audited cannot modify it. Our application database role had full write access to every table, including audit_logs: INSERT, UPDATE, DELETE, and TRUNCATE. A SQL injection in the application, or a compromised service container, could have modified or erased audit records.

Fixed — shipped 2026-05-12

DML-only app role + SECURITY DEFINER function for audit inserts

The application now connects as apexmcp_app, a Postgres role with SELECT/INSERT/UPDATE/DELETE on application tables but no DDL privileges. INSERT, UPDATE, DELETE, and TRUNCATE on audit_logs are explicitly revoked from apexmcp_app. All audit writes go through insert_audit_log_entry(), a SECURITY DEFINER function owned by the migrator role. Application code cannot touch the audit table directly — only through this function, which enforces the schema.

Finding 6: SSRF via User-Supplied Connector URLs

P1 — High

The connector service fetched user-supplied URLs without validation

An attacker could supply an internal network URL — http://169.254.169.254/, http://vault:8200/, other Docker-internal addresses — and have the connector service proxy requests to them on their behalf.

Server-Side Request Forgery is a common finding in platforms that accept URLs from users. When you're building a connector platform whose job is literally to reach out to external services based on user configuration, SSRF is easy to overlook. The fix needs to be in the right place — blocking happens in the connector service, not at the edge, because the edge doesn't know the semantics of the request.

Fixed — shipped 2026-05-12

URL validation with DNS resolution check before any outbound request

All user-supplied URLs are validated before use. The scheme must be https (with a per-connector allowlist for http in dev). The hostname must resolve to a public IP address: RFC 1918, loopback, link-local, and Docker-internal ranges are on the denylist. The resolved IP is checked again at connection time to prevent DNS rebinding attacks.
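The validation steps can be sketched with Node's built-in `net.BlockList` and `dns/promises`. This is a simplified illustration under stated assumptions, not the connector service's actual code: the function names are hypothetical, the CIDR list is a reasonable starting denylist rather than the production one, and the dev-only http allowlist is omitted.

```typescript
import { BlockList, isIP } from "node:net";
import { lookup } from "node:dns/promises";

// Denylist of non-public ranges. Docker's default bridge networks fall
// inside 172.16.0.0/12, so the RFC 1918 entries cover them as well.
const blocked = new BlockList();
blocked.addSubnet("0.0.0.0", 8);      // "this network"
blocked.addSubnet("10.0.0.0", 8);     // RFC 1918
blocked.addSubnet("172.16.0.0", 12);  // RFC 1918
blocked.addSubnet("192.168.0.0", 16); // RFC 1918
blocked.addSubnet("127.0.0.0", 8);    // loopback
blocked.addSubnet("169.254.0.0", 16); // link-local, includes cloud metadata
blocked.addAddress("::1", "ipv6");          // IPv6 loopback
blocked.addSubnet("fc00::", 7, "ipv6");     // IPv6 unique local
blocked.addSubnet("fe80::", 10, "ipv6");    // IPv6 link-local
blocked.addSubnet("::ffff:0:0", 96, "ipv6"); // IPv4-mapped IPv6

function isBlockedAddress(ip: string): boolean {
  const family = isIP(ip); // 4, 6, or 0 when the string is not an IP
  if (family === 0) return true; // fail closed on anything unparseable
  return blocked.check(ip, family === 6 ? "ipv6" : "ipv4");
}

// Validates a user-supplied URL and returns the address that was checked,
// so the caller can connect to that exact IP instead of resolving again.
async function resolveAndValidate(rawUrl: string): Promise<{ url: URL; address: string }> {
  const url = new URL(rawUrl);
  if (url.protocol !== "https:") {
    throw new Error("only https URLs are allowed");
  }
  const host = url.hostname.replace(/^\[|\]$/g, ""); // strip IPv6 brackets
  const addresses =
    isIP(host) !== 0 ? [{ address: host }] : await lookup(host, { all: true });
  for (const { address } of addresses) {
    if (isBlockedAddress(address)) {
      throw new Error(`destination ${address} is not a public address`);
    }
  }
  const first = addresses[0];
  if (first === undefined) throw new Error("hostname did not resolve");
  return { url, address: first.address };
}
```

Returning the validated address matters for the rebinding case: if the HTTP client resolves the hostname a second time, an attacker-controlled DNS server can answer with an internal IP on that second lookup, so the connection should be pinned to the address that passed the check (or the check repeated on the socket's remote address).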

Finding 7: Static Bearer Token for Vault Authentication

P1 — High

Services authenticated to Vault using a long-lived static bearer token in an HTTP header

A leaked request log, a misconfigured proxy, or an overly verbose error message could expose a token that never expires.

The X-Vault-Secret header was a static shared secret configured at deploy time. It was valid until manually rotated. A token that never expires is a liability — it's only a matter of time before it shows up in a log file somewhere.

Fixed — shipped 2026-05-12

Short-lived HS256 JWT with 60-second TTL

The static bearer token was replaced with a JWT signed using VAULT_JWT_SECRET (a separate secret from the Vault unseal key). Each token is issued immediately before the request, expires 60 seconds later, and carries an issued-at claim. The signing uses node:crypto only, with no additional JWT library dependency. A token leaked through a log file is useless within a minute of being issued.
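For readers who have not hand-rolled a JWT before, the following sketch shows how an HS256 token with `iat` and `exp` claims can be built and checked using only `node:crypto`. The function names are hypothetical, and the verification side in particular is an assumption about what the receiving service would do; the post only describes the signing side.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const toBase64Url = (text: string): string =>
  Buffer.from(text, "utf8").toString("base64url");

// Fixed header: HS256 is the only algorithm this sketch issues or accepts.
const JWT_HEADER = toBase64Url(JSON.stringify({ alg: "HS256", typ: "JWT" }));

function signShortLivedToken(
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  ttlSeconds: number = 60,
): string {
  const payload = toBase64Url(
    JSON.stringify({ iat: nowSeconds, exp: nowSeconds + ttlSeconds }),
  );
  const signature = createHmac("sha256", secret)
    .update(`${JWT_HEADER}.${payload}`)
    .digest("base64url");
  return `${JWT_HEADER}.${payload}.${signature}`;
}

function verifyToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, payload, signature] = parts as [string, string, string];
  if (header !== JWT_HEADER) return false; // rejects any other alg, e.g. "none"
  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest();
  const actual = Buffer.from(signature, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return false;
  }
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  return typeof claims.exp === "number" && nowSeconds < claims.exp;
}
```

A caller would mint a token immediately before each request, for example `signShortLivedToken(process.env.VAULT_JWT_SECRET!)`, and never cache it. Note that a 60-second TTL bounds the replay window but does not eliminate it; a captured token remains valid until its `exp` passes.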

What Else We Shipped

Beyond the seven primary findings, the hardening sprint included:

Content Security Policy + Permissions-Policy headers: CSP with strict source allowlists on the web app; Permissions-Policy blocking the camera, microphone, geolocation, and payment APIs we don't use.
HSTS bumped to 1 year: max-age=31536000; includeSubDomains; preload. We also fixed an nginx add_header inheritance bug: a block that declares any add_header of its own stops inheriting the ones set at the enclosing level, so the security headers are now repeated in every server block rather than set once in the outer config.
Image tag pinning: production Docker images now use :${IMAGE_TAG} with a git SHA rather than :latest. Deployments are reproducible; a bad push to a registry can't silently change what's running.
Ephemeral CI tokens for container registry: replaced a long-lived personal access token (CR_PAT) with GitHub Actions' ephemeral GITHUB_TOKEN for pushing images. The token expires with the workflow run.
Dead Docker socket mount removed: the mcp-manager container had a /var/run/docker.sock mount from an earlier per-org-container architecture that was never actually used. A container with Docker socket access can escape to the host. The mount was removed entirely.

What We Learned

The uncomfortable truth in this list is that none of these findings are exotic. Vault in dev mode, default secrets left in place, missing rate limiting — these are well-known pitfalls. We hit them anyway, because the focus during the build phase was on shipping features, and security review was treated as a pre-launch step rather than a continuous practice.

That's a common pattern. It's also a fixable one. Every finding above now has a corresponding check in our deploy pipeline or database migrations: the app will refuse to start with a dev-default secret, Vault is unsealed automatically and fails loudly if it can't reach the backend, and the CI workflow uses ephemeral tokens by default.

The goal isn't a clean audit report — it's a system that fails loudly and early when something is wrong, rather than silently accepting insecure defaults. We're not done, but we're significantly further along than we were a week ago.

Responsible disclosure: If you find a security issue in ApexMCP, please report it to [email protected]. We'll respond within 48 hours. Details on our disclosure process are at app-beta.apexmcp.ai/security.

Build on a platform that takes this seriously

ApexMCP is in private beta. MCP endpoints, connectors, and enterprise security — production-ready.