Full-Stack Security Audit for a DeFi Startup: From Growth-Mode Shortcuts to Production-Grade Security


A DeFi startup preparing for their Series A brought us in to harden their stack. They'd grown fast — React frontend, Node.js backend, microservices on EKS, an Ethereum node on EC2, and a user base actively moving crypto assets. The engineering team was strong but had prioritized shipping features over security infrastructure. That's normal for the stage they were at.

Our job was to bridge the gap between "it works" and "it's production-grade." We ran the audit in parallel across every layer of the stack — code, infrastructure, cloud, and blockchain.

Here is every finding, and every fix.

The Audit: What the Tools Found

We ran seven tools in parallel across the codebase, the AWS account, and the Kubernetes cluster. The findings were typical for a startup at this stage — nothing surprising, but several items that needed immediate attention before the Series A security review.

Layer              Tool                 Key Findings
Git history        TruffleHog           23 credentials (API keys, AWS access keys, service tokens)
Application code   Semgrep              JWT alg:none, no rate limiting, no logout endpoint
Containers + IaC   Trivy + Checkov      14 high-severity container CVEs, 22 IaC policy violations
AWS account        ScoutSuite           EC2 in public subnets, 0.0.0.0/0 security groups, no WAF
Kubernetes         kube-bench           47 CIS benchmark failures
Ethereum           Custom scripts       Geth RPC public, all methods enabled including personal_*
Frontend bundle    Manual + DevTools    Alchemy API key + JWT signing secret hardcoded in JS

Starting security posture: room for improvement across every layer — which is exactly why the founders brought us in.

Phase 1: Credential Hygiene

TruffleHog against the full git history — not just the current branch — returned 23 matches. This is common: developers commit credentials during early prototyping and they persist in git history even after being removed from the working tree.

trufflehog git https://github.com/org/defi-platform \
  --only-verified \
  --json \
  | jq '.SourceMetadata.Data.Git | {commit, file, line}'

The breakdown:

  • 9 AWS access key / secret key pairs (several with broad permissions)
  • 6 Alchemy and Infura API keys
  • 4 blockchain-related service credentials
  • 3 internal JWT signing secrets
  • 1 Stripe secret key

Most of these were committed during the early prototyping phase and never cleaned up — a pattern we see in virtually every startup audit. The team had moved the credentials to environment variables months ago, but the old commits still contained the plaintext values.

We coordinated a full rotation of all 23 credentials before continuing the audit. The team was responsive and had everything rotated within a day.

The Fix: git-secrets Pre-Commit + TruffleHog in CI

# Install git-secrets
git secrets --install
git secrets --register-aws

# Add custom patterns for Ethereum private keys and hardcoded JWT secrets
git secrets --add '0x[0-9a-fA-F]{64}'             # ETH private key pattern
git secrets --add 'JWT_SIGNING_SECRET\s*=\s*\S+'  # inline JWT secret assignments

# The hooks installed above block any matching commit; review registered patterns:
git secrets --list

And in CI (GitHub Actions):

- name: TruffleHog Secret Scan
  uses: trufflesecurity/trufflehog@main
  with:
    path: ./
    base: ${{ github.event.repository.default_branch }}
    head: HEAD
    extra_args: --only-verified --fail

--only-verified means TruffleHog only fails the build if the credential is still active — no alert fatigue from rotated secrets in old commits. Mean time to detect a new leak: under 5 minutes.

We also migrated every secret to AWS Secrets Manager with External Secrets Operator syncing them into Kubernetes:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: defi-platform-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: defi-platform-secrets
    creationPolicy: Owner
  data:
    - secretKey: ALCHEMY_API_KEY
      remoteRef:
        key: defi-platform/production
        property: alchemy_api_key
    - secretKey: JWT_SIGNING_SECRET
      remoteRef:
        key: defi-platform/production
        property: jwt_signing_secret

Phase 2: The Frontend Secrets and the JWT Disaster

While the git scan ran, we opened Chrome DevTools on the production app and searched the bundle for known keywords. Two minutes later:

// Found in main.chunk.js (line 847, minified but not obfuscated)
const ALCHEMY_KEY = "wss://eth-mainnet.alchemyapi.io/v2/AbC123XyZ..."
const JWT_SECRET  = "s3cr3t-pr0d-k3y-d0-n0t-sh4re"

The Alchemy key let anyone burn through the platform's API quota and rate limits for on-chain data. The JWT signing secret was worse: it was the same key the backend used to verify sessions. Anyone who extracted it could forge a valid token for any user ID.

The JWT implementation made this even more dangerous — the backend never configured algorithm restrictions:

// The vulnerable code (before)
const decoded = jwt.verify(token, JWT_SECRET);
// No algorithm enforcement — accepts alg:none

There was no /logout endpoint and no exp claim — a token issued on account creation was valid indefinitely.
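To make the risk concrete: with no algorithm enforcement, forging a session takes a few lines. A sketch of the attack — forgeToken and the user ID are illustrative, not the platform's code:

```javascript
// How an attacker forges a session once alg:none is accepted.
// Uses only Node built-ins; Buffer's base64url encoding needs Node >= 16.
const b64url = (obj) =>
  Buffer.from(JSON.stringify(obj)).toString('base64url');

function forgeToken(userId) {
  const header  = { alg: 'none', typ: 'JWT' };
  const payload = { userId, iat: Math.floor(Date.now() / 1000) };
  // alg:none tokens carry an empty signature segment
  return `${b64url(header)}.${b64url(payload)}.`;
}

const token = forgeToken('victim-user-42');
// A verify() call with no algorithm allowlist accepts this as legitimate.
```

No key material is needed at all — which is why the fix has to reject the algorithm at the library level, not just hide the secret.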

The Fix: RS256 Enforcement, Expiry, Revocation, Logout

We moved from HMAC to RS256. The private key signs; the public key verifies. The frontend never touches the private key:

// Enforce RS256 — reject any other algorithm at the library level
const decoded = jwt.verify(token, PUBLIC_KEY, {
  algorithms: ['RS256'],  // whitelist only — alg:none throws immediately
  issuer: 'defi-platform',
  audience: 'defi-platform-users',
});

Token lifetimes:

// Access token: 15 minutes, with a unique jti for later revocation
// (crypto = require('node:crypto'); jwtid maps to the jti claim)
const accessToken = jwt.sign(payload, PRIVATE_KEY, {
  algorithm: 'RS256',
  expiresIn: '15m',
  jwtid: crypto.randomUUID(),
});

// Refresh token: 7 days, stored in HttpOnly cookie
const refreshToken = jwt.sign({ userId: payload.userId }, PRIVATE_KEY, {
  algorithm: 'RS256',
  expiresIn: '7d',
  jwtid: crypto.randomUUID(),
});

Logout endpoint with Redis-backed revocation:

app.post('/logout', authenticate, async (req, res) => {
  const { jti, exp } = req.user;  // jti = JWT ID, exp = expiry timestamp
  const ttl = exp - Math.floor(Date.now() / 1000);

  // Add token ID to blocklist until it naturally expires
  // (guard against already-expired tokens: Redis rejects non-positive TTLs)
  if (ttl > 0) await redis.set(`revoked:${jti}`, '1', 'EX', ttl);
  res.clearCookie('refresh_token');
  res.json({ success: true });
});

// In authentication middleware — jwt.verify throws on any invalid token,
// so the whole check is wrapped in try/catch
async function authenticate(req, res, next) {
  try {
    const token = extractToken(req);
    const decoded = jwt.verify(token, PUBLIC_KEY, { algorithms: ['RS256'] });

    const isRevoked = await redis.get(`revoked:${decoded.jti}`);
    if (isRevoked) return res.status(401).json({ error: 'Token revoked' });

    req.user = decoded;
    next();
  } catch (err) {
    return res.status(401).json({ error: 'Invalid token' });
  }
}

Phase 3: No Rate Limiting, No WAF, Bots Welcome

The wallet creation endpoint accepted unlimited POST requests. We spun up a simple test loop:

for i in $(seq 1 1000); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST https://api.defi-platform.com/v1/wallets \
    -H "Content-Type: application/json" \
    -d '{"email": "test'$i'@example.com"}'
done
# Result: 1000x 200 OK. Zero throttling. Zero blocking.

A malicious actor could create thousands of wallets programmatically, farm gas fee credits, or exhaust backend resources. Transaction endpoints had the same problem.
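For intuition, the per-IP limit enforced below is equivalent to a token bucket: each client starts with a burst allowance that refills at a fixed rate. A minimal in-process sketch (names and numbers are illustrative, not the production config):

```javascript
// Token bucket: `capacity` is the burst allowance, `refillPerSec` the
// sustained rate. Each request spends one token; empty bucket = rejected.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }
  allow() {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec
    );
    this.last = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false;
  }
}

const perIp = new Map();
function allowRequest(ip) {
  // 5-request burst, refilling at 5 per minute
  if (!perIp.has(ip)) perIp.set(ip, new TokenBucket(5, 5 / 60));
  return perIp.get(ip).allow();
}
```

The test loop above would hit this wall after five requests instead of creating a thousand wallets.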

The Fix: AWS WAF + nginx Rate Limiting

We deployed AWS WAF in front of the ALB with bot management and custom rate limit rules:

{
  "Name": "RateLimitPerIP",
  "Priority": 1,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 100,
      "AggregateKeyType": "IP",
      "ScopeDownStatement": {
        "ByteMatchStatement": {
          "SearchString": "/api/",
          "FieldToMatch": { "UriPath": {} },
          "PositionalConstraint": "STARTS_WITH"
        }
      }
    }
  },
  "Action": { "Block": {} }
}

And stricter limits at the application layer for auth and wallet endpoints:

# nginx rate limiting for sensitive endpoints
limit_req_zone $binary_remote_addr zone=auth:10m rate=10r/m;
limit_req_zone $binary_remote_addr zone=wallet:10m rate=5r/m;

location /api/v1/auth/ {
    limit_req zone=auth burst=3 nodelay;
    limit_req_status 429;
    proxy_pass http://backend;
}

location /api/v1/wallets {
    limit_req zone=wallet burst=2 nodelay;
    limit_req_status 429;
    proxy_pass http://backend;
}

Phase 4: Kubernetes Hardening (47 CIS Failures → 3)

kube-bench: 47 CIS failures. The most critical — the EKS API server endpoint was publicly accessible with an allowed CIDR of 0.0.0.0/0. Anyone on the internet could reach the Kubernetes control plane.

# What we found
aws eks describe-cluster --name defi-cluster \
  --query 'cluster.resourcesVpcConfig'

# Output:
# "endpointPublicAccess": true,
# "publicAccessCidrs": ["0.0.0.0/0"],   <-- open to the world
# "endpointPrivateAccess": false

Closing the API Server

aws eks update-cluster-config \
  --name defi-cluster \
  --resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true

CI/CD access now routes through AWS Client VPN. No external access to the API server at all.

NetworkPolicies: Default Deny

# Default deny-all for every namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Explicit allow: backend can reach postgres
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-postgres
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 5432

PodSecurityStandards (Restricted Profile)

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
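Workloads don't pass the restricted profile automatically — every container has to opt in explicitly. A minimal container-level securityContext that satisfies it (a sketch; the right user ID depends on the image):

```yaml
# Container-level securityContext required by the restricted profile
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: ["ALL"]
```

With `enforce: restricted` on the namespace, any pod missing these fields is rejected at admission rather than merely warned about.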

We also deployed Falco for runtime monitoring:

# Falco rule: alert on unexpected shell spawns in production pods
- rule: Shell Spawned in Production Container
  desc: A shell was spawned inside a production container
  condition: >
    spawned_process and
    container and
    k8s.ns.name = "production" and
    proc.name in (shell_binaries)
  output: >
    Shell spawned in production container
    (user=%user.name pod=%k8s.pod.name ns=%k8s.ns.name cmd=%proc.cmdline)
  priority: CRITICAL
  tags: [container, shell, production]

After hardening: 3 remaining CIS failures — all documented acceptable exceptions tied to managed EKS add-ons that AWS controls directly.

Phase 5: VPC Redesign and EC2 Isolation

Every EC2 instance — including the Ethereum node — sat in a public subnet with a public IP. SSH was open from 0.0.0.0/0. No bastion, no VPN, no SSM.

We redesigned the VPC from scratch:

Before:
  Public subnet ── EC2 (Ethereum node, public IP, SSH open)
  Public subnet ── EC2 (app servers, public IP, SSH open)
  
After:
  Public subnet  ── ALB only
  Private subnet ── EKS worker nodes (NAT Gateway outbound)
  Private subnet ── EC2 (Ethereum node, no public IP)
  Private subnet ── RDS, ElastiCache

The Terraform change for the Ethereum node:

resource "aws_instance" "ethereum_node" {
  ami                         = data.aws_ami.ubuntu.id
  instance_type               = "m5.2xlarge"
  subnet_id                   = aws_subnet.private_a.id   # was public
  associate_public_ip_address = false                      # was true
  vpc_security_group_ids      = [aws_security_group.eth_node_private.id]

  iam_instance_profile = aws_iam_instance_profile.ssm_profile.name
}

# SSM replaces bastion + SSH
resource "aws_iam_instance_profile" "ssm_profile" {
  name = "eth-node-ssm-profile"
  role = aws_iam_role.ssm_role.name
}

SSH was disabled entirely. Access to all EC2 instances now goes through AWS Systems Manager Session Manager:

# Connect to the Ethereum node — no SSH key, no open port 22
aws ssm start-session --target i-0abc123def456

Phase 6: Locking Down the Geth RPC

The Geth RPC endpoint was reachable from the public internet on port 8545 with all methods enabled — including personal_unlockAccount. One call to unlock an account, one call to eth_sendTransaction, and the hot wallet was gone.

We confirmed it with a probe:

curl -X POST http://[REDACTED]:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"rpc_modules","params":[],"id":1}'

# Response: {"result":{"admin":"1.0","debug":"1.0","eth":"1.0",
#            "miner":"1.0","net":"1.0","personal":"1.0","txpool":"1.0","web3":"1.0"}}
# personal, admin, miner — all accessible. No auth.

The Fix: nginx Reverse Proxy with Method Allowlist

We placed an nginx reverse proxy in front of Geth with strict method filtering. The node itself binds only to localhost:

# Geth startup — only localhost, no public binding
geth \
  --http \
  --http.addr "127.0.0.1" \
  --http.port 8545 \
  --http.api "eth,net,web3" \
  --ws=false \
  --authrpc.jwtsecret /etc/geth/jwt.hex

nginx terminates TLS, rate-limits, and blocks dangerous method prefixes before anything reaches Geth:

upstream geth {
    server 127.0.0.1:8545;
}

limit_req_zone $binary_remote_addr zone=rpc:10m rate=50r/m;

server {
    listen 8546 ssl;
    ssl_certificate     /etc/ssl/geth.crt;
    ssl_certificate_key /etc/ssl/geth.key;

    location /rpc {
        limit_req zone=rpc burst=10 nodelay;

        # Block dangerous method prefixes. This is defense in depth — the
        # primary control is Geth's --http.api allowlist, since nginx
        # evaluates `if` before the request body is fully read; strict
        # body inspection needs njs or OpenResty.
        if ($request_body ~* '"method"\s*:\s*"(personal_|admin_|miner_|debug_)') {
            return 403 '{"error":"method not allowed"}';
        }

        proxy_pass http://geth;
        proxy_set_header Authorization $http_authorization;  # forward client credentials
    }
}
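The same allowlist logic, written out as a standalone filter — a hypothetical helper, not the deployed config, showing why deny-by-prefix plus an explicit namespace allowlist is stricter than either alone:

```javascript
// JSON-RPC method filter: reject blocked prefixes, then require the
// method's namespace to be on the allowlist. Fail closed on bad input.
const BLOCKED_PREFIXES = ['personal_', 'admin_', 'miner_', 'debug_'];
const ALLOWED_NAMESPACES = ['eth', 'net', 'web3'];

function isMethodAllowed(body) {
  let req;
  try { req = JSON.parse(body); } catch { return false; }  // malformed = denied
  const method = typeof req.method === 'string' ? req.method : '';
  if (BLOCKED_PREFIXES.some((p) => method.startsWith(p))) return false;
  return ALLOWED_NAMESPACES.includes(method.split('_')[0]);
}
```

An unknown namespace like `txpool_status` is denied by default, which is the posture you want in front of a node holding keys.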

The frontend now uses Alchemy for all read operations. The Geth node is internal-only for transaction signing.

The Final Scorecard

Finding                          Before                     After
Exposed secrets in git           23                         0
Secret leak detection time       Never (manual)             < 5 minutes (CI)
K8s CIS benchmark failures       47                         3 (documented exceptions)
EC2 instances in public subnets  All                        0
API server public endpoint       Yes (0.0.0.0/0)            No (VPC-only)
JWT algorithm enforcement        None (alg:none accepted)   RS256 only
Token expiry                     None                       15 min access / 7 day refresh
Logout endpoint                  No                         Yes (with Redis revocation)
Geth RPC public                  Yes (all methods)          No (localhost + nginx proxy)
Rate limiting on APIs            None                       WAF + nginx (100/min general, 10/min auth)
Security score                   23/100                     89/100

What We'd Do Differently

Run TruffleHog against git history before onboarding any new codebase. We did this on day one here, but we've seen audits where secret scanning only ran against the current HEAD. Historical secrets are the most dangerous because they've had the most time to be discovered by someone else. It should be the first command, always.

Start the VPC redesign from a diagram, not from the console. We redesigned this VPC in Terraform, but we sketched the target architecture on a whiteboard first and presented it to the team before writing a single line of HCL. Skipping the whiteboard step leads to half-finished redesigns where some resources are in private subnets and some are still public because nobody mapped the dependencies. The diagram also makes the security improvement visible to non-engineers — stakeholders understood the risk reduction when they could see the before/after.

For DeFi specifically: bring in a smart contract auditor for the on-chain layer. This audit covered the infrastructure and application layer thoroughly. But the Solidity contracts themselves were outside our scope. Infrastructure security and smart contract security are different disciplines. A DeFi platform should treat them as separate workstreams, not as one combined "security review." We flagged several interactions between the backend and the contracts that warranted a formal smart contract audit — that work was scoped separately and done by a specialist firm.


Running blockchain infrastructure or DeFi applications? The attack surface is larger than traditional web apps — private key management, RPC security, and on-chain interactions each add layers that standard security checklists miss. Talk to us and we'll map your exposure before someone else does.

Frequently Asked Questions

What tools did you use for the security audit?

Seven tools across five categories:

  • SAST — Semgrep for application code (custom rules for Ethereum/Web3 patterns plus the standard rulesets)
  • Container and IaC scanning — Trivy for Docker images and Terraform, Checkov for IaC policy enforcement
  • Secret scanning — TruffleHog across the full git history (not just the latest commit)
  • Cloud and infrastructure — ScoutSuite for the AWS account audit, kube-bench for the CIS Kubernetes benchmark
  • Ethereum-specific — custom Python scripts to probe the Geth RPC endpoint and enumerate available methods

No single tool finds everything. You need all five categories working together.

How dangerous is the JWT alg:none vulnerability in practice?

Catastrophic if the backend holds anything valuable. The attack is trivial: take a valid JWT, change the header's alg field to none, remove the signature, and send it. A vulnerable backend will accept it as legitimate. In this case, the app controlled DeFi wallets and transaction signing — an attacker who forged a token for any user ID could initiate transactions on their behalf. The fix is strict: enforce a specific algorithm (RS256), reject anything else at the library level, and never trust the algorithm field from the token itself.

Why is a public Geth RPC endpoint so dangerous?

A fully open Geth RPC endpoint is essentially an unauthenticated remote control for your Ethereum node. With personal_unlockAccount available, an attacker can unlock any account the node manages and then call eth_sendTransaction to move funds. Even without unlocking accounts, a public endpoint leaks the full transaction history, mempool data, and connected peer information. The correct posture is: no public RPC at all. The frontend uses a third-party provider (Alchemy/Infura) for read operations. Internal transaction signing goes through a dedicated signing service with hardware wallet integration, never through a directly exposed Geth instance.

How do you prevent secrets from getting into git in the first place?

Two layers: pre-commit hooks that run TruffleHog locally before any commit completes (gitleaks also works), and CI pipeline scanning that re-runs the same check on every push. Pre-commit hooks can be bypassed with --no-verify, which is why the CI gate matters — it's mandatory and blocks the PR. We use TruffleHog in verified mode for CI, which checks whether leaked credentials are still active, not just whether the string pattern matches. Active secret detected in CI = PR blocked, Slack alert fires, rotation starts within minutes.
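As a starting point, the pre-commit layer can be wired up with the pre-commit framework — an illustrative config (the gitleaks hook shown here is one of the options mentioned above; pin `rev` to a current release):

```yaml
# .pre-commit-config.yaml — illustrative sketch
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
```

Run `pre-commit install` once per clone; the CI gate then re-runs the equivalent scan so a `--no-verify` bypass never reaches main.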

What's the right architecture for Ethereum private key management?

Hot wallet keys should never touch application code, environment variables on EC2, or Kubernetes secrets without additional protection. The right architecture: (1) Use AWS KMS for key encryption at rest — the plaintext key never leaves KMS; (2) For signing operations, use a dedicated signing service (e.g., AWS Nitro Enclaves or a HashiCorp Vault plugin) that accepts transaction payloads and returns signatures without ever exposing the private key; (3) For high-value wallets, use a hardware wallet (Ledger/Trezor) connected to an air-gapped signing machine for cold storage. The frontend should never have signing capability — it constructs transactions, the user's own wallet (MetaMask) signs them.

How do you harden a K8s API server endpoint on EKS?

Three steps: (1) Set endpointPublicAccess: false and endpointPrivateAccess: true in the EKS cluster config — this makes the API server only reachable from within the VPC; (2) If you need external access for CI/CD, connect via VPN or AWS Client VPN, never whitelist a broad CIDR; (3) Enable EKS audit logging to CloudWatch and alert on anomalous API patterns (e.g., exec into production pods, get secrets calls from unexpected service accounts). kube-bench should be run after any change to verify no new CIS failures were introduced.