New write-up: migrating a production LLM backend off AWS

AI/ML Infrastructure Engineer

Patrick McBride

I build production-grade LLM systems that hold up under real traffic. AI/ML infrastructure engineer at Bill.com, and co-founder & Head of AI at ApplyPass.

800K+

API requests / month

3B+

Tokens processed / month

Years building ML/AI


About

I'm an AI/ML infrastructure engineer with an unusual path — from physics and materials science into deep learning and the production LLM systems I build today.

At Bill.com, I design and operate agentic LLM and ML inference pipelines — async FastAPI services on AWS ECS Fargate, built for high-throughput AI workloads with token budgeting, retries, and structured observability baked in.

As co-founder and Head of AI at ApplyPass, I built the entire AI backend from scratch — 800K requests and 3B tokens a month, a function-calling job classifier, and a two-tower recommendation system that lifted match accuracy from 70% to 90%+. Most recently I led its full migration from AWS to DigitalOcean and built an MCP server and agent-tooling suite on top of it.

I care about systems that work in production, not just in notebooks — structured outputs over hopeful parsing, evaluation loops over vibes, and cost-aware architecture over brute force.

LLMs & AI

OpenAI API · Amazon Bedrock · RAG · GraphRAG · Function Calling · Structured Outputs · Fine-tuning · Evals

ML Frameworks

PyTorch · TensorFlow · Scikit-learn · Hugging Face

Infrastructure

FastAPI · AWS ECS Fargate · DigitalOcean · Docker · Terraform · MCP

Data

PostgreSQL · pgvector · MongoDB · Redis · Neo4j

Experience

Bill.com

Senior Software Engineer, Backend AI Infrastructure · Now

Aug 2025 - Present • Remote, CA

▸ Design and operate agentic LLM and ML inference pipelines as async FastAPI services on AWS ECS Fargate
▸ Build high-throughput AI backend services with per-request token budgeting, exponential-backoff retries, and dead-letter queues
▸ Instrument production AI systems with structured JSON logging, correlation-ID tracing, and prompt-hash response caching for cost control
▸ Integrate Amazon Bedrock and OpenAI models behind a unified, retry-safe client layer for multi-agent orchestration

Python · FastAPI · AWS ECS Fargate · Bedrock · SQS
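
To make two of the ideas above concrete, here is a minimal sketch of prompt-hash response caching combined with exponential-backoff retries. It is an illustration under stated assumptions, not the Bill.com implementation: `CachedRetryClient`, `prompt_hash`, the in-memory dict cache, and the jittered delay schedule are all hypothetical names and choices, and `call_model` stands in for whatever client actually talks to the model.

```python
import hashlib
import json
import random
import time


def prompt_hash(model: str, prompt: str) -> str:
    """Stable cache key: SHA-256 over the model name and prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedRetryClient:
    """Hypothetical wrapper: cache responses by prompt hash, retry with backoff."""

    def __init__(self, call_model, max_attempts=4, base_delay=0.5, sleep=time.sleep):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.call_model = call_model  # any callable (model, prompt) -> str
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep            # injectable so tests do not actually wait
        self.cache = {}               # in-memory here; an assumption for the sketch

    def complete(self, model: str, prompt: str) -> str:
        key = prompt_hash(model, prompt)
        if key in self.cache:
            return self.cache[key]    # cache hit: no model call, no token cost
        for attempt in range(self.max_attempts):
            try:
                result = self.call_model(model, prompt)
            except Exception:
                if attempt == self.max_attempts - 1:
                    raise             # out of attempts: surface the last error
                # Exponential backoff with full jitter: 0 .. base_delay * 2^attempt
                self.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
            else:
                self.cache[key] = result
                return result
```

Identical (model, prompt) pairs hit the cache and cost nothing on repeat calls. A production version would likely retry only on errors known to be transient and would bound the cache, which this sketch leaves out.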

ApplyPass

Head of AI & Machine Learning (Co-Founder) · Now

Jun 2023 - Present • Remote, CA

▸ Built the entire AI backend from scratch in Python/FastAPI, serving 800K requests and 3B tokens per month against the OpenAI API
▸ Shipped a function-calling job classifier and a two-tower recommendation system, lifting job-match accuracy from 70% to 90%+
▸ Cut LLM inference cost ~95% by migrating GPT-4 workloads to a fine-tuned GPT-3.5 with no measurable quality loss
▸ Led the full migration of the AI backend from AWS ECS Fargate to DigitalOcean App Platform, with Terraform IaC, GitHub Actions CI/CD, and a zero-downtime DNS cutover
▸ Built an MCP server and Cowork agent-plugin suite (OAuth2 PKCE) so support and engineering teams can query the platform in natural language

FastAPI · OpenAI · PostgreSQL · pgvector · DigitalOcean · MCP

Verizon

Staff AI Software Engineer

Sep 2022 - Aug 2025 • Remote, CA

▸ Spearheaded an AI chatbot for contract analysis achieving 90% accuracy in deviation detection
▸ Directed a Neo4j GraphRAG tool for long contracts, reducing hallucinations in legal document analysis
▸ Fine-tuned Llama/Mistral models for insurance language extraction, reaching 90% accuracy and saving 5,000+ hours per quarter
▸ Created a GenAI competitive-intel chatbot that runs SQL queries against BigQuery, with a POC in 2 weeks and an MVP in 1 month
▸ Developed real-time speech-to-text with Whisper, reducing latency from 30s to 5s

LangChain · Neo4j · Llama · Whisper · BigQuery

KLA

Senior AI Software Engineer

Mar 2016 - Sep 2022 • Milpitas, CA

▸ Developed a data storage API with PostgreSQL + MinIO, improving write speed 4x and read speed 6x
▸ Created a multi-container Docker Compose app for DL inference analyzing 10,000+ images per workload
▸ Led an SRGAN+CNN image classification team, improving SEM review throughput by 4x
▸ Characterized GoogLeNet classification, reducing workloads from 10 hours to 30 minutes with 99%+ accuracy

PyTorch · CNNs · GANs · PostgreSQL · Docker

Projects

Production · Featured

ApplyPass AI Backend

The production LLM platform behind ApplyPass — function-calling job classification, answer generation, and a two-tower recommendation engine on async FastAPI, OpenAI, and PostgreSQL + pgvector.

800K+ requests · 3B+ tokens / month

FastAPI · OpenAI API · pgvector · Embeddings

Infrastructure

AWS → DigitalOcean Migration

Replatformed ApplyPass's LLM backend from AWS ECS Fargate to DigitalOcean App Platform — Terraform IaC, GitHub Actions CI/CD, and a zero-downtime DNS cutover. Full write-up on the blog.

Zero-downtime cutover

Terraform · DigitalOcean · Docker · CI/CD

Developer Tools

ApplyPass MCP & Agent Tooling

An MCP server and Cowork plugin suite that lets support and engineering teams query the ApplyPass platform — customer lookups, match traces, escalation triage — in natural language, secured with OAuth2 PKCE.

Natural-language ops for the team

MCP · OAuth2 PKCE · Python · Cowork

Enterprise

Contract Analysis GraphRAG

Neo4j-backed GraphRAG for long legal contracts — models the relationships between amendments and master agreements so retrieval stays grounded where vanilla RAG drifts.

Grounded retrieval, fewer hallucinations

Neo4j · GraphRAG · LangChain · Python

Engineering Playbook

Opinionated guides on building production ML systems. Not tutorials — lessons learned shipping AI at scale.

12 min

Production LLM Architecture Patterns

How I design LLM pipelines that scale—preprocessing, chunking strategies, prompt engineering, and system architecture that survives production.
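
As a small taste of the chunking side, here is one common baseline: fixed-size word windows with overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. This is a sketch under assumptions rather than the strategy the article settles on. `chunk_words` and its default sizes are hypothetical, and it counts whitespace-separated words rather than model tokens.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

For example, `chunk_words("a b c d e f g", size=3, overlap=1)` yields three chunks, each sharing one word with its neighbor.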

8 min

Structured Outputs > Prompting: How I Make LLMs Deterministic

Moving beyond prompt engineering to contract-first design with Pydantic schemas, validation layers, and structured outputs.
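
The core of contract-first design fits in a few lines: define the shape you expect, then refuse anything that does not match it. The sketch below uses only the standard library as a stand-in for a Pydantic schema, so it runs anywhere. `JobClassification`, `ALLOWED_CATEGORIES`, and `parse_classification` are hypothetical names for illustration, not the ApplyPass schema.

```python
import json
from dataclasses import dataclass

ALLOWED_CATEGORIES = {"engineering", "design", "sales"}  # hypothetical label set


@dataclass(frozen=True)
class JobClassification:
    category: str
    confidence: float


def parse_classification(raw: str) -> JobClassification:
    """Validate raw model output against the contract, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    category = data.get("category")
    confidence = data.get("confidence")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return JobClassification(category=category, confidence=float(confidence))
```

A caller can treat `ValueError` as the signal to retry or route the request to a fallback, instead of passing a malformed answer downstream.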
