New write-up: migrating a production LLM backend off AWS

AI/ML Infrastructure Engineer

Patrick McBride

I build production-grade LLM systems that hold up under real traffic. I'm an AI/ML infrastructure engineer at Bill.com and co-founder and Head of AI at ApplyPass.

800K+

API requests / month

3B+

Tokens processed / month

9+

Years building ML/AI

01

About

I'm an AI/ML infrastructure engineer with an unusual path — from physics and materials science into deep learning and the production LLM systems I build today.

At Bill.com, I design and operate agentic LLM and ML inference pipelines — async FastAPI services on AWS ECS Fargate, built for high-throughput AI workloads with token budgeting, retries, and structured observability baked in.

As co-founder and Head of AI at ApplyPass, I built the entire AI backend from scratch. It handles 800K+ requests and 3B+ tokens a month and includes a function-calling job classifier and a two-tower recommendation system that lifted match accuracy from 70% to 90%+. Most recently I led its full migration from AWS to DigitalOcean and built an MCP server and agent-tooling suite on top of it.

I care about systems that work in production, not just in notebooks — structured outputs over hopeful parsing, evaluation loops over vibes, and cost-aware architecture over brute force.

LLMs & AI

OpenAI API · Amazon Bedrock · RAG · GraphRAG · Function Calling · Structured Outputs · Fine-tuning · Evals

ML Frameworks

PyTorch · TensorFlow · Scikit-learn · Hugging Face

Infrastructure

FastAPI · AWS ECS Fargate · DigitalOcean · Docker · Terraform · MCP

Data

PostgreSQL · pgvector · MongoDB · Redis · Neo4j

02

Experience

Bill.com

Senior Software Engineer, Backend AI Infrastructure · Now

Aug 2025 - Present · Remote, CA

  • Design and operate agentic LLM and ML inference pipelines as async FastAPI services on AWS ECS Fargate
  • Build high-throughput AI backend services with per-request token budgeting, exponential-backoff retries, and dead-letter queues
  • Instrument production AI systems with structured JSON logging, correlation-ID tracing, and prompt-hash response caching for cost control
  • Integrate Amazon Bedrock and OpenAI models behind a unified, retry-safe client layer for multi-agent orchestration
Python · FastAPI · AWS ECS Fargate · Bedrock · SQS

ApplyPass

Head of AI & Machine Learning (Co-Founder) · Now

Jun 2023 - Present · Remote, CA

  • Built the entire AI backend from scratch in Python/FastAPI — 800K requests and 3B tokens per month against the OpenAI API
  • Shipped a function-calling job classifier and a two-tower recommendation system, lifting job-match accuracy from 70% to 90%+
  • Cut LLM inference cost ~95% by migrating GPT-4 workloads to a fine-tuned GPT-3.5 with no measurable quality loss
  • Led the full migration of the AI backend from AWS ECS Fargate to DigitalOcean App Platform — Terraform IaC, GitHub Actions CI/CD, zero-downtime DNS cutover
  • Built an MCP server and Cowork agent-plugin suite (OAuth2 PKCE) so support and engineering teams can query the platform in natural language
FastAPI · OpenAI · PostgreSQL · pgvector · DigitalOcean · MCP

Verizon

Staff AI Software Engineer

Sep 2022 - Aug 2025 · Remote, CA

  • Spearheaded an AI chatbot for contract analysis achieving 90% accuracy in deviation detection
  • Directed a Neo4j GraphRAG tool for long contracts, reducing hallucinations in legal document analysis
  • Fine-tuned Llama/Mistral models for insurance language extraction, reaching 90% accuracy and saving 5,000+ hours/quarter
  • Created a GenAI competitive-intel chatbot with SQL queries to BigQuery — POC in 2 weeks, MVP in 1 month
  • Developed real-time speech-to-text with Whisper, reducing latency from 30s to 5s
LangChain · Neo4j · Llama · Whisper · BigQuery

KLA

Senior AI Software Engineer

Mar 2016 - Sep 2022 · Milpitas, CA

  • Developed a data storage API on PostgreSQL + MinIO, with 4X faster writes and 6X faster reads
  • Created a multi-container Docker Compose app for DL inference analyzing 10,000+ images per workload
  • Led an SRGAN+CNN image classification team, improving SEM review throughput by 4X
  • Characterized GoogLeNet classification, cutting workload runtime from 10 hours to 30 minutes at 99%+ accuracy
PyTorch · CNNs · GANs · PostgreSQL · Docker

03

Projects

Production · Featured

ApplyPass AI Backend

The production LLM platform behind ApplyPass — function-calling job classification, answer generation, and a two-tower recommendation engine on async FastAPI, OpenAI, and PostgreSQL + pgvector.

800K+ requests · 3B+ tokens / month

FastAPI · OpenAI API · pgvector · Embeddings
Infrastructure

AWS → DigitalOcean Migration

Replatformed ApplyPass's LLM backend from AWS ECS Fargate to DigitalOcean App Platform — Terraform IaC, GitHub Actions CI/CD, and a zero-downtime DNS cutover. Full write-up on the blog.

Zero-downtime cutover

Terraform · DigitalOcean · Docker · CI/CD
Developer Tools

ApplyPass MCP & Agent Tooling

An MCP server and Cowork plugin suite that lets support and engineering teams query the ApplyPass platform — customer lookups, match traces, escalation triage — in natural language, secured with OAuth2 PKCE.

Natural-language ops for the team

MCP · OAuth2 PKCE · Python · Cowork
Enterprise

Contract Analysis GraphRAG

Neo4j-backed GraphRAG for long legal contracts — models the relationships between amendments and master agreements so retrieval stays grounded where vanilla RAG drifts.

Grounded retrieval, fewer hallucinations

Neo4j · GraphRAG · LangChain · Python