AI/ML Infrastructure Engineer
Patrick McBride
01
About
I'm an AI/ML infrastructure engineer with an unusual path — from physics and materials science into deep learning and the production LLM systems I build today.
At Bill.com, I design and operate agentic LLM and ML inference pipelines — async FastAPI services on AWS ECS Fargate, built for high-throughput AI workloads with token budgeting, retries, and structured observability baked in.
As co-founder and Head of AI at ApplyPass, I built the entire AI backend from scratch — 800K requests and 3B tokens a month, a function-calling job classifier, and a two-tower recommendation system that lifted match accuracy from 70% to 90%+. Most recently I led its full migration from AWS to DigitalOcean and built an MCP server and agent-tooling suite on top of it.
I care about systems that work in production, not just in notebooks — structured outputs over hopeful parsing, evaluation loops over vibes, and cost-aware architecture over brute force.
LLMs & AI
ML Frameworks
Infrastructure
Data
02
Experience
Bill.com
Senior Software Engineer, Backend AI Infrastructure • Aug 2025 - Present • Remote, CA
- Design and operate agentic LLM and ML inference pipelines as async FastAPI services on AWS ECS Fargate
- Build high-throughput AI backend services with per-request token budgeting, exponential-backoff retries, and dead-letter queues
- Instrument production AI systems with structured JSON logging, correlation-ID tracing, and prompt-hash response caching for cost control
- Integrate Amazon Bedrock and OpenAI models behind a unified, retry-safe client layer for multi-agent orchestration
ApplyPass
Head of AI & Machine Learning (Co-Founder) • Jun 2023 - Present • Remote, CA
- Built the entire AI backend from scratch in Python/FastAPI, serving 800K requests and 3B tokens per month against the OpenAI API
- Shipped a function-calling job classifier and a two-tower recommendation system, lifting job-match accuracy from 70% to 90%+
- Cut LLM inference cost ~95% by migrating GPT-4 workloads to a fine-tuned GPT-3.5 with no measurable quality loss
- Led the full migration of the AI backend from AWS ECS Fargate to DigitalOcean App Platform, using Terraform IaC, GitHub Actions CI/CD, and a zero-downtime DNS cutover
- Built an MCP server and Cowork agent-plugin suite, secured with OAuth2 PKCE, so support and engineering teams can query the platform in natural language
Verizon
Staff AI Software Engineer • Sep 2022 - Aug 2025 • Remote, CA
- Spearheaded an AI chatbot for contract analysis that achieved 90% accuracy in deviation detection
- Directed development of a Neo4j GraphRAG tool for long contracts, reducing hallucinations in legal document analysis
- Fine-tuned Llama/Mistral models for insurance language extraction, reaching 90% accuracy and saving 5,000+ hours per quarter
- Created a GenAI competitive-intel chatbot that runs SQL queries against BigQuery, delivering a POC in 2 weeks and an MVP in 1 month
- Developed real-time speech-to-text with Whisper, reducing latency from 30s to 5s
KLA
Senior AI Software Engineer • Mar 2016 - Sep 2022 • Milpitas, CA
- Developed a data storage API with PostgreSQL + MinIO, improving write speed 4X and read speed 6X
- Created a multi-container Docker Compose app for DL inference that analyzes 10,000+ images per workload
- Led an SRGAN+CNN image classification team, improving SEM review throughput by 4X
- Characterized GoogLeNet classification, reducing workload runtime from 10 hours to 30 minutes at 99%+ accuracy
03
Projects
The production LLM platform behind ApplyPass — function-calling job classification, answer generation, and a two-tower recommendation engine on async FastAPI, OpenAI, and PostgreSQL + pgvector.
800K+ requests · 3B+ tokens / month
ApplyPass MCP & Agent Tooling
An MCP server and Cowork plugin suite that lets support and engineering teams query the ApplyPass platform — customer lookups, match traces, escalation triage — in natural language, secured with OAuth2 PKCE.
Natural-language ops for the team
Contract Analysis GraphRAG
Neo4j-backed GraphRAG for long legal contracts — models the relationships between amendments and master agreements so retrieval stays grounded where vanilla RAG drifts.
Grounded retrieval, fewer hallucinations
04
Engineering Playbook
Opinionated guides on building production ML systems. Not tutorials — lessons learned shipping AI at scale.