Staff AI/ML Engineer • Co-Founder
Patrick McBride
About Me
I'm a Staff AI/ML Engineer with a unique path: from physics and materials science to deep learning and production LLM systems.
At Verizon, I lead AI initiatives including contract analysis chatbots achieving 90% accuracy, GraphRAG systems for legal documents, and real-time speech-to-text reducing latency from 30s to 5s.
As Co-Founder and Head of AI at ApplyPass, I built the entire AI backend from scratch, handling 800K requests/month and 3B tokens/month, and developed recommendation systems that boosted match accuracy from 70% to 90%+.
I care deeply about systems that work in production, not just in notebooks. That means structured outputs, evaluation loops, and cost-conscious architecture.
LLMs
ML Frameworks
Infrastructure
Data
Experience
Verizon
Staff AI Software Engineer • Sept 2022 - Present • Remote, CA
- Spearheaded a comprehensive AI chatbot for contract analysis, achieving 90% accuracy in deviation detection
- Directed a Neo4j GraphRAG tool for long contracts, reducing hallucinations in legal document analysis
- Fine-tuned Llama/Mistral models for insurance language extraction, reaching 90% accuracy and saving 5000+ hours/quarter
- Created a GenAI competitive intel chatbot with SQL queries to BigQuery, delivering a POC in 2 weeks and an MVP in 1 month
- Developed real-time speech-to-text with Whisper, reducing latency from 30s to 5s
ApplyPass
Head of AI & Machine Learning (Co-Founder) • June 2023 - Present • Remote, CA
- Built the entire AI backend with the OpenAI API in Python/FastAPI, serving 800K requests/month and 3B tokens/month
- Created a Job Classifier that uses function calling to convert listings to structured JSON
- Migrated from GPT-4 to GPT-3.5, achieving comparable accuracy at 5% of the original cost
- Fine-tuned a custom GPT-3.5 model, improving job filter accuracy from 80% to 95%
- Developed a Two-Tower recommendation system, boosting match accuracy from 70% to 90%+
KLA
Senior AI Software Engineer • Mar 2016 - Sep 2022 • Milpitas, CA
- Developed a data storage API with PostgreSQL + MinIO, improving write speed 4X and read speed 6X
- Created a multi-container Docker Compose app for DL inference, analyzing 10,000+ images per workload
- Led the SRGAN+CNN image classification team, improving SEM review throughput by 4X
- Characterized GoogLeNet classification, reducing workloads from 10 hours to 30 minutes with 99%+ accuracy
Projects
Production LLM system handling 800K+ requests/month with 3B tokens. Features job classification, answer generation, and Two-Tower recommendation system.
Contract Analysis GraphRAG
Neo4j-based GraphRAG system for parsing legal contracts, identifying relationships between amendments and master agreements, reducing hallucinations.
Mock technical interview chatbot with speech input, real-time code grading, and generated audio responses using Gradio interface.
Real-time Speech Transcription
OpenAI Whisper integration for agent-customer call transcription, reducing latency from 30s to 5s with speaker diarization.
Engineering Playbook
Opinionated guides on building production ML systems. Not tutorials—lessons learned shipping AI at scale.
Production LLM Architecture Patterns
How I design LLM pipelines that scale—preprocessing, chunking, prompt engineering, and system architecture.
Structured Outputs > Prompting
How I make LLMs deterministic in production with Pydantic schemas, validation, and contract-first design.
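To make the contract-first idea concrete, here is a minimal sketch rather than the production code. It uses only the standard library as a stand-in for a Pydantic schema, and the `JobListing` fields and the `parse_listing` helper are hypothetical names chosen for illustration. The point is that model output either parses into a typed object or fails loudly.

```python
import json
from dataclasses import dataclass


# Hypothetical schema for illustration; the field names are assumptions.
@dataclass(frozen=True)
class JobListing:
    title: str
    remote: bool
    min_salary: int

    def __post_init__(self):
        # Reject wrong types instead of silently coercing them.
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError("title must be a non-empty string")
        if not isinstance(self.remote, bool):
            raise ValueError("remote must be a boolean")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if (
            isinstance(self.min_salary, bool)
            or not isinstance(self.min_salary, int)
            or self.min_salary < 0
        ):
            raise ValueError("min_salary must be a non-negative integer")


def parse_listing(raw: str) -> JobListing:
    """Parse raw model output into a JobListing, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"title", "remote", "min_salary"}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    return JobListing(**data)
```

A caller would typically catch the `ValueError` and retry the model call or route the item to a fallback, so malformed output never reaches downstream code.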
Embeddings in the Real World
Two-tower ranking, when cosine similarity fails, and evaluation beyond offline metrics.
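One commonly cited case where cosine similarity falls short is when vector magnitude carries signal: cosine normalizes it away, while a dot product keeps it. The sketch below uses made-up two-dimensional vectors (not real embeddings) and assumes non-zero inputs.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors for illustration only: both items point in the same
# direction as the user vector but differ in magnitude.
user = [1.0, 2.0]
item_weak = [0.5, 1.0]
item_strong = [2.0, 4.0]

# Cosine cannot separate the two items, since it only sees direction.
cos_weak = cosine(user, item_weak)
cos_strong = cosine(user, item_strong)

# The dot product ranks the larger-magnitude item higher.
dot_weak = dot(user, item_weak)
dot_strong = dot(user, item_strong)
```

Whether magnitude should influence ranking depends on how the embeddings were trained, so this is a property to check per model rather than a general rule.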
Evaluation and Feedback Loops
From offline metrics to online learning—drift monitoring, regression tests, and continuous evaluation.
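As one illustration of drift monitoring, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a recent sample of a numeric feature or score. PSI is a swapped-in example metric, not necessarily the one used in the guide; the function name, bin count, and `eps` floor are assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Bins are equal-width over the baseline range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant baselines

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data for illustration: a uniform baseline and a shifted copy.
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in baseline]

psi_same = psi(baseline, baseline)
psi_shifted = psi(baseline, shifted)
```

Identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the baseline, which makes it easy to alert on with a threshold chosen per feature.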