ML Performance Engineer at AWS optimizing large language model inference on custom ML accelerators. 6+ years of experience spanning ML systems, performance optimization, and distributed systems. Proficient in ML frameworks, profiling, collective communication, distributed inference, kernel-level optimization, and hardware-level systems analysis.
Experience
ML Performance Engineer
January 2025 – Present
- Optimize large language model inference on Trainium, AWS' custom ML accelerators, through profiling and roofline analysis, kernel-level work on attention, MoE, and MLP, and sharding-strategy analysis across chips
- Build end-to-end benchmarking and observability infrastructure for monitoring inference accuracy, latency, throughput, and cost across model architectures
- Validate kernel and framework changes against accuracy benchmarks and PyTorch reference implementations to ensure optimizations preserve numerical correctness
- Lead a recurring paper reading group to grow the team's knowledge of ML systems research and emerging work
Software Engineer
August 2021 – December 2024
- Developed a real-time distributed platform and API for ad measurement using AWS CDK, EC2/ECS, ELB, and ElastiCache, enabling the onboarding of new advertiser groups not previously supported
- Led system and feature launches end-to-end: data correctness validation, load testing, dashboards, monitoring, alarming, and regression detection
- Investigated and root-caused system-wide outages and accuracy issues impacting advertiser data
- Worked cross-functionally with Product Management to develop new features, perform data investigations, address advertisers' concerns, and explain business logic
Data Engineer
February 2019 – August 2021
- Designed and built a multi-component streaming and cache-based application for address cleansing, achieving a 30% cost reduction over the prior pipeline
- Built end-to-end metrics collection, monitoring, and reporting infrastructure across core production components
- Developed ETL pipelines with Spark and Airflow combining multiple data sources to back an internal API
Research
Attentional Guidance in Visual Search
2017
- Conducted original research on the interaction between endogenous (top-down) and exogenous (bottom-up) attentional cues in visual search
- Designed a probabilistic cueing paradigm to measure how prior cue reliability shapes attentional guidance during visual search tasks
- Developed serial, parallel, and hybrid computational models of attentional guidance and evaluated them against human behavioural data
Lassonde Undergraduate Research Award
2016, 2017
Technologies
ML & Systems: PyTorch, vLLM, NKI (Neuron Kernel Interface), AWS Neuron SDK, TensorFlow
Performance & Hardware: AWS Trainium / Inferentia, kernel optimization, roofline analysis, collective communication, distributed inference, profiling, Perfetto
Languages: Python, Scala, Java, SQL, TypeScript
Data & Infrastructure: Apache Spark, Airflow, AWS CDK, Pandas, NumPy, Elasticsearch, EMR, EC2, S3, SQS, CloudWatch, DynamoDB, Aurora