CV — Kristen McIntosh

Toronto, ON

ML Performance Engineer at AWS optimizing large language model inference on custom ML accelerators. 6+ years of experience spanning ML systems, performance optimization, and distributed systems. Proficient in ML frameworks, profiling, collective communication, distributed inference, kernel-level optimization, and hardware-level systems analysis.

Experience

ML Performance Engineer

January 2025 – Present

Amazon Web Services

Own performance optimization of large language model inference on Trainium, AWS' custom ML accelerators, spanning profiling and roofline analysis; kernel-level work on attention, MoE, MLP, and KV-cache management; and sharding-strategy analysis (such as tensor, data, and expert parallelism)
Drive deep dives into accelerator hardware features through targeted microbenchmarking, characterizing performance limits and surfacing gaps that led to microcode and ISA changes across teams
Lead the design of mixed-sequence-length support in the attention kernel, defining the required framework- and kernel-level changes
Led the design and delivery of end-to-end benchmarking and profiling infrastructure for monitoring inference accuracy, latency, throughput, and cost across model architectures and serving configurations; partnered across teams to define its interface and mentored junior engineers through implementation
Lead a recurring paper reading group to grow the team's knowledge of ML systems research and emerging work

Software Engineer

August 2021 – December 2024

Amazon

Developed a real-time distributed platform and API for ad measurement using AWS CDK, EC2/ECS, ELB, and ElastiCache
Led system and feature launches end-to-end: capacity planning, load testing, data correctness validation, dashboards, monitoring, alarming, and regression detection
Investigated and root-caused system-wide outages and accuracy issues impacting advertiser data
Worked cross-functionally with Product Management to develop new features, perform data investigations, address advertisers' concerns, and explain business logic

Data Engineer

February 2019 – August 2021

Royal Bank of Canada

Designed and built a multi-component streaming and cache-based application for address cleansing, achieving a 30% cost reduction over the prior pipeline
Built end-to-end metrics collection, monitoring, and reporting infrastructure across core production components
Developed ETL pipelines with Spark and Airflow combining multiple data sources to back an internal API

Research

Attentional Guidance in Visual Search

2017

Centre for Vision Research, York University, supervised by Prof. James Elder

Conducted original research on the interaction between endogenous (top-down) and exogenous (bottom-up) attentional cues in visual search
Designed a probabilistic cueing paradigm to measure how prior cue reliability shapes attentional guidance during visual search tasks
Developed serial, parallel, and hybrid computational models of attentional guidance and evaluated them against human behavioural data

Lassonde Undergraduate Research Award

2016, 2017

York University

Skills & Technologies

ML & Systems: NKI (Neuron Kernel Interface), AWS Neuron SDK, vLLM, PyTorch

Performance & Hardware: AWS Trainium / Inferentia, kernel optimization, roofline analysis, collective communication, distributed inference, profiling

Languages: Python, Scala, Java, SQL, TypeScript

Data & Infrastructure: Apache Spark, Airflow, AWS CDK, Pandas, NumPy, Elasticsearch, EMR, EC2, S3, SQS, CloudWatch, DynamoDB, Aurora

Education

York University

B.Sc., Honours in Computer Science