Kasra Mazaheri

I am a master's student at MIT interested in building more efficient machine learning models: architectures that do more with less compute, less memory, and less friction between an idea and a working implementation.

At MIT, I have been lucky to work with Prof. Daniela Rus and Prof. Song Han on efficient ML across different layers of the stack, from sparse architecture design and hardware-aware kernel design to distillation and data-level methods. I am currently tinkering with data and training methods for LLM pre-training and post-training.

A long time ago, I won a gold medal at the International Olympiad in Informatics (IOI), surpassed a rating of 2900 on Codeforces, and spent a lot of time writing programming problems for CodeChef and other platforms. I also worked on educational projects, including contributing to a freely available book on Computational Graph Theory for Olympiad (GTOI), among other resources. That work shaped how I think about access, teaching, and the joy of making difficult ideas easier for other people to approach.

Selected Research

Figure: FlashMoBA mean pooling, top-k selection, and attention diagram

Flash MoBA: Optimizing Mixture of Block Attention

Guangxuan Xiao*, Junxian Guo*, Kasra Mazaheri, Song Han

Flash MoBA studies how mixture-of-block attention can route long-context computation sparsely while preserving quality, connecting routing accuracy, block size, local key aggregation, and efficient CUDA execution.

[paper] [code]

Figure: Broteina analysis summarizing score alignment and few-step protein generation

Broteina: Self-Distillation Unlocks Few-Step Protein Design

Kasra Mazaheri and collaborators

Broteina addresses train-time and test-time score misalignment in protein backbone diffusion models by first self-distilling the low-temperature sampling distribution, then distilling these 400-step models into single- and few-step generators.

Figure: Block-sparse projection heads reduce dense coupling in oscillator state-space models

FlashLinOSS: Expressive, Scalable, and Efficient Architectures for Oscillatory State-Space Models

MIT Distributed Robotics Lab

FlashLinOSS shows that block-sparse oscillator SSMs can outperform denser variants with fewer projection parameters, while IO-aware fused kernels reduce runtime by up to 7.8x and peak memory by about 3x.

[code]

Education

Massachusetts Institute of Technology

M.S. in Computer Science and Artificial Intelligence, expected 2026

B.S. in Computer Science and Engineering, minors in Mathematics and Music Technology, 2025

GPA: 5.0/5.0. Invited member of Phi Beta Kappa.

Industry Experience

Citadel LLC, Global Quantitative Strategies (GQS)

Hudson River Trading LLC

The D. E. Shaw Group

Selected Awards

  • International Olympiad in Informatics (IOI), 2020: Gold Medalist
  • International Olympiad in Informatics (IOI), 2019: Silver Medalist
  • International Collegiate Programming Contest (ICPC): World Finalist
  • Iranian National Olympiad in Informatics (INOI), 2019: Gold Medalist
  • Iranian National Olympiad in Informatics (INOI), 2018: Gold Medalist