Kasra Mazaheri

Blog

Broteína analysis figure summarizing score alignment and few-step protein generation

Broteína: Self-Distillation Unlocks Few-Step Protein Design

Aligning the teacher with its useful low-temperature design distribution before compression lets few-step students surpass the 400-step base model's distinct designable yield.

Block-sparse projection heads reduce dense coupling in state space models

Scaling State Space Models with Block-Sparsity and Fused Kernels

A scaling story for oscillator state-space layers: block-sparse projection heads control dense coupling, while the IO-aware FlashDOSS kernel fuses the projection, scan, and projection-back steps to avoid expensive state-domain traffic.

Selected Research

FlashMoBA mean pooling, top-k selection, and attention diagram

Flash MoBA: Optimizing Mixture of Block Attention

Guangxuan Xiao*, Junxian Guo*, Kasra Mazaheri, Song Han

Flash MoBA studies how mixture-of-block attention can route long-context computation sparsely while preserving quality, connecting routing accuracy, block size, local key aggregation, and efficient CUDA execution.

[paper] [code]
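
As a rough illustration of the routing idea named above (mean pooling of keys per block, top-k block selection, then attention restricted to the chosen blocks), here is a minimal pure-Python sketch. It is not the paper's CUDA implementation: it handles a single query, ignores causal masking, and the function names (`select_blocks`, `block_sparse_attention`) and the scaled dot-product scoring are assumptions made for this example.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_pool(block):
    """Average the key vectors in one block into a single centroid."""
    n = len(block)
    return [sum(col) / n for col in zip(*block)]

def select_blocks(q, keys, block_size, top_k):
    """Score each key block by q . centroid and return the top_k block indices (sorted)."""
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    scores = [dot(q, mean_pool(b)) for b in blocks]
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])

def block_sparse_attention(q, keys, values, block_size, top_k):
    """Softmax attention for one query over only the keys in the selected blocks."""
    chosen = select_blocks(q, keys, block_size, top_k)
    # Expand the chosen block indices into individual key positions.
    idx = [j for b in chosen
           for j in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[j]) * scale for j in idx]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[j][d] for w, j in zip(weights, idx)) / z
           for d in range(dim)]
    return out, chosen

# Toy example: three blocks of two keys each; the query aligns with block 0.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0], [9.0, 9.0], [5.0, 5.0], [5.0, 5.0]]
out, chosen = block_sparse_attention(q, keys, values, block_size=2, top_k=1)
```

In this sketch, setting `top_k` to the total number of blocks recovers ordinary dense attention, which makes it a convenient baseline for checking how much routing accuracy is lost at smaller `top_k` or larger `block_size`.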

Broteína: Self-Distillation Unlocks Few-Step Protein Design

Kasra Mazaheri, Mohammed Ehab

Preprint available to share

Broteína aligns a protein backbone model's native score field with its useful low-temperature design distribution before compression, enabling few-step students that surpass the base model's distinct designable yield.

[blog]

Block-sparse projection heads reduce dense coupling in oscillator state-space models

FlashLinOSS: Expressive, Scalable, and Efficient Architectures for Oscillatory State-Space Models

Kasra Mazaheri, Jared Boyer, T. Konstantin Rusch, Daniela Rus

Preprint available to share

FlashLinOSS shows that block-sparse oscillator SSMs can outperform denser variants with fewer projection parameters, while IO-aware fused kernels reduce runtime by up to 7.8x and peak memory by about 3x.

[blog] [code]