Publications


See a full list on Google Scholar


Efficiently Programming Large Language Models using SGLang
Lianmin Zheng *, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng *
Preprint 2023
| paper | code |

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng *, Wei-Lin Chiang *, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
ICLR 2024 (spotlight)
| paper | dataset |

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng *, Wei-Lin Chiang *, Ying Sheng *, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
NeurIPS 2023 (Datasets and Benchmarks Track)
| paper | Vicuna Blog | MT-Bench Blog | code |

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
Lianmin Zheng *, Zhuohan Li *, Hao Zhang *, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica
OSDI 2022
| paper | code | slides | talk |

Ansor: Generating High-Performance Tensor Programs for Deep Learning
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica
OSDI 2020
| paper | code | tutorial | slides | talk |

TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers
Lianmin Zheng *, Ruochen Liu *, Junru Shao, Tianqi Chen, Joseph E. Gonzalez, Ion Stoica, Ameer Haj-Ali
NeurIPS 2021 (Datasets and Benchmarks Track)
| paper | code | slides | talk |

S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
MLSys 2024
| paper | code |

High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
ICML 2023 (oral)
| paper | code |

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Zhuohan Li *, Lianmin Zheng *, Yinmin Zhong *, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica
OSDI 2023
| paper | code |

Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon *, Zhuohan Li *, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
SOSP 2023
| paper | code |

TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen
ASPLOS 2023
| paper | code |

Towards Optimal Caching and Model Selection for Large Model Inference
Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao
NeurIPS 2023
| paper | code |

On Optimizing the Communication of Model Parallelism
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
MLSys 2023
| paper | code |

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen *, Lianmin Zheng *, Zhewei Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez
ICML 2021 (long talk)
| paper | code | slides | talk |

A Hardware-Software Blueprint for Flexible Deep Learning Specialization
Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
IEEE Micro 2019 (Best paper award)
| paper | code |

Learning to Optimize Tensor Programs
Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
NeurIPS 2018 (spotlight)
| paper | code | tutorial |

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
OSDI 2018
| paper | code |

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence
Lianmin Zheng *, Jiacheng Yang *, Han Cai, Weinan Zhang, Jun Wang, Yong Yu
AAAI 2018 (Demo Track)
| paper | code | video |