SGLang: Efficient Language Model Program Execution

About this listen

This paper, first posted to arXiv in December 2023 and revised in June 2024, introduces SGLang, a framework for serving Large Language Models (LLMs) and Vision Language Models (VLMs) more efficiently. It does so by co-designing a flexible frontend language with a fast backend runtime. The frontend simplifies programming by providing primitives for generation and parallelism. The backend adds two optimizations: RadixAttention, which reuses the KV cache across requests that share a prefix, and compressed finite state machines, which speed up structured output decoding. According to the paper, these techniques give SGLang higher throughput and lower latency than existing systems across a range of LLM applications and hardware platforms. The framework is open source, supports a wide range of models, and has been adopted in industry for its performance on complex LM programs.
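To make the idea of prefix-based KV cache reuse concrete, here is a minimal sketch in Python. It is not SGLang's implementation: it uses a plain token-level trie rather than a true radix tree, stores no actual key/value tensors, and omits eviction. The names `PrefixCache`, `match_prefix`, and `tokens_to_compute` are hypothetical and chosen only for illustration. The sketch counts how many tokens still need to be computed when requests share a common prompt prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrieNode:
    # One child per next token id. A real radix tree would merge
    # single-child chains into multi-token edges; this toy does not.
    children: Dict[int, "TrieNode"] = field(default_factory=dict)


class PrefixCache:
    """Toy index of cached token prefixes (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> None:
        """Record `tokens` as cached, creating nodes only where needed."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())


def tokens_to_compute(cache: PrefixCache, requests: List[List[int]]) -> int:
    """Total tokens that must be processed when cached prefixes are reused."""
    total = 0
    for req in requests:
        reused = cache.match_prefix(req)
        total += len(req) - reused  # only the unmatched suffix is new work
        cache.insert(req)
    return total


if __name__ == "__main__":
    shared = [1, 2, 3, 4]  # e.g. a system prompt shared by every request
    requests = [shared + [10, 11], shared + [20], shared + [10, 12]]
    print(sum(len(r) for r in requests))               # 17 tokens without reuse
    print(tokens_to_compute(PrefixCache(), requests))  # 8 tokens with reuse
```

In this toy run, the second and third requests each reuse the shared prefix, so only their differing suffixes count as new work.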

Sources:

https://arxiv.org/pdf/2312.07104

https://docs.sglang.ai/

https://github.com/sgl-project/sglang
