🔴TechBeats live : LLM Quantization "vLLM vs. Llama.cpp"

About this listen

👋🏼 Hey AI heads 🎙️ Join us for the very first Tech Beats Live 🔴, hosted by Kosseila (aka @CloudDude) from @CloudThrill. 🎯 This chill, laid-back livestream will unpack LLM quantization 🔥: ✅ WHY it matters ✅ HOW it works ✅ Enterprise (vLLM) vs. Consumer (@Ollama) tradeoffs ✅ and WHERE it's going next.

We'll be joined by two incredible guest stars to talk Enterprise vs. Consumer quants 🗣️:
🔷 Eldar Kurtić, bringing the enterprise perspective with vLLM.
🔷 Colin Kealty, aka Bartowski, whose GGUF quants are among the most downloaded on Hugging Face.
🫵🏼 Come learn and have some fun 😎.

Chapters:
(00:00) Host Introduction
(04:07) Eldar Intro
(07:33) Bartowski Intro
(13:04) What Is Quantization?
(16:19) Why LLM Quantization Matters
(20:39) Training vs. Inference: "The New Deal"
(27:46) Biggest Misconception About Quantization
(33:22) Enterprise Quantization in Production (vLLM)
(48:48) Consumer LLMs and Quantization (Ollama, llama.cpp, GGUF): "LLMs for the People"
(01:06:45) BitNet 1-Bit Quantization from Microsoft
(01:28:14) How Long It Takes to Quantize a Model (Llama 3 70B) with GGUF or llm-compressor
(01:34:23) What Is the I-Matrix, and Why Do People Confuse It with IQ Quantization?
(01:39:36) What Are LoRA and QLoRA?
(01:42:36) What Is Sparsity?
(01:47:42) What Is Distillation?
(01:52:34) Extreme Quantization (Unsloth) of Big Models (DeepSeek) at 2 Bits with a 70% Size Cut
(01:57:27) Will Future Models (Llama 5) Be Trained on FP4 Tensor Cores? If So, Why Quantize Them?
(02:02:15) The Future of LLMs on Edge Devices (Google AI Edge)
(02:08:00) How to Evaluate the Quality of a Quantized Model
(02:26:09) Hugging Face's Role in the World of LLMs/Quantization
(02:33:46) Hugging Face's Role in the World of LLMs/Quantization
(02:36:41) LocalLLaMA Subreddit Down (Moderator Goes Bananas)
(02:40:11) Guests' Hopes for the Future of LLMs and AI in General

Check out the quantization blog: https://cloudthrill.ca/llm-quantizati...

#AI #LLM #Quantization #TechBeatsLive #Locallama #VLLM #Ollama
