AI: post transformers

By: mcgrof

About this listen

The transformer architecture revolutionized the world of neural networks and became the springboard for what we know today as modern artificial intelligence. This podcast reviews modern state-of-the-art research papers, starting from the transformer and moving forward.
Episodes
  • GPT-NeoX: Large-Scale Autoregressive Language Modeling in PyTorch
    Sep 7 2025

    This episode describes EleutherAI's GPT-NeoX library, a robust open-source framework for training large-scale autoregressive language models on GPUs, building upon the Megatron and DeepSpeed libraries. It highlights the library's advanced features, such as distributed training, support for a variety of hardware and systems, and cutting-edge architectural innovations. The episode also provides practical guidance on setup, configuration, data preparation, training, inference, and evaluation, alongside details on pretrained models like GPT-NeoX-20B and Pythia. Furthermore, it covers how to export models to Hugging Face and monitor experiments, underscoring the library's widespread adoption in research and industry.


    Source:

    https://github.com/EleutherAI/gpt-neox


    12 mins
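The episode centers on autoregressive language modeling: generating text one token at a time, with each new token conditioned on everything generated so far. The following is a minimal, self-contained sketch of that decoding loop using a toy bigram-count "model"; it illustrates the objective GPT-NeoX trains transformers for at scale, and is not GPT-NeoX's actual API.

```python
# Toy sketch of autoregressive decoding. The "model" is just bigram
# counts; GPT-NeoX trains large transformers for the same next-token
# objective. Nothing here is GPT-NeoX's actual interface.
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-token frequencies for each token (a stand-in 'model')."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=5):
    """Autoregressive loop: each new token conditions on what came before."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        nexts = counts.get(out[-1])
        if not nexts:
            break
        out.append(nexts.most_common(1)[0][0])  # greedy decoding
    return out

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(generate(model, ["the"], 3))
```

Swapping the greedy `most_common` pick for sampling from the count distribution would give the stochastic generation real language models use.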
  • SGLang: Efficient Language Model Program Execution
    Sep 7 2025

    This June 2024 paper introduces SGLang, a framework designed to enhance the efficiency of Large Language Model (LLM) and Vision Language Model (VLM) serving. It achieves this through a co-design of a flexible frontend language and a fast backend runtime. The frontend simplifies programming with primitives for generation and parallelism, while the backend utilizes novel optimizations like RadixAttention for KV cache reuse and compressed finite state machines for faster structured output decoding. These innovations allow SGLang to significantly improve throughput and reduce latency compared to existing systems across various LLM applications and hardware platforms. The framework is open-source, boasts extensive model support, and has seen wide industry adoption due to its performance benefits in complex LM programs.

    Sources:


    https://arxiv.org/pdf/2312.07104

    https://docs.sglang.ai/

    https://github.com/sgl-project/sglang

    17 mins
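The key backend idea the episode highlights, RadixAttention, is that requests sharing a token prefix (for example, a common system prompt) can reuse the KV cache computed for that prefix. The sketch below is a toy illustration of that prefix-matching idea using a plain trie over tokens; a real serving runtime stores KV tensors in a radix tree with eviction, so this is not SGLang's actual data structure or API.

```python
# Toy illustration of the prefix-reuse idea behind RadixAttention:
# requests sharing a token prefix can reuse cached state for it. Here a
# simple trie records previously processed token sequences and reports
# how many leading tokens of a new request are already cached.
class PrefixCache:
    def __init__(self):
        self.root = {}  # token -> child node (a simple trie)

    def insert(self, tokens):
        """Record a processed token sequence in the cache."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, hit = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node, hit = node[t], hit + 1
        return hit

cache = PrefixCache()
cache.insert(["sys", "you", "are", "helpful", "q1"])
# A second request sharing the system prompt reuses 4 cached tokens:
print(cache.match(["sys", "you", "are", "helpful", "q2"]))  # -> 4
```

In a real runtime the matched prefix maps to KV tensors that skip recomputation, which is where the throughput gains come from.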
  • Eleuther: Evaluating LLMs
    Sep 7 2025

    These sources collectively explore various approaches to evaluating and improving Large Language Models (LLMs). Several papers introduce new benchmark datasets designed to test LLMs on complex reasoning tasks, such as the "BIG-Bench Hard (BBH)" suite, the graduate-level "GPQA" questions in science, and "MuSR" for multistep soft reasoning in natural language narratives. A key technique discussed across these sources is Chain-of-Thought (CoT) prompting, which encourages LLMs to show their step-by-step reasoning, leading to improved performance, often surpassing human-rater averages on challenging tasks. Additionally, the "Instruction-Following Eval (IFEval)" introduces a reproducible benchmark for verifiable instructions, allowing for objective assessment of an LLM's ability to follow explicit directives. The "MMLU-Pro Benchmark" further contributes a large-scale dataset across diverse disciplines to rigorously assess model capabilities, emphasizing the need for robust evaluation metrics and challenging data to push the boundaries of AI reasoning.


    Sources:

    https://github.com/EleutherAI/lm-evaluation-harness

    https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/README.md

    https://arxiv.org/pdf/2103.03874 - Measuring Mathematical Problem Solving With the MATH Dataset

    https://arxiv.org/pdf/2210.09261 - Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

    https://arxiv.org/pdf/2310.16049 - MuSR: Testing the Limits of Chain-of-Thought with Multistep Soft Reasoning

    https://arxiv.org/pdf/2311.07911 - Instruction-Following Evaluation for Large Language Models

    https://arxiv.org/pdf/2311.12022 - GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    https://arxiv.org/pdf/2406.01574 - MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark


    27 mins
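IFEval's contribution, as the episode notes, is that each instruction is verifiable: a program can check whether a response follows it, so scoring needs no human rater. The sketch below shows that idea with a few illustrative checks; the instruction names and scoring are hypothetical stand-ins, not the actual benchmark's checks.

```python
# Toy sketch of IFEval-style "verifiable instructions": each instruction
# is a programmatic predicate applied to a model response, so evaluation
# is objective and reproducible. These checks are illustrative, not the
# benchmark's real instruction set.
def check_min_words(response, n):
    """Instruction: respond with at least n words."""
    return len(response.split()) >= n

def check_contains_keyword(response, word):
    """Instruction: mention a required keyword."""
    return word.lower() in response.lower()

def check_ends_with_period(response):
    """Instruction: end the response with a period."""
    return response.rstrip().endswith(".")

def score(response, checks):
    """Fraction of verifiable instructions the response satisfies."""
    results = [fn(response, *args) for fn, *args in checks]
    return sum(results) / len(results)

resp = "Transformers enable parallel training across long sequences."
checks = [
    (check_min_words, 5),
    (check_contains_keyword, "transformers"),
    (check_ends_with_period,),
]
print(score(resp, checks))  # -> 1.0
```

Because every check is deterministic, two labs running the same responses through the same checks get identical scores, which is the reproducibility property the benchmark is after.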