AI: Origins

By: mcgrof

About this listen

While we take AI for granted now, it's easy to forget its unique history and haphazard advances. This series reviews the principal concepts that modern neural networks have built upon, tracing them back to the earliest known related papers.
Episodes
  • FIM: Filling in the Middle for Language Models
    Aug 9 2025

    This 2022 academic paper explores Fill-in-the-Middle (FIM) capabilities in causal decoder-based language models, demonstrating that these models can learn to infill text effectively by simply rearranging parts of the training data. The authors propose a method where a middle section of text is moved to the end of a document during training, and show that this data augmentation does not degrade the model's original left-to-right generative ability. The research highlights the efficiency of FIM training, suggesting it should be a default practice, and offers best practices and hyperparameters for optimal performance, particularly noting the superiority of character-level span selection and context-level FIM implementation. The authors also introduce new benchmarks to evaluate infilling performance, emphasizing the importance of sampling-based evaluations over traditional perplexity measures for gauging real-world utility. (A minimal code sketch of the FIM transformation appears after the episode list.)


    Source: https://arxiv.org/pdf/2207.14255

    20 mins
  • BPE: Subword Units for Neural Machine Translation of Rare Words
    Aug 9 2025

    This 2016 academic paper addresses the challenge of translating rare and unknown words in Neural Machine Translation (NMT), a common issue since NMT models typically operate with a fixed vocabulary while translation itself is an open-vocabulary problem. The authors propose a novel approach where rare and unknown words are encoded as sequences of subword units, eliminating the need for a back-off dictionary. They introduce an adaptation of the Byte Pair Encoding (BPE) compression algorithm for word segmentation, which allows for an open vocabulary using a compact set of variable-length character sequences. Empirical results demonstrate that this subword unit method significantly improves translation quality, particularly for rare and out-of-vocabulary words, for English-German and English-Russian language pairs. The paper compares various segmentation techniques, concluding that BPE offers a more effective and simpler solution to the open-vocabulary problem in NMT than previous word-level models and dictionary-based approaches. (A sketch of the BPE merge loop appears after the episode list.)


    Source: https://arxiv.org/pdf/1508.07909


    16 mins
  • Distributed Word and Phrase Representations
    Aug 9 2025

    This 2013 paper introduces advancements to the continuous Skip-gram model, a method for learning high-quality distributed vector representations of words. The authors present extensions like subsampling frequent words and negative sampling to enhance vector quality and training speed. A significant contribution is the method for identifying and representing idiomatic phrases as single tokens, improving the model's ability to capture complex meanings. The paper demonstrates that these word and phrase vectors exhibit linear relationships, allowing for precise analogical reasoning through simple vector arithmetic. Overall, the research highlights improved efficiency and accuracy in learning linguistic representations, especially with large datasets, by optimizing the Skip-gram architecture. (A sketch of the subsampling rule and analogy arithmetic appears after the episode list.)


    Source: https://arxiv.org/pdf/1310.4546

    16 mins
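
Code sketches

The episode summaries above describe concrete techniques, so a few illustrative sketches follow. All are in Python, and names and defaults are ours rather than the papers' unless noted. First, FIM: a minimal sketch of the prefix-suffix-middle rearrangement, with plain <PRE>/<SUF>/<MID> strings standing in for the dedicated sentinel tokens the paper adds to the tokenizer:

    import random

    # Plain strings stand in for the special sentinel tokens the paper
    # adds to the tokenizer vocabulary.
    PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

    def fim_transform(doc: str, fim_rate: float = 0.5) -> str:
        """With probability fim_rate, split a document at two random
        character positions and move the middle span to the end, so a
        left-to-right model learns to infill it; otherwise leave the
        document as an ordinary autoregressive training example."""
        if len(doc) < 2 or random.random() > fim_rate:
            return doc
        # Character-level span selection, which the episode notes the
        # paper found superior to token-level selection.
        i, j = sorted(random.sample(range(len(doc)), 2))
        prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
        return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

The FIM rate is a hyperparameter; the paper's point is that mixing rearranged and untouched documents teaches infilling without hurting ordinary left-to-right generation.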
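Second, BPE: the 2016 paper itself includes a minimal Python implementation of the merge loop, and the sketch below follows it closely. It starts from a toy vocabulary in which words are split into characters and '</w>' marks the end of a word:

    import re
    import collections

    def get_stats(vocab):
        """Count frequencies of adjacent symbol pairs across the vocabulary."""
        pairs = collections.defaultdict(int)
        for word, freq in vocab.items():
            symbols = word.split()
            for i in range(len(symbols) - 1):
                pairs[symbols[i], symbols[i + 1]] += freq
        return pairs

    def merge_vocab(pair, v_in):
        """Replace every occurrence of the given symbol pair with a merged symbol."""
        bigram = re.escape(' '.join(pair))
        p = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
        return {p.sub(''.join(pair), word): freq for word, freq in v_in.items()}

    # Toy corpus statistics: each key is a space-separated symbol sequence.
    vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
             'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
    for _ in range(10):  # the number of merges is the method's only hyperparameter
        pairs = get_stats(vocab)
        best = max(pairs, key=pairs.get)
        vocab = merge_vocab(best, vocab)
        print(best)

Each iteration greedily merges the most frequent adjacent pair, so frequent words end up as single symbols while rare words decompose into subword units.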
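Third, Skip-gram: a sketch, assuming a toy dict of NumPy word vectors (the analogy helper is ours, not the paper's), of the two ideas the summary highlights: subsampling of frequent words, where a word w with corpus frequency f(w) is discarded with probability 1 - sqrt(t / f(w)), and analogical reasoning by vector arithmetic:

    import numpy as np

    def subsample_discard_prob(word_freq: float, t: float = 1e-5) -> float:
        """Probability of discarding a word during training, per the paper's
        subsampling formula 1 - sqrt(t / f(w)); very frequent words are
        dropped more often, speeding training and improving rare-word vectors."""
        return max(0.0, 1.0 - (t / word_freq) ** 0.5)

    def analogy(vectors: dict[str, np.ndarray], a: str, b: str, c: str) -> str:
        """Solve 'a is to b as c is to ?': return the word whose vector has
        the highest cosine similarity to v_b - v_a + v_c."""
        target = vectors[b] - vectors[a] + vectors[c]
        target = target / np.linalg.norm(target)
        best, best_sim = None, -1.0
        for word, vec in vectors.items():
            if word in (a, b, c):
                continue  # exclude the query words themselves
            sim = float(vec @ target) / float(np.linalg.norm(vec))
            if sim > best_sim:
                best, best_sim = word, sim
        return best

With trained vectors, analogy(vectors, 'man', 'king', 'woman') would be expected to return 'queen', the kind of linear regularity the paper reports.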