Episodes

  • Accelerating LLMs with TornadoVM: From GPU Kernels to Model Inference
    May 18 2025
    An airhacks.fm conversation with Juan Fumero (@snatverk) about: TornadoVM as a Java parallel framework for accelerating data parallelization on GPUs and other hardware, first GPU experiences with ELSA Winner and Voodoo cards, explanation of TornadoVM as a plugin to existing JDKs that uses Graal as a library, TornadoVM's programming model with @Parallel and @Reduce annotations for parallelizable code, introduction of the Kernel API for lower-level GPU programming, TornadoVM's ability to dynamically reconfigure and select the best hardware for workloads, implementation of LLM inference acceleration with TornadoVM, challenges in accelerating Llama models on GPUs, introduction of tensor types in TornadoVM to support FP8 and FP16 operations, shared buffer capabilities for GPU memory management, comparison of Java Vector API performance versus GPU acceleration, discussion of model quantization as a potential use case for TornadoVM, exploration of Deep Java Library (DJL) and its NDArray implementation, potential standardization of tensor types in Java, integration possibilities with Project Babylon and its Code Reflection capabilities, TornadoVM's execution plans and task graphs for defining accelerated workloads, ability to run on multiple GPUs with different backends simultaneously, potential enterprise applications for LLMs in Java including model distillation for domain-specific models, discussion of Foreign Function & Memory API integration in TornadoVM, performance comparison between different GPU backends like OpenCL and CUDA, collaboration with Intel on the Level Zero API and integrated graphics support, future plans for RISC-V support in TornadoVM

    Juan Fumero on twitter: @snatverk

    1 hr and 11 mins
  • Run Java with Java
    May 11 2025
    An airhacks.fm conversation with Christian Humer (@grashalm_) about: bachelor thesis on a Java bytecode interpreter written in Java, exploration of whether Java could be used as a systems language, benefits of implementing an ecosystem in itself as validation, the C1X compiler based on C1 but reimplemented from scratch, the sea-of-nodes concept for mixing control and data flow, the goal of rewriting the entire VM in Java, benefits of using one compiler throughout the stack for compatibility and maintainability, discussion of the deoptimization process in JIT compilation, explanation of guards and assumptions in optimized code, three versions of Espresso (a Java bytecode interpreter), the first version as a proof of concept, the second version using Truffle with serialized ASTs, the third version based on bytecodes with unrolled bytecode loops, explanation of the bytecode quickening technique, sandboxing capabilities in GraalVM as a replacement for the deprecated Security Manager, isolating untrusted code in separate heaps for security, protection against speculative execution attacks, use case for running AI-generated Java code safely in isolated environments, GraalOS as a minimal operating system for running Java isolates, TRegex as GraalVM's optimized regular expression engine that compiles regexes to machine code, a bytecode interpreter DSL for generating efficient bytecode interpreters for different languages, memory improvements from using bytecode arrays instead of AST objects, potential future integration of TRegex as a Java API

    Christian Humer on twitter: @grashalm_

    1 hr and 2 mins
  • LittleHorse Likes Sun
    May 4 2025
    An airhacks.fm conversation with Colt McNealy (@coltmcnealy) about: first computing experience with Sun workstations and network computing, background in hockey and other sports, using System76 Linux laptops for development, starting programming in high school with Java and later learning C, Fortran, assembly, C++ and Python, working at a real estate company with Kubernetes and Kafka, the genesis of LittleHorse from experiencing challenges with distributed microservices and workflow management, LittleHorse as an open source workflow orchestration engine using Kafka as a commit log rather than a message queue, building a custom distributed database optimized for workflow orchestration, the recent move to fully open source licensing, comparison with AWS Step Functions but with more capabilities and open source benefits, using RocksDB and Kafka Streams for the underlying implementation, performance metrics of 12-40ms latency between tasks and hundreds of tasks per second, the multi-tenant architecture allowing for serverless offerings, integration with Kafka for event-driven architectures, the distinction between orchestration and choreography in distributed systems, using Java 21 with benefits from virtual threads and generational garbage collection, plans for Java 25 adoption, the naming story behind "Little Horse" and its competition with MuleSoft, the Sun Microsystems legacy and innovation culture, recent adoption of Quarkus for some components, the "Know Your Customer" flow as the Hello World example for LittleHorse, the importance of observability and durability in workflow management, plans for serverless offerings and multi-tenant architecture, the balance between open source core and commercial offerings

    Colt McNealy on twitter: @coltmcnealy

    1 hr and 4 mins
  • Apache Storm, Disruptor, JCTools and Linearizability
    Apr 27 2025
    An airhacks.fm conversation with Francesco Nigro (@forked_franz) about: JCTools as a Java concurrency utility library created by Nitsan Wakart, the history of JCTools and how Cliff Click donated his non-blocking HashMap algorithm to the project, contributions to JCTools including wait-free queue implementations, Apache Storm vs. Apache Kafka, explanation of how JCTools improves upon Java's standard concurrent queues by reducing garbage creation and optimizing memory layout, the difference between linked node implementations in standard Java collections versus array-based implementations in JCTools, detailed explanation of linearizability as a property of concurrent algorithms, the challenges of implementing concurrent data structures that maintain proper ordering guarantees, explanation of lock-free versus wait-free algorithms and their progress guarantees, discussion of the XADD instruction on x86 processors and how JCTools uses it for atomic operations, the implementation of the MessagePassingQueue API in JCTools that provides relaxed guarantees for better performance, comparison between JCTools and other solutions like Disruptor, explanation of how JCTools achieves 400 million operations per second in single-producer single-consumer scenarios, discussion of cooperative algorithms for multi-producer scenarios, the use of padding to avoid false sharing in concurrent data structures, the implementation of code generation in JCTools to create different flavors of queues, the use of Unsafe and AtomicLongFieldUpdater for low-level operations, real-world applications in high-frequency trading and medical data processing, integration of JCTools with the Quarkus and Mutiny frameworks, the importance of proper memory layout for performance

    Francesco Nigro on twitter: @forked_franz

    1 hr and 7 mins
  • Opensource and JVM Ports
    Apr 21 2025
    An airhacks.fm conversation with Volker Simonis (@volker_simonis) about: discussion about carnivorous plants, explanation of how different carnivorous plants capture prey through movement, glue, or digestive fluids, Utricularia using a vacuum to catch prey underwater, SAP's interest in developing their own JVM around the Java 1.4/1.5 era, challenges with SAP's NetWeaver Java EE stack, difficulties maintaining Java across multiple Unix platforms (HP-UX, AIX, S390, Solaris) with different vendor JVMs, SAP's decision to license Sun's HotSpot source code, porting HotSpot to the PA-RISC architecture on HP-UX, explanation of the C++ interpreter versus the template interpreter in HotSpot, challenges with platform-specific C++ compilers and assembler code, detailed explanation of JVM internals including deoptimization, inlining, and safepoints, SAP's contributions to OpenJDK including the PowerPC port, challenges getting SAP to embrace open source, delays caused by Oracle's acquisition of Sun, SAP's extensive JVM porting work across multiple platforms, development of the SAP JVM with additional features like profiling safepoints, creation of SapMachine as an open-source OpenJDK distribution, explanation of Java certification and trademark restrictions, the HotSpot Express model allowing newer VM components in older Java versions, Volker's move to the Amazon Corretto team after 15 years at SAP, brief discussion of ABAP versus Java at SAP, Volker's recent interest in GraalVM and native image technologies

    Volker Simonis on twitter: @volker_simonis

    1 hr
  • Pure Java Blockchain
    Apr 11 2025
    An airhacks.fm conversation with Richard Bair (@RichardBair) about: discussion about the Hedera public ledger and its underlying technology, explanation of the Hashgraph algorithm for consensus and transaction ordering, comparison to other blockchain technologies like Bitcoin and Ethereum, Hedera's democratic approach to block production versus leader-based systems, the Linux Foundation project called Hiero, to which Hedera's code is being moved, explanation of how nodes gossip transactions and come to consensus, the role of the Hedera Governing Council including companies like Dell and IBM, discussion of HBAR as the native token and fee system, comparison of Hedera's fixed dollar-denominated fees versus fluctuating fees in other blockchains, explanation of the staking mechanism and how it creates a representative democracy for node selection, technical details about Hedera's Java implementation using Java 21 and modern language features, use of the ZGC garbage collector with a 200GB heap on consensus nodes, deployment on Linux using Docker, discussion of Java modules and challenges with libraries like Netty, a custom Protobuf-to-Java compiler called PBJ for performance optimization, consideration of replacing Netty with Helidon for better virtual thread support, discussion of supply chain security concerns and minimizing dependencies, a custom logging implementation to avoid bloated frameworks like Log4j, the importance of deterministic code execution across all nodes, challenges of distributed systems where iteration order must be consistent, explanation of node synchronization mechanisms when nodes fall behind, comparison to serverless cloud pricing models, discussion of vertical versus horizontal scaling in blockchain systems

    Richard Bair on twitter: @RichardBair

    1 hr and 1 min
  • High-Performance Load Testing
    Apr 6 2025
    An airhacks.fm conversation with Francesco Nigro (@forked_franz) about: discussion about the importance of stress testing over system tests and unit tests, the coordinated omission problem in load generators, which wait on slow responses, send fewer requests during server slowdowns, and therefore underreport latency, introduction to HyperFoil as a high-performance load generator capable of generating millions of requests per second with just two cores, explanation of how HyperFoil avoids GC overhead by pre-allocating resources, the architecture of HyperFoil using Netty event loops and a graph-based execution model, comparison with other load testing tools like JMeter, k6, Apache Benchmark and Vegeta, introduction to QDUP as a shell automation tool for distributed testing, overview of Horreum for performance test results storage and analysis, explanation of how these tools work together in Red Hat's performance testing pipeline, discussion of JCTools and its importance for GC-free concurrent data structures, the Universal Scalability Law and its application to load balancing algorithms, the pick-two-random algorithm for efficient resource allocation, the benefits of using JBang for easy one-line execution of HyperFoil, potential drawbacks of HyperFoil including ergonomics and JIT compilation warm-up issues, the possibility of using GraalVM native image to avoid JIT compilation delays

    Francesco Nigro on twitter: @forked_franz

    1 hr and 10 mins
  • Enterprise LLM Integration: Bridging Java and AI in Business Applications
    Mar 30 2025
    An airhacks.fm conversation with Burr Sutter (@burrsutter) about: discussion about integrating LLMs into enterprise Java applications, challenges with non-deterministic LLM outputs in deterministic code environments, limitations of chat interfaces for power users in enterprise settings, preference for form-based applications with prompts running behind the scenes, using LLMs to understand unstructured data while providing structured interfaces, maintaining existing CRUD systems while using LLMs for unstructured data like emails and support tickets, practical examples of using LLMs to generate code from business requirements, creating assistants with system messages and short user prompts, potential for embeddings to replace text prompts in the future, developer journey in learning LLM integration including prompts, tools, RAG, and agentic workflows, benefits of specialized agents over one general agent, using LLMs for code generation with limitations for complex use cases, hybrid approaches combining LLMs with human oversight, using LLMs for email routing and support case classification, potential for extracting knowledge from enterprise data sources like Confluence and SharePoint, quality assurance with LLM judges, discussion of small language models versus large ones, model distillation and fine-tuning for specific enterprise use cases, cost considerations for model training versus using off-the-shelf models with better tool invocation, prediction that models will become more efficient and run on commodity hardware in the future, focus on post-training inference and reliable results

    Burr Sutter on twitter: @burrsutter

    1 hr and 5 mins
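The pick-two-random algorithm mentioned in the High-Performance Load Testing episode (often called "power of two random choices") samples two backends at random and routes the request to the less loaded one. The Java sketch below assumes backends are identified by an index and load is measured as in-flight requests; the class and method names are hypothetical, and this is not HyperFoil's or any particular load balancer's code.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicIntegerArray;

/**
 * Hypothetical sketch of "power of two random choices" load balancing:
 * pick two distinct backends at random, send the request to the one
 * with fewer in-flight requests.
 */
public class PickTwoBalancer {
    private final AtomicIntegerArray inFlight; // in-flight requests per backend index
    private final Random random;

    public PickTwoBalancer(int backends, Random random) {
        if (backends < 2) throw new IllegalArgumentException("need at least two backends");
        this.inFlight = new AtomicIntegerArray(backends);
        this.random = random;
    }

    /** Chooses a backend and records one more in-flight request on it. */
    public int acquire() {
        int n = inFlight.length();
        int a = random.nextInt(n);
        // Draw from the remaining n - 1 indices so that b is always different from a.
        int b = random.nextInt(n - 1);
        if (b >= a) b++;
        // The comparison and the increment are separate steps, so concurrent callers
        // may occasionally pick the same backend; the algorithm tolerates this.
        int chosen = inFlight.get(a) <= inFlight.get(b) ? a : b;
        inFlight.incrementAndGet(chosen);
        return chosen;
    }

    /** Marks one request on the given backend as completed. */
    public void release(int backend) {
        inFlight.decrementAndGet(backend);
    }

    public int inFlight(int backend) {
        return inFlight.get(backend);
    }

    public static void main(String[] args) {
        PickTwoBalancer balancer = new PickTwoBalancer(4, new Random(42));
        for (int i = 0; i < 1000; i++) balancer.acquire();
        for (int i = 0; i < 4; i++) {
            System.out.println("backend " + i + ": " + balancer.inFlight(i));
        }
    }
}
```

Compared with a full scan for the least loaded backend, sampling only two keeps each decision O(1) and avoids every client stampeding onto the same "best" backend, while still keeping the load much more even than a single random choice.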
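The JCTools episode contrasts linked-node queues with array-based ones and highlights the single-producer single-consumer case, where each index has exactly one writer and no compare-and-swap is needed. The Java sketch below is a deliberately simplified bounded SPSC ring buffer, assuming a power-of-two capacity and using `AtomicLong.lazySet` as an ordered (release) store; the class name is hypothetical, and this is not JCTools' `SpscArrayQueue`, which adds cache-line padding and further optimizations.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified bounded single-producer/single-consumer ring buffer.
 * Exactly one thread may call offer() and exactly one thread may call poll().
 * No cache-line padding, so the two indices may suffer false sharing.
 */
public final class SpscArrayQueueSketch<E> {
    private final Object[] buffer;
    private final int mask; // capacity - 1; valid because capacity is a power of two
    // Each index has a single writer, so ordered stores suffice (no CAS).
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    public SpscArrayQueueSketch(int requestedCapacity) {
        if (requestedCapacity < 1) throw new IllegalArgumentException("capacity must be positive");
        int capacity = 1;
        while (capacity < requestedCapacity) capacity <<= 1; // round up to a power of two
        this.buffer = new Object[capacity];
        this.mask = capacity - 1;
    }

    /** Producer side: returns false instead of blocking when the queue is full. */
    public boolean offer(E element) {
        if (element == null) throw new NullPointerException("null elements are not supported");
        long p = producerIndex.get();
        if (p - consumerIndex.get() == buffer.length) return false; // full
        buffer[(int) (p & mask)] = element;
        producerIndex.lazySet(p + 1); // release store publishes the element to the consumer
        return true;
    }

    /** Consumer side: returns null when the queue is empty. */
    @SuppressWarnings("unchecked")
    public E poll() {
        long c = consumerIndex.get();
        if (c == producerIndex.get()) return null; // empty
        int slot = (int) (c & mask);
        E element = (E) buffer[slot];
        buffer[slot] = null; // drop the reference so the element can be collected
        consumerIndex.lazySet(c + 1); // release store hands the slot back to the producer
        return element;
    }
}
```

In use, the producer thread typically spins (for example with `Thread.onSpinWait()`) while `offer` returns false, and the consumer does the same while `poll` returns null. Because the slots live in a preallocated array, steady-state operation allocates no queue nodes, which is the garbage reduction the episode refers to.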