(FM-NVIDIA) Fugatto: Foundational Generative Audio Transformer Opus 1

About this listen

This episode covers Fugatto, a generalist audio synthesis and transformation model developed by NVIDIA, and ComposableART, an inference-time technique designed to extend its capabilities. Fugatto follows free-form text instructions, optionally accompanied by audio inputs, and addresses the challenge that audio data, unlike text, typically lacks inherent instructional information. The paper details a data and instruction generation strategy that uses large language models (LLMs) and audio understanding models to create diverse and rich datasets, enabling Fugatto to handle a wide range of tasks including text-to-speech, text-to-audio, and audio transformations. ComposableART adds compositional abilities, such as combining, interpolating, or negating instructions, which gives fine-grained control over audio outputs beyond the training distribution. Experimental evaluations show Fugatto performing competitively against specialised models and highlight its emergent capabilities, such as synthesising novel sounds or performing tasks it was not explicitly trained for.
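To make "combining, interpolating, or negating instructions" more concrete, here is a minimal sketch of one way signed weights over several instructions could steer a single output. It is not Fugatto's code. It assumes, beyond what this description states, that composition works like a weighted form of classifier-free guidance: each instruction contributes the difference between its conditional prediction and an unconditional one. The names `compose_guidance`, `toy_predict` and `TOY_PREDICTIONS` are hypothetical, and the two-number "predictions" stand in for a real model's output.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]

def compose_guidance(
    predict: Callable[[Optional[str]], Vector],
    instructions: Sequence[str],
    weights: Sequence[float],
) -> Vector:
    """Blend per-instruction guidance directions using signed weights.

    Assumption: `predict(None)` is the unconditional prediction and
    `predict(text)` the prediction conditioned on one instruction.
    """
    if len(instructions) != len(weights):
        raise ValueError("need exactly one weight per instruction")
    base = predict(None)
    out = list(base)
    for text, weight in zip(instructions, weights):
        cond = predict(text)
        for i, (c, b) in enumerate(zip(cond, base)):
            # Positive weights pull toward an instruction, negative ones push away.
            out[i] += weight * (c - b)
    return out

# Hypothetical stand-in for a model: fixed 2-D "predictions" per instruction.
TOY_PREDICTIONS = {
    None: [0.0, 0.0],
    "dog barking": [1.0, 0.0],
    "rain": [0.0, 1.0],
}

def toy_predict(text: Optional[str]) -> Vector:
    return list(TOY_PREDICTIONS[text])

prompts = ["dog barking", "rain"]
combined = compose_guidance(toy_predict, prompts, [1.0, 1.0])        # both at once
interpolated = compose_guidance(toy_predict, prompts, [0.25, 0.75])  # mostly rain
negated = compose_guidance(toy_predict, prompts, [1.0, -1.0])        # barking, away from rain
```

In this toy setup the weights alone decide whether two instructions are combined, blended, or set against each other. A real system would apply the same kind of arithmetic to model outputs at each generation step rather than to fixed vectors.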

link: https://d1qx31qr3h6wln.cloudfront.net/publications/FUGATTO.pdf
