Ep.33 CVPR 2025 Best Student Paper Honorable Mentions : Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Failed to add items

Sorry, we are unable to add the item because your shopping cart is already at capacity.

Add to basket failed.

Please try again later

Add to Wish List failed.

Please try again later

Remove from Wish List failed.

Please try again later

Follow podcast failed

Unfollow podcast failed

Ep.33 CVPR 2025 Best Student Paper Honorable Mentions : Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Listen for free

View show details

About this listen

Discrete Diffusion Timestep (DDT) Tokensに関するこの論文は、マルチモーダル大規模言語モデル (MLLMs)における視覚理解と生成を統合する革新的なアプローチを提案しています。既存のMLLMが使用する空間視覚トークンは、言語に固有の再帰的構造が欠けているため、LLMが完全に習得するのが困難であるという問題点を指摘しています。この課題に対処するため、著者は拡散タイムステップを活用して、離散的で再帰的な視覚トークンを学習する新しい手法を導入しています。これらのDDTトークンは、ノイズの多い画像における漸進的な属性損失を再帰的に補償することで、LLMの自己回帰推論能力と拡散モデルの正確な画像生成能力を効果的に組み合わせ、シームレスなマルチモーダル理解と生成を可能にします。実験では、このアプローチが、他のMLLMと比較して、マルチモーダル理解と生成の両方で優れた性能を達成していることを示しています。

No reviews yet

Audiobook Categories

More to Explore

GETTING STARTED

Ep.33 CVPR 2025 Best Student Paper Honorable Mentions : Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

Failed to add items

Add to basket failed.

Add to Wish List failed.

Remove from Wish List failed.

Follow podcast failed

Unfollow podcast failed

Ep.33 CVPR 2025 Best Student Paper Honorable Mentions : Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

About this listen