Study Evaluates Transparency of Text Diffusion Model and Highlights Interpretability Challenges

Research conducted in collaboration with the Google DeepMind team indicates that DiffusionGemma has variable transparency similar to Gemma's, but exhibits lower algorithmic transparency.

A transparency audit conducted in collaboration with the interpretability and text diffusion teams at Google DeepMind (GDM) analyzed DiffusionGemma, the organization's text diffusion model. The study concluded that DiffusionGemma is not significantly less transparent than the autoregressive Gemma model, performing similarly in monitorability evaluations.

Although diffusion models inherently have greater opaque serial depth, researchers were able to apply the "logit lens" technique to intermediate vectors and remove uninterpretable information without compromising the model's performance. This indicates that the model's intermediate nodes are interpretable, which reduces opaque depth to a level comparable to that of Gemma.
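For readers unfamiliar with the logit lens, the general idea is to project an intermediate vector onto the vocabulary and read off which tokens it most resembles. The sketch below is a toy illustration only: the three-word vocabulary, the unembedding matrix W_U, and the function names are hypothetical, and it does not reproduce the procedure used in the study.

```python
import math

# Hypothetical toy vocabulary and unembedding matrix (one row per token).
# Neither comes from DiffusionGemma; they exist only to make the idea concrete.
VOCAB = ["cat", "dog", "fish"]
W_U = [
    [1.0, 0.0],    # "cat"
    [0.0, 1.0],    # "dog"
    [-1.0, -1.0],  # "fish"
]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden):
    """Project an intermediate vector onto the vocabulary; return token probabilities."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_U]
    return dict(zip(VOCAB, softmax(logits)))

def top_token(hidden):
    """Return the vocabulary token the intermediate vector most resembles."""
    probs = logit_lens(hidden)
    return max(probs, key=probs.get)

intermediate = [0.2, 2.5]       # assumed intermediate vector
print(top_token(intermediate))  # prints "dog"
```

If an intermediate vector decodes to a sensible token distribution in this way, a person can inspect it as a readable snapshot of the model's state.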

However, understanding the variables used at different stages does not guarantee an understanding of the algorithm the model employs to reach a final answer. To address this distinction, the study's authors split transparency into two categories: variable transparency, which asks whether snapshots of the model's processing can be understood, and algorithmic transparency, which asks whether those snapshots allow the process used to generate outputs to be reconstructed.

By default, algorithmic transparency is considerably lower in text diffusion models. In autoregressive models, reasoning occurs sequentially, token by token, allowing the exact state of the system to be known at each step and facilitating inferences about the model's decisions. In a diffusion model, however, all tokens are generated simultaneously on a single "canvas," making the causal relationship between them unclear.

This characteristic means that the diffusion model can, for instance, use tokens at the end of a sequence to determine which tokens should be generated at the beginning. The study investigated these and other phenomena through a series of case studies, highlighting the complexities involved in interpreting the processing flow of non-autoregressive models.
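The contrast between the two generation styles can be made concrete with a deliberately simplified dependency model. In the sketch below, the function names are hypothetical, and the assumption that every canvas position may depend on every other position within a denoising step is an idealization, not a description of DiffusionGemma's actual sampler.

```python
def autoregressive_dependencies(length):
    """Position i is generated after, and can depend only on, positions 0..i-1."""
    return {i: set(range(i)) for i in range(length)}

def diffusion_dependencies(length):
    """Simplifying assumption: within a denoising step, each position
    may depend on every other position on the shared canvas."""
    return {i: set(range(length)) - {i} for i in range(length)}

def can_influence(deps, source, target):
    """True if the token at `source` can affect the token at `target`."""
    return source in deps[target]

ar = autoregressive_dependencies(4)
diff = diffusion_dependencies(4)

# Can the last token shape the first one?
print(can_influence(ar, source=3, target=0))    # prints False
print(can_influence(diff, source=3, target=0))  # prints True
```

Under this simplified view, the autoregressive dependency graph has a fixed left-to-right order that an observer can follow, while the diffusion graph allows influence in both directions, which is one reason the order of the model's decisions is harder to reconstruct.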