Research conducted in collaboration with the Google DeepMind team indicates that DiffusionGemma has variable transparency similar to Gemma's, but exhibits lower algorithmic transparency.
A transparency audit conducted in collaboration with the interpretability and text diffusion teams at Google DeepMind (GDM) analyzed DiffusionGemma, the organization's text diffusion model. The study concluded that DiffusionGemma is not significantly less transparent than the autoregressive Gemma model, performing similarly in monitorability evaluations.
Although diffusion models inherently have greater opaque serial depth, researchers were able to apply the "logit lens" technique to intermediate vectors and remove uninterpretable information without compromising system performance. This indicates that the model's intermediate states are interpretable, which reduces its opaque depth to a level comparable to that of Gemma.
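As a rough illustration of what a logit lens does, the sketch below projects an intermediate vector through an unembedding matrix and reads off the most likely tokens. The vocabulary, the matrix `W_U`, and the function names are hypothetical toy values, not anything taken from DiffusionGemma. Practical logit-lens implementations usually also apply the model's final normalization layer, which is omitted here.

```python
import math

# Hypothetical toy vocabulary and unembedding matrix (one row per token).
VOCAB = ["cat", "dog", "tree"]
W_U = [
    [1.0, 0.0],    # cat
    [0.0, 1.0],    # dog
    [-1.0, -1.0],  # tree
]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden, unembed=W_U, vocab=VOCAB):
    """Project an intermediate vector into vocabulary space.

    Returns (token, probability) pairs sorted from most to least likely.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    probs = softmax(logits)
    return sorted(zip(vocab, probs), key=lambda tp: tp[1], reverse=True)

# A made-up intermediate vector that points mostly along the "cat" direction.
reading = logit_lens([2.0, 0.5])
```

If the top-ranked tokens for an intermediate vector are sensible, that vector can be read as a snapshot of what the model is "thinking" at that point, which is the sense in which the intermediate states are described as interpretable.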
However, understanding the variables used at different stages does not guarantee an understanding of the algorithm the model employs to reach a final answer. To address this distinction, the study's authors divided the concept into two categories: variable transparency, which evaluates whether it is possible to understand snapshots of the model's processing, and algorithmic transparency, which verifies whether these snapshots allow for the reconstruction of the process used to generate outputs.
By default, algorithmic transparency is considerably lower in text diffusion models. In autoregressive models, reasoning occurs sequentially, token by token, allowing the exact state of the system to be known at each step and facilitating inferences about the model's decisions. In a diffusion model, however, all tokens are generated simultaneously on a single "canvas," making the causal relationship between them unclear.
This means that a diffusion model can, for instance, use tokens at the end of a sequence to determine which tokens to generate at the beginning. The study investigated this and other phenomena through a series of case studies, highlighting how hard it is to interpret the processing flow of non-autoregressive models.
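The contrast between the two decoding styles can be sketched with a toy example. The code below is an illustrative assumption, not DiffusionGemma's actual sampling procedure: it models the canvas as a list of masked positions and commits one position per step in order of a made-up confidence score, whereas a real diffusion model updates all positions at once. The names `autoregressive_trace`, `canvas_trace`, and the confidence values are hypothetical.

```python
MASK = "_"
TARGET = ["the", "answer", "is", "42"]

def autoregressive_trace(target):
    """Left-to-right decoding: the state after step t is exactly the prefix."""
    canvas, trace = [], []
    for tok in target:
        canvas.append(tok)
        trace.append(list(canvas))
    return trace

def canvas_trace(target, confidence):
    """Toy canvas decoding: every position starts masked, and each step
    commits the most confident remaining position, wherever it sits."""
    canvas = [MASK] * len(target)
    order = sorted(range(len(target)), key=lambda i: confidence[i], reverse=True)
    trace = []
    for pos in order:
        canvas[pos] = target[pos]
        trace.append(list(canvas))
    return order, trace

# Hypothetical confidences: the toy model is surest about the final token.
ar_trace = autoregressive_trace(TARGET)
order, trace = canvas_trace(TARGET, confidence=[0.2, 0.6, 0.4, 0.9])
```

In the autoregressive trace, each snapshot is a prefix, so the order of decisions is unambiguous. In the canvas trace, the last position is filled first and the first position last, so the snapshots alone do not make clear which tokens shaped which.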