Multidimensional Evaluations Proposed as a New Method for AI Cybersecurity

A post published on LessWrong advocates using multidimensional evaluations to measure and improve the security of AI-generated code.

A post on LessWrong proposes a shift in how artificial intelligence evaluations are structured. Currently, tests tend to follow a format of one or multiple dimensions evaluated against a single set of samples. The author argues that the ideal format should extend this logic across multiple dimensions, so that variables beyond the language model itself can be tested simultaneously.

In the context of cybersecurity, the proposal is to use this approach to measure system hardening. According to the post, there are three main approaches to strengthening AI-generated code. The first is an attack-and-defense cycle, in which the model itself is used to find vulnerabilities and generate fixes. The second retrofits formal proofs onto existing code, using tools like Verus or Lean to verify it. The third rewrites the code from scratch in a form native to mathematical proof.
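The first approach, the attack-and-defense cycle, can be pictured as a simple loop. The sketch below is only an illustration and is not taken from the post: `harden`, `toy_find`, and `toy_fix` are hypothetical names, and the bracketed markers stand in for real vulnerabilities that a model would have to find and patch.

```python
from typing import Callable, List, Tuple

def harden(
    code: str,
    find_vulnerabilities: Callable[[str], List[str]],
    generate_fix: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate attack (find) and defense (fix) until nothing is found
    or the round budget runs out. Returns the code and the rounds used."""
    rounds = 0
    for _ in range(max_rounds):
        findings = find_vulnerabilities(code)
        if not findings:
            break
        for finding in findings:
            code = generate_fix(code, finding)
        rounds += 1
    return code, rounds

# Toy stand-ins (assumptions): a real system would call a model here.
TOY_FIXES = {
    "[unchecked-input]": "[validated-input]",
    "[unbounded-copy]": "[bounded-copy]",
}

def toy_find(code: str) -> List[str]:
    # "Attack": report every known-bad marker still present in the code.
    return [marker for marker in TOY_FIXES if marker in code]

def toy_fix(code: str, finding: str) -> str:
    # "Defense": replace the reported marker with its hardened form.
    return code.replace(finding, TOY_FIXES[finding])

hardened, rounds_used = harden(
    "read [unchecked-input]; copy [unbounded-copy]", toy_find, toy_fix
)
```

The round budget reflects the cost concern raised in the post: each attack or fix step spends tokens, so the loop stops either when no findings remain or when the budget is exhausted.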

Multidimensional evaluation would allow these different approaches to be tested comparatively. Instead of merely varying the AI model being tested, the methodology proposes altering the code implementation or the security specifications. In this way, the AI tool acts as a security property inspector, functioning similarly to a processor that evaluates performance characteristics.
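One way to picture such an evaluation is as a grid in which the model, the code implementation, and the security specification each form an axis. The sketch below rests on assumptions of its own: the names `Cell`, `run_grid`, and `mean_by` are hypothetical, and the scores are dummy values chosen only to show the mechanics, not measurements from the post.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Cell:
    model: str
    implementation: str
    specification: str

def run_grid(
    models: List[str],
    implementations: List[str],
    specifications: List[str],
    score: Callable[[str, str, str], float],
) -> Dict[Cell, float]:
    """Score every combination of the three axes."""
    return {
        Cell(m, i, s): score(m, i, s)
        for m, i, s in product(models, implementations, specifications)
    }

def mean_by(results: Dict[Cell, float], axis: str) -> Dict[str, float]:
    """Average the scores along one axis, e.g. axis='implementation'."""
    groups: Dict[str, List[float]] = {}
    for cell, value in results.items():
        groups.setdefault(getattr(cell, axis), []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Dummy scorer (assumption): placeholder numbers, NOT real results.
# A real scorer would run the AI tool as the security property inspector.
DUMMY_BASE = {"attack_defense": 5, "retrofit_proofs": 7, "proof_native_rewrite": 9}

def dummy_score(model: str, implementation: str, specification: str) -> float:
    penalty = 1 if specification == "strict" else 0
    return float(DUMMY_BASE[implementation] - penalty)

results = run_grid(
    ["model_a", "model_b"],
    list(DUMMY_BASE),
    ["basic", "strict"],
    dummy_score,
)
by_implementation = mean_by(results, "implementation")
```

Because every combination is scored, the same results can be averaged along any axis, which is what lets the hardening approaches be compared with each other rather than only comparing models.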

The author notes that the feasibility of spending tokens on these security approaches may be limited by current costs and capabilities. However, the expectation is that the proposal will become practical in the coming months or years. The post notes that initiatives such as Glasswing and AISLE already explore the basic attack-and-defense cycle, but argues that the success of more complex approaches is measurable through these expanded evaluations.