LOGS / BENCHMARKS / BUILDS

Experiments

Technical experiments, benchmarks, and build logs. Things I run on my own compute, written up for anyone who finds them useful.

AI Red-Team Harness Finds Reproducible AgentDojo Injection

A self-healing AI red-team harness promoted a medium-severity AgentDojo prompt-injection finding only after an independent replay reproduced it.

First published benchmarks of Google's TurboQuant at 7B scale on RTX 4080. 45 data points, 4 models, real VRAM measurements.