Visualization of an inner hedgehog

Prompt2Box

Models prompts as axis-aligned hyperrectangles (boxes) in n-dimensional space.

February 2026 · Neeladri Bhuiya, Shib Sankar Dasgupta, Haw-Shiuan Chang, Andrew McCallum
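The entry above does not give Prompt2Box's exact parameterization. The sketch below assumes the standard "hard box" formulation from the box-embedding literature and is not the paper's method. In that formulation, each box is stored as a min corner and a max corner. Intersection is taken per dimension, and containment can act as a "more general than" relation between two prompts. The names Box and overlap_ratio are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyperrectangle given by its min corner `lo` and max corner `hi`."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Both corners must live in the same n-d space, and lo <= hi on every axis.
        if len(self.lo) != len(self.hi):
            raise ValueError("corner dimensions differ")
        if any(l > h for l, h in zip(self.lo, self.hi)):
            raise ValueError("each lo coordinate must be <= the matching hi coordinate")

    def volume(self) -> float:
        # Product of side lengths across all dimensions.
        v = 1.0
        for l, h in zip(self.lo, self.hi):
            v *= h - l
        return v

    def intersect(self, other: "Box") -> Optional["Box"]:
        # The intersection of two axis-aligned boxes is again a box:
        # take the larger min and the smaller max on each axis.
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        if any(l > h for l, h in zip(lo, hi)):
            return None  # disjoint along at least one axis
        return Box(lo, hi)

    def contains(self, other: "Box") -> bool:
        # `other` lies entirely inside `self` on every axis.
        return all(a <= b for a, b in zip(self.lo, other.lo)) and all(
            b <= a for a, b in zip(self.hi, other.hi)
        )


def overlap_ratio(a: Box, b: Box) -> float:
    """Fraction of b's volume that lies inside a, i.e. vol(a ∩ b) / vol(b)."""
    inter = a.intersect(b)
    if inter is None or b.volume() == 0.0:
        return 0.0
    return inter.volume() / b.volume()


if __name__ == "__main__":
    broad = Box((0.0, 0.0), (4.0, 4.0))   # hypothetical "general" prompt
    narrow = Box((1.0, 1.0), (2.0, 3.0))  # hypothetical "specific" prompt
    print(broad.contains(narrow))         # True
    print(overlap_ratio(broad, narrow))   # 1.0
    print(overlap_ratio(narrow, broad))   # 0.125
```

The ratio is asymmetric. A specific box fully inside a broad one scores 1.0 in one direction and only a fraction in the other, which is one reason boxes are often preferred over single points for modeling generality.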
Overview of the PLAGUE attack

PLAGUE: Plug-And-Play framework for life-long adaptive generation of multi-turn exploits

An agentic framework for discovering novel, potent multi-turn jailbreak attacks, achieving an attack success rate of 67.3% on Claude Opus 4.1.

September 2025 · Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
Seemingly Plausible Distractors

This paper shows that LLMs struggle with challenging multi-hop reasoning, but in subtler ways than older-generation pretrained language models (PLMs). It does so by constructing adversarial attacks on HotpotQA using dependency parsing.

November 2024 · Neeladri Bhuiya, Viktor Schlegel, Stefan Winkler