Experience

Industry

Reduced peak memory usage of the existing GCG attack by 50% (2× memory efficiency), enabling attacks with a larger number of prompts.
Designed and implemented PLAGUE, a novel agentic multi-turn jailbreak attack framework that improved the attack success rate (ASR) by up to 40 percentage points over the prior SOTA on models such as Claude Opus 4.1.
Designed an orchestrator to coordinate multiple agents and integrated a RAG-based module for lifelong retrieval of strategies and in-context learning for PLAGUE.
Published paper: PLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits | First Author | ICLR 2026

Led a team of 4 researching a correctness classifier for code generation based on LLM uncertainty estimation, under the supervision of Dr. Andrew McCallum and Dr. Veronika Thost.
Engineered an optimized TokenSAR implementation with Tree-sitter, improving runtime efficiency and boosting correctness prediction accuracy by 15% over uncertainty-based baselines.

Conducting research under Dr. Andrew McCallum (IESL) and Dr. Hamed Zamani (CIIR) on leveraging box embeddings for evaluating LLM performance.
Proposed modeling LLM prompts as box embeddings (axis-aligned hyperrectangles) that jointly encode semantic relevance and specificity, going beyond the topical similarity captured by standard vector embeddings.
Developed BoxSNE, an nD-to-2D visualization technique for box embeddings, improving interpretability. Demonstrated a strong correlation between box volume and instruction complexity.
Results show improved retrieval performance and interesting scaling behaviors. See Prompt2Box.

Conducted a final-year research project under the supervision of Dr. Stefan Winkler and Dr. Viktor Schlegel, leading to a deeper understanding of LLM performance under strong adversarial attacks.
Investigated multi-hop reasoning failures in LLMs by designing a dependency-parsing-based adversarial attack to expose systematic shortcutting behavior during inference.
Developed a novel adversarial attack that degrades the performance of state-of-the-art LLMs by up to 45%.
Published paper: Seemingly Plausible Distractors in Multi-Hop Reasoning | First Author | EMNLP 2024