[{"content":" Resources Paper Github Website Abstract To discover the weaknesses of LLMs, researchers often embed prompts into a vector space and cluster them to extract insightful patterns. However, vector embeddings primarily capture topical similarity. As a result, prompts that share a topic but differ in specificity, and consequently in difficulty, are often represented similarly, making fine-grained weakness analysis difficult. To address this limitation, we propose Prompt2Box, which embeds prompts into a box embedding space using a trained encoder. The encoder, trained on existing and synthesized datasets, outputs box embeddings that capture not only semantic similarity but also specificity relations between prompts (e.g., \u0026ldquo;writing an adventure story\u0026rdquo; is more specific than \u0026ldquo;writing a story\u0026rdquo;). We further develop a novel dimension reduction technique for box embeddings to facilitate dataset visualization and comparison. Our experiments demonstrate that box embeddings consistently capture prompt specificity better than vector baselines. 
On the downstream task of creating hierarchical clustering trees for 17 LLMs from the UltraFeedback dataset, Prompt2Box can identify 8.9% more LLM weaknesses than vector baselines and achieves an approximately 33% stronger correlation between hierarchical depth and instruction specificity.\nCitation @misc{bhuiya2026prompt2boxuncoveringentailmentstructure, title={PROMPT2BOX: Uncovering Entailment Structure among LLM Prompts}, author={Neeladri Bhuiya and Shib Sankar Dasgupta and Andrew McCallum and Haw-Shiuan Chang}, year={2026}, eprint={2603.21438}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2603.21438}, } ","permalink":"https://zawedcvg.github.io/papers/paper3/","summary":"Model prompts as axis-aligned hyperrectangles in n-d space.","title":"Prompt2Box"},{"content":" Download Paper Website Abstract Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. While LLM capabilities continue to improve, they remain increasingly susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness continue to remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. 
Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models in a lesser or comparable query budget. Particularly, PLAGUE enables an ASR (based on StrongReject (Souly et al., 2024)) of 81.4% on OpenAI’s o3 and 67.3% on Claude’s Opus 4.1, two models that are considered highly resistant to jailbreaks in safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.\nFigure 1: Performance of the attack Citation @misc{bhuiya2025plagueplugandplayframeworklifelong, title={PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits}, author={Neeladri Bhuiya and Madhav Aggarwal and Diptanshu Purwar}, year={2025}, eprint={2510.17947}, archivePrefix={arXiv}, primaryClass={cs.CR}, url={https://arxiv.org/abs/2510.17947}, } ","permalink":"https://zawedcvg.github.io/papers/paper2/","summary":"Agentic framework for discovering novel potent multi-turn jailbreak attacks that achieve an attack success rate of 67.3% on Claude Opus 4.1","title":"PLAGUE: Plug-And-Play framework for life-long adaptive generation of multi-turn exploits"},{"content":" Download Paper Code and data Abstract State-of-the-art Large Language Models (LLMs) are accredited with an increasing number of different capabilities, ranging from reading comprehension over advanced mathematical and reasoning skills to possessing scientific knowledge. 
In this paper we focus on multi-hop reasoning—the ability to identify and integrate information from multiple textual sources. Given the concerns with the presence of simplifying cues in existing multi-hop reasoning benchmarks, which allow models to circumvent the reasoning requirement, we set out to investigate whether LLMs are prone to exploiting such simplifying cues. We find evidence that they indeed circumvent the requirement to perform multi-hop reasoning, but they do so in more subtle ways than what was reported about their fine-tuned pre-trained language model (PLM) predecessors. We propose a challenging multi-hop reasoning benchmark by generating seemingly plausible multi-hop reasoning chains that ultimately lead to incorrect answers. We evaluate multiple open and proprietary state-of-the-art LLMs and show that their multi-hop reasoning performance is affected, as indicated by up to 45% relative decrease in F1 score when presented with such seemingly plausible alternatives. We also find that—while LLMs tend to ignore misleading lexical cues—misleading reasoning paths indeed present a significant challenge. 
The code and data are made available at https://github.com/zawedcvg/Are-Large-Language-Models-Attentive-Readers\nFigure 1: Example of the generated attack Citation @inproceedings{bhuiya-etal-2024-seemingly, title = \u0026#34;Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers?\u0026#34;, author = \u0026#34;Bhuiya, Neeladri and Schlegel, Viktor and Winkler, Stefan\u0026#34;, editor = \u0026#34;Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung\u0026#34;, booktitle = \u0026#34;Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing\u0026#34;, month = nov, year = \u0026#34;2024\u0026#34;, address = \u0026#34;Miami, Florida, USA\u0026#34;, publisher = \u0026#34;Association for Computational Linguistics\u0026#34;, url = \u0026#34;https://aclanthology.org/2024.emnlp-main.147/\u0026#34;, doi = \u0026#34;10.18653/v1/2024.emnlp-main.147\u0026#34;, pages = \u0026#34;2514--2528\u0026#34;, abstract = \u0026#34;State-of-the-art Large Language Models (LLMs) are accredited with an increasing number of different capabilities, ranging from reading comprehension over advanced mathematical and reasoning skills to possessing scientific knowledge. In this paper we focus on multi-hop reasoning{---}the ability to identify and integrate information from multiple textual sources. Given the concerns with the presence of simplifying cues in existing multi-hop reasoning benchmarks, which allow models to circumvent the reasoning requirement, we set out to investigate whether LLMs are prone to exploiting such simplifying cues. We find evidence that they indeed circumvent the requirement to perform multi-hop reasoning, but they do so in more subtle ways than what was reported about their fine-tuned pre-trained language model (PLM) predecessors. We propose a challenging multi-hop reasoning benchmark by generating seemingly plausible multi-hop reasoning chains that ultimately lead to incorrect answers. 
We evaluate multiple open and proprietary state-of-the-art LLMs and show that their multi-hop reasoning performance is affected, as indicated by up to 45{\\%} relative decrease in F1 score when presented with such seemingly plausible alternatives. We also find that{---}while LLMs tend to ignore misleading lexical cues{---}misleading reasoning paths indeed present a significant challenge. The code and data are made available at https://github.com/zawedcvg/Are-Large-Language-Models-Attentive-Readers\u0026#34; } ","permalink":"https://zawedcvg.github.io/papers/paper1/","summary":"This paper shows that LLMs struggle with challenging multi-hop reasoning, but in more subtle ways than older-generation PLMs. It does so by creating an adversarial attack on HotpotQA using dependency parsing.","title":"Seemingly Plausible Distractors"},{"content":"Industry AI Safety Intern · A10 Networks June 2025 – August 2025 Reduced peak memory usage of the existing GCG attack by 50% (2× memory efficiency), enabling attacks with a larger number of prompts. Designed and implemented PLAGUE, a novel agentic multi-turn jailbreak attack framework that improved the attack success rate (ASR) by up to 40 percentage points compared to the prior SOTA on models like Claude Opus 4.1. Designed an orchestrator to coordinate multiple agents and integrated a RAG-based module for lifelong retrieval of strategies and in-context learning for PLAGUE. Published paper: PLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits | First Author | ICLR 2026 Research Extern Graduate Researcher · IBM January 2025 – April 2025 Led a team of 4 to conduct research on creating a correctness classifier using LLM uncertainty estimation for code generation under the supervision of Dr. Andrew McCallum and Dr. Veronika Thost. 
Engineered an optimized TokenSAR implementation with Tree-sitter, improving runtime efficiency and boosting correctness prediction accuracy by 15% over uncertainty-based baselines. Graduate Researcher · CIIR and IESL December 2024 – Present Conducting research under Dr. Andrew McCallum (IESL) and Dr. Hamed Zamani (CIIR) on leveraging box embeddings for evaluating LLM performance. Proposed modeling LLM prompts as box embeddings — axis-aligned hyperrectangles that jointly encode semantic relevance and specificity, going beyond the topical similarity captured by standard vector embeddings. Developed BoxSNE, an nD-to-2D visualization technique for box embeddings, improving interpretability. Demonstrated a strong correlation between box volume and instruction complexity. Results show improved retrieval performance and interesting scaling behaviors. See Prompt2Box. Researcher · National University of Singapore August 2023 – May 2024 Conducted a final-year research project under the supervision of Dr. Stefan Winkler and Dr. Viktor Schlegel, leading to a deeper understanding of LLMs\u0026rsquo; performance under strong adversarial attacks. Investigated multi-hop reasoning failures in LLMs by designing a dependency-parsing-based adversarial attack to expose systematic shortcutting behavior during inference. Developed a novel adversarial attack that reduces state-of-the-art LLMs\u0026rsquo; F1 score by up to 45% (relative). Published paper: Seemingly Plausible Distractors in Multi-Hop Reasoning | First Author | EMNLP 2024 Education M.S. Computer Science · UMass Amherst 2024 – 2026 · GPA: 4.0/4.0 B.S. 
Computer Science · National University of Singapore 2020 – 2024 ","permalink":"https://zawedcvg.github.io/experience/","summary":"\u003ch2 id=\"industry\"\u003eIndustry\u003c/h2\u003e\n\u003ch3 id=\"ai-safety-intern--a10-networks\"\u003eAI Safety Intern · A10 Networks\u003c/h3\u003e\n\u003ch5 id=\"june-2025--august-2025\"\u003eJune 2025 – August 2025\u003c/h5\u003e\n\u003cul\u003e\n\u003cli\u003eReduced peak memory usage of the existing GCG attack by 50% (2× memory efficiency), enabling attacks with a larger number of prompts.\u003c/li\u003e\n\u003cli\u003eDesigned and implemented PLAGUE, a novel agentic multi-turn jailbreak attack framework that improved the attack success rate (ASR) by up to 40 percentage points compared to the prior SOTA on models like Claude Opus 4.1.\u003c/li\u003e\n\u003cli\u003eDesigned an orchestrator to coordinate multiple agents and integrated a RAG-based module for lifelong retrieval of strategies and in-context learning for PLAGUE.\u003c/li\u003e\n\u003cli\u003ePublished paper: \u003cem\u003ePLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits\u003c/em\u003e | First Author | \u003ca href=\"https://iclr.cc/virtual/2026\" target=\"_blank\"\u003eICLR 2026\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"research\"\u003eResearch\u003c/h2\u003e\n\u003ch3 id=\"extern-graduate-researcher--ibm\"\u003eExtern Graduate Researcher · IBM\u003c/h3\u003e\n\u003ch5 id=\"january-2025--april-2025\"\u003eJanuary 2025 – April 2025\u003c/h5\u003e\n\u003cul\u003e\n\u003cli\u003eLed a team of 4 to conduct research on creating a correctness classifier using LLM uncertainty estimation for code generation under the supervision of Dr. Andrew McCallum and Dr. 
Veronika Thost.\u003c/li\u003e\n\u003cli\u003eEngineered an optimized TokenSAR implementation with Tree-sitter, improving runtime efficiency and boosting correctness prediction accuracy by 15% over uncertainty-based baselines.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"graduate-researcher--ciir-and-iesl\"\u003eGraduate Researcher · CIIR and IESL\u003c/h3\u003e\n\u003ch5 id=\"december-2024--present\"\u003eDecember 2024 – Present\u003c/h5\u003e\n\u003cul\u003e\n\u003cli\u003eConducting research under Dr. Andrew McCallum (IESL) and Dr. Hamed Zamani (CIIR) on leveraging box embeddings for evaluating LLM performance.\u003c/li\u003e\n\u003cli\u003eProposed modeling LLM prompts as box embeddings — axis-aligned hyperrectangles that jointly encode semantic relevance and specificity, going beyond the topical similarity captured by standard vector embeddings.\u003c/li\u003e\n\u003cli\u003eDeveloped BoxSNE, an nD-to-2D visualization technique for box embeddings, improving interpretability. Demonstrated a strong correlation between box volume and instruction complexity.\u003c/li\u003e\n\u003cli\u003eResults show improved retrieval performance and interesting scaling behaviors. See \u003ca href=\"https://arxiv.org/pdf/2603.21438\" target=\"_blank\"\u003ePrompt2Box\u003c/a\u003e.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"researcher--national-university-of-singapore\"\u003eResearcher · National University of Singapore\u003c/h3\u003e\n\u003ch5 id=\"august-2023--may-2024\"\u003eAugust 2023 – May 2024\u003c/h5\u003e\n\u003cul\u003e\n\u003cli\u003eConducted a final-year research project under the supervision of Dr. Stefan Winkler and Dr. 
Viktor Schlegel, leading to a deeper understanding of LLMs\u0026rsquo; performance under strong adversarial attacks.\u003c/li\u003e\n\u003cli\u003eInvestigated multi-hop reasoning failures in LLMs by designing a dependency-parsing-based adversarial attack to expose systematic shortcutting behavior during inference.\u003c/li\u003e\n\u003cli\u003eDeveloped a novel adversarial attack that reduces state-of-the-art LLMs\u0026rsquo; F1 score by up to 45% (relative).\u003c/li\u003e\n\u003cli\u003ePublished paper: \u003cem\u003eSeemingly Plausible Distractors in Multi-Hop Reasoning\u003c/em\u003e | First Author | \u003ca href=\"https://aclanthology.org/2024.emnlp-main.147/\" target=\"_blank\"\u003eEMNLP 2024\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"education\"\u003eEducation\u003c/h2\u003e\n\u003ch3 id=\"ms-computer-science--umass-amherst\"\u003eM.S. Computer Science · UMass Amherst\u003c/h3\u003e\n\u003ch5 id=\"2024--2026--gpa-4040\"\u003e2024 – 2026 · GPA: 4.0/4.0\u003c/h5\u003e\n\u003ch3 id=\"bs-computer-science--national-university-of-singapore\"\u003eB.S. Computer Science · National University of Singapore\u003c/h3\u003e\n\u003ch5 id=\"2020--2024\"\u003e2020 – 2024\u003c/h5\u003e","title":"Experience"}]