Unimodal Attacks

Jailbreak Attacks

Initial Ad-Hoc Jailbreak Attempts

Analysing In-The-Wild Jailbreak Prompts

Exploring Model Size, Safety Training, and Capabilities

Automating Jailbreak Prompt Generation and Analysing Defenses in LLM Chatbots

Prompt Injection

Prompt Injection Definition: Instruction Following, Model Capabilities, and Data Safety

Exploring Prompt Injection Attack Variants

System Prompt as Intellectual Property

Exploring Indirect and Virtual Prompt Injection Attacks

Enhancing Prompt Injection Attacks: Automation and Countermeasures

Multi-Modal Attacks

Manual Attacks

Systematic Adversarial Attacks

White-Box Attacks

Continuous Image Space vs. Limited Token Space

Dialogue Poisoning, Social Engineering Skills, Scale

Black-Box Attacks

Cross-Modality Vulnerabilities

Additional Attacks

Adversarial Attacks in Complex Systems

References

  • Shayegani, E., Mamun, M. A. A., Fu, Y., Zaree, P., Dong, Y., & Abu-Ghazaleh, N. (2023). Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv preprint arXiv:2310.10844.