Unimodal Attacks
Jailbreak Attacks
Initial Ad-Hoc Jailbreak Attempts
Analysing In-The-Wild Jailbreak Prompts
Exploring Model Size, Safety Training, and Capabilities
Automating Jailbreak Prompt Generation and Analysing Defenses in LLM Chatbots
Prompt Injection
Prompt Injection Definition: Instruction Following, Model Capabilities, and Data Safety
Exploring Prompt Injection Attack Variants
System Prompt as Intellectual Property
Exploring Indirect and Virtual Prompt Injection Attacks
Enhancing Prompt Injection Attacks: Automation and Countermeasures
Multi-Modal Attacks
Manual Attacks
Systematic Adversarial Attacks
White-Box Attacks
Continuous Image Space vs. Limited Token Space
Dialogue Poisoning, Social Engineering Skills, Scale
Black-Box Attacks
Cross-Modality Vulnerabilities
Additional Attacks
Adversarial Attacks in Complex Systems
Reference
Shayegani, E., Mamun, M. A. A., Fu, Y., Zaree, P., Dong, Y., & Abu-Ghazaleh, N. (2023). Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv preprint arXiv:2310.10844.