Unimodal Attacks

Jailbreak Attacks

Initial Ad-Hoc Jailbreak Attempts

Analysing In-The-Wild Jailbreak Prompts

Exploring Model Size, Safety Training, and Capabilities

Automating Jailbreak Prompt Generation and Analysing Defenses in LLM Chatbots

Prompt Injection

Prompt Injection Definition: Instruction Following, Model Capabilities, and Data Safety

Exploring Prompt Injection Attack Variants

System Prompt as Intellectual Property

Exploring Indirect and Virtual Prompt Injection Attacks

Enhancing Prompt Injection Attacks: Automation and Countermeasures

Multi-Modal Attacks

Manual Attacks

Systematic Adversarial Attacks

White-Box Attacks

Continuous Image Space vs. Limited Token Space

Dialogue Poisoning, Social Engineering Skills, Scale

Black-Box Attacks

Cross-Modality Vulnerabilities

Additional Attacks

Adversarial Attacks in Complex Systems

References

  • Shayegani, E., Mamun, M. A. A., Fu, Y., Zaree, P., Dong, Y., & Abu-Ghazaleh, N. (2023). Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv preprint arXiv:2310.10844.