This paper examines the practical implications of large language models (LLMs) in offensive cybersecurity, moving beyond theoretical possibilities to assess how effective they are in practice. The research was conducted by the CTI Layer Team of the OWASP Top Ten for LLMs. It explores whether LLMs such as GPT-4o, Claude, and DeepSeek-R1 can exploit vulnerabilities in OWASP Juice Shop, an intentionally vulnerable web application. Using the Cybench framework as a benchmark, the team tested OpenAI's GPT-4o and Anthropic's Claude against five hacking tasks. It also assessed locally run DeepSeek models, which failed to complete the preliminary tasks.
OWASP LLM Exploit Generation v1.0
- February 26, 2025
