💉(WIP) Offensive Approach for Prompt Injection Attacks
As we discussed earlier, Prompt Injection can be a way to exploit many different vulnerabilities in your Generative AI Application.
Not every security engineer is a scientist, but security can be practical in fascinating ways. The best way to secure complex systems is to test them in practical, simplified ways.
Let's jump into the approach.
Analyze the architecture - Recon
Offensive testing starts with good reconnaissance. Security is still security, so you should know your target. I assume you're performing approved white-box testing as part of your role.
Read all the architectural references, even for small plugins that may impact the system. Then re-draw the architecture in your own way. My way is to track the user input's journey and re-draw that journey in security language, as in the sketch below.
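Here is a minimal sketch of what that re-drawn journey can look like in code. The components (WebUI, InputFilter, Retriever, PromptTemplate, LLM, ToolRunner) are hypothetical placeholders for whatever your own architecture diagram contains; the point is simply to annotate every hop the input takes with a security-language note.

```python
# A minimal sketch of re-drawing a user input's journey in "security language".
# All component names below are hypothetical placeholders, not a real system.

from dataclasses import dataclass

@dataclass
class Hop:
    component: str       # where the input travels through
    transformation: str  # what happens to the input here
    trust_note: str      # the security-language annotation

INPUT_JOURNEY = [
    Hop("WebUI",          "raw user text collected",              "untrusted source"),
    Hop("InputFilter",    "keyword/safety filter applied",        "first control: can it be bypassed?"),
    Hop("Retriever",      "text embedded and used as RAG query",  "untrusted data reaches the datastore"),
    Hop("PromptTemplate", "text interpolated into system prompt", "injection point if not delimited"),
    Hop("LLM",            "model generates a response",           "output may carry attacker instructions"),
    Hop("ToolRunner",     "model output triggers tool calls",     "excessive agency risk"),
]

if __name__ == "__main__":
    for step, hop in enumerate(INPUT_JOURNEY, start=1):
        print(f"{step}. {hop.component}: {hop.transformation} -> {hop.trust_note}")
```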
Track down the Input
In every testing discipline, manipulating the input and observing the response is the most helpful tactic for discovering weaknesses. It's the same in LLM-integrated apps. You need to identify whether your input changes after you submit it: the application may enrich the context with extra reasoning, mask or reshape it with safety filters before feeding it to the LLM, and so on. The probe sketched below is one way to check.
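One simple way to detect such shaping is a canary probe: send a unique marker and see whether it comes back verbatim. This is only a sketch under assumptions; the endpoint URL and JSON fields are hypothetical, so adapt them to the application you're authorized to test.

```python
# A minimal sketch for probing whether your input is transformed before it
# reaches the LLM. The endpoint and payload fields are hypothetical examples.

import uuid
import requests

APP_ENDPOINT = "https://example.internal/chat"  # hypothetical chat API

def probe_transformation() -> None:
    # A unique canary makes it easy to spot whether the app echoes your text
    # verbatim, rewrites it, or silently drops parts of it.
    canary = f"CANARY-{uuid.uuid4().hex[:8]}"
    payload = {"message": f"Please repeat this exact token back to me: {canary}"}

    response = requests.post(APP_ENDPOINT, json=payload, timeout=30)

    if canary in response.text:
        print("Canary reflected verbatim: input likely passes through unchanged.")
    else:
        print("Canary missing or altered: a filter, rewriter, or context "
              "builder is probably shaping your input before the LLM sees it.")

if __name__ == "__main__":
    probe_transformation()
```

Repeating the probe with different markers and phrasings helps you separate safety-filter rewrites from context-building changes.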
Code Review is your BFF
In a good code review, you should identify prompt filters, reasoning points, RAG search queries and field names, agency relationships such as tool executions and data searches, and prompt template details such as template-supported filters and guardrails.
Identifying all those critical points in the code review gives you better visibility into your tracked input. If you have tracked your input as in the "Track down the Input" step, you can now plan which points to manipulate; the sketch below shows the kind of critical point you're looking for.
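As an illustration of what such a critical point can look like, here is a hypothetical prompt builder of the kind you might flag during review. The template and function name are invented for this example, not taken from any specific application.

```python
# A minimal sketch of a code-review finding: a prompt template that
# interpolates untrusted user input and retrieved RAG content directly into
# the system prompt. All names here are hypothetical.

SYSTEM_TEMPLATE = """You are a helpful support assistant.
Use the following documents to answer the question.

Documents:
{retrieved_docs}

Question:
{user_question}
"""

def build_prompt(user_question: str, retrieved_docs: str) -> str:
    # Review finding: both fields are interpolated with no delimiting,
    # escaping, or guardrails, so instructions hidden in either the user
    # question or the retrieved documents can override the system prompt.
    return SYSTEM_TEMPLATE.format(
        retrieved_docs=retrieved_docs,
        user_question=user_question,
    )
```

Every interpolation point like this is a candidate manipulation target for the input you tracked earlier.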