19 C
New York
Thursday, April 3, 2025

Understanding Bad Likert Judge Prompt Injection Attack


Multi-turn prompt injection is an advanced jailbreak attack technique that manipulates Large Language Models (LLMs) over multiple interactions rather than a single prompt. By gradually continuing the conversation, attackers exploit the model’s contextual understanding to bypass safeguards and generate restricted outputs. Recently, a very interesting one such prompt injection technique named “Bad Likert Judge” is designed by the Palo Alto Unit 42 Incident Response team which exploits the multi-turn iteration by leveraging the Likert Scale – a psychometric measurement rating scale used to measure opinions, attitudes and behaviours.

An attacker can take advantage of the Likert scale to jailbreak an LLM by using the multi-turn and single-turn technique. This is currently supported in BreakingPoint Systems (BPS).

Key Steps of the Attack:


Figure 1: Bad Likert Judge Prompt Injection Attack Steps

  1. Evaluator Prompt:

The attacker begins by instructing the target LLM to assume the role of a judge tasked with evaluating responses generated by other models. This involves providing the LLM with specific guidelines to assess content, such as evaluating information on steps of making a chemical weapon.

  1. Scoring with the Likert Scale:

The target LLM is then prompted to rate the harmfulness of a given response using the Likert scale, a rating system measuring levels of agreement or disagreement.

  1. Generating Example Responses:

Next, the attacker asks the target LLM to generate example responses corresponding to each point on the Likert scale.

  1. Extracting Harmful Content:

The example associated with the highest point on the Likert scale often contains the harmful content that the model’s safety mechanisms are designed to prevent.

Uncovering a Single-Turn Jailbreak Method

The security researchers from ATI Keysight have replicated the attack scenario to gain a deeper understanding of its mechanics. During the research, we also discovered that the same jailbreak attack can be performed using a single-turn approach against Grok LLM server.

This can be achieved by structuring the conversation with a predefined assistant response, such as including the “assistant” response in the message sequence when sending a HTTP POST request to the Grok LLM server.


Figure 2: Bad Likert Judge Prompt Injection using Single-Turn Approach

This setup tricks the model into assuming the evaluation process is complete, allowing the attacker to request and extract responses based on specific Likert scale scores as shown below –


Figure 3: Sample Bad Likert Judge Prompt Injection Attack Response

Bad Likert Judge Prompt Injection Strike in BPS

At Keysight Technologies, our Application and Threat Intelligence (ATI) team added the support of this new type of Prompt Injection attack i.e. Bad Likert Judge prompt injection using single-turn technique in ATI-2025-05 StrikePack released on April 02, 2025.

This update includes a new Strike named “AI LLM Bad Likert Judge Prompt Injection”. This strike sends a “Bad Likert Judge” Prompt to the Grok LLM. This technique manipulates the Grok LLM by embedding Likert scale evaluations into text-based prompts, coercing the model into generating increasingly harmful responses. The structured rating pattern subtly guides the LLM into formulating high-risk responses under the guise of judgment-based assessments. Note: This Strike will randomly select a harmful category related to its questions and embed it within the prompt.


Figure 4: Bad Likert Judge Prompt Injection Strike in BPS

Additionally, a new LLM Transport Layer, “Grok API over HTTP,” has been added under the LLM TransportLayer section of the BreakingPoint Security evasion profile.


Figure 5: New LLM TransportLayer “Grok API over HTTP” inside BPS Security Evasion

The demonstration of this Bad Likert Judge Prompt Injection Strike presents a novel and creative approach for testing LLM security. As more organizations adopt AI-driven systems, it’s essential to identify vulnerabilities and ensure these technologies are deployed securely and reliably. By using such methods, we can better safeguard our systems from emerging threats and uphold the integrity of AI applications.

Leverage Subscription Service to Stay Ahead of Attacks

Keysight’s Application and Threat Intelligence subscription provides daily malware and bi-weekly updates of the latest application protocols and vulnerabilities for use with Keysight test platforms. The ATI Research Centre continuously monitors threats as they appear in the wild. BreakingPoint and in the future, other tools like CyPerf, now provide customers with access to attack campaigns for different advanced persistent threats, enabling them to test their currently deployed security controls’ ability to detect or block such attacks.

References

  1. https://unit42.paloaltonetworks.com/multi-turn-technique-jailbreaks-llms/
  2. https://www.keysight.com/blogs/en/tech/nwvs/2024/10/04/prompt-injection-101-for-llm
  3. https://en.wikipedia.org/wiki/Likert_scale



Source link

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles