Gemini (formerly known as Bard) is an advanced AI model developed by Google DeepMind. Its multimodal design lets it understand and work with various types of information, including text and images. The Gemini API exposes a range of natural language processing tasks, allowing developers to integrate these capabilities into their applications through a straightforward HTTP-based interface.
In this blog, we look at the network communication that happens in the background when we call Gemini’s GenerateContent API.
Network Traffic Analysis
The ATI team at Keysight analysed the network traffic of a Gemini API call and found some interesting insights that can be helpful for other researchers.
When we call the Gemini API using ‘curl’ or Python’s ‘requests’ library, the client generally sends a POST request to the Gemini API endpoint, which uses TLS 1.3 (by default) for encryption and HTTP/1.1 for communication.
Let’s take a detailed look at the decrypted traffic:
Request Components:
Figure 1: Sample Gemini API HTTP Request
- Request URL
When we call Gemini’s REST API, the POST request URL (request line) has the following structure:
/<API version>/models/<model name>:generateContent
- API Version: Indicates the version of the Gemini REST API that the user wants to use, for example “v1” or “v1beta”.
- Model Name: Indicates the name of the Gemini Large Language Model (LLM) the user wants to use to generate the content. Available Gemini models include “gemini-1.5-flash”, “gemini-1.0-pro” and “gemini-1.5-pro”.
Note: The Gemini API key can be provided inside the request URL. In that case the URL looks like this:
/<API version>/models/<model name>:generateContent?key=<Gemini API KEY>
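The URL structure described above can be sketched as a small helper. This is a minimal illustration, not code from the capture: the function name `build_generate_content_url` is hypothetical, the host is the one seen in the Host header, and the path segment is `models`, as in the public Gemini REST documentation.

```python
from urllib.parse import urlencode

# Host taken from the Host header observed in the capture.
GEMINI_HOST = "generativelanguage.googleapis.com"


def build_generate_content_url(api_version, model_name, api_key=None):
    """Return the generateContent URL; append ?key=... only if api_key is given."""
    url = f"https://{GEMINI_HOST}/{api_version}/models/{model_name}:generateContent"
    if api_key is not None:
        # urlencode percent-escapes the key so it is safe inside a query string.
        url += "?" + urlencode({"key": api_key})
    return url


# Key omitted from the URL (it would then travel in a header instead).
print(build_generate_content_url("v1beta", "gemini-1.5-flash"))
# Key carried in the URL; "YOUR_API_KEY" is a placeholder.
print(build_generate_content_url("v1", "gemini-1.5-pro", api_key="YOUR_API_KEY"))
```

Passing the key in the URL means it appears in the request line, so it can end up in proxy and server logs; this is one reason to prefer the header variant.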
- Headers
The POST request contains the following headers:
- Host: generativelanguage.googleapis.com (indicating the request is being sent to this particular API server)
- User-Agent: python-requests/2.25.1 (when the REST API is called using Python’s “requests” library of version 2.25.1)
- Accept-Encoding: gzip, deflate
- Accept: */*
- Connection: Keep-Alive
- Content-Type: application/json
- x-goog-api-key: <Gemini API key> (if the Gemini API key is not provided inside the request URL, it can be supplied through this header)
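As a rough sketch, the header set listed above can be assembled as a plain dictionary. The helper name `build_headers` is hypothetical, and the values simply mirror the capture; most HTTP clients set Host, User-Agent and Accept-Encoding on their own, so they are included here only for illustration.

```python
def build_headers(api_key=None, user_agent="python-requests/2.25.1"):
    """Return the request headers seen in the capture.

    The x-goog-api-key header is added only when api_key is given,
    i.e. when the key is not already carried in the URL.
    """
    headers = {
        "Host": "generativelanguage.googleapis.com",
        "User-Agent": user_agent,
        "Accept-Encoding": "gzip, deflate",
        "Accept": "*/*",
        "Connection": "Keep-Alive",
        "Content-Type": "application/json",
    }
    if api_key is not None:
        headers["x-goog-api-key"] = api_key
    return headers


# "YOUR_API_KEY" is a placeholder, not a real credential.
for name, value in build_headers(api_key="YOUR_API_KEY").items():
    print(f"{name}: {value}")
```

With the `requests` library, a dictionary like this can be passed as the `headers` argument of `requests.post`.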
- Body
The POST request body (payload) contains, in JSON format, the prompt that the user wants to send to the Gemini API server. It carries two types of instructions:
- System Instruction (optional): Used to set the behaviour, rules or context for the assistant. It provides instructions that define how the assistant should respond throughout the conversation.
- Contents or User Instruction: The input or question from the end user interacting with the assistant. It is the primary content to which the assistant responds.
Inside the decrypted traffic the request body looks like below:
- Without System Instruction:
Without a system instruction (prompt), the JSON body looks like below:
- With System Instruction:
With both the system instruction and the contents (user instruction), the JSON body looks like below:
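The two body variants can be sketched as Python dictionaries serialized to JSON. This is an illustration rather than a copy of the captured payload: the helper name `build_text_body` is hypothetical, and the field names `contents`, `parts`, `text` and `system_instruction` are assumed from the public Gemini REST documentation, so the exact layout in a given capture may differ.

```python
import json


def build_text_body(user_prompt, system_prompt=None):
    """Build a generateContent body; system_instruction is added only if given."""
    body = {"contents": [{"parts": [{"text": user_prompt}]}]}
    if system_prompt is not None:
        # Assumed field name; the system instruction is optional.
        body["system_instruction"] = {"parts": [{"text": system_prompt}]}
    return body


# Without a system instruction: only "contents" is present.
print(json.dumps(build_text_body("Explain TLS 1.3 in one sentence."), indent=2))

# With both the system instruction and the contents.
print(json.dumps(
    build_text_body(
        "Explain TLS 1.3 in one sentence.",
        system_prompt="Answer like a network engineer.",
    ),
    indent=2,
))
```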
If the user wants to send a prompt about an image, the image data must be sent Base64-encoded inside the “inline_data” -> “data” JSON field, and “inline_data” -> “mime_type” must contain the image type, as shown below:
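A minimal sketch of an image prompt body follows. The `inline_data`, `data` and `mime_type` fields come from the description above; the surrounding `contents` and `parts` layout, the helper name `build_image_body`, and the placeholder image bytes are assumptions made for illustration.

```python
import base64


def build_image_body(user_prompt, image_bytes, mime_type="image/jpeg"):
    """Build a body with a text part and a Base64-encoded inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": user_prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").
sample = build_image_body(
    "Describe this image.", b"\x89PNG placeholder", mime_type="image/png"
)
inline = sample["contents"][0]["parts"][1]["inline_data"]
print(inline["mime_type"], len(inline["data"]))
```

Base64 inflates the payload by roughly one third, which is worth keeping in mind when sizing simulated image traffic.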
Response Components:
After successful authentication and processing of the API request, the Gemini API server responds with a 200 OK HTTP response, which looks like below:
The HTTP response headers include:
- Content-Type: application/json; charset=UTF-8
- Vary: Origin
- Vary: X-Origin
- Vary: Referer
- Content-Encoding: gzip
- Date: <Timestamp in GMT format>
- Server: scaffolding on HTTPServer2
- Cache-Control: private
- X-XSS-Protection: 0
- X-Frame-Options: SAMEORIGIN
- X-Content-Type-Options: nosniff
- Server-Timing: gfet4t7; dur=3194
- Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
- Transfer-Encoding: chunked
These headers carry several useful details: the “Server” value identifies the server as “scaffolding on HTTPServer2”, while “Transfer-Encoding: chunked” and “Content-Encoding: gzip” show that the body is sent in chunks and is gzip-compressed.
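Because the body is gzip-compressed, an analyst working from decrypted traffic has to decompress it before reading the JSON. The sketch below assumes the body has already been reassembled from its chunks; the helper names `decode_body` and `server_timing_ms` are hypothetical, and the Server-Timing parser handles only a single metric of the form seen in the capture.

```python
import gzip
import json


def decode_body(headers, raw_body):
    """Decompress raw_body if Content-Encoding is gzip, then parse it as JSON.

    raw_body is assumed to be already de-chunked.
    """
    if headers.get("Content-Encoding", "").lower() == "gzip":
        raw_body = gzip.decompress(raw_body)
    return json.loads(raw_body.decode("utf-8"))


def server_timing_ms(headers):
    """Return the dur= value of a single-metric Server-Timing header, or None."""
    value = headers.get("Server-Timing", "")
    for param in value.split(";"):
        name, _, number = param.strip().partition("=")
        if name == "dur":
            return float(number)
    return None


# Headers copied from the capture; the body is a stand-in built locally.
headers = {"Content-Encoding": "gzip", "Server-Timing": "gfet4t7; dur=3194"}
fake_body = gzip.compress(json.dumps({"ok": True}).encode("utf-8"))

print(decode_body(headers, fake_body))
print(server_timing_ms(headers))
```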
The response also contains the answer to the user prompt in JSON format, as shown below:
Figure 6: Sample Gemini API HTTP Response Payload
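To show how such a payload might be consumed, the sketch below pulls the generated text and the per-category probability labels out of a response. The sample is hand-written, not taken from the capture, and the field names (`candidates`, `content`, `parts`, `text`, `safetyRatings`, `category`, `probability`) are assumed from the public Gemini REST documentation. The helper names are hypothetical.

```python
def extract_text(response_json):
    """Concatenate the text parts of the first candidate."""
    candidate = response_json["candidates"][0]
    return "".join(part.get("text", "") for part in candidate["content"]["parts"])


def safety_probabilities(response_json):
    """Map each safety category of the first candidate to its probability label."""
    candidate = response_json["candidates"][0]
    return {
        rating["category"]: rating["probability"]
        for rating in candidate.get("safetyRatings", [])
    }


# Hand-written sample for illustration only (assumed structure).
sample_response = {
    "candidates": [
        {
            "content": {"parts": [{"text": "Hello from Gemini."}], "role": "model"},
            "finishReason": "STOP",
            "safetyRatings": [
                {"category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE"}
            ],
        }
    ]
}

print(extract_text(sample_response))
print(safety_probabilities(sample_response))
```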
Gemini API Traffic Simulation in Keysight ATI
At Keysight Technologies, researchers on our Application and Threat Intelligence (ATI) team examined the traffic pattern of Gemini API calls and added support for it in the ATI-2024-18 StrikePack, released on September 13, 2024.
We have added 4 new Gemini API Superflows, covering 1-arm (client-side simulation) and 2-arm (both client-side and server-side simulation) modes, as shown below:
Here, the traffic for both the text-prompt and image-prompt Superflows is customizable. During the BreakingPoint System (BPS) simulation, the user can choose their own values for Hostname, User Agent, API Key, User Prompt, System Prompt, Gemini Model Name (Gemini-1.5-Flash, Gemini-1.5-Pro or Gemini-1.0-Pro), Gemini API Version (v1beta or v1), API Key Location (Inside URL or Inside Header), Upload File (for uploading an image file), the probability score for different categories inside the response, and so on, as shown below:
Leverage Subscription Service to Stay Ahead of Attacks
Keysight’s Application and Threat Intelligence subscription provides daily malware updates and bi-weekly updates of the latest application protocols and vulnerabilities for use with Keysight test platforms. The ATI Research Centre continuously monitors threats as they appear in the wild. BreakingPoint customers now have access to attack campaigns for different advanced persistent threats, allowing them to test the ability of their currently deployed security controls to detect or block such attacks.