
Use the following context to answer the user's question.
If the question cannot be answered from the context, state that clearly.
Context:
{context}
Question:
{question}
Then I created a new SpringAIRagService:
package com.infoworld.springaidemo.service;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;
@Service
public class SpringAIRagService {
@Value("classpath:/templates/rag-template.st")
private Resource promptTemplate;
private final ChatClient chatClient;
private final VectorStore vectorStore;
public SpringAIRagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
this.chatClient = chatClientBuilder.build();
this.vectorStore = vectorStore;
}
public String query(String question) {
SearchRequest searchRequest = SearchRequest.builder()
.query(question)
.topK(2)
.build();
List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);
String context = similarDocuments.stream()
.map(Document::getText)
.collect(Collectors.joining("\n"));
Prompt prompt = new PromptTemplate(promptTemplate)
.create(Map.of("context", context, "question", question));
return chatClient.prompt(prompt)
.call()
.content();
}
}
The SpringAIRagService wires in a ChatClient.Builder, which we use to build a ChatClient, along with our VectorStore. The query() method accepts a question and uses the VectorStore to build the context. First, we need to build a SearchRequest, which we do by:
- Invoking its static builder() method.
- Passing the question as the query.
- Using the topK() method to specify how many documents we want to retrieve from the vector store.
- Calling its build() method.
In this case, we want to retrieve the top two documents that are most similar to the question. In practice, you’ll use something larger, such as the top three or top five, but since we only have three documents, I limited it to two.
Next, we invoke the vector store's similaritySearch() method, passing it our SearchRequest. The similaritySearch() method uses the vector store's embedding model to create an embedding (a multidimensional vector) of the question. It then compares that embedding to each stored document's embedding and returns the documents that are most similar to the question. We stream over the returned documents, get their text, and join it into a context String.
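The comparison the vector store performs is typically cosine similarity between embedding vectors. The sketch below is illustrative only; the real work happens inside the vector store and its embedding model, and real embeddings have hundreds or thousands of dimensions rather than three:

```java
public class CosineSimilarity {
    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] question = {0.2, 0.8, 0.1};
        double[] doc1 = {0.2, 0.8, 0.1};   // points the same direction as the question
        double[] doc2 = {0.9, 0.1, 0.0};   // points a different direction
        System.out.printf("doc1: %.3f, doc2: %.3f%n",
                cosine(question, doc1), cosine(question, doc2));
    }
}
```

With topK(2), the store returns the two documents whose embeddings score highest by this kind of measure.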
Next, we create our prompt, which tells the LLM to answer the question using the context. Note that it is important to tell the LLM to use the context to answer the question and, if it cannot, to state that it cannot answer the question from the context. If we don’t provide these instructions, the LLM will use the data it was trained on to answer the question, which means it will use information not in the context we’ve provided.
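Conceptually, PromptTemplate fills the {context} and {question} placeholders in the template file with the values we pass to create(). Here is a simplified sketch of that substitution, not Spring AI's actual implementation, which uses the StringTemplate engine:

```java
import java.util.Map;

public class TemplateSketch {
    // Replace each {key} placeholder in the template with its value from the map.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Context:\n{context}\nQuestion:\n{question}";
        System.out.println(render(template,
                Map.of("context", "Spring AI supports RAG.",
                       "question", "Does Spring AI support RAG?")));
    }
}
```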
Finally, we build the prompt, setting its context and question, and invoke the ChatClient. I added a SpringAIRagController to handle POST requests and pass them to the SpringAIRagService:
package com.infoworld.springaidemo.web;
import com.infoworld.springaidemo.model.SpringAIQuestionRequest;
import com.infoworld.springaidemo.model.SpringAIQuestionResponse;
import com.infoworld.springaidemo.service.SpringAIRagService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
@RestController
public class SpringAIRagController {
private final SpringAIRagService springAIRagService;
public SpringAIRagController(SpringAIRagService springAIRagService) {
this.springAIRagService = springAIRagService;
}
@PostMapping("/springAIQuestion")
public ResponseEntity<SpringAIQuestionResponse> askAIQuestion(@RequestBody SpringAIQuestionRequest questionRequest) {
String answer = springAIRagService.query(questionRequest.question());
return ResponseEntity.ok(new SpringAIQuestionResponse(answer));
}
}
The askAIQuestion() method accepts a SpringAIQuestionRequest, which is a Java record:
package com.infoworld.springaidemo.model;
public record SpringAIQuestionRequest(String question) {
}
The controller returns a SpringAIQuestionResponse, which is also a Java record:
package com.infoworld.springaidemo.model;
public record SpringAIQuestionResponse(String answer) {
}
Now restart your application and execute a POST to /springAIQuestion. In my case, I sent the following request body:
{
"question": "Does Spring AI support RAG?"
}
And received the following response:
{
"answer": "Yes. Spring AI explicitly supports Retrieval Augmented Generation (RAG), including chat memory, integrations with major vector stores, a portable vector store API with metadata filtering, and a document injection ETL framework to build RAG pipelines."
}
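If you prefer code over a REST client, you can exercise the endpoint with Java's built-in HTTP client. This sketch assumes the application is running on localhost:8080, Spring Boot's default port:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RagClient {
    // Build the POST request; the path matches the controller's @PostMapping.
    static HttpRequest buildRequest(String baseUrl, String question) {
        String body = "{\"question\": \"" + question + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/springAIQuestion"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8080", "Does Spring AI support RAG?");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```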
As you can see, the LLM used the context of the documents we loaded into the vector store to answer the question. We can further test whether it is following our directions by asking a question that is not in our context:
{
"question": "Who created Java?"
}
Here is the LLM’s response:
{
"answer": "The provided context does not include information about who created Java."
}
This is an important validation that the LLM is only using the provided context to answer the question and not using its training data or, worse, trying to make up an answer.
Conclusion
This article introduced you to using Spring AI to incorporate large language model capabilities into Spring-based applications. You can configure LLMs and other AI technologies using Spring’s standard application.yaml file, then wire them into Spring components. Spring AI provides an abstraction to interact with LLMs, so you don’t need to use LLM-specific SDKs. For experienced Spring developers, this entire process is similar to how Spring Data abstracts database interactions using Spring Data interfaces.
In this example, you saw how to configure and use a large language model in a Spring MVC application. We configured OpenAI to answer simple questions, introduced prompt templates to externalize LLM prompts, and concluded by using a vector store to implement a simple RAG service in our example application.
Spring AI has a robust set of capabilities, and we’ve only scratched the surface of what you can do with it. I hope the examples in this article provide enough foundational knowledge to help you start building AI applications using Spring. Once you are comfortable with configuring and accessing large language models in your applications, you can dive into more advanced AI programming, such as building AI agents to improve your business processes.

