Spring Boot Chat Application with DeepSeek and Ollama

by devs5003 - February 13, 2025 (Last Updated on February 17th, 2025)

This tutorial provides a comprehensive guide to implementing an intelligent chat application using Spring Boot, integrated with DeepSeek and Ollama's local LLM capabilities. Because these tools are installed locally, there is no latency from internet round trips; requests are processed directly on our local machine. We can also train the AI on our own terminology, such as company jargon, industry-specific knowledge, and local dialects, and we can modify the model directly without relying on cloud provider updates. At a later stage, we can transition to the cloud easily, since the same Spring Boot code works both ways. This approach gives us full control while teaching foundational AI/software integration concepts used in enterprise systems. Let's go through a step-by-step article on the Spring Boot Chat Application with DeepSeek and Ollama, complete with code examples and explanations.

What can we learn from this Article?

- How to set up DeepSeek locally using Ollama
- How to use DeepSeek with Spring Boot using Spring AI
- How to build an intelligent Spring Boot Chat Application with DeepSeek and Ollama

Prerequisites

- Java 17+
- Spring Boot 3.2+
- Maven/Gradle
- Ollama installed locally (installation guide)
- Basic familiarity with Spring Boot, especially Spring Boot MVC

How to Implement Spring Boot Chat Application with DeepSeek and Ollama

Step#1: Set Up DeepSeek Locally with Ollama

Install Ollama

Windows/macOS/Linux: Download and install Ollama from https://ollama.com.

Download the DeepSeek Model

Open a terminal and run:

```
ollama pull deepseek-r1:1.5b
```

This downloads the DeepSeek model to your machine. Here, r1 identifies the DeepSeek-R1 family and 1.5b is the parameter size; we can pull a different tag as per our requirement.

Start Ollama

Run the Ollama server:

```
ollama serve
```

The server runs at http://localhost:11434 by default.
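Before wiring Spring Boot to it, we can optionally confirm that the server is reachable over HTTP. One quick check (assuming the default port) is to list the locally available models through Ollama's REST API:

```
curl http://localhost:11434/api/tags
```

If the DeepSeek model you pulled appears in the returned JSON, the Spring AI starter will be able to reach it at the same base URL.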
Run the DeepSeek Model

Run the DeepSeek model if it is not running already:

```
ollama run deepseek-r1:1.5b
```

Step#2: Create a Spring Boot Project

Use Spring Initializr or any IDE of your preference and add these dependencies:

- Spring Web
- Thymeleaf
- Ollama
- Spring Reactive Web

Or add these to your pom.xml:

```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>
```

Step#3: Update application.properties file

Add the below entries to src/main/resources/application.properties:

```properties
# Ollama Configuration
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=deepseek-r1:1.5b
```

Step#4: Create a Service Class

Create DeepSeekService.java as below:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

import reactor.core.publisher.Flux;

@Service
public class DeepSeekService {

    private final ChatClient chatClient;

    public DeepSeekService(ChatClient.Builder chatClient) {
        this.chatClient = chatClient.build();
    }

    // Blocking call: waits for the complete answer before returning it
    public String getResponse(String prompt) {
        return chatClient.prompt()
                .user(prompt)   // Send the user's input to DeepSeek
                .call()         // Call the Ollama server
                .content();     // Extract the response
    }

    // Streaming call: emits the answer chunk by chunk as it is generated
    public Flux<String> getStreamingResponse(String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .stream()
                .content();
    }
}
```

Step#5: Create a Controller Class

Create ChatController.java as below:

```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import reactor.core.publisher.Flux;

@Controller
public class ChatController {

    private final DeepSeekService deepSeekService;

    public ChatController(DeepSeekService deepSeekService) {
        this.deepSeekService = deepSeekService;
    }

    // Show the chat form
    @GetMapping("/")
    public String showChat() {
        return "chat";
    }

    // Receive the streamed response from DeepSeek
    @PostMapping("/api/chat")
    @ResponseBody
    public Flux<String> chatStream(@RequestParam String prompt) {
        return deepSeekService.getStreamingResponse(prompt)
                // Strip the <think>...</think> reasoning tags that DeepSeek-R1 emits
                .map(chunk -> chunk.replaceAll("<think>", "")
                                   .replaceAll("</think>", ""));
    }
}
```

Step#6: Create Chat Form as UI

Create src/main/resources/templates/chat.html.
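The template is essentially a text box plus a script that posts the prompt to /api/chat and appends the streamed chunks as they arrive. Below is a minimal sketch of such a page; the element IDs, markup, and form-encoded POST are illustrative assumptions rather than the article's original template:

```html
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>DeepSeek Chat</title>
</head>
<body>
    <h2>Chat with DeepSeek</h2>
    <div id="messages"></div>

    <form id="chat-form">
        <input type="text" id="prompt" placeholder="Ask something..." required/>
        <button type="submit">Send</button>
    </form>

    <script>
        const form = document.getElementById('chat-form');
        const messages = document.getElementById('messages');

        form.addEventListener('submit', async (e) => {
            e.preventDefault();
            const prompt = document.getElementById('prompt').value;

            // Bubble that will be filled incrementally with streamed chunks
            const bubble = document.createElement('p');
            messages.appendChild(bubble);

            // POST the prompt as a form parameter, matching @RequestParam in the controller
            const response = await fetch('/api/chat', {
                method: 'POST',
                headers: {'Content-Type': 'application/x-www-form-urlencoded'},
                body: new URLSearchParams({prompt: prompt})
            });

            // Read the response body as a stream and append each decoded chunk
            const reader = response.body.getReader();
            const decoder = new TextDecoder();
            while (true) {
                const {done, value} = await reader.read();
                if (done) break;
                bubble.textContent += decoder.decode(value, {stream: true});
            }
        });
    </script>
</body>
</html>
```

If the endpoint actually streams (for example, when it is mapped with produces = MediaType.TEXT_EVENT_STREAM_VALUE as shown later in the performance section), the reader loop displays the answer progressively; otherwise it simply renders the full reply once it arrives.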
Step#7: Test the Application

Start the Spring Boot app (for example, with mvn spring-boot:run or directly from your IDE). Open your browser at http://localhost:8080, ask any question, and observe the response provided by DeepSeek.

Common Troubleshooting

Ollama Not Running

If Ollama is not running, open a terminal and run `ollama serve` so that the server keeps running in the background. If it is already running, you will see a message such as "Only one usage of each socket address (protocol/network address/port) is normally permitted.".

Model Not Found

Verify the model is downloaded by running `ollama list`. It displays all the models (including the DeepSeek variants) installed through Ollama.

Connection Errors

Check the key and value of the base-url and chat model entries in application.properties. For example:

```properties
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=deepseek-r1:1.5b
```

Next-Level Enhancements

In order to enhance the chat application to the next level, the below points can be considered:

- Add loading indicators while waiting for responses.
- Implement message timestamps.
- Add support for markdown/code syntax highlighting.
- Implement chat history with Spring Data JPA (database/local storage).
- Add streaming responses for real-time interaction.

Below are some production-level considerations:

- Add rate limiting with Spring Boot Starter Actuator
- Implement circuit breakers with Resilience4j
- Add monitoring endpoints
- Configure proper security with Spring Security
- Enable request/response logging
- Add API documentation with Springdoc OpenAPI

Furthermore, some advanced integration options can be:

- Add multiple Ollama model support
- Implement temperature/max_tokens parameters
- Add response caching for common queries
- Implement load balancing between AI providers

This implementation provides a robust foundation for enterprise-grade AI chat applications. The architecture allows easy extension to additional AI providers while maintaining consistent API contracts and error handling.

Performance Optimizations

The chat application may require some performance optimizations at a later stage. Let's discuss the factors that can play a role, one by one.

Optimize Ollama & Model Settings

1) Use a Smaller Model Variant

For example, if you are using deepseek:33b, switch to deepseek:7b (smaller and faster, but slightly less accurate):

```
ollama pull deepseek:7b   # Smaller model
```

Update application.properties:

```properties
spring.ai.ollama.chat.model=deepseek:7b
```

2) Reduce Response Length

Limit the maximum tokens (response length) in your service class. The exact options API depends on the Spring AI version; with recent versions, Ollama-specific options can be passed like this:

```java
// requires: import org.springframework.ai.ollama.api.OllamaOptions;
public String getResponse(String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .options(OllamaOptions.builder()
                    .numPredict(200)   // Limit response to 200 tokens
                    .build())
            .call()
            .content();
}
```

3) Run Ollama with GPU Acceleration (If Available)

For NVIDIA GPUs, recent Ollama builds use the GPU automatically when the drivers are detected, so no special flag is normally needed. Verify GPU usage by checking the Ollama logs for VRAM allocation.

Optimize Spring Boot Integration

1) Enable Streaming Responses

Modify your service and UI to stream responses incrementally (no waiting for full completion). This optimization is already in place in the example demonstrated above.

Service Class:

```java
public Flux<String> getStreamingResponse(String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .stream()
            .content();
}
```

Controller:

```java
@PostMapping(value = "/api/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> chatStream(@RequestParam String prompt) {
    return deepSeekService.getStreamingResponse(prompt);
}
```

UI Update (JavaScript):

```javascript
// Replace the fetch call with:
const eventSource = new EventSource(`/api/chat?prompt=${encodeURIComponent(prompt)}`);
eventSource.onmessage = (event) => {
    // Append chunks to the message bubble incrementally
    bubble.textContent += event.data;
};
```

Note that EventSource always issues GET requests, so the endpoint must also be reachable via GET for this snippet to work as written; a minimal way to achieve that is sketched below.
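A dedicated GET mapping for the SSE stream can sit alongside the existing POST endpoint. The sketch below is an illustrative assumption rather than part of the original controller, and the /api/chat/stream path is just an example choice:

```java
// Additional GET endpoint so that EventSource (which only supports GET)
// can consume the same streaming response as Server-Sent Events.
@GetMapping(value = "/api/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
@ResponseBody
public Flux<String> chatStreamSse(@RequestParam String prompt) {
    return deepSeekService.getStreamingResponse(prompt);
}
```

With such a mapping in place, the EventSource URL simply changes to /api/chat/stream?prompt=..., while the fetch-based approach from the chat.html sketch continues to use the POST endpoint.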
2) Tune HTTP Timeouts

Add the below example entries to application.properties to prevent long waits. These keys may differ based on the versions of Ollama and Spring Boot, so verify them against your setup:

```properties
# Reduce connection/read timeouts (in milliseconds)
spring.ai.ollama.client.connect-timeout=5000
spring.ai.ollama.client.read-timeout=30000
```

Hardware/Infrastructure Tweaks

1) Allocate More RAM to Ollama

Linux/macOS: Start Ollama with room for more loaded models in memory:

```
OLLAMA_MAX_LOADED_MODELS=3 ollama serve   # Allow more models to stay loaded in memory
```

2) Run Ollama and Spring Boot on the Same Machine

Avoid network latency by running both locally during development.

Monitor Bottlenecks

1) Check Ollama Logs

Look in the Ollama logs for warnings such as slow GPU or insufficient RAM.

2) Profile Spring Boot

Add Actuator to monitor request times by adding the below dependency to the Spring Boot project:

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```

Check metrics at http://localhost:8080/actuator/metrics/http.server.requests.

Expected Results

| Optimization               | Expected Speed Improvement (approximate) |
|----------------------------|------------------------------------------|
| Smaller model (7B vs 33B)  | 2-3x faster                              |
| Response token limit       | 1.5-2x faster                            |
| GPU acceleration           | 5-10x faster                             |
| Streaming responses        | Perceived 2x faster                      |

Conclusion

Building a Spring Boot Chat Application with DeepSeek and Ollama is a great way to integrate AI-powered conversations into our projects. By setting up DeepSeek locally with Ollama, we get faster responses, better control over the AI model, and enhanced privacy. This application demonstrates how Spring Boot simplifies backend development while DeepSeek provides intelligent and dynamic chat capabilities. Whether you are building a chatbot for customer support, personal assistance, or any other AI-driven application, this setup is a solid foundation. With further enhancements like database integration, authentication, and real-time updates, you can expand the application's functionality. The possibilities are endless: start experimenting, customize as needed, and build smarter chat applications with DeepSeek!

You may also go through a detailed article on 'DeepSeek Spring AI Integration with Spring Boot'. For other articles, kindly visit the Spring Boot Concept Tutorials with Implementations section.

References:

- https://docs.spring.io/spring-ai/reference/api/chatclient.html
- https://docs.spring.io/spring-ai/reference/api/chat/deepseek-chat.html