Spring Boot Chat Application with DeepSeek and Ollama

by devs5003 - February 13, 2025 (Last Updated on February 17th, 2025)

This tutorial provides a comprehensive guide to implementing an intelligent chat application using Spring Boot, integrated with DeepSeek and Ollama's local LLM capabilities. Because these tools are installed locally, there is no latency from internet round trips; requests are processed directly on our local machine. We can also train the AI on our own terminology, such as company jargon, industry-specific knowledge, and local dialects, and we can modify the model directly without relying on cloud provider updates. At a later stage, we can transition to the cloud easily, since the same Spring Boot code works both ways. This approach gives us full control while teaching foundational AI/software integration concepts used in enterprise systems. Let's go through a step-by-step article on the Spring Boot Chat Application with DeepSeek and Ollama, complete with code examples and explanations.

What can we learn from this Article?

- How to set up DeepSeek locally using Ollama
- How to use DeepSeek with Spring Boot using Spring AI
- How to build an intelligent Spring Boot Chat Application with DeepSeek and Ollama

Prerequisites

- Java 17+
- Spring Boot 3.2+
- Maven/Gradle
- Ollama installed locally (installation guide)
- Basic familiarity with Spring Boot, especially Spring Boot MVC

How to Implement Spring Boot Chat Application with DeepSeek and Ollama

Step#1: Set Up DeepSeek Locally with Ollama

Install Ollama

Windows/macOS/Linux: Download and install Ollama from https://ollama.com.

Download the DeepSeek Model

Open a terminal and run:

```
ollama pull deepseek-r1:1.5b
```

This downloads the DeepSeek model to your machine. Here, r1 identifies the DeepSeek-R1 family and 1.5b is the parameter size; we can pull a different tag as per our requirement.

Start Ollama

Run the Ollama server:

```
ollama serve
```

The server runs at http://localhost:11434 by default.
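Before wiring Spring Boot to it, we can optionally confirm that the server is reachable over HTTP. One quick check (assuming the default port) is to list the locally available models through Ollama's REST API:

```
curl http://localhost:11434/api/tags
```

If the DeepSeek model you pulled appears in the returned JSON, the Spring AI starter will be able to reach it at the same base URL.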
Run the DeepSeek Model

Run the DeepSeek model if it is not running already:

```
ollama run deepseek-r1:1.5b
```

Step#2: Create a Spring Boot Project

Use Spring Initializr or any IDE of your preference and add these dependencies:

- Spring Web
- Thymeleaf
- Ollama
- Spring Reactive Web

Or add these to your pom.xml:

```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
</dependencies>
```

Step#3: Update application.properties file

Add the below entries to src/main/resources/application.properties:

```properties
# Ollama Configuration
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=deepseek-r1:1.5b
```

Step#4: Create a Service Class

Create DeepSeekService.java as below:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

import reactor.core.publisher.Flux;

@Service
public class DeepSeekService {

    private final ChatClient chatClient;

    public DeepSeekService(ChatClient.Builder chatClient) {
        this.chatClient = chatClient.build();
    }

    // Blocking call: waits for the complete answer before returning it
    public String getResponse(String prompt) {
        return chatClient.prompt()
                .user(prompt)   // Send the user's input to DeepSeek
                .call()         // Call the Ollama server
                .content();     // Extract the response
    }

    // Streaming call: emits the answer chunk by chunk as it is generated
    public Flux<String> getStreamingResponse(String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .stream()
                .content();
    }
}
```

Step#5: Create a Controller Class

Create ChatController.java as below:

```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

import reactor.core.publisher.Flux;

@Controller
public class ChatController {

    private final DeepSeekService deepSeekService;

    public ChatController(DeepSeekService deepSeekService) {
        this.deepSeekService = deepSeekService;
    }

    // Show the chat form
    @GetMapping("/")
    public String showChat() {
        return "chat";
    }

    // Receive the streamed response from DeepSeek
    @PostMapping("/api/chat")
    @ResponseBody
    public Flux<String> chatStream(@RequestParam String prompt) {
        return deepSeekService.getStreamingResponse(prompt)
                // Strip the <think>...</think> reasoning tags that DeepSeek-R1 emits
                .map(chunk -> chunk.replaceAll("<think>", "")
                                   .replaceAll("</think>", ""));
    }
}
```

Step#6: Create Chat Form as UI

Create src/main/resources/templates/chat.html.
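The template is essentially a text box plus a script that posts the prompt to /api/chat and appends the streamed chunks as they arrive. Below is a minimal sketch of such a page; the element IDs, markup, and form-encoded POST are illustrative assumptions rather than the article's original template:

```html
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>DeepSeek Chat</title>
</head>
<body>
    <h2>Chat with DeepSeek</h2>
    <div id="messages"></div>

    <form id="chat-form">
        <input type="text" id="prompt" placeholder="Ask something..." required/>
        <button type="submit">Send</button>
    </form>

    <script>
        const form = document.getElementById('chat-form');
        const messages = document.getElementById('messages');

        form.addEventListener('submit', async (e) => {
            e.preventDefault();
            const prompt = document.getElementById('prompt').value;

            // Bubble that will be filled incrementally with streamed chunks
            const bubble = document.createElement('p');
            messages.appendChild(bubble);

            // POST the prompt as a form parameter, matching @RequestParam in the controller
            const response = await fetch('/api/chat', {
                method: 'POST',
                headers: {'Content-Type': 'application/x-www-form-urlencoded'},
                body: new URLSearchParams({prompt: prompt})
            });

            // Read the response body as a stream and append each decoded chunk
            const reader = response.body.getReader();
            const decoder = new TextDecoder();
            while (true) {
                const {done, value} = await reader.read();
                if (done) break;
                bubble.textContent += decoder.decode(value, {stream: true});
            }
        });
    </script>
</body>
</html>
```

If the endpoint actually streams (for example, when it is mapped with produces = MediaType.TEXT_EVENT_STREAM_VALUE as shown later in the performance section), the reader loop displays the answer progressively; otherwise it simply renders the full reply once it arrives.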
Step#7: Test the Application

Start the Spring Boot app (for example, with mvn spring-boot:run or directly from your IDE). Open your browser at http://localhost:8080, ask any question, and observe the response provided by DeepSeek.

Common Troubleshooting

Ollama Not Running

If Ollama is not running, open a terminal and run `ollama serve` so that the server keeps running in the background. If it is already running, you will see a message such as "Only one usage of each socket address (protocol/network address/port) is normally permitted.".

Model Not Found

Verify the model is downloaded by running `ollama list`. It displays all the models (including the DeepSeek variants) installed through Ollama.

Connection Errors

Check the key and value of the base-url and chat model entries in application.properties. For example:

```properties
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=deepseek-r1:1.5b
```

Next-Level Enhancements

In order to enhance the chat application to the next level, the below points can be considered:

- Add loading indicators while waiting for responses.
- Implement message timestamps.
- Add support for markdown/code syntax highlighting.
- Implement chat history with Spring Data JPA (database/local storage).
- Add streaming responses for real-time interaction.

Below are some production-level considerations:

- Add rate limiting with Spring Boot Starter Actuator
- Implement circuit breakers with Resilience4j
- Add monitoring endpoints
- Configure proper security with Spring Security
- Enable request/response logging
- Add API documentation with Springdoc OpenAPI

Furthermore, some advanced integration options can be:

- Add multiple Ollama model support
- Implement temperature/max_tokens parameters
- Add response caching for common queries
- Implement load balancing between AI providers

This implementation provides a robust foundation for enterprise-grade AI chat applications. The architecture allows easy extension to additional AI providers while maintaining consistent API contracts and error handling.

Performance Optimizations

The chat application may require some performance optimizations at a later stage. Let's discuss the factors that can play a role, one by one.

Optimize Ollama & Model Settings

1) Use a Smaller Model Variant

For example, if you are using deepseek:33b, switch to deepseek:7b (smaller and faster, but slightly less accurate):

```
ollama pull deepseek:7b   # Smaller model
```

Update application.properties:

```properties
spring.ai.ollama.chat.model=deepseek:7b
```

2) Reduce Response Length

Limit the maximum tokens (response length) in your service class. The exact options API depends on the Spring AI version; with recent versions, Ollama-specific options can be passed like this:

```java
// requires: import org.springframework.ai.ollama.api.OllamaOptions;
public String getResponse(String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .options(OllamaOptions.builder()
                    .numPredict(200)   // Limit response to 200 tokens
                    .build())
            .call()
            .content();
}
```

3) Run Ollama with GPU Acceleration (If Available)

For NVIDIA GPUs, recent Ollama builds use the GPU automatically when the drivers are detected, so no special flag is normally needed. Verify GPU usage by checking the Ollama logs for VRAM allocation.

Optimize Spring Boot Integration

1) Enable Streaming Responses

Modify your service and UI to stream responses incrementally (no waiting for full completion). This optimization is already in place in the example demonstrated above.

Service Class:

```java
public Flux<String> getStreamingResponse(String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .stream()
            .content();
}
```

Controller:

```java
@PostMapping(value = "/api/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> chatStream(@RequestParam String prompt) {
    return deepSeekService.getStreamingResponse(prompt);
}
```

UI Update (JavaScript):

```javascript
// Replace the fetch call with:
const eventSource = new EventSource(`/api/chat?prompt=${encodeURIComponent(prompt)}`);
eventSource.onmessage = (event) => {
    // Append chunks to the message bubble incrementally
    bubble.textContent += event.data;
};
```

Note that EventSource always issues GET requests, so the endpoint must also be reachable via GET for this snippet to work as written; a minimal way to achieve that is sketched below.
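A dedicated GET mapping for the SSE stream can sit alongside the existing POST endpoint. The sketch below is an illustrative assumption rather than part of the original controller, and the /api/chat/stream path is just an example choice:

```java
// Additional GET endpoint so that EventSource (which only supports GET)
// can consume the same streaming response as Server-Sent Events.
@GetMapping(value = "/api/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
@ResponseBody
public Flux<String> chatStreamSse(@RequestParam String prompt) {
    return deepSeekService.getStreamingResponse(prompt);
}
```

With such a mapping in place, the EventSource URL simply changes to /api/chat/stream?prompt=..., while the fetch-based approach from the chat.html sketch continues to use the POST endpoint.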
2) Tune HTTP Timeouts

Add the below example entries to application.properties to prevent long waits. These keys may differ based on the versions of Ollama and Spring Boot, so verify them against your setup:

```properties
# Reduce connection/read timeouts (in milliseconds)
spring.ai.ollama.client.connect-timeout=5000
spring.ai.ollama.client.read-timeout=30000
```

Hardware/Infrastructure Tweaks

1) Allocate More RAM to Ollama

Linux/macOS: Start Ollama with room for more loaded models in memory:

```
OLLAMA_MAX_LOADED_MODELS=3 ollama serve   # Allow more models to stay loaded in memory
```

2) Run Ollama and Spring Boot on the Same Machine

Avoid network latency by running both locally during development.

Monitor Bottlenecks

1) Check Ollama Logs

Look in the Ollama logs for warnings such as slow GPU or insufficient RAM.

2) Profile Spring Boot

Add Actuator to monitor request times by adding the below dependency to the Spring Boot project:

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```

Check metrics at http://localhost:8080/actuator/metrics/http.server.requests.

Expected Results

| Optimization               | Expected Speed Improvement (approximate) |
|----------------------------|------------------------------------------|
| Smaller model (7B vs 33B)  | 2-3x faster                              |
| Response token limit       | 1.5-2x faster                            |
| GPU acceleration           | 5-10x faster                             |
| Streaming responses        | Perceived 2x faster                      |

Conclusion

Building a Spring Boot Chat Application with DeepSeek and Ollama is a great way to integrate AI-powered conversations into our projects. By setting up DeepSeek locally with Ollama, we get faster responses, better control over the AI model, and enhanced privacy. This application demonstrates how Spring Boot simplifies backend development while DeepSeek provides intelligent and dynamic chat capabilities. Whether you are building a chatbot for customer support, personal assistance, or any other AI-driven application, this setup is a solid foundation. With further enhancements like database integration, authentication, and real-time updates, you can expand the application's functionality. The possibilities are endless: start experimenting, customize as needed, and build smarter chat applications with DeepSeek!

You may also go through a detailed article on 'DeepSeek Spring AI Integration with Spring Boot'. For other articles, kindly visit the Spring Boot Concept Tutorials with Implementations section.

References:

- https://docs.spring.io/spring-ai/reference/api/chatclient.html
- https://docs.spring.io/spring-ai/reference/api/chat/deepseek-chat.html