Spring AI with Gemini (Free Tier): Build AI-Powered Java Apps Without Spending a Penny

by devs5003 · May 6, 2026 (last updated May 8, 2026)

If you've been wanting to add AI capabilities to your Spring Boot applications but don't want to deal with billing accounts or complex cloud setups, this guide is exactly for you. Google provides a free tier for its Gemini models via Google AI Studio, and Spring AI makes consuming it surprisingly simple. No Vertex AI. No Google Cloud project. Just an API key and a few lines of Java code. Let's walk through everything from scratch.

What is Spring AI?

Spring AI is a framework that brings the power of Large Language Models (LLMs) into the familiar world of Spring Boot. Think of it as Spring Data, but for AI: instead of connecting to a database, you connect to an AI model. It gives you a consistent API to chat with OpenAI, Google Gemini, Anthropic Claude, and others, so you're not locked into any single provider.

Here's what makes it really useful for Java developers:

- Model Abstraction: Switch between AI providers without rewriting business logic
- Prompt Templates: Define reusable, parameterized prompts cleanly
- Streaming Responses: Get responses token-by-token, great for chat UIs
- Spring Boot Integration: Works with @Bean, dependency injection, and application.properties like any other Spring component

Why Gemini Free Tier?

Google's Gemini API, available through Google AI Studio, offers a genuinely free plan with no billing account required. This makes it perfect for:

- Proof-of-concepts (POCs) and demos
- Learning Spring AI without investing money
- Personal projects and side experiments
- Interview-prep apps, chatbots, Q&A tools, etc.

The free models available include Gemini 2.0 Flash (very capable, fast, and at $0.00 cost for input/output tokens) and Gemini 2.5 Pro (limited to 25 requests/day on the free tier).

Important distinction: the free Gemini API (via AI Studio) is completely different from Vertex AI. Vertex AI requires a Google Cloud project with billing enabled. The AI Studio API just needs a Google account and an API key.
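Before writing any Java, you can sanity-check your key directly against the OpenAI-compatible endpoint with cURL. A minimal sketch, assuming your key is exported as GEMINI_API_KEY; the URL, path, and model name are the same ones used throughout this guide:

# Verify the free-tier key works (no Spring required)
curl https://generativelanguage.googleapis.com/v1beta/openai/chat/completions \
  -H "Authorization: Bearer $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-2.0-flash", "messages": [{"role": "user", "content": "Say hello in one sentence"}]}'

If a JSON response with a choices array comes back, your key and the endpoint are good to go.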
Google AI Studio vs Vertex AI

| Feature | Google AI Studio (Free) | Vertex AI (Paid) |
| --- | --- | --- |
| Sign-up | Google account only | Google Cloud account + billing |
| Authentication | Simple API key | Service account / OAuth |
| Cost | Free (rate limits apply) | Pay per token |
| Enterprise features | None | Encryption, VPC, data residency |
| Ideal for | POCs, learning, testing | Production, enterprise apps |
| Playground | AI Studio | Vertex AI Studio |

The Secret: OpenAI Compatibility

Here's the clever trick: Google made its Gemini API compatible with OpenAI's interface. That means you can use Spring AI's spring-ai-starter-model-openai library to talk to Gemini, with no Gemini-specific SDK needed. You simply point the base URL to Google's endpoint instead of OpenAI's. This approach saves you from the complex Vertex AI setup and lets you use a single, well-supported library.

Architecture Overview

Here's what the full request/response flow looks like inside your application: when your user hits the REST endpoint, the request flows through ChatClient, which uses the OpenAI-compatible adapter pointed at Google's servers. Gemini processes the prompt and sends back a response that Spring AI wraps into a clean Java object.

User

This is the starting point: a person (or client system) sending a message, typically via an HTTP POST request to your REST endpoint. For example, calling /chat/ask with a question like "Explain the difference between HashMap and ConcurrentHashMap in Java".

REST Controller (ChatController)

The entry point of your Spring Boot application. The @RestController receives the HTTP request, extracts the user's message from the request body, and delegates it to the next layer. It doesn't contain any AI logic itself; it just routes the call.

ChatClient (Spring AI)

This is the core abstraction provided by Spring AI. ChatClient is the equivalent of a template class, similar to JdbcTemplate or RestTemplate, but for AI models. It takes your prompt, manages the request lifecycle, and handles the response. You interact with it using the fluent chain: chatClient.prompt(msg).call().content().

OpenAI-Compatible Adapter

This is the clever bridge layer. Instead of using a Gemini-specific SDK, Spring AI uses its OpenAI adapter, but points it at Google's servers. Google designed Gemini's API to be compatible with the OpenAI REST interface, so this adapter translates your ChatClient call into a standard HTTP request that Gemini understands. This is why you use spring-ai-starter-model-openai in your pom.xml and set the base-url to https://generativelanguage.googleapis.com/v1beta/openai.

Google Gemini API (Free Tier)

This is Google's server-side API endpoint, hosted at generativelanguage.googleapis.com. It receives the HTTP request (with your API key in the header), authenticates it, and forwards it to the actual AI model. The free tier allows up to 1,500 requests/day for Gemini 2.0 Flash with no billing required.

Gemini Model (gemini-2.0-flash)

This is the actual Large Language Model (LLM), the AI brain. It processes your prompt, generates a response token by token, and sends it back up the chain. gemini-2.0-flash is Google's fast, lightweight model optimized for speed and high request volume, making it perfect for the free tier.
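To make the adapter layer concrete, here is roughly what it does on your behalf. This is an illustrative sketch only, not Spring AI's actual internals, and you never write this yourself: it sends an OpenAI-style chat-completions payload to Google's endpoint using Spring's RestClient (assumes GEMINI_API_KEY is set and spring-boot-starter-web, which brings Jackson for JSON serialization, is on the classpath):

// RawGeminiCall.java -- roughly what the OpenAI-compatible adapter does for you
import java.util.List;
import java.util.Map;
import org.springframework.http.MediaType;
import org.springframework.web.client.RestClient;

public class RawGeminiCall {

    public static void main(String[] args) {
        RestClient client = RestClient.builder()
                .baseUrl("https://generativelanguage.googleapis.com/v1beta/openai")
                .defaultHeader("Authorization", "Bearer " + System.getenv("GEMINI_API_KEY"))
                .build();

        // OpenAI-style chat-completions payload; Gemini accepts the same shape
        Map<String, Object> body = Map.of(
                "model", "gemini-2.0-flash",
                "messages", List.of(Map.of("role", "user", "content", "Say hello in one line")));

        String json = client.post()
                .uri("/chat/completions")
                .contentType(MediaType.APPLICATION_JSON)
                .body(body)
                .retrieve()
                .body(String.class);

        System.out.println(json);
    }
}

ChatClient hides all of this behind prompt().call().content(), which is exactly why the adapter trick works so cleanly.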
The Return Flow

The response travels back up the same stack in reverse:

1. Gemini model generates the text response
2. Google's API wraps it in an OpenAI-compatible JSON response
3. Spring AI's OpenAI adapter deserializes it into a ChatResponse object
4. ChatClient extracts the content
5. ChatController returns it as an HTTP response to the user

Step-by-Step: Build Spring AI with Gemini Free Tier

Step#1: Get Your Free API Key

1. Go to Google AI Studio and sign in with your Google account
2. Click "Get API Key" → "Create API Key"
3. Copy the key and store it securely. Never hardcode it in source code
4. Set it as an environment variable: export GEMINI_API_KEY=your_key_here

Step#2: Create the Spring Boot Project

Use https://start.spring.io and add:

- Spring Web (for REST endpoints)
- OpenAI (yes, OpenAI, not Vertex AI Gemini)

Your pom.xml should look like this; the spring-ai-bom (Bill of Materials) ensures all Spring AI components use compatible versions:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Step#3: Configure application.properties

This is the most important part. Instead of pointing to OpenAI's servers, you redirect to Google's OpenAI-compatible endpoint:

# application.properties
spring.ai.openai.api-key=${GEMINI_API_KEY}
spring.ai.openai.base-url=https://generativelanguage.googleapis.com/v1beta/openai
spring.ai.openai.chat.completions-path=/chat/completions
spring.ai.openai.chat.options.model=gemini-2.0-flash-exp

Or in application.yml if you prefer YAML:

spring:
  ai:
    openai:
      api-key: "${GEMINI_API_KEY}"
      base-url: https://generativelanguage.googleapis.com/v1beta/openai
      chat:
        completions-path: /chat/completions
        options:
          model: gemini-2.0-flash-exp

Want to try the more powerful model? Just change the model name to gemini-2.5-pro-exp-03-25, but remember the 25 requests/day cap on the free tier.

Step#4: Create the ChatClient Bean

Spring AI's ChatClient is your primary interface for talking to Gemini. Register it as a Spring @Bean:

// ChatClientConfig.java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}

Spring Boot's auto-configuration does most of the heavy lifting here. You just build the client from the injected Builder, and it picks up all your application.properties settings automatically.

Step#5: Build the REST Controller

Now let's expose a simple chat endpoint:

// ChatController.java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @PostMapping("/ask")
    public String ask(@RequestBody String userMessage) {
        return chatClient
                .prompt(userMessage)
                .call()
                .content();
    }
}

The prompt() → call() → content() chain is the core Spring AI pattern. It sends the message, waits for the full response, and returns it as a plain String.
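If you prefer structured JSON in and out rather than raw text, the same chain can back a typed endpoint. A small illustrative variant; ChatRequest and ChatAnswer are hypothetical record names, not part of Spring AI:

// Hypothetical request/response records for a JSON-based endpoint
record ChatRequest(String message) {}
record ChatAnswer(String answer) {}

@PostMapping("/ask-json")
public ChatAnswer askJson(@RequestBody ChatRequest request) {
    // Same prompt() -> call() -> content() chain, wrapped in a typed response
    return new ChatAnswer(
            chatClient.prompt(request.message()).call().content());
}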
Step#6: Add a Dedicated Service Layer

In a real application you don't want controllers talking to ChatClient directly. Here's a service class using proper Spring patterns:

// GeminiChatService.java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class GeminiChatService {

    private final ChatClient chatClient;

    public GeminiChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String askQuestion(String question) {
        return chatClient
                .prompt(question)
                .call()
                .content();
    }

    public String askWithPersona(String systemPrompt, String userQuestion) {
        return chatClient
                .prompt()
                .system(systemPrompt)
                .user(userQuestion)
                .call()
                .content();
    }

    public Flux<String> streamAnswer(String question) {
        return chatClient
                .prompt(question)
                .stream()
                .content();
    }
}

This also makes the controller much cleaner: it simply injects GeminiChatService and delegates to it instead of calling ChatClient directly.

Step#7: Add a System Prompt (Optional but Recommended)

System prompts let you set the AI's persona or the scope of its answers. For example, if you're building a Java interview prep tool:

@PostMapping("/java-interview")
public String javaInterview(@RequestBody String question) {
    return chatClient
            .prompt()
            .system("You are an expert Java interviewer. Answer questions clearly with code examples. Focus on Java 17+ features and Spring Boot.")
            .user(question)
            .call()
            .content();
}

This is incredibly useful when you want Gemini to stay focused on a specific domain without going off-topic.

Step#8: Multimodal: Image + Text

Gemini is multimodal: it can analyze images. This is a powerful feature many readers won't know Spring AI supports. A sketch using ChatClient's fluent media support (the exact Media/mime-type API can vary slightly between Spring AI versions):

// Requires a vision-capable model (gemini-2.0-flash handles images)
import java.net.MalformedURLException;
import org.springframework.core.io.UrlResource;
import org.springframework.util.MimeTypeUtils;

@PostMapping("/analyze-image")
public String analyzeImage(@RequestParam String imageUrl,
                           @RequestParam String question) throws MalformedURLException {
    return chatClient
            .prompt()
            .user(u -> u.text(question)
                        .media(MimeTypeUtils.IMAGE_JPEG, new UrlResource(imageUrl)))
            .call()
            .content();
}

Step#9: Testing Spring AI

Here's a simple test using MockMvc and a mocked ChatClient. Because prompt().call().content() is a fluent chain, the mock needs deep stubs; otherwise prompt() would return null and the chain would fail:

// ChatControllerTest.java
// Static imports assumed: Mockito.when/anyString, MockMvcRequestBuilders.post,
// MockMvcResultMatchers.status/content, Hamcrest containsString
@WebMvcTest(ChatController.class)
class ChatControllerTest {

    @MockBean(answer = Answers.RETURNS_DEEP_STUBS)
    ChatClient chatClient;

    @Autowired
    MockMvc mockMvc;

    @Test
    void shouldReturnAiResponse() throws Exception {
        when(chatClient.prompt(anyString()).call().content())
                .thenReturn("HashMap is not thread-safe.");

        mockMvc.perform(post("/chat/ask")
                        .content("Explain HashMap")
                        .contentType(MediaType.TEXT_PLAIN))
                .andExpect(status().isOk())
                .andExpect(content().string(containsString("HashMap")));
    }
}

Step#10: Test With cURL

Run your Spring Boot application (mvn spring-boot:run) and test:

# Basic chat
curl -X POST http://localhost:8080/chat/ask \
  -H "Content-Type: text/plain" \
  -d "Explain the difference between HashMap and ConcurrentHashMap in Java"

# Java interview endpoint
curl -X POST http://localhost:8080/chat/java-interview \
  -H "Content-Type: text/plain" \
  -d "What is the purpose of the volatile keyword in Java?"

You'll get a detailed AI-generated response from Gemini within seconds.

Going Further: Streaming Responses

For a real chat-like experience, use streaming instead of waiting for the full response:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String message) {
    return chatClient
            .prompt(message)
            .stream()
            .content();
}

The stream() method returns a Flux<String>, which emits tokens as Gemini generates them, perfect for building real-time chat UIs with Server-Sent Events (SSE).
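You can watch the stream arrive from the command line; curl's -N flag disables output buffering, so each token prints the moment Gemini emits it:

# Watch SSE tokens arrive in real time
curl -N "http://localhost:8080/chat/stream?message=Explain%20Java%20records%20in%20two%20sentences"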
Advanced: Prompt Templates

Hard-coding prompts inside controller methods gets messy fast. Spring AI's PromptTemplate keeps things clean and reusable:

import java.util.Map;
import org.springframework.ai.chat.prompt.PromptTemplate;

@PostMapping("/review")
public String reviewCode(@RequestBody String code) {
    PromptTemplate template = new PromptTemplate("""
            You are a senior Java developer. Review the following code and
            highlight any issues, suggest improvements, and explain best practices:

            {code}
            """);
    var prompt = template.create(Map.of("code", code));
    return chatClient.prompt(prompt).call().content();
}

Templates support {variable} placeholders, making it trivial to build parameterized AI prompts for things like code review, interview feedback, or document summarization.

Free Tier Limits to Know

Before you go build-crazy, here are the rate limits to keep in mind:

| Model | Requests/Minute | Requests/Day | Notes |
| --- | --- | --- | --- |
| Gemini 2.0 Flash | 15 RPM | 1,500/day | Best for free dev |
| Gemini 2.5 Flash | 10 RPM | 500/day | Good balance |
| Gemini 2.5 Pro | 5 RPM | 25/day | High quality, limited |

For POCs and learning, Gemini 2.0 Flash is your best bet. It's fast, capable, and has the most generous free quota.

"Free Tier vs Paid Migration" Checklist

A practical checklist for when you outgrow the free tier:

- Replace the AI Studio API key with Vertex AI service account credentials
- Change base-url to your Google Cloud project's Vertex endpoint
- Add the spring-ai-starter-model-vertex-ai-gemini dependency
- Set up IAM roles (roles/aiplatform.user)
- Enable request logging and monitoring
- Remove retry workarounds (the paid tier handles higher volume natively)

Production Considerations

The free tier is genuinely great for learning and prototyping, but before you go live, keep these in mind:

- Never commit your API key: use environment variables or Spring Cloud Config
- Add proper error handling: handle 429 Too Many Requests gracefully with retry logic
- Switch to Vertex AI for production: it provides SLAs, enterprise security, and no rate limits for paying customers
- Use @Value("${spring.ai.openai.api-key}") validation at startup to catch missing keys early
- Cache common responses: don't call Gemini for the same question twice if you can avoid it (a minimal caching sketch follows this list)
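One way to implement that caching advice is with Spring's own caching support. A minimal sketch, assuming spring-boot-starter-cache is on the classpath and @EnableCaching is declared somewhere in your configuration; the cache name and service class are illustrative:

// CachedChatService.java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class CachedChatService {

    private final ChatClient chatClient;

    public CachedChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // The question text is the cache key: repeated questions are served
    // from the cache and never burn free-tier quota
    @Cacheable("gemini-answers")
    public String ask(String question) {
        return chatClient.prompt(question).call().content();
    }
}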
What Can You Build With This?

Here are some practical project ideas to get you going:

- Java Interview Chatbot: Answer interview questions with detailed explanations and code examples
- Code Reviewer API: Accept a code snippet, return feedback and suggestions
- Spring Boot Documentation Assistant: Ask questions about your own codebase
- Resume Analyser: Match a JD against a resume and score relevance
- SQL Query Generator: Convert plain English to SQL queries

All of these are completely achievable on the free Gemini tier with Spring AI, and each one makes for a great portfolio project or digital product.

FAQs

Q#1. Is Google Gemini API completely free for Spring Boot projects?

Yes, absolutely. Google Gemini offers a free tier via Google AI Studio with no billing account required. You simply sign in with your Google account, generate an API key, and start using it. The Gemini 2.0 Flash model allows up to 1,500 requests per day for free, more than enough for learning, prototyping, and building personal projects with Spring Boot.

Q#2. Do I need Vertex AI to use Gemini with Spring AI?

No, not at all. This is one of the most common misconceptions. You do not need a Google Cloud project, a service account, or Vertex AI. Instead, you can use the spring-ai-starter-model-openai dependency and simply point the base-url to Google's OpenAI-compatible endpoint: https://generativelanguage.googleapis.com/v1beta/openai. No Vertex AI. No Google Cloud billing. Just a free API key.

Q#3. Which Gemini model is best for the Spring AI free tier?

Gemini 2.0 Flash is the best choice for free tier usage. Here's why:

- Up to 1,500 requests/day (the most generous free quota)
- Fast response times, optimized for low latency
- Supports multimodal input (text + images)
- More than capable for chat, Q&A, and code review use cases

If you need higher quality output and can live with only 25 requests/day, gemini-2.5-pro is available on the free tier as well.

Q#4. What is the difference between Spring AI and LangChain4j?

Both are Java frameworks for building AI-powered applications, but they serve slightly different audiences:

| Feature | Spring AI | LangChain4j |
| --- | --- | --- |
| Native Spring support | Yes | Partial |
| Learning curve for Spring devs | Low | Medium–High |
| Agent & RAG workflows | Basic | Advanced |
| @Bean / application.properties | Fully supported | Not native |
| Best for | Spring Boot apps | Complex AI pipelines |

Spring AI feels like home if you already know Spring Boot. LangChain4j shines when you need advanced agent orchestration, memory chains, or complex RAG (Retrieval-Augmented Generation) pipelines.

Q#5. Can Spring AI stream responses from Gemini?

Yes! Spring AI supports streaming out of the box. Instead of waiting for the full response, you can use the .stream() method, which returns a Flux<String> emitting tokens in real time as Gemini generates them:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
    return chatClient
            .prompt(message)
            .stream()
            .content();
}

This is perfect for building real-time chat UIs using Server-Sent Events (SSE). The user sees the response appearing word by word, just like ChatGPT, instead of waiting for the whole answer.

Q#6. How do I handle rate limit errors (429) when using Gemini Free Tier in Spring AI?

Rate limit errors are inevitable on the free tier, especially with gemini-2.5-pro (only 25 req/day). Spring AI has built-in retry support that you can configure directly in application.properties:

# Retry configuration for rate limits
spring.ai.retry.max-attempts=3
spring.ai.retry.on-http-codes=429,500,503
spring.ai.retry.backoff.initial-interval=2000
spring.ai.retry.backoff.multiplier=2
spring.ai.retry.backoff.max-interval=30000

This tells Spring AI to automatically retry up to 3 times with exponential backoff, starting at 2 seconds and doubling each time. For more control, you can also add a global exception handler:

@RestControllerAdvice
public class AiExceptionHandler {

    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> handleAiError(Exception ex) {
        if (ex.getMessage() != null && ex.getMessage().contains("429")) {
            return ResponseEntity
                    .status(HttpStatus.TOO_MANY_REQUESTS)
                    .body("Rate limit reached. Please wait a moment and try again.");
        }
        return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body("AI service error: " + ex.getMessage());
    }
}
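Retries and error handlers are reactive; you can also stay under the limit proactively. A naive client-side throttle sketch (the class name is illustrative; a real app would reach for a proper rate limiter such as Resilience4j). Spacing calls at least 4 seconds apart keeps a single instance under gemini-2.0-flash's 15 requests/minute:

// SimpleThrottle.java -- illustrative, not production-grade
import java.time.Duration;
import java.time.Instant;

public class SimpleThrottle {

    private final Duration minInterval;
    private Instant lastCall = Instant.EPOCH;

    public SimpleThrottle(Duration minInterval) {
        this.minInterval = minInterval;
    }

    // Blocks until at least minInterval has passed since the previous call
    public synchronized void acquire() throws InterruptedException {
        Instant earliest = lastCall.plus(minInterval);
        Instant now = Instant.now();
        if (now.isBefore(earliest)) {
            Thread.sleep(Duration.between(now, earliest).toMillis());
        }
        lastCall = Instant.now();
    }
}

Call new SimpleThrottle(Duration.ofSeconds(4)).acquire() before each Gemini request: 60 seconds divided by 15 RPM works out to one call every 4 seconds.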
Best practices to avoid hitting rate limits:

- Use gemini-2.0-flash as your default (1,500 req/day is generous)
- Cache responses for identical or similar prompts using Spring Cache (@Cacheable)
- Add a request queue for high-traffic endpoints
- Log API usage to monitor how close you are to the daily limit
- Switch to the paid tier (Vertex AI) once your app goes to production

Quick Reference

- 📦 Dependency: spring-ai-starter-model-openai
- 🔑 API key from: aistudio.google.com
- 🌐 Base URL: https://generativelanguage.googleapis.com/v1beta/openai
- 🤖 Model: gemini-2.0-flash-exp
- 💡 Pattern: chatClient.prompt(msg).call().content()
- 🆓 Free limit: 1,500 req/day (Gemini 2.0 Flash)

Spring AI with Gemini is one of those rare combinations where the barrier to entry is almost zero: a Google account, a few Maven dependencies, and about 20 lines of configuration. The fact that Google made Gemini OpenAI-compatible means the entire Spring AI ecosystem just works, without any Gemini-specific plumbing. For Java developers who want to experiment with AI, this is the most practical starting point available.

Spring AI concepts article: How to integrate AI with Spring Boot? For other Java-related topics, kindly go through: Microservices Tutorial, Spring Boot Tutorial, Core Java, System Design Tutorial, Java MCQs/Quizzes, Java Design Patterns, etc.