Spring AI with Gemini (Free Tier): Build AI-Powered Java Apps Without Spending a Penny

If you’ve been wanting to add AI capabilities to your Spring Boot applications but don’t want to deal with billing accounts or complex cloud setups, this guide is exactly for you. Google provides a free tier for its Gemini models via Google AI Studio, and Spring AI makes consuming it surprisingly simple. No Vertex AI. No Google Cloud project. Just an API key and a few lines of Java code.

Let’s walk through everything from scratch.

Spring AI is a framework that brings the power of Large Language Models (LLMs) into the familiar world of Spring Boot. Think of it like Spring Data, but for AI; instead of connecting to a database, you connect to an AI model. It gives you a consistent API to chat with OpenAI, Google Gemini, Anthropic Claude, and others, so you’re not locked into any single provider.

Here’s what makes it really useful for Java developers:

  • Model Abstraction: Switch between AI providers without rewriting business logic

  • Prompt Templates: Define reusable, parameterized prompts cleanly

  • Streaming Responses: Get responses token-by-token, great for chat UIs

  • Spring Boot Integration: Works with @Bean, dependency injection, and application.properties like any other Spring component

Why Gemini Free Tier?

Google’s Gemini API, available through Google AI Studio, offers a genuinely free plan with no billing account required. This makes it perfect for:

  • Proof-of-concepts (POCs) and demos

  • Learning Spring AI without investing money

  • Personal projects and side experiments

  • Interview-prep apps, chatbots, Q&A tools, etc.

The free models available include Gemini 2.0 Flash (very capable, fast, and at $0.00 cost for input/output tokens) and Gemini 2.5 Pro (limited to 25 requests/day on free tier).

Important distinction: The free Gemini API (via AI Studio) is completely different from Vertex AI. Vertex AI requires a Google Cloud project with billing enabled. The AI Studio API just needs a Google account and an API key.

Google AI Studio vs Vertex AI

Feature             | Google AI Studio (Free)   | Vertex AI (Paid)
Sign-up             | Google account only       | Google Cloud account + billing
Authentication      | Simple API key            | Service account / OAuth
Cost                | Free (rate limits apply)  | Pay per token
Enterprise features | None                      | Encryption, VPC, data residency
Ideal for           | POCs, learning, testing   | Production, enterprise apps
Playground          | AI Studio                 | Vertex AI Studio

The Secret: OpenAI Compatibility

Here’s the clever trick. Google made its Gemini API compatible with OpenAI’s interface. That means you can use Spring AI’s ‘spring-ai-starter-model-openai’ library to talk to Gemini, no Gemini-specific SDK needed. You simply point the base URL to Google’s endpoint instead of OpenAI’s.

This approach saves you from the complex Vertex AI setup and lets you use a single, well-supported library.

Architecture Overview

Here’s what the full request/response flow looks like inside your application:

When your user hits the REST endpoint, the request flows through ChatClient, which uses the OpenAI-compatible adapter pointed at Google’s servers. Gemini processes the prompt and sends back a response that Spring AI wraps into a clean Java object.

User

This is the starting point: a person (or client system) sending a message, typically via an HTTP POST request to your REST endpoint. For example, calling /chat/ask with a question like “Explain the difference between HashMap and ConcurrentHashMap in Java”.

REST Controller (ChatController)

The entry point of your Spring Boot application. The @RestController receives the HTTP request, extracts the user’s message from the request body, and delegates it to the next layer. It doesn’t contain any AI logic itself. It just routes the call.

ChatClient (Spring AI)

This is the core abstraction provided by Spring AI. ChatClient is the equivalent of a template class, similar to JdbcTemplate or RestTemplate, but for AI models. It takes your prompt, manages the request lifecycle, and handles the response. You interact with it using the fluent chain: chatClient.prompt(msg).call().content().

OpenAI-Compatible Adapter

This is the clever bridge layer. Instead of using a Gemini-specific SDK, Spring AI uses its OpenAI adapter, but points it at Google’s servers. Google designed Gemini’s API to be fully compatible with the OpenAI REST interface, so this adapter translates your ChatClient call into a standard HTTP request that Gemini understands. This is why you use spring-ai-starter-model-openai in your pom.xml and set the base-url to https://generativelanguage.googleapis.com/v1beta/openai.

Google Gemini API (Free Tier)

This is Google’s server-side API endpoint, hosted at generativelanguage.googleapis.com. It receives the HTTP request (with your API key in the header), authenticates it, and forwards it to the actual AI model. The free tier allows up to 1,500 requests/day for Gemini 2.0 Flash with no billing required.

Gemini Model (gemini-2.0-flash)

This is the actual Large Language Model (LLM), the AI brain. It processes your prompt, generates a response token by token, and sends it back up the chain. gemini-2.0-flash is Google’s fast, lightweight model optimized for speed and high request volume, making it perfect for the free tier.

The Return Flow

The response travels back up the same stack in reverse:

  1. Gemini model generates the text response
  2. Google’s API wraps it in an OpenAI-compatible JSON response
  3. Spring AI’s OpenAI adapter deserializes it into a ChatResponse object
  4. ChatClient extracts the content
  5. ChatController returns it as an HTTP response to the user

Step-by-Step: Build Spring AI with Gemini Free Tier

Step#1: Get Your Free API Key

  1. Go to Google AI Studio and sign in with your Google account

  2. Click “Get API Key” → “Create API Key”

  3. Copy the key and store it securely. Never hardcode it in source code

  4. Set it as an environment variable:
    export GEMINI_API_KEY=your_key_here

Step#2: Create the Spring Boot Project

Use https://start.spring.io and add:

  • Spring Web (for REST endpoints)

  • OpenAI (yes, OpenAI, not Vertex AI Gemini)

Your pom.xml should look like this. The spring-ai-bom (Bill of Materials) ensures all Spring AI components use compatible versions:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Step#3: Configure application.properties

This is the most important part. Instead of pointing to OpenAI’s servers, you redirect to Google’s OpenAI-compatible endpoint:

# application.properties

spring.ai.openai.api-key=${GEMINI_API_KEY}
spring.ai.openai.base-url=https://generativelanguage.googleapis.com/v1beta/openai
spring.ai.openai.chat.completions-path=/chat/completions
spring.ai.openai.chat.options.model=gemini-2.0-flash-exp

Or in application.yml if you prefer YAML:

spring:
  ai:
    openai:
      api-key: "${GEMINI_API_KEY}"
      base-url: https://generativelanguage.googleapis.com/v1beta/openai
      chat:
        completions-path: /chat/completions
        options:
          model: gemini-2.0-flash-exp

Want to try the more powerful model? Just change the model name to gemini-2.5-pro-exp-03-25, but remember the 25 requests/day cap on the free tier.

Step#4: Create the ChatClient Bean

Spring AI’s ChatClient is your primary interface for talking to Gemini. Register it as a Spring @Bean:

// ChatClientConfig.java

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {
    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.build();
    }
}

Spring Boot’s auto-configuration does most of the heavy lifting here. You just build the client from the injected Builder, and it picks up all your ‘application.properties’ settings automatically.
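If you want every request to share cross-cutting defaults, the builder can carry them too. A minimal sketch (the system prompt text here is purely illustrative, not part of the original setup):

```java
// ChatClientConfig.java -- variation of the bean above with a default system prompt

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        // defaultSystem() applies this system prompt to every request
        // unless an individual call overrides it
        return builder
                .defaultSystem("You are a helpful assistant for Java developers.")
                .build();
    }
}
```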

Step#5: Build the REST Controller

Now let’s expose a simple chat endpoint:

// ChatController.java

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;
    public ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @PostMapping("/ask")
    public String ask(@RequestBody String userMessage) {
        return chatClient
                .prompt(userMessage)
                .call()
                .content();
    }
}

The prompt() → call() → content() chain is the core Spring AI pattern. It sends the message, waits for the full response, and returns it as a plain String.

Step#6: Add a Service Layer

Wrap the ChatClient calls in a dedicated service class, following standard Spring layering:

// GeminiChatService.java

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class GeminiChatService {

    private final ChatClient chatClient;

    public GeminiChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String askQuestion(String question) {
        return chatClient
            .prompt(question)
            .call()
            .content();
    }

    public String askWithPersona(String systemPrompt, String userQuestion) {
        return chatClient
            .prompt()
            .system(systemPrompt)
            .user(userQuestion)
            .call()
            .content();
    }

    public Flux<String> streamAnswer(String question) {
        return chatClient
            .prompt(question)
            .stream()
            .content();
    }
}

This also makes the controller much cleaner: it simply delegates to the service instead of calling ChatClient directly.
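As a sketch, the refactored controller might look like this (GeminiChatService is the service class defined above):

```java
// ChatController.java -- refactored to delegate to the service layer

import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final GeminiChatService chatService;

    public ChatController(GeminiChatService chatService) {
        this.chatService = chatService;
    }

    @PostMapping("/ask")
    public String ask(@RequestBody String userMessage) {
        // No AI logic here; the service owns the ChatClient interaction
        return chatService.askQuestion(userMessage);
    }
}
```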

Step#7: Add a System Prompt (Optional but Recommended)

System prompts let you set the AI’s persona or scope of answers. For example, if you’re building a Java interview prep tool:

@PostMapping("/java-interview")
public String javaInterview(@RequestBody String question) {
    return chatClient
        .prompt()
        .system("""
                You are an expert Java interviewer. Answer questions clearly
                with code examples. Focus on Java 17+ features and Spring Boot.
                """)
        .user(question)
        .call()
        .content();
}

This is incredibly useful when you want Gemini to stay focused on a specific domain without going off-topic.

Step#8: Multimodal: Image + Text

Gemini is multimodal: it can analyze images. This is a powerful feature many readers won’t know Spring AI supports:

// Note: the media API differs slightly between Spring AI versions;
// attach images via the Media class (MimeTypeUtils comes from Spring's
// org.springframework.util package)
@PostMapping("/analyze-image")
public String analyzeImage(@RequestParam String imageUrl,
                           @RequestParam String question) {
    var userMessage = new UserMessage(question,
            List.of(new Media(MimeTypeUtils.IMAGE_PNG, URI.create(imageUrl))));

    return chatClient
        .prompt(new Prompt(userMessage))
        .call()
        .content();
}

Step#9: Testing Spring AI

Here’s a simple test using MockMvc and a mocked ChatClient:

@WebMvcTest(ChatController.class)
class ChatControllerTest {

    // Deep stubs let us stub the fluent prompt().call().content()
    // chain in a single when(...) call
    @MockBean(answer = Answers.RETURNS_DEEP_STUBS)
    ChatClient chatClient;

    @Autowired MockMvc mockMvc;

    @Test
    void shouldReturnAiResponse() throws Exception {
        when(chatClient.prompt(anyString()).call().content())
            .thenReturn("HashMap is not thread-safe.");

        mockMvc.perform(post("/chat/ask")
                .content("Explain HashMap")
                .contentType(MediaType.TEXT_PLAIN))
            .andExpect(status().isOk())
            .andExpect(content().string(containsString("HashMap")));
    }
}

Step#10: Test With cURL

Run your Spring Boot application (mvn spring-boot:run) and test:

# Basic chat
curl -X POST http://localhost:8080/chat/ask \
  -H "Content-Type: text/plain" \
  -d "Explain the difference between HashMap and ConcurrentHashMap in Java"

# Java interview endpoint
curl -X POST http://localhost:8080/chat/java-interview \
  -H "Content-Type: text/plain" \
  -d "What is the purpose of the volatile keyword in Java?"

You’ll get a detailed AI-generated response from Gemini within seconds.

Going Further: Streaming Responses

For a real chat-like experience, use streaming instead of waiting for the full response:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamChat(@RequestParam String message) {
    return chatClient
        .prompt(message)
        .stream()
        .content();
}

The stream() method returns a Flux<String> that emits tokens as Gemini generates them. This is perfect for building real-time chat UIs with Server-Sent Events (SSE).
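You can watch the tokens arrive with curl by disabling output buffering (the endpoint path matches the example above; adjust it if yours differs):

```shell
# -N disables buffering so SSE events print as they arrive
curl -N "http://localhost:8080/chat/stream?message=Explain%20Java%20records"
```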

Advanced: Prompt Templates

Hard-coding prompts inside controller methods gets messy fast. Spring AI’s PromptTemplate keeps things clean and reusable:

import org.springframework.ai.chat.prompt.PromptTemplate;

@PostMapping("/review")
public String reviewCode(@RequestBody String code) {
  PromptTemplate template = new PromptTemplate("""
    You are a senior Java developer.
    Review the following code and highlight any issues,
    suggest improvements, and explain best practices:
    {code}
    """);

   var prompt = template.create(Map.of("code", code));
   return chatClient.prompt(prompt).call().content();
}

Templates support {variable} placeholders, making it trivial to build parameterized AI prompts for things like code review, interview feedback, or document summarization.

Free Tier Limits to Know

Before you go build-crazy, here are the rate limits to keep in mind:

Model            | Requests/Minute | Requests/Day | Notes
Gemini 2.0 Flash | 15 RPM          | 1,500/day    | Best for free dev
Gemini 2.5 Flash | 10 RPM          | 500/day      | Good balance
Gemini 2.5 Pro   | 5 RPM           | 25/day       | High quality, limited

For POCs and learning, Gemini 2.0 Flash is your best bet. It’s fast, capable, and has the most generous free quota.

“Free Tier vs Paid Migration” Checklist 

A practical checklist for moving from the free AI Studio API to paid Vertex AI:

  • Replace AI Studio API key with Vertex AI service account credentials
  • Change base-url to your Google Cloud project’s Vertex endpoint
  • Add spring-ai-starter-model-vertex-ai-gemini dependency
  • Set up IAM roles (roles/aiplatform.user)
  • Enable request logging and monitoring
  • Remove retry workarounds (paid tier handles higher volume natively)

Production Considerations

The free tier is genuinely great for learning and prototyping, but before you go live, keep these in mind:

  • Never commit your API key: use environment variables or Spring Cloud Config
  • Add proper error handling: handle 429 Too Many Requests gracefully with retry logic
  • Switch to Vertex AI for production: it provides SLAs, enterprise security, and no rate limits for paying customers
  • Use @Value("${spring.ai.openai.api-key}") validation at startup to catch missing keys early
  • Cache common responses: don’t call Gemini for the same question twice if you can avoid it
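The caching advice above can be sketched with Spring’s cache abstraction. This assumes @EnableCaching is configured and a cache provider is on the classpath; the class name and cache name here are illustrative, not from the original walkthrough:

```java
// CachedChatService.java -- illustrative caching sketch

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class CachedChatService {

    private final ChatClient chatClient;

    public CachedChatService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // Identical questions are served from the cache instead of
    // spending free-tier quota on a second Gemini call
    @Cacheable(value = "ai-answers", key = "#question")
    public String askQuestion(String question) {
        return chatClient.prompt(question).call().content();
    }
}
```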

What Can You Build With This?

Here are some practical project ideas to get you going:

  • Java Interview Chatbot: Answer interview questions with detailed explanations and code examples

  • Code Reviewer API: Accept a code snippet, return feedback and suggestions

  • Spring Boot Documentation Assistant: Ask questions about your own codebase

  • Resume Analyser: Match a JD against a resume and score relevance

  • SQL Query Generator: Convert plain English to SQL queries

All of these are completely achievable using the free Gemini tier with Spring AI, and each one makes for a great portfolio project or digital product.

FAQs

Q#1. Is Google Gemini API completely free for Spring Boot projects?

Yes, absolutely. Google Gemini offers a free tier via Google AI Studio with no billing account required. You simply sign in with your Google account, generate an API key, and start using it. The Gemini 2.0 Flash model allows up to 1,500 requests per day for free, which is more than enough for learning, prototyping, and building personal projects with Spring Boot.

Q#2. Do I need Vertex AI to use Gemini with Spring AI?

No, not at all. This is one of the most common misconceptions. You do not need a Google Cloud project, a service account, or Vertex AI. Instead, you can use the spring-ai-starter-model-openai dependency and simply point the base-url to Google’s OpenAI-compatible endpoint:

https://generativelanguage.googleapis.com/v1beta/openai

No Vertex AI. No Google Cloud billing. Just a free API key.

Q#3. Which Gemini model is best for the Spring AI free tier?

Gemini 2.0 Flash is the best choice for free tier usage. Here’s why:

  • Up to 1,500 requests/day (most generous free quota)

  • Fast response times: optimised for low latency

  • Supports multimodal input (text + images)

  • More than capable for chat, Q&A, and code review use cases

If you need higher quality output and can live with only 25 requests/day, gemini-2.5-pro is available on the free tier as well.

Q#4. What is the difference between Spring AI and LangChain4j?

Both are Java frameworks for building AI-powered applications, but they serve slightly different audiences:

Feature                        | Spring AI        | LangChain4j
Native Spring support          | Yes              | Partial
Learning curve for Spring devs | Low              | Medium–High
Agent & RAG workflows          | Basic            | Advanced
@Bean / application.properties | Fully supported  | Not native
Best for                       | Spring Boot apps | Complex AI pipelines

Spring AI feels like home if you already know Spring Boot. LangChain4j shines when you need advanced agent orchestration, memory chains, or complex RAG (Retrieval-Augmented Generation) pipelines.

Q#5. Can Spring AI stream responses from Gemini?

Yes! Spring AI supports streaming out of the box. Instead of waiting for the full response, you can use the .stream() method which returns a Flux emitting tokens in real-time as Gemini generates them:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String message) {
    return chatClient
        .prompt(message)
        .stream()
        .content();
}

This is perfect for building real-time chat UIs using Server-Sent Events (SSE). The user sees the response appearing word by word just like ChatGPT instead of waiting for the whole answer.

Q#6. How do I handle rate limit errors (429) when using Gemini Free Tier in Spring AI?

Rate limit errors are inevitable on the free tier, especially with gemini-2.5-pro (only 25 req/day). Spring AI has built-in retry support that you can configure directly in application.properties:

# Retry configuration for rate limits
spring.ai.retry.max-attempts=3
spring.ai.retry.on-http-codes=429,500,503
spring.ai.retry.backoff.initial-interval=2000
spring.ai.retry.backoff.multiplier=2
spring.ai.retry.backoff.max-interval=30000

This tells Spring AI to automatically retry up to 3 times with an exponential backoff, starting at 2 seconds and doubling each time. For more control, you can also add a global exception handler:

@RestControllerAdvice
public class AiExceptionHandler {

    @ExceptionHandler(Exception.class)
    public ResponseEntity<String> handleAiError(Exception ex) {
        if (ex.getMessage() != null && ex.getMessage().contains("429")) {
            return ResponseEntity
                .status(HttpStatus.TOO_MANY_REQUESTS)
                .body("Rate limit reached. Please wait a moment and try again.");
        }
        return ResponseEntity
            .status(HttpStatus.INTERNAL_SERVER_ERROR)
            .body("AI service error: " + ex.getMessage());
    }
}

Best practices to avoid hitting rate limits:

  • Use gemini-2.0-flash as your default (1,500 req/day is generous)
  • Cache responses for identical or similar prompts using Spring Cache (@Cacheable)
  • Add a request queue for high-traffic endpoints
  • Log API usage to monitor how close you are to the daily limit
  • Switch to paid tier (Vertex AI) once your app goes to production

Quick Reference

📦 Dependency   : spring-ai-starter-model-openai
🔑 API Key From : aistudio.google.com
🌐 Base URL     : https://generativelanguage.googleapis.com/v1beta/openai
🤖 Model        : gemini-2.0-flash-exp
💡 Pattern      : chatClient.prompt(msg).call().content()
🆓 Free Limit   : 1,500 req/day (Gemini 2.0 Flash)

Spring AI with Gemini is one of those rare combinations where the barrier to entry is almost zero: a Google account, a few Maven dependencies, and about 20 lines of configuration. The fact that Google made Gemini OpenAI-compatible means the entire Spring AI ecosystem just works without any Gemini-specific plumbing. For Java developers who want to experiment with AI in 2025, this is the most practical starting point available.


Also see the Spring AI Concepts article: How to integrate AI with Spring Boot?
