Spring AI is Not Just Another Library

Every new technology wave brings a flood of integration libraries, and the LLM revolution is no different. OpenAI’s SDK, Anthropic’s SDK, LangChain ports for every language imaginable—developers have plenty of options for talking to large language models.

So why does Spring AI matter? Why add another library to the stack when you can just import openai and call it a day?

The answer isn’t about Spring AI’s feature checklist. It’s about the architecture decision you make—often implicitly—when you hard-code a single LLM provider into your application.

Over the last few years, LLM platforms have changed models, limits, and APIs on a regular cadence. Providers document rate limits (RPM/TPM), model deprecations, and retirement/upgrade windows; when these change, applications must adapt. This is not hypothetical—it’s the normal operating environment for production LLM integrations.

At the same time, competitors routinely ship models that shift the price‑performance frontier (for example, Anthropic’s Claude 3.5 Sonnet in June 2024). Whether any single benchmark fits your workload or not, credible alternatives appear frequently—making switching costs a first‑order concern.

Spring AI isn’t trying to be a nicer wrapper for one vendor. It answers a different question: How do you build LLM‑powered features without coupling business logic to one provider’s API shape and operational rules? That’s the same portability Spring brought to databases decades ago.

If you’re building something disposable, hard‑code a provider and move on. If you’re building something that must evolve with the model market, abstraction is architectural insurance.


The Real Problem

Here’s what provider lock‑in looks like in code:

// Vendor SDK types (OpenAiService, CompletionRequest) leak into business code
public class ChatService {
    private final OpenAiService openAi; // OpenAI-specific client

    public ChatService(OpenAiService openAi) {
        this.openAi = openAi;
    }

    public String generateResponse(String prompt) {
        // Request and response shapes are the vendor's, not yours
        return openAi.createCompletion(
            CompletionRequest.builder()
                .model("gpt-4")
                .prompt(prompt)
                .build()
        ).getChoices().get(0).getText();
    }
}

Six months later, your provider updates rate limits, retires a model version, or introduces a more suitable API. You’re not changing one call—you’re unpicking token counting, error/timeout and backoff logic, cost tracking, prompt/response shapes, and test doubles that all assumed a single provider. When models are retired, some platforms explicitly state that retired deployments return errors, which means migration work must be scheduled across code, configuration, tests, and operations.

The rational response is to isolate provider specifics behind a stable abstraction so that a change in pricing, limits, or model capability is a configuration decision, not a multi‑file refactor.
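
In plain Java terms, that means business code depends on a narrow port you own rather than a vendor SDK. A minimal sketch (the interface name is mine, not Spring AI’s):

public interface ChatPort {
    // The only contract business code sees; adapters hide vendor details
    String generate(String prompt);
}

Spring AI’s ChatClient plays exactly this role, so you rarely need to hand-roll the port yourself.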


Spring AI’s Approach: Boring (in the Best Way)

Spring AI follows a familiar Spring pattern: a clean, portable API with vendor implementations you swap via configuration.

Program against ChatClient, not a vendor class

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class AssistantController {
    private final ChatClient chat;

    public AssistantController(ChatClient chat) {
        this.chat = chat; // injected bean
    }

    @GetMapping("/ai")
    public String reply(@RequestParam("user") String user) {
        return chat.prompt()
                   .user(user)
                   .call()
                   .content();
    }
}
  • Fluent API (prompt()…call().content()) with synchronous and streaming modes.
  • No provider imports in your business code.

Switch providers by configuration, not refactoring

# OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
# Anthropic
spring:
  ai:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
      chat:
        options:
          model: claude-3-5-sonnet-20240620
          temperature: 0.7

Swap keys/model names and redeploy. Your controllers and services remain unchanged. (See Spring AI’s OpenAI and Anthropic configuration docs.)
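
The other half of the swap is the build file: each provider ships as its own Spring Boot starter. A sketch, assuming the Maven coordinates used by Spring AI’s pre‑1.0 releases (check the docs for your version):

<!-- Before: OpenAI starter -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- After: Anthropic starter -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>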

Streaming without provider ceremony

Flux<String> stream = chat.prompt()
    .user("Tell me a joke about dependency injection.")
    .stream()
    .content();

The same code path works across supported providers; Spring AI handles protocol differences.
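
To expose the stream over HTTP, the standard WebFlux pattern applies. A sketch (the endpoint path and media type are my choices, not Spring AI’s):

@GetMapping(value = "/ai/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamReply(@RequestParam("user") String user) {
    return chat.prompt()
               .user(user)
               .stream()
               .content(); // emits tokens as the provider produces them
}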

Tool / Function calling with annotations

Spring AI lets you expose plain Java methods as tools with the @Tool annotation and (optionally) enrich parameters with @ToolParam. You then register the annotated class as default tools on the ChatClient.Builder, so your business code can call the model without passing tools each time.

Declare tools

import org.springframework.stereotype.Component;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;

@Component // Spring bean; AOT/Graal friendly
public class WeatherTools {

    @Tool(description = "Get current weather in Fahrenheit for a US city")
    public String getWeather(
        @ToolParam(description = "City name (e.g., 'Austin')") String city
    ) {
        // Domain service or external API
        return "Sunny, 72°F in " + city;
    }
}

@Tool turns a method into a callable tool; @ToolParam adds parameter hints the model uses when preparing the call. If the tool class is a Spring bean, Spring AI’s AOT support kicks in out of the box.

Register as default tools (one-time wiring)

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;

@Configuration
class AiConfig {

    @Bean
    public ChatClient chatClient(ChatModel chatModel, WeatherTools tools) {
        return ChatClient.builder(chatModel)
                .defaultTools(tools) // available for all calls
                .build();
    }
}

defaultTools(...) on the builder adds the tools globally for clients built from it; you can still override per‑request if needed.

Use in business code (no per‑call tool plumbing)

import org.springframework.stereotype.Service;
import org.springframework.ai.chat.client.ChatClient;

@Service
public class AssistantService {
    private final ChatClient chat;

    public AssistantService(ChatClient chat) {
        this.chat = chat;
    }

    public String answer(String userMessage) {
        return chat.prompt()
                  .system("Helpful assistant.")
                  .user(userMessage)
                  .call()
                  .content(); // default tools are already registered
    }
}

When you want a tool to be available only on specific requests (e.g., admin‑only tools), register a minimal default set and add more with .tools(...) at call time.
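
A sketch of that per-request registration, assuming a hypothetical adminTools bean injected alongside the client:

String reply = chat.prompt()
        .user(userMessage)
        .tools(adminTools) // visible to the model for this call only
        .call()
        .content();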

Structured output (JSON → Java) without hand‑rolled parsers

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.converter.BeanOutputConverter;

public record ProductRecommendation(
    String productName,
    String reason,
    double confidenceScore
) {}

public class RecommendationService {
    private final ChatClient chat;

    public RecommendationService(ChatClient chat) {
        this.chat = chat;
    }

    public ProductRecommendation recommend(String userQuery) {
        BeanOutputConverter<ProductRecommendation> converter =
            new BeanOutputConverter<>(ProductRecommendation.class);

        String content = chat.prompt()
            .user(u -> u.text("""
                Recommend one product for: {query}
                {format}
                """)
                .param("query", userQuery)
                .param("format", converter.getFormat()))
            .call()
            .content();

        return converter.convert(content);
    }
}

BeanOutputConverter generates a JSON Schema and converts the model’s response into your type. (Spring AI also supports provider‑specific structured outputs where available.)
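
When you don’t need the intermediate string, ChatClient’s entity(...) shorthand wires the converter up for you:

ProductRecommendation rec = chat.prompt()
    .user("Recommend one product for: " + userQuery)
    .call()
    .entity(ProductRecommendation.class); // schema prompt + conversion handled internally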

Retrieval‑Augmented Generation (RAG) via VectorStore

import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

public class DocumentAssistant {
    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public DocumentAssistant(ChatClient chatClient, VectorStore vectorStore) {
        this.chatClient = chatClient;
        this.vectorStore = vectorStore;
    }

    public String answerQuestion(String question) {
        SearchRequest request = SearchRequest.builder()
                                            .query(question)
                                            .topK(3)
                                            .build();

        List<Document> docs = vectorStore.similaritySearch(request);

        String context = docs.stream()
                             .map(Document::getText) // getText() in recent releases
                             .collect(Collectors.joining("\n\n"));

        String promptText = "Use the following context to answer the question. "
            + "If the answer isn't in the context, say so.\n\n"
            + "Context:\n" + context + "\n\n"
            + "Question: " + question;

        return chatClient.prompt(promptText)
                         .call()
                         .content();
    }
}

VectorStore and SearchRequest abstract over multiple DBs (Pinecone, Weaviate, PgVector, etc.), so your RAG code is portable too.
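
Ingestion goes through the same abstraction, so the write path is portable too. A minimal sketch (the document texts are placeholders):

List<Document> docs = List.of(
    new Document("Spring AI supports multiple vector stores."),
    new Document("RAG combines retrieval with generation.")
);
vectorStore.add(docs); // embeds and persists via the configured store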

Production‑grade observability

Spring AI emits Micrometer metrics you can scrape via Spring Boot Actuator; you can also add custom timings/counters as usual.
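
Exposing those metrics is ordinary Spring Boot configuration, nothing Spring AI-specific. A minimal sketch:

management:
  endpoints:
    web:
      exposure:
        include: health,metrics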


Testing Without Pain

import org.junit.jupiter.api.Test;
import org.mockito.Answers;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.boot.test.mock.mockito.MockBean;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.when;

@SpringBootTest
class AssistantServiceTest {

    // Deep stubs let one mock cover the whole fluent chain
    @MockBean(answer = Answers.RETURNS_DEEP_STUBS)
    private ChatClient chatClient;

    @Autowired
    private AssistantService assistantService;

    @Test
    void testGenerateResponse() {
        // Stub the exact chain AssistantService.answer(...) executes
        when(chatClient.prompt()
                .system("Helpful assistant.")
                .user("Hello")
                .call()
                .content())
            .thenReturn("Test response");

        String response = assistantService.answer("Hello");

        assertEquals("Test response", response);
    }
}

You test your business logic, not vendor integrations—no API keys, no flaky network calls.


What You Avoid by Not Hard‑Coding a Provider

Model lifecycle churn: When providers deprecate/retire models or shift defaults, you update config—not dozens of call sites and tests. (Azure’s model policy explicitly sets availability and retirement windows; OpenAI maintains a deprecations list.)

Operational coupling: Rate‑limit semantics and headers differ; with abstraction, retry/backoff and request shaping live in one place. (Providers document RPM/TPM and quota mechanics.)

Testing drag: You mock ChatClient and test your logic without network calls or vendor‑specific stubs. (API shape is stable across providers through Spring AI.)

Cross‑provider experiments: You can route different use cases to different models (fast vs. smart) without re‑plumbing. (Spring AI supports multiple ChatClients and manual client creation.)

If you prefer a gateway layer on top of vendors, industry references describe unified APIs that reduce lock‑in and help with throughput and routing; that pattern complements Spring AI nicely.


Real‑World Patterns (Copy/Paste Ready)

Chat with memory (portable)

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

@Service
public class ConversationService {
    private final ChatClient chat;
    private final Map<String, List<Message>> sessions = new ConcurrentHashMap<>();

    public ConversationService(ChatClient chat) {
        this.chat = chat;
    }

    public String chat(String sessionId, String userMsg) {
        List<Message> history = sessions.computeIfAbsent(
            sessionId,
            key -> new ArrayList<Message>()
        );

        history.add(new UserMessage(userMsg));

        Prompt prompt = new Prompt(history);
        String reply = chat.prompt(prompt).call().content();

        history.add(new AssistantMessage(reply));
        return reply;
    }
}

(Uses Spring AI’s portable Prompt/message model.)
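
Spring AI also ships chat-memory support so you don’t have to manage history by hand. In the milestone releases the wiring looked roughly like this (class names have shifted between releases, so treat it as a sketch):

ChatClient chat = ChatClient.builder(chatModel)
        .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
        .build();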

Multi‑provider strategy (fast vs. smart)

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AiConfigMulti {

  // Assumes two ChatModel beans ("fastModel", "smartModel") are defined elsewhere
  @Bean("fastChat")
  public ChatClient fast(@Qualifier("fastModel") ChatModel fastModel) {
    return ChatClient.builder(fastModel).build(); // cost-optimized
  }

  @Bean("smartChat")
  public ChatClient smart(@Qualifier("smartModel") ChatModel smartModel) {
    return ChatClient.builder(smartModel).build(); // reasoning-strong
  }
}

@Service
public class Router {
  private final ChatClient fast;
  private final ChatClient smart;

  public Router(@Qualifier("fastChat") ChatClient fast,
                @Qualifier("smartChat") ChatClient smart) {
    this.fast = fast;
    this.smart = smart;
  }

  public String process(String input, boolean needsReasoning) {
    ChatClient client = needsReasoning ? smart : fast;
    return client.prompt(input).call().content();
  }
}

(Manual creation/multiple clients is a documented pattern.)
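
When autoconfiguration isn’t enough, the provider modules expose constructors you can call yourself. A sketch against the milestone-era OpenAI module (constructor and builder signatures have changed across releases, so verify against your version):

@Bean("fastModel")
public ChatModel fastModel() {
    OpenAiApi api = new OpenAiApi(System.getenv("OPENAI_API_KEY"));
    return new OpenAiChatModel(api,
            OpenAiChatOptions.builder().withModel("gpt-4o-mini").build());
}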

Observability wrapper (Micrometer + Actuator)

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Component;

@Component
public class ChatMetrics {
    private final MeterRegistry registry;
    private final ChatClient chat;

    public ChatMetrics(MeterRegistry registry, ChatClient chat) {
        this.registry = registry;
        this.chat = chat;
    }

    public String respond(String prompt) {
        Timer.Sample sample = Timer.start(registry);
        try {
            String out = chat.prompt(prompt).call().content();
            registry.counter("ai.chat.success").increment();
            return out;
        } catch (RuntimeException ex) {
            registry.counter("ai.chat.error", "type", ex.getClass().getSimpleName()).increment();
            throw ex;
        } finally {
            sample.stop(registry.timer("ai.chat.duration"));
        }
    }
}

(Use standard Spring Boot observability; Spring AI also exposes Micrometer metrics.)


Conclusion: Infrastructure You Can Ignore

The best infrastructure is the kind you forget about. With Spring AI, chat.prompt().call() isn’t an architectural bet that will haunt you; it’s a small, stable surface that lets you optimize for cost, speed, or capability later—by swapping configuration, not rewriting code.

Use Spring AI if you:

  • Build on Spring Boot and want vendor portability
  • Need fast, deterministic tests without live APIs
  • Plan to evaluate models/providers over time
  • Prefer boring, reliable infrastructure

Skip Spring AI if you:

  • Aren’t on Java/Spring
  • Are shipping a one‑off prototype
  • Know you’ll never switch providers
  • Must use bleeding‑edge, vendor‑specific features the day they appear (you can still add escape hatches)

Ready to try Spring AI? Start with the official docs and the autoconfigured ChatClient.Builder; set spring.ai.* properties and make your first call in minutes.

When you reach that first “should we switch models/providers?” moment, you’ll be glad the abstraction is already in place.


Enjoyed this article? Take the next step.

Future-Proof Your Java Career With Spring AI

The age of AI is here, but your Java & Spring experience isn’t obsolete—it’s your greatest asset.

This is the definitive guide for enterprise developers to stop being just coders and become the AI Orchestrators of the future.

View on Amazon Kindle →
