Implementing RAG With Spring AI and Ollama Using Local AI/LLM Models

In this article, learn how to use AI with RAG independent from external AI/LLM services with Ollama-based AI/LLM models.

By Sven Loesekann · Feb. 06, 24 · Tutorial

This article builds on a previous article that describes the AIDocumentLibraryChat project, a RAG-based search service built on the OpenAI Embedding/GPT model services.

The AIDocumentLibraryChat project has been extended with the option to use local AI models with the help of Ollama. The advantage is that the documents never leave the local servers, which makes this a solution for cases where it is prohibited to transfer the documents to an external service.

Architecture

With Ollama, the AI model can run on a local server. That changes the architecture to look like this:

Ollama architecture

With this architecture, all needed systems can be deployed in a local environment that is controlled by the organization. An example would be to deploy the AIDocumentLibraryChat application, the PostgreSQL DB, and the Ollama-based AI model in a local Kubernetes cluster and to provide user access to the AIDocumentLibraryChat application with an ingress. External parties then only see the results that the AIDocumentLibraryChat application provides.

The system architecture puts the UI for the user and the application logic in the AIDocumentLibraryChat application. The application uses Spring AI with the ONNX library functions to create the embeddings of the documents. The embeddings and documents are stored with JDBC in the PostgreSQL database with the vector extension. To create the answers based on the document/paragraph contents, the Ollama-based model is called via REST. The AIDocumentLibraryChat application, the PostgreSQL DB, and the Ollama-based model can be packaged in Docker images and deployed in a Kubernetes cluster. That makes the system independent of external services. The Ollama models support the needed GPU acceleration on the server.

The shell commands to run the Ollama Docker image are in the runOllama.sh file, and the shell commands to run the PostgreSQL Docker image with the vector extension are in the runPostgresql.sh file.
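For illustration, this is a minimal sketch of what such scripts typically contain, assuming the public ollama/ollama and ankane/pgvector Docker images; the project's actual scripts may differ:

Shell
 
# Sketch only: image names and flags are assumptions, not the project's exact scripts.
# Start the Ollama server; 11434 is Ollama's default REST port.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Pull and run the model used in this article.
docker exec -it ollama ollama run stable-beluga:13b
# Start PostgreSQL with the pgvector extension.
docker run -d --name postgresql -e POSTGRES_PASSWORD=postgres -p 5432:5432 ankane/pgvector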

Building the Application for Ollama

The Gradle build of the application has been updated to switch off OpenAI support and switch on Ollama support with the useOllama property:

Groovy
 
plugins {
 id 'java'
 id 'org.springframework.boot' version '3.2.1'
 id 'io.spring.dependency-management' version '1.1.4'
}

group = 'ch.xxx'
version = '0.0.1-SNAPSHOT'

java {
 sourceCompatibility = '21'
}

repositories {
 mavenCentral()
 maven { url "https://repo.spring.io/snapshot" }
}

dependencies {
 implementation 'org.springframework.boot:spring-boot-starter-actuator'
 implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
 implementation 'org.springframework.boot:spring-boot-starter-security'
 implementation 'org.springframework.boot:spring-boot-starter-web'
 implementation 'org.springframework.ai:spring-ai-tika-document-reader:0.8.0-SNAPSHOT'
 implementation 'org.liquibase:liquibase-core'
 implementation 'net.javacrumbs.shedlock:shedlock-spring:5.2.0'
 implementation 'net.javacrumbs.shedlock:shedlock-provider-jdbc-template:5.2.0'
 implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter:0.8.0-SNAPSHOT'
 implementation 'org.springframework.ai:spring-ai-transformers-spring-boot-starter:0.8.0-SNAPSHOT'
 testImplementation 'org.springframework.boot:spring-boot-starter-test'
 testImplementation 'org.springframework.security:spring-security-test'
 testImplementation 'com.tngtech.archunit:archunit-junit5:1.1.0'
 testRuntimeOnly 'org.junit.platform:junit-platform-launcher'

 if(project.hasProperty('useOllama')) {
   implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter:0.8.0-SNAPSHOT'
 } else {
   implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter:0.8.0-SNAPSHOT'
 }
}

bootJar {
 archiveFileName = 'aidocumentlibrarychat.jar'
}

tasks.named('test') {
 useJUnitPlatform()
}


The Gradle build adds the Ollama Spring starter and the embedding library inside the if (project.hasProperty('useOllama')) branch; otherwise, it adds the OpenAI Spring starter.
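A possible invocation, sketched under the assumption of the standard Gradle wrapper and jar output path (the jar name comes from the bootJar configuration above, and the 'ollama' profile is described in the next section):

Shell
 
# Build with the Ollama starter instead of the OpenAI starter.
./gradlew clean build -PuseOllama
# Run the packaged jar with the 'ollama' Spring profile active.
java -jar build/libs/aidocumentlibrarychat.jar --spring.profiles.active=ollama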

Database Setup

The application needs to be started with the Spring profile 'ollama' to switch on the features needed for Ollama support. The database setup needs a different embedding vector type, which is configured in the application-ollama.properties file:

Properties files
 
...
spring.liquibase.change-log=classpath:/dbchangelog/db.changelog-master-ollama.xml
...


The spring.liquibase.change-log property sets the Liquibase script that includes the Ollama initialization. That script includes the db.changelog-1-ollama.xml script with the initialization:

XML
 
<databaseChangeLog
  xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
  http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">
    <changeSet id="8" author="angular2guy">
      <modifyDataType tableName="vector_store" columnName="embedding" 
        newDataType="vector(384)"/> 
    </changeSet>
</databaseChangeLog>


The script changes the column type of the embedding column to vector(384) to support the format that is created by the Spring AI ONNX Embedding library.
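On PostgreSQL, this Liquibase modifyDataType change boils down to an ALTER TABLE vector_store ALTER COLUMN embedding TYPE vector(384) statement, so the pgvector column matches the 384-dimensional vectors the ONNX embedding model produces.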

Add Ollama Support to the Application

To support Ollama-based models, the application-ollama.properties file has been added:

Properties files
 
spring.ai.ollama.base-url=${OLLAMA-BASE-URL:http://localhost:11434}
spring.ai.ollama.model=stable-beluga:13b
spring.liquibase.change-log=classpath:/dbchangelog/db.changelog-master-ollama.xml
document-token-limit=150


The spring.ai.ollama.base-url property sets the URL under which the Ollama model is accessed. The spring.ai.ollama.model property sets the name of the model that Ollama runs. The document-token-limit property sets the number of tokens that the model gets as context from the document/paragraph.

The DocumentService has new features to support the Ollama models:

Java
 
private final String systemPrompt = "You're assisting with questions about documents in a catalog.\n"
  + "Use the information from the DOCUMENTS section to provide accurate answers.\n"
  + "If unsure, simply state that you don't know.\n" + "\n"
  + "DOCUMENTS:\n" + "{documents}";

private final String ollamaPrompt = "You're assisting with questions about documents in a catalog.\n"
  + "Use the information from the DOCUMENTS section to provide accurate answers.\n"
  + "If unsure, simply state that you don't know.\n \n"
  + " {prompt} \n \n" + "DOCUMENTS:\n" + "{documents}";

@Value("${embedding-token-limit:1000}")
private Integer embeddingTokenLimit;
@Value("${document-token-limit:1000}")
private Integer documentTokenLimit;
@Value("${spring.profiles.active:}")
private String activeProfile;


Ollama models support only system prompts, which requires a new prompt template that embeds the user prompt in the {prompt} placeholder. The embeddingTokenLimit and the documentTokenLimit are now set in the application properties and can be adjusted per profile. The activeProfile property gets the comma-separated list of the profiles the application was started with.
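For illustration, with a hypothetical user question "What is a vector store?" and one retrieved chunk, the rendered ollamaPrompt would look roughly like this:

Plain Text
 
You're assisting with questions about documents in a catalog.
Use the information from the DOCUMENTS section to provide accurate answers.
If unsure, simply state that you don't know.
 
 What is a vector store? 
 
DOCUMENTS:
<document/paragraph text, cut to document-token-limit tokens>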

Java
 
public Long storeDocument(Document document) {
  ...
  var aiDocuments = tikaDocuments.stream()
    .flatMap(myDocument1 -> this.splitStringToTokenLimit(
      myDocument1.getContent(), embeddingTokenLimit).stream()
    .map(myStr -> new TikaDocumentAndContent(myDocument1, myStr)))
    .map(myTikaRecord -> new org.springframework.ai.document.Document(
      myTikaRecord.content(), myTikaRecord.document().getMetadata()))
    .peek(myDocument1 -> myDocument1.getMetadata().put(ID,      
      myDocument.getId().toString()))
    .peek(myDocument1 -> myDocument1.getMetadata()
      .put(MetaData.DATATYPE, MetaData.DataType.DOCUMENT.toString()))
    .toList();
  ...
}

public AiResult queryDocuments(SearchDto searchDto) {
...
  Message systemMessage = switch (searchDto.getSearchType()) {
    case SearchDto.SearchType.DOCUMENT ->        
      this.getSystemMessage(documentChunks, 
        this.documentTokenLimit, searchDto.getSearchString());
    case SearchDto.SearchType.PARAGRAPH -> 
      this.getSystemMessage(mostSimilar.stream().toList(), 
        this.documentTokenLimit, searchDto.getSearchString());
...
};

private Message getSystemMessage(List<Document> similarDocuments,
  int tokenLimit, String prompt) {
  String documentStr = this.cutStringToTokenLimit(
    similarDocuments.stream().map(entry -> entry.getContent())
      .filter(myStr -> myStr != null && !myStr.isBlank())
      .collect(Collectors.joining("\n")), tokenLimit);
  SystemPromptTemplate systemPromptTemplate = this.activeProfile
    .contains("ollama") ? new SystemPromptTemplate(this.ollamaPrompt)
      : new SystemPromptTemplate(this.systemPrompt);
  Message systemMessage = systemPromptTemplate.createMessage(
    Map.of("documents", documentStr, "prompt", prompt));
  return systemMessage;
}


The storeDocument(...) method now uses the embeddingTokenLimit of the properties file to limit the size of the text chunks used to create the embeddings. The queryDocuments(...) method now uses the documentTokenLimit of the properties file to limit the text provided to the model for answer generation.

The getSystemMessage(...) method checks the activeProfile property for the ollama profile and creates the matching SystemPromptTemplate, which for Ollama includes the question. The createMessage(...) method creates the AI message and replaces the documents and prompt placeholders in the prompt string.
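The article does not show the model call itself, but with the Ollama starter on the classpath, Spring Boot auto-configures a ChatClient that talks to the Ollama REST endpoint. A minimal sketch of how the system message could be sent to the model (package and method names assume the Spring AI 0.8.x API and may differ in other releases):

Java
 
// Sketch under Spring AI 0.8.x assumptions; not the project's exact code.
import java.util.List;

import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.prompt.Prompt;

public class OllamaQueryExample {
  // Auto-configured by spring-ai-ollama-spring-boot-starter using
  // the spring.ai.ollama.* properties shown above.
  private final ChatClient chatClient;

  public OllamaQueryExample(ChatClient chatClient) {
    this.chatClient = chatClient;
  }

  public String answer(Message systemMessage) {
    // Wrap the message in a Prompt; the starter sends it via REST
    // to the configured Ollama base URL.
    ChatResponse response = this.chatClient.call(new Prompt(List.of(systemMessage)));
    return response.getResult().getOutput().getContent();
  }
}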

Conclusion

Spring AI works very well with Ollama. The model used in the Ollama Docker container was stable-beluga:13b. The only differences in the implementation were the changed dependencies and the missing user-prompt support of the Llama models, and that was a small fix.

Spring AI enables very similar implementations for external AI services like OpenAI and local AI services like Ollama-based models. That decouples the Java code from the AI model interfaces very well. 

Without GPU acceleration, the performance of the Ollama models required decreasing the document-token-limit from 2000 (OpenAI) to 150 (Ollama), and the quality of the answers decreased accordingly. To run an Ollama model with parameters that produce better-quality answers at acceptable response times, a server with GPU acceleration is required.

For commercial/production use, a model with an appropriate license is required. The Beluga models do not have one; the falcon:40b model could be used instead.


Published at DZone with permission of Sven Loesekann. See the original article here.

Opinions expressed by DZone contributors are their own.
