Unleashing Great Potential for Your AI Applications With Vector Embedding Models

Explore this blog to learn how to use Jina Embeddings v2 with the MyScale EmbedText function and empower your AI project.

By Nan Xiang · Jan. 31, 24 · Tutorial
MyScale has introduced the EmbedText function in the latest version of its integrated SQL vector database. This feature brings together the efficiency of SQL querying and AI-driven text embedding technology, so you can use familiar SQL syntax for both precise text matching and efficient semantic similarity computation.

With full integration of Jina Embeddings v2 models, MyScale EmbedText lets users call Jina AI's models from within MyScale using standard SQL syntax and process text inputs of up to 8K tokens, which makes it possible to embed much longer texts in a single call. Whether processing complex multilingual data or creating advanced AI applications, developers can use Jina AI's embedding models through MyScale at every point in the development process.

What Is MyScale?

MyScale is a cloud-native SQL vector database that enables developers familiar with SQL to build production-quality generative AI applications. Built on top of ClickHouse, MyScale integrates vector search and storage with a scalable relational database. It provides efficient storage and processing of structured and unstructured data and streamlines complex database engineering while maintaining high reliability and performance for AI applications.

MyScale's EmbedText Function leverages the familiar syntax of SQL to simplify the generation of text embedding vectors, enabling users to adopt popular AI models for their projects. Using EmbedText's automated batch processing, developers can greatly improve performance in processing large amounts of data without relying on external tools or doing any complex programming.

What Is Jina Embeddings?

Jina Embeddings v2 is the world's first and, so far, only open-source text embedding model that supports input sizes of 8192 tokens. It is available in three versions: English-only, bilingual Chinese-English, and bilingual German-English.

Features:

  • Industry-leading performance comparable to OpenAI's closed-source Ada 2 model.
  • Support for texts of over 8 thousand tokens, breaking the barrier to long text vector representations and allowing developers to fully represent the semantics of texts at multiple scales.
  • Multilingual support, with a model that represents Chinese and English in one embedding space and another that does the same for German and English, with more languages to come. Jina Embeddings enables cross-language applications using models specialized in those specific languages rather than a massive, inefficient AI model with unequal and unclear performance across large numbers of different languages.
  • Ranked by LlamaIndex among the world's best embedding models for RAG (Retrieval-Augmented Generation) applications.

Using Jina Embeddings v2 in MyScale

Developers can use Jina Embeddings with EmbedText Function in MyScale for two operations: data insertion and embedding-based querying. This section will get into the details of both.

Create a Simplified Function

One practical strategy is to declare an SQL User-Defined Function (UDF) that creates text embeddings and contains the relevant model name, provider, and API key so that this information doesn't have to be repeated and can be easily changed when needed.

The SQL statement below declares the function JinaAIEmbedText for that purpose. Insert your own API key in the appropriate place.

SQL
 
CREATE FUNCTION JinaAIEmbedText ON CLUSTER '{cluster}'
AS (x) -> EmbedText(x, 'Jina', '', 'YOUR_API_KEY', '{"model":"jina-embeddings-v2-base-en"}')


Now, to get an embedding for a text, you just have to call JinaAIEmbedText:

SQL
 
SELECT JinaAIEmbedText('YOUR_TEXT')
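
Because the model name and API key live only in the function definition, switching models means redefining that one function. The statement below is a sketch, not a confirmed recipe: it assumes that your MyScale version accepts ClickHouse's CREATE OR REPLACE FUNCTION syntax and that jina-embeddings-v2-base-zh is the identifier Jina AI uses for the bilingual Chinese-English model. Check both against the current documentation before relying on them.

```sql
-- Redefine the helper so that later calls use a different Jina model.
-- Assumption: CREATE OR REPLACE FUNCTION is accepted here, as in ClickHouse.
-- Assumption: 'jina-embeddings-v2-base-zh' names the Chinese-English model.
CREATE OR REPLACE FUNCTION JinaAIEmbedText ON CLUSTER '{cluster}'
AS (x) -> EmbedText(x, 'Jina', '', 'YOUR_API_KEY', '{"model":"jina-embeddings-v2-base-zh"}')
```

Vectors produced by different models are not comparable with each other, so any embeddings already stored with the previous model would need to be regenerated after a change like this.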


Optimizing Vector Searches Using Jina Embeddings

Once you have created the simplified function, you can use Jina Embeddings in MyScale to optimize the vector search. Querying using embeddings follows standard SQL methods. It's very simple using JinaAIEmbedText:

SQL
 
SELECT id, distance(vector_column_name, JinaAIEmbedText('YOUR_QUERY_TEXT')) AS dist
FROM table_name ORDER BY dist LIMIT 10


This returns the ten records whose embedding vectors are closest to the embedding of your query text.
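
Since the similarity ranking is ordinary SQL, it can be combined with a regular WHERE clause to restrict the search to rows that match some metadata. The query below is a minimal sketch of that idea. The category column and the value 'YOUR_CATEGORY' are hypothetical and stand in for whatever metadata your own table holds.

```sql
-- Assumption: table_name has a String column called category (hypothetical).
-- The WHERE clause narrows the candidate rows; the ORDER BY then ranks them
-- by vector distance to the embedded query text.
SELECT id, distance(vector_column_name, JinaAIEmbedText('YOUR_QUERY_TEXT')) AS dist
FROM table_name
WHERE category = 'YOUR_CATEGORY'
ORDER BY dist
LIMIT 10
```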

Data Insertion

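You can create a table whose vector column is filled in automatically by calling the JinaAIEmbedText function defined above on the text column. The CHECK constraint in this example rejects any row whose vector does not have 768 elements, the output size of jina-embeddings-v2-base-en. For example: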

SQL
 
CREATE TABLE jina_embedding
(
  id UInt32,
  paragraph String,
  vector Array(Float32) DEFAULT JinaAIEmbedText(paragraph),
  CONSTRAINT check_length CHECK length(vector) = 768
)
ENGINE = MergeTree
ORDER BY id


Then, insert data into this table to automatically generate embeddings:

SQL
 
INSERT INTO jina_embedding (id, paragraph)
VALUES (1, 'YOUR_TEXT_1'), (2, 'YOUR_TEXT_2')
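
After inserting, it can be worth confirming that the embeddings were generated and then searching the same table. The statements below are a sketch under stated assumptions: the index name jina_vector_idx and the Cosine metric are illustrative choices, and the ADD VECTOR INDEX form follows the syntax described in MyScale's documentation, so verify it against the version you are running.

```sql
-- Confirm that each row received an embedding of the expected size
-- (768, matching the CHECK constraint on the table).
SELECT id, length(vector) AS dims
FROM jina_embedding
ORDER BY id;

-- Assumption: index name and metric type are illustrative choices.
ALTER TABLE jina_embedding
ADD VECTOR INDEX jina_vector_idx vector TYPE MSTG('metric_type=Cosine');

-- Search the stored paragraphs with the same helper function used for insertion.
SELECT id, paragraph, distance(vector, JinaAIEmbedText('YOUR_QUERY_TEXT')) AS dist
FROM jina_embedding
ORDER BY dist
LIMIT 10;
```

Using the same function for both insertion and querying keeps stored vectors and query vectors in the same embedding space.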

Benefits to AI Developers

MyScale's integration of Jina Embeddings v2 models offers developers a robust framework for building database-driven generative AI applications, saving time, effort, and money in bringing new applications to market.

Its specific benefits include:

  1. Reduced computing costs: MyScale delivers superior database performance with a remarkable reduction in memory consumption compared to its competitors, making it a highly cost-effective choice to back an AI application. Jina Embeddings, by giving developers a choice between different model sizes and embedding vector sizes, offers them tools to manage their computing and storage costs.
  2. Enhanced flexibility: The synergy between MyScale and Jina Embeddings provides developers with enhanced flexibility, particularly in challenging application scenarios like long documents and large document collections.
  3. More accurate searching: MyScale achieves powerful metadata-filtered search through its unique MSTG algorithm, while Jina Embeddings delivers more precise representations of text semantics, improving accuracy in information retrieval. This leads to more informed decision-making and superior application performance, especially in improving the accuracy of RAG applications.

Combining MyScale with Jina Embeddings opens up practical applications, especially for RAG-enhanced chatbots. MyScale, enhanced with Jina Embeddings, can act as a single data source for your chatbot, ensuring data security, consistency, and integrity. MyScale also reduces data redundancy by storing references to records, improves accessibility, and offers advanced access control.

Jina Embeddings v2's ability to process long texts makes it ideal for managing inputs to dialog systems. Chatbots made with Jina Embeddings have a greater understanding of conversational context, dramatically improving performance in long chats and complex scenarios.

Looking into the Future

The deep integration of MyScale and Jina Embeddings v2 empowers developers to bring AI into their projects. This includes creating intelligent customer service robots, developing more accurate cross-language search applications, and optimizing legal and business document analysis and management processes. Developers can explore a wider range of application scenarios with MyScale and Jina Embeddings and build more innovative and practical AI applications that provide users with greater value.


Published at DZone with permission of Nan Xiang. See the original article here.

Opinions expressed by DZone contributors are their own.
