DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Enterprise AI Trend Report: Gain insights on ethical AI, MLOps, generative AI, large language models, and much more.

2024 Cloud survey: Share your insights on microservices, containers, K8s, CI/CD, and DevOps (+ enter a $750 raffle!) for our Trend Reports.

PostgreSQL: Learn about the open-source RDBMS' advanced capabilities, core components, common commands and functions, and general DBA tasks.

AI Automation Essentials. Check out the latest Refcard on all things AI automation, including model training, data security, and more.

Related

  • Strategies for Effective Shard Key Selection in Sharded Database Architectures
  • Organizing Knowledge With Knowledge Graphs: Industry Trends
  • Too Many Indexes [Comic]
  • Creating AI Data Analyst With DBeaver

Trending

  • Initializing Services in Node.js Application
  • How To Optimize Your Agile Process With Project Management Software
  • WebSocket vs. Server-Sent Events: Choosing the Best Real-Time Communication Protocol
  • Understanding Escape Analysis in Go
  1. DZone
  2. Data Engineering
  3. Databases
  4. Navigating Vector Databases and Search Through the Prism of Colors

Navigating Vector Databases and Search Through the Prism of Colors

Vector index and vector search are widely available in databases. Understanding the core approach makes it simpler to choose and use it.

By 
Keshav Murthy user avatar
Keshav Murthy
DZone Core CORE ·
Oct. 09, 23 · Tutorial
Like (5)
Save
Tweet
Share
3.8K Views

Join the DZone community and get the full member experience.

Join For Free

Vector technology in AI, often referred to with implementations, vector indexes, and vector search, offers a robust mechanism index and query through high-dimensional data entities spanning images, text, audio, and video. Their prowess becomes evident across diverse spectrums like similarity-driven searches, multi-modal retrieval, dynamic recommendation engines, and platforms leveraging the Retrieval Augmented Generation (RAG) paradigm. Due to its potential impact on a multitude of use cases, vectors have emerged as a hot topic. As one delves deeper, attempting to demystify the essence of "what precisely is vector search?", they are often greeted by a barrage of terms — AI, LLM, generative AI — to name a few. This article aims to paint a clearer picture (quite literally) by likening the concept to something we all know: colors.

Infinite hues bloom,

A million shades dance and play,

Colors light our world.

Just the so-called "official colors" span across three long Wikipedia pages. While it's straightforward to store and search these colors by their names using conventional search indices like those in Elastic Search or Couchbase FTS, there's a hitch. Think about the colors Navy and Ocean. Intuitively, they feel closely related, evoking images of deep, serene waters. Yet, linguistically, they share no common ground. This is where traditional search engines hit a wall.

The typical workaround? Synonyms. You could map Navy to a plethora of related terms: blue, azure, ocean, turquoise, sky, and so on. But now, consider the gargantuan task of doing this for every color name. Moreover, these lists don't give us a measure of the closeness between colors. Is azure closer to the navy than the sky? A list won't tell you that. To put it simply, seeking similarities among colors is a daunting task. Trying to craft relationships between colors to gauge their similarity? Even more challenging.

The simple solution to this is the well-known RGB. Encoding colors in the RGB vector scheme solves both the similarity and distance problem.

When we talk about a color's RGB values, we're essentially referencing its coordinates in this 3D space where each dimension can have values ranging from 0 (zero) to 255, totaling 256 values. The vector (R, G, B) is defined by three components: the intensity of Red (R), the intensity of Green (G), and the intensity of Blue (B). Each of these components typically ranges from 0 to 255, allowing for over 16 million, 16777216 to be exact, unique combinations, each representing a distinct color. For instance, the vector (255, 0, 0) signifies the full intensity of red with no contributions from green or blue, resulting in the color red.

RGB

Here are sample RGB values for some colors:

  • Navy: (0, 0, 128)
  • Turquoise: (64, 224, 208)
  • Orange: (255, 165, 0)
  • Green: (0, 128, 0) 
  • Gray: (128, 128, 128)

The three values here can be seen as vectors representing a unique value in the color space containing 16777216 colors. Visualizing RGB values as vectors offers a profound advantage: the spatial proximity of two vectors gives a measure of color similarity. Colors that are close in appearance will have vectors that are close in the RGB space. This vector representation, therefore, not only provides a means to encode colors but also allows for an intuitive understanding of color relationships and similarities.

Similarity Searching

To find colors within an Euclidean distance of 1 from the color (148, 201, 44) in the RGB space, we vary each R, G, and B value by one up and one down to create the search space. This method will generate 3 x 3 x 3 = 27 color combinations but gives us a list of similar colors with specific distances. 

This is like identifying a small cube inside a larger RGB cube...

Identifying smaller cube inside larger RGB cube

Plain Text
 
(147, 200, 43), (147, 200, 44), (147, 200, 45)

(147, 201, 43), (147, 201, 44), (147, 201, 45)

(147, 202, 43), (147, 202, 44), (147, 202, 45)

(148, 200, 43), (148, 200, 44), (148, 200, 45)

(148, 201, 43), (148, 201, 44)  <- This is the original color, (148, 201, 45)

(148, 202, 43), (148, 202, 44), (148, 202, 45)

(149, 200, 43), (149, 200, 44), (149, 200, 45)

(149, 201, 43), (149, 201, 44), (149, 201, 45)

(149, 202, 43), (149, 202, 44), (149, 202, 45)


All these 27 colors are similar to our original colors (148, 201, 44). This principle can be expanded to various distances and multiple ways to calculate the distance. If we were to store, index, and search RGBs in a database, let's see how this is done.

Similarity search on colors via RGB model

Hopefully, this gave you a good understanding of how the RGB models the color schemes and solves the similarity search problem.

Let's replace the RGB model with an LLM model and input text and images about tennis. We then search for "French open." Even though the input text or image didn't include "French open" directly, the effect of the similarity search is that Djokovic and the two tennis images will still be returned! That's the magic of the LLM model and vector search.

Replace the RGB model with an LLM model and input text and images about tennis

Vector indexing and vector search follow the same path. RGB encodes the 16 million colors in 3 bytes. But, the real-world data is more complicated. Languages, images, and videos much. more complicated. Hence, the vector databases use not three, but 300 or 3000 or more dimensions to encode data. Because of this, we need novel methods to store, index, and do similarity searches efficiently. However, the core principle is the same. More on how vector indexing and searching is done in a future blog!

Database Indexing Service

Opinions expressed by DZone contributors are their own.

Related

  • Strategies for Effective Shard Key Selection in Sharded Database Architectures
  • Organizing Knowledge With Knowledge Graphs: Industry Trends
  • Too Many Indexes [Comic]
  • Creating AI Data Analyst With DBeaver

Partner Resources


Comments

ABOUT US

  • About DZone
  • Send feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: