96 DBMS_HYBRID_VECTOR

The DBMS_HYBRID_VECTOR package contains a JSON-based SEARCH API that lets you search by keywords and vectors against hybrid vector indexes. By integrating traditional keyword-based text search with vector-based similarity search, you can improve the overall search experience and provide users with more accurate information.

96.1 SEARCH

Use the DBMS_HYBRID_VECTOR.SEARCH PL/SQL function to run textual queries, vector similarity queries, or hybrid queries against hybrid vector indexes.

Purpose

To search by vectors and keywords. This function lets you perform the following tasks:

  • Facilitate a combined (hybrid) query of textual documents and vectorized chunks:

    You can query a hybrid vector index in multiple vector and keyword search combinations called search modes, as described in Understand Hybrid Search.

  • Fuse and reorder the search results:

    Because a hybrid query by keywords and vectors involves independent searches on text and vector indexes, this API merges (fuses) the two search results into a unified result set as CLOB, which is reordered by a combined score. The results are fused using a range of fusion set operators. The new relevance scores are evaluated using standard algorithms such as Reciprocal Rank Fusion (RRF) and Relative Score Fusion (RSF).

  • Run a default query for a simplified search experience:

    The minimum input parameters required are the hybrid vector index name and a search text string. The same search text string is used to query against a vectorized chunk index and a text document index. The search text is transformed into a CONTAINS query for keyword search, and is vectorized for a VECTOR_DISTANCE query for semantic search. By default, the results are fused using INTERSECT and scored using RSF.

Syntax

DBMS_HYBRID_VECTOR.SEARCH(
   json(
     '{ "hybrid_index_name" :  "hybrid_vector_index_name",                  
         "search_text"      :  "query string for vector and keyword search", 
         "search_scorer"    :  "RRF | RSF",                      
         "search_fusion"    :  "INTERSECT | UNION | TEXT_ONLY | VECTOR_ONLY | MINUS_TEXT | MINUS_VECTOR",         
         "vector":
          {
           "search_text"    :  "query string for vector search",            
           "search_vector"  :  "[vector_embedding]",           
           "search_mode"    :  "DOCUMENT | CHUNK",               
           "aggregator"     :  "MAX | AVG | MEDIAN | AVGN | MEDN | MAXAVGMED",   
           "score_weight"   :  score_weight,
           "rank_penalty"   :  rank_penalty      
          },
         "text":
          {
           "contains"       :  "query keyword for text search",               
           "score_weight"   :  score_weight,
           "rank_penalty"   :  rank_penalty    
          },
         "return":
          {
           "topN"           :   "topN_value",                               
           "values"         :   ["rowid | score | vector_score | text_score | vector_rank | text_rank | chunk_text | chunk_id"],         
           "format"         :   "JSON | XML"                              
         }
     }'
  )
)

This API accepts a JSON specification for all query parameters.

hybrid_index_name

Specify the name of the hybrid vector index to use.

For information on how to create a hybrid vector index if not already created, see Manage Hybrid Vector Indexes.

search_text

Specify the search text (one or more query input), depending on the search construct to use.

This API supports the following two constructs of search:

  • With common search text for hybrid search (default setting).

    The same text is used for a keyword query on document text index (by converting the search_text into a CONTAINS ACCUM operator syntax) and a semantic query on vectorized chunk index (by vectorizing or embedding the search_text for a VECTOR_DISTANCE search).

    Example of hybrid search with common search text:

    DBMS_HYBRID_VECTOR.SEARCH(
        json('{ "hybrid_index_name" : "my_hybrid_idx",
                "search_text"       : "C, Python"
              }'))
    from dual;
  • With separate search texts using vector and text sub-elements:

    • Text-only search: The search text is converted into a CONTAINS query operator syntax for pure keyword search.

      Example of text-only search:

      DBMS_HYBRID_VECTOR.SEARCH(
          json('{ "hybrid_index_name" : "my_hybrid_idx",
                  "text":
                         { "contains" : "C and Python" }
                }'))
      from dual;
    • Vector-only search: The search text is converted into a vector embedding, and is used in VECTOR_DISTANCE.

      Example of vector-only search:

      Here you use the search_text parameter to specify query input for vector search.

      DBMS_HYBRID_VECTOR.SEARCH(
          json('{ "hybrid_index_name" : "my_hybrid_idx",
                  "vector":
                          { "search_text" : "C, Python }
                }'))
      from dual;

      Here you use the search_vector parameter to directly pass a vector embedding as query input for your vector search.

      dbms_hybrid_vector.search(
          json('{ "hybrid_index_name" : "my_hybrid_idx",
               "vector":
                        {
                         "search_vector" : vector_serialize(
                                                  vector_embedding(doc_model
                                                              using
                                                              "C, Python, Database"
                                                              as data)
                                                          RETURNING CLOB), 
                        },}')) 
      from dual;
    • Hybrid search: Conducts two separate keyword and semantic queries, where keyword scores and semantic scores are combined. Specify two different search texts: One search text or a vector embedding to run a VECTOR_DISTANCE query for semantic search. A second search text to run a CONTAINS query for keyword search.

      Example of hybrid search with separate search texts:

      DBMS_HYBRID_VECTOR.SEARCH(
          json('{ "hybrid_index_name" : "my_hybrid_idx",
                  "vector":
                          { "search_text" : "leadership experience" },
                   "text":
                          { "contains" : "C and Python" }
                }'))
      from dual;

    Note:

    You cannot use both the constructs in a query, that is, a common search_text field along with separate search texts (using the vector or text sub-elements).

search_scorer

Specify the method to evaluate the combined "fusion" search scores from separate text and vector searches. The values can be:

  • RSF to use the relative score fusion algorithm (default setting)

  • RRF to use the reciprocal rank fusion algorithm

For a deeper understanding of how these algorithms work in hybrid search modes, see Understand Hybrid Search.

Example of search_scorer (hybrid search with common search text):

DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "search_text"       : "C, Python",
         "search_scorer"     : "rsf"
      }'))
from dual;

Example of search_scorer (hybrid search with separate search texts):

DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "search_scorer"     : "rsf",
         "vector":
          { "search_text"   : "leadership experience" },
         "text":
          { "contains"      : "C and Python" }
      }'))
from dual;

search_fusion

Specify the method to combine the search results for hybrid search modes. Note that vector-only and text-only searches do not combine any results.

Parameter Description

INTERSECT (default)

Combines all distinct rows selected by both text and vector searches.

Score condition: text_score > 0 AND vector_score > 0

UNION

Combines all distinct rows selected by either text or vector search.

Score condition: text_score > 0 OR vector_score > 0

TEXT_ONLY

Combines all distinct rows selected by text search.

Score condition: text_score > 0

VECTOR_ONLY

Combines all distinct rows selected by vector-only search.

Score condition: vector_score > 0

MINUS_TEXT

Combines all distinct rows selected by vector search but not text search.

Score condition: text_score = 0

MINUS_VECTOR

Combines all distinct rows selected by text search but not vector search.

Score condition: vector_score = 0

Example of search_fusion:

DBMS_HYBRID_VECTOR.SEARCH(
    json('{ "hybrid_index_name" : "my_hybrid_idx",
            "search_fusion"     : "INTERSECT",
            "vector":
                    { "search_text" : "leadership experience" },
             "text":
                    { "contains" : "C and Python" }
          }'))
from dual;

vector

Specify query parameters for a VECTOR_DISTANCE query on the vector index part of the hybrid vector index:

  • search_text: Search text string to use for vector search.

  • search_vector: Vector embedding, if available. Note that the vector embedding to be passed here must be generated using the same embedding model used for vector similarity search by the specified hybrid vector index.

  • search_mode: Mode of the vector search at a document or chunk level:

    Parameter Description

    DOCUMENT (default)

    Returns the top-matched row IDs from your base table corresponding to the list of best documents identified.

    CHUNK

    Returns the top-matched chunk IDs and chunk texts from the files stored in your base table, regardless of whether the chunks come from the same document or different documents. The content from these chunks can be used as input for LLMs to formulate responses.

  • aggregator: Aggregate function to apply for aggregating the vector scores for each document in DOCUMENT SEARCH_MODE.

    Parameter Description

    MAX (default)

    Standard database aggregate function that selects the top chunk score as the result score. With this function, TopN and minScore are ignored.

    AVG

    Standard database aggregate function that sums the chunk scores and divides by the count. With this function, TopN and minScore are ignored.

    MEDIAN

    Standard database aggregate function that computes the middle value or an interpolated value of the sorted scores. With this function, TopN and minScore are ignored.

    AVGN(topN,minScore)

    This is similar to the standard database AVG aggregate, but it sums the topN chunk scores (>= minScore) divided by topN.

    MEDN(topN,minScore)

    This is the similar to the standard database MEDIAN aggregate, but it computes the median value of the sorted topN scores (>= minScore).

    MAXAVGMED(topN,minScore,maxWgt,avgWgt,medWgt)

    This function computes a weighted sum of the MAX, AVGN, and MEDN values (with topN and minScore parameters in effect). The max, average, and median weights are passed as arg3, arg4, and arg5.

    MAXAVGMED = (MAX * maxWgt + AVGN * avgWgt + MEDN * medWgt) / (maxWgt + avgWgt + medWgt)

    If weight of zero is passed, then that corresponding value is not used.

  • score_weight: Relative weight (degree of importance or preference) to assign to the semantic VECTOR_DISTANCE query. This value is used when combining the results of RSF ranking.

    Value: Any positive integer greater than 0 (zero)

    Default: 10 (implies 10 times more importance to vector query than text query)

  • rank_penalty: Penalty (degree of reduction in the relevance score) to assign to vector query. This can help in balancing the relevance score by reducing the importance of unnecessary or repetitive words in a document. This value is used when combining the results of RRF ranking.

    Value: Any positive integer greater than 0 (zero)

    Default: 1 (implies a minimal reduction of relevance score by 1 point)

Example of vector-only search (chunk result)::

DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "vector":
          {
             "search_text"   : "leadership experience",
             "search_mode"   : "CHUNK"
          },
      }'))
from dual;

Example of vector-only search (document result):

DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "vector":
          {
             "search_text"   : "leadership experience",
             "search_mode"   : "DOCUMENT",
             "aggregator"    : "MAX"
          },
      }'))
from dual;

text

Specify query parameters for a CONTAINS query on the Oracle Text index part of the hybrid vector index:

Parameter Description

contains

Oracle Text index operator to specify query expressions for full-text search. See Oracle Text CONTAINS Query Operators.

score_weight

Relative weight (degree of importance or preference) to assign to the text CONTAINS query. This value is used when combining the results of RSF ranking.

Value: Any positive integer greater than 0 (zero)

Default: 1 (implies neutral weight)

rank_penalty

Penalty (degree of reduction in the relevance score) to assign to text query. This can help in balancing the relevance score by reducing the importance of unnecessary or repetitive words in a document. This value is used when combining the results of RRF ranking.

Value: Any positive integer greater than 0 (zero)

Default: 5 (implies a reduction of relevance score by 5 points)

Example of text-only search with all text sub-elements:

DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "text":
          {
             "contains"      : "C and Python",
             "score_weight"  : 1,
             "rank_penalty"  : 5
          },
      }'))
from dual;

return

Specify which fields to return in the result set:

Parameter Description

topN

Maximum number of best-matched results to be returned

Value: Any integer greater than 0 (zero)

Default: 20

values

Return attributes for the search results:

  • rowid: Row ID value for each source document

  • score: Combined text-and-vector search score

  • vector_score: Vector-only search score

  • text_score: Text-only search score

  • vector_rank: Vector-only search rank

  • text_rank: Text-only search rank

  • chunk_text: Human-readable content from each chunk

  • chunk_id: ID of each chunk text

Default: All the above return attributes are shown by default.

format

Format of the results as JSON (default) or XML.

Example of return sub-elements:

DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "search_text"       : "C, Python",
         "return":
          {
             "values"        : [ "rowid", "score" ],
             "topN"          : 3,
             "format"        : "JSON"
          }
      }'))
from dual;

Complete Example With All Query Parameters

The following example shows a hybrid search query that performs separate text and vector searches against my_hybrid_idx. This query specifies the search_text for vector search using the vector_distance function as prioritize teamwork and leadership experience and the keyword for text search using the contains Oracle Text operator as C and Python. The search mode is DOCUMENT to return the search results as topN documents.

select json_Serialize(
  DBMS_HYBRID_VECTOR.SEARCH(
    json(
      '{ "hybrid_index_name" : "my_hybrid_idx",
         "search_scorer"     : "rsf",
         "search_fusion"     : "INTERSECT",
         "vector":
          {
             "search_text"   : "prioritize teamwork and leadership experience",
             "search_mode"   : "DOCUMENT",
             "score_weight"  : 10,
             "rank_penalty"  : 1,
             "aggregator"    : "MAX"
          },
         "text":
          {
             "contains"      : "C and Python",
             "score_weight"  : 1,
             "rank_penalty"  : 5
          },
         "return":
          {
             "format"        : "JSON",
             "topN"          : 3,
             "values"        : [ "rowid", "score", "vector_score",
                                 "text_score", "vector_rank",
                                 "text_rank", "chunk_text", "chunk_id" ]
          }
      }'
    )
  ) pretty)
from dual;

The top 3 rows are ordered by relevance, with higher scores indicating a better match. All the return attributes are shown by default:

[
  {
    "rowid"         : "AAAR9jAABAAAQeaAAA",
    "score"         : 58.64,
    "vector_score"  : 61,
    "text_score"    : 35,
    "vector_rank"   : 1,
    "text_rank"     : 2,
    "chunk_text"    : "Candidate 1: C Master. Optimizes low-level system (i.e. Database)
                       performance with C. Strong leadership skills in guiding teams to 
                       deliver complex projects.",
    "chunk_id"      : "1"
  },
  {
    "rowid"         : "AAAR9jAABAAAQeaAAB",
    "score"         : 56.86,
    "vector_score"  : 55.75,
    "text_score"    : 68,
    "vector_rank"   : 3,
    "text_rank"     : 1,
    "chunk_text"    : "Candidate 3: Full-Stack Developer. Skilled in Database, C, HTML,
                       JavaScript, and Python with experience in building responsive web 
                       applications. Thrives in collaborative team environments.",
    "chunk_id"      : "1"
  },
  {
    "rowid"         : "AAAR9jAABAAAQeaAAD",
    "score"         : 51.67,
    "vector_score"  : 56.64,
    "text_score"    : 2,
    "vector_rank"   : 2,
    "text_rank"     : 3,
    "chunk_text"    : "Candidate 2: Database Administrator (DBA). Maintains and secures
                       enterprise database (Oracle, MySql, SQL Server). Passionate about 
                       data integrity and optimization. Strong mentor for junior DBA(s).",
    "chunk_id"      : "1"
  }
]

End-to-end example:

To see how to create a hybrid vector index and explore all types of queries against the index, see Query Hybrid Vector Indexes End-to-End Example.