96 DBMS_HYBRID_VECTOR
The DBMS_HYBRID_VECTOR package contains a JSON-based SEARCH API that lets you search by keywords and vectors against hybrid vector indexes. By integrating traditional keyword-based text search with vector-based similarity search, you can improve the overall search experience and provide users with more accurate information.
Related Topics
96.1 SEARCH
Use the DBMS_HYBRID_VECTOR.SEARCH PL/SQL function to run textual queries, vector similarity queries, or hybrid queries against hybrid vector indexes.
Purpose
To search by vectors and keywords. This function lets you perform the following tasks:
-
Facilitate a combined (hybrid) query of textual documents and vectorized chunks:
You can query a hybrid vector index in multiple vector and keyword search combinations called search modes, as described in Understand Hybrid Search.
-
Fuse and reorder the search results:
Because a hybrid query by keywords and vectors involves independent searches on text and vector indexes, this API merges (fuses) the two search results into a unified result set as
CLOB, which is reordered by a combined score. The results are fused using a range of fusion set operators. The new relevance scores are evaluated using standard algorithms such as Reciprocal Rank Fusion (RRF) and Relative Score Fusion (RSF). -
Run a default query for a simplified search experience:
The minimum input parameters required are the hybrid vector index name and a search text string. The same search text string is used to query against a vectorized chunk index and a text document index. The search text is transformed into a
CONTAINSquery for keyword search, and is vectorized for aVECTOR_DISTANCEquery for semantic search. By default, the results are fused usingINTERSECTand scored using RSF.
Syntax
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "hybrid_vector_index_name",
"search_text" : "query string for vector and keyword search",
"search_scorer" : "RRF | RSF",
"search_fusion" : "INTERSECT | UNION | TEXT_ONLY | VECTOR_ONLY | MINUS_TEXT | MINUS_VECTOR",
"vector":
{
"search_text" : "query string for vector search",
"search_vector" : "[vector_embedding]",
"search_mode" : "DOCUMENT | CHUNK",
"aggregator" : "MAX | AVG | MEDIAN | AVGN | MEDN | MAXAVGMED",
"score_weight" : score_weight,
"rank_penalty" : rank_penalty
},
"text":
{
"contains" : "query keyword for text search",
"score_weight" : score_weight,
"rank_penalty" : rank_penalty
},
"return":
{
"topN" : "topN_value",
"values" : ["rowid | score | vector_score | text_score | vector_rank | text_rank | chunk_text | chunk_id"],
"format" : "JSON | XML"
}
}'
)
)This API accepts a JSON specification for all query parameters.
hybrid_index_name
Specify the name of the hybrid vector index to use.
For information on how to create a hybrid vector index if not already created, see Manage Hybrid Vector Indexes.
search_text
Specify the search text (one or more query input), depending on the search construct to use.
This API supports the following two constructs of search:
-
With common search text for hybrid search (default setting).
The same text is used for a keyword query on document text index (by converting the
search_textinto aCONTAINS ACCUMoperator syntax) and a semantic query on vectorized chunk index (by vectorizing or embedding thesearch_textfor aVECTOR_DISTANCEsearch).Example of hybrid search with common search text:
DBMS_HYBRID_VECTOR.SEARCH( json('{ "hybrid_index_name" : "my_hybrid_idx", "search_text" : "C, Python" }')) from dual; -
With separate search texts using
vectorandtextsub-elements:-
Text-only search: The search text is converted into a
CONTAINSquery operator syntax for pure keyword search.Example of text-only search:
DBMS_HYBRID_VECTOR.SEARCH( json('{ "hybrid_index_name" : "my_hybrid_idx", "text": { "contains" : "C and Python" } }')) from dual; -
Vector-only search: The search text is converted into a vector embedding, and is used in
VECTOR_DISTANCE.Example of vector-only search:
Here you use the
search_textparameter to specify query input for vector search.DBMS_HYBRID_VECTOR.SEARCH( json('{ "hybrid_index_name" : "my_hybrid_idx", "vector": { "search_text" : "C, Python } }')) from dual;Here you use the
search_vectorparameter to directly pass a vector embedding as query input for your vector search.dbms_hybrid_vector.search( json('{ "hybrid_index_name" : "my_hybrid_idx", "vector": { "search_vector" : vector_serialize( vector_embedding(doc_model using "C, Python, Database" as data) RETURNING CLOB), },}')) from dual; -
Hybrid search: Conducts two separate keyword and semantic queries, where keyword scores and semantic scores are combined. Specify two different search texts: One search text or a vector embedding to run a
VECTOR_DISTANCEquery for semantic search. A second search text to run aCONTAINSquery for keyword search.Example of hybrid search with separate search texts:
DBMS_HYBRID_VECTOR.SEARCH( json('{ "hybrid_index_name" : "my_hybrid_idx", "vector": { "search_text" : "leadership experience" }, "text": { "contains" : "C and Python" } }')) from dual;
Note:
You cannot use both the constructs in a query, that is, a commonsearch_textfield along with separate search texts (using thevectorortextsub-elements). -
search_scorer
Specify the method to evaluate the combined "fusion" search scores from separate text and vector searches. The values can be:
-
RSFto use the relative score fusion algorithm (default setting) -
RRFto use the reciprocal rank fusion algorithm
For a deeper understanding of how these algorithms work in hybrid search modes, see Understand Hybrid Search.
Example of search_scorer (hybrid search with common search text):
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"search_text" : "C, Python",
"search_scorer" : "rsf"
}'))
from dual;Example of search_scorer (hybrid search with separate search texts):
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"search_scorer" : "rsf",
"vector":
{ "search_text" : "leadership experience" },
"text":
{ "contains" : "C and Python" }
}'))
from dual;search_fusion
Specify the method to combine the search results for hybrid search modes. Note that vector-only and text-only searches do not combine any results.
| Parameter | Description |
|---|---|
|
|
Combines all distinct rows selected by both text and vector searches. Score condition: |
|
|
Combines all distinct rows selected by either text or vector search. Score condition: |
|
|
Combines all distinct rows selected by text search. Score condition: |
|
|
Combines all distinct rows selected by vector-only search. Score condition: |
|
|
Combines all distinct rows selected by vector search but not text search. Score condition: |
|
|
Combines all distinct rows selected by text search but not vector search. Score condition: |
Example of search_fusion:
DBMS_HYBRID_VECTOR.SEARCH(
json('{ "hybrid_index_name" : "my_hybrid_idx",
"search_fusion" : "INTERSECT",
"vector":
{ "search_text" : "leadership experience" },
"text":
{ "contains" : "C and Python" }
}'))
from dual;vector
Specify query parameters for a VECTOR_DISTANCE query on the vector index part of the hybrid vector index:
-
search_text: Search text string to use for vector search. -
search_vector: Vector embedding, if available. Note that the vector embedding to be passed here must be generated using the same embedding model used for vector similarity search by the specified hybrid vector index. -
search_mode: Mode of the vector search at a document or chunk level:Parameter Description DOCUMENT(default)Returns the top-matched row IDs from your base table corresponding to the list of best documents identified.
CHUNKReturns the top-matched chunk IDs and chunk texts from the files stored in your base table, regardless of whether the chunks come from the same document or different documents. The content from these chunks can be used as input for LLMs to formulate responses.
-
aggregator: Aggregate function to apply for aggregating the vector scores for each document inDOCUMENT SEARCH_MODE.Parameter Description MAX(default)Standard database aggregate function that selects the top chunk score as the result score. With this function,
TopNandminScoreare ignored.AVGStandard database aggregate function that sums the chunk scores and divides by the count. With this function,
TopNandminScoreare ignored.MEDIANStandard database aggregate function that computes the middle value or an interpolated value of the sorted scores. With this function,
TopNandminScoreare ignored.AVGN(topN,minScore)This is similar to the standard database
AVGaggregate, but it sums thetopNchunk scores (>= minScore) divided bytopN.MEDN(topN,minScore)This is the similar to the standard database
MEDIANaggregate, but it computes the median value of the sortedtopNscores (>= minScore).MAXAVGMED(topN,minScore,maxWgt,avgWgt,medWgt)This function computes a weighted sum of the
MAX,AVGN, andMEDNvalues (withtopNandminScoreparameters in effect). The max, average, and median weights are passed asarg3,arg4, andarg5.MAXAVGMED = (MAX * maxWgt + AVGN * avgWgt + MEDN * medWgt) / (maxWgt + avgWgt + medWgt)If weight of zero is passed, then that corresponding value is not used.
-
score_weight: Relative weight (degree of importance or preference) to assign to the semanticVECTOR_DISTANCEquery. This value is used when combining the results of RSF ranking.Value: Any positive integer greater than
0(zero)Default:
10(implies 10 times more importance to vector query than text query) -
rank_penalty: Penalty (degree of reduction in the relevance score) to assign to vector query. This can help in balancing the relevance score by reducing the importance of unnecessary or repetitive words in a document. This value is used when combining the results of RRF ranking.Value: Any positive integer greater than
0(zero)Default:
1(implies a minimal reduction of relevance score by 1 point)
Example of vector-only search (chunk result)::
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"vector":
{
"search_text" : "leadership experience",
"search_mode" : "CHUNK"
},
}'))
from dual;Example of vector-only search (document result):
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"vector":
{
"search_text" : "leadership experience",
"search_mode" : "DOCUMENT",
"aggregator" : "MAX"
},
}'))
from dual;text
Specify query parameters for a CONTAINS query on the Oracle Text index part of the hybrid vector index:
| Parameter | Description |
|---|---|
|
|
Oracle Text index operator to specify query expressions for full-text search. See Oracle Text CONTAINS Query Operators. |
|
|
Relative weight (degree of importance or preference) to assign to the text Value: Any positive integer greater than Default: |
|
|
Penalty (degree of reduction in the relevance score) to assign to text query. This can help in balancing the relevance score by reducing the importance of unnecessary or repetitive words in a document. This value is used when combining the results of RRF ranking. Value: Any positive integer greater than Default: |
Example of text-only search with all text sub-elements:
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"text":
{
"contains" : "C and Python",
"score_weight" : 1,
"rank_penalty" : 5
},
}'))
from dual;return
Specify which fields to return in the result set:
| Parameter | Description |
|---|---|
|
|
Maximum number of best-matched results to be returned Value: Any integer greater than Default: |
|
|
Return attributes for the search results:
Default: All the above return attributes are shown by default. |
|
|
Format of the results as |
Example of return sub-elements:
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"search_text" : "C, Python",
"return":
{
"values" : [ "rowid", "score" ],
"topN" : 3,
"format" : "JSON"
}
}'))
from dual;Complete Example With All Query Parameters
The following example shows a hybrid search query that performs separate text and vector searches against my_hybrid_idx. This query specifies the search_text for vector search using the vector_distance function as prioritize teamwork and leadership experience and the keyword for text search using the contains Oracle Text operator as C and Python. The search mode is DOCUMENT to return the search results as topN documents.
select json_Serialize(
DBMS_HYBRID_VECTOR.SEARCH(
json(
'{ "hybrid_index_name" : "my_hybrid_idx",
"search_scorer" : "rsf",
"search_fusion" : "INTERSECT",
"vector":
{
"search_text" : "prioritize teamwork and leadership experience",
"search_mode" : "DOCUMENT",
"score_weight" : 10,
"rank_penalty" : 1,
"aggregator" : "MAX"
},
"text":
{
"contains" : "C and Python",
"score_weight" : 1,
"rank_penalty" : 5
},
"return":
{
"format" : "JSON",
"topN" : 3,
"values" : [ "rowid", "score", "vector_score",
"text_score", "vector_rank",
"text_rank", "chunk_text", "chunk_id" ]
}
}'
)
) pretty)
from dual;The top 3 rows are ordered by relevance, with higher scores indicating a better match. All the return attributes are shown by default:
[
{
"rowid" : "AAAR9jAABAAAQeaAAA",
"score" : 58.64,
"vector_score" : 61,
"text_score" : 35,
"vector_rank" : 1,
"text_rank" : 2,
"chunk_text" : "Candidate 1: C Master. Optimizes low-level system (i.e. Database)
performance with C. Strong leadership skills in guiding teams to
deliver complex projects.",
"chunk_id" : "1"
},
{
"rowid" : "AAAR9jAABAAAQeaAAB",
"score" : 56.86,
"vector_score" : 55.75,
"text_score" : 68,
"vector_rank" : 3,
"text_rank" : 1,
"chunk_text" : "Candidate 3: Full-Stack Developer. Skilled in Database, C, HTML,
JavaScript, and Python with experience in building responsive web
applications. Thrives in collaborative team environments.",
"chunk_id" : "1"
},
{
"rowid" : "AAAR9jAABAAAQeaAAD",
"score" : 51.67,
"vector_score" : 56.64,
"text_score" : 2,
"vector_rank" : 2,
"text_rank" : 3,
"chunk_text" : "Candidate 2: Database Administrator (DBA). Maintains and secures
enterprise database (Oracle, MySql, SQL Server). Passionate about
data integrity and optimization. Strong mentor for junior DBA(s).",
"chunk_id" : "1"
}
]End-to-end example:
To see how to create a hybrid vector index and explore all types of queries against the index, see Query Hybrid Vector Indexes End-to-End Example.