VECTOR_DISTANCE

VECTOR_DISTANCE is the main function that you can use to calculate the distance between two vectors.

Purpose

VECTOR_DISTANCE takes two vectors as parameters. You can optionally specify a distance metric to calculate the distance. If you do not specify a distance metric, then the default distance metric is Cosine distance. If the input vectors are BINARY vectors, the default metric is Hamming distance.

You can optionally use the following shorthand vector distance functions:

  • L1_DISTANCE

  • L2_DISTANCE

  • COSINE_DISTANCE

  • INNER_PRODUCT

  • HAMMING_DISTANCE

  • JACCARD_DISTANCE

All the vector distance functions take two vectors as input and return the distance between them as a BINARY_DOUBLE.

Note the following caveats, if you use VECTOR_DISTANCE to perform a similarity search:
  • If you specify a metric as the third argument, then that metric it is used .

  • If you do not specify a metric, then the following rule applies:

    • In the case of a single vector column as in: VECTOR_DISTANCE(vec1, :bind), if there is a vector index defined on vec1, then the metric used when defining the vector index is used.

      If no vector index is defined on vec1, then the COSINE metric is used.

    • In the case of multiple vector columns as in: VECTOR_DISTANCE(vec1, vec2), or VECTOR_DISTANCE(vec1+vec2, :bind), then for all indexed columns, if their metrics in the definition are the same, then that metric is used.

      On the other hand, if the indexed columns do not have a common metric, or none of the columns have an index defined, then the metric COSINE is used.

  • If you specify a distance metric that conflicts with the distance metric specified in a vector index, then the distance metric that you specify is used to perform the exact search.

  • If you specify a distance metric that matches the distance metric specified in a vector index, then the distance metric specified in the vector index is used for both exact and approximate (index-based) searches.

  • You must specify the same metric as the index's metric in order to do a similarity search.

Parameters

  • expr1 and expr2 must evaluate to vectors and have the same format and number of dimensions.

    If you use JACCARD_DISTANCE or the JACCARD metric, then expr1 and expr2 must evaluate to BINARY vectors.

  • This function returns NULL if either expr1 or expr2 is NULL.

  • metric must be one of the following tokens :

    • COSINE metric is the default metric. It calculates the cosine distance between two vectors.

    • DOT metric calculates the negated dot product of two vectors. The INNER_PRODUCT function calculates the dot product, as in the negation of this metric.

    • EUCLIDEAN metric, also known as L2 distance, calculates the Euclidean distance between two vectors.

    • EUCLIDEAN_SQUARED metric, also called L2_SQUARED, is the Euclidean distance without taking the square root.

    • HAMMING metric calculates the hamming distance between two vectors by counting the number dimensions that differ between the two vectors.

    • MANHATTAN metric, also known as L1 distance or taxicab distance, calculates the Manhattan distance between two vectors.

    • JACCARD metric calculates the Jaccard distance. The two vectors used in the query must be BINARY vectors.

Shorthand Operators for Distances

Syntax

  • expr1 <-> expr2

    <-> is the Euclidean distance operator: expr1 <-> expr2 is equivalent to L2_DISTANCE(expr1, expr2) or VECTOR_DISTANCE(expr1, expr2, EUCLIDEAN)

  • expr1 <=> expr2

    <=> is the cosine distance operator: expr1 <=> expr2 is equivalent toCOSINE_DISTANCE(expr1, expr2) or VECTOR_DISTANCE(expr1, expr2, COSINE)

  • expr1 <#> expr2

    <#> is the negative dot product operator: expr1 <#> expr2 is equivalent to -1*INNER_PRODUCT(expr1, expr2) or VECTOR_DISTANCE(expr1, expr2, DOT)

Examples Using Shorthand Operators for Distances

 '[1, 2]' <-> '[0,1]'
v1 <-> '[' || '1,2,3' || ']' is equivalent to v1 <-> '[1, 2, 3]'
v1 <-> '[1,2]' is equivalent to L2_DISTANCE(v1, '[1,2]')
v1 <=> v2 is equivalent to COSINE_DISTANCE(v1, v2)
 v1 <#> v2 is equivalent to -1*INNER_PRODUCT(v1, v2)

Examples

VECTOR_DISTANCE with metric EUCLIDEAN is equivalent to L2_DISTANCE:

VECTOR_DISTANCE(expr1, expr2, EUCLIDEAN);
L2_DISTANCE(expr1, expr2);

VECTOR_DISTANCE with metric COSINE is equivalent to COSINE_DISTANCE:

VECTOR_DISTANCE(expr1, expr2, COSINE);
COSINE_DISTANCE(expr1, expr2);

VECTOR_DISTANCE with metric DOT is equivalent to -1 * INNER_PRODUCT:

VECTOR_DISTANCE(expr1, expr2, DOT);
-1*INNER_PRODUCT(expr1, expr2);

VECTOR_DISTANCE with metric MANHATTAN is equivalent to L1_DISTANCE:

VECTOR_DISTANCE(expr1, expr2, MANHATTAN);
L1_DISTANCE(expr1, expr2);

VECTOR_DISTANCE with metric HAMMING is equivalent to HAMMING_DISTANCE:

VECTOR_DISTANCE(expr1, expr2, HAMMING);
HAMMING_DISTANCE(expr1, expr2);

VECTOR_DISTANCE with metric JACCARD is equivalent to JACCARD_DISTANCE:

VECTOR_DISTANCE(expr1, expr2, JACCARD);
JACCARD_DISTANCE(expr1, expr2);