Text search
Search the Index for Objects relevant to the text query.
Key features include:
- Semantic, automatically understanding natural language queries, synonyms, typos, and multiple languages.
- Hybrid, capable of exact matching and approximate matching in one API without requiring you to develop lexical / keyword matching as well as vector-based approximate nearest neighbors (ANN) engines.
- Multimodal, enabling each Object in the index to be described by information coming from different media types such as images, audio, and video.
- Deep, surfacing relevant Highlights from different media by going inside the content and pulling out relevant bits.
Filtering search results
Search results may be filtered based on conditions or exact matches on fields in the Object. Filters are applied by using the filter_query query parameter. This parameter takes a string value that represents the filter query to be applied to the results.
URL Encoding
Filter queries should be URL encoded before being sent to the API. Only the parameter value is encoded, not the parameter name or the = sign. This means that filter_query=color:"dark green" should be sent to the API as filter_query=color%3A%22dark%20green%22. For more info on URL encoding, see https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding.
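As a minimal sketch of the encoding step, the snippet below uses Python's standard urllib.parse module to percent-encode a filter query value. The extra query parameter value ("shoes") is only illustrative.

```python
from urllib.parse import quote, urlencode

filter_query = 'color:"dark green"'

# Percent-encode only the parameter value; safe="" also encodes ":".
encoded_value = quote(filter_query, safe="")
single_param = f"filter_query={encoded_value}"
print(single_param)  # filter_query=color%3A%22dark%20green%22

# urlencode builds a whole query string; quote_via=quote encodes spaces
# as %20 instead of "+".
query_string = urlencode(
    {"query": "shoes", "filter_query": filter_query},
    quote_via=quote,
)
print(query_string)
```

Most HTTP client libraries perform this encoding for you when parameters are passed as a mapping, so manual encoding is mainly needed when building URLs by hand.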
Field types
In order to filter on a field, the Index must be configured with this field as filterable, and the field must have a type. See the create Index documentation for how to index a field as filterable. The following types are supported for filtering:
- string
- int
- float
- bool
- datetime (RFC 3339 formatted dates are supported, for example 2015-11-03T15:01:00.05Z)
- geo coordinate
Fields that are indexed as filterable must have homogeneous types: all indexed values of the field must share the same type. If a value in an Object does not conform to the type defined in the Index configuration, that Object will fail to index and will therefore not appear in filtered results.
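Because a single mismatched value keeps an Object out of the Index, it can help to check your data before pushing it. The sketch below is a purely client-side check, not part of the API; the function name and the sample objects are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def mixed_type_fields(
    objects: Iterable[Dict[str, Any]], fields: Iterable[str]
) -> Dict[str, List[str]]:
    """Return the fields whose values do not all share one Python type."""
    fields = list(fields)
    seen: Dict[str, set] = {}
    for obj in objects:
        for field in fields:
            if field in obj:
                # type(...).__name__ keeps bool and int distinct.
                seen.setdefault(field, set()).add(type(obj[field]).__name__)
    return {f: sorted(types) for f, types in seen.items() if len(types) > 1}


sample = [
    {"price": 10, "color": "green"},
    {"price": "10", "color": "red"},  # price is a string here
]
print(mixed_type_fields(sample, ["price", "color"]))  # {'price': ['int', 'str']}
```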
Field names
Field names must contain only alphanumeric characters or underscores (_) in order to be filterable. This means if your data has a field called “Leg/hem length”, to filter by this field you should rename it to leg_hem_length before pushing to the Object Store.
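One possible way to apply this renaming before pushing data is sketched below. The helper name is hypothetical, restricting to ASCII letters and digits is an assumption, and lowercasing is a choice made here to match the leg_hem_length example rather than a stated requirement.

```python
import re


def filterable_name(name: str) -> str:
    """Replace each run of disallowed characters with a single underscore."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name)
    # Trim underscores left at the edges and lowercase (assumed convention).
    return cleaned.strip("_").lower()


print(filterable_name("Leg/hem length"))  # leg_hem_length
```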
Supported filters
The following types of filters are supported:
- Exact match
- Boolean operators
- Range filters
  - Inclusive and exclusive
  - Bounded and unbounded
- Sets
- Geographic range
Exact match
To filter search results by a field, give the field name and the value separated by a colon, i.e. {field name}:{value}. For example, to filter by color to only green values you would set filter_query=color:"green".
To filter by a value with spaces, wrap the value in double quotes, i.e. filter_query=color:"dark green".
Boolean operators
Filters can be combined using AND, OR and NOT. For example, to search for items that are red or green, you would set filter_query=color:"green" OR color:"red".
Filters can also be combined across fields, so to search for items that are red and shoes you would provide filter_query=color:"red" AND category:"shoes". When providing multiple filters you may group them using parentheses (). For example, to search for shoes that are red or green you would provide filter_query=(color:"green" OR color:"red") AND category:"shoes".
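When filters are assembled in code, small helpers keep the quoting and grouping consistent. This is a sketch with hypothetical helper names; it always quotes values and does not attempt to escape double quotes inside a value, since this page does not describe an escaping rule.

```python
def exact(field: str, value: str) -> str:
    """Exact-match clause with the value wrapped in double quotes."""
    return f'{field}:"{value}"'


def any_of(*clauses: str) -> str:
    """Join clauses with OR and group them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


filter_query = all_of(
    any_of(exact("color", "green"), exact("color", "red")),
    exact("category", "shoes"),
)
print(filter_query)  # (color:"green" OR color:"red") AND category:"shoes"
```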
Ranges
You can use interval notation to define the set to which a field’s value must belong. Range filters are supported on the following field types:
- int
- float
- datetime
Your Filter Query will include two brackets with values separated by TO, where the type of bracket on either side indicates inclusivity (a square bracket) or exclusivity (a curly bracket) for that endpoint of the set. Omitting a value on either side will leave that side of the expression unbounded.
Description | Inequality Notation | Interval Notation |
---|---|---|
between 0 and 10, inclusive | 0 <= 𝑥 <= 10 | [0 TO 10] |
between 0 and 10, exclusive | 0 < 𝑥 < 10 | (0 TO 10) |
greater than or equal to 0 and less than 10 | 0 <= 𝑥 < 10 | [0 TO 10) |
less than or equal to 9.5 | 𝑥 <= 9.5 | (* TO 9.5] |
greater than or equal to 9.5 | 𝑥 >= 9.5 | [9.5 TO *) |
For example, to search for shoes that cost 10`, which would be interpreted as a string.
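The semantics of the table above can be checked with a small Python predicate that mirrors the inequality column. The second helper builds a filter for the fully bounded, inclusive case only. Attaching the range to a field with the {field name}: prefix is an assumption carried over from exact matches, and price is a hypothetical field name.

```python
from typing import Optional, Union

Number = Union[int, float]


def in_interval(
    x: Number,
    low: Optional[Number] = None,
    high: Optional[Number] = None,
    low_inclusive: bool = True,
    high_inclusive: bool = True,
) -> bool:
    """True if x lies in the interval; a bound of None means unbounded."""
    if low is not None and (x < low or (x == low and not low_inclusive)):
        return False
    if high is not None and (x > high or (x == high and not high_inclusive)):
        return False
    return True


def inclusive_range_filter(field: str, low: Number, high: Number) -> str:
    """Bounded, inclusive range filter (assumed field prefix syntax)."""
    return f"{field}:[{low} TO {high}]"


print(in_interval(10, 0, 10))                        # True:  0 <= x <= 10
print(in_interval(10, 0, 10, high_inclusive=False))  # False: 0 <= x < 10
print(inclusive_range_filter("price", 0, 10))        # price:[0 TO 10]
```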
Sets
Using the IN operator, a field can be matched against a set of literals, e.g. title: IN ["a" "b" "cd"] will match documents where title is either a, b or cd, but does so more efficiently than the alternative query title:"a" OR title:"b" OR title:"cd". Example: filter_query=title: IN ["a" "b" "cd"].
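A sketch of building a set filter from a Python list follows, along with the equivalent OR chain for comparison. The helper names are hypothetical, and values are assumed not to contain double quotes.

```python
from typing import Iterable


def in_set(field: str, values: Iterable[str]) -> str:
    """Build a set filter: quoted literals separated by spaces."""
    literals = " ".join(f'"{v}"' for v in values)
    return f"{field}: IN [{literals}]"


def or_chain(field: str, values: Iterable[str]) -> str:
    """The equivalent, less efficient, OR form."""
    return " OR ".join(f'{field}:"{v}"' for v in values)


print(in_set("title", ["a", "b", "cd"]))    # title: IN ["a" "b" "cd"]
print(or_chain("title", ["a", "b", "cd"]))  # title:"a" OR title:"b" OR title:"cd"
```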
Geographic range
We support filtering by geographic ranges on coordinate data. In order to filter on coordinates, Objects must contain a field whose value is an object with lat and lon fields. For example:
{
  "field_name": {
    "lat": 39.0522,
    "lon": -120.2437
  }
}
The syntax for filtering by geographic range is:
{field_name}:@{lat},{lon},{radius}
The radius value is measured in meters. The filter draws a circle of that radius around the coordinates provided and includes only results located within it.
Example filter query:
# only include results within 10 meters from 39.0522,-120.2437
/search?query=test&filter_query=user_location:@39.0522,-120.2437,10
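The filter string above can be generated from numeric values, as in the sketch below. The helper name is hypothetical, and the latitude and longitude bounds are a client-side sanity check based on standard geographic ranges, not documented API behaviour.

```python
from urllib.parse import quote, urlencode


def geo_filter(field: str, lat: float, lon: float, radius_m: float) -> str:
    """Build {field_name}:@{lat},{lon},{radius} with the radius in meters."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if radius_m < 0:
        raise ValueError("radius must not be negative")
    return f"{field}:@{lat},{lon},{radius_m}"


filter_query = geo_filter("user_location", 39.0522, -120.2437, 10)
print(filter_query)  # user_location:@39.0522,-120.2437,10

# URL-encode the parameters before sending the request.
path = "/search?" + urlencode(
    {"query": "test", "filter_query": filter_query}, quote_via=quote
)
print(path)
```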
Errors when filtering
When constructing filter queries there are several types of errors that can occur. If a filter query cannot be applied, the API responds with HTTP 429 and a JSON body containing a message that describes the error.
Example:
HTTP 429
{
"detail": "Expected a valid string, boolean, integer, float, or datetime literal. Position: 44."
}
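A client can surface this message to whoever wrote the filter. The sketch below parses the example body above. The "Position: N." suffix is taken from this single example and may not be present on every error, so the code treats it as optional. The function name is hypothetical.

```python
import json
import re
from typing import Optional, Tuple


def parse_filter_error(body: str) -> Tuple[str, Optional[int]]:
    """Return (message, position) from an error response body."""
    detail = json.loads(body)["detail"]
    match = re.search(r"Position: (\d+)", detail)
    position = int(match.group(1)) if match else None
    return detail, position


body = (
    '{"detail": "Expected a valid string, boolean, integer, float, '
    'or datetime literal. Position: 44."}'
)
message, position = parse_filter_error(body)
print(position)  # 44
```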
Ranking Signals
The order in which the results are returned can be adjusted by leveraging the ranking signals syntax ranking_expr="<your-expression>". For a given search query, each object has a relevance score which determines the baseline ranking and can be modified with a ranking expression. This allows you to combine values available in your object store with the relevance score and perform mathematical or logical operations. The syntax can be as simple as ranking_expr="a + b + 1" or as complex as you need.
Syntax and keywords for "<your-expression>"
Operation | Description / Examples |
---|---|
uncalibrated_relevance (float) | the unmodified relevance score of the object |
object.<property> (varies) | the value of <property> for the object as defined in the object store. Nested fields can be accessed with the dot notation e.g. object.a.b.c |
NOW (float) | the current timestamp in unix time (seconds) |
timestamp(<timestamp>) (float) | function to convert an RFC 3339 <timestamp> string to unix time (seconds), e.g. timestamp("2015-11-03T15:01:00.05Z") |
Operators | ^ , * , / , % , + , - , if , etc. |
Functions | min , max , len , floor , etc. |
Value types | string , boolean , integer , float , tuple , empty |
Variables | foo = 1 |
Comments | // some comment |
See the “Supported Operations” page for more details.
Important: The relevance is “uncalibrated” and is only meant to compare objects. The value does not have an intrinsic meaning; its distribution and bounds can vary depending on the evaluation method. We advise against using its value as a cutoff.
Important: The ranking signals are meant as nudges to improve the ordering of objects with otherwise similar relevance. If you find yourself using these signals to drastically modify the relevance score, you possibly have a larger problem that should be addressed at the source.
Simple Example
You created a value in your object store named promotion_boost that has a value of 0 by default but can be set to 0.01 for objects that are being promoted. The following ranking expression:
uncalibrated_relevance + object.promotion_boost
would score the promoted items higher than similar items.
For another use case, you want to boost a specific brand that is on promotion and write the following expression:
uncalibrated_relevance * if(object.brandName == "SomeBrand", 1.01, 1.0)
so items belonging to “SomeBrand” will be scored 1% higher.
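To see why such a small boost acts as a nudge rather than an override, the first expression can be simulated locally. The relevance values below are made up for illustration, and the Python code only mimics the arithmetic; it is not how the API evaluates expressions.

```python
objects = [
    {"id": "a", "uncalibrated_relevance": 0.505, "promotion_boost": 0.0},
    {"id": "b", "uncalibrated_relevance": 0.500, "promotion_boost": 0.01},
    {"id": "c", "uncalibrated_relevance": 0.300, "promotion_boost": 0.01},
]


def score(obj: dict) -> float:
    # Mirrors: uncalibrated_relevance + object.promotion_boost
    return obj["uncalibrated_relevance"] + obj["promotion_boost"]


ranked_ids = [o["id"] for o in sorted(objects, key=score, reverse=True)]
print(ranked_ids)  # ['b', 'a', 'c']
```

The promoted object b overtakes a, whose relevance is nearly identical, while the promoted but much less relevant c stays last.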
Complex Example
You want to prioritize results that are both recent and relevant by using more advanced operations such as mathematical functions and variables. You create the following ranking expression:
decay_time = 60 * 60 * 24; uncalibrated_relevance * (0.8 + 0.2 * math::exp(-(NOW - timestamp(object.details.timestamp)) / decay_time))
which uses the object’s timestamp object.details.timestamp to decay the uncalibrated_relevance from 100% (now) to 87% (one day ago) to 80% (more than a few days ago) of its unmodified value. It also uses an exponential function math::exp and the <variable> = <value>; syntax to define a decay_time of one day.
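The percentages quoted above follow directly from the multiplier 0.8 + 0.2 * exp(-age / decay_time). The Python sketch below reproduces that arithmetic outside the API to make the decay curve concrete; the function name is hypothetical.

```python
import math

DECAY_TIME = 60 * 60 * 24  # one day in seconds, as in the expression


def recency_multiplier(age_seconds: float) -> float:
    """Factor applied to uncalibrated_relevance for an object of this age."""
    return 0.8 + 0.2 * math.exp(-age_seconds / DECAY_TIME)


for label, age in [("now", 0), ("one day", DECAY_TIME), ("five days", 5 * DECAY_TIME)]:
    print(f"{label}: {recency_multiplier(age):.2f}")
# now: 1.00
# one day: 0.87
# five days: 0.80
```

The multiplier never drops below 0.8, so recency can reduce a score by at most 20%.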
Requesting Object fields be returned
By default our API returns a “minimal” search result response containing only the ID of the resulting Object. There are several benefits to this including decreased latency, data size on the wire and network fees.
You may request that additional fields be returned as part of the "object" field in the SearchResult. To do this, list the fields you want returned, separated by commas, in the object_fields query parameter.
For example, to return the fields called “title” and “images” you would use object_fields=title,images.
Requesting all fields in the Object
We support a shorthand to request the entire object by using the wildcard character *. To request all of the fields in the object use object_fields=*.
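A sketch of building the query string for either case is shown below. The helper name is hypothetical. Commas and the asterisk are kept unencoded here only for readability; their percent-encoded forms are equally valid in a URL.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def search_params(query: str, fields: Optional[Sequence[str]] = None) -> str:
    """Build a query string; fields=None requests the minimal response."""
    params = {"query": query}
    if fields:
        params["object_fields"] = ",".join(fields)
    # safe=",*" leaves the separators and the wildcard readable.
    return urlencode(params, safe=",*")


print(search_params("red shoes"))                       # query=red+shoes
print(search_params("red shoes", ["title", "images"]))  # query=red+shoes&object_fields=title,images
print(search_params("red shoes", ["*"]))                # query=red+shoes&object_fields=*
```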
Pagination
Every search API response contains a pagination section to help facilitate paging through the results. Here is example pagination data for an initial query for the first page of results:
"pagination": {
  "pages": 100,
  "page": 1,
  "next": {
    "offset": 10,
    "limit": 10
  }
}
Using the offset and limit fields in the next object, you can execute a second request to get the next page of results by adding &offset=10&limit=10 to your query (or by setting the limit and offset properties if you’re using an SDK). The offset parameter is the number of results to skip and the limit parameter is the number of results to return.
To request results from a specific page, multiply the limit by the desired page number minus 1 to get the correct offset value. For example, if you are limiting searches to 10 objects per page, you can jump to the 10th page of results by multiplying the limit by 9: &limit=10&offset=90.
The pages field in the pagination object is the total number of pages available.
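The page arithmetic can be sketched as follows. The helper names are hypothetical, and treating a missing next object as "no further pages" is an assumption, since this page only shows a response that has one.

```python
from typing import Dict, Optional


def page_to_offset(page: int, limit: int) -> int:
    """Offset for a 1-based page number: skip (page - 1) full pages."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be at least 1")
    return (page - 1) * limit


def next_page_params(pagination: Dict) -> Optional[Dict[str, int]]:
    """Parameters for the following request, or None if there is no next."""
    nxt = pagination.get("next")  # assumed absent on the last page
    if nxt is None:
        return None
    return {"offset": nxt["offset"], "limit": nxt["limit"]}


pagination = {"pages": 100, "page": 1, "next": {"offset": 10, "limit": 10}}
print(next_page_params(pagination))  # {'offset': 10, 'limit': 10}
print(page_to_offset(10, 10))        # 90
```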
Compression
Compressing results leads to lower latency and smaller size on the wire. We suggest all clients use compression when calling our search API. You may request a content encoding via the Accept-Encoding header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding). We suggest using the brotli algorithm by setting Accept-Encoding: br. We support the following values: br, gzip, deflate.
Example usage: Accept-Encoding: br,gzip,deflate
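Many HTTP clients negotiate and decompress transparently. For clients that do not, the sketch below shows the header and the decoding step using only Python's standard library. It requests gzip and deflate only, because brotli decoding needs a third-party package. The host api.example.com is a placeholder and the helper names are hypothetical.

```python
import gzip
import urllib.request
import zlib


def build_request(url: str) -> urllib.request.Request:
    """Request object asking for a compressed response (not sent here)."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Decompress a body according to its Content-Encoding response header."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        return zlib.decompress(raw)  # assumes the zlib-wrapped form
    if encoding in ("", "identity"):
        return raw
    raise ValueError(f"unsupported content encoding: {content_encoding}")


request = build_request("https://api.example.com/search?query=test")
# urllib normalizes header names to "Accept-encoding" internally.
print(request.get_header("Accept-encoding"))  # gzip, deflate

compressed = gzip.compress(b'{"results": []}')
print(decode_body(compressed, "gzip"))  # b'{"results": []}'
```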
View API Reference
See how to programmatically search an Index