Search
Search for documents matching a specific query in the given index.
POST /indexes/{index_name}/search
Path parameters
Name | Type | Description |
---|---|---|
index_name | String | Name of the requested index |
Body
The body parameters below are used for HTTP requests (for example, with cURL). Python client users should use the Pythonic snake_case equivalents (for example, searchable_attributes rather than searchableAttributes).
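The naming rule above can be sketched as a small conversion helper. This is an illustrative sketch only: `to_camel_case` and `to_http_body` are hypothetical names, not part of the Marqo client, and the only exception handled is `filter_string`, which the parameter table maps to `filter`.

```python
# Hypothetical helper: maps Python client keyword names to HTTP body names.
# The only special case assumed here is filter_string -> filter.
SPECIAL_CASES = {"filter_string": "filter"}


def to_camel_case(name: str) -> str:
    """Convert a snake_case name such as searchable_attributes to camelCase."""
    if name in SPECIAL_CASES:
        return SPECIAL_CASES[name]
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_http_body(python_kwargs: dict) -> dict:
    """Build an HTTP request body from Python-style keyword arguments."""
    return {to_camel_case(key): value for key, value in python_kwargs.items()}


body = to_http_body(
    {
        "q": "red shirt",
        "searchable_attributes": ["Description"],
        "show_highlights": True,
    }
)
print(body)
```

The resulting dictionary can be serialised with `json.dumps` and sent as the body of the POST request.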
Search Parameter | Type | Default value | Description |
---|---|---|---|
q | String OR Dict | null | Query string, weighted query strings, or custom vector object. Optional for tensor search if the context parameter is used. |
limit | Integer | 10 | Maximum number of documents to be returned. |
offset | Integer | 0 | Number of documents to skip (used for pagination). |
filter | String | null | Filter string in the Marqo DSL. In the Python client this parameter is called filter_string : mq.search("my query", filter_string="country:(United States)") |
searchableAttributes | Array of strings | null | Attributes to be queried during the search. Only supported in structured indexes. |
showHighlights | Boolean | true | Return highlights for the document match. Only applicable for TENSOR search. With LEXICAL search, highlights will always be [] . |
searchMethod | String | "TENSOR" | The search method. Can be LEXICAL, TENSOR, or HYBRID. |
hybridParameters | Dict | null | Parameters used for hybrid search. |
attributesToRetrieve | Array of strings | null | Attributes to return in the search response. |
efSearch | Integer | 2000 | The size of the dynamic list of nearest neighbors used during the search. Higher values give better recall at the cost of latency. efSearch must be greater than limit, and limit is capped at 400. |
approximate | Boolean | True | Toggles between exact KNN and approximate KNN (with HNSW). |
reRanker | String | null | Method to use for reranking results. |
imageDownloadHeaders | Dict | {} | Headers for the image download. Can be used to authenticate the images for download. |
context | Dict | null | Dictionary of "tensor":{List[{"vector": List[floats], "weight": (float)}]} to bring your own vectors into search. |
scoreModifiers | Dict | null | A dictionary to modify the score based on field values. Check here for examples. |
modelAuth | Dict | null | Authorisation details used by Marqo to download non-publicly available models. Check here for examples. |
textQueryPrefix | String | null | The prefix added to text queries when embedding. This field overrides the textQueryPrefix set in the index settings during index creation. If it is unset by the user, it defaults to the prefix defined in the index settings. For more information on default values for index settings, see create_index. |
Note on Attributes to Retrieve per Query
It is beneficial to explicitly set the attributesToRetrieve parameter to limit the amount of data Marqo returns per document. Latency will increase as the number of attributes and documents retrieved increases. If you have documents with many fields that are not used by systems interfacing with Marqo's results, setting attributesToRetrieve to the minimal set of fields required can reduce latency and improve throughput.
Query parameters
Search Parameter | Type | Default value | Description |
---|---|---|---|
device | String | null | The device used to search. If device is not specified and CUDA devices are available to Marqo (see here for more info), Marqo will speed up search by using an available CUDA device. Otherwise, the CPU will be used. Options include cpu and cuda, cuda1, cuda2, etc. The cuda option tells Marqo to use any available CUDA device. |
telemetry | Boolean | False | If true, the telemetry object is returned in the search response body. This includes information like latency metrics. This is set at client instantiation time in the Python client: mq = marqo.Client(return_telemetry=True) |
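For HTTP users, these parameters go in the URL query string rather than in the request body. The sketch below shows one way to assemble such a URL with the standard library; `build_search_url` is a hypothetical helper, and the lowercase `true` spelling for the boolean is an assumption about what the server accepts.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index_name, device=None, telemetry=False):
    """Assemble the search URL, adding only the query parameters that are set."""
    params = {}
    if device is not None:
        params["device"] = device
    if telemetry:
        # Assumption: the boolean is passed as the lowercase string "true".
        params["telemetry"] = "true"
    url = f"{base_url}/indexes/{index_name}/search"
    return f"{url}?{urlencode(params)}" if params else url


print(build_search_url("http://localhost:8882", "my-first-index",
                       device="cuda1", telemetry=True))
# http://localhost:8882/indexes/my-first-index/search?device=cuda1&telemetry=true
```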
Search result pagination
Use the limit and offset parameters to paginate your results, that is, to query a certain number of results at a time instead of all at once.
The limit parameter sets the size of a page. If you set limit to 10, Marqo's response will contain a maximum of 10 search results. The offset parameter skips a number of search results. If you set offset to 20, Marqo's response will skip the first 20 search results.
Let's say you want each page to have 10 results, and you want to receive the 2nd page. Try setting limit and offset like so:
# Specify page properties
page_size = 10
page_num = 2
# Set limit and offset accordingly
limit = page_size
offset = (page_num - 1) * page_size
Pagination limitations
Search results can only be 10,000 results deep. This means limit + offset must be less than or equal to 10000. Also, efSearch must be greater than limit+offset.
Using pagination with search_method="TENSOR" may result in some results being skipped or duplicated (often near the edge of pages) within the first few pages if the page size is much smaller than the total search result count. Please keep this in mind when looking for particular results or when result order is essential.
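The depth limit can be checked on the client before a request is sent. The following is a minimal sketch under the rule stated above (limit + offset must not exceed 10,000); `page_params` and `MAX_RESULT_DEPTH` are hypothetical names, and the efSearch constraint is not checked here.

```python
MAX_RESULT_DEPTH = 10_000  # limit + offset must be <= this value


def page_params(page_num: int, page_size: int) -> dict:
    """Return the limit/offset pair for a 1-based page number, or raise if too deep."""
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must be positive")
    limit = page_size
    offset = (page_num - 1) * page_size
    if limit + offset > MAX_RESULT_DEPTH:
        raise ValueError(
            f"limit + offset = {limit + offset} exceeds {MAX_RESULT_DEPTH}"
        )
    return {"limit": limit, "offset": offset}


print(page_params(page_num=2, page_size=10))  # {'limit': 10, 'offset': 10}
```

The returned dictionary can be unpacked into a search call, for example `search(q="my query", **page_params(2, 10))`.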
Lexical search: exact matches
Use searchMethod="LEXICAL" to perform keyword search instead of tensor search. With lexical search, you can enable exact match searching using double quotes: "".
Any term enclosed in "" will be labeled a required term, which must exist in at least one field of every result hit.
Note that terms enclosed in double quotes must also have a space between them and the terms before and after them, same as regular terms. Use this feature to filter your results to only documents containing certain terms. For example, if you want to search for results containing fruits, vegetables, or candy, but they must be green, you can construct your query as such:
mq.index("my-first-index").search(
q='fruit vegetable candy "green"',
search_method="LEXICAL"
)
If you want to escape the double quotes (interpret them as text), use 2 escape characters: \\ . For example: q = 'Dwayne \\"The Rock\\" Johnson'.
Note: syntax errors
If your use of "" does not follow proper syntax, the entire query will simply be interpreted literally, with no required terms. Here are some examples of syntax errors:
# Quoted terms without spaces before/after
q = 'apples"oranges" bananas'
q = 'cucumbers "melons and watermelons""grapefruit"'
# Unescaped quotes
q = 'There is a quote right"here'
# Unbalanced quotes
q = '"Dr. Seuss" "Thing 1" "Thing 2'
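To make the quoting rules concrete, here is a rough client-side check that approximates them. It is a sketch based only on the rules described in this section, not Marqo's actual parser, and `required_terms` is a hypothetical name. It returns the list of required terms, or `None` when the quoting is malformed (in which case the query would be treated literally).

```python
def required_terms(query: str):
    """Return quoted (required) terms, or None if the quoting is malformed."""
    terms = []
    current = None  # None outside quotes; list of characters inside quotes
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch == "\\" and i + 1 < n and query[i + 1] == '"':
            # Escaped quote: treated as literal text, not as a delimiter.
            if current is not None:
                current.append('"')
            i += 2
            continue
        if ch == '"':
            if current is None:
                # An opening quote must start the query or follow whitespace.
                if i > 0 and not query[i - 1].isspace():
                    return None
                current = []
            else:
                # A closing quote must end the query or precede whitespace.
                if i + 1 < n and not query[i + 1].isspace():
                    return None
                terms.append("".join(current))
                current = None
        elif current is not None:
            current.append(ch)
        i += 1
    if current is not None:
        return None  # unbalanced quotes
    return terms


print(required_terms('fruit vegetable candy "green"'))  # ['green']
print(required_terms('apples"oranges" bananas'))        # None
```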
Response
Name | Type | Description |
---|---|---|
hits | Array of objects | Results of the query |
limit | Integer | Number of documents specified in the query |
offset | Integer | Number of skipped results specified in the query |
processingTimeMs | Number | Processing time of the query |
query | String | Query originating the response |
Example
curl -XPOST 'http://localhost:8882/indexes/my-first-index/search' -H 'Content-type:application/json' -d '
{
"q": "what is the best outfit to wear on the moon?",
"limit": 10,
"offset": 0,
"showHighlights": true,
"searchMethod": "TENSOR",
"attributesToRetrieve": ["Title", "Description"]
}'
mq.index("my-first-index").search(
q="What is the best outfit to wear on the moon?",
limit=10,
offset=0,
show_highlights=True,
search_method="TENSOR",
attributes_to_retrieve=["Title", "Description"]
)
For Marqo Cloud, replace your_endpoint with the endpoint of your index. To find it, visit Find Your Endpoint. You will also need your API Key. To obtain this key, visit Find Your API Key.
curl -XPOST 'your_endpoint/indexes/my-first-index/search' \
-H 'x-api-key: XXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"q": "what is the best outfit to wear on the moon?",
"limit": 10,
"offset": 0,
"showHighlights": true,
"searchMethod": "TENSOR",
"attributesToRetrieve": ["Title", "Description"]
}'
mq.index("my-first-index").search(
q="What is the best outfit to wear on the moon?",
limit=10,
offset=0,
show_highlights=True,
search_method="TENSOR",
attributes_to_retrieve=["Title", "Description"]
)
Response: 200 Ok
{
"hits": [
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection, mobility, life support, and communications for astronauts",
"_highlights": [
{
"Description": "The EMU is a spacesuit that provides environmental protection, mobility, life support, and communications for astronauts"
}
],
"_id": "article_591",
"_score": 1.2387788
},
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing Polo's travels",
"_highlights": [
{
"Title": "The Travels of Marco Polo"
}
],
"_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
"_score": 1.2047464
}
],
"limit": 10,
"offset": 0,
"processingTimeMs": 49,
"query": "What is the best outfit to wear on the moon?"
}
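A response like the one above can be reduced to the fields most callers need. The sketch below works on a trimmed copy of the sample response (the Description fields are omitted for brevity); `summarize_hits` is a hypothetical helper, not part of the Marqo client.

```python
# Trimmed copy of the sample response shown above.
response = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_highlights": [
                {
                    "Description": "The EMU is a spacesuit that provides "
                    "environmental protection, mobility, life support, and "
                    "communications for astronauts"
                }
            ],
            "_id": "article_591",
            "_score": 1.2387788,
        },
        {
            "Title": "The Travels of Marco Polo",
            "_highlights": [{"Title": "The Travels of Marco Polo"}],
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 1.2047464,
        },
    ],
    "limit": 10,
    "offset": 0,
    "processingTimeMs": 49,
    "query": "What is the best outfit to wear on the moon?",
}


def summarize_hits(response: dict) -> list:
    """Return (id, score, highlighted field names) for each hit."""
    rows = []
    for hit in response.get("hits", []):
        highlighted = [
            field for highlight in hit.get("_highlights", []) for field in highlight
        ]
        rows.append((hit["_id"], hit["_score"], highlighted))
    return rows


for doc_id, score, fields in summarize_hits(response):
    print(doc_id, score, fields)
```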
Query (q)
Parameter: q
Expected value: Search string, a dictionary of weighted search strings, or a custom vector object. Optional for tensor search if the context parameter is used.
Search strings are text or a URL to an image, if the index has treatUrlsAndPointersAsImages set to True.
If queries are weighted, each weight acts as a (possibly negative) multiplier for that query, relative to the other queries.
If your search method is TENSOR, this parameter is optional if you are using the context parameter. At least one of q or context must be specified for this search.
If you are using a custom vector, you can also specify a dictionary of the form {'customVector': {'vector': [0.1,...,0], 'content': 'some string'}}.
Default value: null
Examples
# query string:
q = "How do I keep my plant alive?"
# a dictionary of weighted query strings
q = {
# a weighting of 1 gives this query a neutral effect:
"Which dogs are the best pets": 1.0,
# we give this a weighting of 2 because we really want results similar to this:
"https://image_of_a_golden_retriever.png": 2.0,
# we give this a negative weighting to make it less likely to appear:
"Poodle": -1
}
# providing a custom vector for tensor search
q = {
"customVector" : {"vector": [0.1]*512}
}
# providing a custom vector and content for hybrid search
q = {
"customVector" : {"vector": [0.1]*512, "content": "some content that matches the vector"}
}
Limit
Parameter: limit
Expected value: Any positive integer
Default value: 10
Max: 1000
Sets the maximum number of documents returned by a single query.
Offset
Parameter: offset
Expected value: Any integer greater than or equal to 0
Default value: 0
Max: 10000
Sets the number of documents to skip. For example, if offset = 20, the first result returned will be the 21st result.
Only set this parameter for single-field searches (multi-field support to follow).
Filter
Parameter: filter
Expected value: A filter string written in Marqo's query DSL.
Default value: null
Uses filter expressions to refine search results.
Read our guide on filtering, faceted search and filter expressions.
Example
You can write a filter expression in string syntax using logical connectives (see filtering in Marqo):
"(type:confectionary AND food:(ice cream)) OR animal:hippo"
Searchable attributes
Parameter: searchableAttributes
Expected value: An array of strings
Default value: null
Configures which attributes will be searched for query matches. This field is only supported in structured indexes.
If no value is specified, all fields will be searched.
Example
You can write the searchableAttributes as a list of strings, for example if you only wanted to search the "Description" field of your documents:
["Description"]
Reranker
Parameter: reRanker
Expected value: One of "owl/ViT-B/32", "owl/ViT-B/16", "owl/ViT-L/14"
Default value: null
Selects the method for reranking results. See the Models reference reranking section for more details.
If no value is specified, reRanker will be set to null and no reranking will occur.
Example
You can write reRanker as a string, for example:
"owl/ViT-B/32"
Context
Parameter: context
Expected value: Dictionary of "tensor":{List[{"vector": List[floats], "weight": (float)}]}
Default value: null
Context allows you to use your own vectors as context for your queries. Your vectors will be incorporated into the query using a weighted sum approach, allowing you to reduce the number of inference requests for duplicated content. The dimension of the provided vectors should be consistent with the index dimension.
Example
mq.index("my-first-index").search(
q={"Chocolate chip cookies": 1},
# the dimension of the vector (which is 768 here) should match the dimension of the index
context={"tensor": [{"vector": [0.3, ] * 768, "weight": 2}, # custom vector 1
{"vector": [0.12, ] * 768, "weight": -1}, ] # custom vector 2
}
)
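The weighted sum mentioned above can be pictured with plain lists. This is only a sketch of the arithmetic: whether Marqo also averages or normalises the combined vector is not stated here and is left out as an assumption, and `weighted_sum` is a hypothetical helper.

```python
def weighted_sum(vectors_with_weights):
    """Combine (vector, weight) pairs into one vector by a weighted sum."""
    dim = len(vectors_with_weights[0][0])
    combined = [0.0] * dim
    for vector, weight in vectors_with_weights:
        if len(vector) != dim:
            raise ValueError("all vectors must have the same dimension")
        for i, value in enumerate(vector):
            combined[i] += weight * value
    return combined


# Toy 3-dimensional vectors; a real index would use its own dimension (e.g. 768).
combined = weighted_sum(
    [
        ([1.0, 0.0, 2.0], 1.0),   # stands in for the embedded query
        ([0.5, 0.5, 0.5], 2.0),   # custom vector 1, weight 2
        ([0.0, 1.0, 0.0], -1.0),  # custom vector 2, weight -1
    ]
)
print(combined)  # [2.0, 0.0, 3.0]
```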
Score modifiers
Parameter: scoreModifiers
Expected value: An object with two optional keys: multiply_score_by and add_to_score. The value of each of these keys is an array of objects. Each object contains the name of a numeric field in the document as the field_name key and the weighting to apply to that field's value, if it is found in the document, as the weight key. If the score modifier field in the document is a map, access the subfield value using dot notation.
Default value: null
Score modifiers allow you to modify a document's initial score by multiplying it by, and adding to it, values found within the document itself. This allows you to modify the search results based on metadata not included in the vectors.
The default weight value is 1 in the multiply_score_by object and 0 in the add_to_score object.
The multiply_score_by modifiers will be applied to the document's score before the add_to_score modifiers. If a field specified in the score modification objects isn't found in the document, then the score modification will be skipped for that document's field.
For map score modifiers, avoid retrieving the score modifier fields in the query if they are not necessary for retrieval. For more information, see attributesToRetrieve.
There is negligible performance impact in performing queries with 1000 score modifiers against large dictionaries of upwards of 15,000 score modifiers per document.
Example
mq.index("my-first-index").add_documents(
documents=[
{
"productImage": "https://my-images.com/cool-tshirt-1.png",
"itemPopularity": 2.1,
"negativeReviewCount": 4
}],
tensor_fields=['productImage']
)
mq.index("my-first-index").search(
q="T-shirts with a cartoon character",
score_modifiers={
"multiply_score_by": [{"field_name": "itemPopularity", "weight": 1.8}],
"add_to_score": [{"field_name": "negativeReviewCount", "weight": -0.1}]
}
)
# if the initial score of the search query against this document is 0.67, then, after applying score modifiers,
# it will be modified to 0.67 * (1.8 * 2.1) + (-0.1 * 4) = 2.13
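The arithmetic in the comment above can be reproduced offline. The sketch below follows the rules described in this section (multiply first, then add, skip missing fields, dot notation for map fields); `lookup` and `apply_score_modifiers` are hypothetical helpers, not the code Marqo runs internally.

```python
def lookup(doc: dict, field_name: str):
    """Resolve a field name; dot notation reaches into map fields."""
    value = doc
    for key in field_name.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # field missing: the modifier is skipped
        value = value[key]
    return value


def apply_score_modifiers(score: float, doc: dict, modifiers: dict) -> float:
    """Apply multiply_score_by modifiers first, then add_to_score modifiers."""
    for mod in modifiers.get("multiply_score_by", []):
        value = lookup(doc, mod["field_name"])
        if value is not None:
            score *= mod.get("weight", 1) * value  # default weight is 1
    for mod in modifiers.get("add_to_score", []):
        value = lookup(doc, mod["field_name"])
        if value is not None:
            score += mod.get("weight", 0) * value  # default weight is 0
    return score


doc = {"itemPopularity": 2.1, "negativeReviewCount": 4}
modifiers = {
    "multiply_score_by": [{"field_name": "itemPopularity", "weight": 1.8}],
    "add_to_score": [{"field_name": "negativeReviewCount", "weight": -0.1}],
}
print(round(apply_score_modifiers(0.67, doc, modifiers), 2))  # 2.13
```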
Example Using Map Score Modifiers
import json
docs = [
{"_id": "1", "text_field": "a photo of a cat", "map_score_mods": {"a": 0.5}},
{"_id": "2", "text_field": "a photo of a dog", "map_score_mods": {"b": 0.5}},
{"_id": "3", "text_field": "a photo of a cat", "map_score_mods": {"c": 0.5}},
{"_id": "4", "text_field": "a photo of a cat", "map_score_mods_int": {"a": 1}},
{"_id": "5", "text_field": "a photo of a cat", "map_score_mods_int": {"b": 1}},
{"_id": "6", "text_field": "a photo of a cat", "map_score_mods_int": {"c": 1}},
{"_id": "7", "text_field": "a photo of a cat", "map_score_mods_int": {"c": 1}, "map_score_mods": {"a": 0.5}},
{"_id": "8", "text_field": "a photo of a dog", "my_int": 2},
]
res = mq.index("my-unstructured-index").add_documents(
documents=docs,
tensor_fields=["text_field"],
)
# The same search syntax is used for both structured and unstructured indexes
res = mq.index("my-unstructured-index").search(
q="",
score_modifiers={
"add_to_score": [{"field_name": "map_score_mods_int.c", "weight": 2}],
"multiply_score_by": [{"field_name": "map_score_mods.a", "weight": 4}]
},
attributes_to_retrieve=["_id", "text_field"]
)
print(json.dumps(res, indent=2))
Model Auth
Parameter: modelAuth
Expected value: Dictionary with either an s3 or an hf model store authorisation object.
Default value: null
The modelAuth object allows searching on indexes that use OpenCLIP and CLIP models from private Hugging Face and AWS S3 stores.
The modelAuth object contains either an s3 or an hf model store authorisation object. The model store authorisation object contains the credentials needed to access the index's non-publicly accessible model. See the example for details.
The index's settings must specify the non-publicly accessible model's location in the settings' modelProperties object.
modelAuth is used to initially download the model. After downloading, Marqo caches the model so that it doesn't need to be redownloaded.
Example: AWS s3
# Create an index that specifies the non-public location of the model.
# Note the `auth_required` field in `modelProperties` which tells Marqo to use
# the modelAuth it finds during search to download the model
mq.create_index(
index_name="my-cool-index",
settings_dict={
"treatUrlsAndPointersAsImages": True,
"model": 'my_s3_model',
"normalizeEmbeddings": True,
"modelProperties": {
"name": "ViT-B/32",
"dimensions": 512,
"model_location": {
"s3": {
"Bucket": "<SOME BUCKET>",
"Key": "<KEY TO IDENTIFY MODEL>",
},
"auth_required": True
},
"type": "open_clip",
}
}
)
# Specify the authorisation needed to access the private model during search:
# We recommend setting up the credential's AWS user so that it has minimal
# accesses needed to retrieve the model
mq.index("my-cool-index").search(
q="Chocolate chip cookies",
model_auth={
's3': {
"aws_access_key_id": "<SOME ACCESS KEY ID>",
"aws_secret_access_key": "<SOME SECRET ACCESS KEY>"
}
}
)
Example: Hugging Face (HF)
# Create an index that specifies the non-public location of the model.
# Note the `auth_required` field in `modelProperties` which tells Marqo to use
# the modelAuth it finds during search to download the model
mq.create_index(
index_name="my-cool-index",
settings_dict={
"treatUrlsAndPointersAsImages": True,
"model": 'my_hf_model',
"normalizeEmbeddings": True,
"modelProperties": {
"name": "ViT-B/32",
"dimensions": 512,
"model_location": {
"hf": {
"repo_id": "<SOME HF REPO NAME>",
"filename": "<THE FILENAME TO DOWNLOAD>",
},
"auth_required": True
},
"type": "open_clip",
}
}
)
# specify the authorisation needed to access the private model during search:
mq.index("my-cool-index").search(
q="Chocolate chip cookies",
model_auth={
'hf': {
"token": "<SOME HF TOKEN>",
}
}
)
Query Prefixes
Parameters: textQueryPrefix
Expected value: A string.
Default value: ""
This field overrides the text query prefix set during the index's creation.
Note: You do not need to provide textQueryPrefix for e5 models unless you want to override the default prefixes.
Example: Adding prefixes to search queries. Overriding index defaults
curl -XPOST 'http://localhost:8882/indexes/{index_name}/search' \
-H 'Content-type:application/json' -d '
{
"q": "Men shoes brown",
"textQueryPrefix": "override query: "
}'
mq.index("{index_name}").search(
q="Men shoes brown", text_query_prefix="override query: "
)
For Marqo Cloud, replace your_endpoint with the endpoint of your index. To find it, visit Find Your Endpoint. You will also need your API Key. To obtain this key, visit Find Your API Key.
curl -XPOST 'your_endpoint/indexes/my-first-index/search' \
-H 'x-api-key: XXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"q": "Men shoes brown",
"textQueryPrefix": "override query: "
}'
mq.index("{index_name}").search(
q="Men shoes brown", text_query_prefix="override query: "
)
Hybrid parameters
Parameters: hybridParameters
Expected value: A Dictionary with parameters for hybrid search.
Default value: null
Hybrid parameter | Type | Default | Description |
---|---|---|---|
retrievalMethod | String | "disjunction" | The method used for first stage retrieval. Can be "lexical", "tensor", or "disjunction" to use both lexical and tensor in the first stage. |
rankingMethod | String | "rrf" | The method used for second stage retrieval. Can be "lexical", "tensor", or "rrf" for reciprocal rank fusion. You must use rrf if you specify disjunction for retrievalMethod. |
searchableAttributesLexical | Array of strings | null | Attributes which are used for the lexical search. |
searchableAttributesTensor | Array of strings | null | Attributes which are used for the tensor search. |
scoreModifiersTensor | Dict | null | Score modifiers for the tensor component of the query. Modifies the score based on field values. Check here for more details. |
scoreModifiersLexical | Dict | null | Score modifiers for the lexical component of the query. Modifies the score based on field values. Check here for more details. |
alpha | Float | 0.5 | The linear weight of the tensor RRF score. A value of 1 gives a 100% contribution from the tensor component, and a value of 0 gives a 100% contribution from the lexical component. |
rrfK | Integer | 60 | Smoothing factor for RRF. The higher rrfK, the lower the contribution of RRF to the ranking. |
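To give a feel for how alpha and rrfK interact, here is a sketch of weighted reciprocal rank fusion. It uses the textbook RRF term 1 / (rrfK + rank) with ranks starting at 1 and weights the tensor and lexical terms by alpha and 1 - alpha. That formula is an assumption for illustration; Marqo's exact computation may differ, and `rrf_fuse` is a hypothetical helper.

```python
def rrf_fuse(tensor_ranked, lexical_ranked, alpha=0.5, rrf_k=60):
    """Fuse two ranked lists of document IDs with weighted reciprocal rank fusion."""
    scores = {}
    for rank, doc_id in enumerate(tensor_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (rrf_k + rank)
    for rank, doc_id in enumerate(lexical_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (rrf_k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Tensor search ranks the shirt first; lexical search ranks the shoes first.
tensor_ranked = ["8988998589", "4231042142"]
lexical_ranked = ["4231042142", "8988998589"]

# With alpha=0.3 the lexical ranking carries more weight.
fused = rrf_fuse(tensor_ranked, lexical_ranked, alpha=0.3, rrf_k=60)
print([doc_id for doc_id, _ in fused])  # ['4231042142', '8988998589']
```

Setting alpha to 1 in this sketch reproduces the tensor ordering, and setting it to 0 reproduces the lexical ordering.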
Example 1: Hybrid search with a structured index
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-structured-index' \
-H "Content-Type: application/json" \
-d '{
"model": "hf/e5-base-v2",
"type": "structured",
"allFields": [
{"name": "title", "type": "text", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "float", "features": ["score_modifier"]}
],
"tensorFields": ["title", "description"]
}'
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-structured-index/documents' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142"
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589"
}
]
}'
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-structured-index/search' \
-H 'Content-type:application/json' -d '
{
"q": "shirt that is red",
"searchMethod": "HYBRID",
"hybridParameters": {
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["description"],
"searchableAttributesTensor": ["description"],
"scoreModifiersTensor": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] },
"scoreModifiersLexical": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] }
}
}'
import marqo
mq = marqo.Client("http://localhost:8882", api_key=None)
mq.create_index(
index_name="my-hybrid-structured-index",
type="structured",
model="hf/e5-base-v2",
# field types can be found here: https://docs.marqo.ai/latest/reference/api/indexes/create-structured-index/#fields
all_fields=[
{"name": "title", "type": "text", "features": ["lexical_search"]},
{
"name": "description",
"type": "text",
"features": ["lexical_search", "filter"],
},
{"name": "time_added_epoch", "type": "float", "features": ["score_modifier"]},
],
tensor_fields=["title", "description"],
)
mq.index("my-hybrid-structured-index").add_documents(
[
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142",
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589",
},
]
)
# hybrid search with lexical and tensor search, using score modifiers
mq.index("my-hybrid-structured-index").search(
q="shirt that is red",
search_method="HYBRID",
hybrid_parameters={
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["description"],
"searchableAttributesTensor": ["description"],
"scoreModifiersTensor": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
"scoreModifiersLexical": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
},
)
For Marqo Cloud, replace your_endpoint with the endpoint of your index. To find it, visit Find Your Endpoint. You will also need your API Key. To obtain this key, visit Find Your API Key.
curl -XPOST 'https://api.marqo.ai/api/v2/indexes/my-hybrid-structured-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"model": "hf/e5-base-v2",
"type": "structured",
"allFields": [
{"name": "title", "type": "text", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "float", "features": ["score_modifier"]}
],
"tensorFields": ["title", "description"]
}'
curl -XPOST 'your_endpoint/indexes/my-hybrid-structured-index/documents' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142"
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589"
}
]
}'
curl -XPOST 'your_endpoint/indexes/my-hybrid-structured-index/search' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"q": "shirt that is red",
"searchMethod": "HYBRID",
"hybridParameters": {
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["description"],
"searchableAttributesTensor": ["description"],
"scoreModifiersTensor": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] },
"scoreModifiersLexical": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] }
}
}'
import marqo
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
mq.create_index(
index_name="my-hybrid-structured-index",
type="structured",
model="hf/e5-base-v2",
# field types can be found here: https://docs.marqo.ai/latest/reference/api/indexes/create-structured-index/#fields
all_fields=[
{"name": "title", "type": "text", "features": ["lexical_search"]},
{
"name": "description",
"type": "text",
"features": ["lexical_search", "filter"],
},
{"name": "time_added_epoch", "type": "float", "features": ["score_modifier"]},
],
tensor_fields=["title", "description"],
)
mq.index("my-hybrid-structured-index").add_documents(
[
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142",
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589",
},
]
)
# hybrid search with lexical and tensor search, using score modifiers
mq.index("my-hybrid-structured-index").search(
q="shirt that is red",
search_method="HYBRID",
hybrid_parameters={
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["description"],
"searchableAttributesTensor": ["description"],
"scoreModifiersTensor": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
"scoreModifiersLexical": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
},
)
Example 2: Creating and searching an unstructured index, hybrid search with model deployed within Marqo
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-index' \
-H "Content-Type: application/json" \
-d '{
"model": "hf/e5-base-v2",
"type": "unstructured"
}'
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-index/documents' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142"
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589"
}
],
"tensorFields": ["title", "description"]
}'
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-index/search' \
-H 'Content-type:application/json' -d '
{
"q": "Men shoes brown",
"searchMethod": "HYBRID",
"hybridParameters": {
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 10,
"scoreModifiersTensor": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] },
"scoreModifiersLexical": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] }
}
}'
import marqo
mq = marqo.Client("http://localhost:8882", api_key=None)
mq.create_index(
index_name="my-hybrid-index", type="unstructured", model="hf/e5-base-v2"
)
mq.index("my-hybrid-index").add_documents(
[
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142",
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589",
},
],
tensor_fields=["title", "description"],
)
# hybrid search with lexical and tensor search, using score modifiers
mq.index("my-hybrid-index").search(
q="Men shoes brown",
search_method="HYBRID",
hybrid_parameters={
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 10,
"scoreModifiersTensor": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
"scoreModifiersLexical": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
},
)
For Marqo Cloud, replace your_endpoint with the endpoint of your index. To find it, visit Find Your Endpoint. You will also need your API Key. To obtain this key, visit Find Your API Key.
curl -XPOST 'https://api.marqo.ai/api/v2/indexes/my-hybrid-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"model": "hf/e5-base-v2",
"type": "unstructured"
}'
curl -XPOST 'your_endpoint/indexes/my-hybrid-index/documents' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142"
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589"
}
],
"tensorFields": ["title", "description"]
}'
curl -XPOST 'your_endpoint/indexes/my-hybrid-index/search' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"q": "Men shoes brown",
"searchMethod": "HYBRID",
"hybridParameters": {
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 10,
"scoreModifiersTensor": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] },
"scoreModifiersLexical": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.01}] }
}
}'
import marqo
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
mq.create_index(
index_name="my-hybrid-index", type="unstructured", model="hf/e5-base-v2"
)
mq.index("my-hybrid-index").add_documents(
[
{
"title": "brown shoes",
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142",
},
{
"title": "red shirt",
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589",
},
],
tensor_fields=["title", "description"],
)
# hybrid search with lexical and tensor search, using score modifiers
mq.index("my-hybrid-index").search(
q="Men shoes brown",
search_method="HYBRID",
hybrid_parameters={
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 10,
"scoreModifiersTensor": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
"scoreModifiersLexical": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
},
)
Example 3: Creating a hybrid index with no model, hybrid search using custom vectors
curl -X POST 'http://localhost:8882/indexes/my-hybrid-structured-index' \
-H "Content-Type: application/json" \
-d '{
"model": "no_model",
"modelProperties": {
"type": "no_model",
"dimensions": 3072
},
"type": "structured",
"allFields": [
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]}
],
"tensorFields": ["title"]
}'
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-structured-index/documents' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"title": {"vector": <replace with your custom 3072 dim vector>, "content": "brown shoes"},
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142"
},
{
"title": {"vector": <replace with your custom 3072 dim vector>, "content": "red shirt"},
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589"
}
]
}'
curl -XPOST 'http://localhost:8882/indexes/my-hybrid-structured-index/search' \
-H 'Content-type:application/json' -d '
{
"q": {"customVector": {"vector": <replace with your custom 3072 dim vector>, "content": "Men shoes brown"}},
"searchMethod": "HYBRID",
"hybridParameters": {
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["title"],
"searchableAttributesTensor": ["title"],
"scoreModifiersTensor": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}] },
"scoreModifiersLexical": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}] }
}
}'
import marqo
mq = marqo.Client("http://localhost:8882", api_key=None)
mq.create_index(
index_name="my-hybrid-structured-index",
type="structured",
model="no_model",
model_properties={"type": "no_model", "dimensions": 3072},
# field types can be found here: https://docs.marqo.ai/latest/reference/api/indexes/create-structured-index/#fields
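all_fields=[
# "title" stores a custom vector, matching the documents added below
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{
"name": "description",
"type": "text",
"features": ["lexical_search", "filter"],
},
# field name must match the one used in the documents and score modifiers
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]},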
],
tensor_fields=["title"],
)
mq.index("my-hybrid-structured-index").add_documents(
[
{
"title": {"vector": [0.1] * 3072, "content": "brown shoes"},
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142",
},
{
"title": {"vector": [0.1] * 3072, "content": "red shirt"},
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589",
},
]
)
# hybrid search with a custom vector and score modifiers
mq.index("my-hybrid-structured-index").search(
q={"customVector": {"content": "brown mens shoes", "vector": [0.1] * 3072}},
search_method="HYBRID",
hybrid_parameters={
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["title"],
"searchableAttributesTensor": ["title"],
"scoreModifiersTensor": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
"scoreModifiersLexical": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
},
)
For Marqo Cloud, replace your_endpoint with the endpoint of your index; to find it, visit Find Your Endpoint.
You will also need your API key. To obtain it, visit Find Your API Key.
curl -X POST 'https://api.marqo.ai/api/v2/indexes/my-hybrid-structured-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"model": "no_model",
"modelProperties": {
"type": "no_model",
"dimensions": 3072
},
"type": "structured",
"allFields": [
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]}
],
"tensorFields": ["title"]
}'
curl -XPOST 'your_endpoint/indexes/my-hybrid-structured-index/documents' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"documents": [
{
"title": {"vector": <replace with your custom 3072 dim vector>, "content": "brown shoes"},
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142"
},
{
"title": {"vector": <replace with your custom 3072 dim vector>, "content": "red shirt"},
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589"
}
]
}'
curl -XPOST 'your_endpoint/indexes/my-hybrid-structured-index/search' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"q": {"customVector": {"vector": <replace with your custom 3072 dim vector>, "content": "Men shoes brown"}},
"searchMethod": "HYBRID",
"hybridParameters": {
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["title"],
"searchableAttributesTensor": ["title"],
"scoreModifiersTensor": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}] },
"scoreModifiersLexical": { "add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}] }
}
}'
import marqo
mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")
mq.create_index(
index_name="my-hybrid-structured-index",
type="structured",
model="no_model",
model_properties={"type": "no_model", "dimensions": 3072},
# field types can be found here: https://docs.marqo.ai/latest/reference/api/indexes/create-structured-index/#fields
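all_fields=[
# "title" stores a custom vector, matching the documents added below
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{
"name": "description",
"type": "text",
"features": ["lexical_search", "filter"],
},
# field name must match the one used in the documents and score modifiers
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]},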
],
tensor_fields=["title"],
)
mq.index("my-hybrid-structured-index").add_documents(
[
{
"title": {"vector": [0.1] * 3072, "content": "brown shoes"},
"description": "Mens brown shoes with laces",
"time_added_epoch": 1421423142,
"_id": "4231042142",
},
{
"title": {"vector": [0.1] * 3072, "content": "red shirt"},
"description": "A red shirt with buttons",
"time_added_epoch": 1421499942,
"_id": "8988998589",
},
]
)
# hybrid search with a custom vector and score modifiers
mq.index("my-hybrid-structured-index").search(
q={"customVector": {"content": "brown mens shoes", "vector": [0.1] * 3072}},
search_method="HYBRID",
hybrid_parameters={
"retrievalMethod": "disjunction",
"rankingMethod": "rrf",
"alpha": 0.3,
"rrfK": 60,
"searchableAttributesLexical": ["title"],
"searchableAttributesTensor": ["title"],
"scoreModifiersTensor": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
"scoreModifiersLexical": {
"add_to_score": [{"field_name": "time_added_epoch", "weight": 0.001}]
},
},
)
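The `alpha` and `rrfK` values in the examples above control how the lexical and tensor rankings are fused. As a rough sketch, assume a weighted reciprocal rank fusion in which each document receives `alpha / (rrfK + rank)` from the tensor list and `(1 - alpha) / (rrfK + rank)` from the lexical list, with ranks starting at 1. The function name `rrf_fuse` and the exact weighting are illustrative assumptions, not a description of Marqo's internal implementation.

```python
def rrf_fuse(tensor_ids, lexical_ids, alpha=0.3, rrf_k=60):
    """Fuse two ranked lists of document IDs with weighted reciprocal rank fusion."""
    scores = {}
    # Tensor results contribute with weight alpha.
    for rank, doc_id in enumerate(tensor_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (rrf_k + rank)
    # Lexical results contribute with weight (1 - alpha).
    for rank, doc_id in enumerate(lexical_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (rrf_k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings: the two retrieval methods disagree on the order.
tensor_ranking = ["4231042142", "8988998589"]
lexical_ranking = ["8988998589", "4231042142"]

for doc_id, score in rrf_fuse(tensor_ranking, lexical_ranking):
    print(doc_id, round(score, 6))
```

With `alpha=0.3`, the lexical ranking carries more weight in this sketch, so the document ranked first lexically comes out on top. A larger `rrf_k` flattens the difference between adjacent ranks.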