Optimising Search
Only use one index per Marqo cluster
For production use cases we recommend only a single Marqo index per Marqo cluster. This results in more predictable resource usage and consistent search performance.
Be selective about tensor fields
Tensor embeddings power Marqo's tensor search. Any string field can be indexed as a tensor field.
There is a natural tradeoff, however, between tensor search coverage and storage size: every tensor field requires embeddings to be stored for every document. For some fields, that cost is not worth it. For example, categorical fields such as a song's genre or a book's category may be represented as strings, but they are mainly useful in keyword/lexical search or as filter conditions that pre-filter a tensor search.
Marqo lets you tune this tradeoff. When adding documents, only the fields listed in the tensor_fields parameter are indexed for tensor search. This selective indexing balances the benefits of tensor search against storage size.
As a best practice, select as tensor fields only those fields that will benefit from semantic or multimodal search.
For example:
import marqo

mq = marqo.Client(url="http://localhost:8882")
mq.create_index("my-first-index")

# Index two documents; only "Description" gets tensor embeddings.
# "Title" and "Genre" remain available for lexical search and filtering.
mq.index("my-first-index").add_documents(
    [
        {
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo's travels",
            "Genre": "History",
        },
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, "
            "mobility, life support, and communications for astronauts",
            "Genre": "Science",
        },
    ],
    tensor_fields=["Description"],
)

# Lexical (keyword) search for documents matching "History"
result = mq.index("my-first-index").search("History", search_method="LEXICAL")

# Tensor search for "spacesuits", restricted to documents whose Genre is Science
results = mq.index("my-first-index").search(
    q="spacesuits", filter_string="Genre:Science"
)
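
To get a rough feel for the storage side of this tradeoff, the following back-of-the-envelope sketch estimates raw embedding size for different numbers of tensor fields. It is not how Marqo accounts for storage: the helper embedding_storage_bytes is hypothetical, and the 384-dimension, float32, one-embedding-per-field figures are assumptions chosen for illustration. Real usage depends on the model, on how text is split into chunks, and on index overhead, none of which is modelled here.

```python
def embedding_storage_bytes(
    num_docs: int,
    num_tensor_fields: int,
    embedding_dim: int = 384,   # assumed model dimension (illustrative only)
    chunks_per_field: int = 1,  # assumed: one embedding per field per document
    bytes_per_value: int = 4,   # assumed: float32 components
) -> int:
    """Estimate raw embedding storage, ignoring any index overhead."""
    return (
        num_docs
        * num_tensor_fields
        * chunks_per_field
        * embedding_dim
        * bytes_per_value
    )


docs = 1_000_000

# Treating Title, Description and Genre all as tensor fields
all_fields = embedding_storage_bytes(docs, num_tensor_fields=3)

# Treating only Description as a tensor field
description_only = embedding_storage_bytes(docs, num_tensor_fields=1)

print(f"All three fields: {all_fields / 1e9:.2f} GB")        # 4.61 GB
print(f"Description only: {description_only / 1e9:.2f} GB")  # 1.54 GB
```

Under these assumptions, the raw embedding footprint scales linearly with the number of tensor fields, so dropping a categorical field such as Genre from tensor_fields removes its share of that cost without affecting lexical search or filtering on it.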