
Optimising Search

Non-tensor Fields

Tensor embeddings power Marqo's tensor search. By default, Marqo creates tensors for every string field so that the field can be used in tensor search (standard, cross and multi-modal search paradigms).

There is, however, a natural tradeoff between tensor search and storage size, because every tensor field stores full embeddings for each document. For some fields, tensor search is not worth that cost. Categorical fields such as a song's genre or a book's category are strings, for example, but they are mainly useful in keyword/lexical search or as conditions for pre-filtering a tensor search.
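A rough calculation shows why this matters at scale. The sketch below is a back-of-envelope estimate, not Marqo's actual storage accounting. It assumes one float32 vector per tensor field per document and a hypothetical 768-dimension embedding model. Marqo may split long text into several chunks, each with its own vector, so real figures are likely higher.

```python
BYTES_PER_FLOAT32 = 4


def embedding_storage_bytes(num_docs: int, num_tensor_fields: int, dims: int) -> int:
    """Rough lower bound on raw vector storage.

    Assumes exactly one float32 vector per tensor field per document
    (ignores chunking, metadata and index overhead).
    """
    return num_docs * num_tensor_fields * dims * BYTES_PER_FLOAT32


# Hypothetical index: 1 million documents with Title, Description and Genre,
# embedded by an assumed 768-dimension model.
all_tensor = embedding_storage_bytes(1_000_000, 3, 768)
genre_non_tensor = embedding_storage_bytes(1_000_000, 2, 768)

print(f"All three fields as tensors: {all_tensor / 1e9:.2f} GB")
print(f"Genre as a non-tensor field: {genre_non_tensor / 1e9:.2f} GB")
```

Under these assumptions the script prints 9.22 GB and 6.14 GB. Dropping one categorical field removes about a third of the raw vector storage.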

Marqo lets you tune this tradeoff. When adding documents, you can designate fields as non-tensor fields. Non-tensor fields cannot be used in tensor search, but they reduce the index's storage size. For example:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

mq.index("my-first-index").add_documents(
    [
        {
            "Title": "The Travels of Marco Polo",
            "Description": "A 13th-century travelogue describing Polo's travels",
            "Genre": "History",
        },
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "Description": "The EMU is a spacesuit that provides environmental protection, "
                           "mobility, life support, and communications for astronauts",
            "Genre": "Science",
        },
    ],
    # No tensors are stored for "Genre"; it stays usable for lexical search and filtering
    non_tensor_fields=["Genre"],
)
```
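When documents have many fields, it can be easier to name the fields you *do* want to search semantically and derive the `non_tensor_fields` list from everything else. The helper below is hypothetical and not part of the Marqo client. It also assumes Marqo's reserved `_id` field should never appear in the list.

```python
from typing import Any, Iterable


def derive_non_tensor_fields(
    documents: list[dict[str, Any]], tensor_fields: Iterable[str]
) -> list[str]:
    """Return every field name seen in `documents` that is not in `tensor_fields`.

    Hypothetical helper: the result is meant for the `non_tensor_fields`
    argument of `add_documents`. Marqo's reserved `_id` field is skipped.
    """
    keep = set(tensor_fields)
    seen: dict[str, None] = {}  # a dict preserves first-seen order
    for doc in documents:
        for field in doc:
            if field not in keep and field != "_id":
                seen.setdefault(field, None)
    return list(seen)


docs = [
    {"Title": "The Travels of Marco Polo", "Description": "A 13th-century travelogue", "Genre": "History"},
    {"Title": "Extravehicular Mobility Unit (EMU)", "Description": "A spacesuit", "Genre": "Science"},
]
print(derive_non_tensor_fields(docs, tensor_fields=["Title", "Description"]))
# → ['Genre']
```

The returned list would then be passed as `non_tensor_fields=derive_non_tensor_fields(docs, ["Title", "Description"])` in the `add_documents` call.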
The example above stores no tensors for the "Genre" field, but the field can still be used for lexical search and for filtering:
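```python
# Reuses the `mq` client created in the example above.

# Find all documents of a specific genre with lexical (keyword) search
results = mq.index("my-first-index").search(
    q="History", searchable_attributes=["Genre"], search_method="LEXICAL"
)

# Restrict tensor search results to documents whose Genre is "Science"
results = mq.index("my-first-index").search(q="spacesuits", filter_string="Genre:Science")
```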
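Filters on several non-tensor fields can be combined. The hypothetical helper below builds a filter string from a dict. It assumes Marqo's filter syntax accepts `field:value` terms joined with `AND`, as the `Genre:Science` example suggests. It does not escape values, so values containing spaces or special characters may need extra handling. The `Format` field in the usage lines is also hypothetical.

```python
def build_filter_string(conditions: dict[str, str]) -> str:
    """Join field:value terms with AND.

    Assumes Marqo's `field:value ... AND ...` filter syntax; performs no escaping.
    Returns an empty string when `conditions` is empty.
    """
    return " AND ".join(f"{field}:{value}" for field, value in conditions.items())


print(build_filter_string({"Genre": "Science"}))
# → Genre:Science
print(build_filter_string({"Genre": "Science", "Format": "Hardcover"}))
# → Genre:Science AND Format:Hardcover
```

The resulting string would be passed as the `filter_string` argument of `search`.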