Unstructured Vs Structured Indexes
Marqo supports both structured and unstructured indexes. Much of the functionality is shared between the two index types, but some key differences influence which one you should use.
Definition
- New Unstructured Index: In Marqo 2.13.0, we introduced a significant improvement for unstructured indexes to support searchable attributes for both tensor and lexical search. New tensor and lexical fields are dynamically added to the index settings during indexing. Therefore, unstructured indexes created with Marqo 2.13.0 and later behave similarly to structured indexes in terms of search functionality.
- Legacy Unstructured Index: Unstructured indexes created prior to Marqo 2.13.0 remain unchanged and do not support searchable attributes.
- Structured Index: When creating a structured index, you must define all the fields. The field definition, including name, type, and features, is immutable throughout the lifecycle of the structured index.
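To make the definitions above concrete, here is a minimal sketch of what the settings for each index type might look like. The key names (`type`, `model`, `allFields`, `tensorFields`, `features`) are written from the Marqo 2.x index-settings format as an assumption, and the model name, field names, and the two helper functions are hypothetical; check them against the version you run.

```python
# Illustrative settings only. Key names are assumed from the Marqo 2.x
# settings format; field names and the model are made up for this sketch.

structured_settings = {
    "type": "structured",
    "model": "hf/e5-base-v2",  # assumed model name
    # Every field must be declared up front with a name, type, and features.
    "allFields": [
        {"name": "title", "type": "text", "features": ["lexical_search"]},
        {"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
        {"name": "price", "type": "float", "features": ["filter"]},
    ],
    # Fields that get embeddings; each must also appear in allFields.
    "tensorFields": ["title", "description"],
}

unstructured_settings = {
    "type": "unstructured",
    "model": "hf/e5-base-v2",  # assumed model name
    # No field list: fields are discovered as documents are indexed.
}


def declared_field_names(settings):
    """Return the set of field names declared in a settings dictionary."""
    return {field["name"] for field in settings.get("allFields", [])}


def undeclared_tensor_fields(settings):
    """Return tensor fields that are missing from the declared fields."""
    tensor_fields = set(settings.get("tensorFields", []))
    return sorted(tensor_fields - declared_field_names(settings))
```

In the Python client, a dictionary like this is typically passed to `create_index` through its `settings_dict` argument; confirm the exact call against the client version you use. The helper `undeclared_tensor_fields` is just a local sanity check you could run before creating a structured index, since its field definitions cannot be changed afterwards.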
Key Differences
- Searchable Attributes: Both structured indexes and new unstructured indexes allow you to specify which attributes to search at query time. Legacy unstructured indexes cannot do this and search all attributes by default.
- HNSW Behaviour: Structured indexes and new unstructured indexes place each tensor field in its own HNSW graph, whereas legacy unstructured indexes use a single HNSW graph for all tensor fields. This distinction means that searchable attributes offer additional performance benefits and enable targeted searches within specific fields.
- Lexical Search: Structured indexes allow you to specify which fields are available for lexical search, while unstructured indexes treat all text fields as lexical search fields.
- Mutability: Structured indexes have a fixed schema that cannot be changed once created. The schema must be a superset of the fields in each document, but a document does not have to contain all the fields defined in the schema. In contrast, unstructured indexes can have fields added at any time.
- Partial Updates: Partial updates are supported for both index types; however, they are significantly faster for structured indexes. Partial updates for unstructured indexes are identical to adding the document with `useExistingTensors` set to `true`.
- Filtering: Structured indexes allow you to specify which fields are filterable. Unstructured indexes automatically make fields filterable; if a field contains text, `filterStringMaxLength` is used to determine whether it is filterable based on the length of the string.
- Performance: Structured indexes are faster in general; the largest performance difference is that structured indexes consume less memory. Partial updates to document metadata are also significantly faster for structured indexes.
- Error Handling: Structured indexes will throw an error if you try to add a document with a field that is not in the schema. Unstructured indexes will add the field to the schema and continue. The strictness of structured indexes can help catch errors early.
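The Mutability and Error Handling points above can be illustrated with a small toy model. This is plain Python, not Marqo code: the class names `ToyStructuredIndex` and `ToyUnstructuredIndex` are hypothetical, and the sketch only mimics the described behaviour (a fixed schema that rejects unknown fields versus a schema that grows as documents arrive).

```python
class ToyStructuredIndex:
    """Toy model: the schema is fixed at creation and unknown fields are rejected."""

    def __init__(self, field_names):
        self.schema = frozenset(field_names)  # immutable after creation
        self.documents = []

    def add_document(self, doc):
        # The schema must be a superset of the document's fields;
        # a document may omit fields, but may not introduce new ones.
        unknown = set(doc) - self.schema - {"_id"}
        if unknown:
            raise ValueError(f"fields not in schema: {sorted(unknown)}")
        self.documents.append(doc)


class ToyUnstructuredIndex:
    """Toy model: new fields are added to the schema as documents arrive."""

    def __init__(self):
        self.schema = set()
        self.documents = []

    def add_document(self, doc):
        # Any field not seen before is simply added to the schema.
        self.schema.update(name for name in doc if name != "_id")
        self.documents.append(doc)


structured = ToyStructuredIndex(["title", "description", "price"])
structured.add_document({"_id": "1", "title": "Red shoes"})  # subset of schema: accepted

unstructured = ToyUnstructuredIndex()
unstructured.add_document({"_id": "1", "title": "Red shoes"})
unstructured.add_document({"_id": "2", "title": "Blue hat", "colour": "blue"})  # new field: accepted
```

Adding `{"_id": "2", "colour": "blue"}` to the toy structured index would raise a `ValueError`, which mirrors how a strict schema surfaces malformed documents at indexing time rather than later at query time.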
When to Use Unstructured Indexes
Unstructured indexes are recommended in the following situations:
- Getting Started: If you are new to Marqo and want to get started quickly, unstructured indexes are the best choice due to their ease of use.
- Dynamic Schema: If you have a dynamic schema where fields are added frequently, unstructured indexes are the best choice.
When to Use Structured Indexes
Structured indexes are recommended in the following situations:
- Performance: If you require maximum performance, structured indexes are the best choice, especially for large indexes with continuous updates to documents.
- Production/Enterprise: If you are using Marqo in a production or enterprise environment, structured indexes are often a better choice due to the strictness of the schema and the ability to catch malformed documents early.
- Advanced Usage: If you require better control over searchable attributes, lexical search, and other features, then structured indexes are the best choice.