Configuring Marqo

Marqo is configured through environment variables passed to the Marqo container when it is run.


Configuring usage limits

Limits can be set to protect the resources of the machine Marqo is running on.

| Configuration name | Default | Description |
| --- | --- | --- |
| MARQO_MAX_DOC_BYTES | 100000 | Maximum size, in bytes, of a document that can be indexed. |
| MARQO_MAX_RETRIEVABLE_DOCS | 10000 | Maximum number of documents allowed to be returned in a single request. The maximum value this can be set to is 10000. |
| MARQO_MAX_CUDA_MODEL_MEMORY | 4 | Maximum CUDA memory usage (GB) for models in Marqo. For multi-GPU, this is the max memory for each GPU. |
| MARQO_MAX_CPU_MODEL_MEMORY | 4 | Maximum RAM usage (GB) for models in Marqo. |
| MARQO_MAX_VECTORISE_BATCH_SIZE | 16 | Maximum batch size to process in parallel (for example, when adding documents). |
| MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST | 5 | Maximum number of threads to download media in parallel. |
| MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST | 20 | (Deprecated) Maximum number of threads to download images in parallel. |
| MARQO_MAX_SEARCH_VIDEO_AUDIO_FILE_SIZE | 387973120 | Maximum size, in bytes (370 MiB by default), of a video or audio file that can be searched in a single request. |
| MARQO_MAX_ADD_DOCS_VIDEO_AUDIO_FILE_SIZE | 387973120 | Maximum size, in bytes (370 MiB by default), of a video or audio file that can be added to an index. |
| MARQO_MAX_DOCUMENTS_BATCH_SIZE | 128 | Maximum number of documents that can be added or updated in a single request. |
| MARQO_MAX_DELETE_DOCS_COUNT | 10000 | Maximum number of documents that can be deleted in a single request. |
| VESPA_POOL_SIZE | 10 | The size of the connection pool for Vespa operations, including search. This should be set to a value at least as large as MARQO_MAX_CONCURRENT_SEARCH. |
| VESPA_FEED_POOL_SIZE | 10 | Maximum Vespa feed concurrency per indexing batch. |
| VESPA_GET_POOL_SIZE | 10 | Maximum Vespa get concurrency per request when retrieving documents by ID. |
| VESPA_DELETE_POOL_SIZE | 10 | Maximum Vespa delete concurrency per request. |
| VESPA_PARTIAL_UPDATE_POOL_SIZE | 10 | Maximum Vespa update concurrency per request. |
| VESPA_SEARCH_TIMEOUT_MS | 1000 | Time, in milliseconds, before a search request to Vespa times out. |

Example

docker run --name marqo -p 8882:8882 \
    -e "MARQO_MAX_DOC_BYTES=200000" \
    -e "MARQO_MAX_RETRIEVABLE_DOCS=600" \
    -e "MARQO_MAX_CUDA_MODEL_MEMORY=5" \
    -e "VESPA_SEARCH_TIMEOUT_MS=2000" marqoai/marqo:latest

In the above example, a Marqo container is run with the following limits:

  • The max size of an indexed document is 200,000 bytes (0.2 MB).

  • The max number of documents allowed to be returned in a single request is 600.

  • The max CUDA memory usage for models in Marqo is 5 GB.

  • The search timeout for Vespa is 2 seconds.
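The table above notes that VESPA_POOL_SIZE should be at least as large as MARQO_MAX_CONCURRENT_SEARCH. A small pre-flight check run before starting the container is one way to catch a mismatch early. The sketch below is only an illustration: the helper name `pool_size_ok` is hypothetical, and the fallback values are the defaults listed on this page.

```shell
# Hypothetical pre-flight check; not part of Marqo itself.
# Fall back to the defaults documented on this page when a variable is unset.
MARQO_MAX_CONCURRENT_SEARCH="${MARQO_MAX_CONCURRENT_SEARCH:-8}"
VESPA_POOL_SIZE="${VESPA_POOL_SIZE:-10}"

# Succeeds when the pool size ($1) is at least the search concurrency ($2).
pool_size_ok() {
    [ "$1" -ge "$2" ]
}

if pool_size_ok "$VESPA_POOL_SIZE" "$MARQO_MAX_CONCURRENT_SEARCH"; then
    echo "VESPA_POOL_SIZE=$VESPA_POOL_SIZE covers MARQO_MAX_CONCURRENT_SEARCH=$MARQO_MAX_CONCURRENT_SEARCH"
else
    echo "warning: VESPA_POOL_SIZE=$VESPA_POOL_SIZE is smaller than MARQO_MAX_CONCURRENT_SEARCH=$MARQO_MAX_CONCURRENT_SEARCH" >&2
fi
```

The same two values can then be passed to `docker run` with `-e`, as in the example above.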

Configure backend communication

This section describes the environment variables that configure Marqo's communication with the backend. Setting these variables is helpful when Marqo runs in a container and needs to communicate with a Vespa instance running in a separate container or on a different host machine.

Note: Regularly upgrade Vespa when hosting it yourself. New releases of Marqo leverage features and bug fixes introduced in the latest versions of Vespa. If you are running Marqo 2.13.0, please upgrade Vespa to version 8.396.18 or later. This helps prevent potential issues, such as long response times when adding documents to an unstructured index or unexpected behavior during Marqo upgrades. For more details or if you encounter any issues, please refer to the Troubleshooting Guide.

| Configuration name | Default | Description |
| --- | --- | --- |
| VESPA_CONFIG_URL | "http://localhost:19071" | URL for Vespa configuration. |
| VESPA_QUERY_URL | "http://localhost:8080" | URL for querying the Vespa instance. |
| VESPA_DOCUMENT_URL | "http://localhost:8080" | URL for document operations in the Vespa instance. |
| VESPA_CONTENT_CLUSTER_NAME | "content_default" | Name of the Vespa content cluster. |
| ZOOKEEPER_HOSTS | null | Hosts for the Zookeeper server, no "https" or "http" required in the string. If not set, Marqo will skip the connection to the Zookeeper server. |

Example: Running Marqo on a standalone Vespa container

In this example, we will start a Vespa container, initialise it with an application package, and run a Marqo container that uses that Vespa container as its backend.

Step 1: Initialize Vespa Container Environment

Start a Vespa container using the Vespa 8 image (vespaengine/vespa:8). Make sure to expose the necessary ports for the config server, container server, and Zookeeper.

docker run --detach --name vespa -p 8080:8080 -p 19071:19071 -p 2181:2181 vespaengine/vespa:8
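The Vespa container can take a while to start accepting requests, so it may help to wait for the config server before moving on to the next step. The sketch below defines a generic retry helper; `retry_until` is a hypothetical name, and the health URL in the usage comment assumes the config server exposes Vespa's usual `/state/v1/health` endpoint on port 19071.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS after each failed attempt. Returns 0 on the first
# success and 1 if every attempt fails.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example usage (assumed endpoint, not run here):
#   retry_until 30 2 curl -sf http://localhost:19071/state/v1/health \
#       && echo "Vespa config server is up"
```

Passing the check as a command keeps the helper reusable, for example for polling http://localhost:8080 after the application package has been deployed.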

Step 2: Deploy an Application Package to Configure Vespa

Clone the Marqo repository and deploy an application package for local runs. This setup ensures that the vector store is configured correctly.

git clone https://github.com/marqo-ai/marqo.git
cd marqo/scripts/vespa_local
zip -r - * | curl --header "Content-Type:application/zip" --data-binary @- http://localhost:19071/application/v2/tenant/default/prepareandactivate

You can verify that the vector store has been set up correctly by visiting http://localhost:8080 in your browser. The vector store can take a few minutes to start responding after the initial configuration.

Step 3: Launch Marqo with Vespa Configuration

With your external vector store ready, you can now run Marqo configured to use it:

docker run --name marqo -p 8882:8882 --add-host host.docker.internal:host-gateway \
    -e VESPA_CONFIG_URL="http://host.docker.internal:19071" \
    -e VESPA_DOCUMENT_URL="http://host.docker.internal:8080" \
    -e VESPA_QUERY_URL="http://host.docker.internal:8080" \
    -e ZOOKEEPER_HOSTS="host.docker.internal:2181" \
    marqoai/marqo:latest

Enhancing Your Vespa Setup with Kubernetes

For a more robust and scalable setup, follow the instructions provided in the marqo-on-kubernetes GitHub repo. This guide offers detailed steps for setting up a Vespa cluster using Kubernetes across various cloud providers.

Configuring preloaded patch models

  • Variable: MARQO_PATCH_MODELS_TO_PRELOAD

  • Default value: '[]'

  • Expected value: A string of comma-separated patch model names. Currently supported patch models are: 'simple', 'overlap', 'fastercnn', 'frcnn', 'marqo-yolo', 'yolox', 'dino-v1', 'dino-v2', 'dino/v1', 'dino/v2'.

This is a list of patch models to load and pre-warm as Marqo starts. This prevents a delay during initial image processing.

Configuring preloaded models

  • Variable: MARQO_MODELS_TO_PRELOAD

  • Default value: '["hf/e5-base-v2", "open_clip/ViT-B-32/laion2b_s34b_b79k"]'

  • Expected value: A JSON-encoded array of strings or objects.

This is a list of models to load and pre-warm as Marqo starts. This prevents a delay during initial search and index commands in actual Marqo usage.

Models in string form must be names of models within the model registry. You can find these models here

Models in object form must have model and modelProperties keys.

Model Object Example (OPEN CLIP model)

'{
    "model": "my-open-clip-1",
    "modelProperties": {
        "name": "ViT-B-32-quickgelu",
        "dimensions": 512,
        "url": "https://github.com/mlfoundations/open_clip/releases/download/v0.2-weights/vit_b_32-quickgelu-laion400m_avg-8a00ab3c.pt",
        "type": "open_clip"
    }
}'

Model Object Example (CLIP model)

'{
    "model": "generic-clip-test-model-2",
    "modelProperties": {
        "name": "ViT-B/32",
        "dimensions": 512,
        "type": "clip",
        "url": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
    }
}'

Marqo Run Example (containing both string and object)

export MY_MODEL_LIST='[
    "sentence-transformers/stsb-xlm-r-multilingual",
    "hf/e5-base-v2",
    {
        "model": "generic-clip-test-model-2",
        "modelProperties": {
            "name": "ViT-B/32",
            "dimensions": 512,
            "type": "clip",
            "url": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
        }
    }
]'

docker run --name marqo -p 8882:8882 \
    -e MARQO_MODELS_TO_PRELOAD="$MY_MODEL_LIST" \
    marqoai/marqo:latest
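Because MARQO_MODELS_TO_PRELOAD must be a JSON-encoded array of strings or objects, and objects must have model and modelProperties keys, it can be worth validating the value before starting the container. The following is a rough sketch that assumes the `jq` command-line tool is installed; `valid_preload_list` is a hypothetical helper, and the check covers only the shape described above, not whether the named models exist in the registry.

```shell
# Succeeds when the argument is a JSON array whose entries are either strings
# or objects that carry both "model" and "modelProperties" keys.
# Fails on invalid JSON, on a wrong shape, or when jq is not installed.
valid_preload_list() {
    printf '%s' "$1" | jq -e '
        if type == "array" then
            all(.[]; type == "string" or
                     (type == "object" and has("model") and has("modelProperties")))
        else
            false
        end
    ' >/dev/null 2>&1
}

MY_MODEL_LIST='["hf/e5-base-v2", {"model": "my-model", "modelProperties": {"dimensions": 512}}]'

if valid_preload_list "$MY_MODEL_LIST"; then
    echo "model list looks well-formed"
else
    echo "model list is not a valid preload list (or jq is missing)" >&2
fi
```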

Configuring media download threads

Marqo provides environment variables and parameters to control the number of threads used for downloading media during processing.

| Configuration name | Default | Description |
| --- | --- | --- |
| MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST | 5 | Maximum number of threads to download media in parallel. |
| MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST | 20 | (Deprecated) Maximum number of threads to download images. |

Thread Count Determination

Marqo determines the number of threads for media downloads in the following order of priority:

  • If media_download_thread_count is set in the add_documents parameters and is different from the default, this value is used.
  • If the MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST environment variable is explicitly set and is different from the default, this value is used.
  • If the model type is languagebind, the thread count is set to 5.
  • If image_download_thread_count is explicitly set in the add_documents parameters and is different from the default, this value is used.
  • If the MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST environment variable is explicitly set and is different from the default, this value is used.
  • If none of the above conditions are met, the default value (for MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST) is used.
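The priority order above can be modelled as a small function. This is only a sketch of the documented rules, not Marqo's actual implementation; the function name and argument order are hypothetical, and an empty argument stands for "not set".

```shell
# resolve_download_threads REQ_MEDIA ENV_MEDIA MODEL_TYPE REQ_IMAGE ENV_IMAGE
# Prints the thread count chosen by the documented priority order.
resolve_download_threads() {
    req_media=$1    # media_download_thread_count from add_documents
    env_media=$2    # MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST
    model_type=$3   # e.g. "languagebind"
    req_image=$4    # image_download_thread_count from add_documents
    env_image=$5    # MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST
    media_default=5
    image_default=20

    if [ -n "$req_media" ] && [ "$req_media" -ne "$media_default" ]; then
        echo "$req_media"
    elif [ -n "$env_media" ] && [ "$env_media" -ne "$media_default" ]; then
        echo "$env_media"
    elif [ "$model_type" = "languagebind" ]; then
        echo 5
    elif [ -n "$req_image" ] && [ "$req_image" -ne "$image_default" ]; then
        echo "$req_image"
    elif [ -n "$env_image" ] && [ "$env_image" -ne "$image_default" ]; then
        echo "$env_image"
    else
        echo "$image_default"
    fi
}

# A request-level media setting wins over everything else.
resolve_download_threads 8 12 languagebind 30 40
# With nothing set, the image default applies.
resolve_download_threads "" "" "" "" ""
```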

MARQO_MEDIA_DOWNLOAD_THREAD_COUNT_PER_REQUEST - This is the preferred parameter for controlling media download threads. It applies to all types of media, including images, videos, and audio files.

MARQO_IMAGE_DOWNLOAD_THREAD_COUNT_PER_REQUEST - This environment variable is deprecated and will be removed in future versions. It's maintained for backward compatibility but only affects image downloads.

Configuring log level

  • Variable: MARQO_LOG_LEVEL

  • Default value: INFO

  • Expected value: a str from one of ERROR, WARNING, INFO, DEBUG.

This environment variable changes the log level of the timing logger and the uvicorn logger. A higher log level (e.g., ERROR) will reduce the amount of logs in Marqo, while a lower log level (DEBUG) will record more detailed information in the logs.

The default log level is INFO and is recommended for production environments. Setting the log level to DEBUG can have performance implications and is not recommended for production environments.

Example

docker run --name marqo -p 8882:8882 \
    -e MARQO_LOG_LEVEL='warning' \
    marqoai/marqo:latest

Configuring throttling

| Configuration name | Default | Description |
| --- | --- | --- |
| MARQO_ENABLE_THROTTLING | "TRUE" | Adds throttling if "TRUE". Must be a str: either "TRUE" or "FALSE". |
| MARQO_MAX_CONCURRENT_INDEX | 8 | Maximum allowed concurrent indexing threads. |
| MARQO_MAX_CONCURRENT_SEARCH | 8 | Maximum allowed concurrent search threads. |
| MARQO_MAX_CONCURRENT_PARTIAL_UPDATE | 100 | Maximum allowed concurrent partial update threads. |

These environment variables set Marqo's allowed concurrency for indexing, search, and partial update requests. Once a limit is reached, Marqo returns an HTTP 429 response to subsequent requests of that type. Set these values with respect to the available resources of the machine Marqo will be running on.

Example

docker run --name marqo -p 8882:8882 \
    -e MARQO_ENABLE_THROTTLING='TRUE' \
    -e MARQO_MAX_CONCURRENT_SEARCH='10' \
    marqoai/marqo:latest
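When throttling is enabled, clients may want to retry requests that come back with a 429. The sketch below shows one possible approach, exponential backoff around `curl`. The helper names are hypothetical, and the URL and request body in the usage comment assume Marqo's search endpoint on a hypothetical index named `my-index`.

```shell
# Delay in seconds before retry number $1 (1-based): 1, 2, 4, 8, ...
backoff_delay() {
    echo $(( 1 << ($1 - 1) ))
}

# search_with_retry URL JSON_BODY [MAX_ATTEMPTS]
# POSTs JSON_BODY to URL and retries with exponential backoff while the
# server answers 429. Prints the response body of the final attempt and
# returns 0 only if that attempt ended with HTTP 200.
search_with_retry() {
    url=$1
    body=$2
    max_attempts=${3:-5}
    attempt=1
    response_file=$(mktemp)
    while [ "$attempt" -le "$max_attempts" ]; do
        status=$(curl -s -o "$response_file" -w '%{http_code}' \
            -X POST -H 'Content-Type: application/json' -d "$body" "$url")
        if [ "$status" != "429" ]; then
            cat "$response_file"
            rm -f "$response_file"
            [ "$status" = "200" ]
            return
        fi
        sleep "$(backoff_delay "$attempt")"
        attempt=$((attempt + 1))
    done
    rm -f "$response_file"
    return 1
}

# Example usage (assumed endpoint and index name, not run here):
#   search_with_retry http://localhost:8882/indexes/my-index/search '{"q": "hello"}'
```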

Marqo inference cache configuration

| Configuration name | Default | Description |
| --- | --- | --- |
| MARQO_INFERENCE_CACHE_SIZE | 0 (disabled) | The size (measured by the number of query-embedding pairs) of the Marqo inference cache. Set it to a positive integer to enable this feature. |
| MARQO_INFERENCE_CACHE_TYPE | "LRU" (least recently used) | The eviction policy of the Marqo inference cache. Supported types are "LRU" and "LFU" (least frequently used). |

These environment variables configure the size and eviction policy of the Marqo inference cache, which stores results from inference queries to improve search latency. Note that this cache does not apply to the add_documents endpoint. Consider enabling this feature if you frequently receive a high volume of identical queries. By default, this feature is disabled.

Example

docker run --name marqo -p 8882:8882 \
    -e "MARQO_INFERENCE_CACHE_SIZE=20" \
    -e "MARQO_INFERENCE_CACHE_TYPE=LRU" \
    marqoai/marqo:latest

Marqo Video GPU Acceleration Configuration

| Configuration name | Default | Description |
| --- | --- | --- |
| MARQO_ENABLE_VIDEO_GPU_ACCELERATION | None | Controls whether GPU acceleration is enabled for video decoding. Accepted values are TRUE or FALSE. |

The environment variable MARQO_ENABLE_VIDEO_GPU_ACCELERATION determines whether Marqo uses GPU acceleration for video decoding.

  • Default Behavior: If this variable is not set, Marqo automatically decides based on the availability of a GPU on the host machine.
  • Set to TRUE: Forces Marqo to use GPU acceleration for video decoding. An error will be raised if GPU acceleration is not available.
  • Set to FALSE: Disables GPU acceleration for video decoding, ensuring CPU-based decoding is used.

Note: In addition to a compatible GPU, the NVIDIA drivers on the host must be version 550.54.14 or newer for GPU acceleration to function properly.

Example Usage

To enable GPU acceleration for video decoding, run the following Docker command:

docker run --name marqo --gpus all -p 8882:8882 \
    -e "MARQO_ENABLE_VIDEO_GPU_ACCELERATION=TRUE" \
    marqoai/marqo:latest
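Since forcing GPU acceleration raises an error when it is unavailable, checking the host's NVIDIA driver version against the 550.54.14 minimum beforehand may save a failed start. The following sketch assumes a `sort` that supports version ordering (`-V`, as in GNU coreutils) and uses `nvidia-smi` to read the driver version; `version_ge` is a hypothetical helper.

```shell
# version_ge VERSION MINIMUM
# Succeeds when dotted version VERSION is greater than or equal to MINIMUM.
# Relies on `sort -V` (version sort); this is an assumption about the host.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

MIN_DRIVER="550.54.14"

if command -v nvidia-smi >/dev/null 2>&1; then
    # Read the driver version reported for the first GPU.
    driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$driver" "$MIN_DRIVER"; then
        echo "driver $driver meets the $MIN_DRIVER minimum"
    else
        echo "driver $driver is older than $MIN_DRIVER" >&2
    fi
else
    echo "nvidia-smi not found; cannot check the driver version" >&2
fi
```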

Advanced configuration

These are additional advanced configurations that can be set to customize Marqo's behavior. Most users will not need to change these values.

| Configuration name | Default | Description |
| --- | --- | --- |
| MARQO_DEFAULT_EF_SEARCH | 2000 | Default HNSW efSearch value. |
| MARQO_MAX_SEARCHABLE_TENSOR_ATTRIBUTES | null | The maximum allowed number of tensor fields to be searched in a single tensor search query. By default, there is no limit. |
| MARQO_MAX_SEARCH_LIMIT | 1000 | The maximum allowed limit for search requests. This can be set up to 1000000. |
| MARQO_MAX_SEARCH_OFFSET | 10000 | The maximum allowed offset for search requests. This can be set up to 1000000. |
| MARQO_MAX_TENSOR_FIELD_COUNT_UNSTRUCTURED | 100 | The maximum allowed number of tensor fields to be added to an unstructured index created with Marqo 2.13.0+. |
| MARQO_MAX_LEXICAL_FIELD_COUNT_UNSTRUCTURED | 100 | The maximum allowed number of lexical fields to be added to an unstructured index created with Marqo 2.13.0+. |
| MARQO_THREAD_EXPIRY_TIME | 1800 | When throttling is enabled, this is the time in seconds after which a request thread's slot is automatically freed up. |
| MARQO_ROOT_PATH | null | Disk path where Marqo stores runtime artifacts such as downloaded models. |
| ZOOKEEPER_CONNECTION_TIMEOUT | null | Connection timeout when connecting to Zookeeper. |

Third party environment variables

The following environment variables are managed by dependencies of Marqo rather than Marqo itself. They are intended for advanced users and should be configured with caution. These variables may be modified or deprecated in future Marqo versions.

| Configuration name | Default value | Description |
| --- | --- | --- |
| HF_HUB_ENABLE_HF_TRANSFER | null | Set this to 1 to enable faster downloads from Hugging Face on high-bandwidth networks. See the documentation for details. |
| HF_HUB_OFFLINE | null | Set this to 1 to skip HTTP requests when loading a Hugging Face model. This can be useful if you want to run Marqo in offline mode. Refer to the documentation for more details. |