
Running Marqo on AWS

Size your data

Marqo often requires additional storage because it enriches your data with embeddings. For text-based data, expect the stored size to grow by at least 10x.

We recommend the g4 instance family for Marqo. For less than 200GB of data, use a g4dn.xlarge instance.
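The 10x rule of thumb above can be turned into a quick back-of-the-envelope check. The sketch below is a set of hypothetical helpers (not part of Marqo): it multiplies a raw data size by the assumed 10x expansion factor and reports whether the result fits on a volume of a given size, defaulting to 200GB to match the EBS volume in the instance configuration. The real growth depends on your documents and models, so treat the factor as an assumption to adjust.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helpers based on the ~10x rule of thumb for text data.
# All sizes are whole gigabytes (GiB).
set -euo pipefail

EXPANSION_FACTOR=10  # assumed minimum growth once text is enriched with embeddings

# Print the estimated stored size (GB) for a raw data size (GB).
estimate_indexed_gb() {
  local raw_gb=${1:?usage: estimate_indexed_gb RAW_GB}
  echo $(( raw_gb * EXPANSION_FACTOR ))
}

# Return 0 if the estimate fits on a volume of VOLUME_GB (default: 200).
fits_on_volume() {
  local raw_gb=${1:?usage: fits_on_volume RAW_GB VOLUME_GB_optional}
  local volume_gb=${2:-200}
  (( $(estimate_indexed_gb "$raw_gb") <= volume_gb ))
}

# Print the size of a local directory in GB, rounded up (du -k is POSIX).
dir_size_gb() {
  local dir=${1:?usage: dir_size_gb DIR}
  local kb
  kb=$(du -sk "$dir" | cut -f1)
  echo $(( (kb + 1048575) / 1048576 ))
}
```

For example, estimate_indexed_gb 15 prints 150, and fits_on_volume "$(dir_size_gb ./my-data)" fails when the expanded data would not fit on a 200GB volume (./my-data is a placeholder path).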

Running Marqo on a CPU instance, such as a t3 instance, is also acceptable for search, but add_documents calls will be significantly slower than on a GPU instance.
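On a CPU-only instance such as a t3, the docker run command from step 4 has to be run without --gpus all, because Docker can only honour that flag when the host has an NVIDIA driver and the NVIDIA Container Toolkit installed. The sketch below is a hypothetical helper that picks the flag automatically; it treats a working nvidia-smi as a sign that a GPU is usable, which is a heuristic rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose Docker GPU flags for the Marqo container.
set -euo pipefail

# Print "--gpus all" if an NVIDIA GPU appears usable, otherwise print nothing.
# Heuristic: nvidia-smi is installed and runs successfully.
marqo_gpu_flags() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "--gpus all"
  fi
}

# Print the docker run arguments for Marqo (does not execute anything).
marqo_run_args() {
  local gpu_flags
  gpu_flags=$(marqo_gpu_flags)
  # shellcheck disable=SC2086  # word-splitting of gpu_flags is intended
  echo run --name marqo -it --privileged -p 8882:8882 $gpu_flags \
    --add-host host.docker.internal:host-gateway marqoai/marqo:latest
}
```

On the instance you could then start Marqo with sudo docker $(marqo_run_args), leaving the substitution unquoted so the arguments are split. The same command works on both GPU and CPU instances.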

AWS also recommends GPU instances for most deep learning tasks, since training new models is faster on a GPU than on a CPU. To learn more, see AWS recommended GPU instances and AWS EC2 On-Demand Pricing.

Configuring your AWS EC2 Instance

  1. Create an EC2 instance with the following configuration:

    EC2 instance configuration:
      Instance type: g4dn.xlarge
      Storage: 200GB EBS
      Operating system: Amazon Linux 2
  2. When your EC2 instance is created, connect to it (for example, directly from the AWS Console or over SSH) and install Docker with the following command:

    sudo amazon-linux-extras install docker
    
  3. Then start Docker and enable it so that it starts automatically whenever the instance restarts:

    sudo service docker start
    sudo systemctl enable docker
    
  4. Finally, run Marqo on the instance with:

    sudo docker run --name marqo -it --privileged -p 8882:8882 --gpus all --add-host host.docker.internal:host-gateway marqoai/marqo:latest
    

    Note that the data is stored inside the container on the instance itself. If you remove the container, the data will be lost.

    To stop Marqo, run:

    sudo docker stop marqo

    To restart Marqo, run:

    sudo docker start marqo
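Steps 2 to 4 can also be collected into a single script to run on the instance over SSH. The sketch below assumes Amazon Linux 2 (where amazon-linux-extras is available) and assumes Marqo answers plain HTTP requests on port 8882 once it has finished starting; the script name setup-marqo.sh, the function names, and the readiness check are illustrative, not part of Marqo's tooling. It runs the container detached (-d) rather than interactively (-it), because -it needs a terminal that a script may not have, and it only adds --gpus all when nvidia-smi works.

```shell
#!/usr/bin/env bash
# Hypothetical setup script combining steps 2-4 on Amazon Linux 2.
# Usage: bash setup-marqo.sh install
set -euo pipefail

MARQO_IMAGE="marqoai/marqo:latest"
MARQO_PORT=8882

install_docker() {
  # Step 2: install Docker from the Amazon Linux 2 extras library (non-interactive).
  sudo amazon-linux-extras install -y docker
}

enable_docker() {
  # Step 3: start Docker now and on every boot.
  sudo service docker start
  sudo systemctl enable docker
}

start_marqo() {
  # Step 4: run Marqo detached (-d) so the script is not tied to a terminal.
  local gpu_flags=()
  # Heuristic: only request GPUs if nvidia-smi is present and working.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    gpu_flags=(--gpus all)
  fi
  # ${arr[@]+"${arr[@]}"} expands to nothing for an empty array, even under
  # set -u on the older bash shipped with Amazon Linux 2.
  sudo docker run --name marqo -d --privileged -p "${MARQO_PORT}:8882" \
    ${gpu_flags[@]+"${gpu_flags[@]}"} \
    --add-host host.docker.internal:host-gateway "$MARQO_IMAGE"
}

wait_for_marqo() {
  # Poll the HTTP port every 5s until Marqo responds or TIMEOUT seconds pass.
  local timeout=${1:-300} waited=0
  until curl -fsS "http://localhost:${MARQO_PORT}/" >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "Marqo did not respond within ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "Marqo is responding on port ${MARQO_PORT}"
}

main() {
  install_docker
  enable_docker
  start_marqo
  wait_for_marqo 600
}

# Only act when explicitly asked, so the file can be sourced or inspected safely.
if [[ "${1:-}" == "install" ]]; then
  main
fi
```

Because no volume is mounted, the note about data living inside the container still applies to a container started this way.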