Running Marqo on AWS
Size your data
Marqo enriches data with embeddings, so it often needs more storage than the raw data alone. For text-based data, expect at least a 10x increase in data size.
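As a rough illustration of the sizing rule above, the following sketch multiplies a raw data size by an expansion factor. The function name estimate_marqo_storage_gb is hypothetical, and the default factor of 10 is simply the lower bound quoted above, so treat the result as a floor rather than a precise figure.

```shell
# Estimate the storage Marqo may need for text data (illustrative only).
# Usage: estimate_marqo_storage_gb RAW_GB [FACTOR]
# FACTOR defaults to 10, the "at least 10x" lower bound for text data.
estimate_marqo_storage_gb() {
    raw_gb=$1
    factor=${2:-10}
    echo $(( raw_gb * factor ))
}

# Example: 15GB of raw text with the default factor
estimate_marqo_storage_gb 15   # prints 150
```

If your data is known to expand more than 10x, pass a larger factor as the second argument.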
We recommend using the g4 instance family for Marqo. If you have less than 200GB of data, a g4dn.xlarge instance is recommended.
Note that running Marqo on a CPU instance, such as a t3 instance, is also acceptable for search, but add_documents calls will be significantly slower than on a GPU.
AWS also advises using a GPU instance for the majority of deep learning tasks, as it is faster to train new models on a GPU than on a CPU instance. To learn more, see AWS recommended GPU instances and AWS EC2 On-Demand Pricing.
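On a running instance, one way to confirm whether a GPU is actually usable is to check for a working nvidia-smi, which is installed with the NVIDIA driver. This is a minimal sketch, not part of Marqo itself: the function name detect_marqo_device is hypothetical, and it assumes that a missing or failing nvidia-smi means no usable GPU.

```shell
# Print "gpu" if a working NVIDIA driver is visible on this host, otherwise "cpu".
# Assumption: nvidia-smi is only present and succeeds when the driver is installed.
detect_marqo_device() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo gpu
    else
        echo cpu
    fi
}

detect_marqo_device
```

If this prints cpu on an instance you expected to have a GPU, the NVIDIA driver is probably not installed, and Docker's --gpus all option is likely to fail until it is.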
Configuring your AWS EC2 Instance
-
Create an EC2 instance with the following configuration:
EC2 Instance Configuration
Type: g4dn.xlarge
Storage: EBS 200GB
Operating system: Amazon Linux
-
Once your EC2 instance is created, connect to it (e.g., direct connect in the Amazon Console or SSH) and use the following command to install Docker. The amazon-linux-extras tool is provided by Amazon Linux 2:
sudo amazon-linux-extras install docker
-
Then use the following commands to ensure that docker is running and will automatically start when the instance is restarted:
sudo service docker start
sudo systemctl enable docker
-
Finally, run marqo on the instance with:
docker run --name marqo -it -p 8882:8882 --gpus all marqoai/marqo:latest
Note that the data is stored inside the container on the instance itself. If you remove the container, the data will be lost.
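After starting the container, it can take a while before Marqo accepts requests. The sketch below polls the published port until it answers. The function name wait_for_marqo is hypothetical, and it assumes that the root path on port 8882 returns an HTTP success status once the service is ready and that curl is installed on the instance.

```shell
# Poll a URL until it returns an HTTP success status or the attempts run out.
# Usage: wait_for_marqo [URL] [ATTEMPTS] [DELAY_SECONDS]
wait_for_marqo() {
    url=${1:-http://localhost:8882}
    attempts=${2:-30}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl return non-zero on HTTP error responses; -s keeps it quiet.
        if curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            echo "Marqo is responding at $url"
            return 0
        fi
        i=$(( i + 1 ))
        sleep "$delay"
    done
    echo "No response from $url after $attempts attempts" >&2
    return 1
}

# Example (run on the instance after docker run):
#   wait_for_marqo http://localhost:8882 30 5
```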
If you want to stop Marqo, run:
docker stop marqo
Then, to restart Marqo, run:
docker start marqo
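To see whether the container is currently running or stopped, you can query its state with docker inspect. This is a minimal sketch: the function name marqo_container_state is hypothetical, and the container name defaults to marqo, matching the --name used in the run command.

```shell
# Print the state of the Marqo container: e.g. "running" or "exited",
# "not-found" if the container does not exist (or the Docker daemon is
# unreachable), or "docker-unavailable" if the docker CLI is not installed.
marqo_container_state() {
    name=${1:-marqo}
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker-unavailable"
        return 0
    fi
    state=$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null) || state="not-found"
    echo "$state"
}

marqo_container_state

# Optional: have Docker restart the container automatically after a reboot,
# instead of running "docker start marqo" by hand:
#   docker update --restart unless-stopped marqo
```

Enabling the Docker service only starts the Docker daemon at boot. A container without a restart policy stays stopped until you start it again.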