Using Marqo Cloud

This page contains information about how to get up and running on Marqo Cloud by creating indexes, adding team members and using Marqo Cloud API Keys.

Sign up to Marqo Cloud

Sign up at the Marqo Cloud console.

Create an index

You can read more about creating an index on Marqo cloud here

Sizing your index

Marqo allows you to scale your index in the following ways:

Shards: increasing the number of shards allows you to store more vectors.

Replicas: adding replicas allows for more requests per second. Having at least 1 replica enables high availability. Replicas do not affect the total number of vectors you can store.

Inference nodes: inference nodes run the machine learning models. You'll need to scale the number of inference nodes depending on the size of the machine learning model you're running and the expected RPS.

| Name | Description | Approximate number of 768 dim. vectors | Hourly pricing (USD) |
|---|---|---|---|
| marqo.basic | Low cost, recommended for dev/testing workloads | Up to 2,000,000 | $0.0593 |
| marqo.balanced | Storage optimised | Up to 16,000,000 | $0.8708 |
| marqo.performance | RPS optimised | Up to 16,000,000 | $2.1808 |

For example, if you created an index with 3 balanced storage shards, 1 replica and 2 inference pods, the index would be able to store 3*16,000,000 = 48,000,000 vectors.

As another example, if you created an index with 2 performance storage shards, 0 replicas and 1 inference pod, the index would be able to store 2*16,000,000 = 32,000,000 vectors.
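The capacity arithmetic in these examples can be sketched in a few lines of Python. This is a minimal illustration rather than part of any Marqo client: the function name `index_capacity` and the `SHARD_CAPACITY` lookup are hypothetical, and the per-shard figures are the approximate values from the shard table in this section.

```python
# Approximate per-shard capacity (768 dim. vectors), copied from the shard table.
SHARD_CAPACITY = {
    "marqo.basic": 2_000_000,
    "marqo.balanced": 16_000_000,
    "marqo.performance": 16_000_000,
}


def index_capacity(shard_type: str, num_shards: int) -> int:
    """Approximate number of 768 dim. vectors an index can store.

    Replicas and inference pods are deliberately not parameters:
    neither changes how many vectors the index can hold.
    """
    if num_shards < 1:
        raise ValueError("an index needs at least one shard")
    return SHARD_CAPACITY[shard_type] * num_shards


print(index_capacity("marqo.balanced", 3))     # 48000000
print(index_capacity("marqo.performance", 2))  # 32000000
```

Vectors with more or fewer than 768 dimensions will change these figures, so treat the result as a rough planning number.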

Choosing storage shard types

marqo.basic Marqo basic is the cheapest of the shard types. It is good for proof of concept applications and development work. These shards have higher search latency than the other options and each shard has a lower capacity of approximately 2 million 768 dim. vectors. These shards cannot have replicas, so they are not recommended for production applications where high availability is a requirement.

marqo.balanced Marqo balanced is the middle tier of the shard types. It is good for production applications: these shards have lower search latency than marqo.basic and each shard has a higher capacity of approximately 16 million 768 dim. vectors. These shards can have replicas, so they are suitable where high availability is a requirement.

marqo.performance Marqo performance is the highest tier of the shard types. It is good for production applications where high availability and the lowest search latency are requirements, especially for indexes with tens or hundreds of millions of vectors. These shards have the lowest search latency of all the shard types and each shard has a capacity of approximately 16 million 768 dim. vectors. These shards can have replicas, so they are suitable for highly available production applications with millions of users.

The tutorials in this getting started guide only use marqo.basic shards. If you are deploying production search with many users, we recommend marqo.balanced. For larger enterprises with a large number of concurrent searches, marqo.performance shards are likely more suitable.

Choosing inference pod types

| Name | Description | Hourly pricing (USD) |
|---|---|---|
| marqo.CPU.small | Low cost, recommended for dev/testing workloads | $0.0593 |
| marqo.CPU.large | Storage optimised | $0.3187 |
| marqo.GPU | RPS optimised | $0.9717 |

The inference pod type determines the infrastructure that is used for inference. Inference is the process of creating vectors from your data. A more powerful inference pod reduces latency for indexing and search by creating vectors faster. The inference pod type makes a particularly big difference to latency when indexing or searching with images. There are three inference pod types available:

marqo.CPU.small Marqo CPU Small is the smallest and cheapest inference pod available. It is targeted towards very small applications, or development and testing where speed is not critical. The small CPU is a very cost effective way to start out with Marqo and experiment with developing applications on the cloud.

marqo.CPU.large Marqo CPU Large is the middle tier of the inference pod types. It is suitable for production applications with low latency search. For many applications a marqo.CPU.large pod will be sufficient when searching with text. However, when searching or indexing images, or dealing with very high request concurrency, these pods may become too slow.

marqo.GPU Marqo GPU is the highest tier of the inference pod types. It is suitable for production applications with low latency search and high request concurrency. These pods are significantly faster than marqo.CPU pods when indexing or searching with text and/or images.

A common usage pattern is to use different pod types at different stages of development. For example, you can accelerate indexing of images with marqo.GPU pods and then swap to marqo.CPU pods for searching with only text. You can change your inference configuration at any time by editing the index.

Using a custom model on Marqo cloud

1. Fine-tune your model

The first step is to fine-tune your model. Here we use the Open CLIP framework as an example. You should follow the guide to fine-tune your own model and store the trained model (checkpoint) as a *.pt file.

You can also use SBERT or Huggingface sentence transformer models on Marqo cloud. If you want to use these models the first step is to fine-tune your model using the sentence-transformers framework. The fine-tuning guide can be found here.

2. Upload your model to cloud storage

You need to upload your model (a *.pt file for Open CLIP) to cloud storage (e.g., Amazon S3, GitHub) and use its download URL to reference it in Marqo.

Note that for Huggingface sentence transformer models, .zip can be used as the file extension.

3. Use your model on Marqo Cloud

To use your custom model, navigate to the indexes tab and click create index. When the create index dialogue opens, expand the show advanced details dropdown. Here, you can define your fine-tuned model via model and modelProperties. For an example Open CLIP model, the code is:

{
  "treatUrlsAndPointersAsImages": true,
  "model": "generic-clip-test-model-1",
  "modelProperties": {
    "name": "ViT-B-32-quickgelu",
    "dimensions": 512,
    "url": "",
    "type": "open_clip"
  },
  "normalizeEmbeddings": true
}

  • under url, enter the URL of the model
  • under dimensions, enter the number of dimensions. E.g. if you have fine-tuned a ViT-B-32 model, the number of dimensions will be 512
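If you prefer to generate these settings rather than type them by hand, a small Python helper can build the same structure and catch an empty url before you paste it into the advanced details box. This is only a sketch: the function `open_clip_settings` and the example URL are hypothetical, while the field names are taken from the Open CLIP example in this step.

```python
import json


def open_clip_settings(model_name: str, architecture: str,
                       dimensions: int, url: str) -> dict:
    """Build index settings for a custom Open CLIP model."""
    if not url:
        raise ValueError("url must point to the uploaded *.pt checkpoint")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "treatUrlsAndPointersAsImages": True,
        "model": model_name,
        "modelProperties": {
            "name": architecture,
            "dimensions": dimensions,
            "url": url,
            "type": "open_clip",
        },
        "normalizeEmbeddings": True,
    }


settings = open_clip_settings(
    model_name="generic-clip-test-model-1",
    architecture="ViT-B-32-quickgelu",
    dimensions=512,
    url="https://example.com/my-finetuned-model.pt",  # hypothetical URL
)
# json.dumps converts Python's True into JSON's lowercase true.
print(json.dumps(settings, indent=2))
```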

For a sentence transformer model you can use the following:

Loading from a public URL:

{
  "model": "your-own-sentence-transformers-model",
  "modelProperties": {
    "dimensions": 384,
    "url": "https://path/to/your/sbert/",
    "type": "hf"
  }
}

Note that the "hf" type above denotes a Huggingface sentence transformer model.

If you'd like to use authentication so that you do not expose your model publicly, you can load it from a store that requires an authentication key, such as an S3 bucket:

{
  "model": "your-own-sentence-transformers-model",
  "modelProperties": {
    "dimensions": 384,
    "type": "hf",
    "model_location": {
      "s3": {
        "Bucket": "s3_bucket",
        "Key": "s3_object_key"
      },
      "auth_required": true
    }
  }
}

Here "Key" is the S3 object key of a zip file.
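The two ways of locating a sentence transformer model (a public url, or an authenticated S3 model_location) are alternatives, which a short Python sketch can make explicit. The helper `hf_model_properties` and its parameter names are hypothetical; only the resulting field names come from the examples in this step.

```python
from typing import Optional


def hf_model_properties(dimensions: int, *, url: Optional[str] = None,
                        s3_bucket: Optional[str] = None,
                        s3_key: Optional[str] = None) -> dict:
    """Build modelProperties for an "hf" model from a URL or an S3 object."""
    has_s3 = s3_bucket is not None or s3_key is not None
    if url is not None and has_s3:
        raise ValueError("give either url or an S3 location, not both")

    props = {"dimensions": dimensions, "type": "hf"}
    if url is not None:
        # Public download: no authentication involved.
        props["url"] = url
    elif s3_bucket and s3_key:
        # Private download: the zip file is fetched from S3 with authentication.
        props["model_location"] = {
            "s3": {"Bucket": s3_bucket, "Key": s3_key},
            "auth_required": True,
        }
    else:
        raise ValueError("give url, or both s3_bucket and s3_key")
    return props


public = hf_model_properties(384, url="https://path/to/your/sbert/")
private = hf_model_properties(384, s3_bucket="s3_bucket", s3_key="s3_object_key")
print(public)
print(private)
```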

4. Create the index

Click the create index button to begin index creation. Note that if you use a custom model, the radio buttons for "image compatible" and "text optimised" will be greyed out along with the model dropdown. This is the expected behaviour and you can still create the index.

Adding Team Members

On Marqo it's not possible to add the same email to a team on multiple accounts. If you want to add your email to multiple accounts, the suggested workaround is to use the "+" symbol in the email, which will enable you to have an additional email which routes to your usual mailbox but will be separately identified when you log into Marqo.
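As an illustration of the "+" workaround, the following Python sketch derives such an address from an existing one. The function name `plus_alias`, the address and the tag are all hypothetical, and whether the alias is delivered to your usual mailbox depends on your email provider supporting plus addressing.

```python
def plus_alias(email: str, tag: str) -> str:
    """Insert "+tag" before the "@" of an email address."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    return f"{local}+{tag}@{domain}"


print(plus_alias("jane@example.com", "team2"))  # jane+team2@example.com
```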


How do I create an API key?

To create an API key on Marqo Cloud, first log into the Marqo Cloud console. Then navigate to the "API Keys" tab and click the blue "+" button.

Here you can enter the name of the API key. Once you've entered the name, click create. When the API key is generated, you can press the eye-shaped button to reveal the secret key. To use the key, copy the secret key and either pass it as the API key when you initialise the Python client (as above), or specify it in the X-API-KEY header when using the REST API, as below:

curl -XPOST '<your-index-endpoint>/indexes/<your-index-name>/documents' \
  -H "x-api-key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
      "documents": [
          {
              "title": "The Travels of Marco Polo",
              "description": "A 13th-century travelogue describing the travels of Polo"
          },
          {
              "title": "Extravehicular Mobility Unit (EMU)",
              "description": "The EMU is a spacesuit that provides environmental protection"
          }
      ],
      "tensorFields": ["description"]
  }'
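The same request can be built from Python without the Marqo client, using only the standard library. Treat this as a sketch under assumptions: the endpoint, index name and key are placeholders, the `/indexes/<index-name>/documents` path is assumed rather than taken from this page, and the helper `build_add_documents_request` is hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request


def build_add_documents_request(index_endpoint: str, index_name: str, api_key: str,
                                documents: list, tensor_fields: list) -> urllib.request.Request:
    """Build (but do not send) a POST request that adds documents to an index."""
    body = json.dumps({"documents": documents, "tensorFields": tensor_fields})
    return urllib.request.Request(
        # Assumed path layout: <index endpoint>/indexes/<index name>/documents
        url=f"{index_endpoint.rstrip('/')}/indexes/{index_name}/documents",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_add_documents_request(
    index_endpoint="https://your-index-endpoint.example",  # placeholder
    index_name="my-first-index",                            # placeholder
    api_key="my-secret-key",                                # placeholder
    documents=[{
        "title": "The Travels of Marco Polo",
        "description": "A 13th-century travelogue describing the travels of Polo",
    }],
    tensor_fields=["description"],
)
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request)
```

Keeping the secret key in an environment variable rather than in source code avoids committing it by accident.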

What's the difference between the control plane endpoint and the index endpoint?

The control plane endpoint is used for creating, listing and deleting indexes on Marqo Cloud (see the indexes tab for more information). If you use the control plane endpoint with pymarqo, pymarqo will automatically retrieve the index endpoint so that you can search, index data and perform other operations directly. However, if you are not using pymarqo, you will need to copy the index endpoint from the console in order to search, index and perform other operations on your Marqo index.