Image Search with Localization

This step-by-step guide shows how to add localization to image search with Marqo. By the end, your images will be indexed with several patch methods so that the resulting indexes can be compared.

Getting Started

Before diving into the code, let's set up your environment.

Clone the Repository
Get the necessary example files by cloning the examples repository.

Run Marqo
Use Docker to remove any existing container named marqo, pull the Marqo image, and run it:

docker rm -f marqo
docker pull marqoai/marqo:2.0.0
docker run --name marqo -it -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:2.0.0

For more detailed instructions, see the getting started guide.
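
Once the container is running, it can help to confirm that something answers on port 8882 before indexing. The sketch below is a minimal check using only the Python standard library. The helper name marqo_is_up is made up for illustration, and the check assumes the Marqo root URL returns HTTP 200, which is not stated in this guide.

```python
import urllib.error
import urllib.request


def marqo_is_up(url: str = "http://localhost:8882", timeout: float = 2.0) -> bool:
    """Hypothetical helper: return True if an HTTP server answers 200 at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumption: a healthy Marqo instance answers the root URL with 200.
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, malformed URL,
        # or a non-2xx response (urllib raises HTTPError for those).
        return False
```

Calling marqo_is_up() with no arguments checks the port published by the docker run command above. If it returns False, give the container more time to start before retrying.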

Explore Further
The original code and the accompanying article provide additional context.

Walkthrough

Follow these steps to integrate image search with localization into your project.

Step 1: Import Libraries and Define Helpers

Start by importing the necessary libraries and defining any helper functions you'll need.

from marqo import Client
import os
import pandas as pd
from utils import download_data

Step 2: Prepare Your Environment

Make sure Marqo is running before you continue. If it is not, follow the instructions at https://github.com/marqo-ai/marqo.

Step 3: Data Acquisition

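Choose where the images are indexed from: directly from the remote S3 bucket, or from local copies.

# if True, pull images directly from the S3 bucket; otherwise download them for local indexing
use_remote = False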
in_docker = True

Step 4: Download and Locate Data

Load files.csv, set the local download directory and the URL through which Docker reaches it, then call download_data to get a locator for each image.

data = pd.read_csv('files.csv', index_col=0)
docker_path = 'http://host.docker.internal:8222/'
local_dir = os.getcwd() + '/images/'

locators = download_data(data=data, download_dir=local_dir, use_remote=use_remote, in_docker=in_docker, docker_path=docker_path)
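
The implementation of download_data lives in utils and is not shown here. As a rough illustration of why in_docker and docker_path matter, the sketch below uses a hypothetical helper, build_locator, that maps a file name to either a URL or a local path. It assumes an HTTP server on the host serves the images directory on port 8222; neither the helper nor that server setup is confirmed by this guide.

```python
import os


def build_locator(filename: str, local_dir: str, in_docker: bool, docker_path: str) -> str:
    """Hypothetical helper: map an image file name to a locator for indexing."""
    if in_docker:
        # A containerized Marqo cannot see host paths, so point it at an HTTP
        # server on the host (assumed to serve local_dir at docker_path).
        return docker_path.rstrip("/") + "/" + filename
    # Outside Docker, a plain local path is assumed to be enough.
    return os.path.join(local_dir, filename)


# Made-up file names, used only to show the shape of the result.
example_locators = [
    build_locator(name, "/data/images/", True, "http://host.docker.internal:8222/")
    for name in ["cat.jpg", "dog.jpg"]
]
print(example_locators)
```

With in_docker set to True, every locator is a URL under docker_path, which is the form the documents in the next step would carry.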

Step 5: Document Preparation

Turn each locator into a Marqo document: the image_location field holds the locator, and the file name serves as the document _id.

documents = [{"image_location": s3_uri, '_id': os.path.basename(s3_uri)} for s3_uri in locators]
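
Because the _id is the base name of the locator, two images with the same file name in different directories would end up with the same _id. The sketch below, which uses made-up locators, shows one way to spot such collisions before indexing. It is an optional sanity check, not part of the original script.

```python
import os
from collections import Counter

# Made-up locators; in the walkthrough they come from download_data.
sample_locators = [
    "http://host.docker.internal:8222/a/cat.jpg",
    "http://host.docker.internal:8222/b/cat.jpg",
    "http://host.docker.internal:8222/dog.jpg",
]

sample_documents = [
    {"image_location": uri, "_id": os.path.basename(uri)} for uri in sample_locators
]

# Count how often each _id occurs and keep the ones that repeat.
id_counts = Counter(doc["_id"] for doc in sample_documents)
duplicates = sorted(_id for _id, count in id_counts.items() if count > 1)
print(duplicates)  # → ['cat.jpg']
```

If the list is not empty, consider deriving the _id from the full path instead of the base name.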

Step 6: Index Creation

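Initialize the client. With no arguments, Client() connects to the default local Marqo endpoint, which matches the port published by the docker run command above.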

client = Client()

Define the index name prefix, the patch methods to compare, the model, and the batch size. One index is created per patch method.

index_name_prefix = "visual-search"
patch_methods = ["dino-v1", None, "simple"]
model_name = "open_clip/ViT-B-32/laion2b_s34b_b79k"
n_processes = 3
batch_size = 50
delete_index = True

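For each patch method, fill in the settings, optionally delete any existing index of the same name, create the index, and add the documents.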

settings = {
    "treatUrlsAndPointersAsImages": True,
    "imagePreprocessing": {
        "patchMethod": None
    },
    "model": None,
    "normalizeEmbeddings": True,
}

for patch_method in patch_methods:
    suffix = '' if patch_method is None else f"-{patch_method.replace('/', '-')}"
    index_name = index_name_prefix + suffix
    settings['model'] = model_name
    settings['imagePreprocessing']['patchMethod'] = patch_method

    if delete_index:
        try:
            client.index(index_name).delete()
        except Exception as e:
            # most likely the index does not exist yet; report and carry on
            print(f"could not delete index {index_name}: {e}")

    response = client.create_index(index_name, settings_dict=settings)

    response = client.index(index_name).add_documents(
        documents, 
        client_batch_size=batch_size, 
        tensor_fields=['image_location']
    )
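
To see which indexes the loop produces without contacting Marqo, the naming and settings logic can be run on its own. The sketch below mirrors the suffix rule from the loop. The function plan_indexes is a hypothetical name, and copying the settings with deepcopy is a design choice of this sketch (the original loop mutates one shared dictionary instead).

```python
import copy

index_name_prefix = "visual-search"
patch_methods = ["dino-v1", None, "simple"]
base_settings = {
    "treatUrlsAndPointersAsImages": True,
    "imagePreprocessing": {"patchMethod": None},
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
    "normalizeEmbeddings": True,
}


def plan_indexes(prefix, methods, base):
    """Hypothetical helper: return {index_name: settings}, one entry per method."""
    plan = {}
    for method in methods:
        # Same suffix rule as the loop above: no suffix when patching is disabled.
        suffix = "" if method is None else f"-{method.replace('/', '-')}"
        settings = copy.deepcopy(base)  # keep each index's nested dict independent
        settings["imagePreprocessing"]["patchMethod"] = method
        plan[prefix + suffix] = settings
    return plan


plan = plan_indexes(index_name_prefix, patch_methods, base_settings)
for name, settings in plan.items():
    print(name, settings["imagePreprocessing"]["patchMethod"])
```

This yields three index names: visual-search-dino-v1, visual-search, and visual-search-simple. Each settings dictionary could then be passed to client.create_index as in the loop above.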

Full Code

indexing_all_data.py

#####################################################
### STEP 0. Import and define any helper functions
#####################################################

from marqo import Client
import os
import pandas as pd
from utils import download_data

#####################################################
### STEP 1. start Marqo
#####################################################

# Follow the instructions here https://github.com/marqo-ai/marqo

#####################################################
### STEP 2. Get the data for indexing
#####################################################


# if True, pull images directly from the S3 bucket; otherwise download them for local indexing
use_remote = False
in_docker = True

data = pd.read_csv("files.csv", index_col=0)
docker_path = "http://host.docker.internal:8222/"
local_dir = os.getcwd() + "/images/"

locators = download_data(
    data=data,
    download_dir=local_dir,
    use_remote=use_remote,
    in_docker=in_docker,
    docker_path=docker_path,
)

documents = [
    {"image_location": s3_uri, "_id": os.path.basename(s3_uri)} for s3_uri in locators
]

# if you have the images locally, see the instructions
# here https://marqo.pages.dev/Advanced-Usage/images/ for the best ways to index


#####################################################
### STEP 3. Create the index(es)
#####################################################

client = Client()

# set up the settings so we can compare the different methods
index_name_prefix = "visual-search"
patch_methods = [
    "dino-v1",
    None,
    "simple",
]  # other candidates: "dino-v2", "frcnn"
model_name = "open_clip/ViT-B-32/laion2b_s34b_b79k"
n_processes = 3
batch_size = 50

# set this to False if you do not want to delete the previous index of the same name
delete_index = True

settings = {
    "treatUrlsAndPointersAsImages": True,
    "imagePreprocessing": {"patchMethod": None},
    "model": None,
    "normalizeEmbeddings": True,
}

for patch_method in patch_methods:
    suffix = "" if patch_method is None else f"-{patch_method.replace('/', '-')}"
    index_name = index_name_prefix + suffix

    # update the settings we want to use
    settings["model"] = model_name
    settings["imagePreprocessing"]["patchMethod"] = patch_method

    # optionally delete the index if it exists
    if delete_index:
        try:
            client.index(index_name).delete()
        except Exception as e:
            # most likely the index does not exist yet; report and carry on
            print(f"could not delete index {index_name}: {e}")

    # create the index with our settings
    response = client.create_index(index_name, settings_dict=settings)

    response = client.index(index_name).add_documents(
        documents, client_batch_size=batch_size, tensor_fields=["image_location"]
    )