Marqo Clothing Apparel Simple CLI Demo
Getting Started
-
Download the dataset from Clothing Dataset into the directory that contains the simple_marqo_demo.py script.
-
Run this command inside the script directory to set up an HTTP server. The Marqo Docker container uses it to read files from the local OS. For more information, please visit this link.
python3 -m http.server 8222
-
Make sure to run the Marqo docker container via the following command:
docker run --name marqo -it -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:2.0.0
-
Install the Marqo Python client
pip install marqo==3.0.0
-
Run the simple_marqo_demo.py script via the following command:
python3 simple_marqo_demo.py
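Before running the script, it can help to confirm that both services from the steps above respond. The following is a minimal sketch and not part of the demo: `is_reachable` is a hypothetical helper, and ports 8222 and 8882 are taken from the commands above. It only checks from the host that something answers over HTTP at each address; it does not prove that the container can reach the file server through host.docker.internal.

```python
import urllib.error
import urllib.request

# Bypass any proxy configured in the environment: both services run locally.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers an HTTP GET at `url` (hypothetical helper)."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still up.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, etc. (URLError is an OSError).
        return False


if __name__ == "__main__":
    services = {
        "Local file server (python3 -m http.server 8222)": "http://localhost:8222/",
        "Marqo container (port 8882)": "http://localhost:8882/",
    }
    for name, url in services.items():
        status = "reachable" if is_reachable(url) else "NOT reachable"
        print(f"{name}: {status}")
```

If either line reports NOT reachable, revisit the corresponding step before starting simple_marqo_demo.py.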
Code
simple_marqo_demo.py
import marqo
import pprint
import pandas as pd

mq = marqo.Client(url='http://localhost:8882')  # Connection to the Marqo Docker container
dataset_path = "http://localhost:8222/"  # Place your file path here (directory where the HTTP server is set up)


def load_index(index_name: str, number_data: int) -> None:
    try:
        shirt_data = pd.read_csv('clothing-dataset/images.csv').head(number_data)[['image','label','kids']].to_dict('records')
        # The dataset came from this link: https://github.com/alexeygrigorev/clothing-dataset-small
        # The .csv file has the following headers:
        # image, sender_id, label, kids
        # (image name, ID of the sender who sent the picture, what kind of clothing it is, whether or not the clothing is for kids)
        # Dataset example:
        # 4285fab0-751a-4b74-8e9b-43af05deee22,124,Not sure,False
        # 70045b01-b350-4918-be74-2f627290ad7a,95,Skirt,False
        for data in shirt_data:
            # The container fetches images through the HTTP server started on port 8222
            path = "http://host.docker.internal:8222/clothing-dataset/images/" + data['image'] + ".jpg"
            data['image'] = path
        settings = {
            "treatUrlsAndPointersAsImages": True,  # allows us to find an image file and index it
            "model": "ViT-B/16"
        }
        mq.create_index(index_name, settings_dict=settings)
        mq.index(index_name).add_documents(shirt_data, tensor_fields=['image'], client_batch_size=64)
        print("Index successfully created.")
    except Exception as e:
        # Any failure ends up here (missing CSV, unreachable server, existing index), so show the cause
        print(f"Index could not be created (it may already exist): {e}")


def delete_index(index_name: str):
    try:
        mq.index(index_name).delete()
        print("Index successfully deleted.")
    except Exception as e:
        print(f"Index could not be deleted (it may not exist): {e}")


def delete_doc_from_index(index_name: str, doc_ids: list[str]):
    results = mq.index(index_name).delete_documents(ids=doc_ids)
    return results


def search_index_text(index_name: str, query_text: str, search_method: str):
    results = mq.index(index_name).search(
        q=query_text,
        search_method=search_method,
    )
    # Marqo also has other features, such as searching on specific attribute fields and query filtering.
    # Refer to the documentation for how these features work (https://marqo.pages.dev/)
    return results


def search_index_image(index_name: str, image_name: str):
    # Make sure the image is located inside the directory in which the Python HTTP server is running
    image_path = "http://host.docker.internal:8222/" + image_name
    results = mq.index(index_name).search(image_path)
    return results


def get_index_stats(index_name: str) -> dict:
    results = mq.index(index_name).get_stats()
    return results


def main():
    print("Welcome to Marqo Demo!")
    while True:
        try:
            action = int(input('''
What would you like to do?
1) Create an Index
2) Delete an Index
3) Search from an Index
4) Show Index Stats
5) Delete a document from an Index
6) Quit
Action: '''))
        except ValueError:
            # Non-numeric input: ask again instead of crashing
            print("Please enter a number from 1 to 6.")
            continue
        if action == 1:
            index_name = input("Index name: ")
            no_of_items = int(input("No. of items in dataset: "))
            load_index(index_name, no_of_items)
        elif action == 2:
            index_name = input("Index name: ")
            delete_index(index_name)
        elif action == 3:
            index_name = input("Index name: ")
            search_type = input("Search Type (Text, Image): ").strip().lower()
            if search_type == 'text':
                search_mode = input("Search Mode (Lexical, Tensor): ")
                query_text = input("Query Text: ")
                results = search_index_text(index_name, query_text, search_mode.strip().upper())
                pprint.pprint(results)
            elif search_type == 'image':
                image_name = input("Image name (include the file extension, e.g. .jpg or .png): ")
                results = search_index_image(index_name, image_name)
                pprint.pprint(results)
            else:
                print("Unknown search type.")
        elif action == 4:
            index_name = input("Index name: ")
            pprint.pprint(get_index_stats(index_name))  # show the stats instead of discarding them
        elif action == 5:
            index_name = input("Index name: ")
            no_of_docs = int(input("No. of documents to delete: "))
            doc_ids = []
            for i in range(no_of_docs):
                doc_id = input("Document ID: ")
                doc_ids.append(doc_id)
            pprint.pprint(delete_doc_from_index(index_name, doc_ids))
        else:
            print("Goodbye")
            break


if __name__ == "__main__":
    main()
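The document-preparation step inside load_index can be looked at in isolation. The sketch below uses the standard-library csv module instead of pandas so that it runs without the dataset; `rows_to_documents` is a hypothetical helper and is not part of the demo script. The two sample rows and the base URL are taken from the comments and code above.

```python
import csv
import io

# Base URL the Marqo container uses to fetch images (taken from load_index).
IMAGE_BASE_URL = "http://host.docker.internal:8222/clothing-dataset/images/"

# Header plus the two example rows quoted in the load_index comments.
SAMPLE_CSV = """image,sender_id,label,kids
4285fab0-751a-4b74-8e9b-43af05deee22,124,Not sure,False
70045b01-b350-4918-be74-2f627290ad7a,95,Skirt,False
"""


def rows_to_documents(csv_text: str, limit: int) -> list[dict]:
    """Hypothetical helper: turn CSV rows into documents with image URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        if len(documents) >= limit:
            break
        documents.append({
            "image": IMAGE_BASE_URL + row["image"] + ".jpg",
            "label": row["label"],
            # csv yields strings, so convert the flag explicitly.
            "kids": row["kids"].strip().lower() == "true",
        })
    return documents


if __name__ == "__main__":
    for doc in rows_to_documents(SAMPLE_CSV, limit=2):
        print(doc)
```

Each resulting dictionary has the same three keys that load_index passes to add_documents, with the image field holding a URL that the container can fetch.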
Function References
-
load_index(index_name, number_data)
| Name | Type | Description |
| --- | --- | --- |
| index_name | String | name of index |
| number_data | Integer | number of data lines to parse from dataset |
This function reads the data from the images.csv file found in the dataset, creates an index in Marqo, and adds the parsed rows to it as documents. The created index is named after the value of index_name.
-
delete_index(index_name)
| Name | Type | Description |
| --- | --- | --- |
| index_name | String | name of index |
This function deletes the index named index_name. If the deletion fails, for example because the index does not exist, it prints a message instead of raising an error.
-
delete_doc_from_index(index_name, doc_ids)
| Name | Type | Description |
| --- | --- | --- |
| index_name | String | name of index |
| doc_ids | List[String] | list of document IDs |
This function deletes the documents in the index index_name whose IDs are listed in doc_ids.
-
search_index_text(index_name, query_text, search_method)
| Name | Type | Description |
| --- | --- | --- |
| index_name | String | name of index |
| query_text | String | search query text |
| search_method | String | search method (LEXICAL or TENSOR; the CLI upper-cases the user's input) |
This function runs a text search for query_text on the documents in the index index_name, using the search method given by search_method. A Python dictionary containing the results of the query is returned.
-
search_index_image(index_name, image_name)
| Name | Type | Description |
| --- | --- | --- |
| index_name | String | name of index |
| image_name | String | name of image used for searching |
This function runs an image tensor search, using the image image_name as the query, on the documents in the index index_name. A Python dictionary containing the results of the query is returned.
-
get_index_stats(index_name)
| Name | Type | Description |
| --- | --- | --- |
| index_name | String | name of index |
This function returns the stats of the index index_name.
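Both search functions return the Marqo response as a Python dictionary. A minimal sketch of reading it follows, assuming the response carries a "hits" list whose entries include "_id" and "_score" alongside the indexed fields (worth checking against the Marqo version you run). `summarize_hits` is a hypothetical helper, and the sample dictionary is hand-written for illustration, not real output.

```python
def summarize_hits(results: dict, top_n: int = 3) -> list[str]:
    """Hypothetical helper: one line per hit, in the order Marqo returned them."""
    lines = []
    for hit in results.get("hits", [])[:top_n]:
        doc_id = hit.get("_id", "<no id>")
        score = hit.get("_score", 0.0)
        label = hit.get("label", "<no label>")
        lines.append(f"{doc_id}  score={score:.3f}  label={label}")
    return lines


# Hand-written stand-in for a search response; IDs and scores are made up.
sample_results = {
    "hits": [
        {"_id": "doc-1", "_score": 0.91, "label": "Skirt", "kids": False},
        {"_id": "doc-2", "_score": 0.47, "label": "Not sure", "kids": False},
    ]
}

if __name__ == "__main__":
    for line in summarize_hits(sample_results):
        print(line)
```

The "_id" values shown this way are the document IDs that delete_doc_from_index expects in doc_ids.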
Usage
Feel free to check out the code to get a better understanding of how the Marqo functions are used :).