
Audio and Video Search with Marqo

This guide covers two platforms: Marqo Cloud and open-source Marqo running locally. The Marqo Cloud walkthrough comes first, followed by the open-source version.

Marqo Cloud

This guide will walk you through using Marqo to index and search audio, video, and image files.

If you have any questions or need help, visit our Community and ask in the get-help channel.

Step 1: Obtain Marqo Cloud API Key

First, we need a Marqo Cloud API key. For details on how to find yours, see our article on obtaining your API key. Once you have it, replace your_api_key with your actual API key:

api_key = "your_api_key"

Let's now dive into the code.

Step 2: Create a Marqo Index

from marqo import Client

mq = Client(url="https://api.marqo.ai", api_key=api_key)

# Define settings for the index
settings = {
    "type": "unstructured",  # Unstructured data allows flexible input types
    "vectorNumericType": "float",  # Use floating-point numbers for vector embeddings
    "model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image",  # Model to handle text, audio, video, and images
    "normalizeEmbeddings": True,  # Normalize embeddings to ensure comparability
    "treatUrlsAndPointersAsMedia": True,  # Treat URLs as media files
    "treatUrlsAndPointersAsImages": True,  # Specifically treat certain URLs as images
    "audioPreprocessing": {
        "splitLength": 10,
        "splitOverlap": 5,
    },  # Split audio into 10-second chunks with 5-second overlap
    "videoPreprocessing": {
        "splitLength": 20,
        "splitOverlap": 5,
    },  # Split video into 20-second chunks with 5-second overlap
    "inferenceType": "marqo.GPU",  # Specify inference type
}

# Create a new index with the specified settings
mq.create_index("audio-and-video-search", settings_dict=settings)

Alternatively, you can create the index with cURL:

curl -X POST 'https://api.marqo.ai/api/v2/indexes/audio-and-video-search' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' \
-d '
{
"type": "unstructured",
"vectorNumericType": "float",
"model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image",
"normalizeEmbeddings": true,
"treatUrlsAndPointersAsMedia": true,
"treatUrlsAndPointersAsImages": true,
"audioPreprocessing": {"splitLength": 10, "splitOverlap": 5},
"videoPreprocessing": {"splitLength": 20, "splitOverlap": 5},
"inferenceType": "marqo.GPU"
}'

Replace the API Key with the one you obtained earlier.
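
To make the preprocessing settings concrete, here is a small illustration of how a 30-second clip would be tiled into 10-second chunks with a 5-second overlap. This mirrors the audioPreprocessing values above but is an illustration only, not Marqo's internal implementation:

# Illustration only: how splitLength=10 and splitOverlap=5 tile a clip.
# This is not Marqo's internal code.
def chunk_spans(duration, split_length=10, split_overlap=5):
    step = split_length - split_overlap  # each chunk starts 5 s after the previous one
    spans, start = [], 0.0
    while start < duration:
        spans.append((start, min(start + split_length, duration)))
        start += step
    return spans

print(chunk_spans(30))
# [(0.0, 10.0), (5.0, 15.0), (10.0, 20.0), (15.0, 25.0), (20.0, 30.0), (25.0, 30.0)]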

Step 3: Add Documents to Index

mq.index("audio-and-video-search").add_documents(
    documents=[
        # Add an audio file (blues music)
        {
            "audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3",
            "_id": "id1",
        },
        # Add a video file (public speaking)
        {
            "video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4",
            "_id": "id2",
        },
        # Add an image file (the Marqo logo, which is a hippo)
        {
            "image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png",
            "_id": "id3",
        },
        # Add more documents here if needed
    ],
    tensor_fields=[
        "audio_field",
        "video_field",
        "image_field",
    ],  # Specify which fields should be embedded
)

Here is the equivalent cURL request:

curl -X POST 'your_endpoint/indexes/audio-and-video-search/documents' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
"documents": [
    {
    "audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3",
    "_id": "id1"
    },
    {
    "video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4",
    "_id": "id2"
    },
    {
    "image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png",
    "_id": "id3"
    }
],
"tensorFields": ["audio_field", "video_field", "image_field"]
}'
Replace your_endpoint with your actual endpoint. To find your endpoint, visit Find Your Endpoint.

Note also that if any field value contains an apostrophe ('), the shell will throw an error, because the JSON payload is wrapped in single quotes; escape the apostrophe or wrap the payload in double quotes instead.
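
As a quick sanity check, you can fetch a document back by its ID with the Python client's get_document method:

# Optional sanity check: retrieve an indexed document by its ID
doc = mq.index("audio-and-video-search").get_document(document_id="id2")
print(doc["video_field"])  # prints the video URL we indexed above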

Step 4: Search with Marqo

# Search the index for a query related to public speaking
res = mq.index("audio-and-video-search").search("public speaking")
print(
    res["hits"][0]
)  # Print the top hit (should relate to the video of public speaking)

This is a cURL example for the query 'public speaking'.

curl -X POST 'your_endpoint/indexes/audio-and-video-search/search' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H 'Content-type:application/json' -d '
{
    "q": "public speaking"
}'

This returns:

{'_id': 'id2', 'video_field': 'https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4', '_highlights': [{'video_field': '[0.8858670000000011, 20.885867]'}], '_score': 0.5409741804365457}

As we can see, the top result is the public-speaking video, which matches our query.
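
The _highlights field tells you which chunk matched best; for audio and video it is the [start, end] span in seconds, returned as a string (as in the output above). A minimal sketch that parses it so you can seek straight to the matching segment:

import json

top = res["hits"][0]
# The highlight span is a string like '[0.88, 20.88]'; parse it into floats
start, end = json.loads(top["_highlights"][0]["video_field"])
print(f"Best-matching segment: {start:.1f}s to {end:.1f}s")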

Step 5: Clean up

When you are done with the index, you can delete it with the following code:

# Delete your index
mq.delete_index("audio-and-video-search")

Here is the equivalent cURL request to delete the index:

curl -XDELETE https://api.marqo.ai/api/v2/indexes/audio-and-video-search \
-H 'x-api-key: XXXXXXXXXXXXXXX' 

If you do not delete your index, you will continue to be charged for it.

Full Code

audio_and_video_search_cloud.py
import marqo

api_key = "add_your_api_key_here"

mq = marqo.Client(url='https://api.marqo.ai', api_key=api_key)

# Define settings for the index
settings = {
    "type": "unstructured",  # Unstructured data allows flexible input types
    "vectorNumericType": "float",  # Use floating-point numbers for vector embeddings
    "model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image",  # Model to handle text, audio, video, and images
    "normalizeEmbeddings": True,  # Normalize embeddings to ensure comparability
    "treatUrlsAndPointersAsMedia": True,  # Treat URLs as media files
    "treatUrlsAndPointersAsImages": True,  # Specifically treat certain URLs as images
    "audioPreprocessing": {"splitLength": 10, "splitOverlap": 5},  # Split audio into 10-second chunks with 5-second overlap
    "videoPreprocessing": {"splitLength": 20, "splitOverlap": 5},  # Split video into 20-second chunks with 5-second overlap
    "inferenceType": "marqo.GPU",  # Specify inference type
}

# Delete the existing index if it exists to avoid conflicts
mq.delete_index("audio-and-video-search")

# Create a new index with the specified settings
mq.create_index("audio-and-video-search", settings_dict=settings)

# Add documents to the index, including audio, video, and image files
mq.index("audio-and-video-search").add_documents(
    documents=[
        {"audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3", "_id": "id1"},  # Add an audio file (blues music)
        {"video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4", "_id": "id2"},  # Add a video file (public speaking)
        {"image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png", "_id": "id3"},  # Add an image file
        # Add more documents here if needed
    ],
    tensor_fields=['audio_field', 'video_field', 'image_field']  # Specify which fields should be embedded
)

# Search the index for a query related to public speaking
res = mq.index("audio-and-video-search").search("public speaking")
print(res['hits'][0])  # Print the top hit (should relate to the video of public speaking)

# Search the index for a query related to a "hippo" (no document is expected to match closely)
res = mq.index("audio-and-video-search").search("hippo")
print(res['hits'][0])  # Print the top hit (may be a loosely related result)

# Search the index for a query related to blues music
res = mq.index("audio-and-video-search").search("blues music")
print(res['hits'][0])  # Print the top hit (should relate to the audio of blues music)

# Search again for a query about a "hippo" and print all results
res = mq.index("audio-and-video-search").search("hippo")
print(res['hits'])  # Print all hits for the query

Open Source

This guide will walk you through the same workflow using open-source Marqo, running locally, to index and search audio, video, and image files.

Step 1: Run Marqo

First, we need to get Marqo up and running. You can do this by executing the following commands in your terminal:

docker pull marqoai/marqo:latest
docker rm -f marqo
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest

For more detailed instructions, check the Installation Guide.
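
Before continuing, you can verify that the local server is reachable using the client's get_indexes method:

import marqo

mq = marqo.Client(url="http://localhost:8882")
print(mq.get_indexes())  # lists existing indexes if the server is up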

Step 2: Create a Marqo Index

from marqo import Client

mq = Client("http://localhost:8882")

# Define settings for the index
settings = {
    "type": "unstructured",  # Unstructured data allows flexible input types
    "vectorNumericType": "float",  # Use floating-point numbers for vector embeddings
    "model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image",  # Model to handle text, audio, video, and images
    "normalizeEmbeddings": True,  # Normalize embeddings to ensure comparability
    "treatUrlsAndPointersAsMedia": True,  # Treat URLs as media files
    "treatUrlsAndPointersAsImages": True,  # Specifically treat certain URLs as images
    "audioPreprocessing": {
        "splitLength": 10,
        "splitOverlap": 5,
    },  # Split audio into 10-second chunks with 5-second overlap
    "videoPreprocessing": {
        "splitLength": 20,
        "splitOverlap": 5,
    },  # Split video into 20-second chunks with 5-second overlap
}

# Create a new index with the specified settings
mq.create_index("audio-and-video-search", settings_dict=settings)

Alternatively, you can create the index with cURL:

curl -X POST 'http://localhost:8882/indexes/audio-and-video-search' \
-H 'Content-type:application/json' \
-d '
{
"type": "unstructured",
"vectorNumericType": "float",
"model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image",
"normalizeEmbeddings": true,
"treatUrlsAndPointersAsMedia": true,
"treatUrlsAndPointersAsImages": true,
"audioPreprocessing": {"splitLength": 10, "splitOverlap": 5},
"videoPreprocessing": {"splitLength": 20, "splitOverlap": 5},
}'
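
Optionally, confirm the index was created with the settings you expect; get_settings is available on the index object in the Python client:

# Optional: inspect the stored settings of the newly created index
print(mq.index("audio-and-video-search").get_settings())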

Step 3: Add Documents to Index

mq.index("audio-and-video-search").add_documents(
    documents=[
        # Add an audio file (blues music)
        {
            "audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3",
            "_id": "id1",
        },
        # Add a video file (public speaking)
        {
            "video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4",
            "_id": "id2",
        },
        # Add an image file (the Marqo logo, which is a hippo)
        {
            "image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png",
            "_id": "id3",
        },
        # Add more documents here if needed
    ],
    tensor_fields=[
        "audio_field",
        "video_field",
        "image_field",
    ],  # Specify which fields should be embedded
)

Here is the equivalent cURL request:

curl -X POST 'http://localhost:8882/indexes/audio-and-video-search/documents' \
-H 'Content-type:application/json' -d '
{
"documents": [
    {
    "audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3",
    "_id": "id1"
    },
    {
    "video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4",
    "_id": "id2"
    },
    {
    "image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png",
    "_id": "id3"
    }
],
"tensorFields": ["audio_field", "video_field", "image_field"]
}'
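
The full code at the end of this guide starts the container with --add-host host.docker.internal:host-gateway, which lets Marqo reach services on your machine. If you want to index local files rather than the S3 URLs above, one approach (a sketch; the file name is hypothetical) is to serve them over HTTP and point Marqo at host.docker.internal:

# First serve your media directory, e.g.:  python -m http.server 8000
# Then index a local file via host.docker.internal (hypothetical file name):
mq.index("audio-and-video-search").add_documents(
    documents=[
        {
            "audio_field": "http://host.docker.internal:8000/my-local-audio.mp3",
            "_id": "local1",
        },
    ],
    tensor_fields=["audio_field"],
)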

Step 4: Search with Marqo

# Search the index for a query related to public speaking
res = mq.index("audio-and-video-search").search("public speaking")
print(
    res["hits"][0]
)  # Print the top hit (should relate to the video of public speaking)

This is a cURL example for the query 'public speaking'.

curl -X POST 'http://localhost:8882/indexes/audio-and-video-search/search' \
-H 'Content-type:application/json' -d '
{
    "q": "public speaking"
}'

This returns:

{'_id': 'id2', 'video_field': 'https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4', '_highlights': [{'video_field': '[0.8858670000000011, 20.885867]'}], '_score': 0.5409741804365457}

As we can see, the top result is the public-speaking video, which matches our query.
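
You can also cap how many results come back with the limit parameter accepted by search:

# Return at most two hits for this query
res = mq.index("audio-and-video-search").search("blues music", limit=2)
for hit in res["hits"]:
    print(hit["_id"], hit["_score"])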

Step 5: Clean up

When you are done with the index, you can delete it with the following code:

# Delete your index
mq.delete_index("audio-and-video-search")

Here is the equivalent cURL request:

curl -XDELETE http://localhost:8882/indexes/audio-and-video-search

Full Code

audio_and_video_search.py
import marqo

# Pull the latest Marqo Docker image and run the Marqo server locally:
#
#   docker rm -f marqo                # Remove any existing Marqo container
#   docker pull marqoai/marqo:latest  # Pull the latest Marqo Docker image
#   docker run --name marqo -it -p 8882:8882 --add-host host.docker.internal:host-gateway \
#       -e MARQO_MODELS_TO_PRELOAD="[]" \
#       -e MARQO_MAX_CUDA_MODEL_MEMORY=16 \
#       -e MARQO_MAX_CPU_MODEL_MEMORY=16 \
#       marqoai/marqo:latest
#
# MARQO_MODELS_TO_PRELOAD="[]" preloads no models to reduce startup time;
# the two MEMORY variables cap CUDA and CPU model memory at 16 GB each.

# Connect to the Marqo client running on the localhost
mq = marqo.Client(url='http://localhost:8882')

# Define settings for the index 
settings = {
    "type": "unstructured",  # Unstructured data allows flexible input types
    "vectorNumericType": "float",  # Use floating-point numbers for vector embeddings
    "model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image",  # Model to handle text, audio, video, and images
    "normalizeEmbeddings": True,  # Normalize embeddings to ensure comparability
    "treatUrlsAndPointersAsMedia": True,  # Treat URLs as media files
    "treatUrlsAndPointersAsImages": True,  # Specifically treat certain URLs as images
    "audioPreprocessing": {"splitLength": 10, "splitOverlap": 5},  # Split audio into 10-second chunks with 5-second overlap
    "videoPreprocessing": {"splitLength": 20, "splitOverlap": 5},  # Split video into 20-second chunks with 5-second overlap
}

# Delete the existing index if it exists to avoid conflicts
resp = mq.delete_index("audio-and-video-search")

# Create a new index with the specified settings
resp = mq.create_index("audio-and-video-search", settings_dict=settings)

# Add documents to the index, including audio, video, and image files
res = mq.index("audio-and-video-search").add_documents(
    documents=[
        {"audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3", "_id": "id1"},  # Add an audio file (blues music)
        {"video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4", "_id": "id2"},  # Add a video file (public speaking)
        {"image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png", "_id": "id3"},  # Add an image file
        # Add more documents here if needed
    ],
    tensor_fields=['audio_field', 'video_field', 'image_field']  # Specify which fields should be embedded
)

# Search the index for a query related to public speaking
res = mq.index("audio-and-video-search").search("public speaking")
print(res['hits'][0])  # Print the top hit (should relate to the video of public speaking)

# Search the index for a query related to a "hippo" (no document is expected to match closely)
res = mq.index("audio-and-video-search").search("hippo")
print(res['hits'][0])  # Print the top hit (may be a loosely related result)

# Search the index for a query related to blues music
res = mq.index("audio-and-video-search").search("blues music")
print(res['hits'][0])  # Print the top hit (should relate to the audio of blues music)

# Search again for a query about a "hippo" and print all results
res = mq.index("audio-and-video-search").search("hippo")
print(res['hits'])  # Print all hits for the query