GPT4VisionAPI Documentation

Table of Contents

  • Introduction
  • Installation
  • Module Overview
  • Class: GPT4VisionAPI
      • Initialization
      • Methods
          • encode_image
          • run
          • call
  • Examples
      • Example 1: Basic Usage
      • Example 2: Custom API Key
      • Example 3: Adjusting Maximum Tokens
  • Additional Information
  • References

Introduction

Welcome to the documentation for the GPT4VisionAPI module! This module is a powerful wrapper for the OpenAI GPT-4 Vision model. It allows you to interact with the model to generate descriptions or answers related to images. This documentation will provide you with comprehensive information on how to use this module effectively.

Installation

Before you start using the GPT4VisionAPI module, make sure you have the required dependencies installed. You can install them using the following command:

pip3 install --upgrade swarms

Module Overview

The GPT4VisionAPI module serves as a bridge between your application and the OpenAI GPT-4 Vision model. It allows you to send requests to the model and retrieve responses related to images. Here are some key features and functionality provided by this module:

  • Encoding images to base64 format.
  • Running the GPT-4 Vision model with specified tasks and images.
  • Customization options such as setting the OpenAI API key and maximum token limit.

Class: GPT4VisionAPI

The GPT4VisionAPI class is the core component of this module. It encapsulates the functionality required to interact with the GPT-4 Vision model. Below, we'll dive into the class in detail.

Initialization

When initializing the GPT4VisionAPI class, you have the option to provide the OpenAI API key and set the maximum token limit. Here are the parameters and their descriptions:

  • openai_api_key (str, default: the OPENAI_API_KEY environment variable): The OpenAI API key. If not provided, it falls back to the OPENAI_API_KEY environment variable.
  • max_tokens (int, default: 300): The maximum number of tokens to generate in the model's response.

Here's how you can initialize the GPT4VisionAPI class:

from swarms.models import GPT4VisionAPI

# Initialize with default API key and max_tokens
api = GPT4VisionAPI()

# Initialize with custom API key and max_tokens
custom_api_key = "your_custom_api_key"
api = GPT4VisionAPI(openai_api_key=custom_api_key, max_tokens=500)
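If you prefer to load the key yourself rather than relying on the constructor's default lookup, here is a minimal sketch using the standard library (it assumes the key is stored in the OPENAI_API_KEY environment variable):

import os

from swarms.models import GPT4VisionAPI

# Read the key from the environment and fail fast if it is missing.
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set")

# Pass the key explicitly instead of relying on the constructor's default lookup.
api = GPT4VisionAPI(openai_api_key=api_key, max_tokens=300)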

Methods

encode_image

This method allows you to encode an image from a URL to base64 format. It's a utility function used internally by the module.

def encode_image(img: str) -> str:
    """
    Encode image to base64.

    Parameters:
    - img (str): URL of the image to encode.

    Returns:
    str: Base64 encoded image.
    """

run

The run method is the primary way to interact with the GPT-4 Vision model. It sends a request to the model with a task and an image URL, and it returns the model's response.

def run(task: str, img: str) -> str:
    """
    Run the GPT-4 Vision model.

    Parameters:
    - task (str): The task or question related to the image.
    - img (str): URL of the image to analyze.

    Returns:
    str: The model's response.
    """

call

The __call__ method is a convenient way to run the GPT-4 Vision model. It has the same functionality as the run method.

def __call__(task: str, img: str) -> str:
    """
    Run the GPT-4 Vision model (callable).

    Parameters:
    - task (str): The task or question related to the image.
    - img (str): URL of the image to analyze.

    Returns:
    str: The model's response.
    """

Examples

The following examples show how to use the GPT4VisionAPI module in common scenarios.

Example 1: Basic Usage

In this example, we'll use the module with the default API key and maximum tokens to analyze an image.

from swarms.models import GPT4VisionAPI

# Initialize with default API key and max_tokens
api = GPT4VisionAPI()

# Define the task and image URL
task = "What is the color of the object?"
img = "https://i.imgur.com/2M2ZGwC.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)

Example 2: Custom API Key

If you have a custom API key, you can initialize the module with it as shown in this example.

from swarms.models import GPT4VisionAPI

# Initialize with custom API key and max_tokens
custom_api_key = "your_custom_api_key"
api = GPT4VisionAPI(openai_api_key=custom_api_key, max_tokens=500)

# Define the task and image URL
task = "What is the object in the image?"
img = "https://i.imgur.com/3T3ZHwD.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)

Example 3: Adjusting Maximum Tokens

You can also customize the maximum token limit when initializing the module. In this example, we set it to 1000 tokens.

from swarms.models import GPT4VisionAPI

# Initialize with default API key and custom max_tokens
api = GPT4VisionAPI(max_tokens=1000)

# Define the task and image URL
task = "Describe the scene in the image."
img = "https://i.imgur.com/4P4ZRxU.jpeg"

# Run the GPT-4 Vision model
response = api.run(task, img)

# Print the model's response
print(response)

Additional Information

  • If you encounter any errors or issues with the module, make sure to check your API key and internet connectivity.
  • It's recommended to wrap calls in exception handling so that failures are reported gracefully; a minimal sketch follows this list.
  • You can further customize the module to fit your specific use case by modifying the code as needed.
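Here is a minimal error-handling sketch. The specific exception types raised by the module are not documented here, so a broad except clause is used as a placeholder:

from swarms.models import GPT4VisionAPI

api = GPT4VisionAPI()

try:
    response = api.run(
        "Describe the scene in the image.",
        "https://i.imgur.com/4P4ZRxU.jpeg",
    )
    print(response)
except Exception as exc:  # narrow this to the module's specific exceptions if known
    print(f"GPT-4 Vision request failed: {exc}")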

References

This documentation provides a comprehensive guide on how to use the GPT4VisionAPI module effectively. It covers initialization, methods, usage examples, and additional information to ensure a smooth experience when working with the GPT-4 Vision model.