Deploying an OpenAI Compatible Endpoint using FastChat
FastChat is an open-source Python platform for training, serving, and evaluating large language model based chatbots. In this article, we will explore how to deploy an OpenAI compatible endpoint using FastChat, which builds on FastAPI and HuggingFace Transformers to serve local models behind an OpenAI-style API.
Components Overview
The FastChat architecture consists of three primary components: the Controller, the Model Worker, and the OpenAI compatible API Server. Each plays a distinct role in routing requests and generating responses.
- Controller: Registers Model Workers, monitors their health, and routes each request to an available worker.
- Model Worker: Loads the model weights and generates responses for incoming prompts.
- OpenAI compatible API Server: Accepts client requests in the OpenAI API format, forwards them to the appropriate Model Worker via the Controller, and returns the responses.
Deployment Steps
To deploy an OpenAI compatible endpoint using FastChat, follow these steps:
- Install the required packages (note that FastChat is published on PyPI under the name fschat):
pip install "fschat[model_worker]"
- Start the Controller, Model Worker, and OpenAI compatible API Server:
python -m fastchat.serve.controller \
    --host 127.0.0.1 &

python -m fastchat.serve.model_worker \
    --host 127.0.0.1 \
    --controller-address http://127.0.0.1:21001 \
    --model-path /path/to/model &

python -m fastchat.serve.openai_api_server \
    --host 127.0.0.1 \
    --controller-address http://127.0.0.1:21001 \
    --port 8000
Make sure to replace /path/to/model with the actual path to your model and pass the same --controller-address argument to both model_worker and openai_api_server.
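Once all three processes are up, a quick way to verify the deployment is to list the models the server exposes. Below is a minimal sketch using the requests library against the standard /v1/models endpoint, assuming the server is reachable at the address configured above:

import requests

# Ask the OpenAI compatible server which models the controller has registered.
resp = requests.get("http://127.0.0.1:8000/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])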
Client Requests and Response
The request file request.json has the following structure; note that model must match the name under which the Model Worker registered the model:
{
  "model": "/path/to/model",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is deep learning?" }
  ],
  "max_tokens": 100
}
Once the components are running, you can send requests to the OpenAI compatible API Server using a client of your choice (e.g., cURL):
curl -X POST \
http://127.0.0.1:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d @request.json
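The same call can be made from Python; here is a minimal sketch using the requests library, assuming request.json sits in the working directory:

import json
import requests

# Load the same request body used by the curl example above.
with open("request.json") as f:
    payload = json.load(f)

# POST it to the chat completions route of the OpenAI compatible server.
resp = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))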
And the OpenAI compatible API server will return a response similar to the following:
{
"id": "1",
"object": "chat.completion",
"created": 1734702969.6255138,
"model":"/path/to/model",
"choices": [
{
"message":
{ "role": "assistant", "content": "Deep learning is a subset of machine learning..." },
"finish_reason": "stop",
"index": 0
}
]
}
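Because the endpoint follows the OpenAI API format, the official openai Python client can also talk to it directly. A minimal sketch, assuming the openai package (version 1 or later) is installed; with no API keys configured on the server, any placeholder key works:

from openai import OpenAI

# Point the official client at the local FastChat deployment instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="/path/to/model",  # must match a name listed by /v1/models
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    max_tokens=100,
)
print(completion.choices[0].message.content)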
Benefits
This approach offers several benefits, including:
- Efficient Deployment: FastChat provides a reliable serving framework out of the box, letting developers concentrate on their application rather than on serving infrastructure.
- Scalability: The architecture scales horizontally; multiple Model Workers serving different models can run concurrently on different ports behind a single Controller (see the sketch after this list).
- Reliability: By separating concerns into distinct components, the system isolates failures and reduces single points of failure.
- Accessibility: Because the stack is built on established open-source frameworks, ample documentation and learning resources are available. The interactive API documentation for this deployment is served at http://127.0.0.1:8000/docs, helping both users and third-party developers understand how to use the endpoint and build new products on top of it.
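To illustrate the scalability point, a second Model Worker can be registered with the same Controller on a different port. The sketch below launches one from Python; the port numbers and second model path are illustrative assumptions:

import subprocess

# A second worker registers itself with the same controller, which then
# balances incoming requests across all registered workers.
subprocess.Popen([
    "python", "-m", "fastchat.serve.model_worker",
    "--host", "127.0.0.1",
    "--port", "21003",
    "--worker-address", "http://127.0.0.1:21003",
    "--controller-address", "http://127.0.0.1:21001",
    "--model-path", "/path/to/another/model",
])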
By following these steps, you can deploy an OpenAI compatible endpoint using FastChat, providing a robust and efficient serving solution for your language models.