
Fine-tuning gpt: Building a better and cheaper assistant?

Posted on: September 21, 2023
Image generated by BlueWillow v4

From the OpenAI docs:

Fine-tuning lets you get more out of the models available through the API by providing:

  1. Higher quality results than prompting
  2. Ability to train on more examples than can fit in a prompt
  3. Token savings due to shorter prompts
  4. Lower latency requests

In this post, we will build an assistant with and without fine-tuning and compare the results.

Code

https://github.com/amalshaji/finetune-gpt

Use-case

Let’s build an assistant capable of generating template messages for engaging with our users. The assistant will generate messages based on the required tone and use pre-defined template tags wherever necessary.

e.g.:

Happy Birthday {{ first_name }}! We're thrilled to celebrate this special day with you. As a valued customer, your happiness is our top priority. Enjoy this day to the fullest, and here's to another year of great experiences with us. Cheers to you on your birthday!

Assume the following tags are valid: first_name, email, last_name
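The assistant only emits the placeholder tags; rendering them with real user data happens later in our application. Here is a minimal sketch of that step (the render helper and the user dict are illustrative, not part of the post's code):

import re

def render(template: str, user: dict) -> str:
    # Replace {{ tag }} placeholders with values from the user record,
    # leaving unknown tags untouched
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(user.get(m.group(1), m.group(0))),
        template,
    )

print(render("Happy Birthday {{ first_name }}!", {"first_name": "Ada"}))
# Happy Birthday Ada!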

Without fine-tuning

Generate a message in less than 50 words using the following parameters:
    occasion: {occasion}
    tone: {tone}

Use the following template tags as placeholders wherever necessary
| tags | description |
|---|---|
| {{ first_name }} | User's first name |
| {{ last_name }} | User's last name |
| {{ email }} | User's email |

The program will set the occasion and tone before passing the prompt to the OpenAI endpoint.

import asyncio
import openai
import os
from string import Template

openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = Template(
    """
Generate a message in less than 50 words using the following parameters:
    occasion: $occasion
    tone: $tone

Use the following template tags as placeholders wherever necessary
| tags | description |
|---|---|
| {{ first_name }} | User's first name |
| {{ last_name }} | User's last name |
| {{ email }} | User's email |
"""
)


async def run(occasion: str, tone: str):
    # Substitute the occasion and tone into the prompt and request a completion
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": prompt.substitute({"occasion": occasion, "tone": tone}),
            }
        ],
    )
    print(response)


asyncio.run(run(occasion="birthday", tone="texas accent"))

Output

{
  "id": "chatcmpl-817dhqQoNvTmBHlDMKDkiQIYezgRn",
  "object": "chat.completion",
  "created": 1695278385,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Happy birthday, {{ first_name }}! Hope y'all have a rootin' tootin' day filled with joy and good ol' Texas charm. Yeehaw! \ud83e\udd20"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 99,
    "completion_tokens": 39,
    "total_tokens": 138
  }
}

Fine-tuning

Preparing the dataset

OpenAI fine-tuning requires a dataset of at least ten examples. It accepts the input as a JSONL (JSON Lines) file, where each line is a valid JSON object, e.g.:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},{"role": "user", "content": "What's the largest planet in our solar system?"},{"role": "assistant", "content": "Jupiter, but don't worry, it's not like it's taking up half the solar system or anything."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},{"role": "user", "content": "Who discovered penicillin?"},{"role": "assistant", "content": "Alexander Fleming. But don't get too excited; it's just one of the most important discoveries in medical history."}]}

Here is the training data we will use to fine-tune our model.
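The full dataset lives in the repo linked above. For illustration, a single line for this use case might look something like this (the exact wording is hypothetical):

{"messages": [{"role": "system", "content": "Generate a message in less than 50 words using the given occasion and tone. Use the template tags {{ first_name }}, {{ last_name }}, and {{ email }} as placeholders wherever necessary."}, {"role": "user", "content": "occasion: birthday, tone: informal"}, {"role": "assistant", "content": "Happy birthday, {{ first_name }}! Hope your day is packed with cake, laughs, and good company."}]}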

Create a new file

async def create_training_file():
    # Upload data.jsonl to OpenAI, marked for fine-tuning use
    file_create_output = await openai.File.acreate(
        file=open("data.jsonl", "rb"), purpose="fine-tune"
    )
    print(f"Created training file: {file_create_output}")
    return file_create_output

The response should contain the file ID and status. If the status is not processed, keep checking it manually with await openai.File.aretrieve(file_id).
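A minimal polling loop for that, reusing the async setup from earlier (the interval is arbitrary):

async def wait_until_processed(file_id: str, interval: float = 2.0):
    # Poll the uploaded file until OpenAI finishes validating it
    while True:
        file = await openai.File.aretrieve(file_id)
        if file["status"] == "processed":
            return file
        await asyncio.sleep(interval)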

Fine-tune

We can schedule a fine-tuning job with OpenAI using the uploaded file ID.


async def start_finetune_job(training_file: str):
    ft_job = await openai.FineTuningJob.acreate(
        training_file=training_file, model="gpt-3.5-turbo"
    )
    print(f"Created fine tuning job: {ft_job}")
    return ft_job

Navigate to https://platform.openai.com/finetune to see the job status. Once complete, you’ll see the relevant metrics, including the tokens used for training.

Fine-tune dashboard (screenshot)
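You can also poll the job from code instead of watching the dashboard; FineTuningJob exposes the same aretrieve pattern as the other resources (the status values below are the documented terminal states):

async def wait_for_finetune(job_id: str, interval: float = 30.0):
    # Poll until the job reaches a terminal state
    while True:
        job = await openai.FineTuningJob.aretrieve(job_id)
        if job["status"] in ("succeeded", "failed", "cancelled"):
            # On success, job["fine_tuned_model"] holds the new model name
            return job
        await asyncio.sleep(interval)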

Working with the new model

We run something similar to the non-fine-tuned version, replacing the model with our custom model.

async def run(occasion: str, tone: str):
    start = timeit.default_timer()
    # Use the fine-tuned model ID from the dashboard instead of the base model
    response = await openai.ChatCompletion.acreate(
        model="ft:gpt-3.5-turbo-0613:personal::819PHd1U",
        messages=[
            {
                "role": "system",
                "content": prompt,
            },
            {
                "role": "user",
                "content": f"occasion: {occasion}, tone: {tone}",
            },
        ],
    )
    print(response)
    print(f"Time taken: {timeit.default_timer() - start}s")


asyncio.run(run(occasion="birthday", tone="informal"))

Imports and the rest of the setup are the same as in the non-fine-tuned version (with timeit added for the timing).

{
  "id": "chatcmpl-81AzImR0c0f9h3Kud3sHeHo8rmRyl",
  "object": "chat.completion",
  "created": 1695291256,
  "model": "ft:gpt-3.5-turbo-0613:personal::819PHd1U",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Happy birthday, {{ first_name }}! Hope your special day is filled with lots of cake, presents, and non-stop fun."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 96,
    "completion_tokens": 26,
    "total_tokens": 122
  }
}

There’s a slight reduction in the number of tokens, and the total cost is around $0.000196.

Comparison

| | Non fine-tuned | Fine-tuned |
|---|---|---|
| Cost | Higher if you need to pass examples in the system prompt with every request | Cheaper, as the examples are baked in during fine-tuning |
| Availability | Always available | After training, the API raised a lot of server-overload exceptions before giving a correct response |
| Inference speed | Slow; ~3 seconds on average | Faster; most queries finish in under a second, though some outliers took more than 10 seconds |
| Ease of use | Straightforward: write a prompt and query | Prepare training and validation data, create files, start training, and wait for it to finish (all of which can be automated) |
| Limitations | None known (used as the baseline for comparison) | gpt-4 and function calling are not yet supported (expected later this year) |

Conclusion

In this experiment, we fine-tuned gpt-3.5 on a small dataset instead of passing examples with every prompt. To achieve the same behavior without fine-tuning, you must include those examples in each request; imagine the cost of doing that for 100,000-1,000,000 API calls. Although this experiment showed no difference in output quality (the task was easy for gpt), a more complex task would have benefited from fine-tuning.

Fine-tuning GPT-3.5 models can significantly improve their performance on complex tasks, making them well suited to building low-cost, highly efficient assistants. While fine-tuning requires some extra effort, the savings in cost and latency make it a worthwhile investment, and as support for GPT-4 and function calling is added, the potential applications of fine-tuned models will only expand. Still, consider carefully whether fine-tuning is the right approach for your specific use case; weighing the pros and cons above should help you decide.
