Creating a custom AI chatbot with Python and OpenAI API

I spent considerable time trying to follow various blogs out there to get an AI chatbot written in Python using the OpenAI API. All my attempts following blogs here, here, and here were unsuccessful, mostly due to Python dependency challenges. Each had slightly different solutions, and I've learned more about Pinecone, Rust, cmake, clang++, Apache Arrow, and SQLite than I ever wanted to know. Most bloggers are not specific about prerequisites, the required environment setup, or the exact package versions they use.

Nonetheless, I came across this blog by Miluska Romero from Peru, which was a godsend and had the simplest, most straightforward solution (exactly what I was looking for!). My writeup here is based on that blog post.

Create an API Key on OpenAI

  1. Navigate to OpenAI
  2. Click on API login and log in
  3. Click on API
  4. You will be presented with a dashboard; click on API keys
  5. Click on Create new secret key
  6. Give it a name (e.g., "AI ChatBot Key") and keep it in the Default project, then click on Create secret key
  7. Note the key down (e.g., smk3q...rnynf9YeKltt)
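
The Python program later in this post hardcodes the key, which is fine for a quick test. If you'd rather keep the key out of the code, a minimal sketch (my own suggestion, not from the referenced blog) is to export it as an environment variable and read it with os.environ:

import os
import openai

# Assumes the key was exported in the shell beforehand, for example by
# adding this line to ~/.bash_profile:
#   export OPENAI_API_KEY="smk3q...rnynf9YeKltt"
openai.api_key = os.environ["OPENAI_API_KEY"]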

Install Python using pyenv

In my scenario here, I installed Python using pyenv.

Click here for a separate blog post on installing Python 3.12.4 on Red Hat Enterprise Linux 9.4 (as a non-root user).

In short, the instructions to install pyenv as a non-root user are:

curl https://pyenv.run | bash

Afterwards add the following to your .bash_profile and re-login to the shell:

export PYENV_ROOT="$HOME/.pyenv"
[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"

# Restart your shell for the changes to take effect.

# Load pyenv-virtualenv automatically by adding
# the following to ~/.bashrc:

eval "$(pyenv virtualenv-init -)"

Now install Python 3.12.4 (which, in my setup, did not give any missing 'openai.openai_object' errors) and set it locally:

pyenv install 3.12.4
pyenv local 3.12.4

Upgrade pip from 24.0 to 24.1.2:

pip install --upgrade pip

Install Necessary Python Packages

Confirm the Python and pip version:

dev@devhost:/home/dev> python3 --version
Python 3.12.4

dev@devhost:/home/dev> pip3 --version
pip 24.1.2 from /home/dev/.pyenv/versions/3.12.4/lib/python3.12/site-packages/pip (python 3.12)

Install the following Python packages:

pip3 install openai==0.28.0
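
To confirm that the package installed cleanly and that the API key created earlier actually works, a quick sanity check like the sketch below can be run. This is my own addition; it uses the same 0.28.0-style openai.ChatCompletion call as the program later in this post:

import openai

openai.api_key = '<your-openai-api-key-here>'

# One-off test call against gpt-3.5-turbo; a successful run confirms the
# package, the key, and the account billing are all in order.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in five words or less."}],
)
print(response.choices[0].message["content"])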

Upload Training Data

Create the following subfolder and populate it with your training data, such as PDF, CSV, or TXT files (the program below reads a single TXT file). The more data you add, the more tokens will be consumed.

In my case, the data is a standard text file containing two chapters of my upcoming book. The plan is to engage the chatbot on the contents of these chapters.

mkdir -p ~/aichatbot
cp mydata.txt ~/aichatbot
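
The program later in this post reads a single mydata.txt. If your training data is instead spread across several TXT files in this folder, a small sketch like the following (my own addition) could concatenate them into one string first:

import glob

# Read every .txt file in the current folder (run from ~/aichatbot)
# and join the contents into one block of training data.
chunks = []
for path in sorted(glob.glob("*.txt")):
    with open(path) as file:
        chunks.append(file.read())
product_data = "\n\n".join(chunks)

Keep in mind the token limit discussed under Troubleshooting below; concatenating everything can push the context past what gpt-3.5-turbo accepts.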

Create Python Program

Create the Python program app.py:

import openai

openai.api_key = '<your-openai-api-key-here>'

# Load the book data from the TXT file
with open("mydata.txt") as file:
    product_data = file.read()

# Initialize the context with interaction rules and product data
context = []

# Define the chatbot's interaction rules here: its persona, what the
# data contains, and the tone it should use when responding.

rules = """
Your name is Ahmed and you are an AI clone of the real Ahmed Aboulnaga. \
The way you look on video and the sound of your voice are both somewhat creepy and artificial. \
When responding, be more informal than formal. \
The data consists of two chapters from a book titled "DevSecOps on Oracle Cloud". \
You are the author of the chapters and your name is Ahmed. \
These chapters all revolve around the topic of Terraform and Oracle Cloud Infrastructure, also referred to as OCI. \
Each chapter is comprised of multiple sections. \
The book will be published sometime in 2024. \
When asked about the content of the data, mimic someone with a personality that is honest, but can be sarcastic at times. \
In the responses, keep the answers brief but engaging. \
"""

context.append({'role': 'system', 'content': f"""{rules} {product_data}"""})

# Function to fetch messages from the OpenAI Chat model
def fetch_messages(messages, model="gpt-3.5-turbo", temperature=0):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message["content"]

# Function to refresh and update the conversation context based on user input
def refresh_conversation(chat):
    context.append({'role': 'user', 'content': f"{chat}"})
    response = fetch_messages(context, temperature=0.7)
    context.append({'role': 'assistant', 'content': f"{response}"})
    print(response)

# Main loop to engage users in conversation
def main():
    while True:
        message = input("Please enter a message (or 'exit' to leave): ")
        if message.lower() == 'exit':
            break
        refresh_conversation(message)

if __name__ == '__main__':
    main()
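
One thing to keep in mind: the context list grows with every exchange, so a long enough conversation will eventually hit the same token limit described under Troubleshooting below. Here is a sketch of one way to cap it, keeping the system message (the rules plus book data) and only the most recent turns; MAX_TURNS is an arbitrary value chosen for illustration:

MAX_TURNS = 10  # arbitrary cap on user/assistant messages kept in the context

def trim_context(context):
    # context[0] is the system message holding the rules and book data;
    # keep it, plus only the last MAX_TURNS user/assistant messages.
    system_message = context[0]
    recent = context[1:][-MAX_TURNS:]
    return [system_message] + recent

Wiring it in would be a one-line change inside refresh_conversation(), e.g. context[:] = trim_context(context) before calling fetch_messages().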

Executing the Python Program

The folder now essentially has only two files in it: (1) the Python program app.py and (2) the training data mydata.txt.

To run the program:

python3 app.py

This screenshot is an example of what the output looked like and the question/answer format of the application:

Troubleshooting

I did receive some errors when calling the OpenAI API, but they were easily resolvable.

Error #1

openai.error.InvalidRequestError: This model's maximum context length is 16385 tokens. However, your messages resulted in 18297 tokens. Please reduce the length of the messages.

I had to reduce the size of the training data. I had initially included three chapters of my book in the custom dataset and cut it down to two chapters to fit within the 16k token limit.
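
To check how close the training data is to that limit before running the chatbot, a rough sketch like the one below counts the tokens up front. It uses the tiktoken package, which is not part of this post's setup (pip3 install tiktoken), so treat it as an optional extra; the rules text and message framing add some tokens on top of this count.

import tiktoken

# Tokenize the training file the same way gpt-3.5-turbo would.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

with open("mydata.txt") as file:
    text = file.read()

print(f"mydata.txt is roughly {len(encoding.encode(text))} tokens")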

Error #2

openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

Based on what I had read, I initially assumed there were some free limits on API calls depending on the model. That seems to be incorrect, so I bought $10 worth of credit on the OpenAI API platform to proceed.
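
If you want the chatbot to fail a bit more gracefully when this happens, one option (my own addition, using the error classes shipped with openai 0.28.0) is to catch the exception around the API call:

import openai

def fetch_messages_safely(messages, model="gpt-3.5-turbo", temperature=0):
    # Same call as fetch_messages(), but returns a friendlier message when
    # the account quota or rate limit is exhausted.
    try:
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
            temperature=temperature,
        )
        return response.choices[0].message["content"]
    except openai.error.RateLimitError:
        return "Quota or rate limit exceeded. Check your OpenAI plan and billing details."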

References