How to Build an AI Chatbot in Python (2026)
Learning how to build an AI chatbot in Python is one of the most practical AI projects you can take on. Python is the default language for building AI chatbots because the ecosystem is mature, the libraries are well-documented, and you can go from zero to a working chatbot in under an hour.
This tutorial covers three levels: a basic chatbot with the OpenAI API, an intermediate chatbot with LangChain and memory, and a production-ready chatbot with a web UI and document retrieval. Each builds on the previous one.
Prerequisites
- Python 3.10 or higher
- An OpenAI API key (sign up at platform.openai.com)
- Basic Python knowledge (functions, classes, installing packages)
GPT-4o costs $2.50 per million input tokens and $10 per million output tokens. For development and testing, expect to spend $1-5 total.
Level 1: Basic AI Chatbot in Python with OpenAI
This is the simplest possible AI chatbot, about 20 lines of code.
Setup
mkdir ai-chatbot && cd ai-chatbot
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install openai python-dotenv
Create a .env file:
OPENAI_API_KEY=sk-your-key-here
The Code
# chatbot.py
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

messages = [
    {"role": "system", "content": "You are a helpful assistant. Be concise and practical."}
]

print("Chatbot ready. Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break

    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7
    )

    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(f"\nAssistant: {reply}\n")
Run it:
python chatbot.py
This chatbot maintains conversation history by appending each message to the messages list. It remembers what you've said within the current session, but forgets everything when you restart the script.
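If you want the conversation to survive restarts, one simple option is to save the messages list to disk and reload it on startup. A minimal stdlib sketch (the history.json filename is an assumption, not part of the tutorial code):

```python
# Sketch: persist chat history between runs (history.json is a hypothetical path).
import json
from pathlib import Path

HISTORY_FILE = Path("history.json")

def load_history(system_prompt):
    # Resume the saved conversation if one exists, else start fresh
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return [{"role": "system", "content": system_prompt}]

def save_history(messages):
    HISTORY_FILE.write_text(json.dumps(messages, indent=2))

messages = load_history("You are a helpful assistant.")
messages.append({"role": "user", "content": "Hello"})
save_history(messages)
```

Call save_history after each assistant reply in the chat loop and the bot picks up where it left off.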
What Each Parameter Does
- model: The LLM to use. gpt-4o balances quality and cost. Use gpt-4o-mini for cheaper, faster responses.
- messages: The full conversation history. The system message sets the chatbot's personality.
- temperature: Controls randomness. 0 is deterministic, 1 is creative. 0.7 is a good default for conversational bots.
If you've been using ChatGPT for coding, you already know the patterns; now you're building the same capability into your own application.
Level 2: Add Memory and Context with LangChain
The basic chatbot loses its memory when the script stops. LangChain adds persistent memory and a structured framework for building more capable chatbots.
Setup
pip install langchain langchain-openai langchain-community
The Code
# chatbot_memory.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from dotenv import load_dotenv

load_dotenv()

# Initialize the model
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Define the prompt with conversation history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a knowledgeable AI assistant. Provide clear, accurate answers. If you're not sure about something, say so."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# Store conversation history
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

# Chat loop
print("Chatbot with memory ready. Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break

    response = chatbot.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": "user-1"}}
    )
    print(f"\nAssistant: {response.content}\n")
What's Different
- Session-based memory: Conversations are stored per session ID. You can have multiple users with separate conversation histories.
- Structured prompts: The ChatPromptTemplate separates the system instructions from conversation history and user input, making it easier to modify behavior.
- Extensibility: LangChain's chain architecture means you can add tools, retrieval, and custom logic by inserting new links in the chain.
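Stripped of the LangChain wrappers, session-based memory is just a dictionary keyed by session ID. A plain-Python sketch of the pattern:

```python
# Sketch of the session-store pattern: one history per session id.
store = {}

def get_session_history(session_id):
    # Create an empty history the first time a session id is seen
    if session_id not in store:
        store[session_id] = []
    return store[session_id]

get_session_history("alice").append("Hi, I'm Alice")
get_session_history("bob").append("Hi, I'm Bob")

# Each session only sees its own messages
print(get_session_history("alice"))  # ["Hi, I'm Alice"]
```

RunnableWithMessageHistory does the same lookup for you on every invoke, using the session_id you pass in the config dict.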
Level 3: RAG AI Chatbot in Python with Document Retrieval
This is where it gets practical. A RAG (Retrieval-Augmented Generation) chatbot answers questions using your own documents: company docs, product manuals, knowledge bases, whatever you feed it. If you've explored using ChatGPT for work, this is how you build that same Q&A capability into a standalone tool.
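Before diving into the code, it helps to see the core retrieval idea in isolation: score every chunk against the query and keep the best matches. A toy sketch, using word overlap as a stand-in for the learned embeddings a real RAG system uses:

```python
# Toy illustration of the retrieval step: score chunks against a query.
# Real systems compare embedding vectors; word overlap is a crude stand-in.
def score(query, chunk):
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q | c)  # Jaccard similarity

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is open Monday to Friday.",
    "Contact support via email for refund requests.",
]
query = "How long do refunds take?"
best = max(chunks, key=lambda c: score(query, c))
print(best)  # the refunds-timing chunk wins
```

The retrieved chunk is then pasted into the prompt so the model answers from your documents rather than from memory alone.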
Setup
pip install langchain langchain-openai langchain-community chromadb streamlit
Step 1: Ingest Documents
# ingest.py
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from dotenv import load_dotenv

load_dotenv()

# Load all .md files from a directory
loader = DirectoryLoader("./docs", glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()
print(f"Loaded {len(documents)} documents")

# Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=100,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")

# Create vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./chroma_db"
)
print("Vector store created and saved.")
Put your documents in a ./docs folder and run:
python ingest.py
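The chunk_size/chunk_overlap interplay is easier to see on a toy input. A character-based sketch of fixed-size chunking with overlap (ignoring the separator-aware splitting RecursiveCharacterTextSplitter adds on top):

```python
# Toy sketch of fixed-size chunking with overlap, character-based.
def chunk(text, size, overlap):
    step = size - overlap  # each chunk starts `step` chars after the last
    return [text[i:i + size] for i in range(0, len(text), step)]

pieces = chunk("abcdefghij", size=4, overlap=2)
print(pieces)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which keeps retrieval from missing context that straddles the cut.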
Step 2: Build the Chat Interface with Streamlit
# app.py
import streamlit as st
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory
from dotenv import load_dotenv

load_dotenv()

st.set_page_config(page_title="AI Chatbot", page_icon="💬")
st.title("AI Chatbot")

# Initialize components
@st.cache_resource
def setup_chain():
    vectorstore = Chroma(
        persist_directory="./chroma_db",
        embedding_function=OpenAIEmbeddings(model="text-embedding-3-small")
    )
    memory = ConversationBufferWindowMemory(
        memory_key="chat_history",
        return_messages=True,
        k=10,  # Remember last 10 exchanges
        output_key="answer"  # Needed when return_source_documents=True
    )
    chain = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model="gpt-4o", temperature=0.3),
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        memory=memory,
        return_source_documents=True
    )
    return chain

chain = setup_chain()

# Chat interface
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask a question about your documents..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        response = chain.invoke({"question": prompt})
        answer = response["answer"]
        st.markdown(answer)

        # Show sources
        if response.get("source_documents"):
            with st.expander("Sources"):
                for doc in response["source_documents"]:
                    st.caption(f"From: {doc.metadata.get('source', 'Unknown')}")
                    st.text(doc.page_content[:300])

    st.session_state.messages.append({"role": "assistant", "content": answer})
Run the app:
streamlit run app.py
You now have a chatbot with a clean web interface that answers questions using your documents, shows its sources, and maintains conversation history.
Deploying Your Python AI Chatbot
Option 1: Streamlit Cloud (Free)
Push your code to GitHub and connect it to Streamlit Community Cloud. Add your OPENAI_API_KEY in the app secrets. Free tier handles low-traffic apps well.
Option 2: Your Own Server
Use a VPS (DigitalOcean, Hetzner, AWS) and run:
pip install -r requirements.txt
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
Put Nginx in front for HTTPS and proper domain routing. Expect $5-20/month for hosting.
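For either option you need a requirements.txt. One covering every package used in this tutorial might look like the following (unpinned for brevity; pin exact versions with pip freeze for reproducible deploys):

```
openai
python-dotenv
langchain
langchain-openai
langchain-community
chromadb
streamlit
```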
Option 3: Docker
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
Docker makes deployment reproducible across any cloud provider.
Using Open-Source Models Instead
If you want to avoid API costs or need to run everything locally, swap OpenAI for an open-source model via Ollama:
# Install Ollama, then pull a model (shell):
ollama pull llama3.1

# In your Python code, swap the LLM:
from langchain_community.llms import Ollama
llm = Ollama(model="llama3.1")
Everything else in the code stays the same. The tradeoff: open-source models require more compute (a decent GPU or a fast CPU with 16GB+ RAM) and typically produce lower-quality responses than GPT-4o for general tasks.
Common Issues and Fixes
Chatbot hallucinates answers: Lower the temperature to 0.1-0.3 and add "Only answer based on the provided context. If the answer isn't in the context, say you don't know." to the system prompt.
Responses are too slow: Use gpt-4o-mini instead of gpt-4o. Enable streaming for perceived faster responses. Reduce the number of retrieved chunks from 4 to 2.
Running out of context window: Use ConversationBufferWindowMemory with k=5 to only keep the last 5 exchanges instead of the entire history.
Vector store is too large: Use text-embedding-3-small instead of text-embedding-3-large. It's 5x cheaper with similar retrieval quality for most use cases.
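The windowed-memory fix is worth seeing in miniature. Trimming history to the last k exchanges is exactly what a bounded deque does, a plain-Python sketch:

```python
# Sketch: keep only the last k exchanges so the prompt stays inside the
# context window (plain-Python version of ConversationBufferWindowMemory).
from collections import deque

k = 5
history = deque(maxlen=2 * k)  # 2 messages (user + assistant) per exchange

for i in range(8):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

print(len(history))  # 10 -> only the 5 most recent exchanges survive
```

Old messages fall off the front automatically, so the token count sent with each request stays bounded no matter how long the session runs.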
Once your chatbot is working, you'll likely want to improve its responses by grounding it in your own data, either through RAG (as in Level 3) or by fine-tuning a model on your domain.
Next Steps
Once your chatbot works, consider these enhancements:
- Authentication: Add user login so conversations are private and persistent.
- Analytics: Track which questions are asked most often and where the chatbot struggles.
- Feedback buttons: Let users rate answers so you can identify and fix weak spots.
- Multi-modal input: Streamlit now supports audio input in st.chat_input, letting users speak their questions.
You've built an AI chatbot in Python. The same patterns (retrieval, memory, prompt engineering) apply whether you're building a customer support bot, an internal knowledge assistant, or a project management tool. If you want to go beyond a chatbot and build a full personal AI assistant that handles tasks like scheduling, email, and file management, the architecture extends naturally from what you've learned here.
FAQ
What Python libraries do I need to build an AI chatbot?
At minimum, you need the openai library and python-dotenv for environment variables. For more advanced chatbots with memory and retrieval, add langchain, langchain-openai, chromadb for vector storage, and streamlit for a web interface.
How much does it cost to run a Python AI chatbot?
Expect to spend $1-5 during development and testing. In production, a chatbot handling 1,000 conversations per day with average-length exchanges costs roughly $5-15/month in API fees on GPT-4o-mini; GPT-4o costs about 16x more per token, so budget accordingly if you need the larger model.
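A back-of-envelope sketch of the monthly figure, using published gpt-4o-mini per-token prices and assumed (not measured) average token counts per conversation:

```python
# Back-of-envelope API cost estimate. Token counts are assumptions;
# measure your own averages from the API usage dashboard.
PRICE = {"input": 0.15, "output": 0.60}  # gpt-4o-mini, $ per million tokens

conversations_per_day = 1000
input_tokens = 500    # assumed avg prompt + history per conversation
output_tokens = 200   # assumed avg reply length

monthly_input = conversations_per_day * 30 * input_tokens / 1e6    # in millions
monthly_output = conversations_per_day * 30 * output_tokens / 1e6

cost = monthly_input * PRICE["input"] + monthly_output * PRICE["output"]
print(f"~${cost:.2f}/month")
```

Longer histories push the input side up fast, which is another reason to cap memory with a window.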
Can I build an AI chatbot in Python without using OpenAI?
Yes. You can use open-source models like Llama 3.1 through Ollama, which runs entirely on your local machine. The LangChain code stays nearly identical: you just swap the model provider. The tradeoff is that open-source models require more compute (a GPU or 16GB+ RAM) and generally produce lower-quality responses.
What is RAG and why does it matter for chatbots?
RAG (Retrieval-Augmented Generation) lets your chatbot answer questions using your own documents instead of relying solely on the model's training data. It works by converting your documents into vector embeddings, storing them in a database, and retrieving relevant chunks to include in the prompt when a user asks a question.
How do I deploy a Python AI chatbot to production?
The three most common options are Streamlit Cloud (free, easiest), a VPS with Nginx (most control, $5-20/month), or Docker containers on any cloud provider. For production use, add authentication, rate limiting, error handling, and conversation logging.