
How to Build an AI Assistant with the OpenAI API

Building an AI assistant directly on the OpenAI Chat Completions API gives you full control over every aspect of the assistant's behavior, with no framework abstractions between you and the model. You manage the conversation state, tool routing, context assembly, and memory integration yourself. This approach means more code but fewer surprises, and it scales cleanly to production without a framework migration later.

Before You Start

You need an OpenAI API key and the openai Python package (or the equivalent in your language of choice). This guide uses Python, but the same patterns apply in TypeScript, Go, or any language with an OpenAI client library. The concepts also apply to the Anthropic API, which follows a similar request-response pattern with tool use support. If you are evaluating both providers, the architectural patterns are identical; only the API syntax differs.
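To see the parallel, here is a minimal sketch of a tool-enabled request against the Anthropic Messages API; the tool definition and prompt are illustrative only, and the rest of this guide sticks with the OpenAI SDK.

import anthropic

anthropic_client = anthropic.Anthropic()

response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any current Claude model works here
    max_tokens=1024,
    system="You are a helpful AI assistant for a software development team.",
    tools=[{
        "name": "search_docs",
        "description": "Search the team's documentation wiki for relevant articles.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }],
    messages=[{"role": "user", "content": "How do we deploy to staging?"}]
)

# Tool requests come back as content blocks with type "tool_use" and
# response.stop_reason == "tool_use"; the loop structure is otherwise the same

The differences are cosmetic: tool schemas use input_schema instead of a nested function object, and tool results go back as content blocks rather than "tool" role messages.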

Step-by-Step Setup

Step 1: Set up the OpenAI client and system prompt.
Install the SDK and configure your assistant's base behavior through a system prompt. The system prompt defines what the assistant is, how it should behave, and what tools it has access to. Invest time in this prompt because it is the single largest determinant of your assistant's quality.
pip install openai
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a helpful AI assistant for a software development team.
You have access to tools for searching documentation, querying the database, and managing tasks.
You also have persistent memory from previous conversations.

Guidelines:
- Use tools when they would help answer a question rather than guessing
- Reference memories from previous conversations when relevant
- Be concise and technical, matching the team's communication style
- When unsure, say so rather than fabricating an answer
- Store important facts, decisions, and preferences in memory"""

messages = [{"role": "system", "content": SYSTEM_PROMPT}]
Step 2: Define tools as function schemas.
OpenAI's function calling uses a tools array where each tool has a type ("function"), a function name, a description, and a parameters schema in JSON Schema format. The model reads these definitions to understand what each tool does and when to use it.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search the team's documentation wiki for relevant articles. Returns the top 5 matching results with titles and content snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query describing what documentation you need"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_query",
            "description": "Execute a read-only SQL query against the analytics database. Returns up to 50 rows.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "A SELECT query. No writes allowed."
                    }
                },
                "required": ["sql"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "store_memory",
            "description": "Store an important fact, decision, or preference in persistent memory for future conversations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "The information to remember"
                    }
                },
                "required": ["content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "recall_memory",
            "description": "Retrieve relevant information from previous conversations and stored knowledge.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What to search for in memory"
                    }
                },
                "required": ["query"]
            }
        }
    }
]
Step 3: Build the conversation loop with tool handling.
The core loop sends messages to the API, checks whether the response includes tool calls, executes those calls, appends the results, and sends the updated messages back for the next generation. This loop continues until the model produces a text response without tool calls.
import json

def handle_tool_call(tool_call):
    # Route a tool call from the model to the matching backend client
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    if name == "search_docs":
        return wiki_client.search(args["query"])
    elif name == "run_query":
        return db_client.execute_readonly(args["sql"])
    elif name == "store_memory":
        return memory_client.store(content=args["content"])
    elif name == "recall_memory":
        return memory_client.recall(query=args["query"])
    else:
        return {"error": f"Unknown tool: {name}"}

def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        choice = response.choices[0]
        messages.append(choice.message)
        if choice.finish_reason == "tool_calls":
            # Execute each requested tool and append its result before the next model call
            for tool_call in choice.message.tool_calls:
                result = handle_tool_call(tool_call)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })
        else:
            return choice.message.content
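Here wiki_client, db_client, and memory_client are placeholders for your own backend clients; once those exist, each user turn is a single call into the loop:

# Example turn; assumes wiki_client, db_client, and memory_client are wired to real backends
reply = chat("Which endpoints are rate limited? Check the docs before answering.")
print(reply)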
Step 4: Add streaming for responsive output.
Streaming sends tokens to the user as they are generated rather than waiting for the entire response. This dramatically improves perceived responsiveness, especially for longer responses. With streaming, the user starts seeing output within a few hundred milliseconds instead of waiting 2 to 5 seconds for the complete generation.
def chat_stream(user_input):
    messages.append({"role": "user", "content": user_input})
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        stream=True
    )
    collected_content = ""
    tool_calls_buffer = {}
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
            collected_content += delta.content
        if delta.tool_calls:
            # Tool call names and arguments arrive in fragments; accumulate them by index
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in tool_calls_buffer:
                    tool_calls_buffer[idx] = {
                        "id": tc.id,
                        "name": tc.function.name,
                        "arguments": ""
                    }
                if tc.function.arguments:
                    tool_calls_buffer[idx]["arguments"] += tc.function.arguments
    # Handle any accumulated tool calls after the stream completes
    if tool_calls_buffer:
        # Process tool calls and continue the conversation (see the sketch below)
        pass
    return collected_content
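The pass placeholder is where the buffered tool calls get resolved. Below is a minimal sketch of one way to do it, reusing the handle_tool_call dispatcher from Step 3 by wrapping each buffered call in the attribute shape it expects; resolve_streamed_tool_calls is a name introduced here for illustration, not part of the OpenAI SDK.

from types import SimpleNamespace

def resolve_streamed_tool_calls(tool_calls_buffer):
    # Rebuild the assistant message the stream represented so the follow-up
    # request has a well-formed history that includes the tool calls
    assistant_tool_calls = [
        {
            "id": buf["id"],
            "type": "function",
            "function": {"name": buf["name"], "arguments": buf["arguments"]}
        }
        for buf in tool_calls_buffer.values()
    ]
    messages.append({"role": "assistant", "content": None, "tool_calls": assistant_tool_calls})

    # Execute each buffered call with the same dispatcher used in the non-streaming loop
    for buf in tool_calls_buffer.values():
        call = SimpleNamespace(
            id=buf["id"],
            function=SimpleNamespace(name=buf["name"], arguments=buf["arguments"])
        )
        result = handle_tool_call(call)
        messages.append({
            "role": "tool",
            "tool_call_id": buf["id"],
            "content": json.dumps(result)
        })

After it runs, issue another streaming completions request with the updated messages so the model can use the tool results to finish its answer.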
Step 5: Integrate persistent memory.
The memory tools defined in Step 2 connect to your memory backend. At the start of each conversation, the assistant should recall relevant context. During the conversation, it stores important new information. The model decides when to use these tools based on the conversation flow, just like it decides when to use any other tool. You can also add automatic memory retrieval before the first model call by querying the memory store with the user's initial message and injecting results into the system prompt.
def start_conversation(user_input, user_id):
    # Pre-fetch relevant memories before the first model call
    memories = memory_client.recall(
        query=user_input,
        user_id=user_id,
        limit=10
    )
    if memories:
        memory_context = "\n\nRelevant context from previous sessions:\n"
        for m in memories:
            memory_context += f"- {m['content']}\n"
        messages[0]["content"] = SYSTEM_PROMPT + memory_context
    return chat(user_input)
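Every example so far calls memory_client without defining it; it stands in for whatever memory backend you wire up. The stub below is a hypothetical in-memory placeholder showing the interface the examples assume (store, plus recall with optional user_id and limit); a real backend would persist across sessions and rank results by semantic relevance rather than keyword matching.

class InMemoryMemoryClient:
    # Hypothetical placeholder for a persistent memory backend; swap in a database,
    # vector index, or hosted memory API for production use
    def __init__(self):
        self._items = []

    def store(self, content, user_id=None):
        # Persist a fact, decision, or preference for later recall
        self._items.append({"content": content, "user_id": user_id})
        return {"status": "stored"}

    def recall(self, query, user_id=None, limit=10):
        # Naive keyword match; a real backend would use semantic retrieval
        terms = query.lower().split()
        matches = [
            item for item in self._items
            if (user_id is None or item["user_id"] == user_id)
            and any(term in item["content"].lower() for term in terms)
        ]
        return matches[:limit]

memory_client = InMemoryMemoryClient()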

Direct API vs Framework

Building directly on the API means writing more code for things that frameworks handle automatically: tool routing, context window management, conversation summarization, and provider abstraction. The advantage is that every line of that code is yours, visible, debuggable, and modifiable. When something goes wrong in production, you can trace the exact path from user input to model output without navigating framework internals. When you need custom behavior, you implement it directly rather than searching for a framework hook or override point.
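To make "context window management" concrete, here is a minimal sketch of the kind of history trimming you end up owning on the direct-API path; the four-characters-per-token estimate and the 100,000-token budget are rough illustrative assumptions, not OpenAI constants.

def trim_history(messages, max_tokens=100_000, chars_per_token=4):
    # The history mixes plain dicts and SDK message objects, so read content either way
    def content_of(msg):
        if isinstance(msg, dict):
            return msg.get("content") or ""
        return getattr(msg, "content", None) or ""

    def estimate(msg):
        # Rough heuristic; substitute a real tokenizer if you need accuracy
        return len(str(content_of(msg))) // chars_per_token

    # Always keep the system prompt; drop the oldest exchanges until under budget
    system, rest = messages[0], messages[1:]
    while rest and sum(estimate(m) for m in [system] + rest) > max_tokens:
        rest.pop(0)
    return [system] + rest

Calling this before each completions request keeps long sessions inside the model's context window. A production version would also keep tool results paired with the assistant messages that requested them, since a tool message without its preceding tool call is rejected, and would add summarization of dropped history, which is exactly the work frameworks do for you.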

This approach works best for teams that have a clear picture of what their assistant needs to do and want maximum control over how it does it. If you are still exploring what kind of assistant to build, a framework gives you faster iteration cycles. If you know exactly what you need, the direct API path gives you a cleaner production system.

Give your OpenAI assistant persistent memory. Adaptive Recall integrates through a simple REST API, adding storage, cognitive retrieval, and knowledge graph capabilities to any model provider.

Get Started Free