# Complete Guide: Integrating OpenAI Agents SDK with Ollama
This comprehensive guide demonstrates how to integrate the official OpenAI Agents SDK with Ollama to create AI agents that run entirely on local infrastructure. By the end, you'll understand both the theoretical foundations and practical implementation of locally-hosted AI agents.
## Table of Contents
- Introduction
- Understanding the Components
- Setting Up Your Environment
- Integrating Ollama with OpenAI Agents SDK
- Building a Document Analysis Agent
- Adding Document Memory
- Putting It All Together
- Troubleshooting
- Conclusion
## Introduction
The OpenAI Agents SDK is a powerful framework for building agent-based AI systems that can solve complex tasks through planning and tool use. By integrating it with Ollama, we can run these agents locally, improving privacy, reducing latency, and eliminating API costs.
## Understanding the Components

### What is the OpenAI Agents SDK?

The OpenAI Agents SDK (`agents`) is a framework that simplifies the development of AI agents. It provides:
- A structured approach for defining agent behaviors
- Built-in support for tool usage and planning
- Session management for multi-turn conversations
- Memory and state persistence
At its core, this SDK formalizes the agent pattern that emerged from the broader LLM community, giving developers a standard way to implement agents that can plan, reason, and execute complex tasks.
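To get a feel for the SDK's surface, here is a minimal sketch. Method names such as `Runner.run_sync` and `final_output` reflect the SDK at the time of writing and may differ between versions; note that this default configuration talks to OpenAI's hosted models, which is exactly what the rest of this guide rewires to Ollama:

```python
# Minimal sketch of the SDK's core pattern (assumes `openai-agents-python`
# is installed; by default this targets OpenAI's servers).
from agents import Agent
from agents.run import Runner

agent = Agent(
    name="Assistant",
    instructions="You are a concise, helpful assistant.",
)

result = Runner.run_sync(agent, "Explain what an AI agent is in one sentence.")
print(result.final_output)
```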
### What is Ollama?
Ollama is an open-source framework for running large language models (LLMs) locally. Key features include:
- Easy installation and model management
- API endpoints that mirror OpenAI's API structure (see the sketch after this list)
- Support for many open-source models (Llama, Mistral, etc.)
- Custom model creation and fine-tuning
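The OpenAI-compatible endpoint is what makes the rest of this guide possible. A quick way to see it in action (a sketch, assuming Ollama is running locally on its default port with `mistral` already pulled):

```python
from openai import OpenAI

# Point the standard OpenAI client at Ollama's local endpoint; the API key
# is a placeholder because Ollama does not check it.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```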
### Why Integrate Them?
Integration provides several benefits:
- Data Privacy: All data stays on your local machine
- Cost Efficiency: No pay-per-token API costs
- Customization: Fine-tune models for specific use cases
- Network Independence: Agents function without internet access
- Reduced Latency: Eliminate network roundtrips
## Setting Up Your Environment

### Step 1: Install Ollama
First, install Ollama following the instructions for your operating system:
**For macOS and Linux:**

```bash
curl -fsSL https://ollama.ai/install.sh | sh
```
**For Windows:**
Download the installer from Ollama's website.
### Step 2: Download a Model
Pull a capable model that will power your agent. For this guide, we'll use Mistral:
```bash
ollama pull mistral
```
Verify that Ollama is working by running:
```bash
ollama run mistral "Hello, are you running correctly?"
```
You should see a response generated by the model.
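You can also confirm the server is reachable programmatically. A short sketch (assumes the default port 11434; `/api/tags` is Ollama's endpoint for listing locally available models):

```python
import requests

# Ollama's REST API lists locally available models at /api/tags
r = requests.get("http://localhost:11434/api/tags")
r.raise_for_status()
print([m["name"] for m in r.json().get("models", [])])
```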
### Step 3: Install the OpenAI Agents SDK
Clone the repository and install the package:
```bash
git clone https://github.com/openai/openai-agents-python.git
cd openai-agents-python
pip install -e .
```
This installs the package in development mode, allowing you to modify the code if needed.
### Step 4: Set Up Required Dependencies
Install additional dependencies:
```bash
pip install requests python-dotenv pydantic
```
## Integrating Ollama with OpenAI Agents SDK

The OpenAI Agents SDK uses the OpenAI Python client under the hood. We need to create a custom client that directs requests to Ollama instead of OpenAI's servers.
### Step 1: Create a Custom Client

Create a file named `ollama_client.py`:
```python
from openai import OpenAI


class OllamaClient(OpenAI):
    """Custom OpenAI client that routes requests to Ollama."""

    def __init__(self, model_name="mistral", **kwargs):
        # Point the client at Ollama's OpenAI-compatible endpoint
        kwargs["base_url"] = "http://localhost:11434/v1"
        # Ollama doesn't require an API key, but the client expects one
        kwargs["api_key"] = "ollama-placeholder-key"
        super().__init__(**kwargs)
        self.model_name = model_name
        print(f"Using Ollama model: {model_name}")

    def create_completion(self, *args, **kwargs):
        # Fill in the default model name if not explicitly provided
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return self.completions.create(*args, **kwargs)

    def create_chat_completion(self, *args, **kwargs):
        # Fill in the default model name if not explicitly provided
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return self.chat.completions.create(*args, **kwargs)

    # These methods are needed for compatibility with the agents library
    def completion(self, prompt, **kwargs):
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return self.completions.create(prompt=prompt, **kwargs)

    def chat_completion(self, messages, **kwargs):
        if "model" not in kwargs:
            kwargs["model"] = self.model_name
        return self.chat.completions.create(messages=messages, **kwargs)
```
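Before wiring this into the SDK, you can sanity-check the client on its own. A quick sketch, assuming Ollama is serving `mistral` locally:

```python
from ollama_client import OllamaClient

client = OllamaClient(model_name="mistral")
reply = client.chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is an agent?"}]
)
print(reply.choices[0].message.content)
```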
### Step 2: Create an Adapter for the OpenAI Agents SDK

Now we'll create an adapter that makes the OpenAI Agents SDK compatible with our Ollama client. Create a file named `agent_adapter.py`:
```python
import logging
import traceback

from agents.agent import Agent
from agents.models import _openai_shared

from ollama_client import OllamaClient

# Configure logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Set a placeholder OpenAI API key to avoid initialization errors
_openai_shared.set_default_openai_key("placeholder-key")

# Store the original __init__ of the Agent class
original_init = Agent.__init__


def patched_init(self, *args, **kwargs):
    """Supply an OllamaClient as the model if none is provided."""
    if "model" not in kwargs:
        kwargs["model"] = OllamaClient(model_name="mistral")
    original_init(self, *args, **kwargs)


# Apply the patched init
Agent.__init__ = patched_init


class ToolCall:
    """A structured tool call."""

    def __init__(self, name, inputs=None):
        self.name = name
        self.inputs = inputs or {}


class AgentResponse:
    """A response object shaped the way main.py expects."""

    def __init__(self, result):
        # Extract the message from the final output
        if hasattr(result, 'final_output'):
            if isinstance(result.final_output, str):
                self.message = result.final_output
            else:
                self.message = str(result.final_output)
        else:
            self.message = "I'm sorry, I couldn't process that request."

        # Get the conversation ID if available
        self.conversation_id = getattr(result, 'conversation_id', None)

        # Extract tool calls from raw_responses
        self.tool_calls = []
        if hasattr(result, 'raw_responses'):
            for response in result.raw_responses:
                try:
                    if hasattr(response, 'output') and hasattr(response.output, 'tool_calls'):
                        for tool_call in response.output.tool_calls:
                            # Handle the case where tool_call is a dict
                            if isinstance(tool_call, dict):
                                name = tool_call.get('name', 'unknown_tool')
                                inputs = tool_call.get('inputs', {})
                                self.tool_calls.append(ToolCall(name, inputs))
                            else:
                                # Assume it's already an object with name and inputs
                                self.tool_calls.append(tool_call)
                except Exception as e:
                    logger.error(f"Error extracting tool calls: {str(e)}")


def run(self, message, conversation_id=None):
    """Run the agent with the given message.

    Args:
        message: The user message to process.
        conversation_id: Optional conversation ID for continuity.

    Returns:
        A response object with message, conversation_id, and tool_calls attributes.
    """
    try:
        # Build a direct prompt for the model
        prompt = f"""
{self.instructions}

User query: {message}
"""
        # Get a response directly from the model (OllamaClient)
        response = self.model.chat.completions.create(
            model="mistral",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )
        response_text = response.choices[0].message.content

        # Wrap the text in a minimal result object
        class MinimalResult:
            def __init__(self, text, conv_id):
                self.final_output = text
                self.conversation_id = conv_id
                self.raw_responses = []

        result = MinimalResult(response_text, conversation_id)
        return AgentResponse(result)
    except Exception as e:
        error_traceback = traceback.format_exc()
        logger.error(f"Error running agent: {str(e)}\n{error_traceback}")
        # Build a basic response carrying the error message
        response = AgentResponse(None)
        response.message = f"An error occurred: {str(e)}"
        return response


# Attach the run method to the Agent class
Agent.run = run

# Log when the adapter is loaded
print("Agent adapter loaded, Agent class patched with run method.")
```
## Building a Document Analysis Agent
Let's build a practical agent that analyzes documents, extracts key information, and answers questions about the content.
### Step 1: Create Document Memory

First, let's create a simple document memory system to store and retrieve analyzed documents. Create a file named `document_memory.py`:
```python
import os
import json
import hashlib
from typing import Dict, List, Optional


class DocumentMemory:
    """Simple document storage system for the agent."""

    def __init__(self, storage_dir: str = "./document_memory"):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)
        self.index_file = os.path.join(storage_dir, "index.json")
        self.document_index = self._load_index()

    def _load_index(self) -> Dict:
        """Load the document index from disk."""
        if os.path.exists(self.index_file):
            with open(self.index_file, 'r') as f:
                return json.load(f)
        return {"documents": {}}

    def _save_index(self):
        """Save the document index to disk."""
        with open(self.index_file, 'w') as f:
            json.dump(self.document_index, f, indent=2)

    def _generate_doc_id(self, url: str) -> str:
        """Generate a unique ID for a document based on its URL."""
        return hashlib.md5(url.encode()).hexdigest()

    def store_document(self, url: str, content: str, metadata: Optional[Dict] = None) -> str:
        """Store a document and return its ID."""
        doc_id = self._generate_doc_id(url)
        doc_path = os.path.join(self.storage_dir, f"{doc_id}.txt")

        # Store the document content
        with open(doc_path, 'w') as f:
            f.write(content)

        # Update the index
        self.document_index["documents"][doc_id] = {
            "url": url,
            "path": doc_path,
            "metadata": metadata or {}
        }
        self._save_index()
        return doc_id

    def get_document(self, doc_id: str) -> Optional[Dict]:
        """Retrieve a document by ID."""
        if doc_id not in self.document_index["documents"]:
            return None
        doc_info = self.document_index["documents"][doc_id]
        try:
            with open(doc_info["path"], 'r') as f:
                content = f.read()
            return {
                "id": doc_id,
                "url": doc_info["url"],
                "content": content,
                "metadata": doc_info["metadata"]
            }
        except Exception as e:
            print(f"Error retrieving document {doc_id}: {e}")
            return None

    def get_document_by_url(self, url: str) -> Optional[Dict]:
        """Find and retrieve a document by URL."""
        doc_id = self._generate_doc_id(url)
        return self.get_document(doc_id)

    def list_documents(self) -> List[Dict]:
        """List all stored documents."""
        return [
            {"id": doc_id, "url": info["url"], "metadata": info["metadata"]}
            for doc_id, info in self.document_index["documents"].items()
        ]
```
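The class can be exercised on its own before the agent uses it. A short sketch (the URL and text below are purely illustrative):

```python
from document_memory import DocumentMemory

memory = DocumentMemory(storage_dir="./document_memory")
doc_id = memory.store_document(
    "https://example.com/notes.txt",          # illustrative URL
    "Some document text worth remembering.",  # illustrative content
    metadata={"source": "example"},
)
print(memory.get_document(doc_id)["content"])
print(memory.list_documents())
```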
### Step 2: Define the Agent's Tools

Create a file named `document_agent.py` to implement the document analysis agent with its tools:
```python
import re
import json
import requests
from datetime import datetime
from typing import List, Dict, Any
from pydantic import BaseModel, Field

# Import the Agent and tool decorator from the agents package
from agents import Agent, function_tool

from ollama_client import OllamaClient
from document_memory import DocumentMemory

# Import the adapter so the Agent class is patched with a run method
import agent_adapter

# Initialize document memory
document_memory = DocumentMemory()


# Define the tool schemas
class FetchDocumentInput(BaseModel):
    url: str = Field(..., description="URL of the document to fetch")


class FetchDocumentOutput(BaseModel):
    content: str = Field(..., description="Content of the document")


class ExtractInfoInput(BaseModel):
    text: str = Field(..., description="Text to extract information from")
    info_type: str = Field(
        ..., description="Type of information to extract (e.g., 'dates', 'names', 'key points')"
    )


class ExtractInfoOutput(BaseModel):
    information: List[str] = Field(..., description="List of extracted information")


class SearchDocumentInput(BaseModel):
    text: str = Field(..., description="Document text to search within")
    query: str = Field(..., description="Query to search for")


class SearchDocumentOutput(BaseModel):
    results: List[str] = Field(..., description="List of matching paragraphs or sentences")


# Implement the tool functions
@function_tool
def fetch_document(url: str) -> Dict[str, Any]:
    """Fetches a document from a URL and returns its content.

    Checks document memory first before making a network request."""
    # Check if the document already exists in memory
    cached_doc = document_memory.get_document_by_url(url)
    if cached_doc:
        print(f"Retrieved document from memory: {url}")
        return {"content": cached_doc["content"]}

    # If not in memory, fetch it from the URL
    try:
        print(f"Fetching document from URL: {url}")
        response = requests.get(url)
        response.raise_for_status()
        content = re.sub(r"<[^>]+>", "", response.text)  # Remove HTML tags

        # Store the document in memory
        document_memory.store_document(url, content, {"fetched_at": str(datetime.now())})
        return {"content": content}
    except Exception as e:
        return {"content": f"Error fetching document: {str(e)}"}


@function_tool
def extract_info(text: str, info_type: str) -> Dict[str, Any]:
    """Extracts a specified type of information from text using Ollama."""
    client = OllamaClient(model_name="mistral")
    # Limit text length to prevent context overflow
    prompt = f"""
Extract all {info_type} from the following text.
Return ONLY a JSON array with the items.

TEXT:
{text[:2000]}

JSON ARRAY OF {info_type.upper()}:
"""
    try:
        response = client.chat.completions.create(
            model="mistral",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,  # Lower temperature for more deterministic output
        )
        result_text = response.choices[0].message.content
        print(f"Extract info response: {result_text[:100]}...")

        # Try to find a JSON array in the response
        try:
            match = re.search(r"\[.*\]", result_text, re.DOTALL)
            if match:
                information = json.loads(match.group(0))
            else:
                # If no JSON array is found, try to parse the entire response as JSON
                try:
                    information = json.loads(result_text)
                    if not isinstance(information, list):
                        information = [result_text.strip()]
                except Exception:
                    information = [result_text.strip()]
        except json.JSONDecodeError:
            # Split by newlines or commas if JSON parsing fails
            information = []
            for line in result_text.split('\n'):
                line = line.strip()
                if line and not line.startswith('```') and not line.endswith('```'):
                    information.append(line)
            if not information:
                information = [item.strip() for item in result_text.split(",")]
    except Exception as e:
        print(f"Error in extract_info: {str(e)}")
        information = [f"Error extracting information: {str(e)}"]

    return {"information": information}


@function_tool
def search_document(text: str, query: str) -> Dict[str, Any]:
    """Searches for relevant content in the document."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    client = OllamaClient(model_name="mistral")
    # Limit to the first 15 paragraphs to stay within context limits; the
    # doubled braces escape the literal JSON example inside the f-string.
    prompt = f"""
You need to find paragraphs in a document that answer or relate to the query: "{query}"
Rate each paragraph's relevance to the query on a scale of 0-10.
Return the 3 most relevant paragraphs with their ratings as JSON.

Document sections:
{json.dumps(paragraphs[:15])}

Output format: [{{"rating": 8, "text": "paragraph text"}}, ...]
"""
    try:
        response = client.chat.completions.create(
            model="mistral",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,  # Lower temperature for more deterministic output
        )
        result_text = response.choices[0].message.content
        print(f"Search document response: {result_text[:100]}...")

        # Try to find a JSON array in the response
        try:
            match = re.search(r"\[.*\]", result_text, re.DOTALL)
            if match:
                parsed = json.loads(match.group(0))
                results = [item["text"] for item in parsed if "text" in item]
            else:
                # Try to parse the entire response as JSON
                try:
                    parsed = json.loads(result_text)
                    if isinstance(parsed, list):
                        results = [item.get("text", str(item)) for item in parsed]
                    else:
                        results = [str(parsed)]
                except Exception:
                    # If JSON parsing fails, extract quoted text
                    results = re.findall(r'"([^"]+)"', result_text)
                    if not results:
                        results = [result_text]
        except json.JSONDecodeError:
            # If JSON parsing fails completely
            results = [result_text]
    except Exception as e:
        print(f"Error in search_document: {str(e)}")
        results = [f"Error searching document: {str(e)}"]

    return {"results": results}


# Define additional tools for document memory management
class ListDocumentsOutput(BaseModel):
    documents: List[Dict] = Field(..., description="List of stored documents")


class GetDocumentInput(BaseModel):
    url: str = Field(..., description="URL of the document to retrieve")


class GetDocumentOutput(BaseModel):
    content: str = Field(..., description="Content of the retrieved document")
    metadata: Dict = Field(..., description="Metadata of the document")


@function_tool
def list_documents() -> Dict[str, Any]:
    """Lists all stored documents in memory."""
    documents = document_memory.list_documents()
    return {"documents": documents}


@function_tool
def get_document(url: str) -> Dict[str, Any]:
    """Retrieves a document from memory by URL."""
    doc = document_memory.get_document_by_url(url)
    if not doc:
        return {"content": "Document not found", "metadata": {}}
    return {"content": doc["content"], "metadata": doc["metadata"]}


def create_document_agent():
    """Creates and returns an AI agent for document analysis."""
    client = OllamaClient(model_name="mistral")

    # Collect all the tools decorated with function_tool
    tools = [
        fetch_document,
        extract_info,
        search_document,
        list_documents,
        get_document,
    ]

    agent = Agent(
        name="DocumentAnalysisAgent",
        instructions=(
            "You are a Document Analysis Assistant that helps users extract valuable information from documents.\n\n"
            "When given a task:\n"
            "1. If you need to analyze a document, first use fetch_document to get its content.\n"
            "2. Use extract_info to identify specific information in the document.\n"
            "3. Use search_document to find answers to specific questions.\n"
            "4. Summarize your findings in a clear, organized manner.\n\n"
            "You can manage documents with:\n"
            "- list_documents to see all stored documents\n"
            "- get_document to retrieve a previously fetched document\n\n"
            "Always be thorough and accurate in your analysis. If the document content is too large, "
            "focus on the most relevant sections for the user's query."
        ),
        tools=tools,
        model=client,
    )
    return agent
```
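Before building the full CLI, a minimal smoke test might look like this (a sketch; run it from the directory containing the files above, with Ollama serving `mistral`):

```python
from document_agent import create_document_agent

agent = create_document_agent()
response = agent.run("Which document-analysis tools do you have available?")
print(response.message)
```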
## Putting It All Together

Let's create a `main.py` file that ties everything together and provides a command-line interface for interacting with our document analysis agent:
```python
from document_agent import create_document_agent, document_memory


def print_banner():
    """Print a welcome banner for the Document Analysis Agent."""
    print("\n" + "=" * 60)
    print("📄 Document Analysis Agent 📄".center(60))
    print("=" * 60)
    print("\nThis agent can analyze documents, extract information, and search for content.")
    print("It also has document memory to store and retrieve documents between sessions.")

    # Check for existing documents
    docs = document_memory.list_documents()
    if docs:
        print(f"\n🗂️ {len(docs)} documents already in memory:")
        for i, doc in enumerate(docs, 1):
            print(f"  {i}. {doc['url']}")

    print("\nCommands:")
    print("  'exit' - Quit the program")
    print("  'list' - Show stored documents")
    print("  'help' - Show this help message")
    print("=" * 60 + "\n")


def main():
    print("Initializing Document Analysis Agent...")
    agent = create_document_agent()
    print_banner()

    # Debug: test the agent with a simple query
    try:
        print("\nDEBUG: Testing agent with 'what is war'")
        print("Processing...")
        test_response = agent.run(message="what is war")
        print(f"\nAgent (test): {test_response.message}")

        # If tools were used, show info about tool usage
        if test_response.tool_calls:
            print("\n🛠️ Tools Used (test):")
            for tool in test_response.tool_calls:
                # Display more info about each tool call
                inputs = getattr(tool, 'inputs', {})
                inputs_str = ', '.join(f"{k}='{v}'" for k, v in inputs.items()) if inputs else ""
                print(f"  • {tool.name}({inputs_str})")
    except Exception as e:
        import traceback
        print(f"\nDEBUG ERROR: {str(e)}")
        traceback.print_exc()

    # Start a conversation session
    conversation_id = None
    while True:
        try:
            user_input = input("\nYou: ")
            if user_input.lower() == 'exit':
                break
            if user_input.lower() == 'help':
                print_banner()
                continue
            if user_input.lower() == 'list':
                docs = document_memory.list_documents()
                if not docs:
                    print("\nNo documents in memory yet.")
                else:
                    print(f"\n📄 Documents in memory ({len(docs)}):")
                    for i, doc in enumerate(docs, 1):
                        metadata = doc.get('metadata', {})
                        fetched_at = metadata.get('fetched_at', 'unknown time')
                        print(f"  {i}. {doc['url']} (fetched: {fetched_at})")
                continue

            # Get the agent's response
            print("\nProcessing...")
            response = agent.run(
                message=user_input,
                conversation_id=conversation_id,
            )

            # Store the conversation ID for continuity
            conversation_id = response.conversation_id

            # Print the response
            print(f"\nAgent: {response.message}")

            # If tools were used, show info about tool usage
            if response.tool_calls:
                print("\n🛠️ Tools Used:")
                for tool in response.tool_calls:
                    # Display more info about each tool call
                    inputs = getattr(tool, 'inputs', {})
                    inputs_str = ', '.join(f"{k}='{v}'" for k, v in inputs.items()) if inputs else ""
                    print(f"  • {tool.name}({inputs_str})")
        except KeyboardInterrupt:
            print("\nExiting...")
            break
        except Exception as e:
            import traceback
            print(f"\nERROR: {str(e)}")
            traceback.print_exc()


if __name__ == "__main__":
    main()
```
### Example Conversation
Here's an example of how a conversation with the document analysis agent might look:
```
============================================================
              📄 Document Analysis Agent 📄
============================================================

This agent can analyze documents, extract information, and search for content.
It also has document memory to store and retrieve documents between sessions.

Commands:
  'exit' - Quit the program
  'list' - Show stored documents
  'help' - Show this help message
============================================================

You: Please analyze the article at https://en.wikipedia.org/wiki/Artificial_intelligence and tell me when AI was first developed as a concept.

Processing...

Agent: Based on my analysis of the Wikipedia article on Artificial Intelligence, the concept of AI has historical roots that go back centuries, but the formal academic field of AI was founded in 1956.

According to the article, the workshop at Dartmouth College in the summer of 1956 is widely considered the founding event of artificial intelligence as a field. This was when John McCarthy coined the term "artificial intelligence" as the topic of the Dartmouth Conference.

However, the concept of artificial beings with intelligence appeared much earlier in history:
- Ancient Greek myths mentioned mechanical men designed to mimic human functions
- Philosophers like Aristotle described mechanical reasoning in syllogisms
- In the 13th century, Ramon Llull developed logical machines
- In the 17th century, René Descartes, Thomas Hobbes, and Gottfried Leibniz explored mechanical reasoning
- In the 19th century, Charles Babbage and Ada Lovelace contemplated programmable mechanical calculating devices

The article indicates that the modern field of AI research officially began at that 1956 workshop organized by John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester.

🛠️ Tools Used:
  • fetch_document(url='https://en.wikipedia.org/wiki/Artificial_intelligence')
  • search_document(query='when was AI first developed concept history')
  • extract_info(info_type='key dates in AI history')
```
## Troubleshooting
Here are some common issues you might encounter and how to fix them:
### 1. Model Issues

**Problem:** The model generates poor responses, hallucinates, or fails to use tools properly.

**Solution:**
- Try a more capable model like `llama3` or `mixtral`
- Check that your prompts are clear and well-formatted
- Reduce the complexity of your tools
- Add more explicit instructions to the agent's system prompt
You can pull a more capable model with:

```bash
ollama pull llama3
```

Then update your client:

```python
client = OllamaClient(model_name="llama3")
```
### 2. Context Length Issues

**Problem:** The model returns incomplete responses or fails when processing long documents.

**Solution:**
- Implement chunking for document text (our tools already truncate input to 2,000 characters; a reusable helper is sketched below)
- Use models with larger context windows where available (such as Llama 3 or Mixtral)
- Break complex tasks down into smaller subtasks
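A minimal chunking helper might look like this (an illustrative sketch, not part of the guide's files; the window and overlap sizes are arbitrary defaults):

```python
def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> list:
    """Split text into overlapping character windows for per-chunk processing."""
    chunks = []
    start = 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

# Example idea: process each chunk separately, then merge the results
# for chunk in chunk_text(document_text):
#     ...extract or search within the chunk, accumulate findings...
```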
### 3. API Compatibility Issues

**Problem:** Some OpenAI client features aren't supported by Ollama.

**Solution:**
- Our adapted client handles the most common method differences
- If you encounter unsupported features, add similar wrapper methods to the `OllamaClient` class (see the sketch below)
- Check Ollama's API documentation for compatible endpoints
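For instance, if a library expects a convenience method our client doesn't have, a thin wrapper can bridge the gap. A hypothetical sketch (the method name `embed_text` is invented for illustration; check Ollama's documentation to confirm which `/v1` endpoints your version supports):

```python
from ollama_client import OllamaClient

class ExtendedOllamaClient(OllamaClient):
    def embed_text(self, text, **kwargs):
        # Hypothetical convenience wrapper: delegate to the OpenAI-compatible
        # embeddings endpoint, filling in the default model name.
        kwargs.setdefault("model", self.model_name)
        return self.embeddings.create(input=text, **kwargs)
```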
## Conclusion
In this guide, we've explored how to integrate the OpenAI Agents SDK with Ollama to create a powerful document analysis agent that runs entirely on local infrastructure. This approach combines the best of both worlds: the structured agent framework from OpenAI with the privacy and cost benefits of local inference through Ollama.
Key takeaways:
1. **Architecture**: We've created a layered architecture with:
   - Ollama providing the LLM inference capability
   - A custom client adapter connecting Ollama to the OpenAI interface
   - The OpenAI Agents SDK providing the agent framework
   - Custom tools for document analysis and memory
2. **Implementation**: We've built a complete document analysis agent with:
   - Document fetching and parsing
   - Information extraction
   - Document search
   - Persistent document storage
3. **Benefits**:
   - Complete data privacy
   - No ongoing API costs
   - Customizable to specific use cases
   - Works offline
4. **Limitations and Mitigations**:
   - Model quality limitations (mitigated by using more capable models)
   - Context length constraints (mitigated with our chunking approach)
   - API compatibility gaps (mitigated with our custom client)
This integration demonstrates how organizations can leverage the power of advanced AI agent frameworks while maintaining control over their data and infrastructure. The result is a flexible, extensible system that can be adapted to many different use cases beyond document analysis.
By building on this foundation, you can create specialized agents for various domains while keeping all processing local and secure.