A Comprehensive Guide to Browser-Use, MCP, and AI-Powered Information Processing
Beyond Research: Building a Modern AI Knowledge Companion
1. Introduction to AI-Powered Knowledge Systems
In today's information landscape, the ability to efficiently gather, process, and synthesize knowledge has become essential. This guide transforms the concept of a basic research assistant into a comprehensive AI Knowledge Companion system—a versatile tool that not only conducts research but acts as your digital extension in navigating the vast information ecosystem.
What is Browser-Use? Browser-Use is a programmable interface that enables AI systems to interact with web browsers just as humans do—visiting websites, clicking links, filling forms, and extracting information. Unlike simple web scraping, Browser-Use provides true browser automation that can handle modern, JavaScript-heavy websites, captchas, and complex user interactions.
What is MCP (Model Context Protocol)? The Model Context Protocol is a standardized framework that facilitates secure communication between AI models and external tools or data sources. MCP defines how information is exchanged, permissions are granted, and results are returned, creating a universal "language" for AI systems to safely and effectively interface with the digital world.
2. Understanding the Core Technologies
Browser-Use: AI's Window to the Web
Browser-Use fundamentally transforms how AI interacts with the internet by:
- Providing visual context: Unlike API-based approaches, Browser-Use allows the AI to "see" what a human would see
- Enabling stateful navigation: Maintaining session information across multiple pages
- Handling dynamic content: Processing JavaScript-rendered pages that traditional scrapers cannot access
- Supporting authentication: Logging into services when needed
Implementation principle: Browser-Use creates a controlled browser instance that executes commands from your AI system through a dedicated interface, while feeding back visual and structural information about the pages it visits.
MCP: The Universal AI Connector
MCP serves as a standardized protocol for AI-to-tool communication, addressing several key challenges:
- Security: Defining clear permission boundaries and data access controls
- Interoperability: Creating a common language for diverse tools to connect to AI systems
- Context management: Efficiently transferring relevant information between systems
- Versioning and compatibility: Ensuring tools and AI models can evolve independently
Key concept: MCP treats external tools as "contexts" that an AI model can access, defining both how the AI can request information and how the external systems should respond.
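To make the protocol concrete, here is a purely illustrative sketch of an MCP-style exchange. The payload field names below are assumptions chosen for exposition, not taken from the MCP specification:
# Illustrative only: these field names are hypothetical, not from the MCP spec.
request = {
    "context": "arxiv",                                    # which registered tool to query
    "action": "search",                                    # a capability that context exposes
    "parameters": {"query": "diffusion models", "max_results": 5}
}
response = {
    "status": "ok",
    "data": [{"title": "...", "url": "..."}],              # tool-specific payload
    "metadata": {"source": "arxiv", "latency_ms": 420}
}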
3. Project Architecture: Building Your Knowledge Companion
System Overview
Our Knowledge Companion consists of five core components:
- User Interface: Accepts queries and displays results
- Orchestration Engine: Coordinates all system components
- LLM Core: Processes language, plans actions, and generates reports
- Browser-Use Module: Handles web navigation and extraction
- MCP Integration Layer: Connects to external knowledge sources
Component Interaction Flow
1. The user submits a query through the interface
2. The orchestration engine passes the query to the LLM core
3. The LLM plans a research strategy and generates actions
4. Actions are executed through Browser-Use or MCP connections
5. Retrieved information returns to the LLM for synthesis
6. The final report is presented to the user
Design philosophy: This modular architecture allows each component to evolve independently while maintaining clear communication channels between them.
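A minimal sketch of that flow in code, assuming the component classes developed later in this guide (LlamaClient, IntelligentBrowser, KnowledgeSourceManager, KnowledgeOrchestrator):
# Hypothetical end-to-end wiring of the five components.
async def handle_query(user_query: str) -> str:
    llm = LlamaClient()                    # LLM core
    browser = IntelligentBrowser()         # Browser-Use module
    sources = KnowledgeSourceManager(MCPClient(server_url="https://your-mcp-server.com"))
    orchestrator = KnowledgeOrchestrator(llm, browser, sources)
    try:
        # Plan the research, execute it via Browser-Use/MCP, synthesize the report
        return await orchestrator.process_query(user_query)
    finally:
        browser.close()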
4. Setting Up Your Development Environment
Hardware and Software Requirements
For optimal performance, we recommend:
- CPU: 4+ cores (8+ preferred)
- RAM: 16GB minimum (32GB recommended)
- Storage: 20GB free space (SSD preferred)
- GPU: Optional but beneficial for larger models
- Operating System: Linux, macOS, or Windows 10/11
Installation Process
- Python Environment Setup:
# Create a virtual environment
python -m venv ai-companion
source ai-companion/bin/activate # On Windows: ai-companion\Scripts\activate
# Install core dependencies
pip install browser-use ollama mcp-client pydantic fastapi uvicorn
- Ollama Configuration:
# Download Ollama from https://ollama.com
# Then pull the Llama 3.2 model
ollama pull llama3.2
# Test the model
ollama run llama3.2 "Hello, world!"
- Browser-Use Setup:
# Test browser-use functionality
from browser_use import BrowserSession
browser = BrowserSession()
browser.navigate("https://www.example.com")
content = browser.get_page_content()
print(content)
browser.close()
- MCP Configuration:
# Configure MCP client
from mcp_client import MCPClient

mcp = MCPClient(
    server_url="https://your-mcp-server.com",
    api_key="your_api_key",
    default_timeout=30
)

# Test the connection
status = mcp.check_connection()
print(f"MCP Connection: {status}")
Important concept: The separation between the LLM runtime (Ollama) and your application code creates a clean architecture that can adapt to different models and execution environments.
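The orchestrator code later in this guide awaits an llm.complete() coroutine on a LlamaClient object that is never defined. Here is one minimal sketch of such a wrapper against Ollama's local REST endpoint (POST /api/generate on port 11434); the class name and defaults are assumptions:
import aiohttp

class LlamaClient:
    """Minimal async wrapper around a local Ollama server (assumed interface)."""

    def __init__(self, model="llama3.2", host="http://localhost:11434"):
        self.model = model
        self.host = host

    async def complete(self, prompt: str) -> str:
        # With "stream": False, Ollama returns a single JSON object
        # whose "response" field holds the generated text.
        payload = {"model": self.model, "prompt": prompt, "stream": False}
        async with aiohttp.ClientSession() as session:
            async with session.post(f"{self.host}/api/generate", json=payload) as resp:
                resp.raise_for_status()
                data = await resp.json()
                return data["response"]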
5. Implementing Browser-Use Intelligence
Understanding Browser Automation Principles
When implementing Browser-Use, it's essential to understand that we're creating an AI system that can:
- Form intentions: Decide what information to seek
- Execute navigation: Move through websites purposefully
- Extract information: Identify and collect relevant data
- Process results: Transform raw web content into structured knowledge
Creating a Robust Browser-Use Module
from browser_use import BrowserSession

class IntelligentBrowser:
    def __init__(self, headless=True):
        """Initialize a browser session with configurable visibility."""
        self.browser = BrowserSession(headless=headless)
        self.history = []

    def search(self, query, search_engine="google"):
        """Perform a search using the specified engine."""
        if search_engine != "google":
            raise ValueError(f"Unsupported search engine: {search_engine}")
        self.browser.navigate("https://www.google.com")
        search_box = self.browser.find_element('input[name="q"]')
        self.browser.input_text(search_box, query)
        self.browser.press_enter()
        self.history.append({"action": "search", "query": query})
        return self.get_search_results()

    def get_search_results(self):
        """Extract search results from the current page."""
        results = []
        elements = self.browser.find_elements("div.g")
        for element in elements:
            title_elem = self.browser.find_element_within(element, "h3")
            link_elem = self.browser.find_element_within(element, "a")
            snippet_elem = self.browser.find_element_within(element, "div.VwiC3b")
            if title_elem and link_elem and snippet_elem:
                title = self.browser.get_text(title_elem)
                link = self.browser.get_attribute(link_elem, "href")
                snippet = self.browser.get_text(snippet_elem)
                results.append({
                    "title": title,
                    "url": link,
                    "snippet": snippet
                })
        return results

    def visit_page(self, url):
        """Navigate to a specific URL and extract its content."""
        self.browser.navigate(url)
        self.history.append({"action": "visit", "url": url})
        # Wait for the page to load completely
        self.browser.wait_for_page_load()
        # Extract the main content, avoiding navigation elements
        content = self.extract_main_content()
        return {
            "url": url,
            "title": self.browser.get_page_title(),
            "content": content
        }

    def extract_main_content(self):
        """Intelligently extract the main content from the current page."""
        # Try common content selectors first
        content_selectors = [
            "article", "main", ".content", "#content",
            "[role='main']", ".post-content"
        ]
        for selector in content_selectors:
            element = self.browser.find_element(selector)
            if element:
                return self.browser.get_text(element)
        # Fallback: use heuristics to find the largest text blocks
        paragraphs = self.browser.find_elements("p")
        if paragraphs:
            paragraph_texts = [self.browser.get_text(p) for p in paragraphs]
            # Filter out very short paragraphs
            substantial_paragraphs = [p for p in paragraph_texts if len(p) > 100]
            if substantial_paragraphs:
                return "\n\n".join(substantial_paragraphs)
        # Last resort: get the body text
        return self.browser.get_body_text()

    def close(self):
        """Close the browser session."""
        self.browser.close()
Key insight: The above implementation demonstrates how Browser-Use goes beyond simple scraping by making contextual decisions about what content is relevant, handling different site structures, and maintaining state across navigation.
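For example, a quick interactive session with the class above might look like this:
browser = IntelligentBrowser(headless=True)
try:
    results = browser.search("model context protocol")
    for r in results[:3]:
        print(r["title"], "->", r["url"])
    page = browser.visit_page(results[0]["url"])  # read the top hit in full
    print(page["content"][:500])
finally:
    browser.close()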
6. Implementing the MCP Integration Layer
Understanding the Model Context Protocol
MCP enables standardized communication between your AI system and external tools through a structured protocol. Instead of custom code for each integration, MCP provides a unified framework for:
- Tool registration: Defining what tools are available
- Request formatting: Structuring how the AI requests information
- Response handling: Processing and validating tool outputs
- Error management: Handling failures in a consistent way
Building an MCP Client
class KnowledgeSourceManager:
    def __init__(self, mcp_client):
        """Initialize with an MCP client."""
        self.mcp = mcp_client
        self.available_sources = self._discover_sources()

    def _discover_sources(self):
        """Query the MCP server for available knowledge sources."""
        try:
            sources = self.mcp.list_contexts()
            return {
                source["name"]: {
                    "description": source["description"],
                    "capabilities": source["capabilities"],
                    "parameters": source["parameters"]
                } for source in sources
            }
        except Exception as e:
            print(f"Error discovering sources: {e}")
            return {}

    def query_source(self, source_name, query_params):
        """Query a specific knowledge source through MCP."""
        if source_name not in self.available_sources:
            raise ValueError(f"Unknown source: {source_name}")
        try:
            response = self.mcp.query_context(
                context_name=source_name,
                parameters=query_params
            )
            return response
        except Exception as e:
            print(f"Error querying {source_name}: {e}")
            return {"error": str(e)}

    def search_arxiv(self, query, max_results=5, categories=None):
        """Specialized method for arXiv searches."""
        params = {
            "query": query,
            "max_results": max_results
        }
        if categories:
            params["categories"] = categories
        return self.query_source("arxiv", params)

    def search_wikipedia(self, query, depth=1):
        """Specialized method for Wikipedia searches."""
        params = {
            "query": query,
            "depth": depth  # How many links to follow
        }
        return self.query_source("wikipedia", params)

    def get_source_capabilities(self, source_name):
        """Get detailed information about a knowledge source."""
        if source_name in self.available_sources:
            return self.available_sources[source_name]
        return None
MCP concept in practice: This implementation shows how MCP creates a uniform interface to diverse knowledge sources. The AI doesn't need to know the specifics of the arXiv API or Wikipedia's structure—it just makes standardized requests through the MCP protocol.
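A short usage sketch, assuming an MCP server that exposes arxiv and wikipedia contexts:
mcp = MCPClient(server_url="https://your-mcp-server.com", api_key="your_api_key")
manager = KnowledgeSourceManager(mcp)
print(manager.get_source_capabilities("arxiv"))   # inspect what the source supports
papers = manager.search_arxiv("transformer interpretability", max_results=3)
overview = manager.search_wikipedia("Model Context Protocol")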
7. The Orchestration Engine: Coordinating Your AI System
Understanding Orchestration
The orchestration engine is the "brain" of your Knowledge Companion, responsible for:
- Query analysis: Understanding what the user is asking
- Planning: Determining which tools to use and in what sequence
- Execution: Calling the appropriate components
- Integration: Combining information from multiple sources
- Presentation: Formatting the final output for the user
Implementing the Orchestrator
import json

class KnowledgeOrchestrator:
    def __init__(self, llm_client, browser, knowledge_manager):
        """Initialize with the core components."""
        self.llm = llm_client
        self.browser = browser
        self.knowledge_manager = knowledge_manager

    async def process_query(self, user_query):
        """Process a user query from start to finish."""
        # Step 1: Analyze the query to determine the approach
        analysis = await self._analyze_query(user_query)
        # Step 2: Execute the research plan
        research_results = await self._execute_research_plan(analysis, user_query)
        # Step 3: Synthesize the findings into a coherent response
        final_report = await self._synthesize_report(user_query, research_results)
        return final_report

    async def _analyze_query(self, query):
        """Use the LLM to analyze the query and create a research plan."""
        prompt = f"""
        Analyze the following research query and determine the best approach:

        QUERY: {query}

        Please determine:
        1. What type of information is being requested?
        2. Which knowledge sources would be most relevant (web search, arXiv, Wikipedia, etc.)?
        3. What specific search terms should be used for each source?
        4. What is the priority order for consulting these sources?
        5. Are there any specialized domains or technical knowledge required?

        Return your analysis as a structured JSON object.
        """
        response = await self.llm.complete(prompt)
        return json.loads(response)

    async def _execute_research_plan(self, analysis, original_query):
        """Execute the research plan across multiple sources."""
        results = []
        # Execute based on the priority order determined in the analysis
        for source in analysis["priority_order"]:
            if source == "web_search":
                search_terms = analysis["search_terms"]["web_search"]
                web_results = await self._perform_web_research(search_terms)
                results.append({
                    "source": "web_search",
                    "data": web_results
                })
            elif source == "arxiv":
                if "arxiv" in analysis["search_terms"]:
                    arxiv_query = analysis["search_terms"]["arxiv"]
                    categories = analysis.get("arxiv_categories", None)
                    arxiv_results = self.knowledge_manager.search_arxiv(
                        arxiv_query,
                        categories=categories
                    )
                    results.append({
                        "source": "arxiv",
                        "data": arxiv_results
                    })
            elif source == "wikipedia":
                if "wikipedia" in analysis["search_terms"]:
                    wiki_query = analysis["search_terms"]["wikipedia"]
                    wiki_results = self.knowledge_manager.search_wikipedia(wiki_query)
                    results.append({
                        "source": "wikipedia",
                        "data": wiki_results
                    })
        # If needed, perform follow-up research based on the initial findings
        if analysis.get("requires_followup", False):
            followup_results = await self._perform_followup_research(results, original_query)
            results.extend(followup_results)
        return results

    async def _perform_web_research(self, search_terms):
        """Conduct web research using Browser-Use."""
        web_results = []
        for term in search_terms:
            # Search and get results
            search_results = self.browser.search(term)
            # Visit the top 3 results and extract their content
            for result in search_results[:3]:
                page_data = self.browser.visit_page(result["url"])
                web_results.append({
                    "search_term": term,
                    "page_data": page_data
                })
        return web_results

    async def _perform_followup_research(self, initial_results, original_query):
        """Conduct follow-up research based on the initial findings."""
        # Generate follow-up questions using the LLM
        prompt = f"""
        Based on these initial research results and the original query:

        ORIGINAL QUERY: {original_query}
        INITIAL FINDINGS: {json.dumps(initial_results, indent=2)}

        Generate 3 follow-up questions that would help complete the research.
        Format as a JSON list of questions.
        """
        followup_response = await self.llm.complete(prompt)
        followup_questions = json.loads(followup_response)
        followup_results = []
        for question in followup_questions:
            # Recursively analyze and research each follow-up question
            followup_analysis = await self._analyze_query(question)
            question_results = await self._execute_research_plan(followup_analysis, question)
            followup_results.append({
                "followup_question": question,
                "results": question_results
            })
        return followup_results

    async def _synthesize_report(self, original_query, research_results):
        """Synthesize the research results into a comprehensive report."""
        prompt = f"""
        Create a comprehensive research report based on the following information:

        ORIGINAL QUERY: {original_query}
        RESEARCH FINDINGS: {json.dumps(research_results, indent=2)}

        Your report should:
        1. Start with an executive summary
        2. Organize information logically by topic and source
        3. Highlight key findings and insights
        4. Note any contradictions or gaps in the research
        5. Include relevant citations to original sources
        6. End with conclusions and potential next steps

        Format the report in Markdown for readability.
        """
        report = await self.llm.complete(prompt)
        return report
Orchestration insight: The orchestrator demonstrates how to implement a multi-step research process that intelligently combines Browser-Use and MCP. Note how the system uses the LLM at multiple stages—for planning, for generating follow-up questions, and for synthesizing the final report.
8. Creating an Effective User Interface
Console-based Interface
For a simple but effective console interface:
import asyncio
from rich.console import Console
from rich.markdown import Markdown

console = Console()

class KnowledgeCompanionCLI:
    def __init__(self, orchestrator):
        self.orchestrator = orchestrator

    async def start(self):
        console.print("[bold blue]AI Knowledge Companion[/bold blue]")
        console.print("Ask me anything, and I'll research it for you.")
        console.print("Type 'exit' to quit.\n")
        while True:
            query = console.input("[bold green]Query:[/bold green] ")
            if query.lower() in ('exit', 'quit'):
                break
            console.print("\n[italic]Researching your query...[/italic]")
            try:
                with console.status("[bold green]Thinking..."):
                    report = await self.orchestrator.process_query(query)
                console.print("\n[bold]Research Results:[/bold]\n")
                console.print(Markdown(report))
            except Exception as e:
                console.print(f"[bold red]Error:[/bold red] {str(e)}")
        console.print("[bold blue]Thank you for using AI Knowledge Companion![/bold blue]")

# Usage
async def main():
    # Set up the components (simplified)
    llm_client = LlamaClient()
    browser = IntelligentBrowser()
    mcp_client = MCPClient(server_url="https://mcp.example.com")
    knowledge_manager = KnowledgeSourceManager(mcp_client)
    orchestrator = KnowledgeOrchestrator(llm_client, browser, knowledge_manager)
    # Start the CLI
    cli = KnowledgeCompanionCLI(orchestrator)
    await cli.start()
    # Cleanup
    browser.close()

if __name__ == "__main__":
    asyncio.run(main())
Web-based Interface (FastAPI)
For a more versatile web interface:
import time
import uuid
from typing import Optional, List

from fastapi import FastAPI, BackgroundTasks, HTTPException
from pydantic import BaseModel
import uvicorn

app = FastAPI(title="AI Knowledge Companion API")

# Data models
class Query(BaseModel):
    text: str
    max_sources: Optional[int] = 5
    preferred_sources: Optional[List[str]] = None

class ResearchStatus(BaseModel):
    query_id: str
    status: str
    progress: float
    message: Optional[str] = None

class ResearchReport(BaseModel):
    query_id: str
    query_text: str
    report: str
    sources: List[dict]
    execution_time: float

# In-memory storage for demo purposes
research_tasks = {}

@app.post("/research/start", response_model=ResearchStatus)
async def start_research(query: Query, background_tasks: BackgroundTasks):
    """Start a new research task."""
    query_id = str(uuid.uuid4())
    # Store the initial status
    research_tasks[query_id] = {
        "status": "starting",
        "progress": 0.0,
        "query": query.text,
        "report": None
    }
    # Launch the research in the background
    background_tasks.add_task(
        perform_research,
        query_id,
        query.text,
        query.max_sources,
        query.preferred_sources
    )
    return ResearchStatus(
        query_id=query_id,
        status="starting",
        progress=0.0,
        message="Research task initiated"
    )

@app.get("/research/{query_id}/status", response_model=ResearchStatus)
async def get_research_status(query_id: str):
    """Get the status of a research task."""
    if query_id not in research_tasks:
        raise HTTPException(status_code=404, detail="Research task not found")
    task = research_tasks[query_id]
    return ResearchStatus(
        query_id=query_id,
        status=task["status"],
        progress=task["progress"],
        message=task.get("message")
    )

@app.get("/research/{query_id}/report", response_model=ResearchReport)
async def get_research_report(query_id: str):
    """Get the final report of a completed research task."""
    if query_id not in research_tasks:
        raise HTTPException(status_code=404, detail="Research task not found")
    task = research_tasks[query_id]
    if task["status"] != "completed":
        raise HTTPException(status_code=400, detail="Research not yet completed")
    return ResearchReport(
        query_id=query_id,
        query_text=task["query"],
        report=task["report"],
        sources=task["sources"],
        execution_time=task["execution_time"]
    )

async def perform_research(query_id, query_text, max_sources, preferred_sources):
    """Background task that performs the actual research."""
    try:
        # Update the status
        research_tasks[query_id]["status"] = "researching"
        research_tasks[query_id]["progress"] = 0.1
        # Create the components (simplified)
        llm_client = LlamaClient()
        browser = IntelligentBrowser()
        mcp_client = MCPClient(server_url="https://mcp.example.com")
        knowledge_manager = KnowledgeSourceManager(mcp_client)
        orchestrator = KnowledgeOrchestrator(llm_client, browser, knowledge_manager)
        # Update progress periodically
        research_tasks[query_id]["progress"] = 0.3
        # Perform the research
        start_time = time.time()
        report = await orchestrator.process_query(query_text)
        end_time = time.time()
        # Store the results
        research_tasks[query_id].update({
            "status": "completed",
            "progress": 1.0,
            "report": report,
            # Assumes the orchestrator records which sources it consulted
            "sources": orchestrator.get_used_sources(),
            "execution_time": end_time - start_time
        })
        # Cleanup
        browser.close()
    except Exception as e:
        research_tasks[query_id].update({
            "status": "error",
            "message": str(e)
        })

# Run with: uvicorn app:app --reload
UI design principle: Both interfaces demonstrate the importance of providing feedback during long-running research operations. The web interface adds asynchronous operation, allowing users to start research and check back later for results.
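A client can drive this API by starting a task and polling until the report is ready. A sketch using the requests library, assuming a local deployment on port 8000:
import time
import requests

BASE = "http://localhost:8000"  # assumed base URL

task = requests.post(f"{BASE}/research/start", json={"text": "What is MCP?"}).json()
query_id = task["query_id"]

# Poll the status endpoint until the background task finishes
while True:
    status = requests.get(f"{BASE}/research/{query_id}/status").json()
    if status["status"] in ("completed", "error"):
        break
    time.sleep(2)

if status["status"] == "completed":
    report = requests.get(f"{BASE}/research/{query_id}/report").json()
    print(report["report"])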
9. Advanced Features and Optimizations
Caching for Performance
Implement a caching layer to store frequently accessed information:
import hashlib
import json
from datetime import timedelta

import aioredis  # Note: this example targets the aioredis 1.x API

class KnowledgeCache:
    def __init__(self, redis_url="redis://localhost"):
        """Initialize the caching system."""
        self.redis = None
        self.redis_url = redis_url

    async def connect(self):
        """Connect to Redis."""
        self.redis = await aioredis.create_redis_pool(self.redis_url)

    async def close(self):
        """Close the Redis connection."""
        if self.redis:
            self.redis.close()
            await self.redis.wait_closed()

    async def get_cached_result(self, query, source):
        """Try to get a cached result for a query from a specific source."""
        if not self.redis:
            return None
        cache_key = self._make_cache_key(query, source)
        cached_data = await self.redis.get(cache_key)
        if cached_data:
            return json.loads(cached_data)
        return None

    async def cache_result(self, query, source, result, ttl=timedelta(hours=24)):
        """Cache a result with an expiration time."""
        if not self.redis:
            return
        cache_key = self._make_cache_key(query, source)
        await self.redis.set(
            cache_key,
            json.dumps(result),
            expire=int(ttl.total_seconds())
        )

    def _make_cache_key(self, query, source):
        """Create a deterministic cache key."""
        data = f"{query}:{source}"
        return f"knowledge_cache:{hashlib.md5(data.encode()).hexdigest()}"
Parallel Processing
Optimize research by executing multiple sources in parallel:
async def _execute_research_plan(self, analysis, original_query):
    """Execute the research plan with parallel processing.

    Assumes async wrappers such as _perform_arxiv_research and
    _perform_wikipedia_research around the knowledge manager calls.
    """
    tasks = []
    # Create a task for each source in the research plan
    for source in analysis["priority_order"]:
        if source == "web_search" and "web_search" in analysis["search_terms"]:
            search_terms = analysis["search_terms"]["web_search"]
            task = asyncio.create_task(
                self._perform_web_research(search_terms)
            )
            tasks.append(("web_search", task))
        elif source == "arxiv" and "arxiv" in analysis["search_terms"]:
            arxiv_query = analysis["search_terms"]["arxiv"]
            categories = analysis.get("arxiv_categories", None)
            task = asyncio.create_task(
                self._perform_arxiv_research(arxiv_query, categories)
            )
            tasks.append(("arxiv", task))
        elif source == "wikipedia" and "wikipedia" in analysis["search_terms"]:
            wiki_query = analysis["search_terms"]["wikipedia"]
            task = asyncio.create_task(
                self._perform_wikipedia_research(wiki_query)
            )
            tasks.append(("wikipedia", task))
    # Wait for all tasks to complete
    results = []
    for source_name, task in tasks:
        try:
            data = await task
            results.append({
                "source": source_name,
                "data": data
            })
        except Exception as e:
            print(f"Error researching {source_name}: {e}")
            results.append({
                "source": source_name,
                "error": str(e)
            })
    # If needed, perform follow-up research based on the initial findings
    if analysis.get("requires_followup", False):
        followup_results = await self._perform_followup_research(results, original_query)
        results.extend(followup_results)
    return results
Smart Throttling
Prevent overloading external services:
import asyncio
import time
import urllib.parse

class RateLimiter:
    def __init__(self):
        """Initialize per-domain token-bucket rate limiters."""
        self.limiters = {}

    def register_domain(self, domain, requests_per_minute):
        """Register rate limits for a domain."""
        self.limiters[domain] = {
            "rate": requests_per_minute,
            "tokens": requests_per_minute,
            "last_update": time.time(),
            "lock": asyncio.Lock()
        }

    async def acquire(self, url):
        """Acquire permission to make a request to a URL."""
        domain = self._extract_domain(url)
        if domain not in self.limiters:
            # Default conservative limit
            self.register_domain(domain, 10)
        limiter = self.limiters[domain]
        async with limiter["lock"]:
            # Refill tokens based on the time elapsed
            now = time.time()
            time_passed = now - limiter["last_update"]
            new_tokens = time_passed * (limiter["rate"] / 60.0)
            limiter["tokens"] = min(limiter["rate"], limiter["tokens"] + new_tokens)
            limiter["last_update"] = now
            if limiter["tokens"] < 1:
                # Wait until a token becomes available
                wait_time = (1 - limiter["tokens"]) * (60.0 / limiter["rate"])
                await asyncio.sleep(wait_time)
                limiter["tokens"] = 1
                limiter["last_update"] = time.time()
            # Consume a token
            limiter["tokens"] -= 1

    def _extract_domain(self, url):
        """Extract the domain from a URL."""
        parsed = urllib.parse.urlparse(url)
        return parsed.netloc
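A quick usage sketch (fetch_page is a placeholder for whatever request coroutine you are rate-limiting):
limiter = RateLimiter()
limiter.register_domain("arxiv.org", 30)  # explicit limit: 30 requests per minute

async def polite_fetch(url):
    await limiter.acquire(url)            # waits if the domain's bucket is empty
    return await fetch_page(url)          # placeholder HTTP call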
Advanced technique: These optimizations show how to balance speed and resource usage. Caching prevents redundant research, parallel processing maximizes throughput, and rate limiting ensures respectful use of external services.
10. Security and Ethical Considerations
Securing Your Knowledge Companion
Implement these security measures to protect your system and users:
- Input validation: Sanitize all user inputs to prevent injection attacks
- Rate limiting: Prevent abuse by limiting requests per user
- Authentication: Require user authentication for sensitive operations
- Secure storage: Encrypt sensitive data and API keys
- Audit logging: Track all system actions for review
Example implementation of authentication middleware:
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
import jwt
from datetime import datetime, timedelta

# Setup
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
SECRET_KEY = "your-secret-key"  # Store securely in environment variables
ALGORITHM = "HS256"

# User authentication
async def get_current_user(token: str = Depends(oauth2_scheme)):
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Invalid authentication credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        username: str = payload.get("sub")
        if username is None:
            raise credentials_exception
    except jwt.PyJWTError:
        raise credentials_exception
    # Get the user from the database (simplified)
    user = get_user(username)
    if user is None:
        raise credentials_exception
    return user

# Example protected endpoint
@app.post("/research/start", response_model=ResearchStatus)
async def start_research(
    query: Query,
    background_tasks: BackgroundTasks,
    current_user = Depends(get_current_user)
):
    # Check the user's permissions
    if not user_can_research(current_user):
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Not enough permissions"
        )
    # Continue with research...
Ethical Web Scraping
Follow these guidelines for responsible web navigation:
- Respect robots.txt: Check for permission before crawling
- Identify your bot: Set proper User-Agent strings
- Rate limiting: Don't overload websites with requests
- Cache results: Minimize duplicate requests
- Honor copyright: Respect terms of service and licensing
Implementation example:
import time
import urllib.parse
from urllib import robotparser

import aiohttp

class EthicalBrowser(IntelligentBrowser):
    def __init__(self, headless=True, user_agent=None):
        """Initialize with ethical browsing capabilities."""
        super().__init__(headless)
        # Set an honest user agent
        if user_agent is None:
            user_agent = "KnowledgeCompanionBot/1.0 (+https://yourwebsite.com/bot.html)"
        self.browser.set_user_agent(user_agent)
        # Initialize the robots.txt cache
        self.robots_cache = {}
        self.rate_limiter = RateLimiter()

    async def visit_page(self, url):
        """Ethically visit a page with proper checks."""
        # Check robots.txt first
        if not await self._can_access(url):
            return {
                "url": url,
                "error": "Access disallowed by robots.txt",
                "content": None
            }
        # Apply rate limiting
        await self.rate_limiter.acquire(url)
        # Now perform the visit (the parent implementation is synchronous)
        return super().visit_page(url)

    async def _can_access(self, url):
        """Check whether a URL can be accessed according to robots.txt."""
        domain = self._extract_domain(url)
        # Check the cache first
        if domain in self.robots_cache:
            parser = self.robots_cache[domain]["parser"]
            last_checked = self.robots_cache[domain]["time"]
            # Refresh the cache if it is older than 1 day
            if time.time() - last_checked > 86400:
                parser = await self._fetch_robots_txt(domain)
        else:
            # Fetch and parse robots.txt
            parser = await self._fetch_robots_txt(domain)
        # Check whether our user agent can access the URL
        user_agent = self.browser.get_user_agent()
        path = urllib.parse.urlparse(url).path
        return parser.can_fetch(user_agent, path)

    async def _fetch_robots_txt(self, domain):
        """Fetch and parse robots.txt for a domain."""
        robots_url = f"https://{domain}/robots.txt"
        # Use a simple GET request, not the browser
        async with aiohttp.ClientSession() as session:
            try:
                async with session.get(robots_url) as response:
                    if response.status == 200:
                        content = await response.text()
                        parser = robotparser.RobotFileParser()
                        parser.parse(content.splitlines())
                    else:
                        # No robots.txt or it is inaccessible - assume a permissive policy
                        parser = robotparser.RobotFileParser()
                        parser.allow_all = True
            except Exception:
                # Error fetching robots.txt - assume a permissive policy
                parser = robotparser.RobotFileParser()
                parser.allow_all = True
        # Cache the result
        self.robots_cache[domain] = {
            "parser": parser,
            "time": time.time()
        }
        return parser

    def _extract_domain(self, url):
        """Extract the domain from a URL."""
        parsed = urllib.parse.urlparse(url)
        return parsed.netloc
Ethical principle: This implementation demonstrates the technical aspects of ethical web scraping by checking robots.txt files, using honest user agents, and implementing rate limiting to be a good citizen of the web.
11. Testing and Quality Assurance
Unit Testing Core Components
Example of testing the Browser-Use module:
import unittest
from unittest.mock import MagicMock, patch
import asyncio
class TestIntelligentBrowser(unittest.TestCase):
    @patch('browser_use.BrowserSession')
    def setUp(self, MockBrowserSession):
        self.mock_browser = MockBrowserSession.return_value
        self.intelligent_browser = IntelligentBrowser(headless=True)

    def test_search(self):
        # Set up mocks
        self.mock_browser.find_element.return_value = "search_box"
        self.intelligent_browser.get_search_results = MagicMock(return_value=[
            {"title": "Test Result", "url": "https://example.com", "snippet": "Example snippet"}
        ])
        # Execute the search
        results = self.intelligent_browser.search("test query")
        # Verify
        self.mock_browser.navigate.assert_called_with("https://www.google.com")
        self.mock_browser.input_text.assert_called_with("search_box", "test query")
        self.mock_browser.press_enter.assert_called_once()
        self.assertEqual(len(results), 1)
        self.assertEqual(results[0]["title"], "Test Result")

    def test_extract_main_content(self):
        # Set up mocks for the different scenarios
        self.mock_browser.find_element.side_effect = [
            None,          # No article
            None,          # No main
            "content_div"  # Found .content
        ]
        self.mock_browser.get_text.return_value = "Extracted content"
        # Execute
        content = self.intelligent_browser.extract_main_content()
        # Verify
        self.assertEqual(content, "Extracted content")
        self.assertEqual(self.mock_browser.find_element.call_count, 3)

class TestKnowledgeManager(unittest.TestCase):
    def setUp(self):
        self.mock_mcp = MagicMock()
        self.knowledge_manager = KnowledgeSourceManager(self.mock_mcp)

    def test_query_source(self):
        # Set up
        self.knowledge_manager.available_sources = {
            "test_source": {"description": "Test", "capabilities": [], "parameters": {}}
        }
        self.mock_mcp.query_context.return_value = {"result": "test_data"}
        # Execute
        result = self.knowledge_manager.query_source("test_source", {"param": "value"})
        # Verify
        self.mock_mcp.query_context.assert_called_with(
            context_name="test_source",
            parameters={"param": "value"}
        )
        self.assertEqual(result, {"result": "test_data"})

    def test_query_unknown_source(self):
        # Verify the exception for an unknown source
        with self.assertRaises(ValueError):
            self.knowledge_manager.query_source("unknown_source", {})
Integration Testing
Test how components work together:
import json
from unittest.mock import AsyncMock

class TestOrchestration(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self):
        # Create mocks for all components; llm.complete must be an
        # AsyncMock because the orchestrator awaits it
        self.mock_llm = MagicMock()
        self.mock_llm.complete = AsyncMock()
        self.mock_browser = MagicMock()
        self.mock_knowledge_manager = MagicMock()
        # Set up the orchestrator with the mocks
        self.orchestrator = KnowledgeOrchestrator(
            self.mock_llm,
            self.mock_browser,
            self.mock_knowledge_manager
        )
        # Set up common test data
        self.mock_llm.complete.return_value = json.dumps({
            "priority_order": ["web_search", "arxiv"],
            "search_terms": {
                "web_search": ["test query"],
                "arxiv": "test query physics"
            },
            "requires_followup": False
        })

    async def test_full_query_processing(self):
        # Mock the research methods
        self.orchestrator._perform_web_research = AsyncMock(return_value=[
            {"title": "Web Result", "url": "https://example.com"}
        ])
        self.mock_knowledge_manager.search_arxiv.return_value = {
            "papers": [{"title": "ArXiv Paper", "abstract": "Test abstract"}]
        }
        # Mock the report synthesis
        final_report = "# Research Report\n\nThis is a test report."
        self.mock_llm.complete.side_effect = [
            # First call - query analysis
            json.dumps({
                "priority_order": ["web_search", "arxiv"],
                "search_terms": {
                    "web_search": ["test query"],
                    "arxiv": "test query physics"
                },
                "requires_followup": False
            }),
            # Second call - report synthesis
            final_report
        ]
        # Execute the full query processing
        result = await self.orchestrator.process_query("test query")
        # Verify
        self.assertEqual(result, final_report)
        self.assertEqual(self.mock_llm.complete.call_count, 2)
        self.orchestrator._perform_web_research.assert_called_once()
        self.mock_knowledge_manager.search_arxiv.assert_called_once()
End-to-End Testing
Test the entire system with real inputs:
class TestEndToEnd(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self):
        # Create real components with a test configuration
        self.llm_client = LlamaClient(model="llama3.2-test")
        self.browser = IntelligentBrowser(headless=True)
        self.mcp_client = MCPClient(
            server_url="https://test-mcp.example.com",
            api_key="test_key"
        )
        self.knowledge_manager = KnowledgeSourceManager(self.mcp_client)
        # Set up the orchestrator with real components
        self.orchestrator = KnowledgeOrchestrator(
            self.llm_client,
            self.browser,
            self.knowledge_manager
        )
        # Create the CLI interface
        self.cli = KnowledgeCompanionCLI(self.orchestrator)

    async def asyncTearDown(self):
        # Clean up resources
        self.browser.close()

    @unittest.skip("End-to-end test requires real services")
    async def test_basic_query(self):
        """Test end-to-end with a simple query."""
        # This test is skipped by default as it requires real services
        query = "What is the capital of France?"
        report = await self.orchestrator.process_query(query)
        # Verify basic expectations about the report
        self.assertIn("Paris", report)
        self.assertIn("France", report)
        self.assertGreater(len(report), 100)  # Ensure substantial content
Testing principle: The test suite demonstrates how to test at multiple levels—unit tests for individual functions, integration tests for component interactions, and end-to-end tests for the complete system. Note the use of mocks to isolate components during testing.
12. Deployment Strategies
Docker Containerization
Create a Dockerfile for your Knowledge Companion:
# Use a Python base image
FROM python:3.9-slim

# Install system dependencies, including Chrome
# (curl is needed by the ChromeDriver step and the health check below)
RUN apt-get update && apt-get install -y \
    wget \
    curl \
    gnupg \
    unzip \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list \
    && apt-get update \
    && apt-get install -y google-chrome-stable \
    && rm -rf /var/lib/apt/lists/*

# Install ChromeDriver
RUN CHROMEDRIVER_VERSION=`curl -sS https://chromedriver.storage.googleapis.com/LATEST_RELEASE` && \
    wget -N https://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip -P ~/ && \
    unzip ~/chromedriver_linux64.zip -d ~/ && \
    rm ~/chromedriver_linux64.zip && \
    mv -f ~/chromedriver /usr/local/bin/chromedriver && \
    chmod +x /usr/local/bin/chromedriver

# Set up the workdir and install Python dependencies
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Download and configure Ollama (for local LLM support)
RUN wget -O ollama https://ollama.ai/download/ollama-linux-amd64 && \
    chmod +x ollama && \
    mv ollama /usr/local/bin/

# Copy the application code
COPY . .

# Create a non-root user
RUN useradd -m appuser
USER appuser

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV BROWSER_USE_HEADLESS=true
ENV OLLAMA_HOST=host.docker.internal

# Expose the API port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Command to run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Using Docker Compose for Multi-Container Deployment
Create a docker-compose.yml file:
version: '3.8'

services:
  # API service
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - MCP_SERVER_URL=http://mcp:5000
    depends_on:
      - redis
      - mcp
      - ollama
    volumes:
      - ./app:/app
    networks:
      - knowledge-network

  # Redis for caching
  redis:
    image: redis:6.2-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    networks:
      - knowledge-network

  # MCP server
  mcp:
    build: ./mcp-server
    ports:
      - "5000:5000"
    environment:
      - PYTHONUNBUFFERED=1
    volumes:
      - ./mcp-server:/app
    networks:
      - knowledge-network

  # Ollama for local LLM support
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
    networks:
      - knowledge-network

networks:
  knowledge-network:
    driver: bridge

volumes:
  redis-data:
  ollama-data:
Serverless Deployment
For AWS Lambda deployment, create a serverless.yml configuration:
service: knowledge-companion

provider:
  name: aws
  runtime: python3.9
  region: us-east-1
  memorySize: 2048
  timeout: 30
  environment:
    MCP_SERVER_URL: ${env:MCP_SERVER_URL}
    REDIS_URL: ${env:REDIS_URL}

functions:
  api:
    handler: serverless_handler.handler
    events:
      - http:
          path: /
          method: ANY
      - http:
          path: /{proxy+}
          method: ANY
    layers:
      - !Ref PythonDependenciesLambdaLayer

layers:
  pythonDependencies:
    path: layer
    compatibleRuntimes:
      - python3.9

custom:
  pythonRequirements:
    dockerizePip: true
    slim: true
    layer: true

plugins:
  - serverless-python-requirements
  - serverless-offline
Deployment concept: These examples show multiple deployment strategies—containerized for easy distribution, Docker Compose for multi-service orchestration, and serverless for scalable API deployment. The actual choice depends on your specific requirements and infrastructure constraints.
13. Practical Applications and Use Cases
Academic Research Assistant
Customize your Knowledge Companion for academic research:
class AcademicResearchOrchestrator(KnowledgeOrchestrator):
    """Specialized orchestrator for academic research."""

    async def _analyze_query(self, query):
        """Academic-focused query analysis."""
        prompt = f"""
        Analyze the following academic research query and determine the best approach:

        QUERY: {query}

        Please determine:
        1. Which academic fields are most relevant to this query?
        2. What specific academic databases should be consulted? (arXiv, PubMed, etc.)
        3. What are the key search terms for academic literature?
        4. Are there specific authors or institutions known for work in this area?
        5. What time period is most relevant for this research?

        Return your analysis as a structured JSON object.
        """
        response = await self.llm.complete(prompt)
        return json.loads(response)

    async def _perform_literature_review(self, analysis):
        """Conduct a comprehensive literature review.

        Assumes helper methods (_search_arxiv, _search_pubmed,
        _identify_key_papers, _analyze_citations) implemented on top of
        the MCP and browser layers.
        """
        sources = []
        # Search across multiple academic databases
        if "arxiv" in analysis.get("databases", []):
            arxiv_results = await self._search_arxiv(
                analysis["search_terms"],
                analysis.get("categories", [])
            )
            sources.append(("arxiv", arxiv_results))
        if "pubmed" in analysis.get("databases", []):
            pubmed_results = await self._search_pubmed(
                analysis["search_terms"],
                analysis.get("date_range", {})
            )
            sources.append(("pubmed", pubmed_results))
        # Find and analyze citation networks
        key_papers = self._identify_key_papers(sources)
        citation_network = await self._analyze_citations(key_papers)
        return {
            "primary_sources": sources,
            "key_papers": key_papers,
            "citation_network": citation_network
        }

    async def _generate_academic_report(self, query, research_results):
        """Generate an academic-style report with proper citations."""
        prompt = f"""
        Create a comprehensive academic literature review based on the following research:

        QUERY: {query}
        RESEARCH FINDINGS: {json.dumps(research_results, indent=2)}

        Your review should:
        1. Begin with an abstract summarizing the findings
        2. Include a methodology section explaining the search strategy
        3. Present findings organized by themes or chronologically
        4. Discuss conflicts and agreements in the literature
        5. Identify research gaps
        6. Include a properly formatted bibliography (APA style)

        Format the review in Markdown.
        """
        review = await self.llm.complete(prompt)
        return review
Business Intelligence Tool
Adapt your Knowledge Companion for business intelligence:
class BusinessIntelligenceOrchestrator(KnowledgeOrchestrator):
    """Specialized orchestrator for business intelligence."""

    async def _analyze_query(self, query):
        """Business-focused query analysis."""
        prompt = f"""
        Analyze the following business intelligence query and determine the best approach:

        QUERY: {query}

        Please determine:
        1. What industry sectors are relevant to this query?
        2. What specific companies should be researched?
        3. What types of business data would be most valuable (financial, strategic, competitive)?
        4. What time frame is relevant for this analysis?
        5. What business news sources would be most appropriate?

        Return your analysis as a structured JSON object.
        """
        response = await self.llm.complete(prompt)
        return json.loads(response)

    async def _gather_company_data(self, companies):
        """Gather data about specific companies.

        Assumes helper methods (_fetch_financial_data, _fetch_company_news,
        _analyze_social_sentiment) implemented against your data providers.
        """
        results = []
        for company in companies:
            # Financial data
            financial_data = await self._fetch_financial_data(company)
            # News and press releases
            news = await self._fetch_company_news(company)
            # Social media sentiment
            sentiment = await self._analyze_social_sentiment(company)
            results.append({
                "company": company,
                "financial": financial_data,
                "news": news,
                "sentiment": sentiment
            })
        return results

    async def _perform_competitive_analysis(self, primary_company, competitors):
        """Analyze competitive positioning."""
        company_data = await self._gather_company_data([primary_company] + competitors)
        # Extract the primary company's data
        primary_data = next(item for item in company_data if item["company"] == primary_company)
        # Extract the competitor data
        competitor_data = [item for item in company_data if item["company"] != primary_company]
        # Perform a SWOT analysis
        swot = await self._generate_swot_analysis(primary_data, competitor_data)
        return {
            "primary_company": primary_data,
            "competitors": competitor_data,
            "swot_analysis": swot
        }

    async def _generate_business_report(self, query, intelligence_data):
        """Generate a business intelligence report."""
        prompt = f"""
        Create a comprehensive business intelligence report based on the following data:

        QUERY: {query}
        INTELLIGENCE DATA: {json.dumps(intelligence_data, indent=2)}

        Your report should:
        1. Begin with an executive summary
        2. Include market overview and trends
        3. Provide detailed company analyses
        4. Present competitive comparisons with data visualizations (described in text)
        5. Offer strategic recommendations
        6. Include reference sources

        Format the report in Markdown with sections for easy navigation.
        """
        report = await self.llm.complete(prompt)
        return report
Personal Learning Assistant
Create a personal learning companion:
class LearningPathOrchestrator(KnowledgeOrchestrator):
    """Specialized orchestrator for creating personalized learning paths."""

    async def create_learning_path(self, subject, user_level="beginner", goals=None):
        """Create a personalized learning path for a subject."""
        analysis = await self._analyze_learning_needs(subject, user_level, goals)
        resources = await self._discover_learning_resources(analysis)
        curriculum = await self._design_curriculum(analysis, resources)
        return {
            "subject": subject,
            "user_level": user_level,
            "goals": goals,
            "curriculum": curriculum
        }

    async def _analyze_learning_needs(self, subject, user_level, goals):
        """Analyze the learning requirements for a subject."""
        prompt = f"""
        Analyze the following learning request and determine the optimal approach:

        SUBJECT: {subject}
        USER LEVEL: {user_level}
        LEARNING GOALS: {goals if goals else 'General proficiency'}

        Please determine:
        1. What are the foundational concepts needed for this subject?
        2. What is an appropriate learning sequence?
        3. What types of resources would be most beneficial (textbooks, videos, interactive exercises)?
        4. How should progress be measured?
        5. What are common stumbling blocks for learners at this level?

        Return your analysis as a structured JSON object.
        """
        response = await self.llm.complete(prompt)
        return json.loads(response)

    async def _discover_learning_resources(self, analysis):
        """Find optimal learning resources based on the analysis.

        Assumes search helpers (_search_books, _search_online_courses,
        _search_educational_videos, _search_practice_resources,
        _search_learning_communities) built on the browser and MCP layers.
        """
        resources = {
            "textbooks": [],
            "online_courses": [],
            "videos": [],
            "practice_resources": [],
            "communities": []
        }
        # Search for textbooks and books
        book_results = await self._search_books(analysis["subject"], analysis["key_concepts"])
        resources["textbooks"] = book_results[:5]  # Top 5 relevant books
        # Search for online courses
        course_results = await self._search_online_courses(
            analysis["subject"],
            analysis["user_level"]
        )
        resources["online_courses"] = course_results[:5]
        # Find educational videos
        video_results = await self._search_educational_videos(
            analysis["subject"],
            analysis["key_concepts"]
        )
        resources["videos"] = video_results[:8]
        # Find practice resources
        practice_results = await self._search_practice_resources(analysis["subject"])
        resources["practice_resources"] = practice_results[:5]
        # Find learning communities
        community_results = await self._search_learning_communities(analysis["subject"])
        resources["communities"] = community_results[:3]
        return resources

    async def _design_curriculum(self, analysis, resources):
        """Design a structured curriculum based on the analysis and resources."""
        prompt = f"""
        Create a comprehensive learning curriculum based on the following:

        SUBJECT ANALYSIS: {json.dumps(analysis, indent=2)}
        AVAILABLE RESOURCES: {json.dumps(resources, indent=2)}

        Your curriculum should:
        1. Be organized in modules with clear learning objectives for each
        2. Include a mix of theory and practice
        3. Estimate time commitments for each module
        4. Recommend specific resources for each module
        5. Include checkpoints to assess understanding
        6. Provide a progression from foundational to advanced concepts

        Format the curriculum in Markdown with clear sections.
        """
        curriculum = await self.llm.complete(prompt)
        return curriculum
Application principle: These specialized orchestrators show how the same core technology can be adapted to different domains by modifying the analysis, research strategies, and report generation based on domain-specific requirements.
14. Conclusion: The Future of AI Knowledge Systems
The AI Knowledge Companion we've built represents just the beginning of a new era in information processing. As we look to the future, several trends are emerging:
- Deeper reasoning capabilities: Future systems will go beyond information retrieval to perform logical reasoning, connecting disparate pieces of information to generate novel insights.
- Multimodal understanding: The next generation of knowledge systems will process not just text but images, audio, and video seamlessly, extracting information from all media types.
- Collective intelligence: Knowledge companions will facilitate collaboration between multiple human experts and AI systems, creating hybrid intelligence networks.
- Continuous learning: Systems will remember past interactions and continuously refine their understanding based on user feedback and new information.
- Specialized domain expertise: We'll see the rise of highly specialized companions focused on specific fields like medicine, law, or engineering.
The technologies we've explored—Browser-Use and MCP—are foundational pieces of this future, enabling AI systems to navigate the internet and connect to diverse knowledge sources in standardized ways. As these technologies mature, the boundary between AI assistants and knowledge workers will continue to blur, creating powerful partnerships that enhance human capabilities.
By building your own AI Knowledge Companion, you're not just creating a research tool—you're participating in the development of systems that will fundamentally transform how humans interact with the vast landscape of global knowledge.
Additional Resources
- Browser-Use Documentation
- Model Context Protocol Specification
- Ollama GitHub Repository
- Ethical Web Scraping Guidelines
- FastAPI Documentation
- Redis Documentation
- Docker and Docker Compose Documentation
- ArXiv API Documentation
Happy knowledge exploration!