Building a Multimodal Story Generation System
Daniel Kliewer
Author, Sovereign AI

From the Book
This is from Sovereign AI: Building Local-First Intelligent Systems.


Multimodal Story Generation System
Transform visual inputs into structured narratives. This system combines a vision-language model (LLaVA) with a large language model (Gemma2-27B) to generate multi-chapter stories from images.
Features
- Image Analysis - Extract narrative elements from images using LLaVA
- Adaptive Story Generation - Generate 5-chapter stories with Gemma2-27B
- Context Awareness - Maintain narrative consistency with ChromaDB RAG
- Interactive Visualization - ReactFlow-powered story graph interface
- Production Ready - Dockerized microservices architecture
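As a rough illustration of the image analysis step, the sketch below sends one image to LLaVA through Ollama's local REST API. It assumes Ollama is running on its default port (11434) and that the `llava` model has been pulled. The function names `build_llava_request` and `describe_image` and the default prompt are hypothetical and are not taken from the project's code.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for a non-streaming Ollama generate call with one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(
    image_path: str,
    prompt: str = "List the characters, setting, and mood in this image.",
) -> str:
    """Send an image to a locally running Ollama server and return the model's text."""
    with open(image_path, "rb") as f:
        payload = build_llava_request(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `describe_image("photo.jpg")` would return LLaVA's free-text description, which a story engine could then use as its starting context.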
Quick Start
Local Development Setup
- Clone Repository

  ```bash
  git clone https://github.com/kliewerdaniel/ITB02
  cd ITB02
  ```

- Create Virtual Environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux/Mac
  venv\Scripts\activate     # Windows
  ```

- Install Dependencies

  ```bash
  pip install -r requirements.txt

  # Apple Silicon Special Setup
  pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
  brew install libjpeg webp
  ```

- Initialize AI Models

  ```bash
  ollama pull gemma2:27b
  ollama pull llava
  ```

- Start Services

  ```bash
  # Backend (FastAPI)
  uvicorn backend.main:app --reload

  # Frontend (new terminal)
  cd frontend
  npm install && npm run dev
  ```

- Verify Installation

  ```bash
  curl http://localhost:8000/health
  # Expected response: {"status":"healthy"}
  ```
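The same verification can be scripted, for example as a startup check. The sketch below uses only the Python standard library and assumes the backend returns the `{"status":"healthy"}` body shown above. The helper names `is_healthy` and `check_backend` are illustrative and not part of the project.

```python
import json
import urllib.error
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True when the body is a JSON object whose status is "healthy"."""
    try:
        data = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check_backend(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> bool:
    """Query the health endpoint; treat connection failures as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except (urllib.error.URLError, OSError):
        return False
```

`check_backend()` returns `False` rather than raising when the server is not up yet, so it can be polled in a loop while the services start.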
System Requirements
- Python 3.11+
- Node.js 18+
- Ollama runtime
- 16GB RAM (24GB+ recommended for GPU acceleration)
- 10GB+ Disk Space
Architecture
```text
[Frontend] <-HTTP-> [FastAPI]
                     |     |
               [Ollama] <-> [ChromaDB]
                     |
                  [Redis]
                     |
             [Celery Workers]
```
Key Components
| Component      | Technology Stack      | Function                         |
|----------------|-----------------------|----------------------------------|
| Image Analysis | LLaVA, Pillow         | Visual narrative extraction      |
| Story Engine   | Gemma2-27B, LangChain | Context-aware chapter generation |
| Knowledge Base | ChromaDB              | Narrative consistency management |
| API Layer      | FastAPI               | REST endpoint management         |
| Visualization  | ReactFlow, Zustand    | Interactive story mapping        |
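To make the Story Engine row more concrete, here is a minimal sketch of context-aware chapter generation. It is not the project's implementation: the model call is passed in as a hypothetical `generate` callable, and a plain recap of earlier chapters stands in for the ChromaDB retrieval that the real system uses for consistency. All names here are assumptions made for illustration.

```python
from typing import Callable, List


def build_chapter_prompt(image_summary: str, previous: List[str], chapter: int, total: int) -> str:
    """Compose a prompt that carries earlier chapters forward as context."""
    recap = "\n\n".join(
        f"Chapter {i + 1}: {text}" for i, text in enumerate(previous)
    ) or "(none yet)"
    return (
        f"Image analysis:\n{image_summary}\n\n"
        f"Story so far:\n{recap}\n\n"
        f"Write chapter {chapter} of {total}, staying consistent with the story so far."
    )


def generate_story(
    image_summary: str,
    generate: Callable[[str], str],
    total_chapters: int = 5,
) -> List[str]:
    """Generate chapters one at a time, feeding each prompt the chapters written so far."""
    chapters: List[str] = []
    for n in range(1, total_chapters + 1):
        prompt = build_chapter_prompt(image_summary, chapters, n, total_chapters)
        chapters.append(generate(prompt).strip())
    return chapters
```

Because `generate` is injected, the loop can be exercised with a stub function before wiring it to a real model such as Gemma2-27B served by Ollama.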
Production Deployment
Docker Setup
```bash
# Build and launch all services
docker-compose up --build

# Initialize vector store
docker exec -it backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"
```
Cluster Configuration
```yaml
# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'
```
Troubleshooting
Common Issues
- Missing Vector Store

  ```bash
  rm -rf chroma_db && mkdir chroma_db
  ```

- Out-of-Memory Errors

  ```bash
  export OLLAMA_MAX_LOADED_MODELS=2
  ```

- CUDA Compatibility Issues

  ```bash
  pip uninstall torch
  pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
  ```

Sovereign AI: Building Local-First Intelligent Systems
by Daniel Kliewer · Paperback · 72 pages
The hands-on guide to building AI that runs on your hardware, keeps your data private, and eliminates cloud dependence. Working code included.