Multimodal Story Generation System

Transform images into structured, multi-chapter stories. The system pairs a vision-language model (LLaVA) for image analysis with a large language model (Gemma2-27B) for story generation.

Features

🖼️ Image Analysis - Extract narrative elements from images using LLaVA
📖 Adaptive Story Generation - Generate 5-chapter stories with Gemma2-27B
🧠 Context Awareness - Maintain narrative consistency with ChromaDB RAG
📊 Interactive Visualization - ReactFlow-powered story graph interface
🚀 Production Ready - Dockerized microservices architecture
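
The features above imply a simple loop: describe the image once, then generate chapters one at a time while feeding earlier chapters back in as context. The following is a minimal sketch of that loop, not the repository's implementation. The `generate` callable, the prompt wording, and the "last two chapters" context window are all assumptions made for illustration; the project itself uses ChromaDB retrieval for context.

```python
from typing import Callable, List


def generate_story(
    image_description: str,
    generate: Callable[[str], str],  # hypothetical wrapper around the LLM call
    num_chapters: int = 5,
    context_chapters: int = 2,
) -> List[str]:
    """Generate chapters sequentially, feeding recent chapters back as context."""
    chapters: List[str] = []
    for number in range(1, num_chapters + 1):
        # Include only the most recent chapters to keep the prompt short.
        recent = chapters[-context_chapters:] if context_chapters > 0 else []
        context = "\n\n".join(recent) if recent else "(none yet)"
        prompt = (
            f"Image description:\n{image_description}\n\n"
            f"Story so far:\n{context}\n\n"
            f"Write chapter {number} of {num_chapters}."
        )
        chapters.append(generate(prompt))
    return chapters
```

Passing the model call in as a parameter keeps the loop testable with a stub function and independent of any particular client library.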

Quick Start
System Requirements
Architecture
Production Deployment
Troubleshooting

Quick Start

Local Development Setup

Clone Repository

Bash
git clone https://github.com/kliewerdaniel/ITB02
cd ITB02

Create Virtual Environment

Bash
python -m venv venv
source venv/bin/activate      # Linux/Mac
# venv\Scripts\activate       # Windows: run this instead of the line above

Install Dependencies

Bash
pip install -r requirements.txt

# Apple Silicon Special Setup
pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
brew install libjpeg webp

Initialize AI Models

Bash
ollama pull gemma2:27b
ollama pull llava
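
To confirm that both pulls succeeded, you can ask the local Ollama server which models it has. This sketch assumes Ollama is listening on its default port (11434) and uses its `/api/tags` endpoint; the helper names are illustrative and not part of this project.

```python
import json
from urllib.request import urlopen

REQUIRED_MODELS = ("gemma2:27b", "llava")


def normalize(name: str) -> str:
    """Ollama lists untagged pulls as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_response: dict, required=REQUIRED_MODELS) -> list:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from the Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags())` returns an empty list when both models are available.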

Start Services

Bash
# Backend (FastAPI)
uvicorn backend.main:app --reload

# Frontend (new terminal)
cd frontend
npm install && npm run dev

Verify Installation

Bash
curl http://localhost:8000/health
# Expected response: {"status":"healthy"}
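
The backend may take a few seconds to start, so a single `curl` can fail even when nothing is wrong. A small polling helper, sketched here with only the standard library, retries until the health endpoint returns the expected body. The retry count and delay are arbitrary choices, and the function names are illustrative.

```python
import json
import time
from urllib.error import URLError
from urllib.request import urlopen


def is_healthy(body: bytes) -> bool:
    """Return True if the body is a JSON object with status == "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_backend(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urlopen(url, timeout=2) as resp:
                if is_healthy(resp.read()):
                    return True
        except (URLError, OSError):
            pass  # backend not reachable yet; retry after the delay
        time.sleep(delay)
    return False
```

`wait_for_backend()` returns `True` once the service responds as expected and `False` if it never does.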

System Requirements

Python 3.11+
Node.js 18+
Ollama runtime
16GB RAM (24GB+ recommended for GPU acceleration)
10GB+ Disk Space
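
Some of these requirements can be checked automatically before installation. The sketch below uses only the standard library and reports whether each item appears to be satisfied. It does not check the Node.js version or installed RAM, since neither has a portable standard-library check; the function name and report keys are illustrative.

```python
import shutil
import sys


def check_requirements(path: str = ".") -> dict:
    """Report which prerequisites from the list above appear to be met."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python>=3.11": sys.version_info >= (3, 11),
        "node on PATH": shutil.which("node") is not None,
        "ollama on PATH": shutil.which("ollama") is not None,
        "disk>=10GB free": free_gb >= 10,
    }
```

Printing the result, for example with `for name, ok in check_requirements().items(): print(name, ok)`, gives a quick pre-install checklist.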

Architecture

Text
[Frontend] ←HTTP→ [FastAPI]  
                 ↓     ↑  
              [Ollama] ←→ [ChromaDB]  
                 ↓  
              [Redis]  
                 ↓  
            [Celery Workers]

Key Components

Component	Technology Stack	Function
Image Analysis	LLaVA, Pillow	Visual narrative extraction
Story Engine	Gemma2-27B, LangChain	Context-aware chapter generation
Knowledge Base	ChromaDB	Narrative consistency management
API Layer	FastAPI	REST endpoint management
Visualization	ReactFlow, Zustand	Interactive story mapping
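
To make the Image Analysis row concrete, here is a sketch of sending an image to LLaVA through Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. This illustrates the general technique rather than the project's actual code: the function names, prompt handling, and timeout are assumptions, and the real backend may go through LangChain or Pillow preprocessing instead.

```python
import base64
import json
from urllib.request import Request, urlopen


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON payload for a non-streaming Ollama generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request one complete JSON response
    }


def describe_image(
    image_path: str,
    prompt: str,
    base_url: str = "http://localhost:11434",  # Ollama's default address
) -> str:
    """Send an image file to LLaVA and return the generated description."""
    with open(image_path, "rb") as f:
        payload = build_llava_request(f.read(), prompt)
    req = Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

Separating payload construction from the network call makes the request format easy to inspect and test without a running model.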

Production Deployment

Docker Setup

Bash
# Build and launch all services
docker-compose up --build

# Initialize vector store
docker exec -it backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"

Cluster Configuration

YAML
# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'

Troubleshooting

Common Issues

Missing Vector Store

Bash
# Warning: this deletes everything in chroma_db and recreates it empty
rm -rf chroma_db && mkdir chroma_db
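
If the existing store might contain data worth keeping, a gentler alternative is to move it aside rather than delete it. This sketch is a swapped-in variant of the reset command, not the project's own tooling; the function name and the backup naming scheme are assumptions.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def reset_vector_store(store: str = "chroma_db") -> Optional[Path]:
    """Move an existing store aside, then create an empty one.

    Returns the backup path, or None if there was nothing to back up.
    """
    store_path = Path(store)
    backup = None
    if store_path.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = store_path.with_name(f"{store_path.name}.bak-{stamp}")
        shutil.move(str(store_path), str(backup))
    store_path.mkdir(parents=True)
    return backup
```

Once the rebuilt store is confirmed to work, the backup directory can be removed by hand.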

Out-of-Memory Errors

Bash
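# Limit how many models Ollama keeps loaded at once (try 1 if errors persist).
# Set this in the environment of the Ollama server process, then restart it.
export OLLAMA_MAX_LOADED_MODELS=2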

CUDA Compatibility Issues

Bash
pip uninstall torch
# Replace cu117 with the tag that matches your installed CUDA version
pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
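
After reinstalling, it helps to confirm which CUDA build PyTorch was compiled against and whether it can see a GPU. A short check, assuming nothing beyond PyTorch's public `torch.version.cuda` and `torch.cuda.is_available()`; the function name is illustrative.

```python
def torch_cuda_report() -> str:
    """Summarize the installed PyTorch build and whether CUDA is usable."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, built for CUDA {torch.version.cuda}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )
```

A report showing a CUDA build but `CUDA available: False` usually points to a driver or version mismatch rather than a problem with the pip install itself.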

Daniel Kliewer
GitHub Profile
AI Systems Developer
