Daniel Kliewer

Building a Multimodal Story Generation System: Complete Setup Guide


Multimodal Story Generation System

License: MIT | Python 3.11+ | Ollama Required

Turn images into structured, multi-chapter stories. The system pairs a vision model (LLaVA), which extracts narrative elements from an image, with a large language model (Gemma2-27B), which writes the chapters. Both models run locally through Ollama.

Features

  • 🖼️ Image Analysis - Extract narrative elements from images using LLaVA
  • 📖 Adaptive Story Generation - Generate 5-chapter stories with Gemma2-27B
  • 🧠 Context Awareness - Maintain narrative consistency with ChromaDB RAG
  • 📊 Interactive Visualization - ReactFlow-powered story graph interface
  • 🚀 Production Ready - Dockerized microservices architecture


Quick Start

Local Development Setup

  1. Clone Repository

    git clone https://github.com/kliewerdaniel/ITB02
    cd ITB02
    
  2. Create Virtual Environment

    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    venv\Scripts\activate     # Windows
    
  3. Install Dependencies

    pip install -r requirements.txt
    
    # Apple Silicon only (optional): PyTorch nightly build plus image libraries used by Pillow
    pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
    brew install libjpeg webp
    
  4. Initialize AI Models

    ollama pull gemma2:27b
    ollama pull llava
    
  5. Start Services

    # Backend (FastAPI)
    uvicorn backend.main:app --reload
    
    # Frontend (new terminal)
    cd frontend
    npm install && npm run dev
    
  6. Verify Installation

    curl http://localhost:8000/health
    # Expected response: {"status":"healthy"}
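The `curl` call in step 6 only checks the backend. A slightly broader check can also confirm that step 4 succeeded by asking Ollama which models are installed. This is a minimal sketch, assuming Ollama listens on its default port 11434 and the backend serves `/health` on port 8000 as shown above; `missing_models` and `normalize` are hypothetical helper names, not part of the repository.

```python
"""Post-install check: is the backend healthy and are the required Ollama models pulled?

Assumptions (not from the repository): Ollama runs on its default port 11434 and
the backend exposes GET /health on port 8000, as in step 6.
"""
import json
import urllib.request

BACKEND_HEALTH_URL = "http://localhost:8000/health"
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's "list local models" endpoint
REQUIRED_MODELS = ("gemma2:27b", "llava")


def normalize(name: str) -> str:
    """Ollama stores untagged pulls as '<name>:latest', so add the tag when it is missing."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [m for m in required if normalize(m) not in installed]


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print("backend:", fetch_json(BACKEND_HEALTH_URL).get("status"))
    missing = missing_models(fetch_json(OLLAMA_TAGS_URL))
    print("missing models:", missing or "none")
```

Matching on `name:tag` matters because `ollama pull llava` is listed as `llava:latest`.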
    

System Requirements

  • Python 3.11+
  • Node.js 18+
  • Ollama runtime
  • 16GB RAM (24GB+ recommended for GPU acceleration)
  • 25GB+ Disk Space (the gemma2:27b and llava model weights alone take about 21GB)
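Before installing, the version requirements above can be checked automatically. The following is a sketch, not part of the repository: it checks the Python and Node.js minimums, only reports free disk space, and skips RAM because measuring it portably needs a third-party package. `preflight` and `parse_node_version` are hypothetical names.

```python
"""Preflight check for the Python and Node.js requirements listed above.

A sketch, not part of the repository: RAM is not checked (doing so portably needs
a third-party package), and free disk space is only reported, not enforced.
"""
import shutil
import subprocess
import sys
from typing import List, Optional, Tuple


def parse_node_version(text: str) -> Tuple[int, ...]:
    """Turn `node --version` output such as 'v18.19.0' into (18, 19, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def preflight(python_version: Tuple[int, int], node_version_text: Optional[str]) -> List[str]:
    """Return a list of unmet requirements (empty when everything passes)."""
    problems = []
    if tuple(python_version) < (3, 11):
        problems.append(f"Python {python_version[0]}.{python_version[1]} found, 3.11+ required")
    if node_version_text is None:
        problems.append("Node.js not found on PATH")
    elif parse_node_version(node_version_text)[0] < 18:
        problems.append(f"Node.js {node_version_text.strip()} found, 18+ required")
    return problems


def installed_node_version() -> Optional[str]:
    """Return the raw `node --version` output, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for line in preflight(sys.version_info[:2], installed_node_version()) or ["all checks passed"]:
        print(line)
    free_gb = shutil.disk_usage(".").free / 1024**3
    print(f"free disk space here: {free_gb:.1f} GB")
```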

Architecture

[Frontend] ←HTTP→ [FastAPI]  
                 ↓     ↑  
              [Ollama] ←→ [ChromaDB]  
                 ↓  
              [Redis]  
                 ↓  
            [Celery Workers]
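To make the FastAPI → Ollama arrow concrete, here is a sketch of how a backend could ask LLaVA for narrative elements through Ollama's `/api/generate` endpoint, which accepts base64-encoded images. It assumes Ollama's default address; the prompt wording and the names `build_vision_request` and `describe_image` are illustrative, not the repository's code.

```python
"""Sketch of the FastAPI -> Ollama leg of the diagram: ask LLaVA to describe an image.

Assumptions: Ollama's default address (http://localhost:11434); the prompt wording
and function names are illustrative, not taken from the repository.
"""
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON body for a single, non-streaming multimodal generation."""
    prompt = ("List the characters, setting, mood and any conflict visible in this image "
              "as short bullet points a storyteller could build on.")
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],  # Ollama expects base64 strings
        "stream": False,  # return one JSON object instead of a stream of chunks
    }


def describe_image(path: str) -> str:
    """Send an image file to Ollama and return the model's text response."""
    with open(path, "rb") as f:
        payload = build_vision_request(f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:  # large models can be slow to answer
        return json.load(resp)["response"]
```

For example, `describe_image("scene.jpg")` (a hypothetical file) would return LLaVA's bullet list as one string, ready to feed into the story engine.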

Key Components

| Component      | Technology Stack      | Function                         |
|----------------|-----------------------|----------------------------------|
| Image Analysis | LLaVA, Pillow         | Visual narrative extraction      |
| Story Engine   | Gemma2-27B, LangChain | Context-aware chapter generation |
| Knowledge Base | ChromaDB              | Narrative consistency management |
| API Layer      | FastAPI               | REST endpoint management         |
| Visualization  | ReactFlow, Zustand    | Interactive story mapping        |
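The Story Engine and Knowledge Base components work together: each chapter is written with relevant earlier material retrieved from the store, then added back to it. The sketch below illustrates that loop under loud assumptions. All names are hypothetical, `generate` is any prompt-to-text callable (for instance a wrapper around Ollama's gemma2:27b), and `ToyNarrativeMemory` stands in for ChromaDB by ranking stored text with simple word overlap rather than embedding similarity.

```python
"""Sketch of context-aware chapter generation with a retrieval step.

Hypothetical names throughout. `generate` maps a prompt to text; ToyNarrativeMemory
stands in for ChromaDB and ranks documents by word overlap, not embeddings.
"""
import re
from typing import Callable, List


def _words(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))


class ToyNarrativeMemory:
    """In-memory stand-in for a vector store: add documents, query by overlap."""

    def __init__(self) -> None:
        self.docs: List[str] = []

    def add(self, text: str) -> None:
        self.docs.append(text)

    def query(self, text: str, k: int = 2) -> List[str]:
        q = _words(text)
        # sorted() is stable, so ties keep insertion order (oldest first)
        ranked = sorted(self.docs, key=lambda d: len(q & _words(d)), reverse=True)
        return ranked[:k]


def generate_story(elements: str, generate: Callable[[str], str], chapters: int = 5) -> List[str]:
    """Write `chapters` chapters, retrieving earlier material before each one."""
    memory = ToyNarrativeMemory()
    memory.add(elements)
    story: List[str] = []
    for n in range(1, chapters + 1):
        query = elements + " " + (story[-1] if story else "")
        context = "\n".join(memory.query(query))
        prompt = (f"Story elements and earlier events:\n{context}\n\n"
                  f"Write chapter {n} of {chapters}, consistent with the above.")
        chapter = generate(prompt)
        memory.add(chapter)  # later chapters can retrieve this one
        story.append(chapter)
    return story
```

Swapping the toy store for a real vector database changes only the `add` and `query` calls; the generate-store-retrieve loop stays the same.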

Production Deployment

Docker Setup

# Build and launch all services in the background
docker-compose up --build -d

# Initialize the vector store (once the backend container is running)
docker exec -it backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"
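Containers can report "started" before the API is ready, so a one-off command like the vector store initialization may fail if run too early. A small wait loop helps. This is a sketch under assumptions: the health URL mirrors the Quick Start check, the retry limits are arbitrary, and `wait_until_healthy` and `http_probe` are hypothetical names. The probe and sleep functions are injectable so the retry logic can be tested without Docker.

```python
"""Wait for a service to report healthy before running one-off setup commands.

Assumptions: the health URL matches the Quick Start check; retry limits are
arbitrary defaults. probe/sleep are injectable for testing.
"""
import http.client
import json
import time
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """True if the URL answers with {"status": "healthy"}; any failure counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return json.load(resp).get("status") == "healthy"
    except (OSError, ValueError, http.client.HTTPException):  # URLError is an OSError subclass
        return False


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 30,
                       delay: float = 1.0, backoff: float = 1.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it succeeds, sleeping with capped exponential backoff."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * backoff, 10.0)  # never wait more than 10 s between tries
    return False


if __name__ == "__main__":
    ok = wait_until_healthy(lambda: http_probe("http://localhost:8000/health"))
    raise SystemExit(0 if ok else 1)
```

Saved as a script, it exits with status 0 once the backend is healthy, so it can gate the initialization command in a shell pipeline with `&&`.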

Cluster Configuration

# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'
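A memory limit only helps if the model actually fits under it. The sketch below compares a Compose memory value with a model's size. Assumptions: unit suffixes follow the Compose byte-value format (b, k/kb, m/mb, g/gb), read as binary multiples the way Docker interprets them; the model size is the `size` field (bytes) Ollama reports per model from `GET /api/tags`; and the 1.2 headroom factor is an arbitrary allowance for runtime overhead such as the context window. The function names are hypothetical.

```python
"""Check whether a model fits under the compose memory limit for the ollama service.

Assumptions: Compose byte units (b, k/kb, m/mb, g/gb) as binary multiples; model
size in bytes from Ollama's /api/tags; headroom factor chosen arbitrarily.
"""
import re

_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_mem_limit(value: str) -> int:
    """Convert a value such as '12G' or '512mb' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([bkmg])b?\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def fits(limit: str, model_size_bytes: int, headroom: float = 1.2) -> bool:
    """True if the model, padded by the headroom factor, stays within the limit."""
    return model_size_bytes * headroom <= parse_mem_limit(limit)
```

Passing the `size` reported for `gemma2:27b` together with the `memory` value from the excerpt shows quickly whether the limit needs raising.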

Troubleshooting

Common Issues

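  1. Missing or Corrupted Vector Store

    # Warning: this deletes all stored narrative data; re-run the initialization step afterwards
    rm -rf chroma_db && mkdir -p chroma_db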
    
  2. Out-of-Memory Errors

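    # Keep only one model in memory at a time. Set this in the environment
    # of the Ollama server process, then restart the server.
    export OLLAMA_MAX_LOADED_MODELS=1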
    
  3. CUDA Compatibility Issues

    pip uninstall -y torch
    # cu117 targets CUDA 11.7; use the tag matching your CUDA setup (e.g. cu118, cu121)
    pip install torch --index-url https://download.pytorch.org/whl/cu117
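The three issues above can be narrowed down with a quick diagnostic script. This is a sketch, not part of the repository: it assumes the vector store lives in `./chroma_db`, only reports the `OLLAMA_MAX_LOADED_MODELS` value visible to the current shell (the Ollama server sees it only if it was started from that environment), and checks CUDA only when PyTorch is installed. `diagnose` and `torch_cuda_status` are hypothetical names.

```python
"""Quick diagnostics for the troubleshooting items above (a sketch, not part of the repo).

Assumes the vector store directory is ./chroma_db; PyTorch is optional.
"""
import importlib.util
import os
from pathlib import Path
from typing import List, Mapping


def diagnose(chroma_dir: Path, env: Mapping[str, str]) -> List[str]:
    """Report vector store presence and the model-loading limit seen in `env`."""
    notes = []
    if not chroma_dir.is_dir():
        notes.append(f"missing vector store directory: {chroma_dir}")
    limit = env.get("OLLAMA_MAX_LOADED_MODELS")
    notes.append(f"OLLAMA_MAX_LOADED_MODELS={limit if limit is not None else 'unset (Ollama default)'}")
    return notes


def torch_cuda_status() -> str:
    """Describe whether the installed PyTorch build can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the script also runs without PyTorch
    if torch.cuda.is_available():
        return f"CUDA available (torch built for CUDA {torch.version.cuda})"
    return f"CUDA not available (torch.version.cuda={torch.version.cuda})"


if __name__ == "__main__":
    for line in diagnose(Path("chroma_db"), os.environ):
        print(line)
    print(torch_cuda_status())
```

A `torch.version.cuda` of `None` means a CPU-only wheel is installed, which is the usual cause of the CUDA item above.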
    

Daniel Kliewer
AI Systems Developer