Multimodal Story Generation System

License: MIT

Python 3.11+

Ollama Required

Turn images into structured, multi-chapter stories. The system uses a vision-language model (LLaVA) to extract narrative elements from an image, then a large language model (Gemma2-27B) to write the chapters, with ChromaDB retrieval keeping the narrative consistent across them.

Features

  • 🖼️ Image Analysis - Extract narrative elements from images using LLaVA
  • 📖 Adaptive Story Generation - Generate 5-chapter stories with Gemma2-27B
  • 🧠 Context Awareness - Maintain narrative consistency with ChromaDB RAG
  • 📊 Interactive Visualization - ReactFlow-powered story graph interface
  • 🚀 Production Ready - Dockerized microservices architecture
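As a rough illustration of the image-analysis step, the sketch below sends one image to LLaVA through Ollama's REST endpoint /api/generate on its default port 11434, using only the Python standard library. It is not the project's backend code: the function names and the DEFAULT_PROMPT wording are hypothetical, and the real service may use a client library or a different prompt.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address
# Hypothetical prompt; the project's actual prompt is not documented here.
DEFAULT_PROMPT = (
    "Describe the setting, characters, mood, and any conflict suggested by this image."
)


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build a non-streaming /api/generate payload carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def describe_image(path: str, prompt: str = DEFAULT_PROMPT) -> str:
    """Send the image at `path` to LLaVA and return the generated description."""
    with open(path, "rb") as f:
        payload = build_llava_request(f.read(), prompt)
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    # Vision inference on CPU can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]
```

With Ollama running and llava pulled, describe_image("photo.jpg") returns the model's text description. Setting stream to False trades incremental output for a single JSON reply, which keeps parsing to one json.loads call.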

Quick Start

Local Development Setup

  1. Clone Repository
    git clone https://github.com/kliewerdaniel/ITB02
    cd ITB02
    
  2. Create Virtual Environment
    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    venv\Scripts\activate     # Windows
    
  3. Install Dependencies
    pip install -r requirements.txt
       
    # Apple Silicon Special Setup
    pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    brew install libjpeg webp
    
  4. Initialize AI Models
    ollama pull gemma2:27b
    ollama pull llava
    
  5. Start Services
    # Backend (FastAPI)
    uvicorn backend.main:app --reload
    
    # Frontend (new terminal)
    cd frontend
    npm install && npm run dev
    
  6. Verify Installation
    curl http://localhost:8000/health
    # Expected response: {"status":"healthy"}
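Steps 4 and 6 can also be checked from a script. This minimal sketch assumes the default ports used above (8000 for FastAPI, 11434 for Ollama) and reads Ollama's /api/tags endpoint, which lists locally pulled models; the helper names are hypothetical and not part of the repository.

```python
import json
import urllib.request

BACKEND_URL = "http://localhost:8000"
OLLAMA_URL = "http://localhost:11434"
REQUIRED_MODELS = ("gemma2:27b", "llava")


def missing_models(tags: dict, required=REQUIRED_MODELS) -> list:
    """Return required models absent from an Ollama /api/tags response."""
    installed = {m.get("name") for m in tags.get("models", [])}
    # "ollama pull llava" is stored as "llava:latest", so accept either form.
    return [r for r in required if r not in installed and f"{r}:latest" not in installed]


def get_json(url: str, timeout: float = 5.0):
    """Fetch a JSON endpoint; return None if it is unreachable or not valid JSON."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read())
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSError subclasses
        return None


def verify() -> list:
    """Run both checks and return a list of problems (empty means success)."""
    problems = []
    health = get_json(f"{BACKEND_URL}/health")
    if not (isinstance(health, dict) and health.get("status") == "healthy"):
        problems.append(f"backend health check failed: {health!r}")
    tags = get_json(f"{OLLAMA_URL}/api/tags")
    if not isinstance(tags, dict):
        problems.append("Ollama is not reachable")
    else:
        problems += [f"model not pulled: {name}" for name in missing_models(tags)]
    return problems


if __name__ == "__main__":
    issues = verify()
    print("\n".join(issues) if issues else "All checks passed")
```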
    

System Requirements

  • Python 3.11+
  • Node.js 18+
  • Ollama runtime
  • 16GB RAM (24GB+ recommended for GPU acceleration)
  • 25GB+ Disk Space (the gemma2:27b and llava model downloads alone total about 21GB)
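A quick preflight check for the requirements above might look like this. It is a hypothetical helper, not part of the repository: it checks the Python version and free disk space, and only confirms that node and ollama are on PATH (it does not check the Node.js version or installed RAM).

```python
import shutil
import sys


def preflight(min_free_gb: float, path: str = ".", min_python: tuple = (3, 11)) -> list:
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free at {path!r}, need {min_free_gb} GB")
    # Presence check only: this does not verify the Node.js version or installed RAM.
    for tool in ("node", "ollama"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Call preflight with min_free_gb set to the disk-space figure in the list and print whatever it returns.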

Architecture

[Frontend] ←HTTP→ [FastAPI]
                 ↓     ↑
              [Ollama] ←→ [ChromaDB]
                 ↓
              [Redis]
                 ↓
            [Celery Workers]
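The diagram does not show how work reaches Redis and Celery. A common pattern, assumed here rather than taken from the project, is to enqueue long-running generation jobs so the API can return a job id immediately while workers do the slow model calls. The sketch models that flow with standard-library stand-ins: a queue.Queue in place of the Redis broker, a thread in place of a Celery worker, and a dict as the result store. fake_generate is a hypothetical placeholder for the model calls.

```python
import queue
import threading
import uuid

jobs = queue.Queue()       # stand-in for the Redis broker
results = {}               # stand-in for a result backend, keyed by job id
results_lock = threading.Lock()


def fake_generate(image_path: str) -> str:
    """Hypothetical placeholder for the LLaVA and Gemma2 calls a worker would make."""
    return f"story for {image_path}"


def submit(image_path: str) -> str:
    """What an API endpoint might do: record the job, enqueue it, return its id at once."""
    job_id = uuid.uuid4().hex
    with results_lock:
        results[job_id] = {"status": "queued"}
    jobs.put((job_id, image_path))
    return job_id


def worker() -> None:
    """What a Celery worker might do: process jobs until a None sentinel arrives."""
    while (item := jobs.get()) is not None:
        job_id, image_path = item
        story = fake_generate(image_path)
        with results_lock:
            results[job_id] = {"status": "done", "story": story}
        jobs.task_done()
    jobs.task_done()  # mark the sentinel itself as handled


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    job_id = submit("forest.jpg")
    jobs.join()              # block until the worker has finished every queued job
    print(results[job_id])   # {'status': 'done', 'story': 'story for forest.jpg'}
    jobs.put(None)           # tell the worker to exit
    thread.join()
```

In a real deployment the client would poll a status endpoint instead of calling jobs.join(), which is why submit records a "queued" status up front.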

Key Components

Component        Technology Stack        Function
Image Analysis   LLaVA, Pillow           Visual narrative extraction
Story Engine     Gemma2-27B, LangChain   Context-aware chapter generation
Knowledge Base   ChromaDB                Narrative consistency management
API Layer        FastAPI                 REST endpoint management
Visualization    ReactFlow, Zustand      Interactive story mapping
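Context-aware chapter generation can be sketched as a loop that retrieves related earlier material before each chapter and stores each new chapter for later retrieval. This is an illustration, not the project's LangChain pipeline: ToyNarrativeStore is a hypothetical in-memory stand-in for the ChromaDB collection that ranks passages by bag-of-words cosine similarity instead of learned embeddings, and generate is any callable that sends a prompt to Gemma2-27B (for example, a POST to Ollama's /api/generate with model "gemma2:27b").

```python
import re
from collections import Counter
from math import sqrt


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyNarrativeStore:
    """Hypothetical in-memory stand-in for a ChromaDB collection."""

    def __init__(self) -> None:
        self.passages: list = []

    def add(self, text: str) -> None:
        self.passages.append(text)

    def query(self, text: str, k: int = 2) -> list:
        """Return the k stored passages most similar to `text`."""
        q = _bag_of_words(text)
        ranked = sorted(
            self.passages, key=lambda p: _cosine(q, _bag_of_words(p)), reverse=True
        )
        return ranked[:k]


def generate_story(image_description: str, generate, chapters: int = 5, k: int = 2) -> list:
    """Write `chapters` chapters; `generate(prompt) -> str` wraps the LLM call."""
    store = ToyNarrativeStore()
    store.add(image_description)
    story = []
    for n in range(1, chapters + 1):
        # Retrieve material related to the latest text so recurring details stay consistent.
        latest = story[-1] if story else image_description
        context = "\n".join(store.query(latest, k))
        prompt = (
            f"Write chapter {n} of {chapters} of a story based on an image.\n"
            f"Relevant earlier material:\n{context}\n\nChapter {n}:"
        )
        chapter = generate(prompt)
        store.add(chapter)  # later chapters can retrieve this one
        story.append(chapter)
    return story
```

Injecting generate keeps the loop testable without a running model; replacing ToyNarrativeStore with a real vector store only requires matching its add and query methods.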

Production Deployment

Docker Setup

# Build and launch all services in the background
docker-compose up --build -d

# Initialize the vector store inside the running backend service
docker-compose exec backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"

Resource Limits

# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'

Troubleshooting

Common Issues

  1. Missing Vector Store
    rm -rf chroma_db && mkdir chroma_db
    
  2. Out-of-Memory Errors
    export OLLAMA_MAX_LOADED_MODELS=2  # set where ollama serve runs, then restart it
    
  3. CUDA Compatibility Issues
    pip uninstall torch
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    

Daniel Kliewer
GitHub Profile
AI Systems Developer