Multimodal Story Generation System

License: MIT | Python 3.11+ | Ollama required

Turn images into structured, multi-chapter stories. The system pairs a vision model (LLaVA) for image analysis with a large language model (Gemma 2 27B) for chapter generation, both served through Ollama.

Features

Quick Start

Local Development Setup

  1. Clone Repository
    git clone https://github.com/kliewerdaniel/ITB02
    cd ITB02
    
  2. Create Virtual Environment
    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    venv\Scripts\activate     # Windows
    
  3. Install Dependencies
    pip install -r requirements.txt
       
    # Apple Silicon only: nightly PyTorch build and image codec libraries
    pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    brew install libjpeg webp
    
  4. Initialize AI Models
    ollama pull gemma2:27b
    ollama pull llava
    
  5. Start Services
    # Backend (FastAPI)
    uvicorn backend.main:app --reload
    
    # Frontend (new terminal)
    cd frontend
    npm install && npm run dev
    
  6. Verify Installation
    curl http://localhost:8000/health
    # Expected response: {"status":"healthy"}
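
The curl check in step 6 fails if it runs before uvicorn has finished starting. A minimal sketch of a polling version follows, using only the standard library. The function names `is_healthy` and `wait_for_backend` and the retry settings are illustrative assumptions, not part of the project. Only the URL and the expected `{"status":"healthy"}` body come from the step above.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # endpoint from step 6


def is_healthy(body: bytes) -> bool:
    """Return True if the body is a JSON object whose "status" is "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_backend(url: str = HEALTH_URL, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if is_healthy(response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # backend not accepting connections yet, or returned an error
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_backend()` from a setup script returns `True` once the backend answers as expected and `False` after the last attempt, so the caller can decide whether to abort.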
    

System Requirements

Architecture

[Frontend] ←HTTP→ [FastAPI]  
                 ↓     ↑  
              [Ollama] ←→ [ChromaDB]  
                 ↓  
              [Redis]  
                 ↓  
            [Celery Workers]

Key Components

Component        Technology Stack        Function
Image Analysis   LLaVA, Pillow           Visual narrative extraction
Story Engine     Gemma2-27B, LangChain   Context-aware chapter generation
Knowledge Base   ChromaDB                Narrative consistency management
API Layer        FastAPI                 REST endpoint management
Visualization    ReactFlow, Zustand      Interactive story mapping
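
To illustrate how the Image Analysis and Story Engine components could hand off to each other, here is a rough sketch that calls Ollama's `/api/generate` HTTP endpoint directly. It is not the project's implementation: the table lists LangChain and ChromaDB for the real pipeline, and this sketch leaves both out. The function names, prompts, and the `previous_summary` parameter are hypothetical. The address assumes Ollama is listening on its default port, 11434.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def build_vision_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build a non-streaming generate payload with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def build_chapter_request(description: str, previous_summary: str = "",
                          model: str = "gemma2:27b") -> dict:
    """Build a payload asking the story model for the next chapter (prompt is illustrative)."""
    prompt = (
        "Story so far: " + (previous_summary or "(none)") + "\n"
        "Scene description: " + description + "\n"
        "Write the next chapter of the story."
    )
    return {"model": model, "prompt": prompt, "stream": False}


def call_ollama(payload: dict, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


def image_to_chapter(image_path: str, previous_summary: str = "") -> str:
    """Describe the image with the vision model, then turn that into a chapter."""
    with open(image_path, "rb") as handle:
        image_bytes = handle.read()
    description = call_ollama(
        build_vision_request(image_bytes, "Describe this image in detail.")
    )
    return call_ollama(build_chapter_request(description, previous_summary))
```

Keeping the payload builders separate from the network call makes the prompts easy to inspect or unit test without a running Ollama server.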

Production Deployment

Docker Setup

# Build and launch all services
docker-compose up --build

# Initialize vector store
docker exec -it backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"

Cluster Configuration

# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'

Troubleshooting

Common Issues

  1. Missing Vector Store
    rm -rf chroma_db && mkdir chroma_db
    
  2. Out-of-Memory Errors
    # Limit how many models Ollama keeps loaded at once (set before starting the Ollama server)
    export OLLAMA_MAX_LOADED_MODELS=2
    
  3. CUDA Compatibility Issues
    # Reinstall PyTorch from the wheel index that matches your CUDA version (cu117 = CUDA 11.7)
    pip uninstall torch
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
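
Two of the fixes above can be scripted from Python. The sketch below is an assumption-laden helper, not part of the repository: `reset_vector_store` and `cuda_report` are hypothetical names. The first mirrors the `rm -rf chroma_db && mkdir chroma_db` fix for a directory. The second reports which CUDA version the installed PyTorch was built against, which helps pick the right wheel index before reinstalling.

```python
import shutil
from pathlib import Path


def reset_vector_store(path: str = "chroma_db") -> Path:
    """Delete the vector store directory if present and recreate it empty."""
    store = Path(path)
    if store.exists():
        shutil.rmtree(store)
    store.mkdir(parents=True)
    return store


def cuda_report() -> dict:
    """Summarize the installed PyTorch build; works even if torch is missing."""
    try:
        import torch
    except ImportError:
        return {
            "torch_installed": False,
            "torch_version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
```

If `cuda_report()` shows a `cuda_build` value but `cuda_available` is `False`, the installed wheel and the local driver likely disagree, which is the case the reinstall in item 3 addresses.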
    

Daniel Kliewer
GitHub Profile
AI Systems Developer