Building a Multimodal Story Generation System

Daniel Kliewer

Author, Sovereign AI

Tags: AI, Multimodal, Story Generation, Python, LLM
From the Book

This is from Sovereign AI: Building Local-First Intelligent Systems.

Multimodal Story Generation System

License: MIT · Python 3.11+ · Ollama Required

Transform images into structured narratives. The system pairs a vision-language model (LLaVA) with a large language model (Gemma2-27B) to generate multi-chapter stories from images.

Features

  • ๐Ÿ–ผ๏ธ Image Analysis - Extract narrative elements from images using LLaVA
  • ๐Ÿ“– Adaptive Story Generation - Generate 5-chapter stories with Gemma2-27B
  • ๐Ÿง  Context Awareness - Maintain narrative consistency with ChromaDB RAG
  • ๐Ÿ“Š Interactive Visualization - ReactFlow-powered story graph interface
  • ๐Ÿš€ Production Ready - Dockerized microservices architecture
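To make the image analysis feature concrete, here is a minimal sketch of how an image could be sent to LLaVA through Ollama's local HTTP API. It is not taken from the project's code: the function names, the default prompt, and the timeout are assumptions for illustration, and it presumes Ollama is listening on its default port (11434) with the `llava` model already pulled.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for a non-streaming multimodal generate call.

    Hypothetical helper: Ollama expects images as base64-encoded strings.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def analyze_image(
    image_path: str,
    prompt: str = "Describe the setting, characters, and mood of this image.",
) -> str:
    """Send one image to LLaVA and return its text description (hypothetical helper)."""
    with open(image_path, "rb") as f:
        payload = build_llava_request(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # With "stream": False the reply is a single JSON object.
        return json.loads(response.read())["response"]
```

Calling `analyze_image("photo.jpg")` would return LLaVA's description as a string, which could then seed the story prompt.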

Quick Start

Local Development Setup

  1. Clone Repository

    bash
    git clone https://github.com/kliewerdaniel/ITB02
    cd ITB02
  2. Create Virtual Environment

    bash
    python -m venv venv
    source venv/bin/activate   # Linux/macOS
    # On Windows, run this instead:
    # venv\Scripts\activate
  3. Install Dependencies

    bash
    pip install -r requirements.txt

    # Apple Silicon Special Setup
    pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
    brew install libjpeg webp
  4. Initialize AI Models

    bash
    ollama pull gemma2:27b
    ollama pull llava
  5. Start Services

    bash
    # Backend (FastAPI)
    uvicorn backend.main:app --reload

    # Frontend (new terminal)
    cd frontend
    npm install && npm run dev
  6. Verify Installation

    bash
    curl http://localhost:8000/health
    # Expected response: {"status":"healthy"}
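The verification step can also be scripted, for example in a setup script that should wait for the backend before continuing. The sketch below uses only the standard library and assumes the `/health` endpoint returns the JSON shown in the last step; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True when the body is JSON with status == "healthy"."""
    try:
        data = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def backend_is_healthy(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> bool:
    """Query the health endpoint; any connection problem counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and parse_health(response.read())
    except (urllib.error.URLError, OSError):
        return False
```

Running `print(backend_is_healthy())` after starting uvicorn should print `True` if the backend responds as described.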

System Requirements

  • Python 3.11+
  • Node.js 18+
  • Ollama runtime
  • 16GB RAM (24GB+ recommended for GPU acceleration)
  • 10GB+ disk space, plus room for model weights (the gemma2:27b download alone is roughly 16GB)
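A small preflight script can catch missing prerequisites before installation. This is a sketch under assumptions, not part of the project: it checks the Python version, whether `node` and `ollama` are on the PATH, and free disk space. It does not check the Node.js version or installed RAM, which the standard library cannot read portably.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)                # from the requirements list above
REQUIRED_TOOLS = ("node", "ollama")  # executables expected on PATH


def check_python(version: tuple = sys.version_info[:2]) -> bool:
    """True when the given (major, minor) version meets the minimum."""
    return tuple(version) >= MIN_PYTHON


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Names of required executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_disk_gb(path: str = ".") -> float:
    """Free space, in gigabytes, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1e9
```

For example, `print(check_python(), missing_tools(), round(free_disk_gb()))` gives a quick summary of the local environment.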

Architecture

text
[Frontend] ←HTTP→ [FastAPI]
                    ↓ ↑
          [Ollama] ←→ [ChromaDB]
              ↓
           [Redis]
              ↓
      [Celery Workers]
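One way to read the diagram: the API accepts a request, hands long-running work to a queue, and background workers carry it out. The sketch below illustrates that pattern only. It swaps in Python's `queue.Queue` and a thread for Redis and Celery so it runs without any services, and every name in it is hypothetical rather than taken from the project.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stands in for the Redis broker
results = {}           # stands in for a result backend


def submit_story_job(image_path: str) -> str:
    """What an API handler might do: enqueue the work and return a job id at once."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, image_path))
    return job_id


def worker(analyze, generate) -> None:
    """What a worker might do: pull a job, analyze the image, then write the story."""
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut the worker down
            jobs.task_done()
            break
        job_id, image_path = item
        try:
            results[job_id] = generate(analyze(image_path))
        finally:
            jobs.task_done()


def run_demo() -> str:
    """Run one job end to end with stub functions in place of LLaVA and Gemma2."""
    thread = threading.Thread(
        target=worker,
        args=(lambda path: f"scene from {path}", lambda scene: f"Story based on {scene}"),
        daemon=True,
    )
    thread.start()
    job_id = submit_story_job("forest.jpg")
    jobs.put(None)
    jobs.join()                   # wait until the job and the sentinel are processed
    return results[job_id]
```

The point of the pattern is that the HTTP request returns quickly while model inference, which can take minutes on local hardware, proceeds in the background.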

Key Components

| Component      | Technology Stack      | Function                         |
|----------------|-----------------------|----------------------------------|
| Image Analysis | LLaVA, Pillow         | Visual narrative extraction      |
| Story Engine   | Gemma2-27B, LangChain | Context-aware chapter generation |
| Knowledge Base | ChromaDB              | Narrative consistency management |
| API Layer      | FastAPI               | REST endpoint management         |
| Visualization  | ReactFlow, Zustand    | Interactive story mapping        |
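The interplay between the Story Engine and the Knowledge Base can be sketched as a retrieval loop: before each chapter is written, earlier material is looked up and placed in the prompt, and the new chapter is stored for later lookups. The code below is an illustration under assumptions, not the project's implementation. `ToyNarrativeStore` is a hypothetical in-memory stand-in that ranks by shared words instead of embeddings, and `generate` is any callable that turns a prompt into text.

```python
from typing import Callable, List


class ToyNarrativeStore:
    """In-memory stand-in for a vector store; ranks by shared words, not embeddings."""

    def __init__(self) -> None:
        self._docs: List[str] = []

    def add(self, text: str) -> None:
        self._docs.append(text)

    def query(self, text: str, n_results: int = 2) -> List[str]:
        words = set(text.lower().split())
        ranked = sorted(
            self._docs,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:n_results]


def generate_story(scene: str, generate: Callable[[str], str], chapters: int = 5) -> List[str]:
    """Write `chapters` chapters, feeding retrieved earlier material into each prompt."""
    store = ToyNarrativeStore()
    store.add(scene)
    story: List[str] = []
    for number in range(1, chapters + 1):
        context = "\n".join(store.query(scene))
        prompt = (
            f"Scene: {scene}\n"
            f"Relevant earlier material:\n{context}\n"
            f"Write chapter {number} of {chapters}."
        )
        chapter = generate(prompt)
        story.append(chapter)
        store.add(chapter)        # later chapters can retrieve this one
    return story
```

With ChromaDB in place of the toy store, `add` and `query` would roughly correspond to a collection's `add(documents=..., ids=...)` and `query(query_texts=..., n_results=...)` calls, and `generate` would wrap a request to Gemma2 through Ollama.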

Production Deployment

Docker Setup

bash
# Build and launch all services
docker-compose up --build

# Initialize vector store
docker exec -it backend python -c "from backend.core.rag_manager import NarrativeRAG; NarrativeRAG()"

Cluster Configuration

yaml
# docker-compose.yml excerpt
services:
  ollama:
    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '4'

Troubleshooting

Common Issues

  1. Missing Vector Store

    bash
    rm -rf chroma_db && mkdir chroma_db
  2. Out-of-Memory Errors

    bash
    export OLLAMA_MAX_LOADED_MODELS=2
  3. CUDA Compatibility Issues

    bash
    pip uninstall torch
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
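After reinstalling torch for a CUDA issue, it helps to confirm which build is actually active. The helper below is a hypothetical diagnostic, not part of the project; it reports the installed version, the CUDA version the build targets, and whether a GPU is visible, and it degrades gracefully when torch is absent.

```python
def describe_torch() -> str:
    """Summarize the installed torch build, or say that torch is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.version.cuda is None for CPU-only builds.
    built_for = torch.version.cuda or "none (CPU-only build)"
    return (
        f"torch {torch.__version__}, built for CUDA: {built_for}, "
        f"GPU available: {torch.cuda.is_available()}"
    )
```

If `print(describe_torch())` reports a CPU-only build after the reinstall, the CUDA wheel was probably not picked up from the extra index.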

Daniel Kliewer
GitHub Profile
AI Systems Developer

Sovereign AI: Building Local-First Intelligent Systems

by Daniel Kliewer · Paperback · 72 pages

The hands-on guide to building AI that runs on your hardware, keeps your data private, and eliminates cloud dependence. Working code included.