RedDiss

Repository: https://github.com/kliewerdaniel/RedDiss

Behind the Scenes of RedDiss: Crafting AI-Powered Diss Tracks from Reddit

In the ever-evolving landscape of artificial intelligence and social media, innovative projects continually push the boundaries of what’s possible. One such pioneering endeavor is RedDiss, an AI-powered diss track generator developed by Daniel Kliewer. As an entry for the Loco Local LocalLLaMa Hackathon 1.0, RedDiss seamlessly blends Reddit data extraction with cutting-edge AI technologies to produce personalized diss tracks. This blog post delves deep into the architecture, functionalities, and inner workings of RedDiss, offering a comprehensive overview of how this project transforms raw Reddit content into polished auditory art.

Table of Contents

  1. Introduction to RedDiss
  2. Project Architecture
  3. Core Components
  4. Streamlit Front-End
  5. Backend Integration with FastAPI
  6. Testing and Quality Assurance
  7. Installation and Deployment
  8. Conclusion and Future Prospects

Introduction to RedDiss

RedDiss stands at the intersection of social media analytics, natural language processing, and audio engineering. By harnessing the wealth of conversations on Reddit, RedDiss extracts relevant themes and sentiments to craft diss track lyrics tailored to specific Reddit posts or comments. These lyrics are then refined for flow, converted to speech, synchronized with beats, and masterfully processed into a final audio track—all within an intuitive Streamlit application.

Project Architecture

RedDiss is structured to ensure maintainability, scalability, and efficiency. The project repository is organized into several key directories and entry points:

  agents/ - the pipeline stages: scraper.py, sanitizer.py, theme_extractor.py, lyrics_generator.py, flow_refiner.py, tts_engine.py, beat_sync.py, and mastering.py
  data/audio/ - generated vocals, synced tracks, and mastered output
  tests/ - pytest suites such as test_sanitizer.py
  streamlit_app.py - the Streamlit front-end
  main.py - the FastAPI backend

This modular architecture allows each component to operate independently while seamlessly integrating with others, fostering an environment conducive to continuous development and improvement.

Core Components

Let’s explore each core component of RedDiss, understanding its purpose and implementation.

1. Reddit Data Scraper

File: agents/scraper.py

RedDiss begins its magic by tapping into Reddit’s vast repository of posts and comments. Utilizing the asyncpraw library, an asynchronous Reddit API wrapper, the scraper fetches content based on user-provided URLs. Here’s a glimpse into its functionality:

import os
from typing import Any, Dict

import asyncpraw

class RedditScraper:
    def __init__(self):
        # Initialize Reddit client with credentials
        self.reddit = asyncpraw.Reddit(
            client_id=os.getenv("REDDIT_CLIENT_ID"),
            client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
            user_agent=os.getenv("REDDIT_USER_AGENT")
        )
    
    async def extract_post_data(self, url: str) -> Dict[str, Any]:
        # Fetch and process submission data
        submission = await self.reddit.submission(url=url)
        await submission.load()
        # Extract relevant details and comments
        # ...

The scraper loads each submission asynchronously and pulls out only the fields the rest of the pipeline needs, chiefly the title, body text, and comments, keeping the extraction step lean and predictable.
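
The comment traversal itself is elided above; a minimal sketch of a hypothetical helper showing how it might look with asyncpraw (the field names are assumptions, and the actual extraction logic lives in agents/scraper.py):

async def _collect_comments(self, submission, limit: int = 20) -> list:
    # Illustrative only: flatten the comment forest into plain dictionaries
    await submission.comments.replace_more(limit=0)  # drop "load more" stubs
    comments = []
    for comment in submission.comments.list()[:limit]:
        comments.append({
            "author": str(comment.author) if comment.author else "[deleted]",
            "body": comment.body,
            "score": comment.score,
        })
    return comments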

2. Text Sanitization

File: agents/sanitizer.py

Raw Reddit data often contains noise—URLs, markdown formatting, special characters, and more. The sanitizer cleans and normalizes this content, making it suitable for further processing.

async def clean_text(content: Dict[str, Any]) -> Dict[str, Any]:
    # Clean title and main text
    cleaned_data = {
        "title": _clean_string(content["title"]),
        "main_text": _clean_string(content["selftext"]),
        # ...
    }
    # Filter and clean comments
    # ...
    return cleaned_data

This step is crucial for ensuring that subsequent analyses, like theme extraction and lyrics generation, operate on clear and concise text.
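
The _clean_string helper is not shown in the excerpt; here is a minimal regex-based sketch of what such a normalizer could look like. It is an assumed implementation, consistent with the sanitizer tests further below, not necessarily the exact code in agents/sanitizer.py:

import re

def _clean_string(text: str) -> str:
    # Illustrative normalizer: strip URLs, markdown syntax, entities, and extra whitespace
    text = re.sub(r"https?://\S+", "", text)            # remove URLs
    text = re.sub(r"&\w+;", " ", text)                  # drop HTML entities like &amp;
    text = re.sub(r"[\[\]\*_>#`~]", "", text)           # strip common markdown characters
    text = re.sub(r"[^a-zA-Z0-9\s.,!?']", " ", text)    # keep only basic punctuation
    text = re.sub(r"\s+", " ", text).strip()            # collapse whitespace
    return text.lower()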

3. Theme Extraction

File: agents/theme_extractor.py

Understanding the themes and sentiments within the Reddit content is pivotal for generating relevant diss tracks. Leveraging Hugging Face’s transformers library, RedDiss employs a zero-shot classification pipeline to identify dominant themes.

class ThemeExtractor:
    def __init__(self):
        self.classifier = pipeline(
            "zero-shot-classification",
            model="facebook/bart-large-mnli",
            device=-1  # CPU usage
        )
        self.candidate_themes = ["wealth/money", "success/achievements", ...]
    
    async def extract_themes(self, content: Dict[str, Any]) -> Dict[str, Any]:
        # Combine the cleaned title and body text before classification
        main_content = f"{content['title']} {content['main_text']}"
        main_themes = await self._classify_text(main_content)
        # Extract themes from comments
        # ...
        return themes_data

By analyzing both the main content and top comments, the theme extractor ensures a comprehensive understanding of the target’s discourse.
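
Under the hood, _classify_text reduces to a single call against the zero-shot pipeline; a hedged sketch of how that call typically looks with the transformers API (the 0.5 confidence threshold is an assumption):

async def _classify_text(self, text: str) -> list:
    # Illustrative only: score the text against every candidate theme at once
    # (the pipeline call itself is synchronous; the real code may offload it to an executor)
    result = self.classifier(
        text,
        candidate_labels=self.candidate_themes,
        multi_label=True,  # a post can touch several themes
    )
    # Keep themes the classifier is reasonably confident about
    return [
        label for label, score in zip(result["labels"], result["scores"])
        if score > 0.5
    ]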

4. Lyrics Generation

File: agents/lyrics_generator.py

At the heart of RedDiss lies its ability to craft diss track lyrics. Utilizing Llama 3.3 through the litellm library, the generator produces verses tailored to the extracted themes and chosen style.

class LyricsGenerator:
    def __init__(self):
        self.model = "ollama/llama3.3:latest"
    
    async def generate_lyrics(self, themes: Dict[str, Any], style: str) -> Dict[str, Any]:
        context = self._build_context(themes, style)
        lyrics = await self._generate_verses(context)
        structured_lyrics = self._structure_lyrics(lyrics)
        return structured_lyrics

The lyrics are organized into a structured format (verses, chorus, and outro), ensuring a coherent and impactful flow.
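
The _generate_verses step ultimately issues a chat completion to the local Ollama model through litellm; a minimal sketch of what that request could look like (the prompt wording and temperature are assumptions):

from litellm import acompletion

async def _generate_verses(self, context: str) -> str:
    # Illustrative request to the local Llama 3.3 model via litellm / Ollama
    response = await acompletion(
        model=self.model,  # "ollama/llama3.3:latest"
        messages=[
            {"role": "system", "content": "You are a battle rapper writing a diss track."},
            {"role": "user", "content": context},
        ],
        temperature=0.9,  # assumed: keep the verses varied
    )
    return response.choices[0].message.content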

5. Flow Refinement

File: agents/flow_refiner.py

Raw lyrics can benefit from refinement to enhance their rhythmic and rhyming quality. The flow refiner employs Llama 3.3 to polish the generated lyrics, focusing on internal rhyme schemes, wordplay, and punchline effectiveness.

class FlowRefiner:
    def __init__(self):
        self.model = "ollama/llama3.3:latest"
    
    async def refine_flow(self, lyrics: Dict[str, Any], flow_complexity: int) -> Dict[str, Any]:
        refined_lyrics = {}
        for section, content in lyrics.items():
            refined_lyrics[section] = await self._enhance_section(content, section, flow_complexity)
        return refined_lyrics

This iterative process ensures that the diss tracks resonate with the desired intensity and sophistication.
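
How _enhance_section steers the model is not shown; a hypothetical helper sketches one plausible way to fold the flow_complexity setting into the prompt (the wording is an assumption, not the project's actual prompt):

def _build_refinement_prompt(self, content: str, section: str, flow_complexity: int) -> str:
    # Illustrative prompt builder for the flow-refinement pass
    return (
        f"Rewrite the following {section} of a diss track.\n"
        f"Target a flow complexity of {flow_complexity}/10: higher values mean denser "
        f"internal rhymes, more multisyllabic wordplay, and harder punchlines.\n"
        f"Keep the meaning and the number of bars the same.\n\n"
        f"{content}"
    )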

6. Text-to-Speech (TTS) Engine

File: agents/tts_engine.py

Transforming written lyrics into spoken word is achieved through the TTS engine. On macOS, RedDiss leverages the native say command, combined with ffmpeg for audio processing, to generate high-quality vocal tracks.

class TTSEngine:
    def __init__(self):
        # Verify availability of 'say' and 'ffmpeg'
        subprocess.run(['say', '-?'], capture_output=True)
        subprocess.run(['ffmpeg', '-version'], capture_output=True)
    
    async def text_to_speech(self, lyrics: Dict[str, Any]) -> str:
        audio_sections = []
        for section, content in lyrics.items():
            # Generate audio for each section with the macOS 'say' voice
            # (temp_txt, temp_aiff, and temp_wav are temporary file handles, elided here)
            subprocess.run(['say', '-v', 'Daniel', '-r', '220', '-f', temp_txt.name, '-o', temp_aiff.name], check=True)
            # Process with ffmpeg
            subprocess.run(['ffmpeg', '-i', temp_aiff.name, '-af', 'acompressor=...', '-ar', '44100', '-ac', '1', '-y', temp_wav.name], check=True)
            # Normalize and append
            audio_sections.append(audio_array)
        # Combine sections and save
        final_audio = np.concatenate(audio_sections)
        sf.write("data/audio/raw_vocals.wav", final_audio, 44100)
        return "data/audio/raw_vocals.wav"

This component ensures that the diss tracks not only look good on paper but also sound compelling to the ear.
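
Stripped of the section loop and temp-file bookkeeping, the macOS rendering path boils down to two commands; a condensed, hedged version using a hypothetical helper and placeholder file names:

import subprocess
import soundfile as sf

def render_section(text_path: str = "verse.txt") -> str:
    # Illustrative: synthesize one lyrics file with 'say', then convert to mono 44.1 kHz WAV
    subprocess.run(
        ["say", "-v", "Daniel", "-r", "220", "-f", text_path, "-o", "verse.aiff"],
        check=True,
    )
    subprocess.run(
        ["ffmpeg", "-i", "verse.aiff", "-ar", "44100", "-ac", "1", "-y", "verse.wav"],
        check=True,
    )
    audio, sr = sf.read("verse.wav")  # load as a NumPy array for normalization / concatenation
    return "verse.wav"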

7. Beat Synchronization

File: agents/beat_sync.py

No diss track is complete without the right beat. The beat synchronizer aligns the vocal track with the chosen beat, matching tempos so the vocals and instrumental sit together cleanly.

class BeatSynchronizer:
    def __init__(self):
        self.target_tempo = 90  # BPM
    
    async def sync_to_beat(self, vocals_path: str, beat_url: str) -> str:
        # Load vocals and beat
        vocals, sr_vocals = librosa.load(vocals_path)
        beat_path = await self._download_beat(beat_url)
        beat, sr_beat = librosa.load(beat_path)
        # Analyze tempo and synchronize
        # Mix and save the final track
        return "data/audio/synced_track.wav"

By adjusting tempos and aligning beats, this module ensures that the diss tracks maintain a steady and immersive rhythm.
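
The tempo analysis and stretching elided above typically rest on a couple of librosa calls; a hedged sketch of that core logic in a hypothetical helper (the mixing gains are assumptions, and both signals are assumed to share one sample rate):

import librosa
import numpy as np
import soundfile as sf

def align_and_mix(vocals: np.ndarray, beat: np.ndarray, sr: int, target_tempo: float = 90.0) -> str:
    # Illustrative: stretch the vocals toward the target tempo, then mix over the beat
    tempo, _ = librosa.beat.beat_track(y=vocals, sr=sr)        # estimate the vocal tempo
    if tempo > 0:
        vocals = librosa.effects.time_stretch(vocals, rate=target_tempo / tempo)
    length = min(len(vocals), len(beat))                        # trim to the shorter track
    mix = 0.8 * vocals[:length] + 0.6 * beat[:length]           # assumed gain levels
    mix = mix / max(np.max(np.abs(mix)), 1e-9)                  # normalize to avoid clipping
    sf.write("data/audio/synced_track.wav", mix, sr)
    return "data/audio/synced_track.wav"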

8. Audio Mastering

File: agents/mastering.py

The final polish comes from the audio mastering component, which enhances the track’s quality, balances audio levels, and ensures consistency across platforms.

class AudioMaster:
    def __init__(self):
        self.target_lufs = -14.0
        self.target_peak = -1.0
    
    async def master_audio(self, audio_path: str) -> str:
        # Load audio, apply compression, EQ, stereo enhancement, and limiting
        # Save the mastered audio
        sf.write("data/audio/mastered/final_track.wav", processed, sr)
        return "data/audio/mastered/final_track.wav"

This final pass brings each track to a consistent, release-ready loudness and peak level so it translates well across listening platforms.

Streamlit Front-End

File: streamlit_app.py

The user-facing interface of RedDiss is built with Streamlit, offering an intuitive platform for users to generate diss tracks effortlessly.

def main():
    st.title("Reddit Diss Track Generator")
    reddit_url = st.text_input("Enter Reddit Post URL", placeholder="https://reddit.com/r/...")
    style = st.selectbox("Diss Track Style", ["Aggressive", "Playful", "Sarcastic"])
    flow_complexity = st.slider("Flow Complexity", 1, 10, 5)
    beat_intensity = st.slider("Beat Intensity", 1, 10, 5)
    if st.button("Generate Diss Track"):
        # Orchestrate the diss track generation process
        # Display lyrics and audio

This seamless user experience ensures that both novices and experts can harness the power of RedDiss with ease.
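
Whether the Streamlit app drives the agents directly or calls the FastAPI endpoint is not shown above; assuming the latter, the button handler could look roughly like this (the local URL and response fields are assumptions):

import requests
import streamlit as st

# reddit_url, style, and flow_complexity come from the widgets defined in main()
if st.button("Generate Diss Track"):
    with st.spinner("Cooking up the diss track..."):
        resp = requests.get(
            "http://localhost:8000/generate_diss",   # assumed local backend address
            params={"url": reddit_url, "style": style, "flow_complexity": flow_complexity},
            timeout=600,
        )
    if resp.ok:
        result = resp.json()
        st.subheader("Lyrics")
        st.json(result["lyrics"])                     # assumed response shape
        st.audio(result["audio_file"])
    else:
        st.error(f"Generation failed: {resp.text}")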

Backend Integration with FastAPI

File: main.py

RedDiss’s backend is powered by FastAPI, facilitating efficient handling of API requests and orchestrating the diss track generation workflow.

app = FastAPI(title="Diss Track AI", description="AI-powered diss track generator using Reddit content", version="1.0.0")

@app.get("/generate_diss")
async def generate_diss(url: str, style: str, beat_url: Optional[str] = None, flow_complexity: int = 5):
    try:
        # Sequentially execute scraping, sanitization, theme extraction, lyrics generation, flow refinement, TTS, beat sync, and mastering
        return {"status": "success", "lyrics": refined_lyrics, "audio_file": final_track}
    except Exception as e:
        # Handle and log errors
        raise HTTPException(status_code=500, detail={"error": str(e), "step": "unknown"})
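
The elided body of the endpoint chains the agents in order; a hedged sketch of that orchestration using the classes and functions introduced above (per-step error labeling and beat_url handling are assumptions):

# Illustrative pipeline inside the try block of generate_diss
raw = await RedditScraper().extract_post_data(url)
cleaned = await clean_text(raw)
themes = await ThemeExtractor().extract_themes(cleaned)
lyrics = await LyricsGenerator().generate_lyrics(themes, style)
refined_lyrics = await FlowRefiner().refine_flow(lyrics, flow_complexity)
vocals_path = await TTSEngine().text_to_speech(refined_lyrics)
synced_path = await BeatSynchronizer().sync_to_beat(vocals_path, beat_url)
final_track = await AudioMaster().master_audio(synced_path)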

Because the endpoint is fully asynchronous, the backend can serve overlapping requests without blocking, and any failure in the pipeline surfaces as a structured HTTP error rather than a silent crash.

Testing and Quality Assurance

File: tests/test_sanitizer.py

To maintain high-quality outputs, RedDiss incorporates a suite of tests using pytest. For instance, the sanitizer module is rigorously tested to ensure it effectively cleans and preprocesses Reddit content.

@pytest.mark.asyncio
async def test_clean_text():
    test_data = {
        "title": "Test [Post] with http://example.com URLs",
        "selftext": "Some & special characters < here >",
        # ...
    }
    result = await clean_text(test_data)
    assert result["title"] == "test post with urls"
    # Additional assertions

These tests validate the functionality of each component, ensuring that RedDiss operates smoothly and produces accurate results.

Installation and Deployment

File: README.md

Setting up RedDiss is straightforward, guided by comprehensive documentation. Here’s a condensed version of the installation steps:

  1. Clone the Repository
     git clone https://github.com/kliewerdaniel/RedDiss.git
     cd RedDiss
    
  2. Install Dependencies
     pip install -r requirements.txt
    
  3. Set Up Environment Variables: create a .env file in the root directory with your Reddit API credentials:
     REDDIT_CLIENT_ID=your_client_id
     REDDIT_CLIENT_SECRET=your_client_secret
     REDDIT_USER_AGENT=DissTrackAI/1.0.0
    
  4. Run the Streamlit App
     streamlit run streamlit_app.py
    

This streamlined setup ensures that users can quickly get started, tapping into the full potential of RedDiss without unnecessary hurdles.

Conclusion and Future Prospects

RedDiss exemplifies the harmonious integration of data extraction, natural language processing, and audio engineering. By transforming raw Reddit content into personalized diss tracks, it not only showcases the capabilities of modern AI but also underscores the potential for creative applications in digital entertainment.

Daniel Kliewer’s methodical approach—evident in the project’s structured architecture and comprehensive testing—lays a solid foundation for future enhancements. Potential avenues for expansion include incorporating more diverse AI models for lyric generation, enhancing beat synchronization with a broader library of beats, and expanding the application’s reach to other social media platforms.

As AI continues to redefine the boundaries of creativity, projects like RedDiss pave the way for innovative applications that blend technology with artistic expression, offering users unique and personalized experiences in the digital age.