20 Fun AI Projects
Theory without practice is empty. This chapter is your playground: a collection of hands-on projects designed to be genuinely fun to build, surprisingly educational, and impressive enough to show off. Each project is self-contained, with clear instructions on what to build, which concepts it teaches, and how to get started. No project requires more than a single GPU (and several need no GPU at all).
If you remember one thing from this book, let it be this: you learn AI by building things, not by reading about building things. Every project in this chapter will teach you more than five chapters of theory. Pick the one that excites you most and start building today.
20.1 AI Dungeon Master: A Text Adventure Game
Build an interactive text-based adventure game powered by an LLM. The model serves as a dungeon master, generating vivid descriptions, NPC dialogue, combat outcomes, and consequences of player actions.
What you will learn: Prompt engineering, context management, system prompts, maintaining coherent state over long interactions, structured output parsing.
How to build it:
- Write a detailed system prompt that defines the world, rules, and the AI's role as narrator.
- Implement a conversation loop that sends player actions to the LLM and displays responses.
- Maintain a “world state” summary that gets injected into each prompt (character stats, inventory, location, recent events).
- Add structured output parsing: extract HP changes, item pickups, and location transitions from the LLM's narrative response.
Stretch goal: Add memory via RAG. Store important events in a vector database and retrieve relevant history when the player revisits locations or encounters recurring characters.
Tools: Ollama or any LLM API, Python, optionally ChromaDB for memory.
This seemingly playful project teaches the exact same skills used in production AI agents: managing context windows, maintaining state, parsing structured outputs, and handling the unpredictability of generative models. Many professional AI engineers got their start building exactly this kind of system.
20.2 Personal Knowledge Base with RAG
Build a chatbot that answers questions about your documents: notes, textbooks, PDFs, code repositories. This is one of the most practically useful projects you can build, and it teaches the full RAG (Retrieval-Augmented Generation) pipeline (Lewis et al. 2020).
What you will learn: Embeddings, vector databases, chunking strategies, retrieval algorithms, context injection, and the art of balancing retrieval quality with context window limits.
How to build it:
- Collect your documents and convert them to plain text (use PyPDF2 for PDFs, markdown parsers for notes).
- Split documents into chunks (experiment with chunk sizes: 256, 512, 1024 tokens).
- Embed each chunk using an embedding model (e.g.,
all-MiniLM-L6-v2from Sentence Transformers, or OpenAI'stext-embedding-3-small). - Store embeddings in a vector database (ChromaDB is the simplest to set up, FAISS for more control).
- At query time, embed the user's question, retrieve the top-\(k\) most relevant chunks, and inject them into the LLM's prompt.
Stretch goals: Add hybrid search (combine vector similarity with BM25 keyword matching). Add a reranker (e.g., Cohere Rerank or a cross-encoder model) to improve retrieval quality. Build a web UI with Gradio or Streamlit.
Tools: LlamaIndex (Liu 2023) or LangChain (Chase 2023), ChromaDB or FAISS, any embedding model, any LLM.
20.3 Train Your Own Tiny Language Model
This is the project that Andrej Karpathy made famous with his “Let's build GPT from scratch” video. Train a character-level language model on Shakespeare, then scale up to a BPE tokenizer and a subset of the web.
What you will learn: The transformer architecture from the ground up, tokenization, training loops, loss curves, sampling strategies, the relationship between model size and data size.
How to build it:
- Clone nanoGPT (
github.com/karpathy/nanoGPT) or start from Karpathy's YouTube tutorial. - Train on Shakespeare first (it is small and you will see results in minutes).
- Observe: at step 100, the model outputs random characters. At step 1000, it learns word boundaries. At step 5000, it generates plausible (if nonsensical) Shakespearean English. At step 10000+, it starts producing genuinely coherent passages.
- Experiment: change the number of layers, the embedding dimension, the number of heads. How does each change affect training speed and output quality?
- Scale up: switch to a BPE tokenizer and train on a larger corpus (Wikipedia, a book collection).
Stretch goal: Fine-tune your tiny model on a specific author's writing style. Can a 10M parameter model learn to write like Hemingway vs. Tolkien?
Tools: PyTorch, nanoGPT or litgpt, a single GPU (or even CPU for tiny models).
Andrej Karpathy's YouTube series “Neural Networks: Zero to Hero” is the single best resource for understanding transformers from first principles. It takes you from basic neural networks through backpropagation, character-level models, and finally to a full GPT implementation. If you have not watched it, this project is your excuse to start.
20.4 Image Generation with Stable Diffusion
Set up a local Stable Diffusion pipeline and become a prompt engineer for images. Generate art, experiment with styles, and learn how diffusion models work from the inside.
What you will learn: Diffusion models, latent spaces, CLIP text conditioning, classifier-free guidance, ControlNet, img2img, LoRA fine-tuning.
How to build it:
- Install the
diffuserslibrary from Hugging Face and download a Stable Diffusion model (SDXL or SD 1.5). - Generate images from text prompts. Experiment with negative prompts (“blurry, low quality, distorted”) to improve quality.
- Try img2img: provide a rough sketch and let the model transform it into a detailed image.
- Install ControlNet for structural conditioning: generate images that follow a specific pose, edge map, or depth map.
- (Advanced) Train a LoRA on 10 to 20 images of a specific subject (your face, your pet, a particular art style).
Alternative: Use ComfyUI for a visual node-based workflow, or AUTOMATIC1111's WebUI for a feature-rich GUI.
Tools: diffusers, ComfyUI, or AUTOMATIC1111. Requires 8+ GB VRAM for SDXL.
20.5 AI Music Generation
Generate short music clips from text descriptions using Meta's MusicGen or Google's MusicLM. “An upbeat jazz piano solo” becomes an actual audio clip.
What you will learn: Audio tokenization (EnCodec), autoregressive generation over discrete audio codes, multimodal conditioning, the difference between audio and text generation.
How to build it:
- Install Meta's
audiocraftlibrary. - Generate music from text prompts at different durations (5s, 15s, 30s).
- Experiment with melody conditioning: hum a melody, and let MusicGen generate a full arrangement.
- Compare different model sizes (small, medium, large). How does model size affect musical quality?
Tools: audiocraft (Meta), Hugging Face Transformers.
20.6 AI-Powered Code Review Bot
Create a bot that automatically reviews pull requests on GitHub, identifying potential bugs, style issues, and security vulnerabilities.
What you will learn: API integration, structured diff parsing, prompt engineering for code analysis, the challenges of applying LLMs to real-world workflows.
How to build it:
- Set up a GitHub webhook that triggers on pull request events.
- Parse the diff to extract changed files and line numbers.
- Send the diff to an LLM with a system prompt that defines your coding standards and asks for specific feedback.
- Parse the LLM's response and post inline comments on the PR via the GitHub API.
Tools: GitHub API, any LLM API, Python (Flask or FastAPI for the webhook handler).
20.7 Voice Cloning
Clone a voice from a few minutes of audio, then generate speech in that voice from arbitrary text.
What you will learn: Speaker embeddings, mel spectrograms, vocoders, few-shot voice cloning, and the ethical implications of synthetic voice technology.
How to build it:
- Record 3 to 5 minutes of clear speech (or use a public audio sample).
- Use a voice cloning model (OpenVoice, Coqui TTS, or Bark) to clone the voice.
- Generate speech from new text in the cloned voice.
- Compare the clone quality across different models and amounts of training audio.
Voice cloning technology raises serious ethical questions. Never clone someone's voice without their explicit consent. Be aware that this technology can be (and has been) used for fraud, deepfakes, and impersonation. Many jurisdictions are developing laws around synthetic media. Use this project to understand the technology and its implications, not to deceive.
Tools: Coqui TTS, Bark, OpenVoice.
20.8 AI Art Style Transfer
Apply the artistic style of one image to the content of another. Turn your vacation photos into Van Gogh paintings, or apply the aesthetic of a Studio Ghibli film to your selfies.
What you will learn: Feature extraction with CNNs, Gram matrices, perceptual loss, and the difference between content and style representations.
How to build it:
- Implement classical neural style transfer using a pre-trained VGG network and PyTorch.
- Try modern approaches: style LoRAs with Stable Diffusion, or IP-Adapter for style conditioning.
- Compare the results: classical style transfer produces painterly effects, while diffusion-based approaches can create more creative interpretations.
Tools: PyTorch, diffusers, pre-trained VGG-19.
20.9 Exercises
- Build the text adventure game and play through at least 50 turns. Document three failure modes (where the AI breaks character, contradicts itself, or loses track of the world state) and implement fixes for each.
- Create a RAG chatbot over your course notes or a textbook. Ask it 20 factual questions and score each answer as correct, partially correct, or incorrect. What is the accuracy? Identify the most common failure mode and try to fix it.
- Train a character-level GPT on a corpus of your choice (song lyrics, fan fiction, legal documents, cooking recipes). Generate samples at checkpoints and create a “gallery” showing how output quality evolves during training.
- Generate 20 images with Stable Diffusion using the same base prompt but different random seeds. How much variation do you see? Now try the same prompt with different guidance scale values (2, 5, 7.5, 15, 30). How does guidance scale affect quality and diversity?
- Combine two projects: build a text adventure game that generates images of each scene using Stable Diffusion. The game describes a forest clearing, and you see a generated image of a forest clearing alongside the text.