Can AI Help Build a GPU Video Pipeline in Five Days? Here's What I Found Out

Taoufiq Lotfi

Lead Developer

Sep 01, 2025

Technology

My AI-Assisted Coding Challenge

Like a lot of developers, I’ve been curious whether AI coding assistants could take a project from idea to production. This week was my test. The goal: a GPU-powered video pipeline for diarisation, transcription, and face detection.

In one week, I went from concept to a fully functional backend for large‑scale video processing, powered by serverless GPU functions. Leveraging AI‑assisted coding tools such as Claude Code, I rapidly prototyped, tested, and optimised a system capable of diarising speakers, transcribing speech, detecting faces, and generating video clips - all integrated with Supabase.

The lesson: AI can accelerate your progress, but it won’t do the hard thinking for you. Choosing the right models, optimising for GPUs, and pushing past inefficient defaults still required hands-on problem-solving.

Here's how it all went down.

Day 1: Initial Stack and Early Insights

For speed and flexibility, I began with a self‑hosted Next.js application and Supabase in a single codebase. The challenge: process long videos, identify speakers, and produce accurate transcripts.

I explored speaker diarisation (pyannote.audio) and speech‑to‑text (Whisper models), quickly realising that AI assistants often default to older, well‑known tools from their training data. For example, they suggested outdated face recognition models. I opted instead for InsightFace, a modern, high‑accuracy alternative.

Prototyping with Claude Code surfaced another issue: it optimised for my local Apple MPS GPU, incompatible with the NVIDIA GPUs on my deployment platform (Runpod). This reinforced the importance of knowing the target environment before coding.
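One way to avoid that trap is to make device selection explicit rather than letting the assistant hard-code one backend. A minimal sketch, assuming a PyTorch-based stack: the preference order is factored into a pure helper, and in the real pipeline the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA (the Runpod deployment target), then Apple MPS
    (local development), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# In the pipeline this would be called as, e.g.:
#   device = pick_device(torch.cuda.is_available(),
#                        torch.backends.mps.is_available())
```

Keeping the logic in one place means the same code runs on a MacBook and on an NVIDIA worker without edits.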

My initial CPU‑based pipeline took over five hours to process a two‑hour video. Switching to local test files and pre‑computing face encodings in Supabase sped things up, but I still couldn’t get processing time below the video’s own duration (a 1:1 ratio). GPU acceleration became the clear next step.
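The processing ratio above is simply processing time divided by video duration; a tiny helper makes the CPU baseline concrete:

```python
def processing_ratio(processing_minutes: float, video_minutes: float) -> float:
    """Below 1.0 means faster than real time; the CPU pipeline was well above it."""
    return processing_minutes / video_minutes

# Day 1 CPU baseline: five-plus hours for a two-hour video
cpu_ratio = processing_ratio(5 * 60, 2 * 60)  # 2.5x real time
```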

Day 2: Exploring GPU Hosting

I evaluated vast.ai, modal.com, and runpod.io, choosing Runpod serverless functions for flexibility and cost. The planned workflow: a cron job in Next.js triggers a Runpod function to process new videos and update Supabase.
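The cron side of that workflow is a small HTTP call: Runpod serverless endpoints accept a POST to `https://api.runpod.ai/v2/<endpoint_id>/run` with job arguments wrapped under a top-level `input` key. A sketch, with the endpoint id, API key, and field names as illustrative placeholders rather than the project's real values:

```python
import json
import urllib.request

RUNPOD_URL = "https://api.runpod.ai/v2/{endpoint_id}/run"

def build_trigger_payload(video_id: str, bucket_path: str) -> dict:
    """Runpod expects job arguments under a top-level "input" key."""
    return {"input": {"task": "process_video",
                      "video_id": video_id,
                      "bucket_path": bucket_path}}

def build_trigger_request(endpoint_id: str, api_key: str,
                          payload: dict) -> urllib.request.Request:
    """Build the authenticated POST; the caller passes it to urlopen()."""
    return urllib.request.Request(
        RUNPOD_URL.format(endpoint_id=endpoint_id),
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

In my setup the Next.js cron job plays this role; the same request shape applies from any language.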

I designed three GPU‑powered functions:

  1. Process videos – Full pipeline from download to diarisation, transcription, face detection, clip creation, and upload.
  2. Face encoder – Pre‑computes InsightFace encodings for speaker portraits in Supabase, dramatically speeding up detection.
  3. Clip creator – Generates custom clips with optional watermarks, plus transcripts and thumbnails.
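A minimal sketch of how the three functions above could share one Runpod serverless worker, dispatching on a `task` field in the job input. The task implementations here are stubs, and the job schema and function names are assumptions rather than the project's actual code:

```python
def process_videos(args: dict) -> dict:
    # Full pipeline: download -> diarise -> transcribe -> detect faces
    # -> create clips -> upload. Stubbed here.
    return {"status": "ok", "task": "process_videos"}

def encode_faces(args: dict) -> dict:
    # Pre-compute InsightFace encodings for speaker portraits. Stubbed here.
    return {"status": "ok", "task": "encode_faces"}

def create_clip(args: dict) -> dict:
    # Render a clip with optional watermark, transcript, thumbnail. Stubbed here.
    return {"status": "ok", "task": "create_clip"}

TASKS = {
    "process_videos": process_videos,
    "encode_faces": encode_faces,
    "create_clip": create_clip,
}

def handler(job: dict) -> dict:
    """Runpod calls this once per queued job."""
    task = job.get("input", {}).get("task")
    if task not in TASKS:
        return {"error": f"unknown task: {task}"}
    return TASKS[task](job["input"])

# In the deployed worker image, the Runpod SDK would run the event loop:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

Whether to multiplex tasks through one endpoint or deploy three separate ones is a cost/cold-start trade-off; I ultimately designed them as three functions.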

Days 3–4: Implementation with AI Support

With the architecture set, I implemented the Runpod functions, using Claude Code for rapid Python development while actively steering it away from inefficient solutions. I integrated Glitchtip for error monitoring and validated all functions through testing.

Day 5: Optimisation and GPU Integration

Reviewing the AI‑generated code revealed inefficiencies, such as downloading AI models on every request. I added caching and bundled models into the Docker image. I also moved clip creation and transcription segmenting to the GPU.
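The caching fix can be sketched as a per-worker model registry, assuming one long-lived Python process per Runpod worker: each model is loaded once (from weights baked into the Docker image, so loading hits local disk rather than the network) and reused across requests. The loader callables are placeholders for the real Whisper/pyannote/InsightFace loaders.

```python
from typing import Any, Callable

_MODEL_CACHE: dict[str, Any] = {}

def get_model(name: str, loader: Callable[[], Any]) -> Any:
    """Return the cached model for `name`, calling loader() only on
    the first request this worker sees."""
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = loader()
    return _MODEL_CACHE[name]
```

The AI-generated version re-downloaded weights inside the request handler; hoisting the load behind a cache like this (or simply to module import time) removes that cost from every request after the first.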

Performance gains:

  • Face encoder: Initial run on 6,044 portraits – 11 minutes; subsequent updates – seconds.
  • Process videos: Four‑hour video processed end‑to‑end in ~1h 30m (down from 5h on CPU).
  • Clip creator: GPU acceleration significantly reduced rendering and thumbnail generation times.

Key Takeaways

  • AI‑assisted coding accelerates prototyping but requires domain knowledge to select optimal tools and avoid outdated defaults.
  • Deployment context matters. Guide the AI to target the correct hardware and environment.
  • Throwaway code is valuable for validating ideas quickly before refining for production.
  • GPU acceleration can transform processing times for compute‑intensive media tasks.

AI coding assistants: powerful tools, but they still require human oversight

In five days, I delivered a scalable, GPU‑powered backend tightly integrated with Supabase, capable of processing multi‑hour videos efficiently. The combination of AI‑assisted development, rapid prototyping, and targeted optimisation laid a strong foundation for future enhancements and deployment.

The next step is scaling it further, but the real takeaway is clear: with the right mix of AI support and hands-on problem-solving, even ambitious projects can move faster than you think. AI won’t replace developers just yet, but it can help us build and learn faster.

Want to learn more about how AI can help your team deliver faster?