Pokémon With Arguments — How BattleTalk Was Built
#AI 03.13.2026 — 6 MIN READ

Two players, one topic, 60 seconds of voice input. Then AI takes over. BattleTalk turns spoken arguments into playing cards and makes them clash in a real-time card duel.

BattleTalk started with a simple observation: two colleagues who constantly argue about everything. And a thought — what if an AI could take over that discussion?

The idea: Two players compete online. Each gets 60 seconds to speak their arguments on a random topic. Then the AI analyzes the arguments, turns them into playing cards — and the players battle it out in a card duel.

Or shorter: Pokémon with arguments. Except you build your own Pokémon — in 60 seconds, with your own voice.


The First Version: AI Debates For You

The first concept was linear. Two players are matched online, a random debate topic is displayed, both are assigned PRO or CONTRA. After 30 seconds of preparation, each gets 60 seconds for voice input. The recordings are transcribed and handed to two AI agents that debate on behalf of the players over five rounds. At the end, a judge model scores each round and crowns a winner.

The most important rule: the AI was only allowed to use arguments the player actually made. Expand, sharpen, strengthen rhetorically — but never invent new ones. Without this rule, it would have been a pure AI-vs-AI game where player input is irrelevant.

For the AI side, I chose the Groq API. Llama 3.1 8B for the debate agents because it's fast enough for real-time ping-pong. Llama 3.3 70B as the judge for more nuanced scoring. And Whisper Large V3 Turbo for transcription.
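That split of responsibilities can be captured in a small config object. This is a sketch: the exact Groq model ID strings are my assumption based on Groq's public model names, not quoted from the project.

```typescript
// Hypothetical mapping of game roles to Groq model IDs.
// The ID strings are assumptions, not taken from the BattleTalk codebase.
const MODELS = {
  debater: "llama-3.1-8b-instant",        // small and fast: real-time debate turns
  judge: "llama-3.3-70b-versatile",       // larger model for nuanced scoring
  transcription: "whisper-large-v3-turbo" // speech-to-text
} as const;

type Role = keyof typeof MODELS;

// Look up which model handles a given job in the pipeline.
function modelFor(role: Role): string {
  return MODELS[role];
}
```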

It worked. Technically.

Why V1 Wasn't Enough

The AI-vs-AI debate was fun to read — the first time. Less so the second. The problem was straightforward: after voice input, the player was just a spectator.

The five rounds of AI debate ran automatically. You could watch "your" AI fight, but you had no further influence. The identification worked psychologically — but it lacked what makes a game a game: ongoing decisions.

The debate also became predictable quickly. The AI agents delivered solid arguments, but the rounds felt similar. There was no strategic depth.

There was also a subtler problem: on certain topics, the same side always won — regardless of argument quality. On the topic of data privacy, for example, the AI practically always favored the PRO side. This is due to topic bias in the underlying LLMs. Because we were weighing arguments directly against each other, the model's built-in tendencies could tip the result.

The Pivot: Arguments Become Playing Cards

The decisive change came from a simple question: what if the player doesn't just give input once, but makes decisions throughout the entire game?

Instead of having the AI debate directly, the transcribed arguments are now turned into playing cards. The AI analyzes each player's speech, extracts 3-5 individual arguments, and turns each into a card with four scoring dimensions:

  • Logic — How sound is the argument?
  • Impact — How convincing is the phrasing?
  • Relevance — How relevant to the topic?
  • Eloquence — How well articulated?

Each dimension is scored on a scale of 1-10. The sum becomes the card's total value.
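A possible shape for such a card, with the total as the sum of the four scores. The field names are my own; the article doesn't show the actual schema.

```typescript
// One argument extracted from a player's speech, scored on four 1-10 dimensions.
// (Field names are hypothetical; the real schema isn't shown in the article.)
interface ArgumentCard {
  text: string;      // the argument as extracted from the transcript
  logic: number;     // how sound the argument is
  impact: number;    // how convincing the phrasing is
  relevance: number; // how relevant to the topic
  eloquence: number; // how well articulated
}

// A card's total value is the sum of its four scores, so it ranges from 4 to 40.
function totalValue(card: ArgumentCard): number {
  return card.logic + card.impact + card.relevance + card.eloquence;
}
```

For example, a card scored 7 / 6 / 8 / 5 has a total value of 26, and the weakest possible card (all 1s) totals 4.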

The crucial difference from V1: arguments are no longer evaluated against each other. Instead, each argument is judged on its own merits — how sound it is in the context of the topic, how convincingly phrased, how relevant. This eliminates the topic bias because the AI no longer has to decide which side is "right".

Glitch art: Audio waveform transforming into playing cards
VOICE TO CARDS — SPEECH BECOMES YOUR WEAPON

The cards are then played in a trick-taking system: each round, both players choose a card from their hand. Both are revealed simultaneously. The card with the higher total value wins the trick. After the final trick, whoever has taken the most wins the game.
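A minimal sketch of the trick comparison. The article doesn't say how ties are resolved, so this version simply reports a draw.

```typescript
// Minimal card shape for trick resolution: only the total value matters here.
interface Card {
  total: number;
}

// Compare the two revealed cards: returns 0 if player 0 wins the trick,
// 1 if player 1 wins, and null on a tie (tie handling is an assumption;
// the article doesn't describe it).
function resolveTrick(a: Card, b: Card): 0 | 1 | null {
  if (a.total > b.total) return 0;
  if (b.total > a.total) return 1;
  return null;
}
```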

The interesting part: the player must decide strategically which card to play when. Do you lead with your strongest card to take an early lead? Or save it for the final trick? What's the opponent doing?

How a Game Works

A player enters the site and clicks "Find Opponent". Once a second player joins, both see a ready check. The game only starts when both confirm.

A random debate topic appears — for example "Should the 4-day work week be standard?" or "Is privacy more important than security?" Each player is randomly assigned PRO or CONTRA. You defend your assigned position, even if you personally disagree.

After 30 seconds of preparation, recording begins. Both players speak their arguments simultaneously, up to 60 seconds. You can stop early.

The audio recordings are sent to Groq's Whisper API in parallel. Once both transcripts are ready, Llama 3.3 70B analyzes both contributions and extracts the arguments as scored cards. If a player delivers fewer arguments, their hand is filled with weak filler cards at score 4 — a deliberate disadvantage.
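The padding rule can be sketched as a pure function. A total value of 4 corresponds to the minimum score of 1 on each of the four dimensions; the card shape and function name here are my own.

```typescript
// Hypothetical card shape; the real schema isn't shown in the article.
interface Card {
  text: string;
  logic: number;
  impact: number;
  relevance: number;
  eloquence: number;
}

// The weakest possible card: 1 on every dimension, total value 4.
const FILLER: Card = {
  text: "(no argument provided)",
  logic: 1,
  impact: 1,
  relevance: 1,
  eloquence: 1,
};

// Top up a short hand with filler cards until it reaches handSize.
function padHand(cards: Card[], handSize: number): Card[] {
  const padded = [...cards];
  while (padded.length < handSize) padded.push({ ...FILLER });
  return padded;
}
```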

Glitch art: Playing cards with KPI bars floating in digital space
ARGUMENT CARDS — LOGIC, IMPACT, RELEVANCE, ELOQUENCE

Then the card battle begins. Both players see their own cards with all values. The opponent's cards are face-down. Each trick, you pick a card, both are revealed and compared. A result screen shows the trick winner before moving to the next round.

The Tech Stack

The app is a Next.js 16 application with React 19 and Tailwind CSS 4. The server is a custom Node.js HTTP server with Socket.io — no REST API routes; everything runs over WebSockets. TypeScript throughout.

Why Socket.io over REST? The game is inherently real-time. Both players must be guided through phases synchronously: matchmaking, ready check, timers, simultaneous recording, card exchange, trick results. REST polling would be unsuitable. Socket.io provides bidirectional real-time communication, room-based broadcasting, and an event-driven architecture that matches the game flow.

The server manages rooms with a state object per game. The game phases are a linear state machine:

waiting → ready_check → topic → prep → recording → transcribing → analyzing → card_battle → result
Glitch art: Abstract state machine data flow with WebSocket connections
REALTIME PIPELINE — WEBSOCKETS DRIVE THE GAME FLOW
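Assuming strictly linear transitions with no branching, which matches the flow above, the phase progression can be sketched as:

```typescript
// The game phases as a linear state machine, mirroring the flow above.
const PHASES = [
  "waiting", "ready_check", "topic", "prep", "recording",
  "transcribing", "analyzing", "card_battle", "result",
] as const;

type Phase = (typeof PHASES)[number];

// Advance to the next phase; stays on "result" once the game is over.
function nextPhase(current: Phase): Phase {
  const i = PHASES.indexOf(current);
  return PHASES[Math.min(i + 1, PHASES.length - 1)];
}
```

Keeping the phase list in one ordered array means the server can drive the whole game by calling a single transition function and broadcasting the new phase to the room.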

The frontend is a single React component that renders all phases via conditional rendering. The phase is controlled by the server and synchronized via socket events.

The Rule That Holds Everything Together

The AI doesn't invent new arguments. It extracts and scores what the player actually said. Strong arguments get high scores. Weak ones get low scores. If you don't say anything good, you'll have bad cards — literally.

This means: the player wins through the quality of their arguments AND through strategic play. Both have to work.

What the Pivot Changed

The difference is fundamental. In V1, the player was a spectator after voice input. In V2, they're an active decision-maker for the entire match. Strategic depth comes from card selection and timing. The tension curve rises with each trick instead of falling linearly. And replayability is significantly higher because every match produces different cards.

What I'd Do Differently Today

One dimension I'm not fully convinced by yet: Eloquence. Finding a rhetorically brilliant phrasing in 60 spontaneous seconds is hard. On top of that, there's a methodological issue: the AI transcribes the voice input and inevitably smooths it out somewhat. When it then evaluates the eloquence of the phrasing, it's partly rating its own post-processing. There's a randomness factor the player can't control.

In a next version, I'd probably replace Eloquence with Creativity — meaning how original and unexpected an argument is. That's something players can consciously aim for, and it depends less on the AI's linguistic cleanup.


What Could Come Next

A few ideas are already in the drawer: an "Unpopular Opinion Mode" where players defend absurd positions. AI characters with different debate styles — aggressive, philosophical, satirical. A ranking system with Elo ratings. And a spectator mode for live audiences.

But for now, the core game stands. From the first idea on March 7th to a working prototype took six days. BattleTalk started as "AI debates for you" and became "your arguments become weapons — use them wisely". The pivot from passive watching to active card play fundamentally changed the concept: from tech demo to actual game.