CFR Strategy Explorer

Hole cards

Your private cards — only you see these. The network encodes them into a 52-dimensional one-hot vector and estimates your preflop equity against a random hand.
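As a rough illustration of the encoding described above, here is a minimal sketch. The card notation ("As", "Td"), the suit-major index order, and the function names are assumptions made for this example, not details taken from the project; with both hole cards marked in the same vector, two of the 52 entries are set.

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"  # clubs, diamonds, hearts, spades (ordering is an assumption)


def card_index(card: str) -> int:
    """Map a card such as 'As' or 'Td' to an index in 0..51 (hypothetical layout)."""
    rank, suit = card[0], card[1]
    return SUITS.index(suit) * 13 + RANKS.index(rank)


def encode_hole_cards(card1: str, card2: str) -> list[float]:
    """Return a 52-dimensional vector with a 1.0 at each hole card's index."""
    if card1 == card2:
        raise ValueError("hole cards must be distinct")
    vec = [0.0] * 52
    vec[card_index(card1)] = 1.0
    vec[card_index(card2)] = 1.0
    return vec


if __name__ == "__main__":
    v = encode_hole_cards("As", "Kd")
    print(len(v), sum(v))
```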

Card 1

Card 2

Betting state

Pot: 3 BB

Total chips currently in the middle. Starts at 3 BB (SB 1 + BB 2). The network normalises this value by dividing it by twice the starting stack.

To call: 1 BB

How much you need to pay to stay in the hand. 0 means no bet to face. Directly influences the fold vs. call decision.

My stack: 49 BB

Your remaining chips. A deeper stack increases the significance of the all-in decision on later streets.
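To make the betting-state numbers concrete, the sketch below applies the pot normalisation described above (pot divided by twice the starting stack) and computes pot odds, the standard break-even equity for a call. The 50 BB starting stack is an assumption inferred from the default values (49 BB remaining plus 1 BB already posted); the function names are illustrative, and whether the network uses pot odds directly is not stated here.

```python
def normalised_pot(pot_bb: float, starting_stack_bb: float) -> float:
    """Pot divided by twice the starting stack, as described for the network input."""
    return pot_bb / (2.0 * starting_stack_bb)


def pot_odds(pot_bb: float, to_call_bb: float) -> float:
    """Fraction of the final pot you contribute by calling (break-even equity)."""
    if to_call_bb == 0:
        return 0.0  # nothing to call, so checking is free
    return to_call_bb / (pot_bb + to_call_bb)


if __name__ == "__main__":
    # Default values from the form; the 50 BB starting stack is an assumption.
    print(normalised_pot(3, 50))  # 0.03
    print(pot_odds(3, 1))         # 0.25
```

With the defaults, a call needs to win at least a quarter of the time to break even, which is why the amount to call weighs so directly on the fold vs. call decision.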

About this project

What does a mathematically optimal poker strategy look like — and how do you compute it without any human expertise? This project builds a self-learning algorithm that figures it out from scratch, starting from a simple 3-card game and scaling up to full No-Limit Hold'em.

Learning by regretting

The algorithm plays against itself millions of times and keeps track of one thing: regret. After every hand it asks: "what would have happened if I had bet instead of checked?" Over time it shifts probability towards the actions it most regrets not having taken. No poker strategy was hard-coded, only the rules of the game; bluffing and value-betting emerge entirely on their own.
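The step that turns accumulated regret into a playing strategy is commonly called regret matching. The sketch below shows the standard rule (play each action in proportion to its positive cumulative regret); it is a generic illustration, not the project's own code, and the function name is an assumption.

```python
def regret_matching(cumulative_regret: list[float]) -> list[float]:
    """Turn cumulative regrets into action probabilities.

    Actions with positive regret are played in proportion to that regret;
    if no action has positive regret, fall back to a uniform strategy.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n


if __name__ == "__main__":
    # Regrets for (fold, call, raise): fold is regretted most, call not at all.
    print(regret_matching([3.0, -1.0, 1.0]))  # [0.75, 0.0, 0.25]
```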

Why a neural network?

A simple lookup table would need a separate entry for every possible game situation. In real poker that's more entries than atoms in the observable universe. A neural network sidesteps this by learning patterns — it generalises from situations it has seen to ones it hasn't, the same way a human learns to play new hands from experience.

What is a Nash equilibrium?

A pair of strategies where neither player can do better by changing their own approach alone, even if they know exactly what the opponent is doing. Think of it as a perfectly balanced strategy: unpredictable enough that no one can exploit it, yet rational enough that it doesn't throw away value. This is what the algorithm converges towards.
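A small example outside poker may help. In rock-paper-scissors, the equilibrium is to play each move one third of the time. The sketch below measures how much a best-responding opponent can win against a given strategy; the payoff matrix is the standard one, while the function name is chosen for this illustration only.

```python
# Row player's payoff; rows and columns are ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock: loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper
]


def best_response_value(opponent_strategy: list[float]) -> float:
    """Expected payoff of the best pure reply to a fixed opponent strategy."""
    values = [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    ]
    return max(values)


if __name__ == "__main__":
    print(best_response_value([1 / 3, 1 / 3, 1 / 3]))  # nothing to exploit
    print(best_response_value([0.5, 0.25, 0.25]))      # too much rock: 0.25
```

Against the uniform strategy no reply earns more than zero, whereas a player who throws rock half the time gives up 0.25 per game to an opponent who always plays paper.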

From toy games to real poker

The project starts with Kuhn Poker — a 3-card game solvable in milliseconds — and scales up to full No-Limit Hold'em. Each step adds cards, betting rounds, or players, revealing exactly what makes the problem harder. The same algorithm runs on all of them; only the computational cost changes.
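For a sense of scale, Kuhn Poker is small enough that a tabular solver fits in a few dozen lines. The following is a minimal sketch of vanilla counterfactual regret minimisation written for this illustration; it is not the project's C++ engine, and all names are assumptions. Cards are 0, 1, 2 (jack, queen, king), "p" means pass (check or fold) and "b" means bet (bet or call).

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check/fold), b = bet (bet/call)


def regret_matching(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [0.5, 0.5]


def terminal_value(cards, history):
    """Payoff for the player who would act next, or None if the hand continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history == "pp":
        return 1 if higher else -1   # showdown for the antes
    if history.endswith("bb"):
        return 2 if higher else -2   # showdown after a called bet
    if history.endswith("p"):
        return 1                     # opponent folded to a bet
    return None


def cfr(cards, history, reach, strategies, regret_sum, strategy_sum):
    value = terminal_value(cards, history)
    if value is not None:
        return value
    player = len(history) % 2
    key = f"{cards[player]}{history}"  # information set: own card + betting so far
    strategy = strategies[key]
    utils = []
    for a, action in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= strategy[a]
        # The child value is from the opponent's point of view, hence the minus sign.
        utils.append(-cfr(cards, history + action, next_reach,
                          strategies, regret_sum, strategy_sum))
    node_value = sum(s * u for s, u in zip(strategy, utils))
    for a in range(2):
        # Regret is weighted by the opponent's reach, the average by our own.
        regret_sum[key][a] += reach[1 - player] * (utils[a] - node_value)
        strategy_sum[key][a] += reach[player] * strategy[a]
    return node_value


def train(iterations):
    keys = [f"{c}{h}" for c in range(3) for h in ("", "p", "b", "pb")]
    regret_sum = {k: [0.0, 0.0] for k in keys}
    strategy_sum = {k: [0.0, 0.0] for k in keys}
    deals = list(permutations(range(3), 2))
    total = 0.0
    for _ in range(iterations):
        # Freeze this iteration's strategy, then walk every possible deal.
        strategies = {k: regret_matching(r) for k, r in regret_sum.items()}
        for cards in deals:
            total += cfr(cards, "", [1.0, 1.0], strategies,
                         regret_sum, strategy_sum) / len(deals)
    average = {k: [x / sum(s) for x in s] for k, s in strategy_sum.items()}
    return total / iterations, average


if __name__ == "__main__":
    value, average = train(10_000)
    print(f"first player's average value: {value:.4f}")
    for key in sorted(average):
        print(key, [round(p, 3) for p in average[key]])
```

The known value of Kuhn Poker for the first player is -1/18, so the printed average should settle close to -0.056. The averaged strategy, not the final iteration's strategy, is the one that approaches equilibrium.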

Making it fast

Training requires hundreds of millions of simulated hands. The core engine is written in C++ and runs on the GPU, cutting each training round from hours to minutes on a standard laptop. The trained strategy is then exported as a compact ONNX model that runs inference directly in your browser, with no server involved.

How to use this tool

Select your two hole cards, set the street and pot size, then hit Analyse. The network returns the game-theory-optimal (GTO) probability for each action: how often a theoretically optimal player would fold, call, raise, or go all-in in that exact situation. More training iterations produce sharper, more reliable recommendations.

⌥ View source on GitHub →