https://github.com/ronenh24/farkle_simulation

Simulating two-player Farkle with different strategies.

# Farkle Simulation

## Author
Ronen Huang

## Time Frame
November 2021 to December 2021 (modified for Python usage), January 2025 to present (added deep Q-learning).

## Install Farkle Simulation
Farkle Simulation can be installed with pip.
```commandline
pip install farkle-simulation
```

## Farkle
The rules of Farkle can be seen in the *In a Nutshell* section of [https://farkle.games/official-rules/](https://farkle.games/official-rules/). In this case, there are two players with **Player 1** as the first player and **Player 2** as the second player.

The scoring system is shown in the table below.

| Dice to Keep | Score |
| --- | --- |
| Three 1s or Straight | 1000 |
| Three Different Pairs | 750 |
| Three 6s | 600 |
| Three 5s | 500 |
| Three 4s | 400 |
| Three 3s | 300 |
| Three 2s | 200 |
| One 1 | 100 |
| One 5 | 50 |
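
The table can be turned into a small scoring helper. The sketch below is illustrative only: `score_kept` is a hypothetical name, not part of the package API. It assumes that extra dice beyond a triple count as singles, which matches the sample game output later in this README (five 1s score 1200), and it does not check that every kept die actually scores.

```python
from collections import Counter

# Three-of-a-kind and single-die values taken from the scoring table.
TRIPLE_SCORE = {1: 1000, 2: 200, 3: 300, 4: 400, 5: 500, 6: 600}
SINGLE_SCORE = {1: 100, 5: 50}


def score_kept(dice):
    """Score a tuple of kept dice (hypothetical helper, not the package API)."""
    counts = Counter(dice)
    total = 0
    for face, count in counts.items():
        # Assumption: score triples first, then leftover single 1s and 5s.
        triples, singles = divmod(count, 3)
        total += triples * TRIPLE_SCORE[face]
        total += singles * SINGLE_SCORE.get(face, 0)
    # Six-dice combinations: use them when they score more.
    if len(dice) == 6:
        if len(counts) == 6:
            total = max(total, 1000)  # straight 1-6
        if sorted(counts.values()) == [2, 2, 2]:
            total = max(total, 750)  # three different pairs
    return total


print(score_kept((1, 1, 1, 1, 1)))  # three 1s plus two single 1s
```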

A player begins their turn with all six dice. After each roll:
- If there are scoring combinations, the player chooses any of them to keep and then
  - either keeps rolling the remaining dice or stops
  - when the number of remaining dice reaches 0, it resets to 6
- Otherwise, the player has **"farkled"** and the turn score is **0**
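
The turn rules above can be sketched as two small checks. Both function names are hypothetical and only illustrate the rules as stated; the package's own implementation may differ.

```python
from collections import Counter


def is_farkle(roll):
    """Return True when a roll contains no scoring combination."""
    counts = Counter(roll)
    # Any single 1 or 5 scores.
    if counts[1] or counts[5]:
        return False
    # Any three of a kind scores.
    if any(count >= 3 for count in counts.values()):
        return False
    # Three different pairs score on a six-dice roll.
    if len(roll) == 6 and sorted(counts.values()) == [2, 2, 2]:
        return False
    # A straight always contains a 1 and a 5, so it is covered above.
    return True


def dice_after_keep(num_dice, num_kept):
    """Dice left for the next roll; using all dice resets the count to 6."""
    remaining = num_dice - num_kept
    return 6 if remaining == 0 else remaining
```

For example, `is_farkle((2, 6))` is `True`, which corresponds to the "Farkled" output shown in the Pipeline section.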

Once a player reaches 10,000 points, they have won the game.

## Strategies
The implementation of the strategies is in [`strategy.py`](src/farkle_simulation/components/strategy.py).

Player 1 and Player 2 can each use the naive strategy, the simple RL strategy, or the custom (manual) strategy. The `turn` function goes through one turn.
```python
from farkle_simulation.components.strategy import (
    NaiveStrategy, SimpleRLStrategy, CustomStrategy
)

naive_strategy = NaiveStrategy()
simple_rl_strategy = SimpleRLStrategy()
custom_strategy = CustomStrategy()

# Example inputs (assumed values): the player's score so far and
# their lead over the opponent (negative when behind).
current_score = 0
advantage = 0

# Each call goes through one turn; CustomStrategy prompts on the command line.
naive_turn_score = naive_strategy.turn(current_score, advantage)
simple_rl_turn_score = simple_rl_strategy.turn(current_score, advantage)
custom_turn_score = custom_strategy.turn(current_score, advantage)
```

### Custom Strategy
The player chooses the action for each roll by typing 1 to 7 on the command line.

### Naive Strategy
For each roll
- Choose the action that maximizes the roll score
- Stop if at most 2 dice remain and the player is behind by less than 1,000 points
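
The stopping rule can be written as a one-line predicate. This is a minimal sketch: `naive_should_stop` and its parameter names are hypothetical, and it assumes `advantage` is the player's score minus the opponent's, so a player who is level or ahead has a deficit of 0.

```python
def naive_should_stop(num_dice, advantage, max_dice=2, max_deficit=1000):
    """Stop when few dice remain and the player is not far behind."""
    # Assumption: advantage = own score - opponent score.
    deficit = max(-advantage, 0)
    return num_dice <= max_dice and deficit < max_deficit
```

Under this reading, a player trailing by 1,500 points keeps rolling even with 2 dice left, while a player trailing by 500 stops.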

### Simple Reinforcement Learning Strategy
For each roll
- Choose the action that maximizes the reward, which is based on
  - Number of Rolls
    ```python
    # gamma discounts the reward as the number of rolls grows.
    actual_reward = gamma ** np.log2(num_rolls + 1) * roll_score
    ```
  - Distance to 10,000 and Current Score
    ```python
    # Increases as the current score gets closer to 10,000.
    distance_factor = distance_scale ** (current_score / 10000)
    ```
  - Advantage
    ```python
    # Increases the further behind the player is.
    advantage_factor = 1
    if advantage < 0:
        advantage_factor *= max(
            np.emath.logn(advantage_scale, -advantage), 1
        )
    ```
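
To get a feel for the three factors, the sketch below evaluates them for one state. It uses the standard-library `math` module in place of NumPy so that it runs without dependencies. The function name and the hyperparameter values (`gamma=0.9`, `distance_scale=2.0`, `advantage_scale=10.0`) are assumptions for illustration, not the package's trained settings, and the factors are returned separately because how they are combined is not stated here.

```python
import math


def reward_factors(roll_score, num_rolls, current_score, advantage,
                   gamma=0.9, distance_scale=2.0, advantage_scale=10.0):
    """Evaluate the three reward terms for one state (illustrative values)."""
    # Discount the roll score by the number of rolls taken so far.
    actual_reward = gamma ** math.log2(num_rolls + 1) * roll_score
    # Grows from 1 toward distance_scale as the score approaches 10,000.
    distance_factor = distance_scale ** (current_score / 10000)
    # Grows with the deficit, but never drops below 1.
    advantage_factor = 1.0
    if advantage < 0:
        advantage_factor *= max(math.log(-advantage, advantage_scale), 1.0)
    return actual_reward, distance_factor, advantage_factor


reward, distance, behind = reward_factors(
    roll_score=100, num_rolls=1, current_score=5000, advantage=-1000
)
print(round(reward, 3), round(distance, 3), round(behind, 3))
```

With these made-up settings, a deficit of 1,000 points gives an advantage factor of about 3, while any non-negative advantage leaves it at 1.
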
The deep Q-learning network takes an 11-dimensional input state consisting of
- Distance
- Advantage
- Current Turn Score
- Roll Maxes

and predicts a reward for each of the 7 possible actions
- Keep 1 to 6 dice
- Stop

This can be seen in [`architecture.py`](src/farkle_simulation/components/architecture.py).
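
Given the 7 predicted rewards, an agent still has to pick an action that is legal for the current roll. The sketch below shows one common way to do that, assuming actions are numbered 1 to 6 for keeping that many dice and 7 for stopping, as in the game prompt later in this README. `best_legal_action` is a hypothetical helper; whether the package restricts the choice to legal actions in this way is an assumption.

```python
def best_legal_action(predicted_rewards, legal_actions):
    """Return the legal action (1-7) with the highest predicted reward."""
    if len(predicted_rewards) != 7:
        raise ValueError("expected one predicted reward per action (7 total)")
    if not legal_actions:
        raise ValueError("no legal actions available")
    # Actions are 1-based, list indices are 0-based.
    return max(legal_actions, key=lambda action: predicted_rewards[action - 1])


# Made-up network outputs; action 6 has the highest value but is not legal here.
rewards = [0.1, 0.2, 0.9, 0.3, 0.4, 2.0, 0.5]
chosen = best_legal_action(rewards, legal_actions=[1, 2, 3, 4, 5, 7])
```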

The training process can be seen at [`simple_farkle_rl.py`](src/farkle_simulation/components/simple_farkle_rl.py). The best model state dictionary is saved as [simple_action_reward_state_dict.pt](src/farkle_simulation/components/simple_action_reward_state_dict.pt).
```python
from farkle_simulation.components.simple_farkle_rl import train

train()
```

The plot of turn score by current score can be seen in **[training_simple_rl.jpg](src/plots/training_simple_rl.jpg)** and the table in **[training_simple_rl.csv](src/tables/training_simple_rl.csv)**.
![Training Simple RL](src/plots/training_simple_rl.jpg)

## Simulation
The implementation of the simulation is in [`simulation.py`](src/farkle_simulation/components/simulation.py).

The `convergence_plot` function plots the expected probability that Player 1 wins when Player 1 uses one strategy and Player 2 uses another.
```python
from farkle_simulation.components.simulation import convergence_plot

convergence_plot(naive_strategy, simple_rl_strategy)
convergence_plot(simple_rl_strategy, naive_strategy)
```

The convergence plots can be seen at **[convergence_naive_simple_rl.jpg](src/plots/convergence_naive_simple_rl.jpg)** and **[convergence_simple_rl_naive.jpg](src/plots/convergence_simple_rl_naive.jpg)**.

![Convergence Plot Naive vs Simple RL](src/plots/convergence_naive_simple_rl.jpg)

![Convergence Plot Simple RL vs Naive](src/plots/convergence_simple_rl_naive.jpg)
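
The quantity behind a convergence plot is a running win rate: the fraction of games Player 1 has won after each game played. The sketch below computes it from a list of outcomes. It does not call the package; the outcomes are drawn from a coin flip with a made-up win probability of 0.55 as a stand-in for real games, and `running_win_rate` is a hypothetical name.

```python
import random


def running_win_rate(outcomes):
    """Fraction of games won by Player 1 after each game."""
    rates, wins = [], 0
    for game_number, player_1_won in enumerate(outcomes, start=1):
        wins += int(player_1_won)
        rates.append(wins / game_number)
    return rates


# Stand-in for real games: Player 1 wins with probability 0.55 (made-up value).
rng = random.Random(0)
outcomes = [rng.random() < 0.55 for _ in range(2000)]
rates = running_win_rate(outcomes)
```

Plotting `rates` against the game number gives a curve that fluctuates early and settles as more games are played.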

The `histogram` function plots the average probability that Player 1 wins when Player 1 uses one strategy and Player 2 uses another.
```python
from farkle_simulation.components.simulation import histogram

histogram(naive_strategy, simple_rl_strategy)
histogram(simple_rl_strategy, naive_strategy)
```

The histograms can be seen at **[histogram_naive_simple_rl.jpg](src/plots/histogram_naive_simple_rl.jpg)** and **[histogram_simple_rl_naive.jpg](src/plots/histogram_simple_rl_naive.jpg)**.

![Histogram Naive vs Simple RL](src/plots/histogram_naive_simple_rl.jpg)

![Histogram Simple RL vs Naive](src/plots/histogram_simple_rl_naive.jpg)
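
One plausible reading of such a histogram is a distribution of win rates over repeated batches of games, whose mean estimates the win probability. The sketch below follows that reading as an assumption; it is not taken from the package. `batch_win_rates` is a hypothetical helper, and the game is again replaced by a coin flip with a made-up win probability of 0.55.

```python
import random
import statistics


def batch_win_rates(play_game, num_batches, games_per_batch):
    """Win rate of Player 1 in each batch; play_game() returns True on a win."""
    rates = []
    for _ in range(num_batches):
        wins = sum(play_game() for _ in range(games_per_batch))
        rates.append(wins / games_per_batch)
    return rates


# Stand-in game: Player 1 wins with probability 0.55 (made-up value).
rng = random.Random(1)
rates = batch_win_rates(lambda: rng.random() < 0.55,
                        num_batches=200, games_per_batch=100)
mean_rate = statistics.mean(rates)   # center of the histogram
spread = statistics.stdev(rates)     # width of the histogram
```

A histogram of `rates` would be roughly bell-shaped around `mean_rate`, and its width shrinks as `games_per_batch` grows.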

## Pipeline
To train the simple RL agent and compare the strategies through convergence plots and histograms, run:
```python
from farkle_simulation.pipeline import train_simulate

train_simulate()
```

To play a game against the simple RL agent, run:
```python
from farkle_simulation.pipeline import play_game

play_game()
```
Enter 1 to play as the first player or 2 to play as the second player.
```console
Player 1 (1) or Player (2)? 1
```
Enter the action to choose for the dice combination.
```console
Dice Combination - (1, 1, 1, 1, 1, 4)
Legal Moves - (1) Keep 1 - max 100 remaining dice 5, (2) Keep 2 - max 200 remaining dice 4, (3) Keep 3 - max 1000 remaining dice 3, (4) Keep 4 - max 1100 remaining dice 2, (5) Keep 5 - max 1200 remaining dice 1, (7) No Roll - max 1200 remaining dice 0
Action - 7
```
The output of the action is:
```console
Current Turn Score - 1200
Current Score - 1200
Turn Score - 1200
Player 1 Score - 1200 Player 2 Score - 0
```
The output of a "farkle" is:
```console
Dice Combination - (2, 6)
Farkled
Turn Score - 0
```

## References
Farkle Official Rules. (2025). Retrieved from https://farkle.games/official-rules/

Le Cam, L. (1986). The Central Limit Theorem Around 1935. *Statistical Science*, *1*(1), 78–91. Retrieved from http://www.jstor.org/stable/2245503

Mnih, V., Kavukcuoglu, K., Silver, D. *et al.* (2015). Human-level control through deep reinforcement learning. *Nature* **518**, 529–533. Retrieved from https://doi.org/10.1038/nature14236