https://github.com/jimthompson5802/connection_solver
Testbed for Agentic Solver for NYT Connection Puzzle
- Host: GitHub
- URL: https://github.com/jimthompson5802/connection_solver
- Owner: jimthompson5802
- License: mit
- Created: 2024-02-19T23:23:26.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-10-24T02:52:57.000Z (23 days ago)
- Last Synced: 2024-10-24T19:26:47.382Z (23 days ago)
- Language: Python
- Homepage:
- Size: 493 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Connection Solver Virtual Assistant Testbed
Experimental project to solve the [NYT Connection puzzles](https://www.nytimes.com/games/connections) using an agentic workflow based on the [`langchain` ecosystem](https://python.langchain.com/v0.2/docs/introduction/). In particular, it uses:
* [`langchain`'s OpenAI LLM abstraction layer](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#chatopenai) to interact with OpenAI's `gpt-4o` model
* [`langgraph`'s stateful orchestration framework](https://langchain-ai.github.io/langgraph/tutorials/multi_agent/multi-agent-collaboration/#multi-agent-network) to manage the agent's workflow

Historical NYT Connection Puzzles were used in testing the agent. Past puzzles can be found [here](https://word.tips/todays-nyt-connections-answers/).
## Connection Puzzle Description
Connections is a word game that challenges players to find themes between words. The user is presented with 16 words and must create groups of four items that share something in common. For example: Tropical fruit: banana, mango, pineapple, guava.

## Solution Strategy
The agent uses the `PuzzleState` class to manage its state and control its workflow.
```python
from typing import List, TypedDict

# Puzzle phase enums
PUZZLE_PHASE_UNINITIALIZED = "PUZZLE_PHASE_UNINITIALIZED"
PUZZLE_PHASE_SETUP = "PUZZLE_PHASE_SETUP"
PUZZLE_PHASE_SETUP_COMPLETE = "PUZZLE_PHASE_SETUP_COMPLETE"
PUZZLE_PHASE_SOLVING = "PUZZLE_PHASE_SOLVING"
PUZZLE_PHASE_COMPLETE = "PUZZLE_PHASE_COMPLETE"


class PuzzleState(TypedDict):
    # The phase constants above are strings, so the field is typed as str
    puzzle_phase: str = PUZZLE_PHASE_UNINITIALIZED
    words_remaining: List[str] = []
    invalid_connections: List[List[str]] = []
    recommended_words: List[str] = []
    recommended_connection: str = ""
    recommended_correct: bool = False
    found_yellow: bool = False
    found_greeen: bool = False
    found_blue: bool = False
    found_purple: bool = False
    mistake_count: int = 0
    recommendation_count: int = 0
    llm_temperature: float = 1.0
    input_source_type: str = ""
```
The attributes `words_remaining` and `mistake_count` determine when the agent terminates. When a correct group of four words is found, those words are removed from `words_remaining`. When a mistake is made, `mistake_count` is incremented. The agent terminates when either `words_remaining` becomes empty or `mistake_count` exceeds a threshold.

The agent's workflow is defined with the `StateGraph` class from `langgraph` as a series of nodes and edges: the nodes are the agent's processing steps, and the edges are the transitions between those steps. Overall control is performed by the `run_planner()` function, which determines the next step in the workflow based on the agent's `puzzle_phase`.
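The termination rule described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the repo's code: `MiniState`, `record_result`, `should_terminate`, and `MAX_MISTAKES` are hypothetical names, and the threshold of 4 is an assumption based on the failed sample runs, which stop at a `mistake_count` of 4.

```python
from typing import List, TypedDict

MAX_MISTAKES = 4  # assumed threshold; the repo's actual value is not shown in this README


class MiniState(TypedDict):
    """Hypothetical subset of PuzzleState used only for this sketch."""
    words_remaining: List[str]
    invalid_connections: List[List[str]]
    mistake_count: int


def record_result(state: MiniState, group: List[str], correct: bool) -> MiniState:
    """Return a new state reflecting one accepted or rejected group of four."""
    if correct:
        # A correct group is removed from the pool of unsolved words.
        remaining = [w for w in state["words_remaining"] if w not in group]
        return {**state, "words_remaining": remaining}
    # A rejected group is remembered and counted as a mistake.
    return {
        **state,
        "invalid_connections": state["invalid_connections"] + [group],
        "mistake_count": state["mistake_count"] + 1,
    }


def should_terminate(state: MiniState) -> bool:
    """Stop when every word is grouped or the mistake budget is used up."""
    return not state["words_remaining"] or state["mistake_count"] >= MAX_MISTAKES


state: MiniState = {
    "words_remaining": ["blanket", "sham", "sheet", "throw", "draw", "pass", "play", "discard"],
    "invalid_connections": [],
    "mistake_count": 0,
}
state = record_result(state, ["blanket", "sham", "sheet", "throw"], correct=True)
print(state["words_remaining"])  # the four bedding words have been removed
print(should_terminate(state))   # False: four words are still unsolved
```

Returning a new dictionary instead of mutating the input keeps each step easy to test in isolation, which fits a graph of small node functions.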
Agent's workflow definition:
```python
from langgraph.graph import END, StateGraph

workflow = StateGraph(PuzzleState)

# Processing steps (graph nodes)
workflow.add_node("run_planner", run_planner)
workflow.add_node("get_input_source", get_input_source)
workflow.add_node("read_words_from_file", read_words_from_file)
workflow.add_node("read_words_from_image", read_words_from_image)
workflow.add_node("get_recommendation", get_recommendation)
workflow.add_node("regenerate_recommendation", regenerate_recommendation)
workflow.add_node("apply_recommendation", apply_recommendation)
workflow.add_node("clear_recommendation", clear_recommendation)

# Transitions (graph edges)
workflow.add_conditional_edges(
    "run_planner",
    determine_next_action,
    {
        "get_input_source": "get_input_source",
        "get_recommendation": "get_recommendation",
        END: END,
    },
)

workflow.add_conditional_edges(
    "get_input_source",
    route_input_source,
    {
        "read_words_from_file": "read_words_from_file",
        "read_words_from_image": "read_words_from_image",
    },
)

workflow.add_edge("read_words_from_file", "run_planner")
workflow.add_edge("read_words_from_image", "run_planner")
workflow.add_edge("get_recommendation", "apply_recommendation")
workflow.add_edge("clear_recommendation", "run_planner")
workflow.add_edge("regenerate_recommendation", "apply_recommendation")

workflow.add_conditional_edges(
    "apply_recommendation",
    is_end,
    {
        "run_planner": "run_planner",
        "clear_recommendation": "clear_recommendation",
        "regenerate_recommendation": "regenerate_recommendation",
    },
)

workflow.set_entry_point("run_planner")
app = workflow.compile()
app.get_graph().draw_png("images/connection_solver_graph.png")
```

Diagram of the agent's workflow:

![Connection Solver Workflow](./images/connection_solver_graph.png)

## Repo Contents
Major contents of the repo:
| File/Folder | Description |
| --- | --- |
| `src/agent/app.py` | Main entry point for the agent. Defines the workflow processing steps (graph nodes), workflow transitions (graph edges), and the `PuzzleState` data structure. |
| `src/agent/tools.py` | Tools used by the agent: retrieve puzzle setup, interact with user and interface to OpenAI LLM|
| `src/agent/utils.py` | Utilities to be used by the agent. |
| `src/agent/tests/` | Unit tests for the agent. |
| `src/agent_testbed/` | Directory containing technical proof-of-concept code. |
| `data/` | Directory containing past NYT Connection Puzzles for testing. |
| `prompt_testbed/` | Directory containing sample prompts used in testing with the OpenAI Playground. |

## Lesson Learned
While prompt engineering is critical to the agent's success, an equally critical function is setting up the right data structures for the LLM to use. Specifically, randomizing the order of the words in `words_remaining` seemed to allow the LLM to get unstuck from invalid groupings.

Automated testing is needed. Right now the agent is tested manually, which becomes tedious as more test cases are needed. Automated testing would allow for more rapid development and testing of the agent.
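The word-order randomization noted in the first lesson can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: `shuffled_words` and `words_for_prompt` are hypothetical helpers, and the comma-separated prompt format is assumed.

```python
import random
from typing import List, Optional


def shuffled_words(words: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of the word list, leaving the original untouched."""
    rng = random.Random(seed)  # a seed makes the order reproducible in tests
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled


def words_for_prompt(words_remaining: List[str], seed: Optional[int] = None) -> str:
    """Build the comma-separated word list that would be placed in the LLM prompt."""
    return ", ".join(shuffled_words(words_remaining, seed))


remaining = ["rumble", "table", "pony", "sift", "roll", "bun", "tree", "twist"]
print(words_for_prompt(remaining))  # order differs from call to call
```

Shuffling a copy keeps `words_remaining` itself stable, so only the order the LLM sees changes between attempts.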
Experiment tracking is needed. As the workflow design and the functionality of individual steps change, the results from testing should be recorded automatically. For this body of work, results were tracked either in hand-written notes or from memory.
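One lightweight way to record results automatically is to append a summary of each run's final state to a JSON Lines file. The sketch below is an assumption about how this could look, not something the repo provides: `log_run`, the `puzzle_id` label, and the `experiment_log.jsonl` file name are all hypothetical, while the state keys are taken from the final puzzle states shown in the sample runs.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def log_run(
    final_state: Dict[str, Any],
    puzzle_id: str,
    log_path: str = "experiment_log.jsonl",  # hypothetical file name
) -> Dict[str, Any]:
    """Append a one-line JSON summary of a finished run and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "puzzle_id": puzzle_id,
        # The puzzle counts as solved when no words are left to group.
        "solved": len(final_state["words_remaining"]) == 0,
        "mistake_count": final_state.get("mistake_count", 0),
        "recommendation_count": final_state.get("recommendation_count", 0),
        "llm_temperature": final_state.get("llm_temperature"),
    }
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

Calling `log_run(final_state, "word_list2")` after each run would build up a file that can later be loaded line by line to compare solve rates across workflow designs or temperature settings.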
From a virtual coding assistant perspective, perplexity.ai seemed to generate more useful code for `langchain` and `langgraph`. Code generated by GitHub Copilot for these libraries was not compatible with their current versions. This is probably because GH Copilot is trained on code in public repos, whereas perplexity.ai uses a RAG-based approach over current web content. perplexity.ai appears to be better at code generation for new and quickly evolving packages. However, once I had some code in the Visual Studio Code IDE, GH Copilot reduced the effort needed to refactor and revise it. For long-standing packages, e.g., `pandas`, `numpy`, and `matplotlib`, GH Copilot generates useful code snippets.
## Sample Runs
Note: Due to the random nature of the LLM, the results vary from run to run. For example, running the same puzzle multiple times may result in different recommendations from the LLM. As a result, the puzzle may get solved in one run and not in another.
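Because results vary between runs, comparing many runs by hand is slow. In the sample runs a person answers each recommendation with `y`, `g`, `b`, `p`, or `n`; when the expected solution is known, that answer could be computed instead. The following is a hedged sketch of such a grader, not part of the repo: `grade` and `SOLUTION` are hypothetical names, and the groups are copied from the expected solution of the first solved puzzle.

```python
from typing import Dict, FrozenSet, List

# Expected solution for the first solved puzzle, keyed by the response
# codes used in the sample runs (y=yellow, g=green, b=blue, p=purple).
SOLUTION: Dict[str, FrozenSet[str]] = {
    "y": frozenset({"fulfill", "honor", "keep", "uphold"}),
    "g": frozenset({"blanket", "sham", "sheet", "throw"}),
    "b": frozenset({"discard", "draw", "pass", "play"}),
    "p": frozenset({"energy", "justice", "labor", "state"}),
}


def grade(recommended_words: List[str], solution: Dict[str, FrozenSet[str]]) -> str:
    """Return 'y', 'g', 'b', or 'p' for a correct group, otherwise 'n'."""
    candidate = frozenset(word.lower() for word in recommended_words)
    for code, group in solution.items():
        if candidate == group:
            return code
    return "n"


print(grade(["blanket", "sheet", "sham", "throw"], SOLUTION))  # g
print(grade(["blanket", "sheet", "sham", "draw"], SOLUTION))   # n
```

Comparing sets rather than lists makes the check independent of the order in which the LLM lists the four words.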
### Solved Connection Puzzle 1
**Expected Solution**
```text
🟡 MAKE GOOD ON, AS A PROMISE: FULFILL, HONOR, KEEP, UPHOLD
🟢 BEDDING: BLANKET, SHAM, SHEET, THROW
🔵 ACTIONS IN CARD GAMES: DISCARD, DRAW, PASS, PLAY
🟣 CABINET DEPARTMENTS: ENERGY, JUSTICE, LABOR, STATE
```
**Example Run**
```text
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list5.txt

Words read from file: ['uphold', 'discard', 'honor', 'energy', 'state', 'play', 'justice', 'labor', 'pass', 'fulfill', 'draw', 'keep', 'blanket', 'sham', 'sheet', 'throw']
RECOMMENDED WORDS ['blanket', 'sheet', 'sham', 'throw'] with connection bedding items
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['blanket', 'sheet', 'sham', 'throw'] is correct

RECOMMENDED WORDS ['play', 'discard', 'draw', 'pass'] with connection Card game actions
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['play', 'discard', 'draw', 'pass'] is correct

RECOMMENDED WORDS ['honor', 'uphold', 'keep', 'fulfill'] with connection ways to maintain or adhere to something (e.g., a promise, duty)
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['honor', 'uphold', 'keep', 'fulfill'] is correct

RECOMMENDED WORDS ['energy', 'state', 'justice', 'labor'] with connection Departments of the US Government
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['energy', 'state', 'justice', 'labor'] is correct
SOLVED THE CONNECTION PUZZLE!!!

FINAL PUZZLE STATE:
{ 'found_blue': True,
  'found_purple': True,
  'found_yellow': True,
  'invalid_connections': [],
  'llm_temperature': 0.7,
  'mistake_count': 0,
  'recommendation_count': 4,
  'recommended_connection': 'Departments of the US Government',
  'recommended_correct': True,
  'recommended_words': ['energy', 'state', 'justice', 'labor'],
  'words_remaining': []}
```

### Solved Connection Puzzle 2
**Expected Solution**
```text
🟡 BRING ABOUT: GENERATE, INSPIRE, PROMPT, PROVOKE
🟢 THINGS THAT ARE OFTEN SCENTED: CANDLE, INCENSE, LOTION, SOAP
🔵 THINGS THAT MIGHT STING: INSULT, JELLYFISH, NETTLE, WASP
🟣 VIDEO GAME FRANCHISES: CIVILIZATION, HALO, MADDEN, METROID
```
**Example Run**
```text
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list2.txt

Words read from file: ['inspire', 'madden', 'jellyfish', 'metroid', 'insult', 'candle', 'halo', 'provoke', 'soap', 'generate', 'incense', 'civilization', 'lotion', 'wasp', 'prompt', 'nettle']
RECOMMENDED WORDS ['madden', 'provoke', 'insult', 'incense'] with connection To Anger or Annoy
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['madden', 'provoke', 'insult', 'incense'] is incorrect

RECOMMENDED WORDS ['soap', 'lotion', 'candle', 'incense'] with connection Items that can produce a pleasant scent or are used in personal care
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['soap', 'lotion', 'candle', 'incense'] is correct

RECOMMENDED WORDS ['provoke', 'insult', 'nettle', 'incense'] with connection To Annoy or Irritate
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['provoke', 'insult', 'nettle', 'incense'] is incorrect

RECOMMENDED WORDS ['provoke', 'inspire', 'prompt', 'generate'] with connection words related to causing or inducing action or emotion
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['provoke', 'inspire', 'prompt', 'generate'] is correct

RECOMMENDED WORDS ['wasp', 'halo', 'metroid', 'jellyfish'] with connection video game titles
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['wasp', 'halo', 'metroid', 'jellyfish'] is incorrect

RECOMMENDED WORDS ['madden', 'civilization', 'metroid', 'halo'] with connection Video Game Titles
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['madden', 'civilization', 'metroid', 'halo'] is correct

RECOMMENDED WORDS ['jellyfish', 'nettle', 'insult', 'wasp'] with connection things that sting
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['jellyfish', 'nettle', 'insult', 'wasp'] is correct
SOLVED THE CONNECTION PUZZLE!!!

FINAL PUZZLE STATE:
{ 'found_blue': True,
  'found_purple': True,
  'found_yellow': True,
  'invalid_connections': [ ['madden', 'provoke', 'insult', 'incense'],
                           ['provoke', 'insult', 'nettle', 'incense'],
                           ['wasp', 'halo', 'metroid', 'jellyfish']],
  'llm_temperature': 0.7,
  'mistake_count': 3,
  'recommendation_count': 7,
  'recommended_connection': 'things that sting',
  'recommended_correct': True,
  'recommended_words': ['jellyfish', 'nettle', 'insult', 'wasp'],
  'words_remaining': []}
```

### Solved Connection Puzzle 3
**Expected Solution**
```text
🟡 RUMMAGE: COMB, DIG, ROOT, SIFT
🟢 SOUNDS OF THUNDER: CLAP, PEAL, ROLL, RUMBLE
🔵 WAYS TO WEAR YOUR HAIR UP: BUN, BRAID, PONY, TWIST
🟣 THINGS THAT CAN HAVE LEAVES: BOOK, SALAD, TABLE, TREE
```

**Example Run**
```text
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list4.txt

Words read from file: ['rumble', 'table', 'pony', 'sift', 'roll', 'bun', 'tree', 'twist', 'salad', 'clap', 'comb', 'peal', 'dig', 'braid', 'root', 'book']
RECOMMENDED WORDS ['bun', 'pony', 'braid', 'comb'] with connection Related to hairstyles
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['bun', 'pony', 'braid', 'comb'] is incorrect

RECOMMENDED WORDS ['twist', 'braid', 'roll', 'bun'] with connection Hairstyles
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['twist', 'braid', 'roll', 'bun'] is incorrect

RECOMMENDED WORDS ['comb', 'dig', 'sift', 'root'] with connection Actions related to gardening or soil preparation
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['comb', 'dig', 'sift', 'root'] is correct

RECOMMENDED WORDS ['clap', 'rumble', 'peal', 'roll'] with connection Types of sounds
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['clap', 'rumble', 'peal', 'roll'] is correct

RECOMMENDED WORDS ['table', 'book', 'salad', 'tree'] with connection Types of leaves
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['table', 'book', 'salad', 'tree'] is correct

RECOMMENDED WORDS ['twist', 'bun', 'pony', 'braid'] with connection types of hairstyles
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['twist', 'bun', 'pony', 'braid'] is correct
SOLVED THE CONNECTION PUZZLE!!!

FINAL PUZZLE STATE:
{ 'found_blue': True,
  'found_purple': True,
  'found_yellow': True,
  'invalid_connections': [ ['bun', 'pony', 'braid', 'comb'],
                           ['twist', 'braid', 'roll', 'bun']],
  'llm_temperature': 0.7,
  'mistake_count': 2,
  'recommendation_count': 6,
  'recommended_connection': 'types of hairstyles',
  'recommended_correct': True,
  'recommended_words': ['twist', 'bun', 'pony', 'braid'],
  'words_remaining': []}
```

### Solved Connection Puzzle 4
This puzzle is defined by the image from the NYT Connection Puzzle grid for October 20, 2024. A screenshot of the NYT online Connection Puzzle is saved to disk. The agent reads the words from the image and solves the puzzle.

**Puzzle Grid Screenshot**
![Connection Puzzle Grid](src/agent_testbed/connection_puzzle_image.png)
**Expected Solution**
![Connection Puzzle Solution](src/agent_testbed/connection_puzzle_solution.png)
**Example Run**
```text
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Enter 'file' to read words from a file or 'image' to read words from an image: image
Please enter the image file location: src/agent_testbed/connection_puzzle_image.png

Words read from image: ['paddle', 'sew', 'row', 'story', 'oar', 'fore', 'column', 'racket', 'net', 'butt', 'feature', 'ball', 'clatter', 'table', 'ruckus', 'article']
RECOMMENDED WORDS ['oar', 'paddle', 'fore', 'row'] with connection Rowing-related terms
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['oar', 'paddle', 'fore', 'row'] is incorrect

RECOMMENDED WORDS ['oar', 'paddle', 'butt', 'ball'] with connection Parts of a Rowing Boat
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['oar', 'paddle', 'butt', 'ball'] is incorrect

RECOMMENDED WORDS ['story', 'feature', 'article', 'column'] with connection Parts of a newspaper or magazine
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['story', 'feature', 'article', 'column'] is correct

RECOMMENDED WORDS ['racket', 'ruckus', 'clatter', 'row'] with connection Noise or commotion
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['racket', 'ruckus', 'clatter', 'row'] is correct

RECOMMENDED WORDS ['net', 'table', 'ball', 'paddle'] with connection Table Tennis Terms
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['net', 'table', 'ball', 'paddle'] is correct

RECOMMENDED WORDS ['fore', 'sew', 'butt', 'oar'] with connection Homophones of numbers (four, so, but, or)
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['fore', 'sew', 'butt', 'oar'] is correct
SOLVED THE CONNECTION PUZZLE!!!

FINAL PUZZLE STATE:
{ 'found_blue': True,
  'found_purple': True,
  'found_yellow': True,
  'input_source_type': 'image',
  'invalid_connections': [ ['oar', 'paddle', 'fore', 'row'],
                           ['oar', 'paddle', 'butt', 'ball']],
  'llm_temperature': 0.7,
  'mistake_count': 2,
  'recommendation_count': 6,
  'recommended_connection': 'Homophones of numbers (four, so, but, or)',
  'recommended_correct': True,
  'recommended_words': ['fore', 'sew', 'butt', 'oar'],
  'words_remaining': []}
```

### Failed to Solve Connection Puzzle 1
**Expected Solution**
```text
🟡 FOOTBALL POSITIONS: CENTER, GUARD, QUARTERBACK, SAFETY
🟢 CABLE CHANNELS: DISCOVERY, HISTORY, NICKELODEON, OXYGEN
🔵 FICTIONAL CLOWNS: HOMEY, JOKER, PENNYWISE, RONALD
🟣 WHAT “D” MIGHT STAND FOR: DEFENSE, DEMOCRAT, DIMENSIONAL, DRIVE
```

**Example Run**
```text
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list3.txt

Words read from file: ['center', 'pennywise', 'democrat', 'safety', 'oxygen', 'history', 'guard', 'homey', 'joker', 'quarterback', 'ronald', 'defense', 'discovery', 'drive', 'nickelodeon', 'dimensional']
RECOMMENDED WORDS ['quarterback', 'safety', 'defense', 'guard'] with connection Football positions
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['quarterback', 'safety', 'defense', 'guard'] is incorrect

RECOMMENDED WORDS ['nickelodeon', 'joker', 'pennywise', 'ronald'] with connection Famous Clowns
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['nickelodeon', 'joker', 'pennywise', 'ronald'] is incorrect

RECOMMENDED WORDS ['quarterback', 'defense', 'guard', 'safety'] with connection Football Positions
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['quarterback', 'defense', 'guard', 'safety'] is incorrect

RECOMMENDED WORDS ['quarterback', 'center', 'dimensional', 'drive'] with connection Positions or terms related to football
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['quarterback', 'center', 'dimensional', 'drive'] is incorrect
FAILED TO SOLVE THE CONNECTION PUZZLE TOO MANY MISTAKES!!!

FINAL PUZZLE STATE:
{ 'found_blue': False,
  'found_purple': False,
  'found_yellow': False,
  'invalid_connections': [ ['quarterback', 'safety', 'defense', 'guard'],
                           ['nickelodeon', 'joker', 'pennywise', 'ronald'],
                           ['quarterback', 'defense', 'guard', 'safety'],
                           [ 'quarterback',
                             'center',
                             'dimensional',
                             'drive']],
  'llm_temperature': 0.7,
  'mistake_count': 4,
  'recommendation_count': 4,
  'recommended_connection': 'Positions or terms related to football',
  'recommended_correct': False,
  'recommended_words': ['quarterback', 'center', 'dimensional', 'drive'],
  'words_remaining': [ 'drive',
                       'safety',
                       'discovery',
                       'homey',
                       'joker',
                       'defense',
                       'dimensional',
                       'democrat',
                       'history',
                       'center',
                       'quarterback',
                       'pennywise',
                       'ronald',
                       'oxygen',
                       'guard',
                       'nickelodeon']}
```

### Failed to Solve Connection Puzzle 2
**Expected Solution**
```text
🟡 GRASSY AREA: GREEN, LAWN, PARK, YARD
🟢 DEAL WITH: ADDRESS, ANSWER, FIELD, HANDLE
🔵 MOVIES WITH “S” REMOVED: CAR, GOODFELLA, JAW, SWINGER
🟣 ___ LAW: CRIMINAL, HARVARD, LEMON, NATURAL
```

**Example Run**
```text
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list1.txt

Words read from file: ['goodfella', 'jaw', 'answer', 'handle', 'park', 'lemon', 'yard', 'field', 'natural', 'car', 'harvard', 'swinger', 'green', 'criminal', 'address', 'lawn']
RECOMMENDED WORDS ['park', 'lawn', 'field', 'yard'] with connection Outdoor spaces
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['park', 'lawn', 'field', 'yard'] is incorrect

RECOMMENDED WORDS ['lawn', 'yard', 'handle', 'jaw'] with connection Parts of a Tool
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['lawn', 'yard', 'handle', 'jaw'] is incorrect

RECOMMENDED WORDS ['answer', 'address', 'field', 'park'] with connection Things related to location or response
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['answer', 'address', 'field', 'park'] is incorrect

RECOMMENDED WORDS ['lawn', 'green', 'lemon', 'natural'] with connection Things that are green
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['lawn', 'green', 'lemon', 'natural'] is incorrect
FAILED TO SOLVE THE CONNECTION PUZZLE TOO MANY MISTAKES!!!

FINAL PUZZLE STATE:
{ 'found_blue': False,
  'found_purple': False,
  'found_yellow': False,
  'invalid_connections': [ ['park', 'lawn', 'field', 'yard'],
                           ['lawn', 'yard', 'handle', 'jaw'],
                           ['answer', 'address', 'field', 'park'],
                           ['lawn', 'green', 'lemon', 'natural']],
  'llm_temperature': 0.7,
  'mistake_count': 4,
  'recommendation_count': 4,
  'recommended_connection': 'Things that are green',
  'recommended_correct': False,
  'recommended_words': ['lawn', 'green', 'lemon', 'natural'],
  'words_remaining': [ 'lawn',
                       'park',
                       'address',
                       'swinger',
                       'answer',
                       'field',
                       'lemon',
                       'yard',
                       'jaw',
                       'handle',
                       'goodfella',
                       'car',
                       'criminal',
                       'green',
                       'harvard',
                       'natural']}
```