https://github.com/bryndin/ai-coded-pacman
How far can we get by coding via LLM prompt?
- Host: GitHub
- URL: https://github.com/bryndin/ai-coded-pacman
- Owner: bryndin
- Created: 2024-04-28T00:14:24.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-05-23T06:25:02.000Z (about 1 year ago)
- Last Synced: 2024-05-23T06:26:32.311Z (about 1 year ago)
- Language: JavaScript
- Size: 1.61 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md

# AI Coded Pacman
This is an experiment to see how capable LLMs are at working with code. Can prompts replace coding? Is there a difference between LLMs? What are the most efficient ways to utilize LLMs in development?
The goal is to make LLMs do the coding. We anticipate that some manual involvement will be necessary, but we aim to keep it minimal. We will utilize AI for refactoring as much as possible while staying productive. Hopefully, the LLMs will produce quality code and perform code cleanups as they go. If not, we may try refactoring via prompting.
### LLMs and Tech Stack
We picked the [Pacman](https://en.wikipedia.org/wiki/Pac-Man) game as our implementation subject, JavaScript for ease of deployment, and the [p5.js](https://p5js.org/) game dev library for the engine. [GitHub Pages](https://pages.github.com/) hosts the game.
The p5.js / JavaScript combo fits our goals well: we have no prior experience with the p5.js library, nor do we use JavaScript professionally as our main language.
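For context, a minimal p5.js sketch looks roughly like the one below; the `setup()`/`draw()` callbacks are the skeleton the LLMs are asked to fill in. The cell size, colors, and starting position are illustrative assumptions, not values from this project.

```javascript
// Minimal p5.js sketch: p5.js calls setup() once and draw() every frame.
// Cell size, colors, and starting position are illustrative only.
const CELL = 20;   // assumed size of one maze cell in pixels
let pacmanX = 5;   // assumed starting column
let pacmanY = 5;   // assumed starting row

function setup() {
  createCanvas(28 * CELL, 31 * CELL); // classic Pac-Man maze is 28x31 tiles
}

function draw() {
  background(0);                      // black maze background
  fill(255, 255, 0);                  // Pac-Man yellow
  circle(pacmanX * CELL + CELL / 2, pacmanY * CELL + CELL / 2, CELL);
}

function keyPressed() {
  if (keyCode === RIGHT_ARROW) pacmanX += 1;
  if (keyCode === LEFT_ARROW)  pacmanX -= 1;
  if (keyCode === UP_ARROW)    pacmanY -= 1;
  if (keyCode === DOWN_ARROW)  pacmanY += 1;
}
```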
We plan to try different LLMs as we go: the freely available [Google Gemini](https://gemini.google.com/), [Microsoft Copilot](https://copilot.microsoft.com/) (GPT-4 Turbo?), and [ChatGPT 3.5](https://chatgpt.com/); self-hosted open-source models via [Ollama](https://ollama.com/); and paid variants such as Gemini 1.5 and other LLMs as they become available.
## Latest deploy
🎮 https://bryndin.github.io/ai-coded-pacman

## Dev Journal
📒 [Step-by-step dev log](journal.md). There we track development progress: the problem each step works on, the prompts used, the LLM answers, and our notes. It's THE essence of the project.
## Notes & Findings
These are general summary notes; for the detailed development notes see the [Dev Journal](journal.md).

### **LLM choice** (ordered best to worst)
- *Gemini 1.5 Pro*: Can accept, analyze, and reply with large chunks of code. Its more generous context window is a major benefit; other models suffer from crippled input/output limits that hurt dev productivity. It also shows superior reasoning and code quality.
- *Gemini* / *MS Copilot* / *LLama3-70b*: Reasonable thinkers and code generators, often providing comparable results. All suffer from artificial limits on prompt/answer sizes, and they aren't as advanced as the paid pro models. A temperature balance has to be found: pushing for more concise answers can lead to oversimplified or less usable results.
- Smaller models (e.g. *llama3:8b-instruct-q6_K*, *deepseek-coder:6.7b*) have limited use in working with code and tend to hallucinate hard on refactorings. Some (e.g. *llama3:8b*) had a role for simpler questions; with others (e.g. *deepseek-coder*) we struggled to find a good use.

### "Averaged" generated code that varies upon regeneration
There must be a large volume of Pacman implementations, with many making it into the training sets. The LLM-generated code feels "averaged": regenerating it produces a similar, yet slightly different version. Global variables are replaced by function arguments or class attributes and vice versa; controls are based on arrows and/or WASD, etc. These variations complicate putting different code blobs together and require human attention to make them compatible, especially when using different context sessions.
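As a hypothetical illustration of this drift (our own reconstruction, not verbatim LLM output), two regenerations of the same movement handler might come back shaped like this:

```javascript
// Variant A: global state plus arrow-key handling.
let direction = { x: 0, y: 0 };

function keyPressed() {
  if (keyCode === LEFT_ARROW)  direction = { x: -1, y: 0 };
  if (keyCode === RIGHT_ARROW) direction = { x: 1, y: 0 };
}

// Variant B: the same logic regenerated as a class with WASD controls.
class Pacman {
  constructor() {
    this.direction = { x: 0, y: 0 };
  }
  handleKey(k) {
    if (k === 'a') this.direction = { x: -1, y: 0 };
    if (k === 'd') this.direction = { x: 1, y: 0 };
  }
}
```

Merging both styles into one codebase means choosing one convention and rewriting the other by hand.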
### Crippling Limits on Context Window in LLMs
Models, especially the free ones, suffer from a small context window. Limits on prompt input and reply output further cripple the experience. Public LLMs don't publish their context window sizes; they could be as small as 8k tokens, and the input/output limits feel like they could be as low as 1k. With a larger codebase (>5k tokens), supplying code to LLMs via prompt becomes a challenge. Manually extracting subsets of code for the prompt is a distraction, as is adapting truncated answers by hand, for example, code with "insert needed logic here" comments.
Finding a balance between how much code to put in the prompt and the effort needed to integrate the response back into the codebase can be a learning process initially. It also depends on your code organization.
Suggestions:
- Keep code decoupled to simplify its use in prompts.
- Make a copy of the code and remove all parts unrelated to your ask. Add a text description and instructions on top to create a prompt (see the sketch after this list). Expect that a few tries may be needed to get the best solution.
- If the context window gets polluted with failed attempts, start a new session or even switch to another LLM.
- Try other models with a bigger context window, and especially bigger input/output limits.
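
A prompt prepared this way might look like the hypothetical sketch below: instructions as comments on top, everything unrelated to the ask cut down to stubs. The file names, function names, and the bug itself are made up for illustration.

```javascript
// Prompt: "Ghosts walk through walls. Fix the collision check in
// Ghost.move() so ghosts respect the maze layout. Keep the code style."
// Everything unrelated to the ask has been reduced to stubs.

// maze.js (stub: only the API the fix needs)
function isWall(col, row) { /* returns true if the cell is a wall */ }

// ghost.js (the code to change)
class Ghost {
  constructor(col, row) {
    this.col = col;
    this.row = row;
  }
  move(dir) {
    // BUG: position is updated without checking isWall()
    this.col += dir.x;
    this.row += dir.y;
  }
}
```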