https://github.com/blixt/first-aid
An annoying but helpful CLI called First Aid.
- Host: GitHub
- URL: https://github.com/blixt/first-aid
- Owner: blixt
- License: mit
- Created: 2024-05-25T16:21:19.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-26T22:33:52.000Z (5 months ago)
- Last Synced: 2024-07-28T17:24:04.659Z (4 months ago)
- Language: Go
- Size: 215 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# First Aid
A little help from a reluctant AI on the command line.
> [!CAUTION]
> This tool gives an AI access to run commands and code on your computer.
> Furthermore, it’s sending everything it sees to OpenAI’s servers.
>
> If either of these things makes you uncomfortable, don’t run this tool. I hope the code can be interesting nonetheless!

## ToC

- [First Aid](#first-aid)
  - [ToC](#toc)
  - [Usage](#usage)
  - [Intended use cases for this tool](#intended-use-cases-for-this-tool)
  - [Roadmap](#roadmap)
  - [Tool ideas](#tool-ideas)
  - [For developers](#for-developers)
    - [The `tool` package](#the-tool-package)
    - [Tools that return images](#tools-that-return-images)
    - [The `writer`, `serif`, and `spinner` packages](#the-writer-serif-and-spinner-packages)
    - [The `llm` package](#the-llm-package)
  - [A quote from the tool itself](#a-quote-from-the-tool-itself)

## Usage
```sh
git clone https://github.com/blixt/first-aid.git
cd first-aid
go mod download
OPENAI_API_KEY=... go run main.go
```

You can also `go install .` to add `first-aid` to your PATH if you’re so inclined.
## Intended use cases for this tool
This tool is an exploration of how automation can be made more useful for anyone
in day-to-day tasks. Example tasks:

- “Write a nice commit message for my changes in this repo”
- “Put a markdown table of a summary of files in this directory into my clipboard”
- “What does this error mean?” → Take screenshot and analyze the problem
- *(From phone)* “What’s the last page I looked at on my computer?”
- *(From phone)* “Did I leave my keys in the apartment?” → Remote control a camera
## Roadmap
The development goals of this tool are roughly:
- [ ] Have fun
- [ ] Create a codebase that can be helpful to people building AI projects
- [ ] Make the tool capable of helping with any computer related issue
- [ ] Implement cross-device support (ask about your computer from your phone)
- [ ] Add in multimodal flows (ability to see and hear)
- [ ] Play with realtime, async, and parallel flows
- [ ] Support local models and/or other LLM providers
- [ ] Sandboxing (e.g. Docker) for security and privacy
- [ ] Introduce ways to clear the context window (effective memory)
- [ ] Add a server layer that can run / synchronize multiple instances of an agent
- [ ] Solve for session based tools, such as long-running command line tools
- [ ] Answer the question of asking the LLM to write a script vs. use tools
  - Or both... maybe?

## Tool ideas
- [ ] Control Chrome via extension
  - See list of open tabs
  - Activate tab
  - Screenshot tab
  - Click/type in tab
- [ ] Schedule a task for later
  - Something like “check the weather tomorrow morning and speak it out loud”
  - Also includes repeating tasks like “every day at 2pm”

## For developers
I’m aiming to make this codebase approachable and to contain little pieces of
code that can be helpful to other people building AI related tools in Go. So
below I’ll point at a few parts of the codebase I think could be useful.

### The `tool` package
The `tool` package makes it very easy to create tools for the LLM to use. The
main goal was the ergonomics of defining a tool. Here’s an example of a tool:

```go
package mypkg

import (
	"fmt"
	"os/exec"

	"github.com/blixt/first-aid/tool"
)

type RunPowerShellCmdParams struct {
	Command string `json:"command" description:"The PowerShell command to run"`
}

var RunPowerShellCmd = tool.Func(
	"Run PowerShell command",
	"Run a shell command on the user's computer (a Windows machine) and return the output",
	"run_powershell_cmd",
	func(r tool.Runner, p RunPowerShellCmdParams) tool.Result {
		// Run the PowerShell command and capture the output or error.
		cmd := exec.Command("powershell", "-Command", p.Command)
		output, err := cmd.CombinedOutput() // Combines both STDOUT and STDERR
		if err != nil {
			// firstLineBytes is a helper that is not shown in this snippet.
			return tool.Error(p.Command, fmt.Errorf("%w: %s", err, firstLineBytes(output)))
		}
		return tool.Success(p.Command, map[string]any{"output": string(output)})
	})
```

This can now be turned into a JSON schema (which is what most LLM APIs accept
for tool use) by calling `RunPowerShellCmd.Schema()`.

To run the tool with the data received from the LLM:
```go
arguments := json.RawMessage(`{"command":"Get-ComputerInfo"}`)
result := RunPowerShellCmd.Run(tool.NopRunner, arguments)
```

This will parse the JSON into the parameters type, validate it, and call the
function defined above with the correct parameters.

The API has been optimized to be able to show human readable representations of
the tool before, during, and after running it, which explains the extra `label`
value and the `tool.Runner` interface.

Obviously you usually have more than one tool, and for this we have toolboxes:
```go
toolbox := tool.Box(
	mypkg.ListFiles,
	mypkg.RunPowerShellCmd,
	mypkg.RunPython,
)

schema := openai.Tools(toolbox) // Can be used directly for "tools" in OpenAI's API

// The function name and JSON arguments can be used directly from "tool_calls"
arguments := json.RawMessage(`{"code":"print('hi')"}`)
result := toolbox.Run(tool.NopRunner, "run_python", arguments)
```

### Tools that return images
One thing that OpenAI’s API strangely does not allow is a tool returning an
image. It makes a lot of sense that with a multimodal LLM you will want to
process images that are not only provided directly by the user but also created
by a tool (such as a tool that browses a web page and returns a screenshot to
the LLM).

To work around this, I fake a message from the user (because unlike what the
documentation says, GPT-4o does not support images in "assistant" or "system"
messages either) in addition to the tool result, and make sure to mention the
same filename in both so that the LLM will associate the results.

This is the API for a tool to return an image:
```go
var rb tool.ResultBuilder
rb.AddImage(screenshotPath)
return rb.Success(
	"Take screenshot",
	map[string]any{"screenshotFileName": filepath.Base(screenshotPath)},
)
```

Note that for now the tool result itself points out this workaround.
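The workaround described in this section can be sketched roughly as follows. This is a minimal, standard-library-only illustration of the message shapes involved, not the code this repository uses. The `message`, `part`, and `imageURL` types and the `imageToolResult` helper are hypothetical names, and the field layout is an assumption based on OpenAI's Chat Completions format (a `tool` message with a `tool_call_id`, followed by a `user` message whose content includes an `image_url` part holding a data URL).

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// imageURL wraps the URL (here a base64 data URL) of an attached image.
type imageURL struct {
	URL string `json:"url"`
}

// part is one element of a multi-part message content array.
type part struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

// message is a simplified chat message. Content is a string or a []part.
type message struct {
	Role       string `json:"role"`
	ToolCallID string `json:"tool_call_id,omitempty"`
	Content    any    `json:"content"`
}

// imageToolResult (hypothetical) returns the tool result followed by a
// synthetic user message that carries the image. Both mention fileName so
// the model can associate them.
func imageToolResult(toolCallID, fileName string, png []byte) ([]message, error) {
	result, err := json.Marshal(map[string]any{
		"screenshotFileName": fileName,
		"note":               "The image is attached in the next user message.",
	})
	if err != nil {
		return nil, err
	}
	dataURL := "data:image/png;base64," + base64.StdEncoding.EncodeToString(png)
	return []message{
		{Role: "tool", ToolCallID: toolCallID, Content: string(result)},
		{Role: "user", Content: []part{
			{Type: "text", Text: "Image returned by tool: " + fileName},
			{Type: "image_url", ImageURL: &imageURL{URL: dataURL}},
		}},
	}, nil
}

func main() {
	// The bytes below are a placeholder, not a real PNG.
	msgs, err := imageToolResult("call_123", "screenshot.png", []byte("fake png bytes"))
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(msgs, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Both messages would be appended to the conversation before the next request, so the model sees the tool result and the image side by side.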
### The `writer`, `serif`, and `spinner` packages
Part of having fun with this project was giving the command line tool a bit more
personality: partly by making it unnecessarily sarcastic and bleak, but also
by making it type character by character in a serif font, which makes it stand
out on the command line. The formatting is done in a very simple way using the
`serif` package. It was built to do the same thing those Twitter font generators
do, but with some additional support for international letters (ç, ü, and so on)
and numbers. It also supports italic, bold, and italic+bold variations.

```go
package main

import (
	"fmt"

	"github.com/blixt/first-aid/serif"
)

func main() {
	fmt.Println(serif.Format("Étoiles dans l’été, rêves enchantés."))
	// Same as:
	fmt.Println("𝙴́𝚝𝚘𝚒𝚕𝚎𝚜 𝚍𝚊𝚗𝚜 𝚕’𝚎́𝚝𝚎́, 𝚛𝚎̂𝚟𝚎𝚜 𝚎𝚗𝚌𝚑𝚊𝚗𝚝𝚎́𝚜.")
}
```

The `writer` package was built to be used for a block of output that is written
character by character using the above serif formatting. Over time it also grew
to support interweaving tasks with an associated label and spinner, where the
label can be updated over time until the task is complete. This allows us to
make tool use by the LLM look like just another part of its continuous stream of
output, much like the UI of ChatGPT.

For ease of use with `fmt`, it implements `io.Writer`:
```go
package main

import (
	"fmt"
	"time"

	"github.com/blixt/first-aid/writer"
)

func main() {
	w := writer.New()
	go func() {
		defer w.Done()
		fmt.Fprintln(w, "Let me just think about that for a few seconds...")
		fmt.Fprintln(w, "")
		w.SetTask("Thinking...") // Starts a spinner on the current line
		time.Sleep(4 * time.Second)
		w.SetTask("") // This resets the current line to be empty
		fmt.Fprintln(w, "✅ Done thinking!")
		fmt.Fprintln(w, "")
		fmt.Fprintln(w, "Wait, what were we doing?")
	}()
	w.StartAndWait()
}
```

The speed increases if the unwritten content gets too long.
### The `llm` package
Probably the least interesting package: it just implements a loop of sending
messages to an LLM and, if the LLM returns tool calls, calling the LLM once more
with the results of those tool calls.

```go
func main() {
	model := openai.New(os.Getenv("OPENAI_API_KEY"), "gpt-4o")
	ai := llm.New(
		model,
		mypkg.ListFiles,
		mypkg.RunPowerShellCmd,
		mypkg.RunPython,
	)

	// System prompt is dynamic so it can always be up-to-date.
	ai.SystemPrompt = func() content.Content {
		return content.Textf("You're a helpful bot. The time is %s.", time.Now().Format(time.RFC1123))
	}

	// Chat returns a channel of updates.
	for update := range ai.Chat("Give me a random number") {
		switch update := update.(type) {
		case llm.ErrorUpdate:
			panic(update.Error)
		case llm.TextUpdate:
			// Received for each chunk of text from the LLM.
			fmt.Print(update.Text)
		case llm.ToolStartUpdate:
			// Received the moment the LLM streams that it intends to use a tool.
			fmt.Printf("(%s: ", update.Tool.Label())
		case llm.ToolDoneUpdate:
			// Received after the LLM finished sending arguments and the tool ran.
			fmt.Printf("%s)\n", update.Result.Label())
		}
	}
}
```

Example output:
```text
(Run Python: `import random` (+1 line))
Here's a random number for you: **48**.
```

If you want to use Google’s Gemini 1.5 Pro instead, it’s easy:
```go
model := google.New("gemini-1.5-pro-001").
	WithGeminiAPI(os.Getenv("GOOGLE_API_KEY"))
```

You can use `WithVertexAI(…)` instead if you have a project set up for it.
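The loop described at the start of this section (send messages, run any requested tools, send the results back) can be sketched without any provider at all. Everything below is hypothetical and only illustrates the control flow: the `model` interface, the `reply` and `toolCall` types, the `chat` function, and the `scripted` fake model are assumptions for this example and are not the `llm` package's API. History is kept as plain strings for brevity.

```go
package main

import "fmt"

// toolCall is a request from the model to run a named tool.
type toolCall struct {
	Name string
	Args string
}

// reply is what the hypothetical model returns: final text or tool calls.
type reply struct {
	Text  string
	Calls []toolCall
}

// model is a stand-in for an LLM client.
type model interface {
	Send(history []string) reply
}

// chat sends the prompt, then keeps calling the model for as long as it asks
// for tools, appending each tool result to the history. It gives up after
// maxTurns model calls to avoid looping forever.
func chat(m model, tools map[string]func(args string) string, prompt string, maxTurns int) (string, error) {
	history := []string{"user: " + prompt}
	for turn := 0; turn < maxTurns; turn++ {
		r := m.Send(history)
		if len(r.Calls) == 0 {
			return r.Text, nil // No tool calls means this is the final answer.
		}
		for _, c := range r.Calls {
			run, ok := tools[c.Name]
			if !ok {
				history = append(history, "tool "+c.Name+": error: unknown tool")
				continue
			}
			history = append(history, "tool "+c.Name+": "+run(c.Args))
		}
	}
	return "", fmt.Errorf("no final answer after %d turns", maxTurns)
}

// scripted is a fake model: it requests a tool on the first turn, then
// answers by echoing the last history entry (the tool result).
type scripted struct{}

func (scripted) Send(history []string) reply {
	if len(history) == 1 {
		return reply{Calls: []toolCall{{Name: "random_number", Args: "{}"}}}
	}
	return reply{Text: "Got it (" + history[len(history)-1] + ")"}
}

func main() {
	tools := map[string]func(string) string{
		"random_number": func(string) string { return "48" },
	}
	answer, err := chat(scripted{}, tools, "Give me a random number", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(answer) // prints "Got it (tool random_number: 48)"
}
```

The turn limit is a design choice of this sketch: without it, a model that keeps requesting tools would never return control to the caller.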
## A quote from the tool itself
I asked the tool to update this README with its thoughts:
> There's nothing like a command line tool with a sarcastic AI to make you
> question all your life choices. Enjoy automating the mundane, because who
> wouldn't want their computer mocking them while getting things done? Cheers to
> that.