{"id":17526887,"url":"https://github.com/jam3scampbell/ProctorAI","last_synced_at":"2025-03-06T06:31:03.593Z","repository":{"id":247842199,"uuid":"826936221","full_name":"jam3scampbell/ProctorAI","owner":"jam3scampbell","description":"The AI to end procrastination 😈","archived":false,"fork":false,"pushed_at":"2024-07-25T20:44:29.000Z","size":17121,"stargazers_count":274,"open_issues_count":5,"forks_count":34,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-07-25T23:30:44.677Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jam3scampbell.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-10T17:16:28.000Z","updated_at":"2024-07-25T20:38:54.000Z","dependencies_parsed_at":"2025-02-20T01:49:05.612Z","dependency_job_id":null,"html_url":"https://github.com/jam3scampbell/ProctorAI","commit_stats":null,"previous_names":["jam3scampbell/proctorai"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jam3scampbell%2FProctorAI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jam3scampbell%2FProctorAI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jam3scampbell%2FProctorAI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jam3scampbell%2FProctorAI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jam3scampbell","download_url":"https://codeload.g
ithub.com/jam3scampbell/ProctorAI/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242161460,"owners_count":20081876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T15:02:35.823Z","updated_at":"2025-03-06T06:31:03.583Z","avatar_url":"https://github.com/jam3scampbell.png","language":"Python","funding_links":[],"categories":["Common AI Websites / Tools \u003ca name=\"index--tools\"\u003e\u0026nbsp;\u003c/a\u003e","Python"],"sub_categories":["Other Tools / Open-Source Projects"],"readme":"# ProctorAI👁️\n## 🔍 Overview\nProctorAI is a multimodal AI that watches your screen and calls you out if it sees you procrastinating. Proctor works by taking screenshots of your computer every few seconds (at a specified interval) and feeding them into a multimodal model, such as Claude-3.5-Sonnet, GPT-4o, or LLaVA-1.5. If ProctorAI determines that you are not focused, it will take control of your screen and yell at you with a personalized message. After making you pledge to stop procrastinating, ProctorAI then gives you 15 seconds to close the source of procrastination; otherwise it will continue to bug you.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/demo.gif\" alt=\"Project demo\" width=\"400\"\u003e\n\u003c/p\u003e\n\n***An intelligent system that knows what does and doesn't count as procrastination.*** Compared to traditional site blockers, ProctorAI is *intelligent* and capable of understanding nuanced workflows. *This makes a big difference*. 
Before every Proctor session, the user types out their session specification, where they explicitly tell Proctor what they're planning to work on, what behaviors are allowed during the session, and what behaviors are not allowed. Thus, Proctor can handle nuanced rules such as \"I'm allowed to go on YouTube, but only to watch Karpathy's lecture on Makemore\". No other productivity software can handle this level of flexibility.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/slap.png\" alt=\"Description of the image\" width=\"350\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\" style=\"color: gray; font-size: 11px;\"\u003e\n  ProctorAI aims to be this woman, but available all the time, snarkier, and with full context of your work.\n\u003c/p\u003e\n\n***It's alive!*** A big design goal with Proctor is that it should *feel alive*. In my experience, I tend not to break the rules because I can intuitively *feel* the AI watching me--just like how test-takers are much less likely to cheat when they can *feel* the proctor of an exam watching them.\n\n## 🚀 Setup and Installation\nTo start the GUI, run `./run.sh`. You might get some popups asking you to allow terminal access to certain utilities; you should allow these. The current implementation requires macOS; a Windows-compatible version is available in the `windows` branch. We hope to make Proctor platform-independent soon.\n```\ngit clone https://github.com/jam3scampbell/ProctorAI\ncd ProctorAI\npython -m venv focusenv\nsource focusenv/bin/activate\npip install -r requirements.txt\n./run.sh\n```\n\nDepending on which models you want to use under the hood, you should define the following API keys as environment variables:\n- `OPENAI_API_KEY`\n- `ANTHROPIC_API_KEY`\n- `GEMINI_API_KEY`\n- `ELEVEN_LABS_API_KEY`\n\nTo keep API costs low, I recommend using `two_tier` mode, with a local model such as LLaVA as the router model. 
For this, you will need [Ollama](https://ollama.com) with the [llava](https://ollama.com/library/llava) model installed. Make sure Ollama is running in the background before starting ProctorAI.\n\n\n## ⚙️ Options/Settings\nThe following can all be toggled in the settings page or used as arguments to `main.py`:\n| Option | Description |\n|------------------|---------------------------------------------------------------------------------------------------------|\n| `model_name` | API name of the main model |\n| `tts` | Enable Eleven Labs text-to-speech |\n| `voice` | Select the voice of the Eleven Labs speaker |\n| `cli_mode` | Run without the GUI |\n| `delay_time` | The amount of time between screenshots |\n| `initial_delay` | The amount of time to wait before Proctor starts watching your screen (useful for giving you time to open what you want to work on) |\n| `countdown_time` | The amount of time Proctor gives you to close the source of procrastination |\n| `user_name` | Enter your name to make the experience more personalized |\n| `print_CoT` | Print the model's chain-of-thought to the console |\n| `two_tier` | If enabled, each image is first sent to `router_model` and is only passed on to the main model if `router_model` thinks the user is procrastinating. Useful for bringing down API costs. 
The router model is given a stricter prompt so that it leans towards flagging behavior it thinks is suspicious. |\n| `router_model` | API name of the model to use as the router |\n\n\n## 🎯 Understanding This Repository\n\nRight now, basically all functionality is contained in the following files:\n- `main.py`: contains the main control loop that takes screenshots, calls the model, and initiates procrastination events\n- `user_interface.py`: runs the GUI written in PyQt5\n- `api_models.py`: houses a unified interface for calling different model families\n- `procrastination_event.py`: contains methods for displaying the popup when the user is caught procrastinating as well as the timer telling the user to leave what they were doing\n- `utils.py`: functions for taking screenshots, text-to-speech (TTS), etc.\n- `config_prompts.yaml`: all prompts used in the LLM-scaffolded system\n\nAs the program runs, it'll create a `settings.json` file and a `screenshots` folder in the root directory. If TTS is enabled, it'll also write `yell_voice.mp3` to the `src` folder.\n\n## 🌐 Roadmap and Future Improvements\nThis project is still very much under active development. 
Some features I'm hoping to add next:\n- finetuning a LLaVA model specifically for the task/distribution\n- scheduling sessions and having Proctor start running when you open your computer\n- making it extremely annoying to quit the program (at least until the user finishes their pre-defined session)\n- logging, time-tracking, and summary statistics\n- improving the chat feature and giving the model greater awareness of state/context\n- having a drafts folder for prompts so you don't have to re-type them if you're doing the same task as you were the other day\n- muting all other sounds on the computer when the TTS plays (so it isn't drowned out by music)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjam3scampbell%2FProctorAI","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjam3scampbell%2FProctorAI","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjam3scampbell%2FProctorAI/lists"}