{"id":27624441,"url":"https://github.com/twtrubiks/browser-use-tutorial","last_synced_at":"2025-07-24T14:38:53.553Z","repository":{"id":288590100,"uuid":"968591024","full_name":"twtrubiks/browser-use-tutorial","owner":"twtrubiks","description":"深入 browser-use 的多模態自動化革命 ","archived":false,"fork":false,"pushed_at":"2025-06-21T10:22:58.000Z","size":15,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-21T11:27:35.206Z","etag":null,"topics":["ai","browser-use","gemini","llm","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/twtrubiks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-18T11:08:12.000Z","updated_at":"2025-06-21T10:23:01.000Z","dependencies_parsed_at":"2025-04-19T00:41:04.149Z","dependency_job_id":"8b8cfc18-66e3-4d1f-ab76-8013a287545f","html_url":"https://github.com/twtrubiks/browser-use-tutorial","commit_stats":null,"previous_names":["twtrubiks/browser-use-tutorial"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/twtrubiks/browser-use-tutorial","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twtrubiks%2Fbrowser-use-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twtrubiks%2Fbrowser-use-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twtrubiks%2Fbrowser-use-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/twtrubiks%2Fbrowser-use-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/twtrubiks","download_url":"https://codeload.github.com/twtrubiks/browser-use-tutorial/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twtrubiks%2Fbrowser-use-tutorial/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266856924,"owners_count":23995766,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-24T02:00:09.469Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","browser-use","gemini","llm","python3"],"created_at":"2025-04-23T11:44:32.881Z","updated_at":"2025-07-24T14:38:53.541Z","avatar_url":"https://github.com/twtrubiks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Giving the Browser an AI Brain: A Deep Dive into browser-use and the Multimodal Automation Revolution 🧠\n\nVersion used here: `0.3.1`\n\n* [YouTube Tutorial - Giving the Browser an AI Brain: A Deep Dive into browser-use and the Multimodal Automation Revolution 🧠](https://youtu.be/IIt68zX6xq8)\n\nToday's topic is [browser-use](https://github.com/browser-use/browser-use) 🤖\n\nThere is also a UI version, [web-ui](https://github.com/browser-use/web-ui).\n\nThis project lets an AI (specifically a large language model, and a **multimodal model** at that) browse the web through a browser,\n\nwith the LLM imitating the actions a human would take.\n\nA **multimodal model** is one that can process several types or forms of data at the same time,\n\nsuch as text 📝, images 🖼️, and audio 🔊.\n\nRoughly, it works like this ⚙️\n\n1.  **Receive the instruction:** You describe the action you want in natural language (for example, \"click the login button\" or \"type 'weather' into the search box\").\n\n2.  
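**Understand the page:** It captures the visual content of the current browser page, along with any available page-structure information.\n\n3.  **AI analysis (LLM):** The visual information and your instruction are sent together to a multimodal LLM. Like a person, the LLM \"reads\" the layout and text on screen, works out your intent, and identifies the target of the action (for example, which button is \"Login\" and which input is the search box).\n\n4.  **Execute the action:** Once the model has decided which element to act on and how, the library translates that decision into real browser automation commands (such as a simulated click or text input).\n\nAbout speed ⏱️\n\n**Because it reacts to what is rendered on screen, this approach is usually slower per step than traditional automation methods (such as XPath or CSS selectors).**\n\nThe main reasons:\n\n1. **Cost of analyzing the screen:** Analyzing image content is far more complex than looking up an element in code (for example, in the DOM) and takes more compute.\n\n2. **LLM inference latency ⏳:** Sending the page information to a large language model (usually a cloud service) and waiting for its answer involves both network latency and the model's own compute time, typically several seconds. Traditional methods look up elements locally, which is very fast (milliseconds).\n\n3. **Multi-step decision making:** The LLM has to understand the context, the visual layout, and the intent of the instruction, and this \"thinking\" takes longer than locating an element directly by selector.\n\n**A single step is slower, but the approach has its advantages:**\n\n1. **Better adaptability and robustness:** Traditional methods depend on fixed selectors, so a small front-end change can break a script. The AI approach works from the page's semantics and visual structure, so even after minor changes, as long as a human can still recognize the target, the AI can usually find it too.\n\n2. **More natural interaction:** Instructions can be given in natural language, which lowers the barrier to writing automation scripts.\n\n3. **Handling visual elements:** For elements that lack clean structure or are hard to target with selectors, a vision-based approach may work better.\n\n## Getting started\n\nThis tutorial uses `Python 3.12.3`.\n\nInstall browser-use:\n\n```cmd\npip install browser-use\n```\n\nTo use a different LLM, such as Gemini (which I use), also install:\n\n```cmd\npip install langchain-google-genai\n```\n\nFor other model setups, see [examples/models](https://github.com/browser-use/browser-use/tree/main/examples/models)\n\nIf you need the memory feature, install:\n\n```cmd\npip install \"browser-use[memory]\"\n```\n\nMemory is enabled by default. If you installed the extra but do not need memory, set `enable_memory=False`:\n\n```python\nagent = Agent(\n    enable_memory=False,\n    ...\n)\n```\n\n### CLI version\n\n```cmd\npip install \"browser-use[cli]\"\n```\n\nRemember to set the environment variable:\n\n```cmd\nexport GOOGLE_API_KEY=xxxx\necho $GOOGLE_API_KEY\n```\n\nTo remove the environment variable:\n\n```cmd\nunset GOOGLE_API_KEY\n```\n\nThen just run:\n\n```cmd\nbrowser-use\n```\n\nTo use a different LLM, see [examples/models](https://github.com/browser-use/browser-use/tree/main/examples/models)\n\nMany providers are covered there, including Ollama (see [Ollama 簡介 🤖](https://github.com/twtrubiks/dify-ollama-docker-tutorial/blob/main/ollama.md)).\n\nInstall Playwright:\n\n```cmd\nplaywright install chromium --with-deps --no-shell\n```\n\nI previously covered [docker-selenium-tutorial](https://github.com/twtrubiks/docker-selenium-tutorial); here is how the two differ.\n\n**Summary comparison:**\n\n| Feature          | Selenium                         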
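         | Playwright                                       | Key difference                              |\n| :--------------- | :---------------------------------------- | :----------------------------------------------- | :------------------------------------------ |\n| **Architecture** | WebDriver (HTTP, requires a driver) | WebSocket/pipe (direct connection, manages the browser) | Playwright connects more directly |\n| **Speed/stability** | Relatively slow; waits must be managed manually and precisely | **Usually faster and more stable (built-in auto-waiting)** | Clear advantage for Playwright |\n| **API/usability** | Powerful but can be verbose | **Modern, intuitive, concise** | Playwright's API fits modern development habits better |\n| **Built-in features** | Mainly core features; advanced ones need extra setup | **Feature-rich (network, tracing, multiple contexts)** | Playwright offers more out of the box |\n| **Setup** | Driver executables must be managed | **Install command downloads the browsers automatically** | Playwright setup is simpler |\n| **Language support** | **Very broad** | Mainstream languages (TS/JS, Python, Java, .NET) | Selenium supports more languages |\n| **Browser support** | Broad (including legacy versions) | **Modern mainstream browsers (Chromium, Firefox, WebKit)** | Selenium covers more; Playwright focuses on modern browsers |\n| **Community/background** | **Long history, huge community, W3C standard** | Newer, maintained by Microsoft, growing fast | Selenium has the broader base; Playwright is developing actively |\n\n### Simple example ✨\n\nStart with a simple example, [demo.py](demo.py): `python3 demo.py`\n\nIt is impressive: it can scroll and reason about the page on its own. For the prompts, see [prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)\n\n### CAPTCHA example 🔒\n\nThis one drives your own local Chrome browser.\n\nIt can solve CAPTCHAs too: [demo-captcha.py](demo-captcha.py) `python3 demo-captcha.py`\n\nIn my tests, the stronger the model, the higher the success rate:\n\n`gemini-2.5-pro-preview-05-06` -\u003e higher success rate\n\n`gemini-2.0-flash-exp` -\u003e low success rate\n\nFor more examples, see [examples](https://github.com/browser-use/browser-use/tree/main/examples)\n\nYou will find it can handle almost any task.\n\nIt can also be triggered from Slack: [examples/integrations](https://github.com/browser-use/browser-use/tree/main/examples/integrations/slack)\n\n### Posting to Twitter automatically 🐦\n\nThis example logs in with cookies instead of a username and password.\n\nFirst install the `Cookie-Editor` browser extension and export the cookies as JSON, saved here as `twitter_cookies.json`,\n\nthen, in that file, change the 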
`\"sameSite\"` values (**all** of them) to `\"Lax\"`; otherwise you get this error:\n\n```cmd\nERROR    [agent] ❌ Result failed 1/3 times:\nBrowserContext.add_cookies: cookies[9].sameSite: expected one of (Strict|Lax|None)\n```\n\n[twitter_post_using_cookies.py](twitter_post_using_cookies.py) `python3 twitter_post_using_cookies.py`\n\n### Streamlit integration\n\nThere is also a Streamlit integration, [streamlit_demo.py](streamlit_demo.py): `python3 -m streamlit run streamlit_demo.py`\n\n### Downloading files\n\n[download_file.py](download_file.py) downloads a file.\n\n### Clipboard\n\n[clipboard.py](clipboard.py) works with the clipboard.\n\n### Drag and drop\n\n[drag_drop.py](drag_drop.py) performs drag and drop.\n\n### Multiple agents and browser window sizing\n\n[multiple_agents_same_browser_and_window_sizing.py](multiple_agents_same_browser_and_window_sizing.py)\n\n### Scraping images from the PTT Beauty board\n\n[crawler_ptt.py](crawler_ptt.py) This example defines a custom output format.\n\n### Using Ollama\n\nNo API key is needed, but you must set your `OLLAMA_HOST`.\n\n[ollama_demo.py](ollama_demo.py)\n\n## Donation\n\nAll of these articles are original work based on my own research. If they helped you and you would like to support me, feel free to buy me a coffee :laughing:\n\n綠界科技 ECPAY (no account registration required)\n\n![alt tag](https://payment.ecpay.com.tw/Upload/QRCode/201906/QRCode_672351b8-5ab3-42dd-9c7c-c24c3e6a10a0.png)\n\n[Donate](http://bit.ly/2F7Jrha)\n\n歐付寶 (account registration required)\n\n![alt tag](https://i.imgur.com/LRct9xa.png)\n\n[Donate](https://payment.opay.tw/Broadcaster/Donate/9E47FDEF85ABE383A0F5FC6A218606F8)\n\n## Sponsors\n\n[Sponsor list](https://github.com/twtrubiks/Thank-you-for-donate)\n\n## License\n\nMIT license\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwtrubiks%2Fbrowser-use-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftwtrubiks%2Fbrowser-use-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwtrubiks%2Fbrowser-use-tutorial/lists"}