{"id":17492045,"url":"https://github.com/waynechang65/baha-crawler","last_synced_at":"2025-04-22T20:15:44.196Z","repository":{"id":47281726,"uuid":"178695201","full_name":"WayneChang65/baha-crawler","owner":"WayneChang65","description":"baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.","archived":false,"fork":false,"pushed_at":"2024-01-28T10:24:46.000Z","size":382,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-22T20:15:39.574Z","etag":null,"topics":["bahamut","crawler","javascript","nodejs","scraper","spider","webcrawler"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WayneChang65.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-31T13:57:24.000Z","updated_at":"2023-03-22T05:48:53.000Z","dependencies_parsed_at":"2025-03-04T07:31:55.413Z","dependency_job_id":"01aec2e5-1f78-40e2-9662-98827c27ff13","html_url":"https://github.com/WayneChang65/baha-crawler","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayneChang65%2Fbaha-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayneChang65%2Fbaha-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WayneChang65%2Fbaha-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wayne
Chang65%2Fbaha-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WayneChang65","download_url":"https://codeload.github.com/WayneChang65/baha-crawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250316066,"owners_count":21410476,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bahamut","crawler","javascript","nodejs","scraper","spider","webcrawler"],"created_at":"2024-10-19T08:07:10.560Z","updated_at":"2025-04-22T20:15:44.191Z","avatar_url":"https://github.com/WayneChang65.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![npm](https://img.shields.io/npm/v/@waynechang65/baha-crawler.svg)](https://www.npmjs.com/package/@waynechang65/baha-crawler)\n[![npm](https://img.shields.io/npm/dm/@waynechang65/baha-crawler.svg)](https://www.npmjs.com/package/@waynechang65/baha-crawler)\n[![Npm package total downloads](https://badgen.net/npm/dt/@waynechang65/baha-crawler)](https://npmjs.com/package/@waynechang65/baha-crawler)\n[![Build Status](https://travis-ci.com/WayneChang65/baha-crawler.svg?branch=master)](https://travis-ci.com/WayneChang65/baha-crawler)\n[![GitHub](https://img.shields.io/github/license/waynechang65/baha-crawler.svg)](https://github.com/WayneChang65/baha-crawler/)\n\n# baha-crawler\n\nbaha-crawler 是一個專門用來爬[巴哈姆特](https://www.gamer.com.tw/)各版資料的爬蟲模組。  \n  \nbaha-crawler is a web crawler module designed to scrape data from [Bahamut Forum](https://www.gamer.com.tw/).\n\n## 
前言(Overview)\n\n[巴哈姆特](https://www.gamer.com.tw/)是台灣最大的電玩討論區，也是許多台灣玩家不可不知的電玩資訊網站。\n找了一下，似乎巴哈的爬蟲少之又少，更別說是用Javascript所寫的模組了。\n本人為了在Node.js上爬[巴哈姆特](https://www.gamer.com.tw/)，乾脆就自己用javascript打造一個簡單的爬蟲模組，並且分享給大家使用。  \n  \n[Bahamut Forum](https://www.gamer.com.tw/) is the largest video game forum in Taiwan and a gaming news site that many Taiwanese players know well.\nAfter searching for a while, I found very few crawlers for [Bahamut Forum](https://www.gamer.com.tw/), let alone modules written in JavaScript.  \nTo scrape data from [Bahamut Forum](https://www.gamer.com.tw/) with Node.js, \nI built a simple [Bahamut Forum](https://www.gamer.com.tw/) crawler module in JavaScript and am sharing it for everyone to use.\n\n## 這個爬蟲模組能做什麼事？ (What can it do?)\n\n* 爬[巴哈姆特](https://www.gamer.com.tw/)任意版上資料。  \nScrape posts from any board on [Bahamut Forum](https://www.gamer.com.tw/)  \n* 可以爬多頁資料。  \nScrape multiple pages in one run  \n* 爬資料時，可選擇是否忽略**置頂文**。  \nOptionally skip **pinned (sticky) posts**  \n* 爬的資料以單一帖發文為單位，其中包含該帖的**主題**及**超連結**。(其他資料如推文數、日期及作者等，因為個人目前用不到，所以尚未實作，有興趣歡迎fork and PR)  \nEach scraped post contains its **title** and **hyperlink**. (Other data such as authors, dates, likes, etc. 
are not implemented yet; forks and PRs are welcome)\n\n## 如何在您的專案使用？ (How to use it in your project?)\n\n* 利用 npm 套件進行下載  \nInstall with npm  \n\n```\nnpm install @waynechang65/baha-crawler\n```\n\n* 在您的專案環境中，引用 baha-crawler模組。  \nRequire the @waynechang65/baha-crawler package in your project\n\n```javascript\nconst baha_crawler = require('@waynechang65/baha-crawler');\n```\n\n* 接下來，用**async函式**包含下面幾行程式就搞定了。  \nThen place the following lines inside an **async function** in your project  \n\n```javascript\n// Initialize the crawler\nawait baha_crawler.initialize();\n\n// Scrape the ToS board (23805): 3 pages, skipping pinned posts\nconst baha = await baha_crawler.getResults({\n    board: '23805',\n    pages: 3,\n    skipTPs: true\n});\n\n// Close the crawler\nawait baha_crawler.close();\n```\n\n* 爬完的資料會透過函式 getResults() 回傳一個物件，裏面各陣列放著爬完的資料，結構如下：  \ngetResults() returns the scraped data as an object whose arrays hold the results, structured as follows:\n\n```javascript\n{ titles[], urls[] }\n```\n\n## 如何跑範例程式？ (How to run the example?)\n\n* 從Github下載baha-crawler專案程式碼。  \nClone baha-crawler from GitHub\n\n```\ngit clone https://github.com/WayneChang65/baha-crawler.git\n```\n\n* 進入baha-crawler專案目錄  \nChange into the baha-crawler directory\n\n```\ncd baha-crawler\n```\n\n* 下載跑範例程式所需要的環境組件  \nInstall the dependencies in the cloned baha-crawler folder\n\n```\nnpm install\n```\n\n* 透過 npm 直接使用以下指令。(實際範例程式在 ./examples/demo.js)  \nRun it with npm. 
(The demo example is in ./examples/demo.js)\n\n```\nnpm run start\n```  \n\n![image](https://raw.githubusercontent.com/WayneChang65/baha-crawler/master/img/demo_result.png)  \n\n## 基本函式 (Base Methods)\n\n* initialize(): 初始化物件, initialize the baha-crawler object\n* getResults(options): 開始爬資料, scrape data\n\n\u003e options.board: 欲爬的巴哈版名編號, ID number of the Bahamut Forum board to scrape  \n\u003e options.pages: 要爬幾頁, number of pages to scrape  \n\u003e options.skipTPs: 是否忽略置頂文, whether to skip pinned (sticky) posts  \n\n* close(): 關閉物件, close the baha-crawler object\n\n## 參考網站 (Reference)\n\n* [puppeteer](https://github.com/GoogleChrome/puppeteer)\n* [巴哈姆特](https://www.gamer.com.tw/)\n\n## 貢獻一己之力 (Contribution)\n\nbaha-crawler 雖然是一個小模組，但本人還是希望這個專案能夠持續進步！若有發現臭蟲(bug)或問題，請幫忙在Issue留言告知詳細情形。  \n歡迎共同開發。歡迎Pull Request，謝謝。:)\n\nAlthough baha-crawler is a small project, I hope it keeps improving. If you find a bug or a problem, please open an issue and describe it in detail. Forks and pull requests are welcome. Thanks. :)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaynechang65%2Fbaha-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwaynechang65%2Fbaha-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaynechang65%2Fbaha-crawler/lists"}