{"id":49856818,"url":"https://github.com/raymelon/tagalog-dictionary-scraper","last_synced_at":"2026-05-14T20:10:06.201Z","repository":{"id":44806559,"uuid":"73626763","full_name":"raymelon/tagalog-dictionary-scraper","owner":"raymelon","description":"Builds a Tagalog dictionary by collecting Tagalog words from tagalog.pinoydictionary.com","archived":false,"fork":false,"pushed_at":"2023-02-19T20:46:33.000Z","size":1021,"stargazers_count":23,"open_issues_count":0,"forks_count":14,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-10-20T22:46:47.957Z","etag":null,"topics":["beautiful-soup","database","dictionary","python","scraper","tagalog","tagalog-dictionary","web-scraper","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raymelon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-11-13T16:06:01.000Z","updated_at":"2023-10-20T22:46:48.373Z","dependencies_parsed_at":"2022-09-03T04:10:50.492Z","dependency_job_id":"8dfc32fb-602d-462c-b69d-8d46f987cbee","html_url":"https://github.com/raymelon/tagalog-dictionary-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/raymelon/tagalog-dictionary-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raymelon%2Ftagalog-dictionary-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raymelon%2Ftagalog-dictionary-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raymelon%2Ftagalog-dictionary-scraper/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raymelon%2Ftagalog-dictionary-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raymelon","download_url":"https://codeload.github.com/raymelon/tagalog-dictionary-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raymelon%2Ftagalog-dictionary-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33041328,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautiful-soup","database","dictionary","python","scraper","tagalog","tagalog-dictionary","web-scraper","web-scraping"],"created_at":"2026-05-14T20:10:05.528Z","updated_at":"2026-05-14T20:10:06.162Z","avatar_url":"https://github.com/raymelon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tagalog Dictionary Scraper :ledger: [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Check%20out%20Tagalog%20Dictionary%20Scraper!%20Ating%20pag-ibayuhin%20ang%20ating%20talahuluganan.%20%40github%20https://github.com/raymelon/tagalog-dictionary-scraper)\n\n\u003e **_Ating pag-ibayuhin ang ating 
talahuluganan!_** _(Let's enrich our dictionary!)_\n\nCollects [Tagalog](http://tagaloglang.com/) words from [tagalog.pinoydictionary.com](http://tagalog.pinoydictionary.com/), a database of [Tagalog](http://tagaloglang.com/) words powered by Cyberspace.ph Web Hosting. The script uses HTML parsing, a common web scraping technique.\n\n## 42,723 words (as of Feb 19, 2023)\n\n\u003ca href=\"https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/tagalog_dict.txt\" target=\"_blank\"\u003e**See the word list at `tagalog_dict.txt`**\u003c/a\u003e\n\n![](https://reposs.herokuapp.com/?path=raymelon/tagalog-dictionary-scraper)\n[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)\n[![Build Status](https://travis-ci.org/raymelon/tagalog-dictionary-scraper.svg)](https://travis-ci.org/raymelon/tagalog-dictionary-scraper)\n[![codecov](https://codecov.io/gh/raymelon/tagalog-dictionary-scraper/branch/master/graph/badge.svg)](https://codecov.io/gh/raymelon/tagalog-dictionary-scraper)\n\n[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)]()\n\n## API Resource\n\nThe scraped words are served as static files through GitHub Pages and can be fetched as REST resources.\n\n**Host**\n\n[https://raymelon.github.io/tagalog-dictionary-scraper/](https://raymelon.github.io/tagalog-dictionary-scraper/)\n\n**Method**\n\nGET\n\n**Resources Available**\n\n| Resource | Display      | Endpoint                                                                                                  |\n| -------- | ------------ | --------------------------------------------------------------------------------------------------------- |\n| `csv`    | `default`    | [/tagalog_dict.csv](https://raymelon.github.io/tagalog-dictionary-scraper/tagalog_dict.csv)               |\n| `csv`    | `with lines` | [/tagalog_dict_lines.csv](https://raymelon.github.io/tagalog-dictionary-scraper/tagalog_dict_lines.csv)   |\n| `json`   | `default`    | 
[/tagalog_dict.json](https://raymelon.github.io/tagalog-dictionary-scraper/tagalog_dict.json)             |\n| `json`   | `with lines` | [/tagalog_dict_lines.json](https://raymelon.github.io/tagalog-dictionary-scraper/tagalog_dict_lines.json) |\n| `txt`    | `default`    | [/tagalog_dict.txt](https://raymelon.github.io/tagalog-dictionary-scraper/tagalog_dict.txt)               |\n\n## How is it done? :muscle:\n\nEach dictionary page is loaded and parsed, and the words enclosed in `\u003ch2 class='word-entry'\u003e` tags are extracted.\n\nAn `html` [snippet](https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/tagalog.pinoydictionary.com%20html%20snippet.html) from [`tagalog.pinoydictionary.com`](http://tagalog.pinoydictionary.com/) is included. It contains the source of\n[`http://tagalog.pinoydictionary.com/list/a/`](http://tagalog.pinoydictionary.com/list/a/) and serves as a point of reference for how dictionary words are extracted from the page.\n\n**Disclaimer:**\nI do not own the `html` code cited above; it is owned by [tagalog.pinoydictionary.com](http://tagalog.pinoydictionary.com/).\n\n## How did the project start? 
:thought_balloon:\n\nThe project was started mainly to build a Tagalog dictionary database for [Scrabble ®](http://www.scrabble.com/), but the word list can be used for other purposes as well.\n\n## Tools :pencil2:\n\n- [Python 3.5+](https://www.python.org/) :snake:\n- [beautifulsoup4 v4.5.1](https://www.crummy.com/software/BeautifulSoup/) :ramen: :package: for parsing HTML pages\n\n```sh\npython -m pip install -U pip beautifulsoup4\n```\n\n- [requests-futures v1.0.0](https://github.com/ross/requests-futures) :zap: for concurrent requests\n\n```sh\npython -m pip install -U pip requests-futures\n```\n\n## Notes :pushpin:\n\n- Run the scraper script [`collect_tagalog.py`](https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/collect_tagalog.py)\n- See the collected words in [`tagalog_dict.txt`](https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/tagalog_dict.txt)\n- Set the [`max_workers`](https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/collect_tagalog.py#L57) value to match the CPU and network capacity of your environment. See the [comment](https://github.com/raymelon/tagalog-dictionary-scraper/blob/master/collect_tagalog.py#L41-L56) for estimated values and expected download rates.\n\n## License [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](http://www.gnu.org/licenses/gpl-3.0)\n\n[GNU General Public License 3.0](https://www.gnu.org/licenses/gpl-3.0.en.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraymelon%2Ftagalog-dictionary-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraymelon%2Ftagalog-dictionary-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraymelon%2Ftagalog-dictionary-scraper/lists"}