https://github.com/wanadzhar913/faiq-scraping-projects
This repo collates a list of websites I've scraped. They have either been for open source contributions or for my own personal practice/use.
beautifulsoup python requests
- Host: GitHub
- URL: https://github.com/wanadzhar913/faiq-scraping-projects
- Owner: wanadzhar913
- License: apache-2.0
- Created: 2023-09-11T14:33:59.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-15T04:05:00.000Z (almost 2 years ago)
- Last Synced: 2025-01-07T00:46:33.422Z (9 months ago)
- Topics: beautifulsoup, python, requests
- Language: Jupyter Notebook
- Homepage:
- Size: 52.7 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Notes
This repo collates a list of websites I've scraped. I scraped them either for open source contributions (e.g., sourcing a [Malaysian text dataset](https://github.com/huseinzol05/malaysian-dataset) for [fine-tuning Llama 2](https://www.linkedin.com/feed/update/urn:li:activity:7100586312268730368/)) or for my own personal practice/use.
I'm also hoping that this repo serves as a benchmark for my code quality over time. 🤣
# Websites Scraped
- https://theedgemalaysia.com/
- https://timchew.net/
- https://techrakyat.com/
- https://mat-gaming.com/
- https://www.leaazleeya.com/
- https://www.bikesrepublic.com/
- https://en.wikipedia.org/wiki/Road_signs_in_Malaysia
# Datasets
- https://huggingface.co/datasets/wanadzhar913/crawl-theedgemalaysia
- https://huggingface.co/datasets/wanadzhar913/crawl-timchew
- https://huggingface.co/datasets/wanadzhar913/crawl-techrakyat
- https://huggingface.co/datasets/wanadzhar913/crawl-mat-gaming
- https://huggingface.co/datasets/wanadzhar913/crawl-leaazleeya
- https://huggingface.co/datasets/wanadzhar913/crawl-bikesrepublic
- https://huggingface.co/datasets/wanadzhar913/wikipedia-malaysian-road-sign-images