https://github.com/wanadzhar913/faiq-scraping-projects

Topics: beautifulsoup, python, requests

# Notes

This repo collates a list of websites I've scraped. They have either been for open-source contributions (e.g., sourcing a [Malaysian text dataset](https://github.com/huseinzol05/malaysian-dataset) for [fine-tuning Llama 2](https://www.linkedin.com/feed/update/urn:li:activity:7100586312268730368/)) or for my own personal practice/use.

I'm also hoping that this repo serves as a benchmark for my code quality over time. 🤣

# Websites Scraped

- https://theedgemalaysia.com/
- https://timchew.net/
- https://techrakyat.com/
- https://mat-gaming.com/
- https://www.leaazleeya.com/
- https://www.bikesrepublic.com/
- https://en.wikipedia.org/wiki/Road_signs_in_Malaysia
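The scrapers for these sites aren't reproduced here. As a rough illustration of the general approach, this is a minimal sketch that fetches one page and pulls out its headline and paragraph text. It assumes, hypothetically, that the headline sits in an `<h1>` and the body text in `<p>` tags; real sites need site-specific selectors. This repo is tagged with requests and BeautifulSoup, but to keep the sketch dependency-free it swaps in the standard library's `urllib.request` and `html.parser`. `ArticleParser`, `parse_article`, and `fetch_article` are hypothetical names, not functions from this repo.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ArticleParser(HTMLParser):
    """Collects the first <h1> and every non-empty <p> from an HTML document.

    Assumption (hypothetical): headline in <h1>, body text in <p> tags.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.title = ""
        self.paragraphs: list[str] = []
        self._current: str | None = None  # "h1" or "p" while capturing text
        self._buffer: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # Start capturing only at <h1>/<p>; inline tags such as <b> are ignored.
        if tag in ("h1", "p") and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data: str) -> None:
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag != self._current:
            return
        # Collapse runs of whitespace left over from the HTML source.
        text = " ".join("".join(self._buffer).split())
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def parse_article(html: str) -> dict:
    """Return {"title": ..., "text": ...} with paragraphs joined by newlines."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": "\n".join(parser.paragraphs)}


def fetch_article(url: str, timeout: float = 10.0) -> dict:
    """Download one page and parse it (network access required)."""
    # Hypothetical User-Agent; some sites reject urllib's default one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (scraping-practice)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return {"url": url, **parse_article(html)}
```

Calling `fetch_article(url)` on an individual article URL returns a dict with `url`, `title`, and `text`. Keeping the parsing in `parse_article` separate from the download makes it easy to test against saved HTML without hitting the site.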

# Datasets

- https://huggingface.co/datasets/wanadzhar913/crawl-theedgemalaysia
- https://huggingface.co/datasets/wanadzhar913/crawl-timchew
- https://huggingface.co/datasets/wanadzhar913/crawl-techrakyat
- https://huggingface.co/datasets/wanadzhar913/crawl-mat-gaming
- https://huggingface.co/datasets/wanadzhar913/crawl-leaazleeya
- https://huggingface.co/datasets/wanadzhar913/crawl-bikesrepublic
- https://huggingface.co/datasets/wanadzhar913/wikipedia-malaysian-road-sign-images
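The README doesn't say how the crawled pages were packaged for the Hugging Face Hub. One common route, sketched here purely as an assumption, is to write each scraped page as one record in a JSON Lines file. `write_jsonl`, `read_jsonl`, and the `url`/`text` field names are hypothetical and may not match the columns of the datasets above.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable


def write_jsonl(records: Iterable[dict], path: str | Path) -> int:
    """Write one JSON object per line and return how many records were written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII characters readable instead of \uXXXX escapes.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: str | Path) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The Hugging Face `datasets` library can load a file like this with `load_dataset("json", data_files="crawl.jsonl")`; the datasets listed above may well use other formats, such as Parquet.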