Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kizman-23/scraping
Last synced: 23 days ago
- Host: GitHub
- URL: https://github.com/kizman-23/scraping
- Owner: KizMan-23
- Created: 2024-07-10T15:41:34.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-12-11T21:31:40.000Z (about 1 month ago)
- Last Synced: 2024-12-11T22:28:05.751Z (about 1 month ago)
- Language: HTML
- Size: 905 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Scraping is an essential way to obtain data, especially from the web. It offers an easy alternative for extracting data when a site's APIs cannot be used or when the site does not offer a substantive API.
This repository collects the scraping tasks I have worked on so far. Scraping can be a very complex task, as many websites are designed to resist scraping and bot activity.
[NBA Scraping](nba_scraping.ipynb) contains data scraped from the www.nba.com website. The National Basketball Association (NBA) is a prestigious American league for top professional basketball players. Obtaining this data was necessary not just for trying out packages such as BeautifulSoup but also for training a model that could predict the NBA's Most Valuable Player (MVP) in future seasons.
![nba_2](https://github.com/user-attachments/assets/7463e73c-cc19-47d4-aaf3-d0f90db217e9)
In another application, training a model on previous team data can help predict a team's performance and standings in future seasons; this has lucrative applications in sports analysis and betting markets.
[Web Scraping](web_scraping.ipynb) is also a project on scraping basketball data, this time from the basketball-reference website.
![web_scrape](https://github.com/user-attachments/assets/03572109-0163-44c8-ba92-9d632f494bdd)
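
Much of the work in scraping a statistics page is turning an HTML table into rows of data. The sketch below illustrates that step only; it is not the code from the notebook. It uses Python's standard-library `html.parser` in place of BeautifulSoup so that it runs without extra packages, and the `TableParser` class, the `parse_stats_table` helper, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_stats_table(html):
    """Return one dict per data row, keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup; real pages are larger and messier.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Team</th><th>PTS</th></tr>
  <tr><td>Player A</td><td>AAA</td><td>27.1</td></tr>
  <tr><td>Player B</td><td>BBB</td><td>24.3</td></tr>
</table>
"""

records = parse_stats_table(SAMPLE_HTML)
```

Each record is a plain dictionary of strings, so numeric columns such as `PTS` still need converting before they can be used for modelling.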
[Premier League Data](premier_league.ipynb) follows the same vein as the NBA data. The Premier League is the top English football league. It is made up of 20 teams and is one of the most watched sports leagues in the world.
![pre-lg scrape](https://github.com/user-attachments/assets/557a3478-b3c9-428c-939f-0516692e0d2f)
Data about this league is invaluable, not just for predicting which teams could finish at the top of the table in a season but also for its wide application in the sports betting industry.
![pre-lg-2](https://github.com/user-attachments/assets/2e24c651-d804-4c32-93c8-816f5ce8f2f1)
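
As a small illustration of what scraped match results can feed into, the sketch below builds a league table from a list of results. It assumes the usual football scoring of 3 points for a win and 1 for a draw, and it ranks teams by points and then goal difference, which simplifies the real tie-break rules. The `league_table` function, the tuple layout, and the sample results are hypothetical and do not come from the notebook.

```python
from collections import defaultdict


def league_table(results):
    """Build a standings list from (home, home_goals, away, away_goals) tuples."""
    table = defaultdict(lambda: {"played": 0, "points": 0, "goal_diff": 0})
    for home, home_goals, away, away_goals in results:
        # Update both sides of the match with the same logic.
        for team, scored, conceded in (
            (home, home_goals, away_goals),
            (away, away_goals, home_goals),
        ):
            row = table[team]
            row["played"] += 1
            row["goal_diff"] += scored - conceded
            if scored > conceded:
                row["points"] += 3
            elif scored == conceded:
                row["points"] += 1
    # Highest points first, then goal difference, then name for a stable order.
    return sorted(
        table.items(),
        key=lambda kv: (-kv[1]["points"], -kv[1]["goal_diff"], kv[0]),
    )


# Made-up results for three placeholder teams.
SAMPLE_RESULTS = [
    ("Team A", 2, "Team B", 1),
    ("Team B", 0, "Team C", 0),
    ("Team C", 1, "Team A", 3),
]

standings = league_table(SAMPLE_RESULTS)
```

A table like this, computed per season, is one possible input feature set for a model that predicts final standings.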
[Tweeter Scraper](tweet_scraping/tweet.py) is a bot that can be used to scrape information and data from Twitter, now known as X. Scraping the X platform became tougher under its new owner, but I did my best to scrape as much information as the current system would allow.
![tweet scrape](https://github.com/user-attachments/assets/e5ae80b1-c2fb-400c-be8b-bce474a0e5c0)
Scraping X is especially important in today's Artificial Intelligence frenzy, as the platform holds a vast amount of information and discussion that cannot be obtained to the same extent from any other social media platform.
X's APIs fall into the category of APIs whose pricing makes them unsustainable for individuals and small businesses.
[Walmart Products Scraped](walmart/data_cleaning.ipynb) is a notebook I used to scrape the Walmart website for different products and product information. This scraping exercise ran up against a very complex system developed by Walmart to protect its sites against bot activity.
![walmart](https://github.com/user-attachments/assets/b802b351-1e03-4a63-a153-d4980d2cbd28)
I was able to get around the system to extract the information and products I needed. Walmart, as one of the largest global e-commerce platforms, contains tons of information and details about a wide range of products. This product data can be used not just to learn more about the products but also to train models around them. A more comprehensive algorithm for [scraping walmart](walmart/walmart_products.py) provides an alternative route that achieves better results than the notebook alone offered.
![walmart-2](https://github.com/user-attachments/assets/698c3a67-b069-4246-b620-19059a1c141c)
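
Scraped product data usually needs cleaning before it is useful for analysis or model training. The sketch below shows one possible cleaning pass, assuming each scraped record is a dictionary with `name` and `price` strings. That record layout, the `parse_price` and `clean_products` helpers, and the sample records are hypothetical and are not taken from the notebook.

```python
import re

# Matches the first plain decimal number in a string, e.g. "4.50" in "Now $4.50".
_PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Return the first number in a price string as a float, or None if absent."""
    if not text:
        return None
    match = _PRICE_RE.search(text.replace(",", ""))
    return float(match.group()) if match else None


def clean_products(raw):
    """Normalise names, parse prices, and drop unusable or duplicate records."""
    seen = set()
    cleaned = []
    for item in raw:
        # Collapse runs of whitespace and trim the ends of the name.
        name = " ".join((item.get("name") or "").split())
        price = parse_price(item.get("price"))
        if not name or price is None:
            continue  # skip records that cannot be used
        key = name.lower()
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned


# Made-up records that mimic typical scraping noise.
RAW_PRODUCTS = [
    {"name": "  Example   Blender ", "price": "$1,299.99"},
    {"name": "example blender", "price": "$1,299.99"},
    {"name": "Sample Mug", "price": "Now $4.50"},
    {"name": "No Price Item", "price": ""},
]

products = clean_products(RAW_PRODUCTS)
```

Deduplicating on the lower-cased name is a deliberate simplification; a product identifier, if the scraped data has one, would be a more reliable key.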