https://github.com/pb319/scrapify
The repository contains some beginner-friendly resources to help you start web-scraping using Beautiful Soup.
beautifulsoup python webscraping
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/pb319/scrapify
- Owner: pb319
- Created: 2024-08-11T05:09:18.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T07:21:47.000Z (5 months ago)
- Last Synced: 2024-08-25T08:33:35.065Z (5 months ago)
- Topics: beautifulsoup, python, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 1010 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Table of Contents
- [Resources](https://github.com/pb319/Scrapify#resource)
- [Objective](https://github.com/pb319/Scrapify#objective)
- [Approach](https://github.com/pb319/Scrapify#approach)
- [Output Files](https://github.com/pb319/Scrapify#output-files)

#### Resource:
YouTube video link: [Click Here](https://www.youtube.com/watch?v=XVv6mJpFOb0&t=2242s)

#### Objective:
- Get first-hand experience parsing HTML (tags, classes) with `Beautiful Soup` to find single or multiple elements.
- Create a database of the job descriptions and specifications available on `www.timesjobs.com`.

#### Approach:
- We used a simple synthetic HTML page to understand how `Beautiful Soup` works. [HTML File](https://github.com/pb319/Scrapify/blob/main/home.html)
- Fetch multiple elements (`Posted, Company Name, Skill Requirements, More Info`) through an API request.
- Finally, export the results as a CSV file.

#### Output Files:
- [Primary_Script](https://github.com/pb319/Scrapify/blob/main/synthetic.ipynb)
- [Python Script](https://github.com/pb319/Scrapify/blob/main/main.py)
- [CSV File](https://github.com/pb319/Scrapify/blob/main/output.csv)
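
The approach described above (parse HTML, pull out several fields per posting, export to CSV) can be illustrated with a minimal sketch. It is not the code in `main.py`: the HTML snippet, the class names (`job-card`, `company`, `skills`, `posted`, `more-info`), and the helper functions `parse_jobs` and `to_csv` are all hypothetical and chosen only for illustration. It uses an inline synthetic page rather than a live request, so it runs offline.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a job-listing page; the class names
# below are invented for this sketch and are not taken from timesjobs.com.
SAMPLE_HTML = """
<ul>
  <li class="job-card">
    <h3 class="company">Acme Corp</h3>
    <span class="skills">python, sql</span>
    <span class="posted">Posted 2 days ago</span>
    <a class="more-info" href="https://example.com/jobs/1">Details</a>
  </li>
  <li class="job-card">
    <h3 class="company">Globex</h3>
    <span class="skills">python, django</span>
    <span class="posted">Posted today</span>
    <a class="more-info" href="https://example.com/jobs/2">Details</a>
  </li>
</ul>
"""

# Column names follow the fields listed in the Approach section.
FIELDS = ["Posted", "Company Name", "Skill Requirements", "More Info"]


def parse_jobs(html):
    """Return one dict per job card found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all returns every matching tag; find returns only the first match.
    for card in soup.find_all("li", class_="job-card"):
        rows.append({
            "Posted": card.find("span", class_="posted").get_text(strip=True),
            "Company Name": card.find("h3", class_="company").get_text(strip=True),
            "Skill Requirements": card.find("span", class_="skills").get_text(strip=True),
            "More Info": card.find("a", class_="more-info")["href"],
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


jobs = parse_jobs(SAMPLE_HTML)
csv_text = to_csv(jobs)
print(csv_text)
```

To write to disk instead of a string, the same `csv.DictWriter` can be given a file opened with `open("output.csv", "w", newline="")`. Selectors for a real site would need to be adapted to that site's actual markup.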