Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mratanusarkar/dataset-indian-companies
Web Scraping "List of companies in India" from AmbitionBox Website using Python and Beautiful Soup
- Host: GitHub
- URL: https://github.com/mratanusarkar/dataset-indian-companies
- Owner: mratanusarkar
- Created: 2020-04-21T14:08:03.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-12-31T14:59:00.000Z (12 months ago)
- Last Synced: 2023-12-31T15:32:46.569Z (12 months ago)
- Topics: beautiful-soup, beautifulsoup, beautifulsoup-library, beautifulsoup4, bs4, company-dataset, csv, data, dataset, indian-companies, jupyter, jupyter-notebook, jupyter-notebooks, python, scraper, scraping, scraping-websites, scrapper, scrapping
- Language: Jupyter Notebook
- Size: 11.7 MB
- Stars: 12
- Watchers: 2
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Dataset: Indian Companies
[![HitCount](https://hits.dwyl.com/mratanusarkar/dataset-indian-companies.svg?style=flat)](http://hits.dwyl.com/mratanusarkar/dataset-indian-companies)

A list of companies in India, scraped from the [AmbitionBox](https://www.ambitionbox.com/) website using Python and the Beautiful Soup library.
Find the dataset in CSV format [here](https://github.com/mratanusarkar/Dataset-Indian-Companies/blob/master/dataset/List_of_companies_in_India.csv).
This data was scraped from [here](https://www.ambitionbox.com/list-of-companies) on 22-04-2020.
Total number of data points: 452520
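
The row count above can be checked against a downloaded copy of the CSV. The snippet below is a minimal sketch, not part of the repository: it assumes the file has a single header row and is UTF-8 encoded, and the function name `count_data_points` is made up for illustration.

```python
import csv
import io
from typing import TextIO


def count_data_points(csv_file: TextIO) -> int:
    """Count data rows in a CSV stream, excluding the header row."""
    reader = csv.reader(csv_file)
    header = next(reader, None)  # assumption: the first row is a header
    if header is None:
        return 0  # empty file
    return sum(1 for row in reader if row)  # skip blank lines


# Tiny in-memory demo; the column names and rows are placeholders, not real data.
demo = io.StringIO("name,rating\nExample Co,4.0\nSample Ltd,3.5\n")
print(count_data_points(demo))  # prints 2
```

To run it on the real dataset, open a local copy with `open(path, newline="", encoding="utf-8")` and pass the file object to `count_data_points`.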
The process of scraping is explained in [this](https://github.com/mratanusarkar/Dataset-Indian-Companies/blob/master/AmbitionBox%2020-04-2020.ipynb) Jupyter Notebook.
The data was scraped using [this](https://github.com/mratanusarkar/Dataset-Indian-Companies/blob/master/scraper.py) Python Script.
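
For readers who have not used Beautiful Soup, here is a rough sketch of the general technique. It is not the code from `scraper.py`: the HTML sample, the CSS class names (`company-card`, `company-name`, `company-rating`), and the function `parse_companies` are all hypothetical, and the real AmbitionBox markup may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the actual page structure
# on AmbitionBox is not reproduced here and may differ.
SAMPLE_HTML = """
<div class="company-card">
  <h2 class="company-name">Example Co</h2>
  <span class="company-rating">4.0</span>
</div>
<div class="company-card">
  <h2 class="company-name">Sample Ltd</h2>
  <span class="company-rating">3.5</span>
</div>
"""


def parse_companies(html: str) -> list[dict[str, str]]:
    """Extract one dict per (hypothetical) company card from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    companies = []
    for card in soup.find_all("div", class_="company-card"):
        name = card.find("h2", class_="company-name")
        rating = card.find("span", class_="company-rating")
        companies.append({
            # Fall back to an empty string when a field is missing.
            "name": name.get_text(strip=True) if name else "",
            "rating": rating.get_text(strip=True) if rating else "",
        })
    return companies


for company in parse_companies(SAMPLE_HTML):
    print(company["name"], company["rating"])
```

In a real scraper the HTML string would come from an HTTP response for each listing page, and the resulting dicts could be written out with `csv.DictWriter`.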
The dataset is also available on Kaggle [here](https://www.kaggle.com/datasets/mratanusarkar/list-of-companies-in-india).
Feel free to share how you use this data on Kaggle.