https://github.com/samrb-dev/autoseekout
A simple web scraping bot for scraping information from seekout.com written in Python and Selenium
- Host: GitHub
- URL: https://github.com/samrb-dev/autoseekout
- Owner: SamRB-dev
- License: other
- Created: 2022-10-01T04:28:25.000Z (about 2 years ago)
- Default Branch: AutoSeekOut-2.0
- Last Pushed: 2024-06-13T19:58:20.000Z (5 months ago)
- Last Synced: 2024-06-13T22:51:40.412Z (5 months ago)
- Topics: bot, dataextraction, python3, scraping-python, seekout, selenium, selenium-python, webscraping
- Language: Python
- Homepage:
- Size: 11.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AutoSeekOut
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC_BY--NC_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
A simple, cross-platform web scraping *tool/bot* to scrape data from [Seekout](https://seekout.com/).
Purely written in **Python** and **Selenium**. It generates a CSV file
with all the data scraped from *Seekout*.

## Tested On
#### Linux
1. Debian
   * Parrot OS Release 5.1 (Electro Ara) 64-bit

#### Windows
1. Windows 10 Pro
   * Version: 21H2
   * Build: 19044.1889

## Installation
#### Browser Driver
No matter which operating system you are using, Selenium needs a web driver to control your browser, so download the driver for your preferred browser. Here's where to download the driver for each supported browser:

**Note:** Always make sure the driver you download matches the version of your installed browser.
* [Chrome](https://chromedriver.chromium.org/downloads)
* [Firefox](https://github.com/mozilla/geckodriver)
* [Safari](https://developer.apple.com/documentation/webkit/testing_with_webdriver_in_safari)

#### Creating a Virtual Environment
It's always a good idea to run code in a virtual environment. These steps assume Python 3.7 or above, whose built-in `venv` module (used by the commands below) creates the environment without any extra package. The separate `virtualenv` package is only needed if you prefer that tool:
* pip3 install virtualenv

To create an environment with `venv`:
* python3 -m venv env_name

or,
* python -m venv env_name

To activate the environment:
* Linux/MacOS
  - source path/to/env_name/bin/activate
* Windows
  - path\to\env_name\Scripts\activate

To deactivate the environment, simply type `deactivate`.
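If you would rather script this step, the sketch below does roughly the same thing with Python's standard-library `venv` module. It is an illustration only and not part of AutoSeekOut; the helper names `make_env` and `activation_command` are hypothetical.

```python
import os
import venv
from pathlib import Path, PurePosixPath, PureWindowsPath


def make_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment, roughly like `python3 -m venv env_dir`."""
    path = Path(env_dir)
    # EnvBuilder is the stdlib class behind `python -m venv`
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return path


def activation_command(env_dir: str, windows: bool = os.name == "nt") -> str:
    """Return the command that activates the environment on this OS."""
    if windows:
        # Windows layout: <env>\Scripts\activate
        return str(PureWindowsPath(env_dir, "Scripts", "activate"))
    # Linux/MacOS layout: source <env>/bin/activate
    return f"source {PurePosixPath(env_dir, 'bin', 'activate')}"
```

For example, `activation_command("env_name", windows=False)` returns `source env_name/bin/activate`.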
#### Installing Necessary Modules
After you have activated your virtual environment, it's time to install the necessary packages.
To install those packages, just type
* pip3 install -r requirements.txt

## Final Steps
Lastly, to get the script up and running, you need to make a few changes in the script itself. In future updates I'll try to reduce these steps to make your life easier, but for now, make the following changes:

1. Open the script in your preferred IDE or text editor.
2. On line 15, set the DPATH variable to the path of your browser driver as a string, e.g.
   * DPATH = "path/to/browser/driver.exe"
3. On lines 27 and 28, set the EMAIL and PASSWD variables to your login credentials as strings, e.g.
   * EMAIL = "[email protected]"
   * PASSWD = "password123"
4. On lines 31 and 34, set the STARTFROM and LIMIT variables to the first page number (STARTFROM) and last page number (LIMIT) of the project you want the bot to scrape, e.g.
   * STARTFROM = 1
   * LIMIT = 100
5. On line 37, set the TITLE variable to the title of the project data page so that the bot can identify the project. For example, if the project title is "Projects/Database - Intuit", set the variable as
   * TITLE = "Intuit"
6. Lastly, on line 40, set the FILE variable to your desired output file name, e.g.
   * FILE = "Intuit.csv"

### Run the script in CMD/terminal
```
python3 AutoSeekOut.py
```
OR,
```
python AutoSeekOut.py
```
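Because a scraping run over many pages can take a while, it may help to sanity-check the settings from the Final Steps first. The sketch below is hypothetical and does not reproduce AutoSeekOut.py: the `ScrapeConfig` class, `pages_to_scrape`, `start_chrome`, `write_rows`, and any CSV column names are illustrative assumptions, as is the reading that LIMIT is the last page to include. `start_chrome` uses Selenium 4's `Service` class to pass the driver path.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List


@dataclass
class ScrapeConfig:
    # Hypothetical mirror of DPATH, STARTFROM, LIMIT, TITLE and FILE
    dpath: str
    start_from: int
    limit: int
    title: str
    file: str

    def validate(self) -> None:
        """Fail fast on settings that would waste a scraping run."""
        if not Path(self.dpath).is_file():
            raise FileNotFoundError(f"browser driver not found: {self.dpath}")
        if not 1 <= self.start_from <= self.limit:
            raise ValueError("STARTFROM must be >= 1 and <= LIMIT")
        if not self.title:
            raise ValueError("TITLE must not be empty")
        if not self.file.endswith(".csv"):
            raise ValueError("FILE should end in .csv")


def pages_to_scrape(cfg: ScrapeConfig) -> List[int]:
    # Assumes LIMIT is the last page to include, so the range is inclusive
    return list(range(cfg.start_from, cfg.limit + 1))


def start_chrome(cfg: ScrapeConfig):
    # Imported lazily so the checks above work without Selenium installed;
    # Service(executable_path=...) is how Selenium 4 takes a driver path
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    return webdriver.Chrome(service=Service(executable_path=cfg.dpath))


def write_rows(path: str, rows: Iterable[Dict[str, str]], fields: List[str]) -> None:
    # newline="" prevents blank lines between rows on Windows
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

With STARTFROM = 1 and LIMIT = 100 as in step 4, `pages_to_scrape` yields pages 1 through 100.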