https://github.com/csmai/job_scraping


Host: GitHub
URL: https://github.com/csmai/job_scraping
Owner: csmai
License: other
Created: 2023-10-05T12:57:28.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2023-11-07T10:34:28.000Z (about 1 year ago)
Last Synced: 2024-08-02T13:20:05.919Z (3 months ago)
Language: Python
Size: 437 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

# Job Data Scraping and Analysis Package

This Python package provides scripts for scraping job data from professional websites and analyzing it. It gathers job postings that match the search criteria defined in `config.py` and analyzes the technology stack requirements for that job title.

## Table of Contents

- [Package Overview](#package-overview)

- [Prerequisites](#prerequisites)

- [Usage](#usage)

- [License](#license)

## Package Overview

This package is designed to assist users in collecting job data from professional websites and gaining insights into the technology stack requirements for given job titles. It includes web scraping functionalities, data analysis, and data visualization.

For example, with the search keywords "Data, engineer" set in `config.py`, the package generates the following graph:

![Data engineer](dat_eng.png)

For the search keywords "Python, developer" a different graph is generated:

![Python developer](py_dev.png)
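
The kind of counting behind such a graph can be sketched as follows. This is only an illustration under stated assumptions, not the package's actual implementation: the keyword list, the `count_technologies` function, and the sample ads are all hypothetical, and the sketch uses only the standard library.

```python
import re
from collections import Counter

# Hypothetical keyword list; the package's real list is not shown in this README.
TECH_KEYWORDS = ["python", "sql", "spark", "aws", "docker"]


def count_technologies(descriptions, keywords=TECH_KEYWORDS):
    """Count how many job descriptions mention each keyword at least once."""
    counts = Counter()
    for text in descriptions:
        lowered = text.lower()
        for kw in keywords:
            # Lookarounds keep "sql" from matching inside "mysql" or "nosql".
            pattern = rf"(?<![a-z0-9]){re.escape(kw)}(?![a-z0-9])"
            if re.search(pattern, lowered):
                counts[kw] += 1
    return counts


# Made-up job ads, for illustration only.
sample_ads = [
    "Data engineer with Python, SQL and Spark experience.",
    "Python developer; Docker and AWS are a plus.",
    "MySQL administrator.",
]

# most_common() orders keywords from most to least frequently mentioned.
ranking = count_technologies(sample_ads).most_common()
print(ranking)
```

Counting each posting at most once per keyword keeps a single verbose ad from dominating the ranking; a real analysis might weight mentions differently.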

## Prerequisites

Before using this package, please ensure you have the following dependencies installed:

- Python 3.x

- Required Python libraries: `pandas`, `sqlalchemy`, `matplotlib`, `seaborn`.

- A PostgreSQL database, configured in the Setup step under Usage.
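
One way to confirm that the libraries listed above are available is a short standard-library check. This is a minimal sketch; the `missing_packages` helper is a hypothetical name and is not part of the package.

```python
from importlib.util import find_spec

# Library names taken from the prerequisites list above.
REQUIRED = ["pandas", "sqlalchemy", "matplotlib", "seaborn"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```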

## Usage

Follow these steps to use the package:

1. **Setup**:

   - Clone or download the package to your local machine.

   - Set up a PostgreSQL database and point the `DB_URI` variable in both `main.py` and `analyze_data.py` at it, replacing the placeholders with your actual credentials.

2. **Configuration**:

   - Modify search criteria in the `config.py` file.

   - Set the `PRF_URL` and `NOF_URL` environment variables to the website URLs, or replace the URL placeholders in the scripts.

3. **Execution**:

   - Run the script in your terminal:

     ```
     python main.py
     ```

   - View detailed information in the generated log files.

4. **Visualization**:

   - The package creates an image showing the most common technologies required for the searched job title.
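
The setup and configuration steps above could look roughly like this in code. It is a sketch under stated assumptions: `build_db_uri` and `load_site_urls` are hypothetical helpers, the credentials are placeholders, and the README does not show the exact `DB_URI` format the scripts expect. Only the names `DB_URI`, `PRF_URL`, and `NOF_URL` come from the text.

```python
import os
from urllib.parse import quote_plus


def build_db_uri(user, password, host, port, database):
    """Assemble a PostgreSQL connection URI of the usual user:password@host form."""
    # quote_plus escapes characters such as "@" or "/" in the credentials.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_site_urls(env=os.environ):
    """Read the two site URLs from PRF_URL and NOF_URL; a value is None if unset."""
    return {"PRF_URL": env.get("PRF_URL"), "NOF_URL": env.get("NOF_URL")}


# Placeholder credentials, for illustration only.
DB_URI = build_db_uri("job_user", "p@ss/word", "localhost", 5432, "jobs")
print(DB_URI)
print(load_site_urls())
```

Reading the URLs from the environment keeps site addresses and credentials out of version control, which is one reason to prefer it over editing the placeholders in the scripts.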

## License

MIT License with No Selling Clause - see `LICENSE.txt`