Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/moe131/job-project


Host: GitHub
URL: https://github.com/moe131/job-project
Owner: Moe131
Created: 2024-06-11T21:50:25.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-06-12T20:13:14.000Z (5 months ago)
Last Synced: 2024-06-13T03:27:20.742Z (5 months ago)
Language: Python
Size: 340 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Structure
```
├── docker-compose.yaml
├── dockerfile
├── jobs_project
│   ├── jobs_project
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── __init__.py
│   │       └── json_spider.py
│   └── scrapy.cfg
├── query.py
├── README.md
├── requirements.txt
└── data
    ├── s01.json
    └── s02.json
```

# Follow these steps to run the project

1. Place the JSON files in the **data** folder, as shown above.
2. Build the Docker images (this also starts the containers) by running:
   ```
   docker-compose up --build
   ```
3. On later runs, once the images are built, start the project with:
   ```
   docker-compose up
   ```
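Before starting the containers, it can help to confirm that the files in **data** parse and contain a jobs array, since each entry of that array becomes one job. The sketch below is plain Python, not the project's Scrapy spider (`json_spider.py`). It assumes each file is a JSON object with the jobs array under a top-level `"jobs"` key, which may not match the real files, and `iter_jobs` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jobs(data_dir: str = "data") -> Iterator[dict]:
    """Yield one dictionary per entry of each file's jobs array.

    Assumption: every *.json file is an object with the jobs array under a
    top-level "jobs" key; adjust the lookup if the real files nest it deeper.
    """
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            payload = json.load(f)
        for job in payload.get("jobs", []):
            yield dict(job)  # copy so callers can modify it freely
```

For example, `sum(1 for _ in iter_jobs("data"))` counts the job entries across all files in the folder.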

# A brief description of the pipeline process

For each entry in the jobs array of a JSON file, a dictionary is created that holds all of the data for that job. Each job then passes through the pipeline and is inserted into the Postgres table **raw_table**. If a job contains a field that has no matching column in the table yet, that column is added to the table dynamically. Columns that contain dates are converted and stored with the `timestamp with time zone` data type.
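The dynamic-column and date handling described above can be sketched in plain Python. This is not the project's `pipelines.py`; it is a minimal sketch assuming that dates arrive as ISO-8601 strings, that non-date fields can be stored as `TEXT`, and that the caller already knows which columns exist (for example from `information_schema.columns`). The helper names (`build_statements`, `looks_like_datetime`, `quote_ident`, `column_type`) are hypothetical. It only builds `(sql, params)` pairs; in a Scrapy item pipeline they could be executed from `process_item` with a database cursor, e.g. psycopg2's `cursor.execute(sql, params)`.

```python
import json
import re
from datetime import datetime
from typing import Any, List, Optional, Set, Tuple

ISO_DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def looks_like_datetime(value: Any) -> bool:
    """Heuristic: treat ISO-8601 strings such as '2024-06-12T13:10:17Z' as dates."""
    if not isinstance(value, str) or not ISO_DATE_PREFIX.match(value):
        return False
    try:
        # fromisoformat() rejects a trailing 'Z' before Python 3.11, so normalise it.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True


def column_type(value: Any) -> str:
    """Dates become TIMESTAMPTZ; in this sketch everything else is stored as TEXT."""
    return "TIMESTAMPTZ" if looks_like_datetime(value) else "TEXT"


def build_statements(
    job: dict, existing_columns: Set[str], table: str = "raw_table"
) -> List[Tuple[str, Optional[list]]]:
    """Return (sql, params) pairs: one ALTER per missing column, then one INSERT.

    existing_columns is updated in place so later jobs skip columns added here.
    """
    statements: List[Tuple[str, Optional[list]]] = []
    for key, value in job.items():
        if key not in existing_columns:
            statements.append((
                f"ALTER TABLE {quote_ident(table)} "
                f"ADD COLUMN IF NOT EXISTS {quote_ident(key)} {column_type(value)}",
                None,
            ))
            existing_columns.add(key)
    columns = ", ".join(quote_ident(key) for key in job)
    placeholders = ", ".join(["%s"] * len(job))  # psycopg2-style placeholders
    # Nested objects/arrays are serialised to JSON text; scalars pass through.
    params = [json.dumps(v) if isinstance(v, (dict, list)) else v for v in job.values()]
    statements.append((
        f"INSERT INTO {quote_ident(table)} ({columns}) VALUES ({placeholders})",
        params,
    ))
    return statements
```

One consequence of this design is that a new column's type is decided by the first value seen for it, so a date field whose first value is missing or empty would end up as `TEXT`.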
After all jobs have been inserted into the database, `query.py` retrieves the data and writes it to a CSV (comma-separated values) file.
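The export step amounts to writing a header row followed by the query results. Below is a minimal sketch using Python's standard `csv` module, assuming the rows come from a query such as `SELECT * FROM raw_table`; `write_csv` is a hypothetical name and this is not the actual contents of `query.py`.

```python
import csv
from typing import Iterable, Sequence, TextIO


def write_csv(columns: Sequence[str], rows: Iterable[Sequence], out: TextIO) -> int:
    """Write a header row and then every data row as comma-separated values.

    Returns the number of data rows written. Fields containing commas, quotes
    or newlines are quoted by the csv module's default (QUOTE_MINIMAL) dialect.
    """
    writer = csv.writer(out)
    writer.writerow(columns)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

With psycopg2, for example, the header could come from `[col[0] for col in cursor.description]` after `cursor.execute("SELECT * FROM raw_table")`, the cursor itself can be passed as `rows`, and the output file should be opened with `newline=""` as the `csv` documentation recommends.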

# Screenshot: jobs sorted by updated date in the database

![Screen Shot 2024-06-12 at 13 10 17](https://github.com/Moe131/job-project/assets/65834335/5b9c4142-2a37-4193-bded-ccd44363f98c)

# Screenshot: jobs in the database

![Screen Shot 2024-06-12 at 13 12 21](https://github.com/Moe131/job-project/assets/65834335/00b55b87-b2dd-4215-95f9-7c000f2ee20f)