https://github.com/jpcadena/car-sales-etl
ETL process for a Car Sales project.
asyncpg car-sales data-analysis data-engineering data-visualization database etl etl-pipeline postgresql python sqlalchemy
- Host: GitHub
- URL: https://github.com/jpcadena/car-sales-etl
- Owner: jpcadena
- License: mit
- Created: 2023-02-07T15:04:52.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-02-28T21:18:57.000Z (over 2 years ago)
- Last Synced: 2025-01-15T18:47:07.306Z (5 months ago)
- Topics: asyncpg, car-sales, data-analysis, data-engineering, data-visualization, database, etl, etl-pipeline, postgresql, python, sqlalchemy
- Language: Python
- Homepage:
- Size: 1.3 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# car-sales-etl
## About the project
[![Project][project-screenshot]](https://example.com)
This project builds a data pipeline that extracts, transforms, and loads (ETL)
data from a source to a target. The data source is a CSV file containing
information about car sales. The target is a PostgreSQL database table.\
PostgreSQL was preferred for its rich handling of multiple data types, its
transaction management, and its high scalability, which give good performance
for CRUD operations.\
The project follows a SQLAlchemy model scheme based on OOP concepts, which
provides a useful abstraction when working with multiple datasets in future
processes. This high-level abstraction gives greater control over the data
being inserted, because the table structure can be defined with multiple
constraints and relationships.\
For more advanced requirements, transactions, migrations, and more complex
operations can be performed through the ORM, so managing large amounts of data
should not be an issue.\
The project also follows PEP 8 style, checked with Pylint, and includes type
hints for variables, function arguments, and more.\
If performance is critical, consider using Python 3.11, which lowers the cost
of exception handling.\
Assets are also included with future consideration for HTML and CSS files.\
Testing could be done with unit tests (to be implemented in a future release).
### Transformations
- Remove any rows with missing values.
- Convert the date columns to a standard format.
- Create a new column to store the year of the sale.
- Replace the categorical values in the "Car Model" column with numerical
values.
### Requirements
- The target database should be either PostgreSQL or MySQL.
- The pipeline should be runnable using a command-line interface.
- The pipeline should have error handling and logging capabilities.
- The pipeline should be modular and easily extendable to handle additional
data sources and transformations.
## Built with
* [![Python][Python.org]][Python-url]
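As a rough illustration of the transformations listed earlier, the sketch below applies them to rows already read from the CSV file (for example with `csv.DictReader`). It uses only the Python standard library and is not the repository's actual implementation. The column names `Sale Date` and `Sale Year`, the accepted input date formats, and the integer coding of car models are assumptions made for the example; only the `Car Model` column is named in this README.

```python
from datetime import datetime

# Assumed input formats; the real CSV schema is not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value: str) -> datetime:
    """Return the first successful parse among the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {value!r}")


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Apply the four listed transformations to a list of CSV rows."""
    # Remove any row with a missing (None or blank) value.
    complete = [
        row for row in rows
        if all(value is not None and value.strip() for value in row.values())
    ]
    # Give each distinct car model an integer code (sorted for determinism).
    models = sorted({row["Car Model"] for row in complete})
    codes = {model: code for code, model in enumerate(models)}

    result = []
    for row in complete:
        sale_date = parse_date(row["Sale Date"])  # hypothetical column name
        result.append({
            **row,
            "Sale Date": sale_date.strftime("%Y-%m-%d"),  # standard format
            "Sale Year": sale_date.year,                  # new year column
            "Car Model": codes[row["Car Model"]],         # numeric category
        })
    return result
```

Sorting the model names before numbering them keeps the codes stable between runs on the same data. A real pipeline would likely persist that mapping so the codes stay consistent across loads.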
## Getting started
### Prerequisites
* [Python 3.10+][Python-docs-url]
### Installation
1. Clone the **repository**
```
git clone https://github.com/jpcadena/car-sales-etl.git
```
2. Change directory to the **project root**
```
cd car-sales-etl
```
3. Create a **virtual environment** *venv*
```
python3 -m venv venv
```
4. Activate the **environment** on Windows
```
.\venv\Scripts\activate
```
5. Or on Unix/macOS
```
source venv/bin/activate
```
6. Install the requirements with pip
```
pip install -r requirements.txt
```
### Usage
1. Rename the file **sample.env** to **.env**.
2. Add your **credentials** to the *.env* file.
3. Run the pipeline from the console.
```
python main.py
```
## Contributing
If you have a suggestion that would make this better, please fork the repo and
create a pull request.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

Use docstrings in **reStructuredText** format by adding triple double quotes
**"""** after the function definition.\
Add a brief description of the function, its parameters, and its return value,
including the corresponding data types.
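A minimal sketch of the docstring layout described above is shown here. The function `sale_year` is hypothetical and is not taken from the repository; it only illustrates the reStructuredText fields for a parameter, its type, and the return value.

```python
from datetime import date


def sale_year(sale_date: str) -> int:
    """
    Extract the year from an ISO-formatted sale date

    :param sale_date: Date of the sale in YYYY-MM-DD format
    :type sale_date: str
    :return: Year in which the sale took place
    :rtype: int
    """
    return date.fromisoformat(sale_date).year
```

The `:type:` and `:rtype:` fields repeat what the type hints already state, so a project can choose to keep both or rely on the annotations alone.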
Please use **linting** to check your code quality
following [PEP 8](https://peps.python.org/pep-0008/).\
See the linting documentation
for [Visual Studio Code](https://code.visualstudio.com/docs/python/linting#_run-linting)
or [JetBrains PyCharm](https://github.com/leinardi/pylint-pycharm/blob/master/README.md).\
Recommended plugin for
autocompletion: [Tabnine](https://www.tabnine.com/install)
## License
Distributed under the MIT License.
## Contact
LinkedIn: [Juan Pablo Cadena Aguilar][linkedin-url]
E-mail: [Juan Pablo Cadena Aguilar](mailto:[email protected]?subject=[GitHub]car-sales-etl)
[linkedin-url]: https://linkedin.com/in/juanpablocadenaaguilar
[project-screenshot]: assets/static/project.png
[Python.org]: https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54
[Python-url]: https://www.python.org/
[Python-docs-url]: https://docs.python.org/3.10/