https://github.com/tupizz/fiap-fase04-business_analytics_petroleum_prices
- Host: GitHub
- URL: https://github.com/tupizz/fiap-fase04-business_analytics_petroleum_prices
- Owner: tupizz
- Created: 2024-04-29T00:03:32.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-30T00:26:59.000Z (about 1 year ago)
- Last Synced: 2025-02-09T14:47:25.743Z (4 months ago)
- Language: Jupyter Notebook
- Size: 55.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Start the Project
```bash
poetry new poetry-demo
poetry add matplotlib seaborn
poetry add pandas
poetry add numpy
```

### Start the Database
```bash
docker compose up -d
```

```sql
CREATE TABLE public.preco_combustivel(
regiao VARCHAR(255),
estado VARCHAR(255),
municipio VARCHAR(255),
revenda VARCHAR(255),
cnpj VARCHAR(255),
nome_rua VARCHAR(255),
numero_rua VARCHAR(255),
complemento VARCHAR(255),
bairro VARCHAR(255),
cep VARCHAR(255),
produto VARCHAR(255),
data_coleta DATE,
valor_venda FLOAT,
unidade_medida VARCHAR(255),
bandeira VARCHAR(255)
);
```

### Load data to database (option 1) ✅ chosen
```bash
poetry run python ./business_analytics_fase04/01_load_pg_data.py
```

```python
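# Note: the original script relies on a detect_encoding() helper and a
# SQLAlchemy `engine` that are not shown in this excerpt. The definitions
# below are a sketch of what they could look like; the connection string
# (user, password, database) is an assumption based on a default local
# Postgres started via docker compose, and chardet is an assumed dependency.
# A Postgres driver such as psycopg2 is also required for to_sql().
import os

import chardet
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")


def detect_encoding(file_path: str) -> str:
    # Guess the file encoding from a sample of raw bytes
    # (these CSVs are often latin-1/cp1252 rather than UTF-8).
    with open(file_path, "rb") as f:
        sample = f.read(100_000)
    result = chardet.detect(sample)
    return result["encoding"] or "utf-8"

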
def transform_and_load(file_path: str):
    encoding = detect_encoding(file_path)
    print(f"Detected encoding: {encoding}")
    print(f"Loading data from {os.path.basename(file_path)}...")

    # Load CSV into DataFrame
    df = pd.read_csv(file_path, encoding=encoding, delimiter=';', quotechar='"', header=0, on_bad_lines='skip')

    # Rename columns as necessary (adjust according to your needs)
    df.rename(columns={
        'Regiao - Sigla': 'regiao',
        'Estado - Sigla': 'estado',
        'Municipio': 'municipio',
        'Revenda': 'revenda',
        'CNPJ da Revenda': 'cnpj',
        'Nome da Rua': 'nome_rua',
        'Numero Rua': 'numero_rua',
        'Complemento': 'complemento',
        'Bairro': 'bairro',
        'Cep': 'cep',
        'Produto': 'produto',
        'Data da Coleta': 'data_coleta',
        'Valor de Venda': 'valor_venda',
        'Unidade de Medida': 'unidade_medida',
        'Bandeira': 'bandeira'
    }, inplace=True)

    # Drop the purchase price column, which is not loaded into the table
    df.drop(['Valor de Compra'], axis=1, inplace=True)

    # Convert data types
    df['data_coleta'] = pd.to_datetime(df['data_coleta'], format='%d/%m/%Y')
    df['valor_venda'] = df['valor_venda'].str.replace(',', '.').astype(float)

    # Handle missing values
    df['complemento'] = df['complemento'].fillna('')  # Assuming empty string for missing complemento
    print(df.info())

    # Load DataFrame into PostgreSQL table
    df.to_sql('preco_combustivel', con=engine, if_exists='append', index=False)
    print(f"Data from {os.path.basename(file_path)} loaded successfully.")
```

### Load data to database (option 2)
- Here we load the data from the CSV files into the database using KNIME.
- The `CSV Reader` node reads the data from the CSV file.
- The `DB Writer` node writes the data to the database.
- The `DB Reader` and `DB Connection Table Reader` nodes read the data back from the database to verify the load.
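
Whichever option you use, a quick way to confirm the load is to read a few rows back with pandas. This is a minimal sketch; the connection string is an assumption based on a default local Postgres instance, not taken from the repository.

```python
import pandas as pd
from sqlalchemy import create_engine

# Assumed connection string for the local docker compose Postgres instance.
engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")

# Count loaded rows and preview the most recent collection dates.
print(pd.read_sql("SELECT COUNT(*) AS total FROM public.preco_combustivel", engine))
print(
    pd.read_sql(
        "SELECT produto, valor_venda, data_coleta "
        "FROM public.preco_combustivel "
        "ORDER BY data_coleta DESC LIMIT 5",
        engine,
    )
)
```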