<div align="center">

## :books: Quantum - Automated Email Excel Processor

![License](https://custom-icon-badges.demolab.com/github/license/RogerioLS/extract_email_files?logo=law&color=dark-green)
![Last commit](https://custom-icon-badges.demolab.com/github/last-commit/RogerioLS/extract_email_files?logo=history&color=dark-green)
![Code size in bytes](https://img.shields.io/github/languages/code-size/RogerioLS/extract_email_files?logo=file-code&color=dark-green)
![Repo size](https://img.shields.io/github/repo-size/RogerioLS/extract_email_files?logo=database)
![Top language](https://img.shields.io/github/languages/top/RogerioLS/extract_email_files?color=dark-green)
![Languages](https://custom-icon-badges.demolab.com/github/languages/count/RogerioLS/extract_email_files?logo=command-palette&color=red)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
</div>

### Overview

Quantum is an automated Python script that monitors an Outlook inbox for specific emails, extracts attached Excel files, performs a data quality check, and sends status notifications. It is designed to streamline the handling of recurring data files received by email.

---

### Features

- **Email Monitoring**: Connects to an Outlook account and searches for emails whose subject contains a specific text (`HEADLINE_PREFIX`).
- **Attachment Extraction**: Downloads and saves `.xlsx` attachments from the target email.
- **Data Validation**: Uses `pandas` to read the Excel file and checks whether the number of missing values (`NaN`) in the 'Retorno' column exceeds a predefined limit.
- **Automated Notifications**:
    - Sends an **alert email** if the data quality check fails.
    - Sends a **success email** upon successful processing.
- **Robust Logging**:
    - Provides real-time, color-coded feedback in the terminal.
    - Saves detailed logs in JSON format, organized by date (`INFO` and `ERROR` logs are saved separately).
- **Environment-based Configuration**: Uses a `.env` file to manage sensitive data and paths, keeping them separate from the code.

---

### How It Works

The main workflow is orchestrated by `main.py`:

1.  **Initialization**: Loads environment variables from the `.env` file.
2.  **Email Extraction**: `extrair_excel_email()` connects to Outlook, finds the most recent email received on the current day whose subject contains `HEADLINE_PREFIX`, and saves its Excel attachment to the `PASTA_RAIZ_QUANTUM` directory.
3.  **Excel Processing**: `processar_excel_extraido()` reads the newly downloaded Excel file and counts the `NaN` values in the 'Retorno' column.
4.  **Decision Making**:
    - **On Failure**: If the `NaN` count exceeds the allowed limit (`limites_null`), processing stops and `enviar_email_alerta()` sends an email detailing the error.
    - **On Success**: If the data is valid, `enviar_email_sucesso()` sends a confirmation email.
5.  **Logging**: Throughout the process, `logger_quantum` records all actions, warnings, and errors.

### Project Structure

```bash
.
├── .env                # Environment variables (credentials, paths) - not versioned
├── .gitignore          # Specifies files to be ignored by Git
├── documentation.txt   # Project documentation
├── LICENSE             # Project license
├── main.py             # Main script, orchestrator of the workflow
├── README.md           # Project summary and setup guide
├── requirements.txt    # List of Python dependencies
└── source/
    ├── email/
    │   ├── extrair_excel_email.py  # Handles Outlook connection and attachment extraction
    │   ├── envia_email_alerta.py   # Sends data quality alert emails
    │   └── envia_email_sucesso.py  # Sends success confirmation emails
    ├── logger/
    │   └── logger_config.py        # Configures console and file logging
    └── manipulação_excel/
        └── manipulação_excel.py    # Handles Excel file reading and data validation
```

---

### Module Descriptions

#### `main.py`

The entry point of the application. It controls the execution flow by calling the modules in order:
1.  Loads environment variables from `.env`.
2.  Calls `extrair_excel_email` to get the attachment.
3.  Calls `processar_excel_extraido` to validate the data.
4.  Based on the validation result, calls either `enviar_email_alerta` or `enviar_email_sucesso`.
5.  Wraps the whole process in top-level error handling to catch any unexpected exception.

#### `source/email/extrair_excel_email.py`

-   **Purpose**: To connect to Outlook, find a specific email, and download its attachment.
-   `inicializar_outlook()`: Establishes a connection with the Outlook application. If the initial connection fails, a retry mechanism kills and restarts the Outlook process.
-   `extrair_excel_email()`:
    -   Searches the inbox of the configured Outlook account (an `@asa.com.br` address).
    -   Filters emails by the current date and by a subject line containing `HEADLINE_PREFIX`.
    -   Saves the first matching `.xlsx` attachment to the `PASTA_RAIZ_QUANTUM` directory.

#### `source/manipulacao_excel/manipulacao_excel.py`

-   **Purpose**: To read and validate the data from the extracted Excel file.
-   `ler_excel_mais_recente_da_pasta()`: Finds the most recently modified Excel file (`.xlsx` or `.xls`) in a given directory.
-   `quantidade_nan()`: Counts the number of `NaN` (Not a Number) values in the "Retorno" column of a pandas DataFrame.
-   `processar_excel_extraido()`: Orchestrates the reading and validation process. It returns the DataFrame if the `NaN` count does not exceed the allowed limit (`limites_null`); otherwise, it returns the `NaN` count.

#### `source/email/envia_email_alerta.py`

-   **Purpose**: To notify the user of a data quality issue.
-   `enviar_email_alerta()`:
    -   Constructs an HTML-formatted email.
    -   Includes in the email body the number of `NaN` values found and the allowed limit.
    -   Connects to an SMTP server (Office365) using the credentials from `.env` and sends the alert.

#### `source/email/envia_email_sucesso.py`

-   **Purpose**: To confirm that the process completed successfully.
-   `enviar_email_sucesso()`:
    -   Constructs a simple HTML-formatted success email.
    -   Connects to the SMTP server and sends the confirmation.

#### `source/logger/logger_config.py`

-   **Purpose**: To provide structured and informative logging.
-   `print_log()`: Prints color-coded, timestamped messages to the console. The color is chosen either from the log level (`INFO`, `ERROR`, etc.) or from a specified theme color.
-   `Logger` class: A silent, file-based logger.
    -   It accumulates log entries in memory (`info` and `error` lists).
    -   Using `atexit`, it automatically saves the logs to JSON files when the script terminates.
    -   Logs are organized into `YYYY/MM/` subdirectories, with filenames containing the date (e.g., `quantum_info_20250814.json`).
    -   Error logs include a full traceback for easier debugging.

---

### Configuration (`.env`)

The `.env` file configures the script without hardcoding sensitive information.

-   `EMAIL_USER`: The sender's email address (must have SMTP access).
-   `EMAIL_PASSWORD`: The password for the sender's email. For accounts with 2FA, an "app password" is usually required.
-   `EMAIL_DESTINATARIO`: The recipient of the alert and success emails.
-   `PASTA_RAIZ_QUANTUM`: The absolute path to the directory where the script saves the extracted Excel files.
-   `HEADLINE_PREFIX`: The text the script looks for in the email subject to identify the correct email.
-   `PASTA_LOG`: The absolute path to the directory where the JSON log files are stored.

---

### Setup and Configuration

Quantum drives the Outlook desktop application through `pywin32`, so it must run on Windows with Outlook installed.

1.  **Clone the repository:**
    ```bash
    git clone https://github.com/RogerioLS/extract_email_files.git
    cd extract_email_files
    ```

2.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

3.  **Create the `.env` file:**
    Create a file named `.env` in the root directory and add the following variables. This file is listed in `.gitignore` to prevent committing sensitive information.

    ```properties
    # Email credentials for sending notifications
    EMAIL_USER=your_email@example.com
    EMAIL_PASSWORD=your_app_password # Use an app password if 2FA is enabled
    EMAIL_DESTINATARIO=recipient_email@example.com

    # Path where the extracted Excel file will be saved
    PASTA_RAIZ_QUANTUM=W:\\path\\to\\your\\excel\\folder

    # Subject line prefix to identify the target email
    HEADLINE_PREFIX=Daily Fundos

    # Path to store JSON log files
    PASTA_LOG=W:\\path\\to\\your\\logs\\folder
    ```

---

### Usage

Run the main script from the project's root directory:

```bash
python main.py
```

The script starts, logs its progress in the console, and runs the workflow described in "How It Works".

---

### Dependencies

- `pandas`
- `python-dotenv`
- `colorama`
- `openpyxl`
- `pywin32`
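The decision step from "How It Works" (extract, validate, then alert or confirm, all inside a top-level error guard) can be sketched as a small dispatcher. This is a minimal sketch, not the project's `main.py`: `run_workflow` and its five injected callables are hypothetical stand-ins for `extrair_excel_email`, `processar_excel_extraido`, `enviar_email_alerta`, `enviar_email_sucesso`, and the error logger. It assumes, as the module description states, that `processar_excel_extraido` returns an integer `NaN` count when the check fails and the DataFrame otherwise.

```python
import numbers
from typing import Any, Callable


def run_workflow(
    extract: Callable[[], Any],
    process: Callable[[], Any],
    send_alert: Callable[[int], None],
    send_success: Callable[[], None],
    log_error: Callable[[Exception], None],
) -> str:
    """Run extract -> validate -> notify and report which branch ran.

    All five callables are hypothetical stand-ins for the project's functions.
    """
    try:
        extract()                      # e.g. extrair_excel_email()
        result = process()             # e.g. processar_excel_extraido()
        # Assumed convention: an integer means the check failed and the value
        # is the NaN count; anything else is treated as the validated DataFrame.
        # numbers.Integral also matches NumPy integers such as numpy.int64.
        if isinstance(result, numbers.Integral) and not isinstance(result, bool):
            send_alert(int(result))    # e.g. enviar_email_alerta(...)
            return "alert"
        send_success()                 # e.g. enviar_email_sucesso()
        return "success"
    except Exception as exc:           # top-level guard around the whole run
        log_error(exc)
        return "error"
```

Checking for `numbers.Integral` rather than `int` matters if the count comes straight from pandas, because `Series.sum()` returns a NumPy integer, which is not a Python `int`.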
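The retry behaviour described for `inicializar_outlook()` (try to connect and, if that fails, kill and restart Outlook before trying again) follows a generic "retry with a restart hook" pattern. The sketch below is hypothetical: `connect_with_restart`, its parameters, and the default of two attempts are illustrative assumptions, and the Outlook-specific parts are passed in as callables so the logic can run anywhere.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_restart(
    connect: Callable[[], T],
    restart: Callable[[], None],
    attempts: int = 2,
    wait_seconds: float = 0.0,
) -> T:
    """Call connect(); on failure call restart() and retry, up to `attempts` tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                restart()                 # e.g. kill and relaunch the Outlook process
                time.sleep(wait_seconds)  # give the restarted process time to start
    raise ConnectionError(f"could not connect after {attempts} attempts") from last_error
```

On Windows, `connect` might wrap `win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")` and `restart` might run `taskkill /F /IM OUTLOOK.EXE` and relaunch Outlook, with a non-zero `wait_seconds`; that wiring is an assumption, not taken from the project.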
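`ler_excel_mais_recente_da_pasta()` is described as picking the most recently modified `.xlsx` or `.xls` file in a directory. A standard-library sketch of that selection step might look like this; the name `most_recent_excel` is hypothetical, and skipping `~$` files is an added assumption (Excel creates `~$<name>.xlsx` lock files next to open workbooks, and these would otherwise match the suffix filter).

```python
from pathlib import Path
from typing import Optional

EXCEL_SUFFIXES = {".xlsx", ".xls"}


def most_recent_excel(folder: str) -> Optional[Path]:
    """Return the most recently modified Excel file in `folder`, or None if there is none."""
    candidates = [
        p
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() in EXCEL_SUFFIXES
        and not p.name.startswith("~$")  # skip Excel's temporary lock files
    ]
    if not candidates:
        return None
    # st_mtime is the last-modification timestamp, i.e. "most recently modified".
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be handed to `pandas.read_excel`, which uses `openpyxl` for `.xlsx` files. Note that reading legacy `.xls` files needs a separate engine (`xlrd`), which is not in the dependency list above.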
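The validation rule behind `quantidade_nan()` and `processar_excel_extraido()` (count `NaN` values in "Retorno", then return the DataFrame if the count does not exceed the limit, otherwise return the count) can be expressed in a few lines of pandas. The names `count_nan` and `validate` and their signatures are hypothetical, since the README does not give the project's exact signatures.

```python
from typing import Union

import pandas as pd


def count_nan(df: pd.DataFrame, column: str = "Retorno") -> int:
    """Count missing values (NaN or None) in `column`."""
    # isna() marks missing cells as True; summing the booleans counts them.
    return int(df[column].isna().sum())


def validate(df: pd.DataFrame, limit: int, column: str = "Retorno") -> Union[pd.DataFrame, int]:
    """Return df if the NaN count is within `limit`, otherwise return the NaN count."""
    nan_count = count_nan(df, column)
    return df if nan_count <= limit else nan_count
```

Returning two different types keeps the caller simple but forces a type check downstream; raising a dedicated exception or returning a small result object would be a more explicit alternative.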
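`enviar_email_alerta()` is described as building an HTML email with the `NaN` count and the limit, then sending it through Office365 SMTP with the `.env` credentials. A standard-library sketch follows. The function names, subject wording, and message text are hypothetical, and it assumes the Office365 client-submission endpoint `smtp.office365.com` on port 587 with STARTTLS, which the README does not state explicitly.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_alert_email(sender: str, recipient: str, nan_count: int, limit: int) -> EmailMessage:
    """Build a plain-text + HTML alert reporting the NaN count versus the limit."""
    msg = EmailMessage()
    msg["Subject"] = f"Quantum: data quality alert ({nan_count} NaN values)"  # hypothetical wording
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content(f"{nan_count} NaN values found in 'Retorno' (allowed limit: {limit}).")
    msg.add_alternative(
        f"<p><b>{nan_count}</b> NaN values found in <code>Retorno</code> "
        f"(allowed limit: {limit}).</p>",
        subtype="html",
    )
    return msg


def send_email(msg: EmailMessage, user: str, password: str,
               host: str = "smtp.office365.com", port: int = 587) -> None:
    """Send `msg` over SMTP, upgrading the connection with STARTTLS before logging in."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        # An explicit default context enables certificate and hostname verification.
        server.starttls(context=ssl.create_default_context())
        server.login(user, password)
        server.send_message(msg)
```

In the project's setup, `sender` and `user` would come from `EMAIL_USER`, `password` from `EMAIL_PASSWORD`, and `recipient` from `EMAIL_DESTINATARIO`. Keeping message construction separate from sending lets the message be checked without a mail server.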
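The `Logger` class in `logger_config.py` is described as a silent logger that buffers `info` and `error` entries in memory, uses `atexit` to write them as JSON into `YYYY/MM/` folders with date-stamped names, and stores full tracebacks for errors. The sketch below reproduces that behaviour under stated assumptions: the class name `JsonLogger`, its method names, and the entry fields are hypothetical. Only the folder layout and the filename pattern (`quantum_info_20250814.json`) come from the description above.

```python
import atexit
import json
import traceback
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


class JsonLogger:
    """Silent logger: buffers entries in memory and writes them as JSON at exit."""

    def __init__(self, log_dir: str, prefix: str = "quantum") -> None:
        self.log_dir = Path(log_dir)
        self.prefix = prefix
        self.info: List[Dict[str, str]] = []
        self.error: List[Dict[str, str]] = []
        atexit.register(self.save)  # flush automatically when the interpreter exits

    @staticmethod
    def _now() -> str:
        return datetime.now().isoformat(timespec="seconds")

    def log_info(self, message: str) -> None:
        self.info.append({"time": self._now(), "message": message})

    def log_error(self, message: str, exc: BaseException) -> None:
        # Keep the full traceback so failures can be debugged from the file alone.
        tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        self.error.append({"time": self._now(), "message": message, "traceback": tb})

    def save(self, day: Optional[datetime] = None) -> List[Path]:
        """Write buffers to <log_dir>/YYYY/MM/<prefix>_<level>_YYYYMMDD.json."""
        if not (self.info or self.error):
            return []  # nothing buffered: leave the file system untouched
        day = day or datetime.now()
        folder = self.log_dir / f"{day:%Y}" / f"{day:%m}"
        folder.mkdir(parents=True, exist_ok=True)
        written: List[Path] = []
        for level, entries in (("info", self.info), ("error", self.error)):
            if not entries:
                continue
            path = folder / f"{self.prefix}_{level}_{day:%Y%m%d}.json"
            # Append to the day's existing file instead of overwriting earlier runs.
            existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
            path.write_text(json.dumps(existing + entries, ensure_ascii=False, indent=2), encoding="utf-8")
            written.append(path)
            entries.clear()  # avoid writing the same entries again at exit
        return written
```

Merging with an existing file for the same day is a design choice of this sketch: if the script runs several times per day, a plain overwrite would silently discard the earlier runs' logs. The log directory would come from `PASTA_LOG`.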