{"id":32898392,"url":"https://github.com/shreeya-methuku/predictive_analysis","last_synced_at":"2026-04-20T04:02:11.814Z","repository":{"id":322550892,"uuid":"1089947923","full_name":"shreeya-methuku/Predictive_Analysis","owner":"shreeya-methuku","description":"This project leverages machine learning to perform time-series forecasting on key financial metrics, including sales (income), expenses, cash flow, and profit.","archived":false,"fork":false,"pushed_at":"2025-11-05T03:11:51.000Z","size":3719,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-05T05:24:43.951Z","etag":null,"topics":["arima","financial-forecasting","numpy","pandas","sarima","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shreeya-methuku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-05T03:01:42.000Z","updated_at":"2025-11-05T03:13:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"8777bd89-70f8-4c56-93b0-f827bc124044","html_url":"https://github.com/shreeya-methuku/Predictive_Analysis","commit_stats":null,"previous_names":["shreeya-methuku/predictive_analysis"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/shreeya-methuku/Predictive_Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreeya-methuku%2FPredictive_Analysis","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreeya-methuku%2FPredictive_Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreeya-methuku%2FPredictive_Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreeya-methuku%2FPredictive_Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shreeya-methuku","download_url":"https://codeload.github.com/shreeya-methuku/Predictive_Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreeya-methuku%2FPredictive_Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32032302,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T00:18:06.643Z","status":"online","status_checked_at":"2026-04-20T02:00:06.527Z","response_time":94,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arima","financial-forecasting","numpy","pandas","sarima","xgboost"],"created_at":"2025-11-10T12:01:08.358Z","updated_at":"2026-04-20T04:02:11.808Z","avatar_url":"https://github.com/shreeya-methuku.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI-Powered Financial Forecasting\n\nThis project leverages machine learning to perform time-series forecasting on key financial metrics, including sales (income), expenses, cash flow, 
and profit. The models are designed to provide accurate, data-driven financial projections that can assist in planning, budgeting, and strategic decision-making.\n\n## Table of Contents\n\n- [Project Objective](#project-objective)\n- [Datasets](#datasets)\n- [Methodology](#methodology)\n  - [Data Preprocessing](#data-preprocessing)\n  - [Feature Engineering](#feature-engineering)\n  - [Modeling](#modeling)\n- [Technologies Used](#technologies-used)\n- [Evaluation](#evaluation)\n- [Results](#results)\n- [How to Use](#how-to-use)\n\n## Project Objective\n\nThe primary goal is to build an accurate, interpretable, and scalable predictive analytics solution for forecasting key financial metrics using historical daily financial data. The solution aims to provide actionable insights to support operational and strategic planning by identifying trends and seasonal patterns in the financial data.\n\n## Datasets\n\nThe models were developed and tested on three separate real-world financial datasets, each from a different industry, to ensure the robustness of the approach across various business behaviors and transactional volumes. 
Each dataset was processed independently.\n\nThe preprocessing pipeline was designed to handle large-scale financial datasets with over 300,000 rows of transactional data.\n\n## Methodology\n\nThe project follows a structured time-series analysis workflow, from data cleaning to model evaluation.\n\n### Data Preprocessing\n\nThe raw financial datasets underwent a rigorous preprocessing pipeline which included:\n-   **Filtering and Validation**: Retaining only records in INR and relevant financial categories (e.g., “Income”, “Expense”).\n-   **Data Cleaning**: Removing invalid or missing financial entries and standardizing date formats.\n-   **Outlier Removal**: Excluding records with values beyond three standard deviations from the mean to improve model stability.\n-   **Aggregation**: Aggregating financial values to a daily frequency to create a consistent time series for analysis.\n\n### Feature Engineering\n\nTo capture complex patterns and seasonality, a rich set of features was engineered from the time-series data:\n-   **Calendar-Based Features**: Day, month, year, quarter, and day-of-week.\n-   **Event-Based Flags**: Indicators for the start/end of the month, quarter, and financial year.\n-   **Cyclical Transformations**: Sine and cosine encoding of time-based fields (e.g., month, day) to capture seasonality.\n-   **Lag Features**: Values from previous time steps (1, 7, 14, 30, 90, and 365 days ago).\n-   **Rolling Window Statistics**: Rolling means, standard deviations, min, and max for various windows (7, 14, 30, 90 days).\n-   **Historical Averages**: Aggregated averages by month, quarter, and year.\n-   **Trend Indicator**: A feature representing the number of days since the start of the dataset.\n\n### Modeling\n\nSeveral models were evaluated, including classical statistical approaches and modern machine 
learning methods:\n-   **ARIMA \u0026 SARIMA**: Initially used as baseline models but struggled to capture non-linear relationships in the data.\n-   **XGBoost Regressor**: This was selected as the final model. It consistently outperformed the baselines across all datasets, demonstrating superior flexibility and accuracy in handling complex, feature-rich financial data.\n\n## Technologies Used\n\n-   **Programming Language**: Python\n-   **Core Libraries**:\n    -   `pandas` \u0026 `NumPy` for data manipulation.\n    -   `XGBoost` for the core prediction model.\n    -   `scikit-learn` for evaluation metrics (MAE, RMSE, MAPE).\n    -   `statsmodels` for baseline ARIMA/SARIMA models.\n    -   `matplotlib` for data visualization.\n-   **Development Environment**: Google Colab, Visual Studio Code\n\n## Evaluation\n\nModel performance was evaluated using standard regression metrics and visual analysis:\n-   **Metrics**:\n    -   Mean Absolute Error (MAE)\n    -   Root Mean Squared Error (RMSE)\n    -   Mean Absolute Percentage Error (MAPE)\n    -   **Forecast Accuracy**: Calculated as `100 - MAPE`\n-   **Visualizations**:\n    -   Forecast vs. 
Actual graphs (daily and monthly).\n    -   Feature importance plots from XGBoost.\n    -   Residual analysis plots (scatter and histograms).\n\n## Results\n\nThe XGBoost model's performance varied across the three datasets, highlighting the importance of data quality for predictive accuracy.\n\n| Dataset | Income Accuracy | Expense Accuracy | Profit Accuracy | Cashflow Accuracy |\n| :--- | :--- | :--- | :--- | :--- |\n| **Dataset 1** | 81.51% | 74.09% | 75.46% | 77.90% |\n| **Dataset 2** | 19.00% | 45.62% | 73.43% | 76.68% |\n| **Dataset 3** | 83.63% | 96.18% | 89.12% | 84.49% |\n\n**Observations**:\n-   **Dataset 3** yielded the most robust and consistent forecasts, demonstrating the impact of high-quality, well-structured data.\n-   **Dataset 2** had limited data volume, resulting in poor performance for some metrics (e.g., income).\n-   This comparative analysis underscores that data quality, consistency, and volume are critical for achieving reliable machine learning models in finance.\n\n## How to Use\n\n1.  **Clone the repository**:\n    ```\n    git clone \u003cyour-repository-url\u003e\n    ```\n2.  **Install dependencies**:\n    ```\n    pip install pandas numpy xgboost scikit-learn statsmodels matplotlib\n    ```\n3.  **Prepare your data**: Ensure your financial data is in a CSV or Excel file and matches the structure used in the preprocessing steps.\n4.  **Run the Notebooks**: Open and execute the Jupyter/Colab notebooks for Income, Expense, Cashflow, or Profit. Make sure to place your data file in the correct directory.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreeya-methuku%2Fpredictive_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshreeya-methuku%2Fpredictive_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreeya-methuku%2Fpredictive_analysis/lists"}