{"id":44257419,"url":"https://github.com/berrieslab/fpca-load-tools","last_synced_at":"2026-02-10T16:35:21.273Z","repository":{"id":252253293,"uuid":"833331997","full_name":"BerriesLab/fpca-load-tools","owner":"BerriesLab","description":"Functional Principal Component Analysis and Functional Regression Tools for Electricity Load Curves","archived":false,"fork":false,"pushed_at":"2024-08-09T09:31:18.000Z","size":3844,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-04T21:58:31.138Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BerriesLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-07-24T20:30:33.000Z","updated_at":"2025-02-24T05:39:47.000Z","dependencies_parsed_at":"2025-04-12T00:46:45.247Z","dependency_job_id":"61c3441a-1bcb-4705-b313-8f64733e0419","html_url":"https://github.com/BerriesLab/fpca-load-tools","commit_stats":null,"previous_names":["berrieslab/fpca-load-tools"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/BerriesLab/fpca-load-tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BerriesLab%2Ffpca-load-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BerriesLab%2Ffpca-load-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BerriesLab%2Ffpca-load-tools/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BerriesLab%2Ffpca-load-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BerriesLab","download_url":"https://codeload.github.com/BerriesLab/fpca-load-tools/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BerriesLab%2Ffpca-load-tools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29307913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-10T16:09:25.305Z","status":"ssl_error","status_checked_at":"2026-02-10T16:08:52.170Z","response_time":65,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-10T16:35:20.656Z","updated_at":"2026-02-10T16:35:21.253Z","avatar_url":"https://github.com/BerriesLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Functional Principal Component Analysis and Functional Regression Tools for Electricity Load Curves\n\n[![DOI](https://zenodo.org/badge/833331997.svg)](https://zenodo.org/doi/10.5281/zenodo.13273085)\n\nBased on the work by [**D. 
Beretta et al.**, *Sustainable Energy, Grids and Networks, Volume 21, March 2020,\n100308*](https://www.sciencedirect.com/science/article/abs/pii/S2352467719304461).\nReaders are encouraged to consult the manuscript to master the methodology.\n\nThese tools are designed for scientists, researchers, and engineers working in all fields of electrical grid\noptimization. They provide capabilities to apply FPCA to daily electricity load curves and predict future\nconsumption patterns based on historical data.\n\n\u003c!-- TOC --\u003e\n* [Installation](#installation)\n* [Overview](#overview)\n* [Time Series](#time-series)\n  * [Loading time series](#loading-time-series)\n  * [Converting between UTC and Local time](#converting-between-utc-and-local-time)\n  * [Filtering complete time-series](#filtering-complete-time-series)\n* [FPCA](#fpca)\n  * [Applying FPCA](#applying-fpca)\n  * [Saving and loading FPCA results](#saving-and-loading-fpca-results)\n  * [Displaying FPCA results](#displaying-fpca-results)\n* [Functional Regression](#functional-regression)\n  * [The model](#the-model)\n  * [Prediction](#prediction)\n  * [Loading and Saving](#loading-and-saving)\n* [Tutorial](#tutorial)\n* [Contributing](#contributing)\n* [Credits](#credits)\n* [License](#license)\n\u003c!-- TOC --\u003e\n\n## Installation\n\nThe `fpca-load-tools` package can be installed either via pip or by cloning this repository.\n\nTo install the tools via pip, use the following commands based on your operating system:\n\n- **On Unix/Mac**: Open a terminal and run:\n  ```bash\n  pip install fpca-load-tools\n  ```\n- **On Windows**: Open Command Prompt or PowerShell and run:\n  ```bash\n  pip install fpca-load-tools\n  ```\n\nTo install the package by cloning the repository to your local machine, follow these steps:\n\n- Clone the repository:\n  ```bash\n  git clone https://github.com/BerriesLab/fpca-load-tools.git\n  ```\n- Navigate to the project directory:\n  ```bash\n  cd fpca-load-tools\n  ```\n- Install the required dependencies:\n  
```bash\n  pip install -r requirements.txt\n  ```\n\n## Overview\n\n**fpca-load-tools** is designed around three main classes:\n\n- [`ElectricityLoadTimeSeries`](fpca_load_tools/time_series.py): Manages time series data within a Pandas DataFrame, including\n  pre-processing operations such as filtering for complete time series and augmenting time series with calendar\n  information.\n- [`ElectricityLoadFPCA`](fpca_load_tools/fpca.py): Applies Functional Principal Component Analysis (FPCA) to daily\n  electricity load curves. This class requires an ElectricityLoadTimeSeries object as an attribute.\n    - **Note 1**: The ElectricityLoadTimeSeries object is stored as a reference in ElectricityLoadFPCA. Therefore, any\n      changes made to the ElectricityLoadTimeSeries object outside ElectricityLoadFPCA will also affect the data being\n      processed by the FPCA.\n    - **Note 2**: The results of the FPCA, including the scores, are stored in the *results* attribute of the class.\n- [`ElectricityLoadRegression`](fpca_load_tools/prediction.py): Trains a model and predicts daily electricity load curves\n  using FPCA data. 
It requires an ElectricityLoadFPCA object as an attribute.\n    - **Note 1**: The ElectricityLoadFPCA object is stored as a reference in ElectricityLoadRegression. Therefore, any\n      modifications to the ElectricityLoadFPCA object outside ElectricityLoadRegression will affect the data being\n      processed by the regressor.\n    - **Note 2**: The results of the training are stored in the class attributes '**model**' and '**scaler**'.\n\nThe following figure is a graphical representation of the classes with their own attributes and methods.\n\n```mermaid\nclassDiagram\n    class ElectricityLoadTimeSeries {\n        • ts: pd.DataFrame\n        • filter_complete_data()\n        • filter_complete_days()\n        • filter_complete_months()\n        • filter_complete_years()\n        • filter_non_null_entries()\n        • resample_days()\n        • augment_time_series_with_day_of_the_week()\n        • augment_time_series_with_year_month_day()\n        • drop_year_month_day()\n        • convert_utc_to_local_timestamp()\n        • sort()\n        • save_time_series()\n        • load_time_series()\n        • load_example_entsoe_transparency()\n    }\n    class ElectricityLoadFPCA {\n        • ts: ElectricityLoadTimeSeries\n        • results: ElectricityLoadFPCAResults\n        • apply_fpca_to_all_days_grouped_by_date()\n        • apply_fpca_to_all_days_grouped_by_weekday()\n        • apply_fpca_to_all_days_grouped_by_month()\n        • plot_scores_vs_day_of_the_week()\n        • plot_scores_vs_month_of_the_year()\n        • plot_cdf_of_explained_variability()\n        • plot_fpc()\n        • plot_functional_boxplot()\n        • save_fpca_results()\n        • load_fpca_results()\n    }\n    class ElectricityLoadFPCAResults {\n        • day = None\n        • day_of_the_week = None\n        • month_of_the_year = None\n    }\n    class ElectricityLoadRegression {\n        • fpca: ElectricityLoadFPCA\n        • model: dict[LinearRegression]\n        • scaler: StandardScaler\n    
    • train_linear_model()\n        • predict_daily_eletricity_load_curve()\n        • save_model()\n        • load_model()\n    }\n    ElectricityLoadFPCAResults --|\u003e ElectricityLoadFPCA\n    ElectricityLoadFPCA --|\u003e ElectricityLoadRegression\n    ElectricityLoadTimeSeries --|\u003e ElectricityLoadFPCA\n\n```\n\n## Time Series\n\n### Loading time series\n\nUsers can load time series from CSV files using the [`load_time_series()`](fpca_load_tools/time_series.py) method of the\nElectricityLoadTimeSeries class. The expected data structure in the CSV file is:\n\n| utc_timestamp | load | feature_1 | feature_2 | ... | feature_n |\n|---------------|------|-----------|-----------|-----|-----------|\n| ...           | ...  | ...       | ...       | ... | ...       |\n\n- utc_timestamp: The timestamp in Coordinated Universal Time (UTC) or Greenwich Mean Time (GMT).\n- load: The electricity load measurement.\n- feature_1 to feature_n: Additional features for analysis and/or prediction.\n\nUpon loading, the CSV file is converted into a Pandas DataFrame with a **DateTimeIndex** based on the **utc_timestamp**.\n\nUsers can load multiple files and features as needed. The method automatically merges new CSV files with the existing\nDataFrame in memory on the **DateTimeIndex**. Users should ensure that only one column named **load** is present in\nmemory. To help with this, [`load_time_series()`](fpca_load_tools/time_series.py) lets users select which columns to load from\nthe CSV file and choose the names for these columns in the destination DataFrame. If multiple columns with the same\nname are loaded, Pandas will handle them by renaming the new columns with suffixes (e.g., **column_name_r**).\n\nAn example of meteorological time series data that could be merged with the electricity load time series is:\n\n| utc_timestamp | temperature | radiation | relative_humidity |\n|---------------|-------------|-----------|-------------------|\n| ...           | ...         
| ...       | ...               |\n\nTo save time series data to CSV files, users can use the [`save_time_series()`](fpca_load_tools/time_series.py) method.\n\n### Converting between UTC and Local time\n\nWhen studying electricity load time series, the choice between using UTC (Coordinated Universal Time) and local time\ndepends on the objectives of the analysis and the nature of the data. For standardization purposes, such as comparing\nelectricity load across different time zones, using UTC provides a uniform time reference and simplifies time zone\nconversions. However, if the goal is to investigate **consumer behavior**, local time may be more relevant since\nelectricity load often correlates with human activities and routines, which follow local time patterns (e.g., peak load\ntimes during mornings and evenings). Similarly, for **operational planning**, such as scheduling generation or demand\nresponse activities, local time aligns better with the actual timing of events and conditions experienced by consumers\nand grid operators.\n\nTo facilitate this, the [`ElectricityLoadTimeSeries`](fpca_load_tools/time_series.py) class includes the\nmethod [`convert_utc_to_local_timestamp`](fpca_load_tools/time_series.py), which converts the UTC DateTimeIndex to the\ncorresponding local timestamp. This method requires the user to specify the geographical area in [Olson Timezone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) format.\n\nIt is important to note that converting from UTC to local time, including accounting for daylight saving time, can\nresult in days with duplicate entries or missing values. To address these issues, you can either resample the days with\nduplicates or missing entries or remove days that do not meet completeness and integrity requirements. 
For detailed\nguidance on handling these issues, see the section on [**Filtering complete time series**](#filtering-complete-time-series).\n\n### Filtering complete time-series\n\nTo execute FPCA and predict future electricity load curves, it is essential that the dataset is complete. The\ncompleteness criteria are as follows:\n\n- **Year Completeness**: A year is considered complete if the number of months with non-null entries meets or exceeds a\n  tolerance percentage of the expected number of months, which is 12. By default, this tolerance level is set to 11/12.\n- **Month Completeness**: A month is considered complete if the number of days with non-null entries meets or exceeds a\n  tolerance percentage of the expected number of calendar days. By default, this tolerance level is set to 95% of the\n  month's calendar days.\n- **Day Completeness**: A day is considered complete if the number of non-null entries meets or exceeds a tolerance\n  percentage of the expected number of entries. By default, this tolerance level is set to 100% of the expected entries.\n\nTo filter a complete dataset, the user can use the [`filter_complete_data()`](fpca_load_tools/time_series.py) method\nfrom the [`ElectricityLoadTimeSeries`](fpca_load_tools/time_series.py) class. This method runs four methods in\nsequence:\n\n1. [`filter_non_null_entries()`](fpca_load_tools/time_series.py): Delete all rows with at least one `None` value.\n2. [`filter_complete_years()`](fpca_load_tools/time_series.py): Remove incomplete years. The default tolerance is set to\n   11/12.\n3. [`filter_complete_months()`](fpca_load_tools/time_series.py): Remove incomplete months. The default tolerance is set to 95%\n   of the month’s calendar days.\n4. [`filter_complete_days()`](fpca_load_tools/time_series.py): Remove incomplete days. 
The default tolerance is set to 100% of the\n   mode of the time series grouped by date.\n\n**Note 1**: As the first step, entries with null data are dropped by the [`filter_non_null_entries`](fpca_load_tools/time_series.py) method. This is\nessential because the subsequent methods only evaluate the DateTimeIndex values, regardless of the columns' actual values.\n\n**Note 2**: When filtering days with a tolerance level less than 100% or converting timestamps from UTC to local time, the\nresulting time series may include missing values. To address this, the user can use\nthe [`resample_days()`](fpca_load_tools/time_series.py) method. This method resamples the time series daily with a\nuser-defined frequency (defaulting to one hour). Missing values are linearly interpolated between their nearest\nneighbors, and any remaining None values at the beginning or end of an interpolated period are filled with the nearest\nneighbor value.\n\n## FPCA\n\nThe standard PCA in scikit-learn expects a 2D data matrix where each row represents a sample and each column represents\na feature. 
In the context of FPCA, the \"features\" are values of the functions at discretized points.\n\n### Applying FPCA\n\nThe class [`ElectricityLoadFPCA`](fpca_load_tools/fpca.py) offers three methods to apply three different types of FPCAs:\n\n- [`apply_fpca_to_all_days_grouped_by_date()`](fpca_load_tools/fpca.py): Applies FPCA to daily curves grouped by date.\n- [`apply_fpca_to_all_days_grouped_by_weekday()`](fpca_load_tools/fpca.py): Applies FPCA to daily curves grouped by day of\n  the week.\n- [`apply_fpca_to_all_days_grouped_by_month()`](fpca_load_tools/fpca.py): Applies FPCA to daily curves grouped by month of\n  the year.\n\nThe results from each FPCA are stored in an instance of the [`ElectricityLoadFPCAResults`](fpca_load_tools/fpca.py)\nclass, which is the '**results**' attribute of [`ElectricityLoadFPCA`](fpca_load_tools/fpca.py). Note that only one result\nper FPCA type can be stored at a time: performing an analysis again will overwrite any previous results. For example,\nrunning [`apply_fpca_to_all_days_grouped_by_date()`](fpca_load_tools/fpca.py) a second time will replace the results from the\nfirst analysis.\n\n### Saving and loading FPCA results\n\nFPCA results can be saved to and loaded from a pickle file on disk using the following methods:\n\n- [`save_fpca_results()`](fpca_load_tools/fpca.py)\n- [`load_fpca_results()`](fpca_load_tools/fpca.py)\n\n### Displaying FPCA results\n\nThe [`ElectricityLoadFPCAResults`](fpca_load_tools/fpca.py) class provides several plotting methods for visualizing\nFPCA results, similar to the visualizations reported in **D. Beretta et al.**, *Sustainable Energy, Grids and Networks,\nVolume 21, March 2020, 100308*. 
These methods include:\n\n- [`plot_functional_boxplot()`](fpca_load_tools/fpca.py): Plots a functional boxplot that overlays all daily load curves with median and\n  interquartile bands.\n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/iqr.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 1:\u003c/b\u003e Functional boxplot for a representative dataset.\u003c/p\u003e\n\u003c/div\u003e\n\n- [`plot_fpc()`](fpca_load_tools/fpca.py): Plots the Functional Principal Components (FPCs), rescaled according to their explained variance ratio.\n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/fpc.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 2:\u003c/b\u003e FPCs of a representative dataset rescaled by their explained variance ratio.\u003c/p\u003e\n\u003c/div\u003e\n\n- [`plot_cdf_of_explained_variability()`](fpca_load_tools/fpca.py): Plots the Cumulative Distribution Function (CDF) of the explained\n  variability percentage as a function of the number of FPCs. \n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/cdf.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 3:\u003c/b\u003e CDF of a representative dataset.\u003c/p\u003e\n\u003c/div\u003e\n\n- [`plot_scores_vs_day_of_the_week()`](fpca_load_tools/fpca.py): Plots a boxplot of FPC scores versus the day of the week for the first n FPCs. \n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/boxplot_weekday.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 4:\u003c/b\u003e Scores boxplot of a representative dataset vs day of the week.\u003c/p\u003e\n\u003c/div\u003e\n\n- [`plot_scores_vs_month_of_the_year()`](fpca_load_tools/fpca.py): Plots a boxplot of FPC scores versus the month of the year for the first n FPCs. 
\n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/boxplot_month.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 5:\u003c/b\u003e Scores boxplot of a representative dataset vs month of the year.\u003c/p\u003e\n\u003c/div\u003e\n\n\n\n**Note**: All of the above methods collect the data to plot from the attributes of the [`ElectricityLoadFPCA`](fpca_load_tools/fpca.py) class. \n\n## Functional Regression\n\nFPCA can be integrated into any time-series predictive model to predict daily electricity load curves. Unlike\ntraditional time-series models that predict actual data, FPCA-based models predict the scores of a selected number of\nFunctional Principal Components (FPCs). This approach balances model complexity and explained variability. For more\ndetails on this methodology, refer to [**D. Beretta et al.**, *Sustainable Energy, Grids and Networks, Volume 21, March\n2020, 100308*](https://www.sciencedirect.com/science/article/abs/pii/S2352467719304461).\n\n### The model\n\nThe functional decomposition allows the electricity load curve of a given day to be written in the form:\n\n$$f^{(i)}(t) = \\sum_k c^{(i)}_k \\phi_k(t)$$\n\nwhere $f^{(i)}(t)$ is the electricity load curve of the i-th day, $c^{(i)}_k$ is the score of the k-th FPC for the i-th day,\nand $\\phi_k(t)$ is the k-th FPC of the time series grouped by date. The $c^{(i)}_k$ can be estimated with the linear model:\n\n$$ \nc_k^{(i)} = w_k^{(i)} + w_{k,1}^{(i)} x_1^{(i)} + w_{k,2}^{(i)} x_2^{(i)} + \\dots + w_{k,m}^{(i)} x_m^{(i)} \n$$\n\nwhere $c_k^{(i)}$ is the score of the k-th FPC for the i-th day, $w_k^{(i)}$ is the intercept, $x_l^{(i)}$ is the l-th feature for the i-th day, and\n$w_{k,l}^{(i)}$ is the l-th feature weight for the k-th FPC of the i-th day.\n\n**Note**: Since the model predicts the FPC scores, and since the FPCs are daily time series, the features must be\naveraged over the day, e.g. 
the average temperature of the day.\n\n### Prediction\n\nThe class [`ElectricityLoadRegression`](fpca_load_tools/prediction.py) handles the prediction process. It can be instantiated with or \nwithout passing an instance of [`ElectricityLoadFPCA`](fpca_load_tools/fpca.py).\n\nThe [`ElectricityLoadRegression`](fpca_load_tools/prediction.py) class provides a method for training a linear model and a method for\npredicting the electricity load curves. Specifically:\n\n- [`train_linear_model()`](fpca_load_tools/prediction.py): Trains the model described in section [**The model**](#the-model) \niteratively on the first n FPCs using [scikit-learn LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html). This results in n weight matrices, \none for each FPC, which are stored as part of the respective objects in the class '**model**' attribute. \n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/predicted_scores.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 6:\u003c/b\u003e Actual vs predicted scores of FPC1 for a representative dataset.\u003c/p\u003e\n\u003c/div\u003e\n\n- [`predict_daily_electricity_load_curve()`](fpca_load_tools/prediction.py): Predicts the electricity load curve for a specified future date,\nand returns a list of prediction metrics, including the percentage power error.\n\n\u003cdiv id=\"fig_params\" align=\"center\"\u003e \n    \u003cimg src=\"images/predicted_load.png\" alt=\"\" width=\"500\"\u003e\n    \u003cp\u003e\u003cb\u003eFigure 7:\u003c/b\u003e Actual vs predicted electricity load curve.\u003c/p\u003e\n\u003c/div\u003e\n\n### Loading and Saving\n\nThe ElectricityLoadRegression class includes methods for saving and loading the model parameters and the feature scaler:\n\n- [`save_model()`](fpca_load_tools/prediction.py): Saves the model and feature scaler to a pickle file.\n- [`load_model()`](fpca_load_tools/prediction.py): Loads a 
previously saved model and feature scaler from a pickle file.\n\n## Tutorial\n\nPlease follow the [tutorial](tutorials/fpca_and_prediction_for_entso_e_it_dataset.py) to learn how to use `fpca-load-tools` in practice.\n\n## Contributing\n\nPlease read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull\nrequests.\n\n## Credits\n\nThis app has been developed by **D. Beretta**, building on the work by **D. Beretta et al.**, *Sustainable Energy, \nGrids and Networks, Volume 21, March 2020, 100308*. Please refer to [CREDITS.md](CREDITS.md) and [CITATION.cff](CITATION.cff) \nfor more details.\n\n## License\n\nThis project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fberrieslab%2Ffpca-load-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fberrieslab%2Ffpca-load-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fberrieslab%2Ffpca-load-tools/lists"}