{"id":18777387,"url":"https://github.com/cloudera/cml_amp_structural_time_series","last_synced_at":"2025-07-29T17:16:54.825Z","repository":{"id":37654203,"uuid":"329438858","full_name":"cloudera/CML_AMP_Structural_Time_Series","owner":"cloudera","description":"Applying a structural time series approach to California hourly electricity demand data.","archived":false,"fork":false,"pushed_at":"2024-12-05T17:38:28.000Z","size":833,"stargazers_count":9,"open_issues_count":1,"forks_count":12,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-06-09T14:57:04.387Z","etag":null,"topics":["demand-forecasting","prophet","time-series"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudera.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-01-13T21:44:33.000Z","updated_at":"2025-03-14T02:02:40.000Z","dependencies_parsed_at":"2024-11-07T20:11:08.459Z","dependency_job_id":"bbb6f04c-4f04-4fa5-8601-1e1e93993bbe","html_url":"https://github.com/cloudera/CML_AMP_Structural_Time_Series","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/cloudera/CML_AMP_Structural_Time_Series","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2FCML_AMP_Structural_Time_Series","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2FCML_AMP_Structural_Time_Series/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2FCML_AMP_Structural_Time_Series/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2FCML_AMP_Structural_Time_Series/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudera","download_url":"https://codeload.github.com/cloudera/CML_AMP_Structural_Time_Series/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudera%2FCML_AMP_Structural_Time_Series/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260116645,"owners_count":22961064,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["demand-forecasting","prophet","time-series"],"created_at":"2024-11-07T20:10:24.915Z","updated_at":"2025-06-16T07:07:13.261Z","avatar_url":"https://github.com/cloudera.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Structural Time Series\n\nThis repo accompanies the Cloudera Fast Forward report [Structural Time Series](https://structural-time-series.fastforwardlabs.com/). It provides an example application of generalized additive models (via the [Prophet](https://facebook.github.io/prophet/) library) to California hourly electricity demand data.\n\nThe primary output of this repository is a small application that exposes a probabilistic forecast and an interface for asking probabilistic questions of it. The final app looks like this:\n\n![Forecasting app interface](img/app.png)\n\nInstructions are given both for general use (on a laptop, say) and for Cloudera CML and CDSW. 
We'll first describe what's here, then walk through how to run everything.\n\n## Structure\n\nThe folder structure of the repo is as follows:\n\n```\n.\n├── apps      # Two small Streamlit applications.\n├── cml       # This folder contains scripts that facilitate the project launch on CML.\n├── data      # This folder contains starter data, and is where forecasts will live.\n├── scripts   # This is where all the code that does something lives.\n└── sts       # A small library of useful functions.\n```\n\nThere is also an `img` folder containing the images for this README; it is unimportant and can be ignored. Let's examine each of the important folders in turn.\n\n### `sts`\n\nThis is a small Python library of utility functions for our problem. Its structure is as follows:\n\n```\nsts\n├── data\n│   └── loader.py\n└── models\n    ├── baselines.py\n    └── prophet.py\n```\n\nBuilding a small library of problem-specific abstractions allows us to reuse them in multiple places. For example, the code in `data/loader.py` is reused in most of the scripts and applications. We have also hard-coded model details (such as the number of Fourier terms to include in a given Prophet model) into the library. It would be trivial to pass these through as arguments instead, for example to perform an extensive hyperparameter search.\n\n### `scripts`\n\nThese imperative scripts are where the _work_ of the analysis is done. Side-effectful actions such as I/O and model training occur here.\n\n```\nscripts\n├── fit_baseline_model.py\n├── fit_simple_prophet_model.py\n├── fit_complex_prophet_model.py\n├── fit_complex_log_prophet_model.py\n├── get_csv.py\n├── make_forecast.py\n└── validation_metrics.py\n```\n\n### `apps`\n\nTwo applications accompany this project. Each has a launcher script that helps launch it as an [Application](https://docs.cloudera.com/machine-learning/cloud/applications/topics/ml-applications.html) in CDSW/CML. 
To launch the applications in another environment, run the code inside the launcher files with the leading `!` removed. You may need to specify different ports.\n\n```\napps\n├── diagnostics.py          # A model comparison and debugging assistant.\n├── forecast.py             # The primary forecasting interface.\n├── launch_diagnostics.py   # Launcher script for CDSW/CML\n└── launch_forecast.py      # Launcher script for CDSW/CML\n```\n\n#### Diagnostics\n\nThe diagnostics application serves two purposes. First, it computes and reports top-level metrics for any forecasts saved in the `data/forecasts` directory.\n\n![Diagnostic app showing model metrics](img/diagnostic-metrics.png)\n\nSecond, it provides a few diagnostic charts, including a zoomable forecast.\n\n![Diagnostic app showing chart of forecast](img/diagnostic-chart.png)\n\n#### Forecast\n\nThe primary forecast application (pictured at the top of this README) is a prototype user interface for the forecast this analysis generates.\n\n### `cml`\n\nThese scripts serve as launch instructions that automate project setup on CML. Each script is triggered by the declarative pipeline defined in the `.project-metadata.yaml` file in the project's root directory.\n\n```\ncml\n├── install_dependencies.py\n└── fit_models_parallel.py\n```\n\n## Running through the analysis\n\nThere are three ways to launch this project on CML:\n\n1. **From Prototype Catalog** - Navigate to the Prototype Catalog in a CML workspace, select the \"Structural Time Series\" tile, click \"Launch as Project\", and then click \"Configure Project\".\n2. **As ML Prototype** - In a CML workspace, click \"New Project\", add a Project Name, select \"ML Prototype\" as the Initial Setup option, copy in the [repo URL](https://github.com/cloudera/CML_AMP_Structural_Time_Series.git), click \"Create Project\", and then click \"Configure Project\".\n3. 
**Manual Setup** - In a CML workspace, click \"New Project\", add a Project Name, select \"Git\" as the Initial Setup option, copy in the [repo URL](https://github.com/cloudera/CML_AMP_Structural_Time_Series.git), and click \"Create Project\". Launch a Python 3 Workbench Session with at least 4 GB of memory and 2 vCPUs, then follow the instructions below, in order.\n\n### Installation\n\nThe code and applications were developed against Python 3.7 and are likely to work with more recent versions of Python as well.\nIn CML or CDSW, start a Python 3 session (with at least 2 vCPUs and 4 GiB of memory) and run:\n\n```python\n!pip3 install -r requirements.txt     # notice `pip3`, not `pip`\n!pip3 install prophet==1.1.5\n```\n\nNext, install the `sts` module from this repository by running\n\n```python\n!pip3 install -e .\n```\n\nfrom the root directory of this repo.\nIf running from the session terminal instead of the Python REPL, omit the leading `!`.\n\n### Data\n\nWe use historical California electricity demand data from the [US Energy Information Administration](https://www.eia.gov/opendata/qb.php?category=3389936\u0026sdid=EBA.CAL-ALL.D.H).\n\nA full set of data through October 12, 2020 is included as starter data. More recent data can be fetched from the [EIA open data API](https://www.eia.gov/opendata/). Doing so requires an API key, which must be set as the `EIA_API_KEY` environment variable for this project. To fetch new data, call the `load_california_electricity_demand` function from the `sts.data.loader` module; the code is set up to work directly with the JSON response from the EIA API. By default, each time new data is fetched, it overwrites the existing data. Similarly, each new forecast overwrites the existing forecast. 
It would not be hard to adapt the code to maintain a history of fetched data or forecasts if desired.\n\n### Scripts\n\nTo fit models and generate forecasts, run each script in the `scripts` directory in turn, from the repository root:\n\n```bash\npython3 scripts/fit_baseline_model.py\npython3 scripts/fit_simple_prophet_model.py\npython3 scripts/fit_complex_prophet_model.py\npython3 scripts/fit_complex_log_prophet_model.py\n```\n\nThis fits a series of models of increasing complexity and writes their outputs (the mean forecast) to the `data/forecasts` directory. Launching the diagnostics app shows the metrics and diagnostic charts for each model.\n\nThe most complex model performs best. We can view its metrics when trained on the validation data (through 2019) by running the `scripts/validation_metrics.py` script. We can then generate 1000 samples from the model trained on all available training data with the `scripts/make_forecast.py` script. Once those samples are written to disk, we can use the forecast app to investigate them.\n\nThe additional script, `get_csv.py`, simply fetches the data and writes it to a CSV file, which is convenient for ad hoc analytics and interactive exploration.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudera%2Fcml_amp_structural_time_series","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudera%2Fcml_amp_structural_time_series","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudera%2Fcml_amp_structural_time_series/lists"}