{"id":20248226,"url":"https://github.com/sintel-dev/pyteller","last_synced_at":"2026-06-05T03:32:16.702Z","repository":{"id":62583949,"uuid":"292660757","full_name":"sintel-dev/pyteller","owner":"sintel-dev","description":"Time series forecasting using MLPrimitives","archived":false,"fork":false,"pushed_at":"2021-11-16T19:50:10.000Z","size":89247,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-14T02:23:37.263Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sintel-dev.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-03T19:24:35.000Z","updated_at":"2023-03-15T22:50:12.000Z","dependencies_parsed_at":"2022-11-03T22:17:30.822Z","dependency_job_id":null,"html_url":"https://github.com/sintel-dev/pyteller","commit_stats":null,"previous_names":["signals-dev/pyteller"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintel-dev%2Fpyteller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintel-dev%2Fpyteller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintel-dev%2Fpyteller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintel-dev%2Fpyteller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sintel-dev","download_url":"https://codeload.github.com/sintel-dev/pyteller/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:/
/github.com","kind":"github","repositories_count":241696181,"owners_count":20004745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T09:47:13.465Z","updated_at":"2025-12-03T04:02:54.005Z","avatar_url":"https://github.com/sintel-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"left\"\u003e\n\u003cimg width=15% src=\"https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png\" alt=“DAI-Lab” /\u003e\n\u003ci\u003eAn open source project from Data to AI Lab at MIT.\u003c/i\u003e\n\u003c/p\u003e\n\n\u003c!-- Uncomment these lines after releasing the package to PyPI for version and downloads badges --\u003e\n\u003c!--[![PyPI Shield](https://img.shields.io/pypi/v/pyteller.svg)](https://pypi.python.org/pypi/pyteller)--\u003e\n\u003c!--[![Downloads](https://pepy.tech/badge/pyteller)](https://pepy.tech/project/pyteller)--\u003e\n[![Github Actions Shield](https://img.shields.io/github/workflow/status/signals-dev/pyteller/Run%20Tests)](https://github.com/signals-dev/pyteller/actions)\n[![Coverage Status](https://codecov.io/gh/signals-dev/pyteller/branch/master/graph/badge.svg)](https://codecov.io/gh/signals-dev/pyteller)\n\n\n\n# pyteller\n\nTime series forecasting using MLPrimitives\n\n- Documentation: https://signals-dev.github.io/pyteller\n- Homepage: https://github.com/signals-dev/pyteller\n\n# Overview\n\npyteller is a time series forecasting library built with the end user in mind.\n\n\n## Table of Contents\n\n* [I. 
Data Format](#data-format)\n   * [I.1 Input](#input)\n   * [I.2 Output](#output)\n   * [I.3 Datasets in the library](#datasets-in-the-library)\n* [II. pyteller Pipelines](#pyteller-pipelines)\n   * [II.1 Current Available Pipelines](#current-available-pipelines)\n* [III. Install](#install)\n* [IV. Quick Start](#quick-start)\n\n\n# Data Format\n\n## Input\n\nThe expected input to pyteller pipelines is a .csv file with data in one of the following formats:\n\n### Targets Table\n#### Option 1: Single Entity (Academic Form)\nThe user must specify the following:\n* `timestamp_col`: the **string** denoting which column contains the **pandas timestamp** objects or **python datetime** objects corresponding to the time at which the observation is made\n* `target_signal`: an **integer** or **float** column with the observed target values at the indicated timestamps\n\nThis is an example of such a table, where the `timestamp_col` is 'timestamp' and the `target_signal` is 'value'\n\n|  timestamp |     value |\n|------------|-----------|\n| 7/1/14 1:00 |  6210 |\n| 7/1/14 1:30 | 4656|\n| 7/1/14 2:00 | 3820 |\n|7/1/14 1:30|\t4656|\n|7/1/14 2:00|\t3820|\n|7/1/14 2:30|\t2873|\n\n#### Option 2: Multiple Entity (Flat Form)\nThe user must specify the following:\n* `timestamp_col`: the **string** denoting which column contains the **pandas timestamp** objects or **python datetime** objects corresponding to the time at which the observation is made\n* `entities`: the **list** denoting the columns the user wants to make forecasts for\n\n\nThis is an example of such a table, where the `timestamp_col` is 'timestamp' and the `entities` can be ['taxi 1','taxi 3']\n\n|  timestamp |     taxi 1 |     taxi 2 |    taxi 3 |\n|------------|-----------|-----------| -----------|\n| 7/1/14 1:00 |  6210 |  510 |  6230 |\n| 7/1/14 1:30 | 4656| 5666|656|\n| 7/1/14 2:00 | 3820 | 2420 | 3650 |\n|7/1/14 1:30|\t4656|\t4664| 380 |\n|7/1/14 2:00|\t3820|\t3520| 320 |\n|7/1/14 2:30|\t2873|\t1373| 3640 
|\n\n\n#### Option 3: Multiple Entity (Long Form)\nThe user must specify the following:\n* `timestamp_col`: the **string** denoting which column contains the **pandas timestamp** objects or **python datetime** objects corresponding to the time at which the observation is made\n* `entity_col`: the **string** denoting which column contains the entities you will separately make forecasts for\n* `target_signal`: the **string** denoting which column contains the observed target values that you want to forecast\n\n\nThis is an example of such a table, where the `timestamp_col` is 'timestamp', the `entity_col` is 'region', and the `target_signal` is 'demand'.\n\n\n\n|  timestamp | region  |   demand |   Temp |   Rain |\n|------------|------------|-----------| -----------|-----------|\n| 9/27/20 21:20 |  DAYTON|1841.6 | 65.78|\t0|\n|  9/27/20 21:20 | DEOK|2892.5 |75.92|\t0|\n| 9/27/20 21:20|  DOM|11276 | 55.29|\t0|\n|9/27/20 21:20| DPL|2113.7| 75.02|\t0.06|\n| 9/27/20 21:25 | DAYTON|1834.1 | 65.72|\t0|\n| 9/27/20 21:25 |DEOK| 2880.2 | 75.92|\t0|\n| 9/27/20 21:25| DOM| 11211.7 | 55.54|\t0|\n|9/27/20 21:25|DPL| 2086.6| 75.02|\t0.06|\n\n\n## Output\n\nThe output of the pyteller Pipelines is another table that contains the timestamp and the forecasting value(s), matching the format of the input targets table.\n\n## Datasets in the library\n\nFor development and evaluation of pipelines, we include the following datasets:\n#### NYC taxi data\n* Found on the [nyc website](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page), or the processed version maintained by Numenta [here](https://github.com/numenta/NAB/tree/master/data).\n* No modifications were made from the Numenta version\n\n#### Wind data\n* Found here on [kaggle](https://www.kaggle.com/sohier/30-years-of-european-wind-generation/metadata)\n* After downloading the FasTrak 5-Minute .txt files the .txt files for each day from 1/1/13-1/8/18 were compiled into one .csv file\n\n\n#### Weather data\n* Maintained by 
Iowa State University's [IEM](https://mesonet.agron.iastate.edu/request/download.phtml?network=ILASOS)\n* The downloaded data was from the selected network of 8A0 Albertville and the selected date range was 1/1/16 0:15 - 2/16/16 0:55\n\n#### Traffic data\n* Found on [Caltrans PeMS](http://pems.dot.ca.gov/?dnode=Clearinghouse)\n* No modifications were made from the Numenta version\n\n#### Energy data\n* Found on [kaggle](https://www.kaggle.com/robikscube/hourly-energy-consumption/metadata)\n* No modifications were made after downloading pjm_hourly_est.csv\nWe also use PJM electricity demand data found [here](https://dataminer2.pjm.com/feed/inst_load).\n\n\n\n## Current Available Pipelines\n\nThe pipelines are included as **JSON** files, which can be found\nin the subdirectories inside the [pyteller/pipelines](orion/pipelines) folder.\n\nThis is the list of pipelines available so far, which will grow over time:\n\n| name | location | description |\n|------|----------|-------------|\n| Persistence | [pyteller/pipelines/sandbox/persistence](../pipelines/sandbox/persistence) | uses the latest input to the model as the next output\n\n\n# Install\n\n## Requirements\n\n**pyteller** has been developed and tested on [Python 3.5, 3.6, 3.7 and 3.8](https://www.python.org/downloads/)\n\nAlso, although it is not strictly required, the usage of a [virtualenv](https://virtualenv.pypa.io/en/latest/)\nis highly recommended in order to avoid interfering with other software installed in the system\nin which **pyteller** is run.\n\nThese are the minimum commands needed to create a virtualenv using python3.6 for **pyteller**:\n\n```bash\npip install virtualenv\nvirtualenv -p $(which python3.6) pyteller-venv\n```\n\nAfterwards, you have to execute this command to activate the virtualenv:\n\n```bash\nsource pyteller-venv/bin/activate\n```\n\nRemember to execute it every time you start a new console to work on **pyteller**!\n\n\u003c!-- Uncomment this section after releasing the package to 
PyPI for installation instructions\n## Install from PyPI\n\nAfter creating the virtualenv and activating it, we recommend using\n[pip](https://pip.pypa.io/en/stable/) in order to install **pyteller**:\n\n```bash\npip install pyteller\n```\n\nThis will pull and install the latest stable release from [PyPI](https://pypi.org/).\n--\u003e\n\n## Install from source\n\nWith your virtualenv activated, you can clone the repository and install it from\nsource by running `make install` on the `stable` branch:\n\n```bash\ngit clone git@github.com:signals-dev/pyteller.git\ncd pyteller\ngit checkout stable\nmake install\n```\n\n## Install for Development\n\nIf you want to contribute to the project, a few more steps are required to make the project ready\nfor development.\n\nPlease head to the [Contributing Guide](https://signals-dev.github.io/pyteller/contributing.html#get-started)\nfor more details about this process.\n\n# Quick Start\n\nIn this short tutorial we will guide you through a series of steps that will help you\nget started with **pyteller**.\n\n## 1. Load the data\n\nIn the first step we will load the **Alabama Weather** data into a dataframe from the demo datasets in the `data` folder. 
This represents all of the data up-to-date that will be used to train the model.\n\n```python3\nfrom pyteller.data import load_data\ncurrent_data=load_data('../pyteller/data/AL_Weather_current.csv')\n```\nThe output is a dataframe:\n\n```\n    station     valid       tmpf        dwpf        relh        drct        sknt        p01i        alti      vsby        feel\n0     8A0    1/1/16 0:15   41.000     39.200       93.240      350.000      6.000      0.000       30.250    10.000      36.670\n1     4A6    1/1/16 0:15   41.000     39.000       70.080      360.000      5.000      0.000       30.300    10.000      37.080\n2     8A0    1/1/16 0:35   39.200     37.400       93.190      360.000      6.000      0.000       30.250    10.000      34.200\n3     4A6    1/1/16 0:35   41.000     32.000       70.080      360.000      5.000      0.000       30.290    10.000      37.080\n4     8A0    1/1/16 0:55   37.400     37.400       100.000     360.000      8.000      0.000       30.250    10.000      30.760\n```\n\n\nOnce we have the data, create an instance of the `Pyteller` class, where the input arguments are the forecast settings.\n\n```python3\nfrom pyteller.core import Pyteller\npyteller = Pyteller (\n    pipeline='persistence',\n    pred_length=3,\n    offset=5,\n)\n```\n\n## 2. Fit the data\nThe user now calls the `pyteller.fit` method to fit the data to the pipeline. The inputs are the loaded data and the column names. The user also specifies which signal or entities they want to predict for here.\n```python3\npyteller.fit(\n    data=current_data,\n    timestamp_col='valid',\n    target_signal='tmpf',\n    entity_col='station')\n```\n\n\n## 3. Save the trained model\nAt this point, the user has a trained model that can be pickled by calling the `pyteller.save` method, inputting the desired output path:\n\n```python3\npyteller.save('../fit_models/persistence')\n```\n\n## 4. 
Load the new data\nOnce the user gets new data that they want to use to make a prediction, they can load it in the same way they loaded the training data.\n```python3\ninput_data = load_data('../pyteller/data/AL_Weather_input.csv')\n```\n\n## 5. Forecast\nTo make a forecast, the user calls the `pyteller.forecast` method, which will output the forecasts for all signals and all entities.\n\n```python3\nforecast = pyteller.forecast(input_data)\n```\nThe output is a dataframe of all the predictions:\n\n```\n timestamp        8A0            4A6\n 2/4/16 18:15    42.800        44.800\n 2/4/16 18:35    42.800        42.600\n 2/4/16 18:55    44.800        43.000\n```\n\n# What's next?\n\nFor more details about **pyteller** and all its possibilities\nand features, please check the [documentation site](https://signals-dev.github.io/pyteller/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsintel-dev%2Fpyteller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsintel-dev%2Fpyteller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsintel-dev%2Fpyteller/lists"}