https://github.com/pasanbhanu/time-series-forcasting-benchmark-dataset-preprocessing
Benchmark Datasets for Time Series Forecasting Preprocessing - NASA HTTP Dataset, WorldCup98 Dataset
benchmark-datasets datasets machine-learning
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/pasanbhanu/time-series-forcasting-benchmark-dataset-preprocessing
- Owner: PasanBhanu
- License: mit
- Created: 2024-06-23T07:37:56.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-02-19T13:32:41.000Z (3 months ago)
- Last Synced: 2025-03-24T08:18:02.600Z (about 2 months ago)
- Topics: benchmark-datasets, datasets, machine-learning
- Language: Jupyter Notebook
- Homepage: https://ita.ee.lbl.gov/html/traces.html
- Size: 2.95 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Preprocessor for Time Series Forecasting
This is a data preprocessing algorithm for widely used data sets provided by ["The Internet Traffic Archive"](https://ita.ee.lbl.gov).
The supported datasets are:
- WorldCup98 Dataset - [View](https://ita.ee.lbl.gov/html/contrib/WorldCup.html): 1,352,804,107 web requests recorded at servers for the 1998 World Cup.
- NASA HTTP Logs Dataset - [View](https://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html): 3,461,612 HTTP requests logged by a busy WWW server over two months.

The algorithm processes both datasets and creates a CSV file for time series analysis. The CSV file format is given below.

| minute | count |
|--------|-------|
|1995-07-01 00:00:00| 42 |
|1995-07-01 00:01:00| 61 |
|1995-07-01 00:02:00| 57 |

### Features of Algorithm
- Automatic FTP download of the WorldCup98 dataset
- Cross-check of the WorldCup98 record count against the original files
- Visualization of the processed data
- Time-series-ready CSV output
- Shrink the dataset size for easier processing

### Preprocessed Files
If you only need the preprocessed files, see the `processeddata` folder for the CSV files.
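
As a minimal sketch of how such a CSV could be consumed, the snippet below reads `minute`/`count` rows with the Python standard library and sums them into hourly buckets. It is not taken from the notebooks in this repository. It assumes the files are comma-separated with a `minute,count` header row and timestamps formatted as in the table above. The function names `load_minute_counts` and `to_hourly` are hypothetical, and the inline sample stands in for a real file.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Stand-in for a file from `processeddata`; rows copied from the format table.
# Assumption: comma-separated with a `minute,count` header line.
SAMPLE_CSV = """minute,count
1995-07-01 00:00:00,42
1995-07-01 00:01:00,61
1995-07-01 00:02:00,57
"""


def load_minute_counts(handle):
    """Read (minute, count) rows into a list of (datetime, int) tuples."""
    reader = csv.DictReader(handle)
    return [
        (datetime.strptime(row["minute"], "%Y-%m-%d %H:%M:%S"), int(row["count"]))
        for row in reader
    ]


def to_hourly(series):
    """Sum per-minute counts into per-hour buckets, sorted by time."""
    hourly = Counter()
    for timestamp, count in series:
        hourly[timestamp.replace(minute=0, second=0)] += count
    return sorted(hourly.items())


series = load_minute_counts(io.StringIO(SAMPLE_CSV))
hourly = to_hourly(series)
```

To read an actual file, pass an open file object instead of the `io.StringIO` wrapper, for example `open(path, newline="")` with a CSV path from the `processeddata` folder.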