Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/drkenreid/global-temperature-analysis
A work-in-progress data analysis pipeline that leverages R, SQL Server, and Tableau to download, process, clean, and analyze global temperature data, aiming to uncover long-term climate trends and patterns.
- Host: GitHub
- URL: https://github.com/drkenreid/global-temperature-analysis
- Owner: DrKenReid
- Created: 2024-08-24T23:50:22.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-22T03:52:58.000Z (4 months ago)
- Last Synced: 2024-11-21T17:06:48.106Z (2 months ago)
- Topics: climate, climate-change, climate-data, climate-model, climate-science, data-science, msql, msql-database, r, sql, tsql
- Language: R
- Homepage:
- Size: 62.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🌡️ Global Temperature Analysis Project 🌍
A comprehensive data analysis pipeline for NOAA global temperature data, using R and SQL Server.
## 📋 Data Source
Uses the NOAAGlobalTemp dataset, which includes:
- Global Historical Climate Network-Monthly (GHCNm) for land
- Extended Reconstructed Sea Surface Temperature (ERSST) for sea
- ICOADS and IABP for Arctic Ocean
## 📊 Dataset Details
1. **Time Series Data**: `combined_time_series.csv` (1850-present, anomalies vs. 1901-2000 average)
2. **Gridded Data**: `gridded_data.csv` (5° x 5° grid, anomalies vs. 1991-2020 base)
## 👥 Who Is This For?
- Climate researchers
- Data scientists working with environmental data
- Anyone interested in global temperature patterns
## ✨ Features
- Automated data download and processing of raw NOAA temperature data
- Robust SQL database for data storage and querying
- Comprehensive data cleaning and analysis
- Calculates statistics and identifies temperature trends
- Exports results for further use
- Enhanced error handling and detailed logging
- Improved data consistency checks
- Modular SQL script execution
- Automated database setup and table creation
- Progress tracking for data conversion and processing
## 🛠️ Requirements
- **R** (3.6.0+)
- **SQL Server** (2019+)
- **R packages**:
```
curl, DBI, dplyr, httr, ncdf4, odbc, readr, xml2, progress, lubridate, tidyverse
```
The script will automatically install and load these packages if they are not already available in your R environment.
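
The automatic install-and-load step can be sketched roughly as follows. This is a minimal illustration, not the code in `runner.R` or `utils.R`: the function names `find_missing_packages` and `ensure_packages` and the CRAN mirror URL are assumptions made for the example.

```r
# Packages listed in the requirements above.
required_packages <- c(
  "curl", "DBI", "dplyr", "httr", "ncdf4", "odbc",
  "readr", "xml2", "progress", "lubridate", "tidyverse"
)

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file() returns "" for a package that cannot be found.
find_missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing, then attach every package.
# The default mirror is an assumption; any CRAN mirror works.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- find_missing_packages(pkgs)
  if (length(to_install) > 0) {
    message("Installing: ", paste(to_install, collapse = ", "))
    install.packages(to_install, repos = repos)
  }
  for (p in pkgs) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
  invisible(to_install)
}

# ensure_packages(required_packages)  # uncomment to run; needs network access
```

Calling `find_missing_packages(required_packages)` on its own shows what would be installed without changing anything.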
## 🚀 How to Use
1. **Ensure** you have R and SQL Server installed on your system.
2. **Clone** the repository.
3. **Open** R or RStudio and set the working directory to the `R/` folder.
4. **Run** `runner.R`.
5. The script will automatically install any missing packages.
6. **Check** `data/processed/` for results and the SQL database for exploration data.

The script handles package installation, database setup, data download, and processing automatically.
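
After a run, the database can also be inspected from R. The sketch below is one possible way to do that and is not part of the repository. The database name `GlobalTemperatureAnalysis`, the ODBC driver name, the `localhost` server, and Windows authentication are all assumptions; adjust them to match your setup. Only the table names `TimeSeries` and `GriddedData` come from this README.

```r
# Hypothetical helper: build an ODBC connection string for SQL Server.
# All defaults are placeholders, not values taken from the project.
build_connection_string <- function(server = "localhost",
                                    database = "GlobalTemperatureAnalysis",
                                    driver = "ODBC Driver 17 for SQL Server") {
  paste0(
    "Driver={", driver, "};",
    "Server=", server, ";",
    "Database=", database, ";",
    "Trusted_Connection=yes;"  # assumes Windows authentication
  )
}

# Hypothetical helper: count the rows in each of the named tables.
count_rows <- function(con, tables = c("TimeSeries", "GriddedData")) {
  vapply(tables, function(tbl) {
    sql <- paste0("SELECT COUNT(*) AS n FROM ", DBI::dbQuoteIdentifier(con, tbl))
    as.numeric(DBI::dbGetQuery(con, sql)$n)
  }, numeric(1))
}

# Example session (requires DBI, odbc, and a running SQL Server instance):
# con <- DBI::dbConnect(odbc::odbc(),
#                       .connection_string = build_connection_string())
# print(count_rows(con))
# DBI::dbDisconnect(con)
```

Keeping the connection string in a separate function makes it easy to swap the server or database name without touching the query code.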
## 📁 Project Structure
Before running, your project should look like this. After running, you'll find the empty folders filled with downloaded raw temperature data, processed CSVs, and more.
```
temperature-analysis-project/
│
├── data/
│ ├── raw/
│ └── processed/
│
├── docs/
│ ├── database_schema.md
│ ├── data_dictionary.md
│ └── data_processing_pipeline.md
│
├── outputs/
│ ├── plots/
│ ├── tableau/
│ └── tables/
│
├── R/
│ ├── runner.R
│ └── utils.R
│
├── sql/
│ ├── drop_tables.sql
│ ├── explore_data.sql
│ ├── process_data.sql
│ ├── run_diagnostics.sql
│ └── setup_database.sql
│
├── .gitignore
└── README.md
```
## 🔍 Key Features
- Automated data download and conversion
- SQL database creation and management
- Enhanced error handling and logging in R scripts
- Improved SQL script execution with support for multiple statements
- Automated database and table creation
- Data consistency checks for `TimeSeries` and `GriddedData` tables
- Detailed diagnostic queries for data verification
- Progress bars for data conversion and processing tasks
## 🔜 Upcoming Features
- Advanced statistical analysis
- Machine learning integration
- Interactive visualization dashboard
- Geospatial analysis
- Correlation with other climate indicators
## 🤝 Contributions
Contributions, bug reports, and feature requests are welcome!
## 📜 Data Use and Citation
When using this data, please cite:
> NOAA National Centers for Environmental Information, Climate at a Glance: Global Time Series, published [Month] 2024, retrieved on [Date] from https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series
**Note**: Replace `[Month]` with the month of publication and `[Date]` with the date you retrieved the data.
## 📄 License
MIT License
---