{"id":29673650,"url":"https://github.com/aka-buccia/progetto-statistica","last_synced_at":"2026-05-17T17:43:03.293Z","repository":{"id":304961171,"uuid":"996679411","full_name":"aka-buccia/progetto-statistica","owner":"aka-buccia","description":"EDA, Classification, and Linear Regression on a Weather Dataset for the NS-25 project","archived":false,"fork":false,"pushed_at":"2025-07-17T09:29:20.000Z","size":13544,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-17T13:09:18.383Z","etag":null,"topics":["classification","mle","python","regression","statistics"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aka-buccia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-05T09:50:33.000Z","updated_at":"2025-07-17T09:29:23.000Z","dependencies_parsed_at":"2025-07-17T16:52:04.900Z","dependency_job_id":"e65bb96a-8765-4e5f-b869-ca8674684ecf","html_url":"https://github.com/aka-buccia/progetto-statistica","commit_stats":null,"previous_names":["aka-buccia/progetto-statistica"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/aka-buccia/progetto-statistica","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aka-buccia%2Fprogetto-statistica","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aka-buccia%2Fprogetto-statistica/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ak
a-buccia%2Fprogetto-statistica/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aka-buccia%2Fprogetto-statistica/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aka-buccia","download_url":"https://codeload.github.com/aka-buccia/progetto-statistica/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aka-buccia%2Fprogetto-statistica/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266580565,"owners_count":23951246,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","mle","python","regression","statistics"],"created_at":"2025-07-22T22:06:28.562Z","updated_at":"2026-05-17T17:42:53.274Z","avatar_url":"https://github.com/aka-buccia.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Numerical Statistics 25 - Weather Dataset analysis\n\n![Plot showing value distribution of the dataset features](./images/features_distribution.png)\n\n## General\n\n- /scripts/classification.py contains code for EDA and classification\n- /scripts/linear_regression.py contains code for linear regression\n\nPlease note that linear regression analysis requires a processed (cleaned) dataset. 
Therefore, you must run `classification.py` at least once before executing `linear_regression.py`.\n\n## Report\n\nThe /reports directory contains:\n\n- **report.pdf** and **report.ipynb**: a detailed report of the analysis\n- **project_presentation.pdf**: a general presentation of the project for exam evaluation\n\n## Change city\n\nThe dataset contains data for 18 European cities. My analysis was conducted on Oslo, but the code is designed to work with other cities as well. You only need to change the value of the variable `citta` in both scripts. The selected city must have all the weather parameters recorded; among the 18 cities, only the following meet this requirement:\n\n- Budapest\n- Dusseldorf\n- Maastricht\n- Munchen\n- Oslo\n\nFeel free to modify the `citta` variable to explore the data for any of these cities.\n\n## Dependencies\n\nInstall dependencies with:\n\n```bash\npip install -r requirements.txt\n```\n\nOptionally, you can create a virtual environment first:\n\n```bash\npython3 -m venv weather_project\nsource weather_project/bin/activate\npip install -r requirements.txt\n```\n\nTo deactivate the environment when you're done:\n\n```bash\ndeactivate\n```\n\n## Reference\n\n- Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface\n  air temperature and precipitation series for the European Climate Assessment.\n  Int. J. of Climatol., 22, 1441-1453.\n  Data and metadata available at \u003chttp://www.ecad.eu\u003e\n- Florian Huber, Dafne van Kuppevelt, Peter Steinbach, Colin Sauze, Yang Liu, Berend Weel, \"Will the sun shine? 
– An accessible dataset for teaching machine learning and deep learning\", DOI TO BE ADDED!\n  Data and metadata available at \u003chttps://github.com/florian-huber/weather_prediction_dataset\u003e\n- Dataset available on Kaggle at \u003chttps://www.kaggle.com/datasets/thedevastator/weather-prediction\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faka-buccia%2Fprogetto-statistica","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faka-buccia%2Fprogetto-statistica","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faka-buccia%2Fprogetto-statistica/lists"}