{"id":28478794,"url":"https://github.com/devag2004/electricity-analysis-using-spark","last_synced_at":"2026-05-01T03:34:37.748Z","repository":{"id":295073858,"uuid":"989038251","full_name":"devag2004/Electricity-analysis-using-spark","owner":"devag2004","description":"electricity analysis project made using spark","archived":false,"fork":false,"pushed_at":"2025-05-23T13:27:21.000Z","size":2251,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-07T18:01:46.976Z","etag":null,"topics":["data-analysis","spark","spark-mllib"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devag2004.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-23T13:11:29.000Z","updated_at":"2025-05-23T13:29:37.000Z","dependencies_parsed_at":"2025-05-23T15:06:38.791Z","dependency_job_id":null,"html_url":"https://github.com/devag2004/Electricity-analysis-using-spark","commit_stats":null,"previous_names":["devag2004/electricity-analysis-using-spark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/devag2004/Electricity-analysis-using-spark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devag2004%2FElectricity-analysis-using-spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devag2004%2FElectricity-analysis-using-spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devag2004%2FElectricity-analysis-using-spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devag2004%2FElectricity-analysis-using-spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devag2004","download_url":"https://codeload.github.com/devag2004/Electricity-analysis-using-spark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devag2004%2FElectricity-analysis-using-spark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263282478,"owners_count":23442181,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","spark","spark-mllib"],"created_at":"2025-06-07T18:00:38.716Z","updated_at":"2026-05-01T03:34:37.631Z","avatar_url":"https://github.com/devag2004.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Electricity-analysis-using-spark\nThe project aims to provide insights into how various appliances contribute to electricity bills,\nhow consumption varies across cities and seasons, and how we can predict future consumption.\nThese insights can help consumers make informed decisions about their energy usage, assist\nutility companies in load balancing, and support policymakers in designing effective energy\nconservation initiatives.\nThe dataset used in this project contains information about:\n• Usage hours of various appliances (fans, air conditioners, refrigerators, TVs, monitors,\nand motor pumps)\n• City and regional information\n• Seasonal variations (month)\n• Electricity tariff rates\n• Monthly usage hours\n• Monthly electricity bills\nBy applying advanced analytics to this data, the project demonstrates how big data technologies\ncan transform raw information into actionable knowledge in the energy sector.\nLibraries and Technologies Used\nThe implementation relies on a robust stack of technologies and libraries:\n1. Apache Spark: The core framework that enables distributed data processing and\nanalysis. Spark's ability to process large datasets in-memory makes it ideal for this\nproject.\n2. PySpark: Python API for Spark that provides access to Spark's functionality while\nallowing integration with Python's rich ecosystem of data science libraries.\n\n3. Spark ML (MLlib): Spark's machine learning library used for building predictive\nmodels (RandomForestRegressor) and clustering algorithms (KMeans).\n4. Data Visualization Libraries:\no Matplotlib: For creating static visualizations like bar charts and line plots\no Seaborn: For generating statistical graphics like heatmaps\no Plotly: For interactive 3D visualizations of clustering results\n5. Scientific Computing Libraries:\no NumPy: For numerical computations and array operations\no Pandas: For data manipulation and analysis\n6. Google Colab Integration: The code includes components for file uploads and display\nin a Google Colab environment.\n\nWorkflow\nThe application follows a logical workflow designed to progress from basic to advanced\nanalysis:\nData Acquisition and Processing\n1. Data Loading: The user uploads a CSV file containing electricity consumption data.\n2. Initial Processing: The system loads the data into a Spark DataFrame, which\ndistributes the processing across available computing resources.\n3. Data Validation: Basic validation checks ensure that required columns are present.\nExploratory Data Analysis\n1. Dataset Overview: The system provides basic statistics and allows the user to select\nspecific charts for exploration.\n2. Consumption by City: Analyzes and visualizes how electricity consumption varies\nacross different cities, with detailed textual analysis explaining the patterns.\n3. Consumption by Month: Examines seasonal variations in electricity usage,\nidentifying peak months and trends.\n4. Appliance Usage Analysis: Investigates how different appliances contribute to overall\nelectricity consumption.\nAdvanced Analytics\n1. Correlation Analysis: A heatmap visualizes the relationships between different\nvariables, helping identify which factors are most strongly related to electricity bills.\n2. Appliance Impact Charts: Dedicated analysis of how each appliance's usage affects\nthe final electricity bill.\n3. Predictive Modeling: Building a Random Forest regression model to predict electricity\nbills based on appliance usage and other factors.\n4. Consumption Pattern Discovery: K-means clustering identifies distinct consumption\npatterns among households.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevag2004%2Felectricity-analysis-using-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevag2004%2Felectricity-analysis-using-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevag2004%2Felectricity-analysis-using-spark/lists"}