{"id":24800189,"url":"https://github.com/mischieff01/predict-future-sales","last_synced_at":"2025-03-25T00:44:43.153Z","repository":{"id":272804826,"uuid":"917809140","full_name":"mischieff01/Predict-Future-Sales","owner":"mischieff01","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-22T20:30:15.000Z","size":37060,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T03:17:39.699Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mischieff01.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-16T17:26:54.000Z","updated_at":"2025-01-22T19:50:48.000Z","dependencies_parsed_at":"2025-01-16T19:10:35.396Z","dependency_job_id":null,"html_url":"https://github.com/mischieff01/Predict-Future-Sales","commit_stats":null,"previous_names":["mischieff01/predict-future-sales"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischieff01%2FPredict-Future-Sales","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischieff01%2FPredict-Future-Sales/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischieff01%2FPredict-Future-Sales/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischieff01%2FPredict-Future-Sales/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m
ischieff01","download_url":"https://codeload.github.com/mischieff01/Predict-Future-Sales/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245377961,"owners_count":20605375,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-30T03:17:48.114Z","updated_at":"2025-03-25T00:44:43.116Z","avatar_url":"https://github.com/mischieff01.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sales Analysis Project\n\nThis project is focused on analyzing sales data to extract meaningful insights and trends. The project includes data preprocessing, feature extraction, and visualization techniques to understand the sales patterns over time.\n\n## Features of the Notebook\n\n1. **Library Imports**:\n   - Utilizes essential Python libraries such as `numpy`, `pandas`, `matplotlib`, and `seaborn` for data manipulation and visualization.\n   - Includes `scikit-learn` for preprocessing and additional tools for advanced analysis.\n\n2. **Data Loading**:\n   - Loads datasets, including `sales_train.csv`, and other related files.\n   - Uses `pandas` for efficient data handling.\n\n3. **Data Cleaning**:\n   - Checks for missing values with `isnull().sum()`.\n   - Processes and formats columns like `date` into datetime format for better analysis.\n\n4. **Feature Engineering**:\n   - Extracts `year` and `month` from the `date` column to allow temporal analysis.\n   - Groups data by `date_block_num` for aggregated insights.\n\n5. 
**Visualization**:\n   - Utilizes `matplotlib` to visualize monthly sales trends over different years.\n   - Implements custom plot settings for enhanced readability, such as `MONTHS`, `LINEWIDTH`, and `ALPHA` parameters.\n\n## Key Functions and Code Snippets\n\n### Extract Year and Month:\n```python\n# Convert date column to datetime\nsales['date'] = pd.to_datetime(sales['date'], format='%d.%m.%Y')\nsales['year'] = sales['date'].dt.year\nsales['month'] = sales['date'].dt.month\n```\n\n### Check for Missing Values:\n```python\n# Check for null values\nprint(sales.isnull().sum())\n```\n\n### Group by Date Block:\n```python\ndf = sales.groupby('date_block_num', as_index=False).agg({\n    'year': 'first',\n    'month': 'first',\n    'item_cnt_day': 'sum'\n})\n```\n\n### Plot Sales for a Specific Year:\n```python\nplt.figure(figsize=(10, 6))\n# The grouped frame keeps the summed counts under 'item_cnt_day'\nplt.plot(MONTHS, df[df['year'] == 2013].item_cnt_day, '-o', color='steelblue', linewidth=LINEWIDTH, alpha=ALPHA, label='2013')\nplt.show()\n```\n\n## Troubleshooting\n\n### Common Errors:\n- **`KeyError: 'year'`**:\n  - Cause: The `year` column might not exist in the DataFrame.\n  - Fix: Ensure the `year` column is created during preprocessing.\n  ```python\n  sales['year'] = sales['date'].dt.year\n  ```\n\n### Verifying Data:\n- Check the columns in the DataFrame:\n  ```python\n  print(df.columns)\n  ```\n- Ensure the `date` column is correctly converted:\n  ```python\n  sales['date'] = pd.to_datetime(sales['date'], format='%d.%m.%Y')\n  ```\n\n## Dependencies\n- Python 3.x\n- Libraries:\n  - `numpy`\n  - `pandas`\n  - `matplotlib`\n  - `seaborn`\n  - `scikit-learn`\n\n## Usage\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/your-username/sales-analysis.git\n   cd sales-analysis\n   ```\n\n2. Install the required libraries:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Open the Jupyter Notebook:\n   ```bash\n   jupyter notebook sales_analysis.ipynb\n   ```\n\n4. 
Run the cells sequentially to process and analyze the sales data.\n\n## How to Use This Repository\n1. **Understand the Structure**:\n   - The repository contains the main notebook (`sales_analysis.ipynb`), datasets (`data/` folder), and a requirements file (`requirements.txt`).\n\n2. **Prepare the Data**:\n   - Place the datasets (e.g., `sales_train.csv`) in the `data/` folder.\n\n3. **Run the Analysis**:\n   - Follow the notebook to preprocess, clean, and analyze the sales data.\n\n4. **Extend the Project**:\n   - Modify the notebook to include additional features, visualizations, or analyses as needed.\n\n## Output\n- Monthly and yearly sales trends.\n- Insights into data patterns and seasonality.\n\n## Future Work\n- Incorporate machine learning models for sales prediction.\n- Add interactive visualizations using tools like Plotly or Dash.\n- Enhance data preprocessing with automated anomaly detection.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischieff01%2Fpredict-future-sales","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmischieff01%2Fpredict-future-sales","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischieff01%2Fpredict-future-sales/lists"}