An open API service indexing awesome lists of open source software.

https://github.com/shadowfaxx1/crop-data-analysis

Crop Prediction Using Machine Learning , a side project for the internship done for UCT Noida
https://github.com/shadowfaxx1/crop-data-analysis

Last synced: 2 months ago
JSON representation

Crop Prediction Using Machine Learning , a side project for the internship done for UCT Noida

Awesome Lists containing this project

README

        

# 🔰 Crop Yield Prediction :

[![Crop Yield Prediction Data Science](https://img.shields.io/badge/Crop-Yield--Prediction--Data%20Science-blueviolet)](https://github.com/shadowfaxx1/upSkill-Campus.git)

This data science project aims to predict crop yields based on various factors related to the cost of cultivation and production for different crops in different states.

## Table of Contents

- [Introduction](#introduction)
- [Project Structure](#project-structure)
- [Data Sources](#data-sources)
- [Installation](#installation)
- [Usage](#usage)
- [Data Preprocessing](#data-preprocessing)
- [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)
- [Model Training and Evaluation](#model-training-and-evaluation)
- [Results](#results)
- [Future Improvements](#future-improvements)
- [Contributing](#contributing)
- [License](#license)

## Introduction

In this project, we use machine learning algorithms to predict crop yields based on different cost factors related to the cultivation and production of crops. The goal is to understand how the cost of cultivation impacts crop yields and make predictions for future yields.

## Project Structure

The project is structured as follows:

- `data/`: Contains the datasets used for training and evaluation.
- `notebooks/`: Jupyter notebooks for data preprocessing, exploratory data analysis, and model training.
- `models/`: Saved models and model evaluation results.
- `scripts/`: Helper scripts and functions used in the project.
- `readme.md`: This file, providing an overview of the project.

## Data Sources

The data used in this project is sourced from multiple datasets:

1. `datafile4.csv`: Contains information on the cost of cultivation for different crops and states.
2. `datafile2.csv`: Provides data on crop production and yield over time.
3. `datafile.csv`: Contains additional information on crop growth over time.
4. `datafile3.csv`: contains additional data
5. `produce.csv`: contains production data in ton/mm.

## Installation

To run this project locally, you will need to install the required libraries. You can do this using the following steps:

> 1. Clone the repository: gh repo clone shadowfaxx1/upSkill-Campus
> 2. Install the required Python libraries: pip install -r requirements.txt

## Usage

To use the project, follow these steps:

1. Navigate to the main execuetable file and run the Jupyter notebooks in the specified order and > change the path of the datasets .
2. The notebooks cover data preprocessing, exploratory data analysis, and model training.
3. Evaluate the performance of different machine learning algorithms for crop yield prediction.

## Data Preprocessing

In the data preprocessing step, we handle missing values and clean the data to prepare it for analysis and model training.

## Exploratory Data Analysis (EDA)

EDA involves visualizing and analyzing the data to gain insights into the relationships between different variables and the target variable (crop yield).

## Model Training and Evaluation

We train different machine learning algorithms, including Decision Trees, Linear Regression, XGBoost, and Random Forest, to predict crop yields.
Evaluation metrics such as Mean Squared Error (MSE) and Mean Absolute Error (MAE) are used to assess the performance of the models.

## Results

The results of the models are visualized and compared to understand the effectiveness of each algorithm in predicting crop yields.

## Future Improvements

This project can be further improved in the following ways:

- Experiment with more advanced machine learning algorithms and hyperparameter tuning to improve prediction accuracy.
- Explore additional features or external datasets that could enhance model performance.
- Deploy the best-performing model as a web application for real-time predictions.

## Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please submit a pull request or open an issue.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.