https://github.com/camilajaviera91/bagging-with-kaggle
A first approach to decision trees and bagging, written so that the model can be trained on any dataset from Kaggle (for this, the 'connect with Kaggle' project is reused).
accuracy-score bagging-classifier curses decision-tree-classifier kaggle labelencoder pandas python simpleimputer sklearn-library train-test-split
- Host: GitHub
- URL: https://github.com/camilajaviera91/bagging-with-kaggle
- Owner: CamilaJaviera91
- Created: 2024-12-04T17:37:57.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-14T02:31:50.000Z (10 months ago)
- Last Synced: 2025-04-12T21:11:59.111Z (6 months ago)
- Topics: accuracy-score, bagging-classifier, curses, decision-tree-classifier, kaggle, labelencoder, pandas, python, simpleimputer, sklearn-library, train-test-split
- Language: Python
- Homepage:
- Size: 523 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# bagging-with-kaggle
## General Machine Learning Pipeline with Bagging Classifier
This project implements a machine learning pipeline to analyze and predict survival on the Titanic dataset. It leverages a Bagging Classifier with decision trees to enhance model robustness. The project includes data preprocessing, model training, evaluation, and saving the results.
## Features
- Interactive column selection for preprocessing.
- Handles missing values and encodes categorical variables.
- Implements Bagging Classifier with decision trees.
- Exports the processed dataset to .csv and Google Sheets.

## Prerequisites
Before running the code, ensure you have the following:
- Python 3.8+
- Kaggle API credentials for downloading the dataset.
- Necessary Python libraries (see requirements.txt).
- Access to the Google Sheets API (if using the csv_to_sheets function).

## Installation
### 1. Clone this repository
```
git clone https://github.com//.git
cd
```

### 2. Install required Python libraries
```
pip install -r requirements.txt
```

### 3. Set up the Kaggle API:
- Download your kaggle.json file from [Kaggle API](https://www.kaggle.com/docs/api).
- Place it in the appropriate directory (~/.kaggle on Unix or %USERPROFILE%\.kaggle on Windows).

### 4. Configure Google Sheets API:
- Follow [Google Sheets API documentation](https://developers.google.com/sheets/api/guides/concepts) to set up credentials.
- Place the credentials in the project directory.

## Usage
### 1. Run the main script:
```
python bagging.py
```

### 2. Fetch a dataset from Kaggle:
- When prompted, enter a search term to find datasets on Kaggle (e.g., "Titanic", "Housing Prices").
- A numbered list of datasets matching your search will be displayed.

- Enter the number corresponding to the dataset you want to download.
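
The search-and-select step could look roughly like the sketch below. It assumes the official `kaggle` Python package; `search_datasets`, `format_menu`, and `pick` are hypothetical helper names used for illustration, not functions taken from bagging.py. The `kaggle` import is deferred into the function because the package may look for credentials as soon as it is imported.

```python
def search_datasets(term):
    """Return dataset references (such as 'owner/dataset-name') for a search term."""
    # Deferred import: the kaggle package may try to read kaggle.json on import.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return [dataset.ref for dataset in api.dataset_list(search=term)]


def format_menu(refs):
    """Build the numbered list shown to the user, starting at 1."""
    return "\n".join(f"{i}. {ref}" for i, ref in enumerate(refs, start=1))


def pick(refs, choice):
    """Translate the number typed by the user into a dataset reference."""
    index = int(choice) - 1
    if not 0 <= index < len(refs):
        raise ValueError(f"choice must be between 1 and {len(refs)}")
    return refs[index]
```

With valid credentials in place, `pick(search_datasets("Titanic"), "1")` would return the reference of the first match, which can then be passed to `KaggleApi.dataset_download_files(ref, path=folder, unzip=True)`.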

### 3. Specify a folder to save the dataset:
- Enter a name for a new folder where the dataset will be downloaded and unzipped.

### 4. Dataset selection:
- If the downloaded dataset contains multiple *.csv files*, the script will load the first *.csv file* by default.
- The dataset is automatically loaded into a *Pandas DataFrame.*
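
As a rough illustration of this step, the sketch below loads the first .csv file found in the download folder. The function name `load_first_csv` is hypothetical, and "first" is assumed here to mean first in sorted path order; the actual script may pick the file differently.

```python
from pathlib import Path

import pandas as pd


def load_first_csv(folder):
    """Load the first .csv file (in sorted path order) found under folder."""
    csv_files = sorted(Path(folder).rglob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"no .csv files found in {folder}")
    return pd.read_csv(csv_files[0])
```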

### 5. Follow the prompts in bagging.py:
- Interactively select columns for analysis.
- Handle missing values automatically.
- Specify the target column (dependent variable).
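
The preprocessing behind these prompts might be sketched as follows. This is an assumption-laden illustration, not the code in bagging.py: the helper name `preprocess` is hypothetical, numeric gaps are filled with the median via `SimpleImputer`, other columns are filled with their most frequent value and then label-encoded, and no selected column is assumed to be entirely empty.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder


def preprocess(df, columns):
    """Keep the selected columns, fill missing values, encode non-numeric ones."""
    out = df[columns].copy()
    numeric = list(out.select_dtypes(include="number").columns)
    other = [col for col in out.columns if col not in numeric]

    if numeric:
        # Median imputation for numeric columns (assumes none is all-missing).
        imputed = SimpleImputer(strategy="median").fit_transform(out[numeric])
        for i, col in enumerate(numeric):
            out[col] = imputed[:, i]

    for col in other:
        # Fill with the most frequent value, then map categories to integers.
        filled = out[col].fillna(out[col].mode().iloc[0])
        out[col] = LabelEncoder().fit_transform(filled.astype(str))

    return out
```

Label encoding assigns arbitrary integer codes, which tree-based models such as the ones used here generally tolerate; other model families may need one-hot encoding instead.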

### 6. Model Training and Evaluation:
- The script splits the data into training and testing sets.
- Trains a Bagging Classifier using decision trees.
- Accuracy on the test set is displayed in the console.
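
A minimal sketch of this step with scikit-learn is shown below. The function name `train_bagging`, the 80/20 split, the 50 estimators, and the fixed seed are illustrative assumptions rather than the values used in bagging.py. The base estimator is passed positionally so the call works across scikit-learn versions that name that parameter differently.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_bagging(df, target, test_size=0.2, seed=42):
    """Fit bagged decision trees on df and return the model and test accuracy."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # Each tree is trained on a bootstrap sample; predictions are combined by voting.
    model = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=seed
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy
```

Calling `model, accuracy = train_bagging(processed_df, "Survived")` on a fully numeric DataFrame would return the fitted ensemble together with the share of correctly classified test rows.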
### 7. Save Results:
- Processed data is saved as a *.csv file* in the *save/* directory.

- Optionally, upload the dataset to Google Sheets using the *google_sheets_utils.py* script.
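
The .csv export could be as simple as the sketch below. The helper name `save_processed` and the file naming are hypothetical; only the *save/* directory comes from the steps above. The Google Sheets upload is left out because it depends on the credentials and helper functions in *google_sheets_utils.py*.

```python
from pathlib import Path

import pandas as pd


def save_processed(df, name, directory="save"):
    """Write df to <directory>/<name>.csv, creating the directory if needed."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.csv"
    df.to_csv(path, index=False)
    return path
```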

### 8. Create a Looker Studio report from Google Sheets:
- Looker Studio: [Example - Titanic Survival Rate](https://lookerstudio.google.com/reporting/9296e179-5f35-42b4-92d8-d36b4dee1999/page/ZgkYE)

## Acknowledgments
- Kaggle datasets: [Kaggle Datasets](https://www.kaggle.com/datasets)
- Scikit-learn: [Scikit-learn Documentation](https://scikit-learn.org/stable/)
- Pandas: [Pandas Documentation](https://pandas.pydata.org/docs/)
- Curses: [Curses Documentation](https://docs.python.org/3/library/curses.html)