https://github.com/ryanga09/digitalent_fundamentaldatascience-selfpractice

A repository of hands-on projects from DigiTalent's Fundamental Data Science training, covering web scraping, data exploration, data cleaning, and data annotation. Includes Jupyter notebooks and example code for practical learning.

data data-analysis data-science data-visualization dataset digitalent komdigi notebook-jupyter notebooks


# 📊 DigiTalent Fundamental Data Science - Self Practice

## 📅 Created On

June 2025

## 📜 Description

This repository contains hands-on exercises and learning materials from DigiTalent's _Fundamental Data Science_ training. It covers four topics:

- 🌐 Data Scraping
Learn how to acquire data from various web sources using automated tools.
Subtopics:

- What is Data?
- Data Collection Methods
- Data Scraping Tools
- Data Integrity & Ethics
- Hands-on Practice through the included self-practice exercises

- πŸ“ˆ Data Exploration
Analyze and understand the structure and patterns in your data.
Subtopics:

- Data Understanding
- Reviewing Dataset Structure
- Data Validation Techniques
- Hands-on Practice through the included self-practice exercises

- 🧹 Data Cleansing
Clean and refine your dataset to ensure quality and reliability.
Subtopics:

- Data Cleaning Concepts
- Handling Missing & Duplicate Values
- Data Reduction Strategies
- Hands-on Practice through the included self-practice exercises

- 🏷️ Data Annotation
Prepare labeled datasets for use in supervised machine learning tasks.
Subtopics:
- Defining Labels & Categories
- Data Annotation Techniques
- Manual & Assisted Labeling Tools
- Hands-on Practice through the included self-practice exercises
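
To give a feel for the exploration and cleansing topics listed above, here is a minimal sketch. It is not taken from the notebooks: the sample data, the column names (`id`, `name`, `balance`), and the helper functions are all made up for illustration. It uses only the Python standard library so it runs without installing anything; the notebooks themselves may rely on other libraries.

```python
import csv
import io

# Tiny made-up dataset (hypothetical columns): it contains duplicate rows
# and rows with an empty "balance" value.
SAMPLE_CSV = """id,name,balance
1,Ani,1500
2,Budi,
2,Budi,
3,Citra,2300
3,Citra,2300
"""


def load_rows(text):
    """Parse CSV text into a list of dicts (one dict per data row)."""
    return list(csv.DictReader(io.StringIO(text)))


def is_empty(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or value.strip() == ""


def count_missing(rows):
    """Exploration/validation: count missing values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if is_empty(value):
                counts[column] += 1
    return counts


def drop_duplicates(rows):
    """Cleansing: keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_missing(rows):
    """Cleansing: remove rows that contain any missing value."""
    return [row for row in rows if not any(is_empty(v) for v in row.values())]


rows = load_rows(SAMPLE_CSV)
print("rows loaded:", len(rows))
print("missing per column:", count_missing(rows))

cleaned = drop_missing(drop_duplicates(rows))
print("rows after cleaning:", len(cleaned))
```

Dropping rows is only one way to handle missing values; filling them with a default or a computed value is another, and which one fits depends on the dataset.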

## 🗂️ Repository Structure

```bash
DigiTalentPractice-FundamentalDataScience/
├── data/                      # Contains raw/external datasets
│   ├── Data_Nasabah.csv       # Local dataset
│   └── train_prices.csv       # Kaggle dataset (not included in repo)
│
├── notebooks/                 # Jupyter notebooks
│   ├── self_practice-1.ipynb
│   ├── self_practice-2.ipynb
│   ├── self_practice-3.ipynb
│   └── self_practice-4.ipynb
│
├── requirements.txt           # Python dependencies
├── README.md                  # Project overview and setup instructions
└── .gitignore                 # Files/folders to exclude from version control
```

**⚠️ Note: data/train_prices.csv is downloaded via the Kaggle API and is not included in this repository. Make sure to download it manually before running related notebooks.**

## 🚀 How to Use

1. 📥 Clone this repository to your local machine:

```bash
git clone https://github.com/RyanGA09/DigiTalentPractice-FundamentalDataScience.git
```

2. 📦 Install the dependencies (using a venv or conda environment is recommended):

```bash
pip install -r requirements.txt
```

3. 📘 Open the notebook corresponding to the topic you want to learn and run the code cells sequentially.

## 👨‍💻 Author

Ryan Gading Abdullah

[![GitHub](https://img.shields.io/badge/GitHub-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/RyanGA09)
[![GitLab](https://img.shields.io/badge/GitLab-FC6D26?style=for-the-badge&logo=gitlab&logoColor=white)](https://gitlab.com/RyanGA09)
[![Instagram](https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white)](https://instagram.com/ryan_g._a)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/ryan-gading-abdullah/)