https://github.com/ryanga09/digitalent_fundamentaldatascience-selfpractice
A repository of hands-on projects from DigiTalent's Fundamental Data Science training, covering web scraping, data exploration, data cleaning, and data annotation. Includes Jupyter notebooks and example code for practical learning.
data data-analysis data-science data-visualization dataset digitalent komdigi notebook-jupyter notebooks
- Host: GitHub
- URL: https://github.com/ryanga09/digitalent_fundamentaldatascience-selfpractice
- Owner: RyanGA09
- Created: 2025-06-27T02:09:43.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-08-01T12:39:05.000Z (11 months ago)
- Last Synced: 2025-08-01T14:45:04.160Z (11 months ago)
- Topics: data, data-analysis, data-science, data-visualization, dataset, digitalent, komdigi, notebook-jupyter, notebooks
- Language: Jupyter Notebook
- Homepage:
- Size: 1.17 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# DigiTalent Fundamental Data Science - Self Practice
## Created On
June 2025
## Description
This repository contains hands-on exercises and learning materials from DigiTalent's _Fundamental Data Science_ training. The focus topics include:
- Data Scraping
  Learn how to acquire data from various web sources using automated tools.
  Subtopics:
  - What is Data?
  - Data Collection Methods
  - Data Scraping Tools
  - Data Integrity & Ethics
  - Hands-on Practice through the included self-practice exercises
- Data Exploration
  Analyze and understand the structure and patterns in your data.
  Subtopics:
  - Data Understanding
  - Reviewing Dataset Structure
  - Data Validation Techniques
  - Hands-on Practice through the included self-practice exercises
- Data Cleansing
  Clean and refine your dataset to ensure quality and reliability.
  Subtopics:
  - Data Cleaning Concepts
  - Handling Missing & Duplicate Values
  - Data Reduction Strategies
  - Hands-on Practice through the included self-practice exercises
- Data Annotation
  Prepare labeled datasets for use in supervised machine learning tasks.
  Subtopics:
  - Defining Labels & Categories
  - Data Annotation Techniques
  - Manual & Assisted Labeling Tools
  - Hands-on Practice through the included self-practice exercises
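
The exploration and cleansing topics above can be illustrated with a small sketch. It is not taken from the notebooks: the sample rows, the column names (`id`, `name`, `balance`), and the helper names `load_rows`, `count_missing`, and `drop_duplicates` are all made up for illustration. It uses only the Python standard library, so it should run without any of the project's dependencies. It reviews the dataset structure, counts missing values per column, and drops exact duplicate rows.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """id,name,balance
1,Ani,1500
2,Budi,
2,Budi,
3,,2750
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count empty or whitespace-only cells per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = load_rows(SAMPLE_CSV)
print("columns:", list(rows[0].keys()))
print("row count:", len(rows))
print("missing per column:", count_missing(rows))
print("rows after dropping duplicates:", len(drop_duplicates(rows)))
```

In a notebook, a dataframe library would typically replace these helpers. The plain-Python version is used here only to keep the logic of each step visible.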
## Repository Structure
```bash
DigiTalentPractice-FundamentalDataScience/
├── data/                     # Contains raw/external datasets
│   ├── Data_Nasabah.csv      # Local dataset
│   └── train_prices.csv      # Kaggle dataset (not included in repo)
│
├── notebooks/                # Jupyter notebooks
│   ├── self_practice-1.ipynb
│   ├── self_practice-2.ipynb
│   ├── self_practice-3.ipynb
│   └── self_practice-4.ipynb
│
├── requirements.txt          # Python dependencies
├── README.md                 # Project overview and setup instructions
└── .gitignore                # Files/folders to exclude from version control
```
**Note: `data/train_prices.csv` comes from Kaggle and is not included in this repository. Download it yourself (for example, via the Kaggle API) and place it in `data/` before running the notebooks that use it.**
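
Because `data/train_prices.csv` has to be obtained separately, a quick check before opening the notebooks can save a failed run. The sketch below is not part of the repository: the function name `find_missing_files` is hypothetical, and the file names are taken from the repository structure shown above. It assumes the script is run from the repository root.

```python
from pathlib import Path


def find_missing_files(data_dir, filenames):
    """Return the names in `filenames` that do not exist as files in `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in filenames if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # File names taken from the repository structure; run from the repository root.
    missing = find_missing_files("data", ["Data_Nasabah.csv", "train_prices.csv"])
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All data files are present.")
```

If `train_prices.csv` is reported as missing, download it from Kaggle before running the notebooks that depend on it.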
## How to Use
1. Clone this repository to your local machine:
```bash
git clone https://github.com/RyanGA09/DigiTalentPractice-FundamentalDataScience.git
```
2. Install the dependencies (using a venv or conda environment is recommended):
```bash
pip install -r requirements.txt
```
3. Open the notebook for the topic you want to study and run its code cells in order, from top to bottom.
## Author
Ryan Gading Abdullah
[GitHub](https://github.com/RyanGA09)
[GitLab](https://gitlab.com/RyanGA09)
[Instagram](https://instagram.com/ryan_g._a)
[LinkedIn](https://www.linkedin.com/in/ryan-gading-abdullah/)