https://github.com/adhishnanda/motion-based-german-learning-app
AI-powered language learning app with gesture recognition (MediaPipe + ML/DL models), real-time interaction, spaced repetition, and full React/TypeScript UI. Demonstrates ML engineering, computer vision, and frontend expertise.
- Host: GitHub
- URL: https://github.com/adhishnanda/motion-based-german-learning-app
- Owner: adhishnanda
- License: mit
- Created: 2025-11-24T17:34:05.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-11-24T20:54:12.000Z (5 months ago)
- Last Synced: 2025-11-28T08:28:34.028Z (4 months ago)
- Topics: capstone-project, computer-vision, data-science, deep-learning, gesture-recognition, interactive, interactive-learning, machine-learning, mediapipe, portfolio-project, pose-estimation, react, scikit-learn, tensorflow, typescript
- Language: Jupyter Notebook
- Homepage:
- Size: 5.24 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: docs/README.md
- License: LICENSE
# Motion-Based Interactive German Learning App
A **browser-based**, gesture-controlled German vocabulary learning application that enables **hands-free interaction** using **real-time computer vision** and **gesture recognition** via a webcam.
The project explores how **gesture-first interfaces** and **embodied interaction** can be applied to language learning, while carefully balancing **AI capability, latency, stability, and deployability**.
---
## What this app does
The application allows users to control the learning experience using **hand gestures**, without relying on a keyboard or mouse:
- Category selection
- Learn mode (flashcards)
- Test mode (MCQ-based quiz)
- Result summary and retry
- Help / gesture guide screen
- Basic progress persistence
The goal is to make vocabulary learning more **active**, **engaging**, and **contactless**, especially in scenarios where traditional input methods are inconvenient.
---
## Core idea: Hybrid AI system design
This project follows a **hybrid AI approach**, separating **runtime interaction** from **offline experimentation**.
### Runtime (Browser)
- Uses a **pretrained MediaPipe Gesture Recognizer** model for **real-time gesture inference**
- Applies **confidence thresholding**, **temporal smoothing**, and **cooldown/debouncing** to reduce false positives
- Uses **deterministic, context-aware gesture-to-action mapping**
- Optimized for **low latency, stability, and usability**
### Offline (Data science & ML/DL experiments)
- A separate pipeline collects a **custom gesture dataset**
- Multiple **ML and DL models** are trained and evaluated
- Results are used to **analyze trade-offs** and **justify design decisions**
- Offline models are **not deployed in the browser**, due to practical runtime constraints
This mirrors real-world AI system design:
> the most accurate offline model is not always the best deployable solution.
---
## Runtime gesture recognition (Browser)
### Runtime pipeline
1. Webcam stream via browser camera APIs
2. MediaPipe Gesture Recognizer (pretrained)
- Outputs gesture label
- Outputs confidence score
3. Filtering layer:
- Confidence thresholding
- Temporal smoothing
- Cooldown/debounce (~1000 ms)
4. Context-aware mapping:
- Gesture → action based on current screen
5. React-based UI update
This design prioritizes **robust interaction** over raw model complexity.
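The filtering layer in steps 3–4 can be sketched as a small class. This is an illustrative sketch, not the repository's actual code: the class name, `Prediction` shape, and default thresholds are assumptions; only the cooldown (~1000 ms) and the threshold + smoothing + debounce structure come from the description above.

```typescript
// Sketch of the stability filter: confidence thresholding, temporal
// smoothing over a sliding window of frames, and a cooldown so one
// physical gesture fires only one action.

interface Prediction {
  label: string;       // e.g. "Thumb_Up" (MediaPipe-style label)
  confidence: number;  // 0..1
  timestamp: number;   // milliseconds
}

class GestureStabilityFilter {
  private window: string[] = [];
  private lastFired = -Infinity;

  constructor(
    private minConfidence = 0.7, // confidence threshold (assumed value)
    private windowSize = 5,      // frames required for temporal smoothing
    private cooldownMs = 1000,   // debounce between accepted gestures
  ) {}

  /** Returns a stable gesture label, or null if nothing should fire. */
  update(p: Prediction): string | null {
    if (p.confidence < this.minConfidence) {
      this.window.length = 0; // a low-confidence frame breaks the streak
      return null;
    }
    this.window.push(p.label);
    if (this.window.length > this.windowSize) this.window.shift();

    const stable =
      this.window.length === this.windowSize &&
      this.window.every((l) => l === p.label);
    const cooledDown = p.timestamp - this.lastFired >= this.cooldownMs;

    if (stable && cooledDown) {
      this.lastFired = p.timestamp;
      this.window.length = 0; // require a fresh streak for the next event
      return p.label;
    }
    return null;
  }
}
```

Only the filtered, debounced events reach the gesture-to-action mapper, which is why a flickering prediction stream does not cause accidental navigation.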
---
## Gesture vocabulary (example mapping)
> Exact mappings vary slightly by screen context (category, learn, test, results).
| Gesture | Typical Use |
|------|------------|
| 👍 Thumbs Up | Next / navigate forward |
| 👎 Thumbs Down | Previous / navigate backward |
| ✋ Open Palm | Select / flip flashcard |
| ☝️ Pointing | Select MCQ option (test mode) |
| ✊ Fist | Retake test |
| ✌️ Victory | Toggle help screen |
| 🤟 "I Love You" | Return to category selection |
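Because the mapping is context-aware, the same gesture can trigger different actions on different screens. A minimal sketch of that lookup table follows; the screen names come from the README, but the gesture labels, action names, and exact per-screen assignments are illustrative assumptions.

```typescript
// Deterministic, context-aware gesture-to-action mapping:
// the active screen selects which action a gesture triggers.

type Screen = "category" | "learn" | "test" | "results";
type Gesture =
  | "Thumb_Up" | "Thumb_Down" | "Open_Palm"
  | "Pointing_Up" | "Closed_Fist" | "Victory" | "ILoveYou";
type Action = string;

const gestureMap: Record<Screen, Partial<Record<Gesture, Action>>> = {
  category: { Thumb_Up: "nextCategory", Thumb_Down: "prevCategory", Open_Palm: "selectCategory" },
  learn:    { Thumb_Up: "nextCard",     Thumb_Down: "prevCard",     Open_Palm: "flipCard" },
  test:     { Thumb_Up: "nextQuestion", Pointing_Up: "selectOption", Closed_Fist: "retakeTest" },
  results:  { Closed_Fist: "retakeTest", ILoveYou: "backToCategories" },
};

// Returns the action for this screen, or null if the gesture is
// unmapped in the current context (and should be ignored).
function mapGesture(screen: Screen, gesture: Gesture): Action | null {
  return gestureMap[screen][gesture] ?? null;
}
```

Keeping the mapping as plain data (rather than branching logic) makes the per-screen behavior easy to audit and adjust.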
---
## Learning flows
### Learn mode
- Gesture-controlled flashcards
- German → English vocabulary
- Card flip and navigation via gestures
- Smooth, distraction-free UI
### Test mode
- Multiple-choice questions
- Gesture-based answer selection
- Result visualization and retry flow
### Summary & persistence
- Stores basic progress and settings in `LocalStorage`
- Enables session continuity without a backend
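Backend-free persistence of this kind reduces to serializing a small progress object. The sketch below injects a `Storage`-like interface so the logic is testable outside the browser; the `Progress` shape and storage key are assumptions, not the app's actual schema. In the browser, `window.localStorage` satisfies the interface directly.

```typescript
// Minimal progress persistence without a backend.

interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface Progress {
  category: string;
  cardIndex: number;
  bestScore: number;
}

const PROGRESS_KEY = "motion-edu-progress"; // assumed key name

function saveProgress(store: KeyValueStore, p: Progress): void {
  store.setItem(PROGRESS_KEY, JSON.stringify(p));
}

function loadProgress(store: KeyValueStore): Progress | null {
  const raw = store.getItem(PROGRESS_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as Progress;
  } catch {
    return null; // corrupted entry: fall back to a fresh session
  }
}
```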
---
## Offline ML / DL experiments (Data science pipeline)
Offline experimentation was conducted to **study gesture classification in a controlled environment** and to understand the limits of different modeling approaches.
### Dataset
- ~3,000 labeled samples
- Gesture classes:
- NEXT
- PREVIOUS
- SELECT
- REST
### Feature engineering
- Relative joint distances
- Limb and joint angles
- Normalized landmark coordinates
- Symmetry and alignment features
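The feature families above are simple geometry over landmark coordinates. A sketch of the first three follows, using 2D points (MediaPipe provides 21 landmarks per hand); function names and the choice of reference joints are illustrative, and the real pipeline lives in the Python notebooks.

```typescript
// Geometry helpers behind the feature engineering: relative
// distances, joint angles, and wrist-relative normalization.

interface Landmark { x: number; y: number; }

// Joint distance divided by a reference length (e.g. wrist to
// middle-finger base), so the feature is invariant to hand size
// and distance from the camera.
function relativeDistance(a: Landmark, b: Landmark, refLen: number): number {
  return Math.hypot(a.x - b.x, a.y - b.y) / refLen;
}

// Angle at joint b formed by the segments b->a and b->c, in radians.
function jointAngle(a: Landmark, b: Landmark, c: Landmark): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const mag = Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y);
  // Clamp to [-1, 1] to guard against floating-point drift.
  return Math.acos(Math.min(1, Math.max(-1, dot / mag)));
}

// Landmarks re-expressed relative to the wrist and scaled by the
// reference length, giving translation- and scale-invariant inputs.
function normalizeLandmarks(points: Landmark[], wrist: Landmark, refLen: number): Landmark[] {
  return points.map((p) => ({
    x: (p.x - wrist.x) / refLen,
    y: (p.y - wrist.y) / refLen,
  }));
}
```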
### Models evaluated
- Random Forest
- Support Vector Machine (SVM)
- Logistic Regression
- Gradient Boosting
- Multi-Layer Perceptron (MLP)
- Convolutional Neural Network (CNN)
**Key takeaway:**
Classical ML models achieved very high accuracy in offline evaluation, and CNNs showed comparable performance at higher complexity. These results supported the decision to **keep ML/DL models offline** and rely on a pretrained runtime model for stability and deployability.
---
## Why Offline ML / DL Was NOT Deployed
### Key insight from experiments
- Offline ML / DL models achieved **very high accuracy** in controlled evaluation.
- However, deploying custom ML / DL models directly in the browser would:
- increase latency
- increase system complexity
- reduce robustness for real-time interaction
- The pretrained runtime gesture recognizer is already optimized for:
- speed
- stability
- browser compatibility
**Final decision:**
Use offline ML / DL for **analysis and justification**, not for runtime inference.
---
## Technology Stack
### Runtime (Browser)
- MediaPipe Gesture Recognizer (pretrained)
- Browser Camera APIs (`getUserMedia`)
- JavaScript (ES6+)
- HTML / CSS
- LocalStorage (lightweight persistence)
> The gesture recognition and stability logic are intentionally **framework-agnostic**: the focus is on gesture recognition, stability filtering, and interaction design rather than on the UI framework that surrounds them.
### Offline ML / DL (Experiments)
- Python
- NumPy, Pandas
- scikit-learn
- TensorFlow / Keras
- Jupyter Notebook
- Matplotlib / Seaborn
> Offline ML/DL is used for experimentation, evaluation, and system design justification, not for deployment.
---
## Repository Structure
```text
motion-edu-app/
├── frontend/            # Browser-based gesture-controlled learning app
│   ├── src/             # UI logic, gesture handling, screen flows
│   └── assets/          # UI assets (icons, images, styles)
├── ml/                  # Offline ML / DL experiments
│   ├── notebooks/       # Jupyter notebooks for ML and DL pipelines
│   │   ├── gesture_ml.ipynb
│   │   └── gesture_dl.ipynb
│   ├── scripts/         # Helper scripts for preprocessing and training
│   └── requirements.txt # Python dependencies for experiments
├── data/                # Vocabulary data and gesture datasets
├── models/              # Saved offline ML / DL models (experimental)
└── README.md            # Project documentation
```
## Repository Walkthrough (for Interviewers)
### `frontend/`
Browser-based application responsible for:
- capturing webcam input
- running real-time gesture recognition
- applying stability logic (confidence thresholding, temporal smoothing, cooldown)
- mapping gestures to learning actions (category / learn / test / results)
### `ml/`
Offline experimentation work, including:
- dataset loading and preprocessing
- feature engineering
- training and evaluation of multiple classical ML models and a CNN
- analysis using confusion matrices and model comparison to inform design decisions
### `data/`
- German vocabulary resources
- gesture datasets used for offline experimentation
### `models/`
- saved models from offline ML / DL experiments
- retained for analysis and documentation purposes
---
## System Architecture
### Overview of the System
```mermaid
flowchart TB
subgraph R[Runtime in Browser]
CAM[Webcam stream] --> MP[MediaPipe Gesture Recognizer]
MP --> OUT[Gesture label and confidence]
OUT --> FIL[Stability filter]
FIL --> MAP[Context aware mapping]
MAP --> UI[Web UI]
UI --> LS[LocalStorage]
end
FILNOTE[Filtering includes threshold smoothing cooldown]
FIL -.-> FILNOTE
```
**Runtime notes:**
- **Stability filter** = confidence threshold + temporal smoothing + cooldown/debounce
- **Web UI** = category / learn / test / results screens (framework-agnostic)
### ML / DL Offline Pipeline (High-level)
```mermaid
flowchart TB
A[Data Collection via Webcam] --> B[Landmark Extraction using MediaPipe]
B --> C[Gesture Labeling NEXT PREV SELECT REST]
C --> D[Gesture Dataset about 3000 samples]
D --> E[Preprocessing cleaning and normalization]
E --> F[Feature Engineering distances angles normalization symmetry]
F --> G[Train Test Split or Cross Validation]
G --> H1[Classical ML Models RF SVM LR GB MLP]
G --> H2[Deep Learning CNN]
H1 --> I[Evaluation accuracy and confusion matrix]
H2 --> I
I --> J[Model Comparison accuracy versus complexity]
J --> K[System Decision pretrained runtime model]
```
**Notes:**
- Dataset size โ 3,000 samples
- Features include distances, joint angles, normalized coordinates, symmetry
- Classical ML models achieved very high accuracy
- CNN showed comparable performance at higher complexity
- Final decision: keep ML/DL offline, use pretrained model at runtime
### Offline Experiment Flow: Sequence Diagram (Training + Evaluation)
```mermaid
sequenceDiagram
autonumber
participant Dev as Developer
participant Cap as Capture Script
participant MP as MediaPipe (Landmarks)
participant DS as Dataset (CSV/NPY)
participant FE as Feature Engineering
participant ML as ML/DL Training
participant EV as Evaluation (CV/CM)
participant DEC as Design Decision
Dev->>Cap: Start recording session
Cap->>MP: Process frames
MP-->>Cap: Landmarks (x,y,z + visibility)
Cap->>DS: Save samples + labels (NEXT/PREV/SELECT/REST)
Dev->>FE: Load dataset
FE->>FE: Compute distances/angles/normalized features
FE->>ML: Train models (RF/SVM/LR/GB/MLP + CNN)
ML-->>EV: Predictions + metrics
EV-->>Dev: Confusion matrix + CV accuracy
Dev->>DEC: Select approach for runtime
DEC-->>Dev: Use pretrained runtime model (stability + deployability)
```
### Runtime Gesture Inference
```mermaid
sequenceDiagram
autonumber
participant U as User
participant Cam as Browser Camera API
participant MP as MediaPipe Gesture Recognizer
participant F as Filter Layer
participant M as Context Mapper
participant UI as App UI (Learn/Test/Results)
participant LS as LocalStorage
U->>Cam: Allow camera permission
Cam-->>MP: Video frames
MP-->>UI: gesture label + confidence
UI->>F: raw prediction stream
F-->>M: stable gesture event (threshold + smoothing + cooldown)
M-->>UI: trigger action (context-aware mapping)
UI->>LS: save progress/stats/settings
UI-->>U: updated screen (next card / select option / results)
```
### "Why hybrid?": decision diagram (what was tried vs. what shipped)
```mermaid
flowchart LR
    A["Offline ML/DL looks great<br/>~95-99% accuracy"] --> B{Deploy in browser?}
    B -->|Hard| C["Constraints: latency, integration, stability<br/>browser runtime limits"]
    B -->|Practical| D["Use pretrained model in browser<br/>optimized + low latency"]
    C --> E["Keep offline pipeline for analysis<br/>feature insights + comparison"]
    D --> F["Ship stable gesture-first app<br/>threshold + smoothing + cooldown + mapping"]
    E --> F
```
---
## Running the Project
The project is divided into runtime (browser) and offline ML experiments.
### Runtime (Browser-based application)
- Open the project in a modern browser
- Grant webcam permissions
- Interact with the learning interface using hand gestures
> No backend server is required.
> All inference and interaction run locally in the browser.
### Offline ML / DL Experiments
1. Navigate to the ML workspace: `cd ml`
2. Create a virtual environment: `python -m venv .venv`
3. Activate it (Windows): `.venv\Scripts\activate`
4. Install dependencies: `pip install -r requirements.txt`
5. Launch Jupyter: `jupyter notebook`
Run the following notebooks:
- `gesture_ml.ipynb`
- `gesture_dl.ipynb`
These notebooks cover:
- dataset loading
- feature engineering
- model training
- evaluation and comparison
---
## Ethical & Privacy Considerations
- All video processing happens locally in the browser
- No camera data is stored or transmitted
- Gesture datasets used for offline experiments were anonymized
- Intended strictly for academic and experimental purposes
---
## Key Takeaways
- Demonstrates applied AI system design, not just model training
- Shows how offline ML experiments can guide architectural decisions
- Highlights real-world trade-offs between:
- accuracy
- latency
- interpretability
- deployability
- Combines AI, Data Science, and HCI into a cohesive system
---
## Future Work
- Adaptive gesture models personalized per user
- Lightweight in-browser ML inference (TensorFlow.js / TFLite)
- Multimodal interaction (speech + gesture)
- Formal user studies and learning outcome evaluation
- Support for additional languages
---
## What This Demonstrates
### AI / ML Engineering
- Full ML pipeline: **data → features → models → evaluation**
- Classical ML + Deep Learning (**scikit-learn + TensorFlow**)
- Confusion matrices & classification reports
- Model comparison & selection
- Offline experimentation & documentation
- Understanding latency vs. accuracy trade-offs in gesture systems
---
### Computer Vision Engineering
- Real-time **human pose estimation** with MediaPipe
- Gesture classification (rule-based + ML models)
- Landmark normalization, smoothing, temporal filtering
- Low-latency inference optimizations
- Interaction design for embodied learning
---
### Software Engineering
- Full **React + TypeScript** architecture
- Component-based UI design
- Custom hooks & context providers
- LocalStorage persistence (progress, stats, preferences)
- Responsive UI, dark mode, animations
- Telemetry export system
---
### Learning Science Integration
- **Spaced repetition** algorithm
- Difficulty ranking for flashcards
- Lesson summary analytics
- Embodied active recall (gestures + movement)
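The spaced-repetition and difficulty-ranking ideas above can be sketched with a Leitner-style box system: a correct answer promotes a card to a longer review interval, a miss demotes it to the shortest one. This illustrates the general technique only; the interval values, `Card` shape, and function names are assumptions, not necessarily the scheduling rule the app implements.

```typescript
// Leitner-style spaced repetition: higher boxes are reviewed at
// longer intervals; misses reset a card to box 0.

interface Card {
  word: string;
  box: number;   // 0 = newest / hardest, higher = better known
  dueAt: number; // ms timestamp of the next review
}

// Review intervals per box (0, 1, 3, 7, 14 days), in milliseconds.
const INTERVALS_MS = [0, 1, 3, 7, 14].map((d) => d * 24 * 60 * 60 * 1000);

function review(card: Card, correct: boolean, now: number): Card {
  const box = correct ? Math.min(card.box + 1, INTERVALS_MS.length - 1) : 0;
  return { ...card, box, dueAt: now + INTERVALS_MS[box] };
}

// Cards that are due now, hardest / most overdue first: a simple
// difficulty ranking for picking the next flashcard.
function nextDue(cards: Card[], now: number): Card[] {
  return cards.filter((c) => c.dueAt <= now).sort((a, b) => a.dueAt - b.dueAt);
}
```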
---
### Relevant Roles This Project Fits
- **Machine Learning Engineer**
- **AI Engineer**
- **Computer Vision Engineer**
- **Data Scientist (Applied / Product)**
- **Full-Stack ML Developer**
---
## Project Summary
We built a gesture-controlled German learning app that runs entirely in the browser. At runtime, the system uses a pretrained gesture recognizer combined with confidence thresholds, temporal smoothing, and cooldown logic to ensure stable interaction.
In parallel, an offline ML / DL pipeline was developed where gesture data was collected, features were engineered, multiple classical ML models and a CNN were trained, and confusion matrices were analyzed. The key insight was that while offline ML achieved very high accuracy, deploying custom models in the browser would increase latency and complexity. Therefore, offline ML was used to guide design decisions, and a pretrained model was chosen for runtime robustness and deployability.