{"id":34519528,"url":"https://github.com/adhishnanda/motion-based-german-learning-app","last_synced_at":"2026-04-07T20:32:36.388Z","repository":{"id":326007069,"uuid":"1103302936","full_name":"adhishnanda/motion-based-german-learning-app","owner":"adhishnanda","description":"AI-powered language learning app with gesture recognition (MediaPipe + ML/DL models), real-time interaction, spaced repetition, and full React/TypeScript UI. Demonstrates ML engineering, computer vision, and frontend expertise.","archived":false,"fork":false,"pushed_at":"2025-11-24T20:54:12.000Z","size":5496,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-28T08:28:34.028Z","etag":null,"topics":["capstone-project","computer-vision","data-science","deep-learning","gesture-recognition","interactive","interactive-learning","machine-learning","mediapipe","portfolio-project","pose-estimation","react","scikit-learn","tensorflow","typescript"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adhishnanda.png","metadata":{"files":{"readme":"docs/README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-24T17:34:05.000Z","updated_at":"2025-11-24T21:01:38.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/adhishnanda/motion-based-german-learning-app","commit_stats":null,"previous_names":["adhishnanda/motion-based-german-learning-app"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/adhishnanda/motion-based-german-learning-app","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adhishnanda%2Fmotion-based-german-learning-app","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adhishnanda%2Fmotion-based-german-learning-app/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adhishnanda%2Fmotion-based-german-learning-app/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adhishnanda%2Fmotion-based-german-learning-app/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adhishnanda","download_url":"https://codeload.github.com/adhishnanda/motion-based-german-learning-app/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adhishnanda%2Fmotion-based-german-learning-app/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","reposito
ries_count":286080680,"owners_count":31528454,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["capstone-project","computer-vision","data-science","deep-learning","gesture-recognition","interactive","interactive-learning","machine-learning","mediapipe","portfolio-project","pose-estimation","react","scikit-learn","tensorflow","typescript"],"created_at":"2025-12-24T04:36:04.244Z","updated_at":"2026-04-07T20:32:36.377Z","avatar_url":"https://github.com/adhishnanda.png","language":"Jupyter Notebook","readme":"# Motion-Based Interactive German Learning App\r\n\r\nA **browser-based**, gesture-controlled German vocabulary learning application that enables **hands-free interaction** using **real-time computer vision** and **gesture recognition** via a webcam.\r\n\r\nThe project explores how **gesture-first interfaces** and **embodied interaction** can be applied to language learning, while carefully balancing **AI capability, latency, stability, and deployability**.\r\n\r\n---\r\n\r\n## ✨ What this app does\r\n\r\nThe application allows users to control the learning experience using **hand gestures**, without relying on a keyboard or mouse:\r\n\r\n- Category selection\r\n- Learn mode (flashcards)\r\n- Test mode 
(MCQ-based quiz)\r\n- Result summary and retry\r\n- Help / gesture guide screen\r\n- Basic progress persistence\r\n\r\nThe goal is to make vocabulary learning more **active**, **engaging**, and **contactless**, especially in scenarios where traditional input methods are inconvenient.\r\n\r\n---\r\n\r\n## 🔥 Core idea: Hybrid AI system design\r\n\r\nThis project follows a **hybrid AI approach**, separating **runtime interaction** from **offline experimentation**.\r\n\r\n### Runtime (Browser)\r\n- Uses a **pretrained MediaPipe Gesture Recognizer** model for **real-time gesture inference**\r\n- Applies **confidence thresholding**, **temporal smoothing**, and **cooldown/debouncing** to reduce false positives\r\n- Uses **deterministic, context-aware gesture-to-action mapping**\r\n- Optimized for **low latency, stability, and usability**\r\n\r\n### Offline (Data science \u0026 ML/DL experiments)\r\n- A separate pipeline collects a **custom gesture dataset**\r\n- Multiple **ML and DL models** are trained and evaluated\r\n- Results are used to **analyze trade-offs** and **justify design decisions**\r\n- Offline models are **not deployed in the browser**, due to practical runtime constraints\r\n\r\nThis mirrors real-world AI system design:  \r\n\u003e the most accurate offline model is not always the best deployable solution.\r\n\r\n---\r\n\r\n## 🧠 Runtime gesture recognition (Browser)\r\n\r\n### Runtime pipeline\r\n1. Webcam stream via browser camera APIs  \r\n2. MediaPipe Gesture Recognizer (pretrained)  \r\n   - Outputs gesture label  \r\n   - Outputs confidence score  \r\n3. Filtering layer:\r\n   - Confidence thresholding  \r\n   - Temporal smoothing  \r\n   - Cooldown/debounce (~1000 ms)  \r\n4. Context-aware mapping:\r\n   - Gesture → action based on current screen  \r\n5. 
React-based UI update\r\n\r\nThis design prioritizes **robust interaction** over raw model complexity.\r\n\r\n---\r\n\r\n## ✋ Gesture vocabulary (example mapping)\r\n\r\n\u003e Exact mappings vary slightly by screen context (category, learn, test, results).\r\n\r\n| Gesture | Typical Use |\r\n|------|------------|\r\n| 👍 Thumbs Up | Next / navigate forward |\r\n| 👎 Thumbs Down | Previous / navigate backward |\r\n| ✋ Open Palm | Select / flip flashcard |\r\n| 👉 Pointing | Select MCQ option (test mode) |\r\n| ✊ Fist | Retake test |\r\n| ✌ Victory | Toggle help screen |\r\n| 🤟 “I Love You” | Return to category selection |\r\n\r\n---\r\n\r\n## 📚 Learning flows\r\n\r\n### Learn mode\r\n- Gesture-controlled flashcards\r\n- German → English vocabulary\r\n- Card flip and navigation via gestures\r\n- Smooth, distraction-free UI\r\n\r\n### Test mode\r\n- Multiple-choice questions\r\n- Gesture-based answer selection\r\n- Result visualization and retry flow\r\n\r\n### Summary \u0026 persistence\r\n- Stores basic progress and settings in `LocalStorage`\r\n- Enables session continuity without a backend\r\n\r\n---\r\n\r\n## 🧪 Offline ML / DL experiments (Data science pipeline)\r\n\r\nOffline experimentation was conducted to **study gesture classification in a controlled environment** and to understand the limits of different modeling approaches.\r\n\r\n### Dataset\r\n- ~3,000 labeled samples\r\n- Gesture classes:\r\n  - NEXT\r\n  - PREVIOUS\r\n  - SELECT\r\n  - REST\r\n\r\n### Feature engineering\r\n- Relative joint distances\r\n- Limb and joint angles\r\n- Normalized landmark coordinates\r\n- Symmetry and alignment features\r\n\r\n### Models evaluated\r\n- Random Forest\r\n- Support Vector Machine (SVM)\r\n- Logistic Regression\r\n- Gradient Boosting\r\n- Multi-Layer Perceptron (MLP)\r\n- Convolutional Neural Network (CNN)\r\n\r\n**Key takeaway:**  \r\nClassical ML models achieved very high accuracy in offline evaluation, and CNNs showed comparable performance at higher 
complexity. These results supported the decision to **keep ML/DL models offline** and rely on a pretrained runtime model for stability and deployability.\r\n\r\n---\r\n\r\n## ❓ Why Offline ML / DL Was NOT Deployed\r\n\r\n### Key insight from experiments\r\n- Offline ML / DL models achieved **very high accuracy** in controlled evaluation.\r\n- However, deploying custom ML / DL models directly in the browser would:\r\n  - increase latency\r\n  - increase system complexity\r\n  - reduce robustness for real-time interaction\r\n- The pretrained runtime gesture recognizer is already optimized for:\r\n  - speed\r\n  - stability\r\n  - browser compatibility\r\n\r\n➡️ **Final decision:**  \r\nUse offline ML / DL for **analysis and justification**, not for runtime inference.\r\n\r\n---\r\n\r\n## 🧰 Technology Stack\r\n\r\n### Runtime (Browser)\r\n- MediaPipe Gesture Recognizer (pretrained)\r\n- Browser Camera APIs (`getUserMedia`)\r\n- TypeScript / JavaScript (ES6+)\r\n- React (UI)\r\n- HTML / CSS\r\n- LocalStorage (lightweight persistence)\r\n\r\n\u003e The core gesture-recognition and stability logic is intentionally **framework-agnostic**. 
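The stability layer listed in the stack above (confidence thresholding, temporal smoothing, cooldown/debounce) can be sketched in a few lines. This is a minimal illustration only, not the app's actual code; the `GestureStabilizer` name, its parameters, and the label strings are assumptions:

```javascript
// Illustrative sketch of the runtime stability filter: ignore
// low-confidence predictions, require a short streak of agreeing
// labels (temporal smoothing), and suppress repeats during a
// ~1000 ms cooldown window.
class GestureStabilizer {
  constructor({ threshold = 0.7, windowSize = 3, cooldownMs = 1000 } = {}) {
    this.threshold = threshold;
    this.windowSize = windowSize;
    this.cooldownMs = cooldownMs;
    this.window = [];          // recent high-confidence labels
    this.lastEmit = -Infinity; // timestamp of last stable event
  }

  // Feed one raw prediction per frame; returns a stable gesture
  // label when one is confirmed, otherwise null.
  update(label, confidence, now) {
    if (confidence < this.threshold) {
      this.window.length = 0;  // a low-confidence frame breaks the streak
      return null;
    }
    this.window.push(label);
    if (this.window.length > this.windowSize) this.window.shift();

    const agreed =
      this.window.length === this.windowSize &&
      this.window.every((l) => l === label);
    if (agreed && now - this.lastEmit >= this.cooldownMs) {
      this.lastEmit = now;
      this.window.length = 0;  // require a fresh streak for the next event
      return label;            // stable gesture event
    }
    return null;
  }
}
```

Raw per-frame predictions stream into `update()`; only a confident, sustained, non-repeated gesture produces a stable event for the context-aware mapper.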
The focus is on gesture recognition, stability logic, and interaction design rather than a specific UI framework.\r\n\r\n### Offline ML / DL (Experiments)\r\n- Python\r\n- NumPy, Pandas\r\n- scikit-learn\r\n- TensorFlow / Keras\r\n- Jupyter Notebook\r\n- Matplotlib / Seaborn\r\n\r\n\u003e Offline ML/DL is used for experimentation, evaluation, and system design justification, not for deployment.\r\n\r\n---\r\n\r\n## 📁 Repository Structure\r\n\r\n```text\r\nmotion-edu-app/\r\n├── frontend/                # Browser-based gesture-controlled learning app\r\n│   ├── src/                 # UI logic, gesture handling, screen flows\r\n│   └── assets/              # UI assets (icons, images, styles)\r\n├── ml/                      # Offline ML / DL experiments\r\n│   ├── notebooks/           # Jupyter notebooks for ML and DL pipelines\r\n│   │   ├── gesture_ml.ipynb\r\n│   │   └── gesture_dl.ipynb\r\n│   ├── scripts/             # Helper scripts for preprocessing and training\r\n│   └── requirements.txt     # Python dependencies for experiments\r\n├── data/                    # Vocabulary data and gesture datasets\r\n├── models/                  # Saved offline ML / DL models (experimental)\r\n└── README.md                # Project documentation\r\n```\r\n\r\n## 🧭 Repository Walkthrough (for Interviewers)\r\n\r\n### `frontend/`\r\nBrowser-based application responsible for:\r\n- capturing webcam input\r\n- running real-time gesture recognition\r\n- applying stability logic (confidence thresholding, temporal smoothing, cooldown)\r\n- mapping gestures to learning actions (category / learn / test / results)\r\n\r\n### `ml/`\r\nOffline experimentation work, including:\r\n- dataset loading and preprocessing\r\n- feature engineering\r\n- training and evaluation of multiple classical ML models and a CNN\r\n- analysis using confusion matrices and model comparison to inform design decisions\r\n\r\n### `data/`\r\n- German vocabulary resources\r\n- gesture datasets used for offline 
experimentation\r\n\r\n### `models/`\r\n- saved models from offline ML / DL experiments  \r\n- retained for analysis and documentation purposes\r\n\r\n---\r\n\r\n## 🧱 System Architecture \r\n### \u003cimg width=\"20\" height=\"20\" alt=\"image\" src=\"https://github.com/user-attachments/assets/40428f64-0f67-44e6-bd01-28210f3c3fea\" /\u003e Overview of the System\r\n\r\n```mermaid\r\nflowchart TB\r\n  subgraph R[Runtime in Browser]\r\n    CAM[Webcam stream] --\u003e MP[MediaPipe Gesture Recognizer]\r\n    MP --\u003e OUT[Gesture label and confidence]\r\n    OUT --\u003e FIL[Stability filter]\r\n    FIL --\u003e MAP[Context aware mapping]\r\n    MAP --\u003e UI[Web UI]\r\n    UI --\u003e LS[LocalStorage]\r\n  end\r\n\r\n  FILNOTE[Filtering includes threshold smoothing cooldown]\r\n  FIL -.-\u003e FILNOTE\r\n```\r\n**Runtime notes:**\r\n- **Stability filter** = confidence threshold + temporal smoothing + cooldown/debounce  \r\n- **Web UI** = category / learn / test / results screens (framework-agnostic)\r\n\r\n### 🧠 ML / DL Offline Pipeline (High-level)\r\n\r\n```mermaid\r\nflowchart TB\r\n  A[Data Collection via Webcam] --\u003e B[Landmark Extraction using MediaPipe]\r\n  B --\u003e C[Gesture Labeling NEXT PREV SELECT REST]\r\n  C --\u003e D[Gesture Dataset about 3000 samples]\r\n  D --\u003e E[Preprocessing cleaning and normalization]\r\n  E --\u003e F[Feature Engineering distances angles normalization symmetry]\r\n  F --\u003e G[Train Test Split or Cross Validation]\r\n  G --\u003e H1[Classical ML Models RF SVM LR GB MLP]\r\n  G --\u003e H2[Deep Learning CNN]\r\n  H1 --\u003e I[Evaluation accuracy and confusion matrix]\r\n  H2 --\u003e I\r\n  I --\u003e J[Model Comparison accuracy versus complexity]\r\n  J --\u003e K[System Decision pretrained runtime model]\r\n```\r\n**Notes:**\r\n- Dataset size ≈ 3,000 samples\r\n- Features include distances, joint angles, normalized coordinates, symmetry\r\n- Classical ML models achieved very high accuracy\r\n- CNN showed 
comparable performance at higher complexity\r\n- Final decision: keep ML/DL offline, use pretrained model at runtime\r\n\r\n### 🔁 Offline Experiment Flow - Sequence Diagram (Training + Evaluation)\r\n\r\n```mermaid\r\nsequenceDiagram\r\n  autonumber\r\n  participant Dev as Developer\r\n  participant Cap as Capture Script\r\n  participant MP as MediaPipe (Landmarks)\r\n  participant DS as Dataset (CSV/NPY)\r\n  participant FE as Feature Engineering\r\n  participant ML as ML/DL Training\r\n  participant EV as Evaluation (CV/CM)\r\n  participant DEC as Design Decision\r\n\r\n  Dev-\u003e\u003eCap: Start recording session\r\n  Cap-\u003e\u003eMP: Process frames\r\n  MP--\u003e\u003eCap: Landmarks (x,y,z + visibility)\r\n  Cap-\u003e\u003eDS: Save samples + labels (NEXT/PREV/SELECT/REST)\r\n  Dev-\u003e\u003eFE: Load dataset\r\n  FE-\u003e\u003eFE: Compute distances/angles/normalized features\r\n  FE-\u003e\u003eML: Train models (RF/SVM/LR/GB/MLP + CNN)\r\n  ML--\u003e\u003eEV: Predictions + metrics\r\n  EV--\u003e\u003eDev: Confusion matrix + CV accuracy\r\n  Dev-\u003e\u003eDEC: Select approach for runtime\r\n  DEC--\u003e\u003eDev: Use pretrained runtime model (stability + deployability)\r\n```\r\n\r\n### 🔄 Runtime Gesture Inference\r\n\r\n```mermaid\r\nsequenceDiagram\r\n  autonumber\r\n  participant U as User\r\n  participant Cam as Browser Camera API\r\n  participant MP as MediaPipe Gesture Recognizer\r\n  participant F as Filter Layer\r\n  participant M as Context Mapper\r\n  participant UI as App UI (Learn/Test/Results)\r\n  participant LS as LocalStorage\r\n\r\n  U-\u003e\u003eCam: Allow camera permission\r\n  Cam--\u003e\u003eMP: Video frames\r\n  MP--\u003e\u003eF: raw prediction stream (gesture label + confidence)\r\n  F--\u003e\u003eM: stable gesture event (threshold + smoothing + cooldown)\r\n  M--\u003e\u003eUI: trigger action (context-aware mapping)\r\n  UI-\u003e\u003eLS: save progress/stats/settings\r\n  UI--\u003e\u003eU: updated 
screen (next card / select option / results)\r\n```\r\n\r\n### “Why hybrid?” - Decision diagram (what was tried vs. what shipped)\r\n\r\n```mermaid\r\nflowchart LR\r\n  A[Offline ML/DL looks great\u003cbr/\u003e~95–99%+ accuracy] --\u003e B{Deploy in browser?}\r\n  B --\u003e|Hard| C[Constraints: latency, integration, stability\u003cbr/\u003ebrowser runtime limits]\r\n  B --\u003e|Practical| D[Use pretrained model in browser\u003cbr/\u003eoptimized + low latency]\r\n  C --\u003e E[Keep offline pipeline for analysis\u003cbr/\u003efeature insights + comparison]\r\n  D --\u003e F[Ship stable gesture-first app\u003cbr/\u003ethreshold + smoothing + cooldown + mapping]\r\n  E --\u003e F\r\n```\r\n\r\n---\r\n\r\n## 🚀 Running the Project\r\n\r\nThe project is divided into runtime (browser) and offline ML experiments.\r\n\r\n### 🖥 Runtime (Browser-based application)\r\n- Open the project in a modern browser\r\n- Grant webcam permissions\r\n- Interact with the learning interface using hand gestures\r\n\r\n\u003e No backend server is required.\r\n\u003e All inference and interaction run locally in the browser.\r\n\r\n### 🧪 Offline ML / DL Experiments\r\n\r\nSet up the ML workspace as follows:\r\n\r\n1. Navigate to the workspace: `cd ml`\r\n2. Create a virtual environment: `python -m venv .venv`\r\n3. Activate it (Windows): `.venv\\Scripts\\activate`\r\n4. Install dependencies: `pip install -r requirements.txt`\r\n5. Launch Jupyter: `jupyter notebook`\r\n\r\nRun the following notebooks:\r\n- `gesture_ml.ipynb`\r\n- `gesture_dl.ipynb`\r\n\r\nThese notebooks cover:\r\n- dataset loading  \r\n- feature engineering  \r\n- model training  \r\n- evaluation and comparison  \r\n\r\n---\r\n\r\n## ⚖️ Ethical \u0026 Privacy Considerations\r\n\r\n- All video processing happens locally in the browser\r\n- No camera data is stored or transmitted\r\n- Gesture datasets used for offline experiments were anonymized\r\n- Intended strictly for academic and experimental purposes\r\n\r\n---\r\n\r\n## 📌 Key Takeaways\r\n\r\n- 
Demonstrates applied AI system design, not just model training\r\n- Shows how offline ML experiments can guide architectural decisions\r\n- Highlights real-world trade-offs between:\r\n  - accuracy\r\n  - latency\r\n  - interpretability\r\n  - deployability\r\n- Combines AI, Data Science, and HCI into a cohesive system\r\n\r\n---\r\n\r\n## 🔮 Future Work\r\n\r\n- Adaptive gesture models personalized per user\r\n- Lightweight in-browser ML inference (TensorFlow.js / TFLite)\r\n- Multimodal interaction (speech + gesture)\r\n- Formal user studies and learning outcome evaluation\r\n- Support for additional languages\r\n\r\n---\r\n\r\n## 🎯 What This Demonstrates\r\n\r\n### 🧠 AI / ML Engineering\r\n- Full ML pipeline: **data → features → models → evaluation**\r\n- Classical ML + Deep Learning (**scikit-learn + TensorFlow**)\r\n- Confusion matrices \u0026 classification reports\r\n- Model comparison \u0026 selection\r\n- Offline experimentation \u0026 documentation\r\n- Understanding latency vs. 
accuracy trade-offs in gesture systems\r\n\r\n---\r\n\r\n### 👁 Computer Vision Engineering\r\n- Real-time **human pose estimation** with MediaPipe\r\n- Gesture classification (rule-based + ML models)\r\n- Landmark normalization, smoothing, temporal filtering\r\n- Low-latency inference optimizations\r\n- Interaction design for embodied learning\r\n\r\n---\r\n\r\n### 💻 Software Engineering\r\n- Full **React + TypeScript** architecture\r\n- Component-based UI design\r\n- Custom hooks \u0026 context providers\r\n- LocalStorage persistence (progress, stats, preferences)\r\n- Responsive UI, dark mode, animations\r\n- Telemetry export system\r\n\r\n---\r\n\r\n### 📚 Learning Science Integration\r\n- **Spaced repetition** algorithm\r\n- Difficulty ranking for flashcards\r\n- Lesson summary analytics\r\n- Embodied active recall (gestures + movement)\r\n\r\n---\r\n\r\n### 🎓 Relevant Roles This Project Fits\r\n- **Machine Learning Engineer**\r\n- **AI Engineer**\r\n- **Computer Vision Engineer**\r\n- **Data Scientist (Applied / Product)**\r\n- **Full-Stack ML Developer**\r\n\r\n---\r\n\r\n## 🧠 Project Summary\r\n\r\nWe built a gesture-controlled German learning app that runs entirely in the browser. At runtime, the system uses a pretrained gesture recognizer combined with confidence thresholds, temporal smoothing, and cooldown logic to ensure stable interaction.\r\n\r\nIn parallel, an offline ML / DL pipeline was developed where gesture data was collected, features were engineered, multiple classical ML models and a CNN were trained, and confusion matrices were analyzed. The key insight was that while offline ML achieved very high accuracy, deploying custom models in the browser would increase latency and complexity. 
Therefore, offline ML was used to guide design decisions, and a pretrained model was chosen for runtime robustness and deployability.\r\n\r\n\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadhishnanda%2Fmotion-based-german-learning-app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadhishnanda%2Fmotion-based-german-learning-app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadhishnanda%2Fmotion-based-german-learning-app/lists"}