# Automated Sign Language Tutor Project

This is the official repository for the paper _"Automated Sign Language Tutor: A Dual-Language Real-Time Approach
for RSL and ASL"_.

Appearance and video of the stand in operation: see the *Exterior* photo and *Demo Video* in the repository.

**Important note:** This repository contains the backend system for sign language recognition (the Streaming Sign Recognition Engine and Controller Process components). The provided frontend (`ws.html`, `ws.css`, `ws.js`) is a basic example implementation that demonstrates the backend's functionality; it is not a production-ready application. Developers should build their own frontends tailored to their specific use cases while using this backend as a recognition service.

It is also possible to use the backend with custom models.

## Key Features
1. **Two operating modes**:
   - **LIVE**: real-time gesture recognition
   - **Training**: mode for teaching the model new gestures
2. **Dual-language support**: Russian and English interface and recognition models
3. **WebSocket interaction**: the client sends a video stream, the server returns recognized gestures
4. **Visual feedback**: notification system for users
5. **Responsive interface**: modern, user-friendly design

## Setup and Installation

### Prerequisites
- Python 3.7+
- Node.js (optional for frontend)
- Web camera

### Installing Dependencies
```bash
pip install -r requirements.txt
```

### Downloading the Models
Download the following models:

- https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/rsl/demostand_models/tsm/ru/mobilenet_demostand_ru.onnx
- https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/rsl/demostand_models/tsm/en/mobilenet_demostand_en.onnx

and place them in the folder `models/checkpoints/`.
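
If you prefer to script this step, here is a minimal Python sketch (using only the standard library; the URLs and target folder are the ones listed above):

```python
import urllib.request
from pathlib import Path

# The folder the server expects the checkpoints to be in.
checkpoint_dir = Path("models/checkpoints")
checkpoint_dir.mkdir(parents=True, exist_ok=True)

urls = [
    "https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/rsl/demostand_models/tsm/ru/mobilenet_demostand_ru.onnx",
    "https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/rsl/demostand_models/tsm/en/mobilenet_demostand_en.onnx",
]
for url in urls:
    destination = checkpoint_dir / url.rsplit("/", 1)[-1]
    if not destination.exists():
        urllib.request.urlretrieve(url, destination)
```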

### Running the Server
```bash
python server_fapi.py
```

The server will run at `localhost:3003`.

### Docker Deployment
```bash
docker build -t sign-tutor .
docker run -it -d -v $PWD:/app -p 3003:3003 sign-tutor
```

## Using the Web Interface
1. Open `ws.html` in your browser
2. Click "Enable Camera" to access your webcam
3. After enabling the camera, the "Start Stream" button will become available
4. Select an operating mode:
   - **LIVE**: real-time gesture recognition
   - **Training**:
     - Enter the gesture name in the text field
     - Click "Select Gesture"
     - Perform the gesture in front of the camera
     - The system will notify you when the gesture is recognized correctly
5. To switch the interface and model language, use the RU/EN buttons in the top right corner
6. The recognition result is displayed in the server console

## Working with Custom Models

1. Place ONNX models for Russian and English in the `models/checkpoints/` folder.

2. Update the configuration files:
   - `models/config_ru.yaml` for the Russian model,
   - `models/config_en.yaml` for the English model.

   Use the available files as examples.

3. Update the sign class files:
   - `models/constants_ru.py` with the `classes` variable for Russian,
   - `models/constants_en.py` with the `classes` variable for English.

   Use the available files as examples.
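
For reference, a sign class file is expected to define a `classes` variable. The sketch below is a hypothetical example (the gloss names are placeholders); the labels and their order must match the output classes of your ONNX model:

```python
# models/constants_xx.py — hypothetical example, not shipped with the repository.
# Each entry's index must match the corresponding output index of the ONNX model.
classes = [
    "no_event",  # placeholder for a "no gesture" class, if your model has one
    "hello",
    "thanks",
    "goodbye",
]
```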

## System Architecture

### Client-Server Interaction
1. **Establishing connection**:
   - The client opens a WebSocket connection to `ws://localhost:3003/`
   - The server initializes the default language model (Russian)

2. **Main workflow**:
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Model

    Client->>Server: {"type": "LANGUAGE", "lang": "ru"}
    Server->>Model: Initialize Russian model
    Model-->>Server: Model ready
    Server-->>Client: {"status": 200, "message": "Language changed to ru"}

    Client->>Server: {"type": "MODE", "mode": "TRAINING"}
    Server-->>Client: {"status": 200, "message": "New MODE TRAINING set correctly"}

    Client->>Server: {"type": "GLOSS", "gloss": "привет"}
    Server-->>Client: {"status": 200, "message": "New GLOSS привет set correctly"}

    loop Each frame (30 FPS)
        Client->>Server: {"type": "IMAGE", "image": "data:image/jpeg"}
        Server->>Model: Process frame
        Model-->>Server: Recognition result
        alt Gesture recognized
            Server-->>Client: {"text": "привет", "type": "WORD"}
        end
    end
```
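
As an illustration of this protocol, here is a minimal Python test client. It is a sketch, not part of the repository: it assumes the third-party `websockets` package, a running server, and a single JPEG frame saved as `frame.jpg` (a real client streams about 30 frames per second):

```python
import asyncio
import base64
import json

import websockets  # pip install websockets


async def main():
    async with websockets.connect("ws://localhost:3003/") as ws:
        # Select the Russian model, TRAINING mode, and a target gloss,
        # mirroring the sequence diagram above.
        for message in (
            {"type": "LANGUAGE", "lang": "ru"},
            {"type": "MODE", "mode": "TRAINING"},
            {"type": "GLOSS", "gloss": "привет"},
        ):
            await ws.send(json.dumps(message))
            print(await ws.recv())  # status responses from the server

        # Send one frame as a base64 data URL.
        with open("frame.jpg", "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        await ws.send(json.dumps({"type": "IMAGE", "image": f"data:image/jpeg;base64,{b64}"}))

        # The server only replies when a gesture is recognized,
        # so a single frame may produce no response at all.
        try:
            print(await asyncio.wait_for(ws.recv(), timeout=5))
        except asyncio.TimeoutError:
            print("No recognition event for this frame")


asyncio.run(main())
```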

### Key Components

**Client (Frontend)**:
- `view/ws.html`: Main HTML interface file
- `view/ws.css`: Interface styles
- `view/ws.js`: Camera and WebSocket logic

**Server (Backend)**:
- `server_fapi.py`: Main server code (FastAPI)
- `models/model.py`: Recognition model logic
- `Runner`: Class for managing the video processing flow
- `RecognitionMP`: Gesture recognition worker run in a separate process

**Model**:
- ONNX models for gesture recognition
- Configuration files for Russian and English versions
- Gesture class files for each language

## Implementation Details

### Client-Side
1. **Camera management**:
   - Webcam access via WebRTC
   - Frame capture and base64 encoding
   - Frames sent at a specified frequency (30 FPS)

2. **Mode management**:
   - Smooth switching between LIVE and Training modes
   - Dynamic UI element display based on the current mode

3. **Localization**:
   - Full support for Russian and English
   - Language preference persisted between sessions

4. **Feedback**:
   - Animated notification system
   - Visual confirmation of user actions
   - Error handling and display

### Server-Side
1. **Video processing**:
   - Base64 decoding to an OpenCV image (see the sketch after this list)
   - Frame preprocessing for the neural network
   - Frame buffering for sequence analysis

2. **Model management**:
   - Dynamic model loading for different languages
   - Multithreaded processing to minimize latency
   - Proper resource cleanup when switching languages

3. **Gesture recognition**:
   - Sequence analysis for gesture recognition
   - Threshold filtering to reduce false positives
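
To make these steps concrete, here is a minimal sketch of decoding an incoming base64 frame and threshold-filtering a prediction. The names and the threshold value are hypothetical illustrations, not the repository's actual `models/model.py` code; it assumes `opencv-python`, `numpy`, and `onnxruntime`:

```python
import base64
from typing import List, Optional

import cv2
import numpy as np
import onnxruntime as ort

# Hypothetical value; in this project the threshold lives in models/config_*.yaml.
THRESHOLD = 0.5

# Dynamic model loading: one ONNX Runtime session per language.
session = ort.InferenceSession("models/checkpoints/mobilenet_demostand_ru.onnx")


def decode_frame(data_url: str) -> np.ndarray:
    """Decode a 'data:image/jpeg;base64,...' payload into a BGR OpenCV image."""
    b64 = data_url.split(",", 1)[-1]
    buffer = np.frombuffer(base64.b64decode(b64), dtype=np.uint8)
    return cv2.imdecode(buffer, cv2.IMREAD_COLOR)


def filter_prediction(probs: np.ndarray, classes: List[str]) -> Optional[str]:
    """Threshold filtering: return a gloss only when the model is confident enough."""
    index = int(np.argmax(probs))
    return classes[index] if probs[index] >= THRESHOLD else None
```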

## Configuration and Customization

### Changing Parameters
1. **Frame rate**: Change the `FPS` value in `ws.js`
2. **Video resolution**: Change `width` and `height` attributes of the `` element in `ws.html`
3. **Recognition threshold**: Change the `threshold` value in model configuration files

### Adding New Languages
1. Create a new configuration file `models/config_<lang>.yaml`
2. Add a gesture class file `models/constants_<lang>.py`
3. Update the translation dictionaries in `ws.js`:
```javascript
const translations = {
    // ...
    "<lang>": {
        title: "...",
        startWebcam: "...",
        // ... other texts
    }
};
```

## Troubleshooting

### Common Issues
1. **Camera not working**:
   - Check browser permissions
   - Ensure no other application is using the camera
   - Try reloading the page

2. **No connection to the server**:
   - Ensure the server is running (`python server_fapi.py`)
   - Check the WebSocket address in `ws.js`
   - Ensure no firewall is blocking the connection

3. **Model not loading**:
   - Check the model paths in the configuration files
   - Ensure the model files exist
   - Verify the content of the gesture class files

### Logging
1. **Client**: Open the browser developer console (F12)
2. **Server**: Logs are printed to the server console

## Authors
- Petr Surovtsev
- [Alexander Nagaev](https://github.com/nagadit)
- [Alexander Kapitanov](https://github.com/hukenovs/)
- [Ilya Ovodov](https://github.com/IlyaOvodov)

## License
This project is licensed under the Apache License. See the LICENSE file for details.

For questions and suggestions, please use the project's Issues section.