https://github.com/harshramani00/human-action-recognition

A Human Action Recognition (HAR) model combining 3D CNN and LSTM networks to recognize actions in videos using spatial-temporal feature extraction. Trained on UCF-50, it outperforms the baseline architectures it was compared against.

3d-cnn computer-vision deep-learning human-action-recognition lstm machine-learning python spatial-temporal tensorflow ucf50-dataset video-classification

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/harshramani00/human-action-recognition
Owner: harshramani00
Created: 2025-03-07T14:26:49.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-07T14:36:56.000Z (over 1 year ago)
Last Synced: 2025-06-20T04:38:38.817Z (about 1 year ago)
Topics: 3d-cnn, computer-vision, deep-learning, human-action-recognition, lstm, machine-learning, python, spatial-temporal, tensorflow, ucf50-dataset, video-classification
Language: Jupyter Notebook
Homepage:
Size: 3.65 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Swift-Spatio Flow: A Human Action Recognition Model Using 3D CNN-LSTM

## 📌 Project Overview

**Swift-Spatio Flow** is an advanced **Human Action Recognition (HAR)** model that combines **3D Convolutional Neural Networks (3D CNN)** with **Long Short-Term Memory (LSTM)** networks. This project aims to improve action recognition in videos by efficiently extracting **spatial and temporal features** while reducing computational cost.

🚀 **Key Applications**

- CCTV surveillance enhancement

- Assisting the visually impaired

- Self-driving cars

- Sports analytics

## 🎯 Problem Statement

Existing HAR models suffer from:

- **Complexity:** High computational cost

- **Accuracy:** Difficulty in handling low-quality videos

- **Scalability:** Difficulty scaling to large datasets

**Swift-Spatio Flow** addresses these challenges by integrating a **3D CNN and LSTM** to extract spatial and temporal features efficiently.

## 📊 Methodology

1. **Preprocessing:** 

   - Extract frames from videos

   - Resize and normalize images

   - Convert frames into sequences

2. **Model Architecture:** 

   - 3D CNN for feature extraction

   - LSTM for sequence modeling

   - Softmax activation for classification

3. **Training & Evaluation:**

   - Dataset: **UCF-50**

   - Metrics: **Accuracy, Precision, Recall, F1-score**

   - Comparison with existing models
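
The preprocessing and architecture steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the clip length, frame size, filter counts, and LSTM width are all assumptions, and the helper names `preprocess` and `build_model` are hypothetical. It also assumes the video has already been decoded into an array of frames.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative sizes only; none of these values come from the repository.
SEQ_LEN, HEIGHT, WIDTH = 16, 64, 64
NUM_CLASSES = 50  # UCF-50 contains 50 action classes


def preprocess(frames: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Turn decoded frames (num_frames, H, W, 3) into one fixed-length clip.

    Samples `seq_len` evenly spaced frames, resizes them to HEIGHT x WIDTH,
    and scales pixel values from [0, 255] to [0, 1].
    """
    idx = np.linspace(0, len(frames) - 1, seq_len).astype(int)
    resized = tf.image.resize(frames[idx], (HEIGHT, WIDTH))  # returns float32
    return (resized / 255.0).numpy()


def build_model() -> tf.keras.Model:
    """3D CNN feature extractor, then an LSTM, then a softmax classifier."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, HEIGHT, WIDTH, 3)),
        # Pool over space only, so the time axis keeps SEQ_LEN steps.
        layers.Conv3D(16, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        layers.Conv3D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        # Flatten each time step into a feature vector for the LSTM.
        layers.Reshape((SEQ_LEN, -1)),
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


if __name__ == "__main__":
    # Random frames stand in for a decoded video.
    video = np.random.randint(0, 256, size=(40, 120, 160, 3), dtype=np.uint8)
    clip = preprocess(video)
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    print(model.predict(clip[None], verbose=0).shape)
```

Pooling only over the spatial axes is one possible design choice: it keeps one feature vector per sampled frame, so the LSTM still sees the full sequence.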

## 🏆 Results

| Model                 | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|-----------------------|--------------|---------------|------------|--------------|
| CNN + LSTM            | 76.12        | 75.94         | 74.17      | 75.86        |
| ConvLSTM2D            | 78.95        | 78.74         | 76.14      | 78.68        |
| Time Distributed CNN  | 88.50        | 88.00         | 87.52      | 87.71        |
| 3D CNN (UCF-101)      | 91.65        | 89.96         | 90.82      | 91.10        |
| **Swift-Spatio Flow** | **94.89**    | **94.37**     | **93.45**  | **93.56**    |
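
For reference, the four reported metrics could be computed from predicted and true class labels along these lines. The README does not state how precision, recall, and F1 were averaged across classes; this sketch assumes macro averaging (an unweighted mean over classes), and `macro_metrics` is a hypothetical helper rather than a function from the notebook.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 as fractions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


if __name__ == "__main__":
    # Toy labels for three classes, not results from the model.
    scores = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
    print({name: round(100 * value, 2) for name, value in scores.items()})
```

Multiplying each value by 100 gives percentages in the same form as the table.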

## 🔮 Future Enhancements

- Train on larger datasets like Kinetics for better generalization

- Optimize computational cost for real-time performance

- Deploy the model as a web application

## 🤝 Contributors

- Ian Joseph K

- Aryan Patil (https://github.com/aryanator)

- Abhishek Raje

- Ramani Harsh Anilkumar
