https://github.com/shubham0204/ondevice-face-recognition-android
On-device customizable face recognition in Android with FaceNet and an embedded vector database
android android-application face-recognition facenet kotlin mediapipe objectbox tensorflow-lite
- Host: GitHub
- URL: https://github.com/shubham0204/ondevice-face-recognition-android
- Owner: shubham0204
- License: apache-2.0
- Created: 2024-06-05T03:04:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-02T14:37:46.000Z (7 months ago)
- Last Synced: 2025-03-02T15:31:09.613Z (7 months ago)
- Topics: android, android-application, face-recognition, facenet, kotlin, mediapipe, objectbox, tensorflow-lite
- Language: Kotlin
- Homepage: https://medium.com/proandroiddev/building-on-device-face-recognition-in-android-076a40dbaac6
- Size: 46 MB
- Stars: 56
- Watchers: 3
- Forks: 9
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# On-Device Face Recognition In Android
> A simple Android app that performs on-device face recognition by comparing FaceNet embeddings against a vector database of user-given faces
> Download the APK from the [Releases](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/releases)
## Updates
* 2024-09: Add face-spoof detection which uses FASNet from [minivision-ai/Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing)
* 2024-07: Add latency metrics on the main screen. It shows the time taken (in milliseconds) to perform face detection, face embedding and vector search.

## Goals
* Produce on-device face embeddings with FaceNet and use them to perform face recognition on a user-given set of images
* Store face-embedding and other metadata on-device and use vector-search to determine nearest-neighbors
* Use modern Android development practices and recommended architecture guidelines while maintaining code simplicity and modularity

## Setup
> Download the APK from the [Releases](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/releases)
Clone the `main` branch,
```bash
$> git clone --depth=1 https://github.com/shubham0204/OnDevice-Face-Recognition-Android
```
Perform a Gradle sync, and run the application.
### Choosing the FaceNet model
The app provides two FaceNet models differing in the size of the embedding they provide. `facenet.tflite` outputs a 128-dimensional embedding and `facenet_512.tflite` a 512-dimensional embedding. In [FaceNet.kt](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/app/src/main/java/com/ml/shubham0204/facenet_android/domain/embeddings/FaceNet.kt), you may change the model by modifying the path of the TFLite model,
```kotlin
// facenet
interpreter =
    Interpreter(FileUtil.loadMappedFile(context, "facenet.tflite"), interpreterOptions)

// facenet-512
interpreter =
    Interpreter(FileUtil.loadMappedFile(context, "facenet_512.tflite"), interpreterOptions)
```
Then change `embeddingDim` in the same file,
```kotlin
// facenet
private val embeddingDim = 128

// facenet-512
private val embeddingDim = 512
```
Then, in [DataModels.kt](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/app/src/main/java/com/ml/shubham0204/facenet_android/data/DataModels.kt), change the dimensions of the `faceEmbedding` attribute,
```kotlin
@Entity
data class FaceImageRecord(
    // primary-key of `FaceImageRecord`
    @Id var recordID: Long = 0,

    // personID is derived from `PersonRecord`
    @Index var personID: Long = 0,

    var personName: String = "",

    // the FaceNet-512 model provides a 512-dimensional embedding
    // the FaceNet model provides a 128-dimensional embedding
    @HnswIndex(dimensions = 512)
    var faceEmbedding: FloatArray = floatArrayOf()
)
```

## Working

We use the [FaceNet](https://arxiv.org/abs/1503.03832) model, which, given a 160 × 160 cropped face image, produces an embedding of 128 or 512 elements capturing facial features that uniquely identify the face. We represent the embedding model as a function $M$ that accepts a cropped face image and returns an embedding, i.e. a vector of floating-point numbers.
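
Written out, the model and the cosine similarity used for the comparisons in the steps below look as follows. This is only a notational sketch: the 3-channel (RGB) input shape is an assumption, and $d$, $u$ and $v$ are symbols introduced here for the embedding size and for two embeddings.

$$
M : \mathbb{R}^{160 \times 160 \times 3} \to \mathbb{R}^{d}, \qquad d \in \{128, 512\}
$$

$$
\mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}, \qquad u, v \in \mathbb{R}^{d}
$$

A similarity close to 1 means the two embeddings point in nearly the same direction, which is what we expect for two crops of the same face.
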
1. When users select an image, the app uses MLKit's `FaceDetector` to crop faces from the image. Each image is labelled with the person's name. See [`MLKitFaceDetector.kt`](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/app/src/main/java/com/ml/shubham0204/facenet_android/domain/face_detection/MLKitFaceDetector.kt).
2. Each cropped face is transformed into a vector/embedding with FaceNet. See [`FaceNet.kt`](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/app/src/main/java/com/ml/shubham0204/facenet_android/domain/embeddings/FaceNet.kt).
3. We store these face embeddings in a vector database, which enables faster nearest-neighbor search.
4. Now, in the camera preview, for each frame, we perform face detection with MLKit's `FaceDetector` as in (1) and produce face embeddings for the face as in (2). We compare this face embedding (query vector) with those present in the vector database, and determine the name/label of the embedding (nearest-neighbor) closest to the query vector using cosine similarity.
5. The vector database performs a lossy compression on the embeddings stored in it, so the distance returned with the nearest-neighbor is only an estimate. Hence, we re-compute the cosine similarity between the nearest-neighbor vector and the query vector. See [`ImageVectorUseCase.kt`](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/app/src/main/java/com/ml/shubham0204/facenet_android/domain/ImageVectorUseCase.kt).

## Tools
1. [TensorFlow Lite](https://ai.google.dev/edge/lite) as a runtime to execute the FaceNet model
2. [Mediapipe Face Detection](https://ai.google.dev/edge/mediapipe/solutions/vision/face_detector/android) to crop faces from the image
3. [ObjectBox](https://objectbox.io) for on-device vector-store and NoSQL database

## Discussion
### Implementing face-liveness detection
> See [issue #1](https://github.com/shubham0204/OnDevice-Face-Recognition-Android/issues/1)
Face-liveness detection is the process of determining if the face captured in the camera frame is real or a spoof (photo, 3D model etc.). There are many techniques to perform face-liveness detection, the simplest ones being smile or wink detection. These are effective against static spoofs (pictures or 3D models) but do not hold for videos.
While exploring the [deepface](https://github.com/serengil/deepface) library, I discovered that it had implemented an *anti-spoof* detection system using the PyTorch models from the [Silent-Face-Anti-Spoofing](https://github.com/minivision-ai/Silent-Face-Anti-Spoofing) repository. It uses a combination of two models that operate on two different scales of the same image. The model is penalized with a classification loss (cross-entropy loss) and with the difference between the Fourier transform and the intermediate features from the CNN.
The models used by the `deepface` library (the same as in `Silent-Face-Anti-Spoofing`) are in the PyTorch format. The project already uses the TFLite runtime for executing the FaceNet model, and adding another DL runtime would bloat the application unnecessarily.
I converted the PT models to TFLite using this notebook: https://github.com/shubham0204/OnDevice-Face-Recognition-Android/blob/main/resources/Liveness_PT_Model_to_TF.ipynb
### How does this project differ from my earlier [`FaceRecognition_With_FaceNet_Android`](https://github.com/shubham0204/FaceRecognition_With_FaceNet_Android) project?
[FaceRecognition_With_FaceNet_Android](https://github.com/shubham0204/FaceRecognition_With_FaceNet_Android) is a similar project started in 2020 and revised several times since then. Here are the key similarities and differences with this project:
#### Similarities
1. Use FaceNet and FaceNet-512 models executed with TensorFlow Lite
2. Perform on-device face-recognition on a user-given dataset of images

#### Differences
1. Uses ObjectBox to store face embeddings and perform nearest-neighbor search.
2. Does not read a directory from the file-system, instead allows the user to select a group of photos and *label* them with the name of a person
3. Considers only the nearest-neighbor to infer the identity of a person in the live camera-feed
4. Uses the Mediapipe Face Detector instead of MLKit
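
To make differences 1 and 3 concrete, here is a minimal sketch of inferring a label from only the single nearest neighbor, using an exactly computed cosine similarity. It is not the app's code: `LabeledEmbedding`, `recognize` and the `0.4f` threshold are hypothetical names and values chosen for illustration, and a brute-force scan over a list stands in for the ObjectBox vector search.

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-in for a stored record: a person's name plus one face embedding.
data class LabeledEmbedding(val personName: String, val embedding: FloatArray)

// Exact cosine similarity between two embeddings of the same dimension.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embedding dimensions differ: ${a.size} vs ${b.size}" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    val denom = sqrt(normA) * sqrt(normB)
    // A zero vector has no direction; treat it as dissimilar to everything.
    return if (denom == 0f) 0f else dot / denom
}

// Returns the name attached to the single nearest stored embedding, or null
// when the store is empty or the best similarity is below the threshold.
fun recognize(
    query: FloatArray,
    stored: List<LabeledEmbedding>,
    threshold: Float = 0.4f, // assumed value, not taken from the app
): String? {
    // Brute-force scan; the app delegates this search to the vector database.
    val best = stored.maxByOrNull { cosineSimilarity(query, it.embedding) } ?: return null
    return if (cosineSimilarity(query, best.embedding) >= threshold) best.personName else null
}

fun main() {
    val stored = listOf(
        LabeledEmbedding("alice", floatArrayOf(1f, 0f, 0f)),
        LabeledEmbedding("bob", floatArrayOf(0f, 1f, 0f)),
    )
    // Query pointing mostly along the first stored vector.
    println(recognize(floatArrayOf(0.9f, 0.1f, 0f), stored))
    // Query orthogonal to both stored vectors, so no match clears the threshold.
    println(recognize(floatArrayOf(0f, 0f, 1f), stored))
}
```

Using only the top match keeps the decision simple, but the threshold then carries the whole burden of rejecting unknown faces, so it would need tuning against real embeddings from the chosen FaceNet model.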