https://github.com/bumble-tech/private-detector

Bumble's Private Detector - a pretrained model for detecting lewd images
https://github.com/bumble-tech/private-detector

bumble efficientnet image-classification tensorflow

Last synced: about 1 month ago
JSON representation

Bumble's Private Detector - a pretrained model for detecting lewd images

Host: GitHub
URL: https://github.com/bumble-tech/private-detector
Owner: bumble-tech
License: apache-2.0
Created: 2022-10-01T09:09:51.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-11-05T22:14:36.000Z (over 1 year ago)
Last Synced: 2025-05-16T05:02:04.121Z (about 1 month ago)
Topics: bumble, efficientnet, image-classification, tensorflow
Language: Python
Homepage: https://medium.com/bumble-tech/bumble-inc-open-sources-private-detector-and-makes-another-step-towards-a-safer-internet-for-women-8e6cdb111d81
Size: 60.5 KB
Stars: 1,328
Watchers: 10
Forks: 98
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Private Detector

This is the repo for Bumble's *Private Detector*™ model - an image classifier that can detect lewd images.

The internal repo has been heavily refactored and released as a fully open-source project to allow for the wider community to use and finetune a Private Detector model of their own. You can download the pretrained SavedModel, [Frozen Model](https://github.com/bumble-tech/private-detector/issues/7) and checkpoint [here](https://storage.googleapis.com/private_detector/private_detector_with_frozen.zip)

## Model

The SavedModel can be found in `saved_model/` within `private_detector.zip` above

The model is based on Efficientnet-v2 and trained on our internal dataset of lewd images - more information can be found at the whitepaper [here](https://bumble.com/en/the-buzz/bumble-open-source-private-detector-ai-cyberflashing-dick-pics) or [here](https://medium.com/bumble-tech/bumble-inc-open-sources-private-detector-and-makes-another-step-towards-a-safer-internet-for-women-8e6cdb111d81)

## Inference

Inference is pretty simple and an example has been given in `inference.py`. The model is released as a SavedModel so it can be deployed in many different ways, but here's a quick runthrough of one way to get it working for those less familiar with Python/Tensorflow.

First you need to install [Python](https://www.python.org/downloads/) and [Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) on your system and go to the Terminal/Command Prompt on your machine

Then you can use the `environment.yaml` file to install the necessary packages to run the inference.

```sh

conda env create -f environment.yaml

conda activate private_detector

```

Once that's set up, you can run the inference script. Simply replace the sample `.jpg` file paths below with your own

```sh

python3 inference.py \

    --model saved_model/ \

    --image_paths \

        Yes_samples/1.jpg \

        Yes_samples/2.jpg \

        Yes_samples/3.jpg \

        Yes_samples/4.jpg \

        Yes_samples/5.jpg \

        No_samples/1.jpg \

        No_samples/2.jpg \

        No_samples/3.jpg \

        No_samples/4.jpg \

        No_samples/5.jpg \

```

Sample Output



    Probability: 93.71% - Yes_samples/1.jpg

    Probability: 93.43% - Yes_samples/2.jpg

    Probability: 94.06% - Yes_samples/3.jpg

    Probability: 94.08% - Yes_samples/4.jpg

    Probability: 91.01% - Yes_samples/5.jpg

    Probability: 9.76% - No_samples/1.jpg

    Probability: 7.14% - No_samples/2.jpg

    Probability: 8.83% - No_samples/3.jpg

    Probability: 4.87% - No_samples/4.jpg

    Probability: 5.29% - No_samples/5.jpg



## Serving

See [Tensorflow Serving example](deployments/tensorflow-serving/README.md)

## Additional Training

You can finetune the model yourself on your own data, to do so is fairly simple - though you will need the checkpoint files as can be found in `saved_checkpoint/` in `private_detector.zip`

Set up a JSON file with links to your image path lists for each class:

```json

{

    "Yes": {

        "path": "/home/sofarrell/private_detector/Yes.txt",

        "label": 0

    },

    "No": {

         "path": "/home/sofarrell/private_detector/No.txt",

         "label": 1

    }

}

```

With each `.txt` file listing off the image paths to your images

```txt

/home/sofarrell/private_detector_images/Yes/1093840880_309463828.jpg

/home/sofarrell/private_detector_images/Yes/657954182_3459624.jpg

/home/sofarrell/private_detector_images/Yes/1503714421_3048734.jpg

```

You can create the training environment with conda:

```sh

conda env create -f environment.yaml

conda activate private_detector

```

And then retrain like so:

```sh

python3 ./train.py \

    --train_json /home/sofarrell/private_detector/train_classes.json \

    --eval_json /home/sofarrell/private_detector/eval_classes.json \

    --checkpoint_dir saved_checkpoint/ \

    --train_id retrained_private_detector

```

The training script has several parameters that can be tweaked:

|Command|Description|Type|Default|

|---|---|---|---|

|`train_id`|ID for this particular training run|str||

|`train_json`|JSON file(s) which describes classes and contains lists of filenames of data files|List[str]||

|`eval_json`|Validation json file which describes classes and contains lists of filenames of data files|str||

|`num_epochs`|Number of epochs to train for|int||

|`batch_size`|Number of images to process in a batch|int|`64`|

|`checkpoint_dir`|Directory to store checkpoints in|str||

|`model_dir`|Directory to store graph in|str|`.`|

|`data_format`|Data format: [channels_first, channels_last]|str|`channels_last`|

|`initial_learning_rate`|Initial learning rate|float|`1e-4`|

|`min_learning_rate`|Minimal learning rate|float|`1e-6`|

|`min_eval_metric`|Minimal evaluation metric to start saving models|float|`0.01`|

|`float_dtype`|Float Dtype to use in image tensors: [16, 32]|int|`16`|

|`steps_per_train_epoch`|Number of steps per train epoch|int|`800`|

|`steps_per_eval_epoch`|Number of steps per evaluation epoch|int|`1`|

|`reset_on_lr_update`|Whether to reset to the best model after learning rate update|bool|`False`|

|`rotation_augmentation`|Rotation augmentation angle, value <= 0 disables it|float|`0`|

|`use_augmentation`|Add speckle, v0, random or color distortion augmentation|str||

|`scale_crop_augmentation`|Resize image to the model's size times this scale and then randomly crop needed size|float|`1.4`|

|`reg_loss_weight`|L2 regularization weight|float|`0`|

|`skip_saving_epochs`|Do not save good checkpoint and update best metric for this number of the first epochs|int|`0`|

|`sequential`|Use sequential run over randomly shuffled filenames vs equal sampling from each class|bool|`False`|

|`eval_threshold`|Threshold above which to consider a prediction positive for evaluation|float|`0.5`|

|`epochs_lr_update`|Maximum number of epochs without improvement used to reset/decrease learning rate|int|`20`|

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/bumble-tech/private-detector

Awesome Lists containing this project

README