https://github.com/meituan-dianping/vision-ml

A R-CNN machine learning model for handling Pop-up window in mobile Apps.
https://github.com/meituan-dianping/vision-ml

keras mobile-testing popup-window r-cnn vision

Last synced: about 2 months ago
JSON representation

A R-CNN machine learning model for handling Pop-up window in mobile Apps.

Host: GitHub
URL: https://github.com/meituan-dianping/vision-ml
Owner: Meituan-Dianping
License: mit
Created: 2019-02-20T10:32:33.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-06-21T22:30:27.000Z (almost 3 years ago)
Last Synced: 2025-03-25T20:11:16.742Z (2 months ago)
Topics: keras, mobile-testing, popup-window, r-cnn, vision
Language: Python
Homepage:
Size: 4.62 MB
Stars: 77
Watchers: 8
Forks: 25
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        
Vision-ml


[![Build Status](https://www.travis-ci.org/Meituan-Dianping/vision-ml.svg?branch=master)](https://www.travis-ci.org/Meituan-Dianping/vision-ml)

![GitHub](https://img.shields.io/badge/Python-3.6-blue)

![GitHub](https://img.shields.io/github/license/Meituan-Dianping/Vision)

See also [Vision-ui](https://github.com/Meituan-Dianping/vision-ui) a series algorithms for mobile UI testing.

> A R-CNN (Region-based Convolutional Neural Networks) machine learning model for handling pop-up window in mobile apps.

## Mobile UI Recognition

**Vision-ml** is a machine learning model that identifies the UI element that closes the Pop-up window and return its UI coordinate (x, y) on the screen.

A typical usage scenario would be:

- In mobile testing, when using [Appium](http://appium.io/) or similar framework for UI automation, it is usually very tricky to locate the components on the Pop-up window which is rendered on top of the current screen.

- Input a mobile App screenshot with the Pop-up, and you will get the predicted result (as shown in the blue box).

| 1                              | 2                                | 3                                  |

| ------------------------------ | -------------------------------- | -----------------------------------|

| ![](predict/predict_1.png)     | ![](predict/predict_2.png)       | ![](predict/predict_3.png)         |

## Requirements

Python3.6.x

```bash

# create venv before install requirements

pip install  -r requirements.txt

```

## Usage

You can use Vision with a pre-trained model in "model/trained_model_1.h5", the number in the file name

is for version control, you can update it in file named "all_config".

There are two ways of using Vision.

### Predict an image with Python script

1. Update your file path in "rcnn_predict.py"

```python

model_predict("path/to/image.png", view=True)

```

2. Run script and you will get the result

```bash

python rcnn_predict.py

```

### Predict an image with a web server

1. Start the web server

You can create server with Dockerfile

```bash

python vision_server.py

```

2. Post image to web server

```bash

curl  http://localhost:9092/client/vision  -F "file=@${IMAGE_PATH}.png"

```

3. The response from the web server will have the coordinate or the UI element, alone with a value of score 0 or 1.0 (0 means not found, 1.0 means found).

```json

{

  "code": 0,

  "data": {

    "position": [

      618,

      1763

    ],

    "score": 1.0

  }

}

```

## Train your own model

- You can choose to use your train image if your close button has different feature from the given training set. Just take a screenshot and put it in the "image/" folder

- Rename the train image with prefix "1" for close button and "0" for background.

- You can refer to the given training images in the repo for examples.

Button image named 1_1.png:

![](image/1_1.png)

Background image named 0_3.png:

![](image/0_3.png)

1. There are some images in this repo for training.

```

0_0.png 0_1.png 0_2.png 0_3.png 0_4.png 0_5.png 0_6.png 1_0.png 1_1.png 1_2.png 1_3.png 1_4.png 1_5.png 1_6.png

```

2. Get augmentation of your image, run method in "rcnn_train.py"

```python

Image().get_augmentation()

```

3. Train your image, run method in "rcnn_train.py"

```python

train_model()

```

## Model layers and params

### Model layers

- Model input image will be processed from 3d to 1d and pixel value will be set to 255 or 0, which makes the model has great classification.

- There are 5 layers and the 196,450 parameters in total, which is a light model and makes the training easier and the model is also robust for different Pop-up windows.

```

_________________________________________________________________

Layer (type)                 Output Shape              Param #   

=================================================================

conv2d_1 (Conv2D)            (None, 48, 48, 32)        320       

_________________________________________________________________

conv2d_2 (Conv2D)            (None, 46, 46, 32)        9248      

_________________________________________________________________

max_pooling2d_1 (MaxPooling2 (None, 23, 23, 32)        0         

_________________________________________________________________

conv2d_3 (Conv2D)            (None, 21, 21, 64)        18496     

_________________________________________________________________

max_pooling2d_2 (MaxPooling2 (None, 10, 10, 64)        0         

_________________________________________________________________

conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928     

_________________________________________________________________

max_pooling2d_3 (MaxPooling2 (None, 4, 4, 64)          0         

_________________________________________________________________

dropout_1 (Dropout)          (None, 4, 4, 64)          0         

_________________________________________________________________

flatten_1 (Flatten)          (None, 1024)              0         

_________________________________________________________________

dense_1 (Dense)              (None, 128)               131200    

_________________________________________________________________

dense_2 (Dense)              (None, 2)                 258       

=================================================================

Total params: 196,450

Trainable params: 196,450

Non-trainable params: 0

```

### Training params

In all_config.py we have training params of batch_size and epochs

- **batch_size** is the  number of training image

for updating the model

- **epochs** is the number for training the model with all the training image.

## Performance

With CPU of [email protected]:

- Training a model takes 30s with 10 epochs

- A mobile Pop-up window screen shot with 1080p takes 10s for calculating.

## Reference

The R-CNN model refers to this [paper](https://arxiv.org/pdf/1311.2524.pdf).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/meituan-dianping/vision-ml

Awesome Lists containing this project

README

Vision-ml