https://github.com/sfikas/zah
An image descriptor, suitable for keyword spotting, that is computed as a function of Convolutional Neural Network intermediate layer responses. Implemented in MATLAB.
convolutional-networks keyword-spotting matconvnet neural-networks
- Host: GitHub
- URL: https://github.com/sfikas/zah
- Owner: sfikas
- License: mit
- Created: 2016-10-11T14:02:27.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-04-13T13:22:02.000Z (about 4 years ago)
- Last Synced: 2024-03-12T12:31:40.167Z (4 months ago)
- Topics: convolutional-networks, keyword-spotting, matconvnet, neural-networks
- Language: MATLAB
- Homepage:
- Size: 26.1 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-MATLAB - Zoning Aggregated Hypercolumns for Keyword Spotting - An image descriptor, suitable for keyword spotting, that is computed as a function of Convolutional Neural Network intermediate layer responses. (Image Processing and Computer Vision)
README
# Zoning Aggregated Hypercolumns for Keyword Spotting #
Zoning Aggregated Hypercolumn features (ZAH features) are introduced with this work.
Motivated by recent research in machine vision, we use an appropriately pretrained convolutional network as a feature extraction tool.
The [convolutional network] is trained with [matconvnet] on a large collection of word images.
The resulting local cues are subsequently aggregated to form *word-level fixed-length descriptors*. The Euclidean distance can then be used to compare and query the resulting descriptors of different word images (Query-by-Example keyword spotting).
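As an illustration of the Query-by-Example idea, the following minimal sketch ranks a small set of descriptors by Euclidean distance to a query. The descriptor matrix is random toy data standing in for real ZAH descriptors, and all variable names are hypothetical; this is not code from the repository.

```matlab
% Toy data standing in for real ZAH descriptors (one descriptor per row).
rng(0);
descriptors = rand(5, 8);      % 5 hypothetical word images, 8-D descriptors
query = descriptors(3, :);     % use the third word image as the query

% Euclidean distance from the query to every descriptor.
% bsxfun is used so the sketch also runs on older MATLAB releases.
diffs = bsxfun(@minus, descriptors, query);
dists = sqrt(sum(diffs .^ 2, 2));

% Sort by increasing distance: the best matches come first.
[sortedDists, ranking] = sort(dists, 'ascend');
disp(ranking');
```

The query itself is ranked first because its distance to itself is zero; in a real retrieval setting the query would usually be excluded from the ranked list.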
If you find this work useful, please read and cite the [related paper]:
```
@inproceedings{sfikas2016zoning,
title={Zoning Aggregated Hypercolumns for Keyword Spotting},
author={Sfikas, Giorgos and Retsinas, Giorgos and Gatos, Basilis},
booktitle={15th International Conference on Frontiers in Handwriting Recognition (ICFHR)},
year={2016},
organization={IEEE}
}
```

## ZAH Workflow
The workflow is:
1. The (normalised) image is split into zones
2. Hypercolumn features are computed for each of the zones, using a pretrained convolutional neural network
3. Hypercolumns are aggregated into a single feature vector per zone
4. Per-zone features are concatenated into a single feature vector, which therefore describes the whole word image

The workflow is summarized in the following figure. A word image is in the input (top), and a vector is returned at the output (bottom):
![workflow](https://github.com/sfikas/zah/blob/master/img/workflow.png "ZAH Workflow")
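The four steps can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions, not the repository's implementation: a random array stands in for the CNN hypercolumns, zones are taken to be equal-width vertical strips, and aggregation is assumed to be sum pooling followed by L2 normalisation. The zone count, feature depth and variable names are hypothetical.

```matlab
% Stand-in for CNN hypercolumns: one C-dimensional vector per pixel.
rng(0);
H = 30; W = 90; C = 16;                 % hypothetical image size and depth
hypercolumns = rand(H, W, C);

numZones = 3;                           % hypothetical number of zones
edges = round(linspace(0, W, numZones + 1));   % column boundaries of zones

descriptor = zeros(1, numZones * C);
for z = 1:numZones
    % Step 1: select the columns belonging to this zone.
    cols = (edges(z) + 1):edges(z + 1);
    zone = hypercolumns(:, cols, :);

    % Steps 2-3: aggregate all hypercolumns in the zone into one vector
    % (sum pooling and L2 normalisation are assumptions of this sketch).
    zoneFeat = reshape(sum(sum(zone, 1), 2), 1, C);
    zoneFeat = zoneFeat / max(norm(zoneFeat), eps);

    % Step 4: concatenate per-zone vectors into the word-level descriptor.
    descriptor((z - 1) * C + (1:C)) = zoneFeat;
end

fprintf('Descriptor length: %d\n', numel(descriptor));
```

Because every word image yields `numZones * C` values regardless of its width, descriptors of different images can be compared directly.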
## Before running the code
First you will have to compile some of the code with MATLAB MEX, and optionally enable GPU support:
* In ```pretrained/matconvnet/Makefile```, change the MEX variable appropriately. It should point to the path of the ```mex``` executable in your system.
For example this could be something similar to ```/usr/local/MATLAB/R2012a/bin/mex``` .
* (optional) Set ENABLE_GPU in the same file in order to use the GPU for extracting ZAH features.
* Run ```cd pretrained/matconvnet/ && make distclean && make``` on the OS shell.

On the MATLAB prompt, add all repo subfolders to the path, by running the following:
* ```cd zah/```
* ```addpath(genpath('.'))```

Note that it is important that you execute ```addpath``` *after* having finished compiling the necessary items with MEX.
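One way to confirm that compilation succeeded and that the path is set up is to ask MATLAB whether it can see a compiled MEX file. The sketch below is illustrative only: it assumes that the bundled matconvnet builds a MEX function named `vl_nnconv`, as upstream MatConvNet does. Substitute whichever MEX function your copy actually builds.

```matlab
% Name of a MEX function expected after compilation.
% ASSUMPTION: 'vl_nnconv' is taken from upstream MatConvNet and may differ
% in the version bundled with this repository.
mexName = 'vl_nnconv';

% exist(..., 'file') returns 3 when the name resolves to a MEX file.
status = exist(mexName, 'file');
isCompiled = (status == 3);

if isCompiled
    fprintf('%s is compiled and on the path.\n', mexName);
else
    fprintf('%s was not found as a MEX file (exist returned %d).\n', ...
        mexName, status);
end
```

If the check fails after a successful `make`, re-running `addpath(genpath('.'))` from the repository root is a reasonable first thing to try.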
## Running the code
In order to compute the ZAH descriptor of an input image, run
```
descriptor = extractAggregatedHypercolumns_zoning('img/1/1.jpg');
```

After the input file argument, the parameters are:
* modelchoice
    * 0 Use the [unigram model](https://github.com/sfikas/zah/blob/master/pretrained/models/charnet_layers.mat)
    * 1 Use the [bigram model](https://github.com/sfikas/zah/blob/master/pretrained/models/bigramsvtnet_layers.mat) (default choice)
    * 2 Use both
* layerchoice: Choose layers to use. You can select more than one layer. We have run trials with one or more of the following layers: ```3, 6, 11, 16``` (default choice is ```11```).
* centerprior: Prior that makes pixels near the center row more important. Input is the Gaussian precision. Zero precision corresponds to no smoothing. Default value is 6.
* resizeheight: Resize word image to this height. This should ideally be a value close to 24, i.e. the window with which the related CNN was originally trained. Default value is 30.

For example, the following command will extract a ZAH descriptor using only the unigram-trained CNN model, use activations of layers 3 and 6, apply a centerprior with precision equal to 3 and resize input to a height of 24 pixels:
```
descriptor = extractAggregatedHypercolumns_zoning('img/1/1.jpg', 0, [3 6], 3, 24);
```

## Batch extraction
Multiple images can be processed with ```batch_extract_zoning.m```. For example:
```
batchExtract_zoning('img/1/');
```

All files with extension '.jpg' that are found in the given folder will be processed.
If ```batchExtract_zoning``` is run without arguments, three files will be created, containing the result:
```
dimensions.txt
distance.txt
filenames.txt
```
The file ```dimensions.txt``` contains a single integer value. That is the dimensionality of the extracted per-word descriptors.
The file ```distance.txt``` contains one descriptor on each line.
The file ```filenames.txt``` gives the correspondence between lines in ```distance.txt``` and filenames.

## Acknowledgements
In the current work we make use of this third-party code/material:
* Two pretrained CNN models from [this work]. See the related [license].
* [matconvnet] code to perform feed-forward passes on the pretrained CNN models.

[related paper]:
[here]:
[this work]:
[convolutional network]:
[matconvnet]:
[license]: