{"id":20338035,"url":"https://github.com/sfikas/zah","last_synced_at":"2025-07-16T11:10:47.074Z","repository":{"id":114793718,"uuid":"70599450","full_name":"sfikas/zah","owner":"sfikas","description":"An image descriptor, suitable for keyword spotting, that is computed as a function of Convolutional Neural Network intermediate layer responses. Implemented in MATLAB.","archived":false,"fork":false,"pushed_at":"2020-04-13T13:22:02.000Z","size":27403,"stargazers_count":2,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-14T17:35:47.954Z","etag":null,"topics":["convolutional-networks","keyword-spotting","matconvnet","neural-networks"],"latest_commit_sha":null,"homepage":"","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sfikas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-10-11T14:02:27.000Z","updated_at":"2022-01-13T07:39:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"ba9c1509-d6d1-46ce-88d3-d120b46f0c56","html_url":"https://github.com/sfikas/zah","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfikas%2Fzah","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfikas%2Fzah/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfikas%2Fzah/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfikas%2Fzah/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/sfikas","download_url":"https://codeload.github.com/sfikas/zah/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241863071,"owners_count":20032930,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["convolutional-networks","keyword-spotting","matconvnet","neural-networks"],"created_at":"2024-11-14T21:11:23.383Z","updated_at":"2025-03-04T14:27:16.728Z","avatar_url":"https://github.com/sfikas.png","language":"MATLAB","funding_links":[],"categories":["Image Processing and Computer Vision"],"sub_categories":[],"readme":"# Zoning Aggregated Hypercolumns for Keyword Spotting #\r\n\r\nThis work introduces Zoning Aggregated Hypercolumn features (ZAH features).\r\nMotivated by recent research in machine vision, we use an appropriately pretrained convolutional network as a feature extraction tool.\r\nThe [convolutional network] was trained with [matconvnet] on a large collection of word images.\r\nThe resulting local cues are subsequently aggregated to form *word-level fixed-length descriptors*.\r\n\r\nThe Euclidean distance can then be used to compare and query the resulting descriptors of different word images (Query-by-Example keyword spotting).\r\n\r\nIf you find this work useful, please read and cite the [related paper]:\r\n```\r\n@inproceedings{sfikas2016zoning,\r\n  title={Zoning Aggregated Hypercolumns for Keyword Spotting},\r\n  author={Sfikas, Giorgos and Retsinas, Giorgos and Gatos, Basilis},\r\n  booktitle={15th International Conference on Frontiers in Handwriting Recognition (ICFHR)},\r\n  
year={2016},\r\n  organization={IEEE}\r\n}\r\n```\r\n\r\n## ZAH Workflow\r\n\r\nThe workflow is:\r\n\r\n1. The (normalised) image is split into zones\r\n2. Hypercolumn features are computed for each of the zones, using a pretrained convolutional neural network\r\n3. Hypercolumns are aggregated into a single feature vector per zone\r\n4. Per-zone features are concatenated into a single feature vector, which therefore describes the whole word image\r\n\r\nThe workflow is summarized in the following figure. A word image is the input (top), and a descriptor vector is returned as the output (bottom):\r\n\r\n![workflow](https://github.com/sfikas/zah/blob/master/img/workflow.png \"ZAH Workflow\")\r\n\r\n## Before running the code\r\n\r\nFirst you will have to compile some of the code with MATLAB MEX, and optionally enable GPU support:\r\n\r\n* In ```pretrained/matconvnet/Makefile```, set the MEX variable to the path of the ```mex``` executable on your system.\r\nFor example, this could be something similar to ```/usr/local/MATLAB/R2012a/bin/mex```.\r\n* (optional) Set ENABLE_GPU in the same file in order to use the GPU for extracting ZAH features.\r\n* Run ```cd pretrained/matconvnet/ \u0026\u0026 make distclean \u0026\u0026 make``` in the OS shell.\r\n\r\nAt the MATLAB prompt, add all repo subfolders to the path by running the following:\r\n\r\n* ```cd zah/```\r\n* ```addpath(genpath('.'))```\r\n\r\nNote that it is important to execute ```addpath``` *after* you have finished compiling the necessary items with MEX.\r\n\r\n## Running the code\r\n\r\nIn order to compute the ZAH descriptor of an input image, run\r\n\r\n```\r\ndescriptor = extractAggregatedHypercolumns_zoning('img/1/1.jpg');\r\n```\r\n\r\nThe parameters that follow the input file argument are:\r\n\r\n* modelchoice\r\n    * 0           Use the [unigram model](https://github.com/sfikas/zah/blob/master/pretrained/models/charnet_layers.mat)\r\n    * 1           Use the [bigram 
model](https://github.com/sfikas/zah/blob/master/pretrained/models/bigramsvtnet_layers.mat) (default choice)\r\n    * 2           Use both\r\n\r\n* layerchoice:  Choose layers to use. You can select more than one layer. We have run trials with one or more of the following layers: ```3, 6, 11, 16``` (default choice is ```11```).\r\n* centerprior:  Prior that makes pixels near the center row more important. Input is the Gaussian precision. Zero precision corresponds to no smoothing. Default value is 6.\r\n* resizeheight: Resize word image to this height. This should ideally be a value close to 24, i.e. the window with which the related CNN was originally trained. Default value is 30.\r\n\r\nFor example, the following command will extract a ZAH descriptor using only the unigram-trained CNN model and the activations of layers 3 and 6, applying a centerprior with precision equal to 3 and resizing the input to a height of 24 pixels:\r\n\r\n```\r\ndescriptor = extractAggregatedHypercolumns_zoning('img/1/1.jpg', 0, [3 6], 3, 24);\r\n```\r\n\r\n## Batch extraction\r\n\r\nMultiple images can be processed with ```batch_extract_zoning.m```. For example:\r\n\r\n```\r\nbatchExtract_zoning('img/1/');\r\n```\r\n\r\nAll files with extension '.jpg' that are found in the given folder will be processed.\r\n\r\nIf ```batchExtract_zoning``` is run without arguments, three files will be created, containing the result:\r\n```\r\ndimensions.txt\r\ndistance.txt\r\nfilenames.txt\r\n```\r\nThe file ```dimensions.txt``` contains a single integer value: the dimensionality of the extracted per-word descriptors.\r\nThe file ```distance.txt``` contains one descriptor on each line.\r\nThe file ```filenames.txt``` gives the correspondence between lines in ```distance.txt``` and filenames.\r\n\r\n## Acknowledgements\r\n\r\nIn the current work we make use of the following third-party code and material:\r\n\r\n* Two pretrained CNN models from [this work]. 
See the related [license].\r\n* [matconvnet] code to perform feed-forward passes on the pretrained CNN models.\r\n\r\n\r\n[related paper]: \u003chttp://www.cs.uoi.gr/~sfikas/2016ICFHR-ZAH.pdf\u003e\r\n[here]: \u003chttps://bitbucket.org/jaderberg/eccv2014_textspotting\u003e\r\n[this work]: \u003chttp://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14/jaderberg14.pdf\u003e\r\n[convolutional network]: \u003chttp://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14/jaderberg14.pdf\u003e\r\n[matconvnet]: \u003chttp://www.vlfeat.org/matconvnet/\u003e\r\n[license]: \u003chttps://github.com/sfikas/zah/blob/master/pretrained/LICENSE\u003e\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsfikas%2Fzah","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsfikas%2Fzah","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsfikas%2Fzah/lists"}