{"id":17923840,"url":"https://github.com/kevinzakka/spatial-transformer-network","last_synced_at":"2025-05-16T04:03:38.682Z","repository":{"id":41113889,"uuid":"78271665","full_name":"kevinzakka/spatial-transformer-network","owner":"kevinzakka","description":"A Tensorflow implementation of Spatial Transformer Networks.","archived":false,"fork":false,"pushed_at":"2018-06-02T22:13:17.000Z","size":4239,"stargazers_count":999,"open_issues_count":19,"forks_count":267,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-05-10T16:18:14.893Z","etag":null,"topics":["affine-transformation","attention","convnet","spatial-transformer-network","stn","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kevinzakka.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-07T10:02:58.000Z","updated_at":"2025-04-24T01:25:27.000Z","dependencies_parsed_at":"2022-08-10T01:35:36.161Z","dependency_job_id":null,"html_url":"https://github.com/kevinzakka/spatial-transformer-network","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevinzakka%2Fspatial-transformer-network","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevinzakka%2Fspatial-transformer-network/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevinzakka%2Fspatial-transformer-network/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kevinzakka%2Fspatial-transformer-network/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kevinzakka","download_url":"https://codeload.github.com/kevinzakka/spatial-transformer-network/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254464891,"owners_count":22075570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["affine-transformation","attention","convnet","spatial-transformer-network","stn","tensorflow"],"created_at":"2024-10-28T20:45:47.910Z","updated_at":"2025-05-16T04:03:38.661Z","avatar_url":"https://github.com/kevinzakka.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Spatial Transformer Networks\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"./img/transformation.png\" width=\"400px\"\u003e\n\u003c/p\u003e\n\nThis is a **TensorFlow** implementation of [Spatial Transformer Networks](https://arxiv.org/abs/1506.02025) by *Max Jaderberg, Karen Simonyan, Andrew Zisserman* and *Koray Kavukcuoglu*, accompanied by a two-part blog [tutorial series](https://kevinzakka.github.io/2017/01/18/stn-part2/).\n\nA *Spatial Transformer Network* (STN) is a differentiable module that can be inserted anywhere in a ConvNet architecture to increase its geometric invariance. 
It effectively gives the network the ability to spatially transform feature maps at no extra data or supervision cost.\n\n\n\n## Installation\n\nInstall the `stn` package using:\n\n```\npip3 install stn\n```\n\nThen, you can call the STN layer as follows:\n\n```python\nfrom stn import spatial_transformer_network as transformer\n\nout = transformer(input_feature_map, theta, out_dims)\n```\n\n**Parameters**\n\n- `input_feature_map`: the output of the layer preceding the localization network. If the STN layer is the first layer of the network, then this corresponds to the input images. Shape should be (B, H, W, C).\n- `theta`: the output of the localization network. Shape should be (B, 6).\n- `out_dims`: desired (H, W) of the output feature map. Useful for upsampling or downsampling. If not specified, the output dimensions will be equal to the `input_feature_map` dimensions.\n\n## Background Information\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"./img/stn.png\" width=\"400px\"\u003e\n\u003c/p\u003e\n\nThe STN is composed of 3 elements.\n\n- **localization network:** takes the feature map as input and outputs the parameters of the affine transformation that should be applied to that feature map.\n\n- **grid generator:** uses the parameters of the affine transformation to generate a grid of (x,y) coordinates. These coordinates are the points where the input feature map should be sampled to produce the transformed output feature map.\n\n- **bilinear sampler:** takes the input feature map and the grid produced by the grid generator, and produces the output feature map using bilinear interpolation.\n\nThe affine transformation is specified through the transformation matrix A\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"./img/general.png\" width=\"175px\"\u003e\n\u003c/p\u003e\n\nIt can be constrained to one of *attention* by writing it in the form\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"./img/attention.png\" 
width=\"175px\"\u003e\n\u003c/p\u003e\n\nwhere the parameters `s`, `t_x` and `t_y` can be regressed to allow cropping, translation, and isotropic scaling.\n\nFor a more in-depth explanation of STNs, read the two-part blog post: [part1](https://kevinzakka.github.io/2017/01/10/stn-part1/) and [part2](https://kevinzakka.github.io/2017/01/18/stn-part2/).\n\n## Explore\n\nRun the [Sanity Check](https://github.com/kevinzakka/spatial-transformer-network/blob/master/Sanity%20Check.ipynb) to get a feel for how the spatial transformer can be plugged into any existing code. For example, here's the result of a 45-degree rotation:\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"./img/b4.png\" alt=\"Drawing\" width=\"40%\"\u003e\n \u003cimg src=\"./img/after.png\" alt=\"Drawing\" width=\"40%\"\u003e\n\u003c/p\u003e\n\n**Usage Note**\n\nYou must define a localization network right before using this layer. The localization network is usually a ConvNet or an FC-net that has 6 output nodes (the 6 parameters of the affine transformation).\n\nIt is good practice to initialize the localization network to the identity transform before starting the training process. 
Here's a small code sample for illustration purposes.\n\n```python\nimport numpy as np\nimport tensorflow as tf  # uses the TensorFlow 1.x API (tf.placeholder)\nfrom stn import spatial_transformer_network as transformer\n\n# params\nn_fc = 6\nB, H, W, C = (2, 200, 200, 3)\n\n# identity transform, flattened to the 6 affine parameters\ninitial = np.array([[1., 0, 0], [0, 1., 0]])\ninitial = initial.astype('float32').flatten()\n\n# input placeholder\nx = tf.placeholder(tf.float32, [B, H, W, C])\n\n# localization network: zero weights and an identity bias,\n# so theta starts out as the identity transform\nW_fc1 = tf.Variable(tf.zeros([H*W*C, n_fc]), name='W_fc1')\nb_fc1 = tf.Variable(initial_value=initial, name='b_fc1')\nh_fc1 = tf.matmul(tf.reshape(x, [B, H*W*C]), W_fc1) + b_fc1\n\n# spatial transformer layer\nh_trans = transformer(x, h_fc1)\n```\n\n## Attribution\n\n- [Torch Blog Post on STN's](http://torch.ch/blog/2015/09/07/spatial_transformers.html)\n- [daviddao's Tensorflow Implementation](https://github.com/daviddao/spatial-transformer-tensorflow)\n- Shoutout to [Eder Santana](https://twitter.com/edersantana) for introducing and helping me understand the paper!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkevinzakka%2Fspatial-transformer-network","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkevinzakka%2Fspatial-transformer-network","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkevinzakka%2Fspatial-transformer-network/lists"}