{"id":13627921,"url":"https://github.com/saicoco/mxnet_image_caption","last_synced_at":"2025-04-17T00:32:48.512Z","repository":{"id":217075750,"uuid":"83970981","full_name":"saicoco/mxnet_image_caption","owner":"saicoco","description":"mxnet image caption(NIC)","archived":false,"fork":false,"pushed_at":"2017-04-17T14:32:53.000Z","size":186,"stargazers_count":9,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-08-01T22:41:40.427Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saicoco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-03-05T13:13:32.000Z","updated_at":"2022-09-02T18:09:36.000Z","dependencies_parsed_at":"2024-01-14T13:05:58.230Z","dependency_job_id":null,"html_url":"https://github.com/saicoco/mxnet_image_caption","commit_stats":null,"previous_names":["saicoco/mxnet_image_caption"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saicoco%2Fmxnet_image_caption","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saicoco%2Fmxnet_image_caption/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saicoco%2Fmxnet_image_caption/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saicoco%2Fmxnet_image_caption/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saicoco","download_url":"https://codeload.github.com/saicoco/mxnet_image_capt
ion/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223735167,"owners_count":17194059,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T22:00:40.691Z","updated_at":"2024-11-08T18:31:05.677Z","avatar_url":"https://github.com/saicoco.png","language":"Python","funding_links":[],"categories":["\u003ca name=\"Vision\"\u003e\u003c/a\u003e2. Vision"],"sub_categories":["2.14 Misc"],"readme":"## image caption generation  \n\nThis is a simple implementation of the paper Neural [Image Caption][^1], based on MXNet.  \nSome of the code is adapted from [where-image](https://github.com/mtanti/where-image).\n\n### Usage  \n\n1. Put the datasets and pretrained parameters in the `datasets` and `pre_train` directories. Here the pretrained model is VGG-16 and the dataset is Flickr8k; you can replace them with your own dataset and pretrained model. 
For Flickr8k, which includes images and captions, the captions are stored in dataset.json, which looks like the following:  \n```\n{\"images\": \n    [\n        {\"sentids\": [0, 1, 2, 3, 4], \n        \"imgid\": 0, \n        \"sentences\": [\n        {\"tokens\": [\"a\", \"black\", \"dog\", \"is\", \"running\", \"after\", \"a\", \"white\", \"dog\", \"in\", \"the\", \"snow\"], \"raw\": \"A black dog is running after a white dog in the snow .\", \"imgid\": 0, \"sentid\": 0}, \n        {\"tokens\": [\"black\", \"dog\", \"chasing\", \"brown\", \"dog\", \"through\", \"snow\"], \"raw\": \"Black dog chasing brown dog through snow\", \"imgid\": 0, \"sentid\": 1}, \n        {\"tokens\": [\"two\", \"dogs\", \"chase\", \"each\", \"other\", \"across\", \"the\", \"snowy\", \"ground\"], \"raw\": \"Two dogs chase each other across the snowy ground .\", \"imgid\": 0, \"sentid\": 2}, \n        {\"tokens\": [\"two\", \"dogs\", \"play\", \"together\", \"in\", \"the\", \"snow\"], \"raw\": \"Two dogs play together in the snow .\", \"imgid\": 0, \"sentid\": 3}, \n        {\"tokens\": [\"two\", \"dogs\", \"running\", \"through\", \"a\", \"low\", \"lying\", \"body\", \"of\", \"water\"], \"raw\": \"Two dogs running through a low lying body of water .\", \"imgid\": 0, \"sentid\": 4}\n                    ], \n        \"split\": \"train\", \"filename\": \"2513260012_03d33305cf.jpg\"}, ...\n    \n    ],\n\"datasets\": \"Flickr8k\"}\n```\nAlternatively, you can download preprocessed data from [here](http://cs.stanford.edu/people/karpathy/deepimagesent/), in which the images are already encoded as 4096-dimensional VGG features. Unzip it into the `datasets` directory,\nthen copy the file from the \"old\" directory into the root directory and run it; this is an older version of NIC.  \n\n\n2. After the data has been downloaded, run:  \n```\npython 1_preprocess_data.py\n```\nWhen it finishes, there will be a directory named \"processed_data\" containing the train, val and test sets, which are split according to the \"split\" key in dataset.json.\n\n3. 
Run `python 2_train_val.py` to train the model on your dataset and save the result.  \n\n4. The test stage (prediction) does not work yet: the symbol has to handle variable-length sequences, so `mx.mod.BucketingModule` is probably needed. I am still working on it; if you find a solution, feel free to open an issue.\n\n### Reference  \n[^1]: Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3156-3164.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaicoco%2Fmxnet_image_caption","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaicoco%2Fmxnet_image_caption","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaicoco%2Fmxnet_image_caption/lists"}