{"id":29027638,"url":"https://github.com/defi0x1/vietnamese-generator","last_synced_at":"2026-03-10T15:33:25.619Z","repository":{"id":43240295,"uuid":"253675409","full_name":"defi0x1/vietnamese-generator","owner":"defi0x1","description":null,"archived":false,"fork":false,"pushed_at":"2025-05-23T17:21:11.000Z","size":30791,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-26T06:46:01.438Z","etag":null,"topics":["data-generation","synthtext-vietnamese"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/defi0x1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-04-07T03:16:57.000Z","updated_at":"2024-12-26T08:45:16.000Z","dependencies_parsed_at":"2025-06-13T04:55:46.955Z","dependency_job_id":null,"html_url":"https://github.com/defi0x1/vietnamese-generator","commit_stats":null,"previous_names":["docongminh/vietnamese-generator","defi0x1/vietnamese-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/defi0x1/vietnamese-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defi0x1%2Fvietnamese-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defi0x1%2Fvietnamese-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defi0x1%2Fvietnamese-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defi0x1%2Fv
ietnamese-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/defi0x1","download_url":"https://codeload.github.com/defi0x1/vietnamese-generator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/defi0x1%2Fvietnamese-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30340117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T15:03:31.997Z","status":"ssl_error","status_checked_at":"2026-03-10T15:01:30.431Z","response_time":106,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-generation","synthtext-vietnamese"],"created_at":"2025-06-26T06:37:14.600Z","updated_at":"2026-03-10T15:33:25.592Z","avatar_url":"https://github.com/defi0x1.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SynthText\nCode for generating synthetic text images as described in [\"Synthetic Data for Text Localisation in Natural Images\", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016](http://www.robots.ox.ac.uk/~vgg/data/scenetext/).\n\n\n**Synthetic Scene-Text Image Samples**\n![Synthetic Scene-Text Samples](samples.png \"Synthetic Samples\")\n\nThe library is written in Python. 
The main dependencies are:\n\n```\npygame, opencv (cv2), PIL (Image), numpy, matplotlib, h5py, scipy\n```\n\n### Generating samples\n\n```\npython gen.py --viz\n```\n\nThis will download a data file (~56M) to the `data` directory. This data file includes:\n\n  - **dset.h5**: This is a sample h5 file which contains a set of 5 images along with their depth and segmentation information. Note, this is just given as an example; you are encouraged to add more images (along with their depth and segmentation information) to this database for your own use.\n  - **data/fonts**: three sample fonts (add more fonts to this folder and then update `fonts/fontlist.txt` with their paths).\n  - **data/newsgroup**: Text-source (from the News Group dataset). This can be substituted with any text file. Look inside `text_utils.py` to see how the text inside this file is used by the renderer.\n  - **data/models/colors_new.cp**: Color-model (foreground/background text color model), learnt from the IIIT-5K word dataset.\n  - **data/models**: Other cPickle files (**char\\_freq.cp**: frequency of each character in the text dataset; **font\\_px2pt.cp**: conversion from pt to px for various fonts. If you add a new font, make sure that the corresponding model is present in this file; if not, you can add it by adapting `invert_font_size.py`).\n\nThis script will generate random scene-text image samples and store them in an h5 file in `results/SynthText.h5`. If the `--viz` option is specified, the generated output will be visualized as the script is being run; omit the `--viz` option to turn off the visualizations. 
If you want to visualize the results stored in `results/SynthText.h5` later, run:\n\n```\npython visualize_results.py\n```\n### Pre-generated Dataset\nA dataset with approximately 800000 synthetic scene-text images generated with this code can be found [here](http://www.robots.ox.ac.uk/~vgg/data/scenetext/).\n\n### Adding New Images\nSegmentation and depth-maps are required to use new images as background. Sample scripts for obtaining these are available [here](https://github.com/ankush-me/SynthText/tree/master/prep_scripts).\n\n* `predict_depth.m` MATLAB script to regress a depth mask for a given RGB image; uses the network of [Liu et al.](https://bitbucket.org/fayao/dcnf-fcsp/) However, more recent works (e.g., [this](https://github.com/iro-cp/FCRN-DepthPrediction)) might give better results.\n* `run_ucm.m` and `floodFill.py` for getting segmentation masks using [gPb-UCM](https://github.com/jponttuset/mcg).\n\nFor an explanation of the fields in `dset.h5` (e.g.: `seg`,`area`,`label`), please check this [comment](https://github.com/ankush-me/SynthText/issues/5#issuecomment-274490044).\n\n### Pre-processed Background Images\nThe 8,000 background images used in the paper, along with their segmentation and depth masks, have been uploaded here:\n`http://zeus.robots.ox.ac.uk/textspot/static/db/\u003cfilename\u003e`, where `\u003cfilename\u003e` can be:\n\n- `imnames.cp` [180K]: names of filtered files, i.e., those files which do not contain text\n- `bg_img.tar.gz` [8.9G]: compressed image files (more than 8000, so only use the filtered ones in imnames.cp)\n- `depth.h5` [15G]: depth maps\n- `seg.h5` [6.9G]: segmentation maps\n\nNote: I do not own the copyright to these images.\n## Vietnamese\n\n* Added Vietnamese support\n* Added Arial and Times New Roman fonts\n* Added character statistics\n* Added a corpus containing 73913 words\n\n### Further Information\nPlease refer to the paper for more information, or contact me (email address in the 
paper).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdefi0x1%2Fvietnamese-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdefi0x1%2Fvietnamese-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdefi0x1%2Fvietnamese-generator/lists"}