{"id":24101362,"url":"https://github.com/yxtay/glove-tensorflow","last_synced_at":"2025-05-08T01:11:20.048Z","repository":{"id":37214476,"uuid":"163372321","full_name":"yxtay/glove-tensorflow","owner":"yxtay","description":"Implementation of GloVe using TensorFlow estimator API","archived":false,"fork":false,"pushed_at":"2025-02-11T04:07:03.000Z","size":377,"stargazers_count":9,"open_issues_count":3,"forks_count":5,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-31T16:02:34.986Z","etag":null,"topics":["cloud-ml-engine","docker","embedding","glove","google-cloud-ml-engine","google-cloud-platform","matrix-factorization","tensorboard","tensorflow","tensorflow-2","tensorflow-distributed","tensorflow-estimator-api","tensorflow-estimators","tensorflow-serving","word-embedding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yxtay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-12-28T06:03:02.000Z","updated_at":"2023-02-23T18:42:28.000Z","dependencies_parsed_at":"2023-01-30T01:30:57.944Z","dependency_job_id":null,"html_url":"https://github.com/yxtay/glove-tensorflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxtay%2Fglove-tensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxtay%2Fglove-tensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yxtay%2Fglove-tensorflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/yxtay%2Fglove-tensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yxtay","download_url":"https://codeload.github.com/yxtay/glove-tensorflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252978768,"owners_count":21834916,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-ml-engine","docker","embedding","glove","google-cloud-ml-engine","google-cloud-platform","matrix-factorization","tensorboard","tensorflow","tensorflow-2","tensorflow-distributed","tensorflow-estimator-api","tensorflow-estimators","tensorflow-serving","word-embedding"],"created_at":"2025-01-10T16:55:16.317Z","updated_at":"2025-05-08T01:11:20.026Z","avatar_url":"https://github.com/yxtay.png","language":"Python","readme":"# GloVe Model in TensorFlow\n\nImplementation of GloVe using TensorFlow estimator API.\n\nThe trainer module in this repository also allows for cloud model training and evaluation on Google Cloud Platform. 
Please refer to [cloud](cloud.md).\n\n## Setup\n\n```bash\nENV_NAME=glove-tensorflow\n\n# clone repo\ngit clone git@github.com:yxtay/glove-tensorflow.git \u0026\u0026 cd glove-tensorflow\n\n# create and activate conda environment\nconda create -n ${ENV_NAME} -y python=3.7\nconda activate ${ENV_NAME}\n\n# install requirements\n# make install-requirments\npip install -r requirements/main.txt -r requirements/dev.txt\n```\n\nYou may also use the accompanying Docker commands to avoid environment setup.\n\n## Download \u0026 Process Data\n\nThe [text8 dataset](http://mattmahoney.net/dc/textdata.html) is used for demonstration purposes.\nThe following script downloads the data and processes it to prepare the vocabulary and cooccurrence matrix. The data is serialised to `csv`.\n\n```bash\n# make data\npython -m src.data.text8\n```\n\n**With Docker**\n\n```bash\n# make docker-data\ndocker run --rm -w=/home \\\n  --mount type=bind,source=$(pwd),target=/home \\\n  continuumio/anaconda3:2019.10 \\\n  python -m src.data.text8\n```\n\n**Sample data**\n\n|   row_token_id |   col_token_id |   count |    value | row_token   | col_token   |   neg_weight |   glove_weight |   glove_value |\n|---------------:|---------------:|--------:|---------:|:------------|:------------|-------------:|---------------:|--------------:|\n|           6125 |             38 |      24 |  16.9500 | altogether  | not         |    0.6421    |       0.3428   |       2.83027 |\n|             18 |           1571 |     176 |  74.1000 | was         | prominent   |    7.5889    |       1.0000   |       4.30542 |\n|             91 |            372 |      19 |   5.4500 | th          | society     |    3.1999    |       0.2877   |       1.69562 |\n|            432 |            541 |      12 |   5.9000 | numbers     | note        |    0.6461    |       0.2038   |       1.77495 |\n|           1304 |            285 |      25 |  11.1667 | na          | europe      |    0.4112    |       0.3535   |       2.41293 |\n|  
           32 |             18 |    2312 | 723.2000 | be          | was         |  406.5180    |       1.0000   |       6.58369 |\n|           2247 |           1154 |     136 |  46.5833 | html        | www         |    0.0740    |       1.0000   |       3.84124 |\n|            710 |            229 |      18 |   9.0500 | cannot      | point       |    0.8569    |       0.2763   |       2.20276 |\n|            467 |           3756 |      12 |   5.2000 | style       | width       |    0.0911    |       0.2038   |       1.64866 |\n|             80 |            543 |      35 |  20.6333 | over        | lost        |    2.6989    |       0.4550   |       3.02691 | \n\n**Usage**\n\n```\nusage: text8.py [-h] [--url URL] [--dest DEST] [--vocab-size VOCAB_SIZE]\n                [--coverage COVERAGE] [--context-size CONTEXT_SIZE] [--reset]\n                [--log-path LOG_PATH]\n\nDownload, extract and prepare text8 data.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --url URL             url of text8 data (default:\n                        http://mattmahoney.net/dc/text8.zip)\n  --dest DEST           destination directory for downloaded and extracted\n                        files (default: data)\n  --vocab-size VOCAB_SIZE\n                        maximum size of vocab (default: None)\n  --coverage COVERAGE   token coverage to set token count cutoff (default:\n                        0.9)\n  --context-size CONTEXT_SIZE\n                        size of context window (default: 5)\n  --reset               whether to recompute interactions\n  --log-path LOG_PATH   path of log file (default: main.log)\n```\n\n## Train GloVe\n\n### Estimator\n\n```bash\n# make train\npython -m trainer.estimator\n```\n\n**With Docker**\n\n```bash\n# make docker-train\ndocker run --rm -w=/home \\\n  --mount type=bind,source=$(pwd),target=/home \\\n  tensorflow/tensorflow:2.1.0-py3 \\\n  python -m trainer.estimator\n```\n\n**Usage**\n\n```\nusage: estimator.py 
[-h] [--train-csv TRAIN_CSV] [--vocab-txt VOCAB_TXT]\n                    [--row-name ROW_NAME] [--col-name COL_NAME]\n                    [--target-name TARGET_NAME] [--weight-name WEIGHT_NAME]\n                    [--pos-name POS_NAME] [--neg-name NEG_NAME]\n                    [--job-dir JOB_DIR] [--disable-datetime-path]\n                    [--embedding-size EMBEDDING_SIZE] [--l2-reg L2_REG]\n                    [--neg-factor NEG_FACTOR] [--optimizer OPTIMIZER]\n                    [--learning-rate LEARNING_RATE] [--batch-size BATCH_SIZE]\n                    [--train-steps TRAIN_STEPS]\n                    [--steps-per-epoch STEPS_PER_EPOCH] [--top-k TOP_K]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --train-csv TRAIN_CSV\n                        path to the training csv data (default:\n                        data/interaction.csv)\n  --vocab-txt VOCAB_TXT\n                        path to the vocab txt (default: data/vocab.txt)\n  --row-name ROW_NAME   row id name (default: row_token)\n  --col-name COL_NAME   column id name (default: col_token)\n  --target-name TARGET_NAME\n                        target name (default: glove_value)\n  --weight-name WEIGHT_NAME\n                        weight name (default: glove_weight)\n  --pos-name POS_NAME   positive name (default: value)\n  --neg-name NEG_NAME   negative name (default: neg_weight)\n  --job-dir JOB_DIR     job directory (default: checkpoints/glove)\n  --disable-datetime-path\n                        flag whether to disable appending datetime in job_dir\n                        path (default: False)\n  --embedding-size EMBEDDING_SIZE\n                        embedding size (default: 64)\n  --l2-reg L2_REG       scale of l2 regularisation (default: 0.01)\n  --neg-factor NEG_FACTOR\n                        negative loss factor (default: 1.0)\n  --optimizer OPTIMIZER\n                        name of optimizer (default: Adam)\n  --learning-rate LEARNING_RATE\n                        learning rate (default: 0.001)\n  --batch-size BATCH_SIZE\n                        batch size (default: 1024)\n  --train-steps TRAIN_STEPS\n                        number of training steps (default: 16384)\n  --steps-per-epoch STEPS_PER_EPOCH\n                        number of steps per checkpoint (default: 16384)\n  --top-k TOP_K         number of similar items (default: 20)\n```\n\n### Keras\n\n```bash\n# make train MODEL_NAME=keras\npython -m trainer.keras\n```\n\n### Logistic Matrix Factorisation\n\n```bash\n# make train MODEL_NAME=logistic_matrix_factorisation\npython -m trainer.logistic_matrix_factorisation\n```\n\n## Tensorboard\n\nYou may inspect model training metrics with Tensorboard.\n\n```bash\n# make tensorboard\nCHECKPOINTS_DIR=checkpoints\n\ntensorboard --logdir ${CHECKPOINTS_DIR}\n```\n\n**With Docker**\n\n```bash\n# make docker-tensorboard\nCHECKPOINTS_DIR=checkpoints\n\ndocker run --rm -w=/home -p 6006:6006 \\\n  --mount type=bind,source=$(pwd),target=/home \\\n  tensorflow/tensorflow:2.1.0-py3 \\\n  tensorboard --logdir ${CHECKPOINTS_DIR}\n```\n\nAccess [Tensorboard](http://localhost:6006/) in your browser.\n\n## TensorFlow Serving\n\nThe trained and serialised model may be served with TensorFlow Serving.\n\n```bash\n# make serving MODEL_NAME=glove\nJOB_DIR=checkpoints/glove_estimator\nMODEL_NAME=glove\n\ndocker run --rm -p 8500:8500 -p 8501:8501 \\\n  --mount type=bind,source=$(pwd)/${JOB_DIR}/export/exporter,target=/models/${MODEL_NAME} \\\n  -e MODEL_NAME=${MODEL_NAME} -t tensorflow/serving:2.1.0\n```\n\n**Model signature**\n\n```bash\n# make saved-model-cli JOB_DIR=checkpoints/glove_estimator/export/exporter/1582880583\nJOB_DIR=checkpoints/glove_estimator/export/exporter/1582880583\n\nsaved_model_cli show --all --dir ${JOB_DIR}\n```\n\n```\nMetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n\nsignature_def['serving_default']:\n  The given SavedModel SignatureDef contains the following input(s):\n    
inputs['col_token'] tensor_info:\n        dtype: DT_STRING\n        shape: (-1)\n        name: col_token:0\n    inputs['row_token'] tensor_info:\n        dtype: DT_STRING\n        shape: (-1)\n        name: row_token:0\n  The given SavedModel SignatureDef contains the following output(s):\n    outputs['input_embedding'] tensor_info:\n        dtype: DT_FLOAT\n        shape: (-1, 64)\n        name: predictions/row_embedding/embedding_lookup/Identity_1:0\n    outputs['input_string'] tensor_info:\n        dtype: DT_STRING\n        shape: (-1)\n        name: predictions/input_string_lookup/LookupTableFindV2:0\n    outputs['top_k_similarity'] tensor_info:\n        dtype: DT_FLOAT\n        shape: (-1, 20)\n        name: predictions/top_k_sim:0\n    outputs['top_k_string'] tensor_info:\n        dtype: DT_STRING\n        shape: (-1, 20)\n        name: predictions/top_k_string_lookup/LookupTableFindV2:0\n  Method name is: tensorflow/serving/predict\n```\n\nOnce served, you may query the model with the following command.\n\nSample request\n\n```bash\n# make query MODEL_NAME=glove\nMODEL_NAME=glove\n\ncurl -X POST \\\n  http://localhost:8501/v1/models/${MODEL_NAME}:predict \\\n  -d '{\"instances\": [{\"row_token\": \"man\", \"col_token\": \"man\"}]}'\n```\n\nSample response\n\n```\n{\n  \"predictions\": [\n    {\n      \"input_embedding\": [\n        -0.39519611,\n        -0.000384220504,\n        0.360801637,\n        0.71601522,\n        -0.425830722,\n        -0.259146929,\n        -0.13219744,\n        0.307031065,\n        0.695665479,\n        -0.504015446,\n        ...\n      ],\n      \"input_string\": \"man\",\n      \"top_k_similarity\": [\n        0.99999994,\n        0.725774705,\n        0.707765281,\n        0.693533063,\n        0.679038405,\n        0.646895647,\n        0.642417192,\n        0.640380502,\n        0.63178885,\n        0.631023884,\n        ...\n      ],\n      \"top_k_string\": [\n        \"man\",\n        \"person\",\n        \"god\",\n        
\"woman\",\n        \"young\",\n        \"movie\",\n        \"great\",\n        \"good\",\n        \"himself\",\n        \"son\",\n        ...\n      ]\n    }\n  ]\n}\n```\n\n## Cloud\n\nFor cloud model training and evaluation, please refer to [cloud](cloud.md).\n\n## References\n\n- Mahoney, M. (2006). Large Text Compression Benchmark. Retrieved from [http://mattmahoney.net/dc/text.html](http://mattmahoney.net/dc/text.html).\n- Pennington, J., Socher, R. and Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. \\[[pdf](https://nlp.stanford.edu/pubs/glove.pdf)\\]\\[[bib](https://nlp.stanford.edu/pubs/glove.bib)\\]\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyxtay%2Fglove-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyxtay%2Fglove-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyxtay%2Fglove-tensorflow/lists"}