{"id":16864397,"url":"https://github.com/stonesjtu/basis-embedding","last_synced_at":"2025-06-24T08:36:56.659Z","repository":{"id":86707554,"uuid":"98866975","full_name":"Stonesjtu/basis-embedding","owner":"Stonesjtu","description":"basis embedding: a product quantization based model compression method for language models.","archived":false,"fork":false,"pushed_at":"2024-10-29T05:53:50.000Z","size":47943,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-25T06:41:32.819Z","etag":null,"topics":["language-model","model-compression","product-quantization","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stonesjtu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-31T08:35:27.000Z","updated_at":"2025-01-04T17:34:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"a146e984-3ef8-4207-9a41-1624b1084131","html_url":"https://github.com/Stonesjtu/basis-embedding","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stonesjtu%2Fbasis-embedding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stonesjtu%2Fbasis-embedding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stonesjtu%2Fbasis-embedding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stonesjtu%2Fbasis-embedding/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Stonesjtu","download_url":"https://codeload.github.com/Stonesjtu/basis-embedding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248369295,"owners_count":21092550,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-model","model-compression","product-quantization","pytorch"],"created_at":"2024-10-13T14:42:11.012Z","updated_at":"2025-04-11T09:42:43.418Z","avatar_url":"https://github.com/Stonesjtu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## basis embedding\n\n\u003e code for Structured Word Embedding for Low Memory Neural Network Language Model\n\nThe code repo for basis embedding, which reduces model size and memory consumption.\nThis repo is based on the pytorch/examples repo on GitHub.\n\n### Parameters Introduction\nbasis embedding related arguments:\n  - `--basis` \u003c0\u003e: number of bases used to decompose the embedding matrix; 0 is normal mode\n  - `--num_clusters`: number of clusters for all the vocabulary\n  - `--load_input_embedding`: path of pre-trained embedding matrix for input embedding\n  - `--load_output_embedding`: path of pre-trained embedding matrix for output embedding\n\nmisc options:\n  - `-c` or `--config`: the path to the configuration file; it overrides the argument parser's\n  default values and is overridden by command line options\n  - `--train`: train, or just evaluate an existing model\n  - `--dict \u003cNone\u003e`: use vocabulary file if specified, otherwise use 
the words in train.txt\n\n### examples\n\n```bash\npython main.py -c config/default.conf  # train a cross-entropy baseline\npython main.py -c config/ptb_basis_tied.conf # basis embedding inited via tied embedding on ptb\n```\nDuring training, if a keyboard interrupt (Ctrl-C) is received,\ntraining is stopped and the current model is evaluated against the test dataset.\n\nThe `main.py` script accepts the following arguments:\n\n```bash\noptional arguments:\n  -h, --help         show this help message and exit\n  -c, --config PATH  preset configurations to load\n  --data DATA        location of the data corpus\n  --model MODEL      type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)\n  --emsize EMSIZE    size of word embeddings\n  --nhid NHID        number of hidden units per layer\n  --nlayers NLAYERS  number of layers\n  --lr LR            initial learning rate\n  --clip CLIP        gradient clipping\n  --epochs EPOCHS    upper epoch limit\n  --batch-size N     batch size\n  --dropout DROPOUT  dropout applied to layers (0 = no dropout)\n  --tied             tie the word embedding and softmax weights\n  --seed SEED        random seed\n  --cuda             use CUDA\n  --log-interval N   report interval\n  --save SAVE        path to save the final model\n  ... 
more from previous basis embedding related parameters\n```\n\n\n### File Hierarchy\n\n- main.py: the entry file; it parses the parameters, defines models\nand feeds the data to the model\n- model.py: defines the input embedding and LSTM layer\n- basis_loss.py: contains a basis linear module that takes the LSTM hidden state as input and outputs the loss value\n- basis/: core part of the basis embedding module\n- utils.py: performs product quantization on pre-trained embeddings\n- data.py: data pre-processing\n- .th/.th.decoder: the pre-trained embedding matrix\n- .conf: sample configuration files\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstonesjtu%2Fbasis-embedding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstonesjtu%2Fbasis-embedding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstonesjtu%2Fbasis-embedding/lists"}