{"id":19373651,"url":"https://github.com/4ai/bellm","last_synced_at":"2025-04-23T17:32:05.556Z","repository":{"id":227610879,"uuid":"771918034","full_name":"4AI/BeLLM","owner":"4AI","description":"Code for BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings (NAACL2024)","archived":false,"fork":false,"pushed_at":"2024-06-13T05:19:35.000Z","size":253,"stargazers_count":7,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T17:52:57.060Z","etag":null,"topics":["sentence-embeddings"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2311.05296","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/4AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-14T07:35:53.000Z","updated_at":"2025-03-06T15:05:58.000Z","dependencies_parsed_at":"2024-04-26T05:35:56.482Z","dependency_job_id":"92bb2fcb-4988-4ebe-9cb5-dfd937160904","html_url":"https://github.com/4AI/BeLLM","commit_stats":null,"previous_names":["4ai/bellm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2FBeLLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2FBeLLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2FBeLLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/4AI%2FBeLLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/4AI","download_url":"ht
tps://codeload.github.com/4AI/BeLLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250480584,"owners_count":21437574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sentence-embeddings"],"created_at":"2024-11-10T08:30:40.916Z","updated_at":"2025-04-23T17:32:05.151Z","avatar_url":"https://github.com/4AI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BeLLM\n\nBeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings (NAACL 2024)\n\narXiv: https://arxiv.org/abs/2311.05296\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"./assets/framework.jpg\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n💡 **Highlight**: To the best of our knowledge, **our work is the first to extensively investigate the effects of backward dependencies in autoregressive LLM architectures for sentence embedding learning**.\n\n## Pretrained Models:\n\n- [SeanLee97/bellm-llama-7b-nli](https://huggingface.co/SeanLee97/bellm-llama-7b-nli)\n\n\n## Training\n\n\n### 1. Installation\n\n`angle_emb` and `billm` are required. You can install them by running the following command:\n\n```bash\npython -m pip install -r requirements.txt\n```\n\n### 2. 
Dataset\n\nWe trained our models on the SNLI and MultiNLI datasets, which can be downloaded together from sentence-transformers: https://sbert.net/datasets/AllNLI.tsv.gz\n\nWe use the following preprocessing steps to obtain the training set:\n- Transform the original format to `{\"text\": \"text\", \"positive\": \"positive of text\", \"negative\": \"negative of text\"}`.\n- Augment the negative samples with retrieval and reranking techniques.\n\nWe have pushed the processed training sets to Hugging Face:\n- [SeanLee97/all_nli_angle_format_b](https://huggingface.co/datasets/SeanLee97/all_nli_angle_format_b)\n- [SeanLee97/all_nli_aug_angle_format_b](https://huggingface.co/datasets/SeanLee97/all_nli_aug_angle_format_b)\n\n\n### 3. Training\n\n1) train on the processed NLI data:\n\n```bash\nBiLLM_START_INDEX=31 WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train.py \\\n--train_name_or_path SeanLee97/all_nli_angle_format_b \\\n--save_dir ckpts/bellm-llama-7b-nli \\\n--model_name NousResearch/Llama-2-7b-chat-hf \\\n--prompt_template 'The representative word for sentence {text} is:\"' \\\n--pooling_strategy avg \\\n--ibn_w 20.0 --cosine_w 0.0 --angle_w 1.0 --learning_rate 2e-4 --maxlen 60 \\\n--apply_lora 1 --lora_r 64 --lora_alpha 128 --lora_dropout 0.1 \\\n--is_llm 1 --apply_billm 1 --billm_model_class LlamaForCausalLM \\\n--push_to_hub 0 \\\n--logging_steps 5 --save_steps 50 --warmup_steps 80 --batch_size 256 --seed 42 --load_kbit 4 \\\n--gradient_accumulation_steps 32 --epochs 3 --fp16 1\n```\n\nIf you want to push the model to Hugging Face automatically, you can add the following extra arguments:\n\n```bash\n--push_to_hub 1 \\\n--hub_model_id {YOUR_MODEL_ID} \\\n--hub_private_repo 1\n```\n\n2) continue fine-tuning on the augmented data:\n\n```bash\nBiLLM_START_INDEX=31 WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train.py \\\n--train_name_or_path SeanLee97/all_nli_aug_angle_format_b \\\n--pretrained_lora_path 
ckpts/bellm-llama-7b-nli \\\n--save_dir ckpts/bellm-llama-7b-nli-2 \\\n--model_name NousResearch/Llama-2-7b-hf \\\n--ibn_w 1.0 --cosine_w 0.0 --angle_w 0.0 --learning_rate 2e-4 --maxlen 60 \\\n--is_llm 1 --apply_lora 1 --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \\\n--push_to_hub 0 \\\n--save_steps 200 --batch_size 256 --seed 42 --load_kbit 4 --gradient_accumulation_steps 32 --epochs 3 --fp16 1\n```\n\n\n**Tips:**\n\n- Here we use only the contrastive learning loss (`ibn_w = 1.0`, `cosine_w = 0.0`, `angle_w = 0.0`). **It is recommended to use AnglE (set `angle_w` \u003e 0) to further improve the performance.**\n- `BiLLM_START_INDEX=31` makes the layers with index 31 and above bidirectional. Since LLaMA-7B has 32 layers (indexed 0 to 31), `BiLLM_START_INDEX=31` makes only the final layer bidirectional.\n\n\n### 4. Evaluation\n\n1) download the SentEval datasets\n\n```bash\ncd SentEval/data\nsh download_dataset.sh\n```\n\n2) evaluate on the STS benchmarks\n\n```bash\nBiLLM_START_INDEX=31 CUDA_VISIBLE_DEVICES=0 python eval_sts.py \\\n--model_name_or_path NousResearch/Llama-2-7b-hf \\\n--lora_name_or_path SeanLee97/bellm-llama-7b-nli \\\n--apply_bfloat16 0\n```\n\nResults:\n\n```\n+-------+-------+-------+-------+-------+--------------+-----------------+-------+\n| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness |  Avg. |\n+-------+-------+-------+-------+-------+--------------+-----------------+-------+\n| 78.36 | 90.88 | 86.28 | 89.89 | 86.59 |    88.89     |      83.17      | 86.29 |\n+-------+-------+-------+-------+-------+--------------+-----------------+-------+\n```\n\n\n### 5. Inference\n\nHere, we combine AnglE and BiLLM for inference.\n\n```python\nimport os\n# set environment variable for BiLLM_START_INDEX before importing the model\nos.environ['BiLLM_START_INDEX'] = '31'\nos.environ['CUDA_VISIBLE_DEVICES'] = '0'\n\nfrom scipy import spatial\n\nfrom model import AnglE\n\n\n# 1. 
load model\nmodel = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/bellm-llama-7b-nli').cuda()\n\n# 2. set prompt\nmodel.set_prompt(prompt='The representative word for sentence {text} is:\"')\n\n# 3. encode\ndocs = ['I like apples', 'I like fruit', 'i am hiking.']\nvecs = model.encode([{'text': doc} for doc in docs])\n\nprint('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))\nprint('cos sim (0, 2):', 1 - spatial.distance.cosine(vecs[0], vecs[2]))\nprint('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))\n```\n\nOutput:\n\n```\ncos sim (0, 1): 0.8061720132827759\ncos sim (0, 2): 0.2913861870765686\ncos sim (1, 2): 0.29943591356277466\n```\n\n### 6. Fine-tuning\n\nYou can fine-tune the model on your own dataset by pointing `--pretrained_lora_path` at one of our pre-trained LoRA models.\n\n\n\n## Citation:\n\n```bibtex\n@inproceedings{li2024bellm,\n    title = \"BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings\",\n    author = \"Li, Xianming and Li, Jing\",\n    booktitle = \"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics\",\n    year = \"2024\",\n    publisher = \"Association for Computational Linguistics\"\n}\n```\n\n## 🌐 Friendship Link\n\nWelcome to follow related works:\n\n- AnglE (BeLLM's elder sister 👭): https://arxiv.org/abs/2309.12871\n- LS-LLaMA (BeLLM's father 👨🏻): https://arxiv.org/abs/2310.01208\n- We are happy to have you here! Feel free to open an issue (with a title starting with [Friendship Request]) to suggest related work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4ai%2Fbellm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F4ai%2Fbellm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F4ai%2Fbellm/lists"}