{"id":20515411,"url":"https://github.com/snap-research/locomo","last_synced_at":"2025-04-14T00:22:37.540Z","repository":{"id":224625719,"uuid":"762416453","full_name":"snap-research/locomo","owner":"snap-research","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-13T03:21:32.000Z","size":7678,"stargazers_count":33,"open_issues_count":5,"forks_count":5,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-27T14:21:56.645Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snap-research.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-23T18:28:28.000Z","updated_at":"2025-03-25T03:10:36.000Z","dependencies_parsed_at":"2024-08-12T12:43:25.541Z","dependency_job_id":"15a1b320-bb03-4556-9307-33294359fc89","html_url":"https://github.com/snap-research/locomo","commit_stats":null,"previous_names":["snap-research/locomo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Flocomo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Flocomo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Flocomo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Flocomo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snap-research","download_url":"https://codeload.github.co
m/snap-research/locomo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248800053,"owners_count":21163404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T21:21:34.791Z","updated_at":"2025-04-14T00:22:37.495Z","avatar_url":"https://github.com/snap-research.png","language":"Python","funding_links":[],"categories":["Python","Benchmarks and Evaluation"],"sub_categories":["Embodied AI and Robotics"],"readme":"# Data and Code for the **ACL 2024** Paper \"**Evaluating Very Long-Term Conversational Memory of LLM Agents**\"\r\n**Authors**: [Adyasha Maharana](https://adymaharana.github.io/), [Dong-Ho Lee](https://www.danny-lee.info/), [Sergey Tulyakov](https://stulyakov.com/), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/), [Francesco Barbieri](https://fvancesco.github.io/) and [Yuwei Fang](https://yuwfan.github.io/)\r\n\r\n**Paper**: [pdf](https://github.com/snap-research/locomo/tree/main/static/paper/locomo.pdf)\r\n\r\n## Data\r\n\r\nWe release LoCoMo, a high-quality evaluation benchmark consisting of *very* long-term conversational data. The benchmark consists of ten conversations. Each conversation is annotated for the **question-answering** and **event-summarization** tasks. Additionally, the dialogs in each conversation can be used for the **multimodal-dialog-generation** task. See statistics of the dataset in the Table below.\r\n\r\n![image](./static/images/locomo_example_stats.png)\r\n\r\nThe dataset can be found in the ```./data/locomo10.json``` file in this repository. 
Each sample represents a single conversation and its corresponding annotations: \r\n* `sample_id`: identifier for the sample\r\n* `conversation`: \r\n    * List of sessions (`session_\u003cnum\u003e`) and their timestamps (`session_\u003cnum\u003e_date_time`). The numbers `\u003cnum\u003e` represent the chronological order of the sessions.\r\n    * It also includes the names of the two speakers, i.e., `speaker_a` and `speaker_b`. \r\n    * A *turn* within each *session* contains the name of the `speaker`, the dialog id `dia_id`, and content of the dialog `text`. \r\n    * If the turn contains images, it also includes a link to the image `img_url`, caption generated by the [BLIP](https://huggingface.co/Salesforce/blip-image-captioning-large) model for the image `blip_caption` and the search query used by the third-party module [icrawler](https://icrawler.readthedocs.io/en/latest/) to retrieve the image.\r\n* `observation` (generated): Observations for each of the sessions in `conversation` (`session_\u003cnum\u003e_observation`). See below for the code to regenerate observations. These observations are used as one of the databases for evaluating retrieval-augmented generation (RAG) models in our paper.\r\n* `session_summary` (generated): Session-level summaries for each session in `conversation` (`session_\u003cnum\u003e_summary`). See below for the code to regenerate session-level summaries. These summaries are also used as one of the databases for evaluating RAG models in our paper.\r\n* `event_summary` (annotated): List of significant events for each speaker within each session in `conversation` (`events_session_\u003cnum\u003e`). These are the ground truth annotations for the event summarization task in the LoCoMo dataset.\r\n* `qa` (annotated): Question-answer annotations for the question answering task in the LoCoMo dataset. 
Each sample contains `question`, `answer`, `category` label and a list of dialog ids that contain the answer i.e., `evidence`, when available.\r\n\r\n\r\n**Note 1**: This release is a subset of the conversations released previously with our first Arxiv version in March 2024. The initial release contained 50 conversations. We sampled a subset of the data to retain the longest conversations with high-quality annotations and for cost-effective evaluation of closed-source LLMs.\r\n\r\n**Note 2**: We do not release the images. However, the web URLs, captions and search queries for the images are included in the dataset.\r\n\r\n\r\n## Code\r\n\r\nConfiguration variables like API keys, output directories etc. are set in `scripts/env.sh` and run at the beginning of all other scripts.\r\n\r\n### Generate *very* long-term conversations between two LLM-agents with pre-assigned personalities using our LLM-based generative framework\r\nThe code to generate conversations is available in `scripts/generate_conversations.sh` and can be run as follows:\r\n```\r\nbash scripts/generate_conversations.sh\r\n```\r\n\r\nThis code can be run under two settings:\r\n1. Generate conversations between agents assigned with custom personas. To enable this setting, point `--out-dir` to a directory containing the files `agent_a.json` and `agent_b.json`. These files should contain the `name` and `persona_summary` of the speaker represented by the agent. See an example at `data/multimodal_dialog/example`.\r\n\r\n```\r\n{\r\n  \"name\": \"Angela\",\r\n  \"persona_summary\": \"Angela is a 31 year old woman who works as the manager of a gift shop in Chapel Hill. She curates interesting pieces from local artists and has maintained a beautiful gallery in the form of the gift shop. She also makes her own art sometimes, in the form of oil paintings.\"\r\n}\r\n```\r\n\r\n2. Create personalities using prompts from the MSC dataset. To enable this setting, point `--out-dir` to an empty directory. 
This will make the script sample a pair of personalities from `data/msc_personas_all.json`.\r\n\r\nSee `scripts/generate_conversations.py` for details on the various parameters that can be tweaked for generating the conversations. For example, `--num-days` can be changed to specify the temporal span of the conversations.\r\n\r\n### Evaluate open-source and closed-source LLMs on the LoCoMo Question Answering Task with the (truncated) conversation as context\r\n\r\n* Evaluate OpenAI models\r\n```\r\nbash scripts/evaluate_gpts.sh\r\n```\r\n\r\n* Evaluate Anthropic models\r\n```\r\nbash scripts/evaluate_claude.sh\r\n```\r\n\r\n* Evaluate Gemini models\r\n```\r\nbash scripts/evaluate_gemini.sh\r\n```\r\n\r\n* Evaluate models available on Huggingface\r\n```\r\nbash scripts/evaluate_hf_llm.sh\r\n```\r\n\r\n### Generate observations and session summaries from LoCoMo conversations using `gpt-3.5-turbo` for evaluating RAG-based models\r\nWe provide the observations and summaries with our release of the LoCoMo dataset. Follow these instructions to re-generate the same or for a different set of conversations.\r\n\r\n* Generate observations from all sessions:\r\n```\r\nbash scripts/generate_observations.sh\r\n```\r\n\r\n* Generate summary of each session:\r\n```\r\nbash scripts/generate_session_summaries.sh\r\n```\r\n**Note 3**: Session-summaries are different from the event summaries of the event summarization task. 
The former summarize only a single session whereas event summaries are specific to each speaker and contain causal, temporal connections across sessions.\r\n\r\n\r\n### Evaluate retrieval-augmented `gpt-3.5-turbo` on the LoCoMo question-answering task using (a) dialogs, (b) observations and (c) session summaries as databases.\r\n* Evaluate `gpt-3.5-turbo` using retrieval-based augmentation\r\n```\r\nbash scripts/evaluate_rag_gpts.sh\r\n```\r\n\r\n### Evaluate models on the event summarization task\r\n\r\nComing soon!\r\n\r\n### Train and evaluate `MiniGPT-5` models on the multimodal dialog generation task\r\n\r\nComing soon!\r\n\r\n\r\n### Reference\r\nPlease cite our paper if you use LoCoMo in your work:\r\n```bibtex\r\n\r\n@article{maharana2024evaluating,\r\n  title={Evaluating very long-term conversational memory of llm agents},\r\n  author={Maharana, Adyasha and Lee, Dong-Ho and Tulyakov, Sergey and Bansal, Mohit and Barbieri, Francesco and Fang, Yuwei},\r\n  journal={arXiv preprint arXiv:2402.17753},\r\n  year={2024}\r\n}\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnap-research%2Flocomo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnap-research%2Flocomo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnap-research%2Flocomo/lists"}