{"id":13646186,"url":"https://github.com/thu-keg/chatlog","last_synced_at":"2025-05-13T17:59:11.587Z","repository":{"id":157777964,"uuid":"628623804","full_name":"THU-KEG/ChatLog","owner":"THU-KEG","description":"⏳ ChatLog: Recording and Analysing ChatGPT Across Time","archived":false,"fork":false,"pushed_at":"2024-05-30T02:29:44.000Z","size":6472,"stargazers_count":97,"open_issues_count":0,"forks_count":3,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-21T17:43:24.067Z","etag":null,"topics":["chatgpt","detection","evaluation","feature-extraction","knowledge","linguistic-analysis","time-series-analysis"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2304.14106","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THU-KEG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-16T14:30:53.000Z","updated_at":"2025-04-09T06:44:04.000Z","dependencies_parsed_at":"2024-01-14T10:12:12.655Z","dependency_job_id":"2e571e20-30eb-4fd7-91fa-9e887ffeb575","html_url":"https://github.com/THU-KEG/ChatLog","commit_stats":{"total_commits":22,"total_committers":2,"mean_commits":11.0,"dds":"0.045454545454545414","last_synced_commit":"aaeeb3c89c630d43b0a315e9b786a91e5e26abd5"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THU-KEG%2FChatLog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THU-KEG%2FChatLog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/THU-KEG%2FChatLog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THU-KEG%2FChatLog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THU-KEG","download_url":"https://codeload.github.com/THU-KEG/ChatLog/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254000119,"owners_count":21997389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","detection","evaluation","feature-extraction","knowledge","linguistic-analysis","time-series-analysis"],"created_at":"2024-08-02T01:02:50.184Z","updated_at":"2025-05-13T17:59:11.561Z","avatar_url":"https://github.com/THU-KEG.png","language":"Jupyter Notebook","funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# ⏳ ChatLog: Recording and Analysing ChatGPT Across Time\n\n# Overview\nThis repository stores data and code for the paper `ChatLog: Recording and Analysing ChatGPT Across Time` [[abs](https://arxiv.org/abs/2304.14106)][[pdf](https://arxiv.org/pdf/2304.14106.pdf)].\n\nWhile there is abundant research on evaluating ChatGPT on natural language understanding and generation tasks, few studies have investigated how ChatGPT's behavior changes over time. In this paper, we collect a coarse-to-fine temporal dataset called ChatLog, consisting of two parts that are updated monthly and daily: **ChatLog-Monthly** is a dataset of **38,730** question-answer pairs collected every month, including questions from both reasoning and classification tasks. 
**ChatLog-Daily**, on the other hand, consists of ChatGPT's responses to **1000** identical questions for long-form generation **every day**. We conduct comprehensive automatic and human evaluation to provide evidence for the existence of evolving patterns in ChatGPT. We further analyze the unchanged characteristics of ChatGPT over time by extracting its knowledge and linguistic features. We find some stable features that improve the robustness of a RoBERTa-based detector on new versions of ChatGPT. We will continuously maintain our project on GitHub.\n\n![](./config/model_system_v3.png)\n\n\n\n# Data\n\nWe release our data at [tsinghua cloud](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/).\n\nIf you have any questions about the data, please raise an issue or contact us by email: tsq22@mails.tsinghua.edu.cn\n\nThe data is currently organized as follows; click a link to download the corresponding file:\n\n- ChatLog-Monthly\n  -  [202303.zip](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Monthly%2F202303.zip\u0026dl=1)\n  -  [202304.zip](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Monthly%2F202304.zip\u0026dl=1)\n  -  [202305.zip](https://cloud.tsinghua.edu.cn/f/710809ac4cfd44119c93/?dl=1)\n  -  [202306.zip](https://cloud.tsinghua.edu.cn/f/f4cc4bc1499a45419dea/?dl=1)\n  -  [202307.zip](https://cloud.tsinghua.edu.cn/f/a18838f8a91b412d8160/?dl=1)\n  -  [202308.zip](https://cloud.tsinghua.edu.cn/f/e97126e262cd4da682f8/?dl=1)\n  -  [202309.zip](https://cloud.tsinghua.edu.cn/f/a1b26fd7e7794e838b2e/?dl=1)\n  -  [202310.zip](https://cloud.tsinghua.edu.cn/f/5afe93757bbc497fbc5d/?dl=1)\n  -  [202311.zip](https://cloud.tsinghua.edu.cn/f/c6550c0091df40d38b9d/?dl=1)\n  -  [202312.zip](https://cloud.tsinghua.edu.cn/f/51e66dcaedaf46938c56/?dl=1)\n  -  [202401.zip](https://cloud.tsinghua.edu.cn/f/14378f3cf6e94ec0a7b5/?dl=1)\n  -  [202402.zip](https://cloud.tsinghua.edu.cn/f/75c70d3af1034da5ac57/?dl=1)\n  -  
[202403.zip](https://cloud.tsinghua.edu.cn/f/beb370538a284eb6a73e/?dl=1)\n  -  [202404.zip](https://cloud.tsinghua.edu.cn/f/1f356f81b3e144208dd1/?dl=1)\n- ChatLog-Daily\n  - api\n    - [everyday_20230305-20230409.zip](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Daily%2Fapi%2Feveryday_20230305-20230409.zip\u0026dl=1)\n    - [everyday_20230410-20230508.zip](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Daily%2Fapi%2Feveryday_20230410-20230508.zip\u0026dl=1)\n    - [everyday_20230509-20230610.zip](https://cloud.tsinghua.edu.cn/f/eb0a3890bbcb4d46856d/?dl=1)\n    - [everyday_20230611-20230708.zip](https://cloud.tsinghua.edu.cn/f/2fa0415b3f0b4bc993af/?dl=1)\n    - [everyday_20230709-20230813.zip](https://cloud.tsinghua.edu.cn/f/80fecb1194014790b82e/?dl=1)\n    - [everyday_20230814-20230831.zip](https://cloud.tsinghua.edu.cn/f/c3a0ddeee8b14adab26d/?dl=1)\n    - [everyday_20230901-20230930.zip](https://cloud.tsinghua.edu.cn/f/215dde0578884aaa8867/?dl=1)\n    - [everyday_20231001-20231113.zip](https://cloud.tsinghua.edu.cn/f/e8f388c48a004c34a6aa/?dl=1)\n    - [everyday_20231114-20231228.zip](https://cloud.tsinghua.edu.cn/f/0cf04e4ac3dd4f87a03d/?dl=1)\n    - [everyday_20240109-20240209.zip](https://cloud.tsinghua.edu.cn/f/126faff2d49a4ce0b25e/?dl=1)\n    - [everyday_20240210-20240313.zip](https://cloud.tsinghua.edu.cn/f/316b4c8ae20f46dca015/?dl=1)\n    - [everyday_20240314-20240422.zip](https://cloud.tsinghua.edu.cn/f/1ed9cd3bf01d4eb4a688/?dl=1)\n    - [everyday_20240423-20240527.zip](https://cloud.tsinghua.edu.cn/f/6ee2f807e59f4b52a803/?dl=1)\n  - open\n    - [before0301.zip](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Daily%2Fopen%2Fbefore0301.zip\u0026dl=1)\n  - processed_csv\n    - [avg_HC3_all_pearson_corr_feats.csv](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Daily%2Fprocessed_csv%2Favg_HC3_all_pearson_corr_feats.csv\u0026dl=1)\n    - 
[avg_HC3_knowledge_pearson_corr_feats.csv](https://cloud.tsinghua.edu.cn/d/733684efbec84cbb8c52/files/?p=%2FChatLog-Daily%2Fprocessed_csv%2Favg_HC3_knowledge_pearson_corr_feats.csv\u0026dl=1)\n\nEvery `zip` file contains several `jsonl` files, and each JSON object has the following format:\n\n| column name: | id | source_type | source_dataset | source_task | q | a | language | chat_date | time |\n| ------------- | -------- | ------------------------------------------------ | --------------------------------- | ---------------------------------------------- | ---------------------------------------------------------- | -------------------- | ---------------- | ------------------------------- | -------------------------------------------------- |\n| introduction: | id | type of the source: from an open-access dataset or the API | dataset the question comes from | specific task name, such as sentiment analysis | question | response of ChatGPT | language | the date ChatGPT responded | the time the data was stored in our database |\n| example | 'id': 60 | 'source_type': 'open' | 'source_dataset': 'ChatTrans' | 'source_task': 'translation' | 'q': 'translate this sentence into Chinese: Good morning', | 'a': '早上好', | 'language': 'zh' | 'chat_date': '2023-03-03', | 'time': '2023-03-04 09:58:09', |\n\nChatLog-Monthly and ChatLog-Daily will be continuously updated.\n\n# Analysis Code\n\nTo process data from 20230305 to 20230409, please use the v1 shell scripts.\nTo process data after 20230410, please use the v2 shell scripts.\n\n1. 
To extract all the knowledge and linguistic features, run:\n\n```\nsh shells/process_new_data_v1.sh\n```\n\n2. To analyze the features and calculate their variation, run:\n\n```\nsh shells/analyse_var_and_classify_across_time_v1.sh\n```\n\n3. To train a robust ChatGPT detector with LightGBM, which ensembles the features with RoBERTa, run:\n\n```\nsh shells/lgb_train_v1.sh\n```\n\n4. For trend and correlation analysis, first dump the knowledge features into `avg_HC3_knowledge_pearson_corr_feats.csv`:\n\n```\nsh shells/draw_knowledge_feats_v1.sh\n```\n\n5. Then dump the other linguistic features into `avg_HC3_all_pearson_corr_feats.csv`:\n\n```\nsh shells/draw_eval_corr_v1.sh\n```\n\n6. Finally, draw heatmaps and line plots for trend and correlation analysis:\n\n   - Put the dumped `avg_HC3_knowledge_pearson_corr_feats.csv` and `avg_HC3_all_pearson_corr_feats.csv` under the `./shells` folder.\n   - Then use `./shells/knowledge_analysis.ipynb` and `./shells/temporal_analysis.ipynb` to draw every figure.\n\n# Citation\nIf you find our work useful, please cite:\n\n```\n@article{tu2023chatlog,\n  title={ChatLog: Recording and Analyzing ChatGPT Across Time},\n  author={Tu, Shangqing and Li, Chunyang and Yu, Jifan and Wang, Xiaozhi and Hou, Lei and Li, Juanzi},\n  journal={arXiv preprint arXiv:2304.14106},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-keg%2Fchatlog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthu-keg%2Fchatlog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-keg%2Fchatlog/lists"}