# llm-paper-daily: Daily Paper Highlights


[![Status](https://img.shields.io/badge/status-Update_04.19_13:44-success.svg)]() [![简体中文 badge](https://img.shields.io/badge/%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87-Simplified%20Chinese-blue)](./README.md) [![English badge](https://img.shields.io/badge/%E8%8B%B1%E6%96%87-English-blue)](./README_en.md)

Welcome to **llm-paper-daily**! This is a platform for daily updates and categorization of the latest research papers. It aims to bring enthusiasts the newest developments in LLM research and make it easier to keep up with the field.

📚 **Daily updates:** The repository brings the latest LLM research every day, with arXiv links, related Git repositories, and brief GPT-4-generated summaries.

💐 **Categorized summaries:** Each paper is filed under a section such as Reasoning, Agent, Retrieval, Application, or Pre-training and Instruction Fine-tuning, so you can easily navigate the list and discover relevant research.

🌈 **Community:** A discussion group is being set up so that everyone can exchange ideas and learn from each other.
Anyone interested in LLM applications, papers, and related topics is welcome to join 🙌


## Contents
- [Latest Papers (with summaries)](#latest-papers)
- [Categories](#categories)
- [💡 Reasoning](CATEGORIES.md#Reasoning)
- [🤖 Agent](CATEGORIES.md#Agent)
- [🦉 Knowledge and Retrieval](CATEGORIES.md#Knowledge-and-Retrieval)
- [👩‍🏫 Alignment and Hallucination](CATEGORIES.md#Alignment-and-Hallucination)
- [🎨 Application](CATEGORIES.md#Application)
- [📐 Pre-training and Instruction Fine-tuning](CATEGORIES.md#Pre-training-and-Instruction-Fine-tuning)
- [📄 Survey](CATEGORIES.md#Survey)

Recently updated papers (last update: April 19, 13:44):

- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
- Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- EVIT: Event-Oriented Instruction Tuning for Event Reasoning
- Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers

## Latest Papers
### April

|  Date   | Paper | Links & Summary |
| --- | --- | --- |
| 04-18 | **Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers**<br>Affiliations: Westlake University, Alibaba Group, Zhejiang University<br>This paper proposes MCRanker, which builds a virtual team of professional annotators and generates multi-perspective evaluation criteria, effectively improving the consistency and comprehensiveness of point-wise LLM rankers; it adapts broadly across datasets and improves ranking performance. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11960v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.1196.md) |
| 04-18 | **EVIT: Event-Oriented Instruction Tuning for Event Reasoning**<br>Affiliations: Key Laboratory of High Confidence Software Technologies (PKU), MOE, China, School of Computer Science, Peking University, Advanced Institute of Big Data<br>EVIT introduces event-oriented instruction tuning and the concept of event quadruples to address the weak performance of existing small instruction-tuned models on event reasoning. Experiments show that EVIT outperforms other models on event-reasoning tasks. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11978v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.11978.md) |
| 04-18 | **Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing**<br>This paper introduces ALPHALLM, a framework that combines Monte Carlo tree search (MCTS) with large language models (LLMs) to achieve self-improvement without additional annotated data. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12253v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.12253.md) |
| 04-18 | **Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences**<br>Affiliations: UC Berkeley<br>This paper presents EvalGen, an LLM-assisted evaluation interface aligned with human preferences; its mixed-initiative approach addresses how much LLM-generated evaluation functions can be trusted. The paper also examines how users define and iterate on evaluation criteria, and the challenges that arise in practice. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12272v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.12272.md) |
| 04-17 | **Many-Shot In-Context Learning**<br>Affiliations: Google DeepMind<br>Main contributions: a systematic evaluation of LLM performance at different numbers of in-context examples, the introduction of reinforced ICL and unsupervised ICL to reduce dependence on human-written examples, and the finding that many-shot ICL can overcome pretraining biases and learn high-dimensional numerical prediction tasks. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11018v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.11018.md) |
| 04-17 | **Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models**<br>Affiliations: Renmin University of China, Chinese Academy of Sciences, Huawei Technologies<br>This survey offers a novel perspective that frames bias and unfairness in LLMs and IR systems as distribution-mismatch problems, and categorizes the corresponding mitigation strategies. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11457v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.11457.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey) |
| 04-17 | **AgentKit: Flow Engineering with Graphs, not Coding**<br>Affiliations: Carnegie Mellon University, NVIDIA, Microsoft<br>The paper introduces AgentKit, a new LLM prompting framework for multifunctional agents that supports building and tuning complex agent thought processes from modular components with an intuitive design. AgentKit shows potential for achieving advanced agent capabilities while lowering the barrier to entry. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11483v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.11483.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/holmeswww/AgentKit) |
| 04-17 | **A Deep Dive into Large Language Models for Automated Bug Localization and Repair**<br>Affiliations: University of Virginia, Purdue University, Amazon Web Services<br>This paper proposes Toggle, which localizes and fixes bugs at token granularity, overcoming the limitations of line-granularity approaches. Through careful input design and LLM fine-tuning it substantially improves bug-fixing accuracy and achieves strong results on several datasets, advancing the APR field. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11595v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.11595.md) |
| 04-16 | **CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity**<br>Affiliations: Intel Labs<br>CoTAR targets LLMs' tendency to produce inaccurate attributions in question answering. By reasoning before generating the output and guiding the model at different attribution granularities, it markedly improves answer quality and attribution precision. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10513v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.10513.md) |
| 04-16 | **Self-playing Adversarial Language Game Enhances LLM Reasoning**<br>Affiliations: Tencent AI Lab<br>This paper proposes SPAG, a training scheme in which self-play in an adversarial language game effectively improves LLM reasoning, with gains that keep growing over training iterations. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10642v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.10642.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/Linear95/SPAG) |
| 04-16 | **How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior**<br>Affiliations: Stanford University<br>By analyzing the tension between an LLM's internal knowledge and retrieved information in RAG settings, the paper finds that the degree to which LLMs follow RAG information is inversely proportional to the model's confidence in its answer without context. The study, based on six domain datasets covering more than 1,200 questions, reveals an inherent conflict between a model's pretrained knowledge and the retrieved information. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10198v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.10198.md) |
| 04-15 | **Learn Your Reference Model for Real Good Alignment**<br>Affiliations: Tinkoff<br>This paper proposes Trust Region DPO (TR-DPO), which updates the parameters of the reference policy during training and noticeably improves language-model alignment. Experiments show TR-DPO outperforms DPO on two datasets across multiple metrics (a toy sketch of the reference-policy update follows this table). | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.09656v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.09656.md) |
| 04-15 | **Compression Represents Intelligence Linearly**<br>Affiliations: The Hong Kong University of Science and Technology, Tencent<br>This empirical study shows an almost linear correlation between LLMs' downstream task performance and their compression efficiency, supporting the long-held belief that better compression indicates higher intelligence. It also proposes using compression efficiency as an unsupervised metric for evaluating LLMs (a bits-per-character sketch follows this table). | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.09937v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.09937.md) |
| 04-14 | **Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development**<br>This work focuses on supporting and optimizing ML model deployment on emerging compute platforms. It proposes TAPML, a framework that uses a top-down approach and a universal runtime to make deployment broader, easier, and more robust, and shares real deployment cases as insights and best practices for developing ML systems. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.09151v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.09151.md) |
| 04-13 | **Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning**<br>Affiliations: Nanjing University, University of California<br>The paper proposes Intuition-MoR1E, a new framework for multi-task fine-tuning of large language models. Inspired by human cognitive neuroscience, it manages "intuition" with rank-1 experts and significantly improves parameter efficiency and multi-task fine-tuning results. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.08985v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.08985.md) |
| 04-12 | **Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length**<br>Affiliations: AI at Meta, University of Southern California, Carnegie Mellon University<br>This paper introduces MEGALODON, a neural architecture for efficiently handling sequences of unlimited context length. With several technical innovations, MEGALODON is more efficient and effective than Transformers on long-sequence tasks, with robust improvements on benchmarks across scales and modalities. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.08801v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.08801.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/XuezheMax/megalodon) |
| 04-11 | **Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning**<br>Affiliations: Nanyang Technological University<br>This work studies the mechanism by which ICL improves task performance. By decomposing ICL's contributing factors, it finds that ICL improves performance chiefly by regulating the label space and format, and it highlights the importance of selecting good demonstrations. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07546v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07546.md) |
| 04-11 | **Interactive Prompt Debugging with Sequence Salience**<br>This paper presents Sequence Salience, a system that extends existing input-salience (IS) methods to support debugging of complex LLM prompts. The tool offers real-time interactive debugging, reduces practitioners' cognitive load, and supports rapid prompt iteration based on salience results, aligning better with developers' mental models. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07498v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07498.md) |
| 04-11 | **ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past**<br>Affiliations: Baylor University<br>By analyzing the forecasting abilities of ChatGPT-3.5 and ChatGPT-4, this study reveals new reasoning potential in LLMs. It shows that "future narrative" prompts significantly improve forecasting accuracy, offering useful insights for applying LLMs in analytical settings. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07396v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07396.md) |
| 04-11 | **Rho-1: Not All Tokens Are What You Need**<br>Affiliations: Xiamen University, Tsinghua University, Microsoft<br>This paper proposes RHO-1, a language model trained with Selective Language Modeling (SLM) that focuses pretraining on useful tokens. In continued pretraining on math, this approach shows excellent performance, reaching baseline performance faster and achieving state-of-the-art results with far fewer tokens. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07965v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07965.md) |
| 04-11 | **ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback**<br>Affiliations: University of Central Florida, ByteDance Inc<br>ControlNet++ optimizes pixel-level consistency between generated images and conditional controls, and uses an efficient reward fine-tuning strategy to cut the time and memory costs of image sampling, markedly improving controllability under diverse conditions. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07987v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07987.md) |
| 04-11 | **OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments**<br>Affiliations: The University of Hong Kong, CMU, Salesforce Research<br>OSWORLD provides a new evaluation environment that addresses the limitations of existing benchmarks and lays the groundwork for developing multimodal agents that complete open-ended tasks in real computer environments. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07972v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07972.md) |
| 04-10 | **Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention**<br>Affiliations: Google<br>This work proposes Infini-attention, a new attention mechanism that combines compressive memory with standard dot-product attention, supports plug-and-play continual pretraining and long-context adaptation by design, and lets LLMs process unboundedly long contexts with bounded memory and compute. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07143v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07143.md) |
| 04-10 | **Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking**<br>Affiliations: Renmin University of China, Tsinghua University<br>This paper proposes PINOSE, a new method that trains a probing model via offline self-consistency checking, effectively addressing the limitations of existing factuality-detection methods. PINOSE improves transferability and efficiency, and outperforms prior methods on factuality detection and question-answering benchmarks. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06742v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.06742.md) |
| 04-10 | **"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output**<br>Affiliations: Google Research<br>This paper explores user-centered constraints on LLM output, surveying industry professionals to understand different scenarios and requirements. The focus is on improving developer efficiency in building, testing, and integrating LLMs, and on improving the end-user experience by satisfying specific output-format and UI requirements. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07362v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.07362.md) |
| 04-10 | **Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation**<br>Affiliations: Apple, Cupertino, CA, USA<br>The paper proposes superposition prompting, a new retrieval-augmented generation (RAG) prompting method for the problems LLMs face with long inputs; without extra training or fine-tuning it significantly improves both time efficiency and accuracy. The method is validated on many pretrained models, and the authors plan to release an open-source implementation. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06910v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.0691.md) |
| 04-09 | **Privacy Preserving Prompt Engineering: A Survey**<br>Affiliations: University of Arkansas<br>This survey provides a systematic overview of privacy-protection methods for using LLMs with ICL and general prompting, helping drive further community research and exploration on privacy protection. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06001v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.06001.md) |
| 04-09 | **Event-enhanced Retrieval in Real-time Search**<br>Affiliations: Tencent Search, Platform and Content Group<br>EER is a new approach to the "semantic drift" problem in real-time search: it improves the EBR model and adds contrastive learning and an event-triple generation task to boost retrieval performance. Experiments validate its effectiveness, and it may offer a new perspective for information retrieval. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05989v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.05989.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/open-event-hub/Event-enhanced_Retrieval) |
| 04-09 | **THOUGHTSCULPT: Reasoning with Intermediate Revision and Search**<br>Affiliations: UC Berkeley<br>THOUGHTSCULPT is a graph-based framework with a built-in self-revision mechanism that lets LLMs iteratively improve earlier output while generating new thought nodes, excelling especially on tasks that require continuous revision and modification. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05966v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.05966.md) |
| 04-09 | **RULER: What's the Real Context Size of Your Long-Context Language Models?**<br>Affiliations: NVIDIA<br>This paper introduces and open-sources RULER, a new evaluation suite for long-context LMs that tests performance on complex tasks and long-context understanding, analyzes a range of models and task complexities, and motivates future long-context research. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06654v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.06654.md) |
| 04-08 | **LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding**<br>Affiliations: Meta<br>This paper proposes and validates an LLM-augmented retrieval framework with enriched doc-level embeddings: synthetic relevant queries and titles add context to document embeddings, and key steps of retriever training are improved, boosting retrieval performance and robustness. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05825v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.05825.md) |
| 04-08 | **Evaluating Interventional Reasoning Capabilities of Large Language Models**<br>Affiliations: Université de Montréal, Google DeepMind, ServiceNow Research<br>This paper evaluates the causal-reasoning abilities of LLMs. It proposes intervention-effect prediction tasks that test how LLMs update their understanding of facts after an intervention. Results show that GPT-4 can predict intervention effects accurately under some conditions, but small changes in prompt wording significantly affect its performance. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05545v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.05545.md) |
| 04-08 | **Know When To Stop: A Study of Semantic Drift in Text Generation**<br>Affiliations: FAIR, Meta, Anthropic<br>This work provides tools for understanding and measuring semantic drift in long text generation. Early stopping and resample-then-rerank methods substantially improve factual accuracy, suggesting strategies for balancing informativeness against factuality. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05411v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.05411.md) |
| 04-08 | **LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding**<br>Affiliations: Alibaba Group, Zhejiang University<br>The paper proposes LayoutLLM and its layout instruction tuning strategy, significantly improving how models understand and exploit document layout information, with especially strong results on zero-shot document understanding. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05225v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.05225.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding) |
| 04-07 | **Prompting Large Language Models for Zero-shot Essay Scoring via Multi-trait Specialization**<br>Affiliations: Peking University<br>This work proposes Multi-trait Specialization (MTS), a zero-shot LLM essay-scoring framework that scores different writing traits over multiple dialogue rounds and derives the final score with min-max scaling and outlier clipping. MTS is significantly more accurate than direct prompt scoring, outperforms ChatGPT with a smaller deployable model, and offers a zero-shot alternative to supervised scoring. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.04941v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.04941.md) |
| 04-07 | **Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models**<br>Affiliations: Cornell University<br>This paper proposes radial networks, a new neural architecture that performs token-level layer routing via dynamic layer sparsity and a trained router module. This improves performance while substantially cutting compute and serving costs, opening room for further scaling of large language models. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.04900v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.049.md) |
| 04-04 | **AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent** | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.03648v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.03648.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/THUDM/AutoWebGLM) |
| 04-04 | **ReFT: Representation Finetuning for Language Models**<br>Affiliations: Stanford University, Pr(Ai)2R Group<br>This paper introduces LoReFT, a new fine-tuning method that significantly outperforms existing parameter-efficient fine-tuning (PEFT) methods in resource efficiency and model control. Experiments show state-of-the-art results on tasks across several NLP domains while keeping parameter counts low and interpretability high. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.03592v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.03592.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/stanfordnlp/pyreft) |
| 04-04 | **Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences**<br>Affiliations: Microsoft Research<br>This paper introduces DNO, an algorithm that combines the simplicity of contrastive learning with the theoretical generality of optimizing general preferences. DNO substantially improves post-training of large language models, demonstrating empirically that optimizing general preferences can guide models toward alignment with human values. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.03715v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.03715.md) |
| 04-03 | **PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts**<br>Affiliations: Shanghai Jiao Tong University, CMU<br>The paper presents PromptRPA, an effective answer to the limited applicability of RPA on mobile devices. Using a multi-agent framework and online tutorials, the system interprets diverse textual prompts and handles a wide range of RPA tasks. Evaluations show a significant jump in success rate, demonstrating the feasibility of text-driven control for RPA and opening new directions for richer functionality and broader applicability. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.02475v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.02475.md) |
| 04-02 | **Long-context LLMs Struggle with Long In-context Learning**<br>Affiliations: University of Waterloo, Carnegie Mellon University<br>This work introduces LongICLBench, a new benchmark for evaluating LLMs on long in-context learning tasks. It shows that LLM performance generally degrades as task difficulty grows, and that long-context in-context learning is sensitive to the distribution of label positions in the prompt. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.02060v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.0206.md) |
| 04-02 | **Advancing LLM Reasoning Generalists with Preference Trees** | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.02078v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.02078.md) |
| 04-02 | **Octopus v2: On-device language model for super agent**<br>Affiliations: Stanford University<br>This paper tackles on-device LLM deployment and function-calling efficiency. A special training method plus a reduced inference-time context significantly raise on-device function-calling accuracy and lower latency, with experiments showing clear gains on function-calling tasks. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01744v3)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01744.md) |
| 04-02 | **LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models**<br>Affiliations: Microsoft<br>The paper explores how LLMs can help design adaptive bitrate (ABR) algorithms: LLMs generate diverse candidate algorithms, which are tested in a network simulator with an early-stopping mechanism to efficiently filter out the most effective designs. Evaluations show that LLM-assisted designs significantly improve ABR performance in specific network scenarios. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01617v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01617.md) |
| 04-02 | **CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models**<br>Affiliations: East China Jiaotong University, Guangdong University of Technology, University of Toronto<br>The core contribution is CMAT, an innovative framework enabling dynamic, real-time memory updates inside multi-agent systems, plus a novel role-playing mechanism for precise task assignment and better inter-agent communication, significantly improving overall performance and cooperation efficiency. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01663v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01663.md) |
| 04-01 | **AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review**<br>Affiliations: University of Lyon, INSA Lyon, Infologic<br>This paper gives a comprehensive literature review of incident management in AIOps, providing structured knowledge, identifying gaps, and laying a foundation for future work. It establishes a unified terminology and taxonomy for AIOps, surfaces open challenges, and lists public datasets, giving future research both direction and grounding. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01363v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01363.md) |
| 04-01 | **Mapping the Increasing Use of LLMs in Scientific Papers**<br>Affiliations: Stanford University, UC Santa Barbara<br>This is the first systematic, large-scale analysis across papers published on arXiv, bioRxiv, and in the Nature portfolio. Its statistical estimation approach measures the prevalence of LLM-modified content at the population level, offering valuable insight into the use of LLMs in scientific writing. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01268v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01268.md) |
| 04-01 | **Prompt-prompted Mixture of Experts for Efficient LLM Generation**<br>Affiliations: CMU<br>GRIFFIN is a training-free MoE method that exploits the "flocking" phenomenon in LLM feed-forward blocks across different activation functions to improve model efficiency, preserving performance while cutting compute cost. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01365v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01365.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/hdong920/GRIFFIN) |
| 04-01 | **Efficiently Distilling LLMs for Edge Applications**<br>Affiliations: IBM Research<br>This paper presents a new approach to distilling LLMs for edge devices: LPFT simultaneously reduces model size and training cost substantially, in particular addressing decoder models' resistance to compression and their long training times. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01353v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.01353.md) |
| 04-01 | **LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation**<br>Affiliations: Microsoft Research Asia<br>This paper proposes LLM-RadJudge, an LLM-based framework for evaluating radiology reports that effectively improves the clinical relevance and consistency of report evaluation. Knowledge distillation yields a small model that lowers evaluation cost and improves accessibility, providing strong support for radiology report-generation research and practice. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.00998v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-04/2404.00998.md) |
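
The TR-DPO entry above (04-15) turns on a single mechanism: instead of keeping the DPO reference policy frozen, it is periodically refreshed from the policy being trained. Below is a minimal sketch of the two update rules the paper describes, soft (weighted blend) and hard (periodic copy), assuming PyTorch modules; the names `alpha` and `tau` mirror the paper's hyperparameters, but the helper functions themselves are illustrative.

```python
import torch

@torch.no_grad()
def soft_update(policy, reference, alpha=0.01):
    """TR-DPO soft update: ref <- alpha * policy + (1 - alpha) * ref."""
    for p_ref, p_pol in zip(reference.parameters(), policy.parameters()):
        p_ref.mul_(1.0 - alpha).add_(alpha * p_pol)

@torch.no_grad()
def hard_update(policy, reference, step, tau=512):
    """TR-DPO hard update: every `tau` optimizer steps, copy policy into ref."""
    if step % tau == 0:
        reference.load_state_dict(policy.state_dict())
```

Either function would be called once per optimizer step inside an otherwise standard DPO training loop; the rest of the loss computation is unchanged.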
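
The compression metric behind the 04-15 "Compression Represents Intelligence Linearly" entry is essentially bits per character under the model: sum the negative log2-probabilities the LLM assigns to a corpus and divide by the character count. A hedged sketch using Hugging Face `transformers`; the model name and corpus file are placeholders, and the token-count arithmetic assumes a single unbatched sequence.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_per_character(model, tokenizer, text: str) -> float:
    """Compression efficiency of `model` on `text`, in bits per character."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Causal-LM loss is the mean next-token cross-entropy, in nats.
        loss = model(ids, labels=ids).loss
    total_nats = loss.item() * (ids.numel() - 1)  # undo the per-token average
    return total_nats / math.log(2) / len(text)   # nats -> bits, per character

# Usage (placeholder model and corpus):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# print(bits_per_character(lm, tok, open("corpus.txt").read()))
```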

---

### March

|  Date   | Paper | Links & Summary |
| --- | --- | --- |
| 03-28 | **Jamba: A Hybrid Transformer-Mamba Language Model**<br>Affiliations: AI21 Labs<br>Jamba is a new large language model built on a hybrid Transformer-Mamba architecture. It pushes past long-context limits and uses mixture-of-experts (MoE) components to raise throughput while keeping a small memory footprint, marking a new direction for large language models and showing a possible balance between efficient training and strong performance. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.19887v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.19887.md) |
| 03-28 | **sDPO: Don't Use Your Data All at Once**<br>This paper proposes stepwise DPO (sDPO), which uses preference datasets in stages and takes the aligned model from the previous step as the reference model for the current step, effectively improving the final model's performance and alignment. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.19270v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.1927.md) |
| 03-27 | **Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback**<br>This work proposes the RLKF framework and defines new model-reliability metrics, effectively mitigating LLM hallucination and improving honesty and reliability, showing promise for building more trustworthy AI systems. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.18349v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.18349.md) |
| 03-27 | **BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models**<br>Affiliations: DCST Tsinghua University, Beijing Institute of Technology, Huawei Cloud BU<br>This work proposes BLADE, a new architecture that augments black-box large language models with small domain-specific models, remedying large models' lack of knowledge in specialized domains. BLADE proves effective on both performance and cost. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.18365v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.18365.md) |
| 03-26 | **COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning**<br>Affiliations: Shenzhen Institute of Advanced Technology, CAS; M-A-P; Institute of Automation, CAS<br>This paper presents COIG-CQIA, a high-quality dataset for Chinese instruction tuning that facilitates alignment with human interaction. The study highlights the importance of high-quality data sources for fine-tuning and shows experimentally how dataset-construction strategies and fine-tuning methods markedly affect model performance. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.18058v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.18058.md) |
| 03-26 | **The Unreasonable Ineffectiveness of the Deeper Layers**<br>Affiliations: Meta FAIR, UMD<br>This paper proposes a simple layer-pruning strategy for popular open-weight pretrained LLMs and shows empirically that removing a large fraction of layers has little impact on performance (a toy pruning-score sketch follows this table). | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.17887v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.17887.md) |
| 03-26 | **LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning**<br>Affiliations: The Hong Kong University of Science and Technology, University of Illinois Urbana-Champaign<br>The proposed LISA strategy uses layerwise importance sampling over weights to match LoRA-like memory efficiency while improving the fine-tuning efficiency and performance of large language models. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.17919v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.17919.md) |
| 03-25 | **AIOS: LLM Agent Operating System**<br>Affiliations: Rutgers University<br>AIOS is an LLM-agent operating system whose dedicated kernel and modules overcome earlier challenges in resource scheduling and context management, improving the performance and efficiency of LLM agents and paving the way for the future development and deployment of the AIOS ecosystem. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.16971v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.16971.md)<br>[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/agiresearch/AIOS) |
| 03-22 | **Can large language models explore in-context?**<br>Affiliations: Microsoft Research, Carnegie Mellon University<br>This paper studies whether contemporary LLMs can engage in in-context exploration, in particular without any training intervention. Across a series of experiments, the authors find that LLMs explore robustly only under specific configurations. Without careful prompt design, even state-of-the-art LLMs may fail to explore in more complex environments, where externally summarizing history becomes a non-trivial algorithm-design problem. The work suggests LLMs may need targeted algorithmic interventions to work effectively in complex settings. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.15371v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.15371.md) |
| 03-20 | **Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts**<br>Affiliations: University of Memphis, San Francisco Veterans Affairs Health Care System, University of California San Francisco<br>By introducing chain-of-interaction prompting, this paper effectively improves LLMs' understanding of psychiatric behavior, particularly in motivational-interviewing contexts. Structured prompting and evaluation simulate the thinking of professional therapists and effectively teach the model domain knowledge, outperforming conventional approaches. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.13786v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.13786.md) |
| 03-19 | **Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners**<br>Affiliations: University of Maryland<br>The paper proposes LAP, a new method that combines large language models (LLMs) with scene affordances to reduce hallucination in planning tasks and achieve uncertainty alignment. Experiments on simulated and real-world robot manipulation show LAP significantly raises success rates and reduces reliance on human help, advancing intelligent robotics. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.13198v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.13198.md) |
| 03-18 | **Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression**<br>Affiliations: University of Texas at Austin, Drexel University, MIT<br>This paper gives the first comprehensive evaluation of compressed LLMs across multiple trustworthiness dimensions and offers practical advice for balancing efficiency and trust when compressing. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.15447v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.15447.md) |
| 03-15 | **VideoAgent: Long-form Video Understanding with Large Language Model as Agent**<br>Affiliations: Stanford University<br>VideoAgent takes an important step in long-form video understanding by mimicking human cognitive processes, emphasizing reasoning over visual information across long time spans. The work sets a new benchmark for long-video understanding and offers guidance for future research in this direction. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.10517v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.10517.md) |
| 03-15 | **RAFT: Adapting Language Model to Domain Specific RAG**<br>Affiliations: UC Berkeley<br>The proposed RAFT method trains large language models to answer questions "open-book" within a specific domain, strengthening reasoning ability and robustness to distractor documents, while chain-of-thought style reasoning improves the accuracy of generated answers. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.10131v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.10131.md) |
| 03-15 | **Uni-SMART: Universal Science Multimodal Analysis and Research Transformer**<br>Affiliations: DP Technology, AI for Science Institute Beijing<br>Uni-SMART is an innovative model for deep understanding of multimodal scientific literature. It outperforms leading text-focused LLMs in several domains and has the potential to change how we interact with scientific literature. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.10301v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.10301.md) |
| 03-13 | **Scaling Instructable Agents Across Many Simulated Worlds**<br>The SIMA project aims to build an AI system that operates in diverse simulated 3D environments from arbitrary language instructions. Its design tackles the challenges of grounding language in perception and embodied action, and of achieving generality and scalability across many different environments. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10179v2)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2404.10179.md) |
| 03-13 | **Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments**<br>Affiliations: Nanjing University, Microsoft<br>The Readi framework offers an efficient and faithful way to reason over large structured environments: it fully exploits LLMs' planning ability and refines reasoning paths with dynamic feedback, achieving significant gains on multi-hop reasoning tasks. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.08593v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.08593.md) |
| 03-13 | **Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework**<br>Affiliations: ByteDance Research, University of Maryland College Park, Carnegie Mellon University<br>This paper proposes a new causality-guided debiasing framework and validates it empirically. It both unifies existing prompting-based debiasing methods and suggests new routes for inducing unbiased reasoning. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.08743v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.08743.md) |
| 03-12 | **Chronos: Learning the Language of Time Series**<br>Affiliations: Amazon Web Services, UC San Diego, University of Freiburg<br>Chronos is a pretrained time-series forecasting framework that excels at zero-shot and standard forecasting tasks. Leveraging data-augmentation strategies and public datasets, it confirms the versatility of language-model architectures for time-series forecasting and points future time-series models in a new direction (a toy tokenization sketch follows this table). | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.07815v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.07815.md) |
| 03-11 | **RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback**<br>Affiliations: Zhejiang University, Southeast University, Massachusetts Institute of Technology<br>RA-ISF is an innovative retrieval-augmented framework that improves LLM problem solving through iterative question decomposition and three iterating submodules, effectively reducing interference from irrelevant text and significantly improving knowledge-retrieval performance. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.06840v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.0684.md) |
| 03-11 | **ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis**<br>Affiliations: Zhejiang University, Southeast University<br>This paper proposes ERA-CoT, a new framework that strengthens LLM reasoning and question answering in complex multi-entity scenarios. By improving the understanding of entity relationships, it significantly boosts reasoning accuracy, especially during CoT reasoning. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.06932v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.06932.md) |
| 03-11 | **Stealing Part of a Production Language Model**<br>Affiliations: Google DeepMind, ETH Zurich, University of Washington<br>This paper presents a new model-stealing attack on production language models that effectively extracts the last layer of Transformer models and can be used to uncover details, parameters, and dimensions of black-box models. It also discusses possible defenses and argues that APIs must change to prevent such attacks in the future. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.06634v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.06634.md) |
| 03-08 | **Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering**<br>Affiliations: Gaoling School of Artificial Intelligence Renmin University of China, Nankai University, Beijing Academy of Artificial Intelligence<br>LLMQA is a new general framework that combines retrieval and generation paradigms to gather higher-quality evidence and has LLMs play multiple roles within the framework, improving overall open-domain QA performance. Experiments confirm it outperforms existing methods. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.05217v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.05217.md) |
| 03-08 | **Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context**<br>Affiliations: Google<br>Gemini 1.5 Pro achieves a significant breakthrough in remembering and reasoning over massive long-context information, especially ultra-long text, video, and audio. The model outperforms existing models while also being markedly more compute-efficient. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.05530v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.0553.md) |
| 03-08 | **Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation**<br>This paper introduces Adversarial Policy Optimization (AdvPO), a new approach to the reward-overoptimization problem in reinforcement learning from human feedback, particularly for LLMs aligned with human preferences. AdvPO effectively mitigates reward overoptimization without incurring high computational cost. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.05171v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.05171.md) |
| 03-07 | **Yi: Open Foundation Models by 01.AI**<br>Affiliations: 01.AI<br>The paper presents Yi-34B, a model competitive with GPT-3.5 in both performance and efficiency, and details innovations in large-language-model pretraining and instruction fine-tuning. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.04652v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.04652.md) |
| 03-07 | **Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference**<br>Affiliations: UC Berkeley, Stanford, UCSD<br>Chatbot Arena is an open platform for evaluating large language models by human preference. It crowdsources user questions into anonymized, randomized battles to assess LLMs, addressing the limitations of static benchmark datasets, with carefully designed statistical methods ensuring credible and efficient evaluation. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.04132v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.04132.md) |
| 03-05 | **ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary**<br>Affiliations: Tsinghua University<br>ChatCite is designed to overcome LLMs' difficulties in generating literature reviews: dedicated modules help the LLM agent understand, summarize, and compare different works more effectively, producing organized literature reviews with comparative analysis. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.02574v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.02574.md) |
| 03-05 | **MathScale: Scaling Instruction Tuning for Mathematical Reasoning**<br>Affiliations: The Chinese University of Hong Kong Shenzhen, China; Microsoft Research Asia, Beijing, China; Shenzhen Research Institute of Big Data, Shenzhen, China<br>MathScale offers a scalable way to create high-quality mathematical-reasoning data and builds MWPBENCH, a new benchmark for comprehensively evaluating LLMs' math reasoning, significantly improving models' ability to solve math problems. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.02884v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.02884.md) |
| 03-05 | **Design2Code: How Far Are We From Automating Front-End Engineering?**<br>Affiliations: Stanford University, Georgia Tech, Microsoft<br>By formalizing and benchmarking the Design2Code task, this work evaluates how well current multimodal LLMs convert visual designs into code, finds that GPT-4V performs best, and suggests a new paradigm for automating front-end development. | [![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.03163v1)<br>[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-03/2403.03163.md) |
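
The 03-26 layer-pruning entry hinges on one measurement: blocks of consecutive layers whose input and output representations barely differ are candidates for removal. A minimal sketch of the angular-distance criterion the paper describes, assuming you can collect per-layer hidden states (e.g. via `output_hidden_states=True` in `transformers`); the helper names are illustrative, not the paper's code.

```python
import torch

def angular_distance(h_a: torch.Tensor, h_b: torch.Tensor) -> float:
    """Mean angular distance between two [tokens, dim] hidden-state tensors."""
    cos = torch.nn.functional.cosine_similarity(h_a, h_b, dim=-1).clamp(-1, 1)
    return torch.arccos(cos).mean().item() / torch.pi

def best_block_to_prune(hidden_states: list[torch.Tensor], n: int) -> int:
    """Start index of the n-layer block whose removal should change the
    model least: where the input of layer l is most similar to the input
    of layer l + n."""
    scores = [
        angular_distance(hidden_states[l], hidden_states[l + n])
        for l in range(len(hidden_states) - n)
    ]
    return min(range(len(scores)), key=scores.__getitem__)
```

After dropping the selected block, the paper applies a small amount of parameter-efficient fine-tuning to "heal" the model; that step is omitted here.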
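
Chronos (03-12) treats forecasting as language modeling by mapping real-valued series onto a small vocabulary. A hedged sketch of the two steps as I understand them from the paper, mean scaling followed by uniform quantization into bins; the bin count and clipping range below are illustrative, not the paper's exact configuration.

```python
import numpy as np

def tokenize_series(series: np.ndarray, n_bins: int = 4096, limit: float = 15.0):
    """Scale a series by its mean absolute value, then quantize to token ids."""
    scale = float(np.abs(series).mean()) or 1.0      # guard against all-zero input
    scaled = np.clip(series / scale, -limit, limit)
    edges = np.linspace(-limit, limit, n_bins - 1)   # uniform bin boundaries
    tokens = np.digitize(scaled, edges)              # ids in [0, n_bins - 1]
    return tokens, scale

def detokenize(tokens: np.ndarray, scale: float, n_bins: int = 4096, limit: float = 15.0):
    """Map token ids back to bin-center values, then undo the scaling."""
    centers = np.linspace(-limit, limit, n_bins)
    return centers[tokens] * scale
```

Once a series is tokenized this way, an off-the-shelf sequence model can be trained with the ordinary next-token cross-entropy loss, and forecasts are produced by sampling tokens and detokenizing them.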

---

### February

|  Date   | Paper | Links & Summary |
| --- | --- | --- |
| 02-29 | **Resonance RoPE: Improving Context Length Generalization of Large Language Models**
机构: 1DIRO Université de Montréal, Mila - Quebec AI Institute, Huawei Noah’s Ark Lab
本论文提出了 Resonance Rope,这是一个改进的模型,它基于对 RoPE 位置嵌入特征波长的分析来提升模型在处理长文本时的性能。它还引入了 POSGEN 基准测试,以帮助研究和评估位置嵌入在长文本任务中的表现。
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.00071v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2403.00071.md) |
| 02-29 | **SEED: Customize Large Language Models with Sample-Efficient Adaptation for Code Generation**
机构: Peking University
本文提出了一个名为SEED的适应方法,它利用错误驱动学习来使LLMs更少样本地高效学习,针对代码生成任务实现了更佳的性能和泛化性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.00046v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2403.00046.md) |
| 02-29 | **Beyond Language Models: Byte Models are Digital World Simulators**
机构: Microsoft Research Asia
论文展现了bGPT在处理挑战性的字节级数据模拟任务中的潜力,特别强调了其在跨模态知识转移和数字世界模拟方面的能力。这揭示了字节模型在数字媒体数据处理和理解上的广泛适用性和灵活性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.19155v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.19155.md) |
| 02-29 | **StarCoder 2 and The Stack v2: The Next Generation**
机构: ServiceNow, Hugging Face
本论文提出了The Stack v2和StarCoder2的发展过程,这是基于代码大规模预训练和指令微调的一项工作。研究人员通过整合多样化数据源和经过精心设计的训练过程,显著提高了代码LLMs的性能,特别是在处理低资源编程语言和需要代码推理的任务上。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.19173v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.19173.md) |
| 02-27 | **When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method**
机构: Google DeepMind
该论文提供了大型语言模型微调阶段不同因素如数据大小、模型大小以及微调方法对模型性能影响的深入洞见,定义了一种新的评估框架。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17193v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.17193.md) |
| 02-27 | **Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models**
机构: OpenAI
本文是一篇对Sora——一个大型视觉模型的综述。论文讨论了Sora的技术特征、创新点、以及当前应用领域的局限性和未来可能的发展机会。Sora的能力在多个维度上展现了大型视觉模型的进步,包括长视频生成和多样化视频格式的处理。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17177v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.17177.md) |
| 02-27 | **REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering**
机构: Gaoling School of Artificial Intelligence Renmin University of China, School of Information Renmin University of China
该论文提出了REAR框架,重点在于通过为LLMs加入文档相关性自我意识来增强其在QA任务中利用外部知识的能力,并证实该框架有效地超越了前述方法。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17497v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.17497.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/RUCAIBox/REAR)
|
| 02-27 | **The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits**
机构: Microsoft, University of Chinese Academy of Sciences
论文提出BitNet b1.58模型,这是一个1.58比特量化的大型语言模型,与传统的完整精度LLMs在性能上可比,而且更高效、更节省能源。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17764v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.17764.md) |
| 02-27 | **EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions**
机构: Alibaba Group
EMO 框架通过直接的音频到视频合成方法提高了生成视频的真实感和表现力,显著优于现有技术,为视频合成领域提供了一个重要的进步。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17485v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.17485.md) |
| 02-27 | **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**
机构: Zhejiang University, Institute of Software Chinese Academy of Sciences, Nanjing University of Posts and Telecommunications
Agent-Pro是一个新型的基于LLM的智能代理,能够通过政策级反思和优化在交互环境中学习和发展策略,解决了现有工作无法通过交互学习和适应的问题。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17574v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.17574.md) |
| 02-26 | **Do Large Language Models Latently Perform Multi-Hop Reasoning?**
机构: Google DeepMind, UCL, Google Research
本文对LLMs是否能够进行潜在的多跳推理进行了研究,并通过实验提出了评估LLMs潜在多跳推理能力的新方法。研究提示LLMs对某些关系类型的提示有很强的多跳推理证据,但这种推理路径的运用在不同类型的提示中表现出高度的情境依赖性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16837v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.16837.md) |
| 02-26 | **Improving LLM-based Machine Translation with Systematic Self-Correction**
机构: Zhejiang University, Tencent, Angelalign Technology Inc.
论文成功提出了第一个基于LLMs的自我纠正翻译框架TER,并验证了其在多种语言对和不同模型间的翻译质量改进效果。它为机器翻译领域带来了新的视角,特别是在自我纠正在高资源、低资源语言和不同中心语言之间翻译的应用。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16379v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.16379.md) |
| 02-26 | **LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments**
研究介绍了LLMARENA基准,用以评估LLMs智能体在复杂多代理环境中的能力,指出了存在的问题并促进了未来的研究方向,包括多模态动态环境中的能力及利用外部工具的潜力。|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16499v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.16499.md) |
| 02-25 | **ChatMusician: Understanding and Generating Music Intrinsically with LLM**
机构: Hong Kong University of Science and Technology
本文通过创造首个针对语言模型的音乐预训练数据集和评估基准,提升了LLMs在音乐理解和生成方面的表现,并在这一未被深入研究的领域取得了实质性进展。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16153v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.16153.md) |
| 02-23 | **ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition**
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.15220v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.1522.md) |
| 02-23 | **Genie: Generative Interactive Environments**
机构: Google DeepMind, University of British Columbia
Genie是能够生成新视频并能通过用户输入控制视频内容的交互环境模型,弥补了传统视频生成技术与交互体验之间的差距。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.15391v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.15391.md) |
| 02-22 | **Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments**
通过为LLMs设计特定的工具和推理算法,研究开发了名为FUXI的新框架,有效提高了LLMs在复杂环境中的操作能|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14672v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.14672.md) |
| 02-22 | **Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation**
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14744v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.14744.md) |
| 02-22 | **CriticBench: Benchmarking LLMs for Critique-Correct Reasoning**
机构: Tsinghua University, University of Hong Kong
该论文通过CRITICBENCH评估了LLMs的批判和纠正推理能力,并探究了影响这些能力的关键因子,旨在促进LLMs批判和自我改进能力的后续研究。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14809v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.14809.md) |
| 02-22 | **OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement**
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14658v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.14658.md) |
| 02-21 | **User-LLM: Efficient LLM Contextualization with User Embeddings**
USER-LLM是一个通过用户嵌入来上下文化LLM的框架。它能有效地解决用户数据的复杂性和长序列处理的问题,提升了LLM在个性化应用上的效能,同时也保证了计算效率。|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.13598v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.13598.md) |
| 02-21 | **AgentScope: A Flexible yet Robust Multi-Agent Platform**
机构: Alibaba Group
AgentScope是一个用于构建多代理应用的多功能平台,强调易用性与可定制性,特别适合不同技能水平的开发者使用。通过实现容错和支持多模态数据处理,以及优化分布式操作,AgentScope显著降低了多代理系统开发与部署的难度,鼓励更广泛的参与和创新。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14034v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.14034.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/modelscope/agentscope)
|
| 02-20 | **Instruction-tuned Language Models are Better Knowledge Learners**
机构: FAIR at Meta, Carnegie Mellon University, University of Washington
本文介绍了一种名为预指令微调(PIT)的方法,有效地提高了LLMs从文档中吸收知识的能力,解决了所谓的困惑度诅咒问题,并且在多域的知识获取中也取得了显著进展。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.12847v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.12847.md) |
| 02-20 | **TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization**
机构: AWS AI Labs, The University of Texas at Austin, KAIST
本论文提出了一个名为TOFUEVAL的新型评估基准,针对LLM在生成话题焦点对话摘要时的事实一致性进行了评估。研究发现,不同大小的LLM在对话领域生成的摘要中存在大量事实错误。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.13249v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.13249.md) |
| 02-19 | **AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling**
机构: Fudan University, Multimodal Art Projection Research Community, Shanghai AI Laboratory
AnyGPT 是一个多模态架构的语言模型,通过离散序列建模,能够实现不同模态间的无缝转换和统一处理,提供任意到任意模态之间的生成能力,同时不需要改变现有的 LLM 架构或训练范式。该模型通过在语义和感知水平进行建模,能有效处理和生成高质量的多模态内容,并且与专业模型相比具有可比较的性能。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.12226v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.12226.md) |
| 02-16 | **Speculative Streaming: Fast LLM Inference without Auxiliary Models**
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.11131v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.11131.md) |
| 02-16 | **FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models**
机构: The University of British Columbia & Invertible AI
该论文提出了一个针对财务分析优化的多模态大型语言模型(LLM)套件FinTral。通过与现有模型的对比,展示了其在财务领域多任务环境下的先进性能,特别是在处理零样本任务和减少幻觉现象方面的能力。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.10986v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.10986.md) |
| 02-16 | **SPAR: Personalized Content-Based Recommendation via Long Engagement Attention**
机构: The University of British Columbia, Meta
SPAR框架充分利用长期用户互动历史来提升个性化内容推荐的精度,并在多项性能指标上超越现有技术。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.10555v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.10555.md) |
| 02-15 | **How to Train Data-Efficient LLMs**
机构: Google DeepMind, University of California San Diego, Texas A&M University
论文提出的ASK-LLM和DENSITY技术优化了大型语言模型的数据效率,有效提升了模型训练的速度和质量,并在资源限制下表现出色。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.09668v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.09668.md) |
| 02-15 | **Chain-of-Thought Reasoning Without Prompting**
机构: Google DeepMind
这项工作揭示了通过改变解码策略,可以有效地从预训练的LLMs中自然地引发推理,并且在预训练数据中频繁出现的任务上CoT路径更常见。提出的CoT解码方法无需手动引导,就能显著提高各种推理基准上的模型性能。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.10200v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.102.md) |
| 02-15 | **A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts**
机构: Google DeepMind, Google Research
ReadAgent 是一个受人类阅读方式启发的LLM代理系统,通过创建摘要记忆并根据需要检索信息来解决长文本任务,显著提高了模型的表现和伸缩性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.09727v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.09727.md) |
| 02-14 | **Premise Order Matters in Reasoning with Large Language Models**
机构: Google DeepMind
这篇论文关注于大型语言模型在处理推理任务时,前提顺序的影响,并通过创建R-GSM基准测试来评估这一现象。研究揭示了LLMs对前提顺序极为敏感,性能受顺序影响显著。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.08939v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.08939.md) |
| 02-09 | **InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning**
机构: Shanghai AI Laboratory, Tsinghua University, Fudan University School of Computer Science
InternLM-Math模型是一种基于LLMs的数学推理工具,它整合了多种能力并提供了监督学习以帮助模型在各种数学推理任务中实现最先进的性能,并开源其代码和数据。论文还探讨了利用程序语言LEAN在多任务学习设置中解决数学问题的新方法,彰显了LLMs在形式化和代码辅助推理中的潜能。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.06332v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.06332.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/InternLM/InternLM-Math)
|
| 02-02 | **LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving**
机构: Shanghai Artificial Intelligence Laboratory, College of Control Science and Engineering Zhejiang University
LimSim++是首个专为(M)LLM支持的自动驾驶而开发的封闭循环评估平台。它解决了现有仿真平台的局限性,并通过实验验证了其在多种复杂交通场景中的有效性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01246v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.01246.md) |
| 02-02 | **K-Level Reasoning with Large Language Models**
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01521v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.01521.md) |
| 02-02 | **AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback**
机构: Tsinghua University, Ant Group
AMOR框架综合了基于有限状态机(FSM)的推理逻辑和过程反馈机制,展示了基于开源LLM的知识代理如何通过人类监督实现推理和适应性,提高了模型在完成知识密集任务中的能力。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01469v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.01469.md) |
| 02-02 | **MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models**
机构: UNC Chapel Hill.
本论文介绍了名为MAGDI的新方法,它通过结构化蒸馏方式将多LLM之间的推理交互蒸馏到更小的模型中,显著提升小模型的推理能力和泛化能力,同时降低了成本。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01620v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.0162.md) |
| 02-02 | **Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions**
机构: Megagon Labs, Carnegie Mellon University
本论文提出了多代理系统中的“推理能力”概念,以改善优化和评估,并探讨了利用人类反馈增强系统推理能力的可能性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01108v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.01108.md) |
| 02-01 | **Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration**
机构: University of Washington, University of California Berkeley, The Hong Kong University of Science and Technology
本文关注的是如何在大型语言模型(LLMs)中识别知识差距并在必要时放弃回答问题。研究提出了两种基于多LLM合作的新方法,通过对比实验显示它们能有效提高LLMs放弃生成低信心输出的能力。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.00367v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.00367.md) |
| 02-01 | **Can Large Language Models Understand Context?**
机构: Georgetown University, Apple
本文提出了一个上下文理解基准,用以评估大型语言模型(LLMs)的上下文理解能力。该基准涵盖了对文档和对话基础上下文理解的要素,通过创新的测试方法和实验分析展示了LLMs在上下文理解方面的能力和局限性。
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.00858v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.00858.md) |
| 02-01 | **Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing**
Institutions: Nanyang Technological University, Institute for Infocomm Research A*STAR, Salesforce Research
The paper proposes a novel offline training framework that improves the reliability and precision of LLMs on complex reasoning tasks via trajectory collection and outcome-supervised direct preference optimization, requiring neither a teacher model nor human annotation. Results on two logical reasoning benchmarks demonstrate its effectiveness.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.00658v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.00658.md) |
| 02-01 | **HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent**
Institutions: Amazon, University of Milano-Bicocca
This paper introduces HR-MultiWOZ, a new task-oriented dialogue dataset for HR LLM agents. It addresses the lack of high-quality training data for building and evaluating LLM agents in the HR domain and provides a cost-effective dataset-generation method, offering a valuable resource and reference for related research.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01018v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-02/2402.01018.md) |
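
To make the FSM-style control described in the AMOR row above concrete, here is a minimal, hypothetical Python sketch: the agent steps through decompose / retrieve / answer states, and each transition is a natural hook for process feedback. The state names, the `llm` stub, and the `memory` layout are illustrative assumptions, not the paper's actual modules.

```python
from enum import Enum, auto

class State(Enum):
    DECOMPOSE = auto()   # break the question into sub-queries
    RETRIEVE = auto()    # fetch evidence for the current sub-queries
    ANSWER = auto()      # draft an answer from the evidence
    DONE = auto()

def llm(prompt: str) -> str:
    """Stand-in for an LLM call; replace with a real client."""
    return f"<llm output for: {prompt[:40]}...>"

def step(state: State, memory: dict) -> State:
    # Each state runs one LLM-backed module, then hands control to the next
    # state; process feedback could be attached to any individual transition.
    if state is State.DECOMPOSE:
        memory["sub_queries"] = [llm(f"Decompose: {memory['question']}")]
        return State.RETRIEVE
    if state is State.RETRIEVE:
        memory["evidence"] = [llm(f"Retrieve for: {q}") for q in memory["sub_queries"]]
        return State.ANSWER
    if state is State.ANSWER:
        memory["answer"] = llm(f"Answer {memory['question']} using {memory['evidence']}")
        return State.DONE
    return State.DONE

memory = {"question": "Who wrote the libretto of The Magic Flute?"}
state = State.DECOMPOSE
while state is not State.DONE:
    state = step(state, memory)
print(memory["answer"])
```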
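
The "Don't Hallucinate, Abstain" row above can be sketched in the same spirit. The paper proposes two multi-LLM collaboration methods; the voting-style gap check below is a deliberately minimal stand-in for them, and the `min_agree` threshold and sample answers are assumptions.

```python
from collections import Counter

def collaborative_abstain(answers: list[str], min_agree: float = 0.67) -> str | None:
    """If answers from several LLMs disagree too much, treat the question
    as a knowledge gap and abstain (return None)."""
    top, count = Counter(answers).most_common(1)[0]
    return top if count / len(answers) >= min_agree else None

# Hypothetical answers from three different LLMs to the same question:
print(collaborative_abstain(["Paris", "Paris", "Lyon"]))   # -> Paris
print(collaborative_abstain(["1912", "1913", "1915"]))     # -> None (abstain)
```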

---

### 01月

|  Date   | Paper | Links & Summary |
| --- | --- | --- |
| 01-31 | **LongAlign: A Recipe for Long Context Alignment of Large Language Models**
Institutions: Tsinghua University, Zhipu.AI
The paper proposes LongAlign, a recipe for long-context alignment that builds a long-instruction dataset, adopts new training strategies, and introduces an evaluation benchmark to improve LLMs' handling of long contexts; the code, data, and long-aligned models are open-sourced.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.18058v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.18058.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/THUDM/LongAlign)
|
| 01-30 | **Efficient Tool Use with Chain-of-Abstraction Reasoning**
Institutions: Meta
This paper proposes a chain-of-abstraction reasoning method that effectively improves LLMs' ability to use external tools and speeds up inference. Experiments demonstrate its effectiveness and efficiency on multi-step reasoning tasks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.17464v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.17464.md) |
| 01-30 | **Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate**
Institutions: Shanghai Jiao Tong University, Carnegie Mellon University, Shanghai Artificial Intelligence Laboratory
SCALEEVAL is a meta-evaluation framework for assessing the reliability and efficiency of LLMs as evaluators. By using debate among LLM agents with minimal human oversight, it brings flexibility and scalability to evaluation, and experiments show results highly consistent with purely human evaluation.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16788v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.16788.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/GAIR-NLP/scaleeval)
|
| 01-30 | **Recovering Mental Representations from Large Language Models with Markov Chain Monte Carlo**
Institutions: Princeton University, University of Warwick
By integrating LLMs into sampling algorithms and extracting mental representations via direct sampling and MCMC, this work improves efficiency and performance and explores the potential of Bayesian inference with LLMs.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16657v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.16657.md) |
| 01-30 | **Incoherent Probability Judgments in Large Language Models**
Institutions: Princeton University
This paper examines the coherence of probability judgments formed by LLMs and finds biases similar to the systematic ones in human cognition. Using probabilistic identities and repeated judgments, the authors quantify the incoherence of these judgments. They further hypothesize that the human-like bias stems from the autoregressive training objective, supported by a potential link between Bayesian-sampler models and the autoregressive process in LLMs.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16646v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.16646.md) |
| 01-29 | **LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning**
Institutions: Nanyang Technological University
LLM4Vuln is a framework that substantially improves LLMs' performance in code vulnerability analysis by providing a vector database of vulnerability knowledge, tool-calling capability, tailored CoT prompting schemes, and instruction-following models for structured output.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16185v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.16185.md) |
| 01-29 | **SelectLLM: Can LLMs Select Important Instructions to Annotate?**
Institutions: University of Minnesota, Carnegie Mellon University
This work proposes SELECTLLM, a new method that uses LLMs to select high-quality unlabeled instructions, challenging traditional selection algorithms while preserving the global structure of the dataset. Experiments show superior performance on instruction-tuning benchmarks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16553v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.16553.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/minnesotanlp/select-llm)
|
| 01-29 | **Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis**
Institutions: Harbin Institute of Technology
This study proposes AMSC, an LLM-based multi-specialist agent consultation model for automatic diagnosis that better mirrors real-world diagnostic workflows and improves diagnostic accuracy and efficiency by ensembling the predictions of multiple specialist agents.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16107v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.16107.md) |
| 01-28 | **PRE: A Peer Review Based Large Language Model Evaluator**
The PRE model proposed in this paper provides a new framework for automatic LLM evaluation by emulating the academic peer-review process; it significantly reduces cost while offering better generality and reliability.|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.15641v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.15641.md) |
| 01-27 | **MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries**
Institutions: Hong Kong University of Science and Technology
This paper builds the MultiHop-RAG dataset to evaluate and improve how existing retrieval-augmented generation (RAG) systems handle queries that require multi-step retrieval and reasoning. The experiments expose limitations of current RAG systems on such tasks, and the dataset is released to spur further research and development.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.15391v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.15391.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/yixuantt/MultiHop-RAG)
|
| 01-26 | **EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty**
Institutions: Peking University, Microsoft Research, University of Waterloo
The paper proposes EAGLE, a framework that accelerates autoregressive decoding of LLMs while keeping the generated text distribution identical to the original model's. By rethinking speculative sampling at the feature level, EAGLE markedly reduces time overhead and raises draft acceptance rates, achieving larger speedups than Lookahead and Medusa with low training cost and easy deployment.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.15077v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.15077.md) |
| 01-25 | **True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning**
Institutions: Nanyang Technological University, Zhejiang University
The TWOSOME framework uses reinforcement learning to align LLMs with embodied environments effectively, improving sample efficiency and task generalization while preserving the LLMs' original capabilities.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.14151v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.14151.md) |
| 01-25 | **ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases**
Institutions: HKUST
ConstraintChecker is a standalone plugin that improves LLM performance on CSKB reasoning. By supplying and checking explicit constraints, it helps LLMs reason better and outperforms other prompting techniques on verified metrics.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.14003v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.14003.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/HKUST-KnowComp/ConstraintChecker)
|
| 01-25 | **Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning**
Institutions: Columbia University, Microsoft Research, University of California Berkeley
EC-Finetuning improves the consistency of LLM-generated explanations and generalizes to unseen datasets, yielding a 10.0% relative improvement in explanation consistency on the finetuning datasets and 4.5% on out-of-distribution datasets, while also modestly improving prediction accuracy.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13986v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13986.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/yandachen/explanation-consistency-finetuning)
|
| 01-24 | **Can AI Assistants Know What They Don't Know?**
Institutions: Fudan University, Shanghai Artificial Intelligence Laboratory
This paper investigates whether AI assistants can recognize the boundaries of their own knowledge. By constructing an "I don't know" (Idk) dataset and aligning assistants to it, the authors enable assistants to identify and acknowledge questions they cannot answer, reducing factual errors in responses.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13275v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13275.md) |
| 01-24 | **MM-LLMs: Recent Advances in MultiModal Large Language Models**
Institutions: Tencent AI Lab, Kyoto University, Mohamed Bin Zayed University of Artificial Intelligence
This paper is a comprehensive survey of MM-LLMs intended to further advance research in the field.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13601v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13601.md) |
| 01-24 | **Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction**
Institutions: Nanjing University of Science and Technology, Northeastern University, Singapore Institute of Technology
The paper proposes a Zero-shot Document-level Relation Triplet Extraction (ZeroDocRTE) framework that generates labeled data by retrieving and denoising knowledge from LLMs, together with a series of new techniques that significantly improve document-level relation triplet extraction.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13598v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13598.md) |
| 01-24 | **Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption**
Institutions: Tsinghua University, Zhongguancun Laboratory, XinJiang University
The proposed CGPE framework effectively supports LLMs in question answering: its clue-guided path exploration lowers the capability requirements on the LLM and markedly reduces computational cost, which is of practical value to individuals and organizations with limited compute.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13444v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13444.md) |
| 01-24 | **AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents**
Institutions: The University of Hong Kong, Zhejiang University, Shanghai Jiao Tong University
The authors propose AGENTBOARD, a benchmark dedicated to evaluating LLM agents with multi-turn interaction capability; it offers fine-grained progress rates and interactive analysis tools for a deeper understanding of LLM agent performance.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13178v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13178.md) |
| 01-23 | **CCA: Collaborative Competitive Agents for Image Editing**
This paper proposes CCA, a new generative model built on multiple large language models (LLMs) that handles complex image-editing tasks and improves result quality and robustness. By encouraging collaboration and competition among agents, the model outperforms conventional methods, especially in managing complex tasks and learning from intermediate steps to refine results.|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13011v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.13011.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/TiankaiHang/CCA)
|
| 01-23 | **AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents**
Institutions: Google DeepMind
This paper describes AutoRT, a system that uses large foundation models to control real-world robots so they can navigate autonomously and carry out tasks, marking the first time LLM-controlled robots operate autonomously in real environments, proposing goals and achieving them. Data collected through AutoRT is diverse, improves robot learning models, and can be aligned with human preferences.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12963v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.12963.md) |
| 01-23 | **Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment**
Institutions: Alibaba Inc.
The paper proposes DITTO, a self-alignment method that strengthens LLMs' role-play ability through knowledge augmentation and dialogue simulation. It also provides an objective, reproducible, interpretable, and efficient role-play evaluation protocol, and uses cross-supervision experiments to dissect role-play, offering insight into building role-play capability in LLMs.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12474v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.12474.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/OFA-Sys/Ditto)
|
| 01-23 | **KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning**
Institutions: Samsung R&D Institute India - Bangalore
KAM-CoT is a multimodal CoT reasoning framework that integrates CoT reasoning, knowledge graphs, and multiple modalities. With fewer trainable parameters it outperforms existing state-of-the-art methods, showing strong performance and cost efficiency.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12863v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.12863.md) |
| 01-22 | **Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation**
Institutions: Institute of Information Engineering, Chinese Academy of Sciences
By introducing EoTD and MTD, this paper shows that the mathematical reasoning ability of LLMs can be transferred to SLMs with fewer than one billion parameters. Experiments confirm that these methods preserve, and to some extent improve, the SLMs' reasoning ability, bringing them to state-of-the-art levels on reasoning tasks. This opens the door to deploying SLMs in resource-constrained settings and narrows the gap between the demand for strong reasoning models and limited compute.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.11864v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.11864.md) |
| 01-22 | **PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety**
Institutions: Shanghai Artificial Intelligence Laboratory, Dalian University of Technology
This paper proposes PsySafe, a comprehensive framework for multi-agent system safety that combines psychologically grounded attack, defense, and evaluation methods. The experimental findings support deeper understanding and study of multi-agent system safety.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.11880v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.1188.md) |
| 01-22 | **CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation**
Institutions: Stanford University, Stability AI
This paper tackles the challenges of automated chest X-ray interpretation by introducing a large dataset designed for CXR interpretation, developing a new foundation model, and building a comprehensive evaluation benchmark. CheXagent outperforms other models on multiple evaluation tasks, and the authors also examine potential biases in the model, providing an important reference for future research and applications.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12208v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.12208.md) |
| 01-21 | **Interactive AI with Retrieval-Augmented Generation for Next Generation Networking**
Institutions: Nanyang Technological University, Guangdong University of Technology, Institute for Infocomm Research, Agency for Science Technology and Research
This paper studies integrating interactive AI (IAI) into next-generation networking, using retrieval-augmented generation (RAG) and large language models (LLMs) to improve decision-making, and demonstrates the framework's effectiveness through a case study on real network optimization.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.11391v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.11391.md) |
| 01-20 | **BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models**
Institutions: University of Illinois Urbana-Champaign, University of Washington, Western Washington University
This paper proposes BadChain, a backdoor attack against LLMs that use CoT prompting; it requires no access to training data or model parameters and has low computational overhead. The method exposes security vulnerabilities of CoT-prompted LLMs and underscores the importance of studying backdoor attacks and designing effective defenses.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12242v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.12242.md) |
| 01-19 | **Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning**
Institutions: MIT
The paper shows that Wanda pruning, without any fine-tuning, can improve aligned LLMs' resistance to jailbreak attacks, and validates model behavior with a purpose-built dataset and evaluation suite.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10862v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.10862.md) |
| 01-19 | **Tool-LMM: A Large Multi-Modal Model for Tool Agent Learning**
Institutions: ShanghaiTech University, Meituan, UniDT
Tool-LMM is the first system dedicated to training large multimodal models for tool-agent learning. It integrates multimodal inputs with correct selection of external tools, overcoming the ambiguity of text-only instructions and demonstrating the ability to automatically choose the right tool for multimodal instructions.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10727v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.10727.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/Tool-LMM/Tool-LMM)
|
| 01-19 | **Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment**
Institutions: Sun Yat-sen University, Tencent AI Lab
This paper introduces KCA, a novel method that mitigates hallucinations arising during alignment by reducing the inconsistency between external and intrinsic knowledge. The study offers several insights for future work, notably KCA's strong performance across diverse scenarios and its combination of simplicity and effectiveness.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10768v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.10768.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/fanqiwan/KCA)
|
| 01-19 | **Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads**
Institutions: Princeton University, Together AI, University of Illinois Urbana-Champaign
The paper proposes Medusa, an LLM inference-acceleration framework that adds extra decoding heads and uses tree attention to generate multiple tokens in parallel, effectively cutting the number of decoding steps and delivering substantial speedups for large-model inference (a toy multi-head sketch follows this table).
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10774v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.10774.md) |
| 01-18 | **ChatQA: Building GPT-4 Level Conversational QA Models**
Institutions: NVIDIA
ChatQA significantly improves multi-turn conversational QA through a two-stage instruction-tuning strategy, particularly in context understanding and information retrieval.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10225v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.10225.md) |
| 01-18 | **Self-Rewarding Language Models**
Institutions: Meta, NYU
This paper proposes Self-Rewarding Language Models, which use self-training to avoid the bottleneck of human preference data and to improve both self-rewarding and instruction-following abilities. Experiments show strong performance, positioning the approach as a pioneering step toward continually self-improving models.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10020v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.1002.md) |
| 01-18 | **Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation**
Institutions: The University of Tokyo, RIKEN
This work integrates an explicit chain of reasoning and a question-generation ability into an LMM to enable more reliable reasoning. The authors create a new dataset and use it to train the model, setting a precedent for future LMM progress and allowing the model to produce explicit reasoning steps and ask questions when facing uncertainty.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10005v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.10005.md) |
| 01-18 | **A Fast, Performant, Secure Distributed Training Framework For Large Language Model**
Institutions: Ant Group China
This paper proposes a secure distributed training framework based on model slicing that preserves training accuracy and efficiency while preventing leakage of model parameters and data on both the server and client sides.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.09796v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.09796.md) |
| 01-17 | **Vlogger: Make Your Dream A Vlog**
Institutions: Shanghai Jiao Tong University, Shanghai AI Laboratory, Shenzhen Institute of Advanced Technology Chinese Academy of Sciences
By introducing the Vlogger system, this paper shows an innovative way to apply LLMs to vlog generation, overcoming the challenge of producing minute-level coherent video and achieving excellent experimental results.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.09414v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.09414.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/zhuangshaobin/Vlogger)
|
| 01-17 | **ReFT: Reasoning with Reinforced Fine-Tuning**
Institutions: ByteDance Research
ReFT uses reinforcement learning to optimize non-differentiable objectives, significantly improving LLMs' performance and generalization on mathematical problem solving. It surpasses conventional supervised fine-tuning and shows promise on more complex reasoning tasks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08967v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.08967.md) |
| 01-17 | **LLMs for Relational Reasoning: How Far are We?**
Institutions: Continental-NTU Corporate Lab, Nanyang Technological University, Singapore
This paper examines the capabilities and limitations of LLMs in relational reasoning. A comprehensive evaluation, including newly proposed test methods and evaluation modules, finds that while LLMs do reasonably well on some relational reasoning tasks, they underperform models purpose-built for logical reasoning.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.09042v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.09042.md) |
| 01-16 | **Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation**
Institutions: Johns Hopkins University, Microsoft
This paper proposes CPO, a new LLM fine-tuning method that addresses the bottlenecks of SFT in machine translation, significantly improving moderately sized LLM translation models at minimal resource cost and reaching parity with state-of-the-art translation systems.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08417v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.08417.md) |
| 01-16 | **SpecGen: Automated Generation of Formal Program Specifications via Large Language Models**
Institutions: Nanjing University, Nanyang Technological University, Singapore Management University
The paper proposes SpecGen, a technique for automatically generating formal program specifications that combines large language models with heuristic selection strategies. Compared with existing tools and pure-LLM approaches, SpecGen generates specifications more efficiently and accurately, and the authors release a dataset to support follow-up research.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08807v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.08807.md) |
| 01-16 | **RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture**
Institutions: Microsoft
This paper studies how well large language models generate question-answer pairs from agricultural data and proposes a new generation pipeline that effectively combines RAG and fine-tuning, broadening LLM applications in domain-specific settings.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08406v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.08406.md) |
| 01-16 | **MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline**
Institutions: Alibaba Group
The paper introduces a new math reasoning dataset paired with a Python code interpreter, and shows that refining the dataset and applying a dedicated fine-tuning pipeline significantly improves LLM performance on math problem solving.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08190v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.0819.md) |
| 01-16 | **Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models**
Institutions: Tencent AI Lab
This paper analyzes the domain-mismatch problem of LLMs in machine translation and experiments with varying amounts of parallel data, demonstrating the potential of LLMs to address these classic MT challenges.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08350v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.0835.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/pangjh3/LLM4MT)
|
| 01-16 | **DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models**
Institutions: Zhejiang University
DoraemonGPT is an LLM-driven agent that uses symbolic memory and a tool set to understand and answer complex questions about dynamic videos. An MCTS-based planner optimizes answer generation, enabling it to handle more complex tasks in real-world scenarios.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08392v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.08392.md) |
| 01-15 | **MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models**
Institutions: Microsoft Research India
This paper studies the performance of large language models after parameter-efficient fine-tuning on multilingual tasks, especially for low-resource languages and English tasks, demonstrating the promise of PEFT and pointing out directions for future work.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.07598v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.07598.md) |
| 01-15 | **The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey**
Institutions: Technology Innovation Institute UAE, Islamic University of Technology Bangladesh, Stanford University, Amazon GenAI, AI Institute University of South Carolina
This paper is a detailed survey of context-length extension techniques for LLMs. It gives researchers an organized overview of existing strategies and challenges and encourages discussion of future developments.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.07872v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.07872.md) |
| 01-15 | **A Study on Large Language Models' Limitations in Multiple-Choice Question Answering**
Institutions: David R. Cheriton School of Computer Science, University of Waterloo
This paper studies the limitations of LLMs on multiple-choice question answering, showing that most models perform poorly at the task. It also finds that model answers often depend on option order, proposes evaluation methods that remove such biases, and recommends caution when using MCQs to evaluate LLMs, including testing whether a model truly understands the task.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.07955v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.07955.md) |
| 01-14 | **Small LLMs Are Weak Tool Learners: A Multi-LLM Agent**
Institutions: Sun Yat-sen University, Alibaba Group
The study shows that small LLMs are weak tool learners, introduces the α-UMi multi-LLM framework to build stronger LLM agents, proposes the necessary two-stage fine-tuning strategy, and analyzes the data scaling law in depth.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.07324v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.07324.md) |
| 01-13 | **Bridging the Preference Gap between Retrievers and LLMs**
This paper introduces the BGM framework to address the "preference gap" between retrievers and LLMs. A sequence-to-sequence bridge model trained with a combined SL and RL scheme adapts retrieved information to LLM preferences, improving performance on multiple downstream tasks.|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06954v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06954.md) |
| 01-12 | **From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape**
Institutions: Tsinghua University, University of Maryland, Beijing Xicheng Educational Research Institute
This study demonstrates the potential of large language models in education, particularly in automated essay scoring (AES). LLMs can not only automate the scoring process but also improve human raters' performance by generating feedback, offering valuable insight for AI-assisted education and efficient human-AI collaboration.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06431v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06431.md) |
| 01-12 | **How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs**
Institutions: Virginia Tech, Renmin University of China, UC Davis
This paper takes a new perspective on AI safety by treating LLMs as entities with human-like communication skills. Drawing on over a decade of social-science research, the authors build a taxonomy of persuasion techniques and a tool that automatically generates adversarial prompts. Results show persuasion can substantially increase the likelihood of LLMs performing risky behaviors, revealing the inadequacy of current defenses against such strategies.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06373v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06373.md) |
| 01-12 | **Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation**
Institutions: Nanyang Technological University, Fudan University
The paper proposes TOOLGEN, which integrates autocompletion tools into LLM-based repository-level code generation, resolving dependency issues and improving the quality and success rate of generated code.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06391v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06391.md) |
| 01-12 | **An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models**
Institutions: University of Washington Seattle, University of Wisconsin-Madison, Stanford University
This paper proposes an experimental-design framework to improve label efficiency in supervised fine-tuning (SFT) of large language models. It shows that experimental-design techniques can greatly improve label efficiency at low computational cost, saving 50% of annotation cost relative to random sampling on some tasks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06692v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06692.md) |
| 01-12 | **Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation**
Authors: Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Weixu Zhang, Xinrun Du, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu, Ge Zhang
This paper proposes the Kun strategy, which addresses data-consistency issues in instruction tuning of Chinese large language models through an answer-polishment (AP) process and a new data-generation method, reducing reliance on manual annotation. Evaluations show Kun's clear advantage in creating high-quality datasets.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06477v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06477.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/Zheng0428/COIG-Kun)
|
| 01-12 | **TestSpark: IntelliJ IDEA's Ultimate Test Generation Companion**
Institutions: JetBrains Research, Delft University of Technology
The paper presents TestSpark, an IntelliJ IDEA plugin that combines search-based and LLM-based test generation, improving the generation and integration of unit tests in the IDE while addressing the compilability of LLM-generated test cases. Being open source, the plugin bridges software developers and researchers and advances the practical utility of test-generation techniques.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06580v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.0658.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/JetBrains-Research/TestSpark)
|
| 01-12 | **APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding**
Institutions: Tsinghua University, Zhipu AI
With APAR, the study improves LLM decoding efficiency and generation speed in both memory-bound and high-throughput scenarios while preserving generation quality, offering a new and efficient deployment strategy for large language models.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06761v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06761.md) |
| 01-11 | **Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion**
Institutions: Tsinghua Shenzhen International Graduate School Tsinghua University, School of Computer Science Peking University, Baidu Inc.
This paper proposes using large language models for temporal knowledge graph completion, combining efficient fine-tuning with history augmentation that incorporates structural information to improve reasoning. Experiments show the method effectively improves temporal knowledge graph prediction accuracy, achieving state-of-the-art results.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06072v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06072.md) |
| 01-11 | **The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models**
Institutions: Johns Hopkins University
This study shows that concise chain-of-thought (CCoT) prompting can substantially shorten LLM text output without hurting problem-solving performance (a minimal prompt sketch follows this table).
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05618v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05618.md) |
| 01-11 | **EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction**
Institutions: Fudan University, Microsoft Research Asia, Zhejiang University
This paper proposes EASYTOOL, which condenses and unifies tool documentation into concise instructions, improving tool use by LLM-based agents and fixing the inconsistency, redundancy, and incompleteness of existing tool documentation.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06201v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06201.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/microsoft/JARVIS)
|
| 01-11 | **Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems**
Institutions: Zhongguancun Laboratory, Tsinghua University, Institute of Information Engineering Chinese Academy of Sciences
This paper gives a comprehensive overview of risk taxonomy, mitigation, and assessment benchmarks for large language model systems, proposing a new systematic taxonomy that helps developers understand and address the potential risks of LLM systems more holistically.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05778v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05778.md) |
| 01-11 | **Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning**
Institutions: Qatar Computing Research Institute
This paper proposes Evidence to Generate (E2G), a single-agent, two-step prompting framework for improving LLMs' context-grounded reasoning. By requiring the LLM to provide evidence and explanations alongside its answers, E2G reduces faulty reasoning and improves accuracy across reasoning tasks; experiments show E2G outperforms CoT on several context-intensive language tasks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05787v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05787.md) |
| 01-11 | **LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase**
Institutions: LAIR Lab Lehigh University, Huazhong University of Science and Technology
This paper defines mixed LLM-human text ("mixcase"), builds the MIXSET dataset, and offers insights and directions for detecting mixed text. The study finds existing detectors inadequate at identifying mixcase, highlighting the urgent need for finer-grained detectors.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05952v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05952.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/Dongping-Chen/MixSet)
|
| 01-11 | **Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint**
Institutions: Gaoling School of Artificial Intelligence, Renmin University of China; School of Information, Renmin University of China; Kuaishou Technology, Beijing, China
This paper proposes RLMEC, a new RL method that uses a generative reward model and a minimum-editing constraint to give large language models finer-grained supervision and more stable training during RL.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06081v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06081.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/RUCAIBox/RLMEC)
|
| 01-11 | **Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models**
Institutions: Google Research, Tel Aviv University
The paper proposes Patchscopes, a framework offering a new way to interpret the information encoded in LLMs' hidden representations and to correct multi-step reasoning errors. As a general, configurable framework, Patchscopes unifies existing interpretation tools, remedies some of their shortcomings, and opens new research and application possibilities.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06102v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06102.md) |
| 01-11 | **TOFU: A Task of Fictitious Unlearning for LLMs**
Institutions: Carnegie Mellon University
The paper provides a new dataset and evaluation protocol for LLM unlearning; the TOFU task exposes the shortcomings of existing unlearning techniques and encourages follow-up improvements and research.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06121v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.06121.md) |
| 01-10 | **Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing**
Institutions: Google Research
The paper proposes a new memory-based transformer method that uses an eviction policy and the ATTENDRE layer to reduce memory requirements and support bidirectional attention, matching conventional methods on long-sequence processing.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04881v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04881.md) |
| 01-10 | **Leveraging Print Debugging to Improve Code Generation in Large Language Models**
Institutions: Zhejiang University, ByteDance
This paper proposes guiding LLMs through code generation and debugging with print debugging and validates the approach on Leetcode problems, especially easy and medium ones. Although gains on hard problems are limited, the work is a notable step for LLM-based code debugging (a prompting-loop sketch follows this table).
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05319v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05319.md) |
| 01-10 | **AUTOACT: Automatic Agent Learning from Scratch via Self-Planning**
Institutions: Zhejiang University, Alibaba Group
This study proposes AUTOACT, a framework in which language agents learn automatically via self-instruct and self-planning, tackling the challenge of learning new tasks from scratch. Its core contributions are an effective data-augmentation method and an efficient automatic agent-learning process.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05268v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05268.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/zjunlp/AutoAct)
|
| 01-10 | **Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk**
Institutions: AWS AI Labs
The paper proposes generating training data via LLM self-talk, with the potential to improve task-oriented dialogue agents. Despite some limitations, the results show that selecting high-quality dialogues as training data effectively improves performance, demonstrating that, in the right setting, fine-tuning on self-generated data can make language models better task-oriented dialogue agents.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05033v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05033.md) |
| 01-10 | **Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis**
Institutions: Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, Meituan Group
This work proposes ProLLM4Rec, a framework that systematically analyzes prompting large language models as foundation models for recommender systems, testing the effect of different factors through experiments and distilling empirical findings that inspire future research.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04997v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04997.md) |
| 01-10 | **CASA: Causality-driven Argument Sufficiency Assessment**
Institutions: Peking University
This paper introduces CASA, an LLM-based zero-shot causality-driven argument sufficiency assessment framework that handles quantifying sufficiency and performing interventions without observational data, and demonstrates its effectiveness in practical applications.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05249v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05249.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/xxxiaol/CASA)
|
| 01-10 | **InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks**
InfiAgent-DABench provides a novel evaluation benchmark that both measures agents' performance on data analysis tasks and is an important step toward improving and optimizing LLMs for this domain.|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05507v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05507.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/InfiAgent/InfiAgent)
|
| 01-10 | **Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security**
Institutions: Tsinghua University, Xiaomi AI Lab
As a survey, this paper reviews the current state, challenges, and future trends of personal LLM agents, and proposes a general system architecture together with a definition of intelligence levels.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05459v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.05459.md) |
| 01-09 | **Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search**
Institutions: Nanyang Technological University Singapore
ReCo uses LLMs to rewrite code in the codebase, significantly improving code-search accuracy through style normalization, and introduces CSSim, a new metric quantifying stylistic differences, advancing research on code-style normalization.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04514v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04514.md) |
| 01-09 | **Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs**
Institutions: Zhejiang University, Ant Group
This paper proposes ARALLM, which combines analogical reasoning with multi-task model distillation to help large language models translate natural language into structured logical language. The method lets non-expert marketers target users with natural language, with the potential to change user-targeting practice, and usefully probes the functionality and practicality of LLMs.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04319v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04319.md) |
| 01-09 | **Agent Alignment in Evolving Social Norms**
Institutions: Fudan University
This paper proposes the EvolutionaryAgent framework to evaluate and strengthen agents' adaptability and alignment amid dynamically evolving social norms. The study highlights the importance of aligning agents with evolving norms and validates the approach experimentally.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04620v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.0462.md) |
| 01-09 | **Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding**
Institutions: University of California San Diego, Google Cloud AI Research, Google Research
The paper proposes the CHAIN-OF-TABLE framework, which uses tabular data explicitly in the reasoning chain and dynamically plans and updates table operations, improving LLMs' accuracy and reliability on table-based reasoning tasks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04398v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04398.md) |
| 01-09 | **Large Language Models for Robotics: Opportunities, Challenges, and Perspectives**
Institutions: Northwestern Polytechnical University, University of Georgia, Shaanxi Normal University
The proposed multimodal GPT-4V framework, combining natural language processing with visual perception, promises to address the challenges LLMs face in robot task planning, which matters for higher-level human-robot interaction and the future of AI.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04334v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04334.md) |
| 01-09 | **The Critique of Critique**
Institutions: The Hong Kong Polytechnic University, Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
METACRITIQUE is the first framework for evaluating natural-language critiques; it judges critique quality via precision and recall principles and achieves strong interpretability and transparency.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04518v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04518.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/GAIR-NLP/MetaCritique)
|
| 01-08 | **MARG: Multi-Agent Review Generation for Scientific Papers**
Institutions: Northwestern University, The Hebrew University of Jerusalem, Allen Institute for AI
This paper proposes MARG, a multi-agent review-generation method that works beyond the context-size limits of base models to produce high-quality peer-review feedback on scientific papers. In user studies and automated metrics, MARG substantially improves feedback quality over baselines, generating 2.2x more helpful comments and more specific feedback.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04259v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04259.md) |
| 01-08 | **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems**
Institutions: Fudan University
The paper proposes SpeechAgents, a multi-agent system based on multimodal large language models that can simulate human-communication scenarios with up to 25 agents and shows excellent scalability. Using multimodal signals as the medium of inter-agent communication, the system simulates dialogue with correct content, realistic rhythm, and rich emotion, and applies to tasks such as drama creation and audio-novel generation.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03945v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03945.md) |
| 01-08 | **TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series**
Institutions: IBM Research
TTM shows that small pre-trained models trained specifically on diverse time-series data achieve efficient zero/few-shot forecasting of multivariate time series and strong transfer learning.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03955v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03955.md) |
| 01-07 | **ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback**
Institutions: University of Louisville, Microsoft
This paper explores ChatGPT's effectiveness as a conversational recommender system, building a pipeline around ChatGPT that simulates real user scenarios, and studying and mitigating popularity bias.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03605v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03605.md) |
| 01-07 | **Grimoire is All You Need for Enhancing Large Language Models**
Institutions: Beihang University, Renmin University of China
The paper proposes SLEICL, in which strong language models learn skills from examples and transfer them to weak language models, significantly improving the latter's ICL ability. Experiments validate the method's effectiveness and show its potential for strengthening weak models' in-context learning.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03385v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03385.md) |
| 01-07 | **Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon**
Institutions: Beijing Academy of Artificial Intelligence, Renmin University of China, Nankai University
This paper introduces Activation Beacon, a technique that extends large language models' context length so the model can perceive a much broader context within a limited window while retaining its ability on short contexts. Activation Beacon is an effective, efficient, compatible, and low-training-cost way to extend LLM context length.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03462v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03462.md) |
| 01-07 | **Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects**
Institutions: The Chinese University of Hong Kong, DeepWisdom, Peking University
The paper offers a framework to guide future research and development of LLM-based intelligent agent systems, discussing ways to improve their planning and multimodal information processing, how to address the challenges facing LLM agents, and clear directions for future research.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03428v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03428.md) |
| 01-06 | **Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification**
Institutions: Aerospace Information Research Institute Chinese Academy of Sciences, Key Laboratory of Target Cognition and Application Technology, University of Chinese Academy of Sciences
This study proposes the Quartet Logic: A Four-Step Reasoning (QLFR) framework for short text classification, plus a CoT-driven multi-task learning (QLFR-CML) method, both of which use large-language-model reasoning chains to address the challenges of STC. Experiments demonstrate their effectiveness and practicality.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03158v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03158.md) |
| 01-06 | **The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models**
Institutions: Renmin University of China, Université de Montréal
Through a systematic empirical study, this paper explores factuality hallucination in large language models, identifying its sources, detection methods, and mitigation strategies, and proposes the new HaluEval 2.0 benchmark along with a simple, effective hallucination-detection framework.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03205v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.03205.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/RUCAIBox/HaluEval-2.0)
|
| 01-06 | **CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models**
Institutions: Harbin Institute of Technology, Kuaishou Technology
By introducing an iterative cognitive mechanism and a memory-retention system, CogGPT effectively addresses the challenge of large language models imitating human cognitive dynamics and excels at continuous information processing.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08438v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.08438.md) |
| 01-05 | **From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models**
Institutions: Beike Inc.
This paper introduces the RAISE framework, which improves LLMs' performance in multi-turn dialogue, especially real-estate sales scenarios, through an enhanced memory system and a structured agent-construction process.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02777v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02777.md) |
| 01-05 | **Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache**
Institutions: Alibaba Group, Shanghai Jiao Tong University
The paper presents a system for serving long-context language models in the cloud: the distributed DistAttention algorithm optimizes the processing and storage of the attention module, while the DistKV-LLM service system manages and coordinates KV caches, achieving efficient resource allocation in distributed environments and clear performance gains.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02669v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02669.md) |
| 01-04 | **LLM Augmented LLMs: Expanding Capabilities through Composition**
Institutions: Google Research, Google DeepMind
The paper proposes CALM, a model-composition framework that effectively combines two large language models to tackle new tasks, with effectiveness demonstrated across multiple experiments.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02412v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02412.md) |
| 01-04 | **SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval**
Institutions: Columbia University
For the long-document task of hospital discharge summarization, this paper proposes SPEER, a sentence-level planning method based on embedded entity retrieval that guides large language models to better cover key entities, producing more complete and faithful clinical summaries. The study shows SPEER improves coverage and accuracy in practice, easing clinicians' documentation burden.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02369v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02369.md) |
| 01-04 | **ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers**
Institutions: Bytedance Inc.
This paper proposes ICE-GRT to improve the depth and accuracy of large language models on domain-specific tasks. Using reinforcement learning from human feedback, ICE-GRT significantly strengthens domain-specific capability without sacrificing general performance, achieving state-of-the-art results on several evaluation tasks.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02072v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02072.md) |
| 01-04 | **Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives**
Institutions: Zhejiang University, OPPO Research Institute
This paper proposes Self-Contrast, a strategy that remedies the stubbornness and inconsistency of LLM reflection and self-correction. It creates diverse solving perspectives, contrasts their differences, and summarizes the differences into a checklist, improving reflection quality; experiments validate its effectiveness and broad applicability.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02009v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02009.md) |
| 01-04 | **Using LLM to select the right SQL Query from candidates**
Institutions: Peking University
This paper proposes using large language models to automatically generate test cases for text-to-SQL and designs a three-step re-ranking process; experiments show the method significantly improves existing text-to-SQL models.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02115v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.02115.md) |
| 01-04 | **On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)**
Institutions: University of South Carolina, New Mexico State University, IBM Research
This paper surveys the prospects of integrating large language models (LLMs) into automated planning and scheduling (APS), moving beyond the context-adaptation limits of traditional systems toward more dynamic, context-sensitive planning. It discusses combining leading LLMs such as GPT-4 and BERT with classical planning methods and applying LLMs across eight categories of planning problems, laying the groundwork for more capable and intelligent planning systems.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02500v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.025.md) |
| 01-03 | **MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries**
Institutions: Indian Institute of Technology Patna, Stanford University, Amazon GenAI
MedSumm is a novel multimodal framework for medical question summarization that integrates text and visual information to produce medically detailed summaries, with the potential to improve clinical decision-making and deepen understanding of patient queries.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.01596v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.01596.md) |
| 01-03 | **Social Media Ready Caption Generation for Brands**
Institutions: Adobe Research India
This paper proposes a new framework to help brands create engaging social-media captions consistent with their image and personality. The two-part framework successfully meets the challenge of generating captions that are both strongly on-brand and eye-catching.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.01637v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.01637.md) |
| 01-02 | **A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models**
Institutions: Islamic University of Technology Bangladesh, University of South Carolina, Stanford University
This paper is a comprehensive survey of hallucination-mitigation techniques for large language models, proposing a taxonomy and a systematic treatment of feedback and rationale methods, and assessing the techniques' effectiveness and impact.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.01313v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.01313.md) |
| 01-02 | **LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning**
This paper demonstrates a way to extend LLMs' context window without fine-tuning, which matters for improving long-text handling under constrained compute (a position-remapping sketch follows this table).|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.01325v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.01325.md) |
| 01-01 | **From Prompt Engineering to Prompt Science With Human in the Loop**
Institutions: University of Washington
The paper shows how to turn prompt engineering for LLMs into a more scientific, systematic prompt science. A human-in-the-loop qualitative coding method ensures the quality and consistency of LLM responses while removing individual subjectivity and arbitrariness.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04122v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.04122.md) |
| 01-01 | **The Earth is Flat? Unveiling Factual Errors in Large Language Models**
Institutions: The Chinese University of Hong Kong, Tencent AI Lab
This paper presents FactChecker, a new framework for automatically testing factual errors in large language models; by building a knowledge graph and generating test questions, it reveals and reduces models' factual errors.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00761v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.00761.md) |
| 01-01 | **A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models**
Institutions: The Chinese University of Hong Kong, Tencent AI Lab
Targeting the evaluation and improvement of LLMs' logical reasoning, this paper proposes LogicAsker, which comprehensively assesses reasoning ability and effectively improves it through question generation and in-context learning.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00757v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2024-01/2401.00757.md) |
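
To make the Medusa row above concrete: below is a toy Python sketch (not the authors' code) of the core idea that extra heads predict several future tokens from the same hidden state, and a verification step keeps only the prefix the base model agrees with. Real Medusa verifies tree-structured candidates with tree attention; the tensor sizes and the `propose` / `verify` helpers here are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
vocab, hidden, n_heads = 100, 32, 3  # toy sizes; real Medusa adds several heads

base_lm_head = torch.nn.Linear(hidden, vocab)  # predicts token t+1
medusa_heads = [torch.nn.Linear(hidden, vocab) for _ in range(n_heads)]  # t+2..

def propose(h_last: torch.Tensor) -> list[int]:
    """Draft several future tokens in one forward pass from one hidden state."""
    draft = [int(base_lm_head(h_last).argmax())]
    draft += [int(head(h_last).argmax()) for head in medusa_heads]
    return draft

def verify(draft: list[int], oracle: list[int]) -> list[int]:
    """Keep the longest prefix the base model itself would have produced.
    Here `oracle` stands in for the base model's step-by-step choices."""
    accepted = []
    for d, o in zip(draft, oracle):
        if d != o:
            break
        accepted.append(d)
    return accepted or oracle[:1]  # always make progress with one real token

h = torch.randn(hidden)
draft = propose(h)
print("draft:", draft, "accepted:", verify(draft, draft[:2] + [99]))
```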
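
The CCoT row above is a pure prompting change, so a sketch only needs two prompt strings. The exact wording below is a hypothetical rendering, not the paper's prompts; the finding is that asking for concise steps cuts output length at similar accuracy.

```python
# Two prompt variants for the same problem; the CCoT variant simply adds
# an instruction to keep the reasoning short.
problem = "A train travels 120 km in 2 hours. How far does it go in 5 hours?"

cot_prompt = (
    "Q: " + problem + "\n"
    "A: Let's think step by step."
)

ccot_prompt = (
    "Q: " + problem + "\n"
    "A: Let's think step by step, and keep each step as concise as possible."
)

# Both prompts go to the same model; only the output length should differ much.
print(ccot_prompt)
```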
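
For the Self-Extend row above, the idea is to remap relative positions so that nearby tokens keep exact positions while distant tokens share grouped (floor-divided) ones, so a model trained on short contexts never sees an out-of-range position. The sketch below is a minimal rendering of that mapping under assumed constants (`neighbor`, `group`) and follows my reading of the grouped-attention formula; it is not the authors' implementation.

```python
def self_extend_rel_pos(q_idx: int, k_idx: int,
                        neighbor: int = 512, group: int = 4) -> int:
    """Map a raw relative position to the one actually used for attention."""
    rel = q_idx - k_idx
    if rel <= neighbor:
        return rel  # normal attention inside the local window
    # Grouped attention: compress distant positions by `group`, shifted so the
    # mapping stays continuous at the window boundary.
    return rel // group + neighbor - neighbor // group

for rel_k in [0, 100, 600, 4000]:
    print(rel_k, "->", self_extend_rel_pos(4096, 4096 - rel_k))
```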
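
The print-debugging row above also reduces to a prompting loop: run the candidate code, feed the captured output back, and ask the model to instrument and fix it. The prompt text, the `run_candidate` stub, and the toy bug below are assumptions for illustration.

```python
import subprocess
import sys
import tempfile

def run_candidate(code: str) -> str:
    """Execute candidate code in a subprocess and capture stdout/stderr."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=10)
    return proc.stdout + proc.stderr

def debug_prompt(problem: str, code: str, log: str) -> str:
    # The model is asked to add print statements, read the log, then fix.
    return (
        f"Problem:\n{problem}\n\nCurrent solution:\n{code}\n\n"
        f"Output of the instrumented run:\n{log}\n\n"
        "Add print statements where useful, explain what the log reveals, "
        "and return a corrected solution."
    )

buggy = "def add(a, b):\n    return a - b\nprint(add(2, 3))"
print(debug_prompt("Return a+b.", buggy, run_candidate(buggy)))
```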

---

### 12月

|  Date   | Paper | Links & Summary |
| --- | --- | --- |
| 12-31 | **BatchEval: Towards Human-like Text Evaluation**
Institutions: Beijing Institute of Technology, Xiaohongshu Inc
The paper proposes BATCHEVAL, a new LLM evaluation paradigm that addresses the robustness of automatic text evaluation and its agreement with human judgment. Through batched evaluation and iterative processing, BATCHEVAL clearly surpasses existing methods in accuracy and cost efficiency.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00437v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2401.00437.md) |
| 12-31 | **Improving Text Embeddings with Large Language Models**
Institutions: Microsoft Corporation
Using the latest large language models and synthetic data, this paper proposes a novel text-embedding method that matches competitive benchmarks without human-annotated data and with fewer than 1k training steps, providing strong evidence for further advancing text-embedding techniques.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00368v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2401.00368.md) |
| 12-29 | **The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model**
Institutions: Ant Group, Nanjing University
The study explores applying LLMs to repairing code-review defects, proposes an effective semi-automatic APR paradigm, analyzes the performance of nine popular models, and designs effective prompts to guide the repair process.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17485v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17485.md) |
| 12-29 | **Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning**
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17484v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17484.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/jongjyh/trfr)
|
| 12-29 | **Building Efficient Universal Classifiers with Natural Language Inference**
Institutions: Vrije Universiteit Amsterdam, University of London Royal Holloway, Hugging Face
This paper offers a new method for universal text classification via natural language inference, together with detailed steps and tooling for applying it, significantly improving model efficiency without sacrificing performance.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17543v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17543.md) |
| 12-29 | **DB-GPT: Empowering Database Interactions with Private Large Language Models**
Institutions: Alibaba Group
This paper presents DB-GPT, a project that integrates LLMs with database systems to improve user experience and accessibility. DB-GPT features a layered design that handles privacy and security concerns while boosting overall performance and efficiency through multi-source RAG and adaptive ICL.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17449v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17449.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/eosphoros-ai/DB-GPT)
|
| 12-29 | **Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception**
Institutions: Shanghai Key Laboratory of Data Science School of Computer Science Fudan University, School of Data Science Fudan University, DataGrand Co. LTD
By building a dimension-unit knowledge base and tailored benchmarks, this study significantly improves LLMs' quantitative reasoning, offering a new route to understanding the key quantitative information in text and performing high-accuracy reasoning over it.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17532v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17532.md) |
| 12-28 | **Experiential Co-Learning of Software-Developing Agents**
Institutions: Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications
This paper proposes Experiential Co-Learning, a framework in which co-tracking, co-memorizing, and co-reasoning modules run in sequence so that LLM-driven agents learn more effectively from historical trajectories and use past experience to reason jointly about new tasks, showing clear gains over existing techniques.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17025v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17025.md) |
| 12-28 | **Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos**
Institutions: Tsinghua University
This paper proposes Grounding-Prompter, which tackles temporal sentence grounding (TSG) in long videos by combining an LLM with temporal reasoning and multimodal information; it demonstrates the effectiveness of multimodal prompting of LLMs and experimentally verifies its superiority on long-video TSG.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17117v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17117.md) |
| 12-28 | **Structured Packing in LLM Training Improves Long Context Utilization**
Institutions: University of Warsaw, Google DeepMind, Polish Academy of Sciences
This paper proposes SPLICE to improve long-context utilization, validating its effectiveness at raising context utilization in large language models and improving performance on long-context tasks. SPLICE is particularly suited to constructing training examples from data that lacks additional structured information.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17296v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17296.md) |
| 12-28 | **GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension**
Institutions: Tsinghua University, Renmin University of China
This paper presents GITAGENT, an autonomous agent that extends its own toolset from GitHub to meet diverse user queries. By tackling the lack of standardization, GITAGENT autonomously learns from human experience recorded in GitHub Issues/PRs to resolve problems encountered during tool extension, and demonstrates effectiveness in autonomously integrating tools for tasks across specialized domains.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17294v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17294.md) |
| 12-28 | **Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs**
Institutions: Chinese University of Hong Kong, Tencent AI Lab
This paper proposes a new evaluation paradigm that challenges LLMs not only to solve problems but to perform meta-reasoning, that is, to evaluate the reasoning process itself, and releases the accompanying public benchmark DiagGSM8K. The approach promises to expose cognitive deficits that earlier outcome-oriented evaluations overlooked, pointing to a new direction for evaluating and training LLMs.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17080v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.1708.md) |
| 12-28 | **DrugAssist: A Large Language Model for Molecule Optimization**
Institutions: Tencent AI Lab, Department of Computer Science, Hunan University
DrugAssist is a model that performs molecule optimization through human-machine interaction, overcoming the limited interactivity of LLMs in drug discovery and showing strong performance and transferability in multi-property molecule optimization.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10334v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2401.10334.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/blazerye/DrugAssist)
|
| 12-28 | **Improving In-context Learning via Bidirectional Alignment**
Institutions: Nanyang Technological University, Princeton University, Salesforce Research USA
By introducing a novel ranking loss and aligning output distributions, this paper proposes Bidirectional Alignment (BiAlign), which effectively improves the in-context learning (ICL) ability of smaller models.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17055v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17055.md) |
| 12-27 | **Rethinking Tabular Data Understanding with Large Language Models**
Institutions: UC San Diego, USC, UC Davis
This paper investigates LLMs' understanding of and reasoning over tabular data, contributing analyses of robustness to table structure, a comparison of textual versus symbolic reasoning, and the effect of aggregating multiple reasoning paths on model performance. The proposed table-structure normalization method and mixed self-consistency mechanism are significant for improving LLM performance on tabular reasoning.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16702v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.16702.md) |
| 12-27 | **How Robust are LLMs to In-Context Majority Label Bias?**
Institutions: Amazon
This paper comprehensively studies the robustness of LLMs to majority-label bias in in-context learning (ICL); experiments show that certain models remain notably stable when facing this bias.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16549v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.16549.md) |
| 12-27 | **Conversational Question Answering with Reformulations over Knowledge Graph**
Institutions: University of Illinois at Urbana-Champaign, Amazon
CoRnNet is a novel RL model for conversational question answering over knowledge graphs that incorporates LLM-generated reformulations, outperforming other state-of-the-art models.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17269v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.17269.md) |
| 12-27 | **Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges**
Institutions: Shanghai Jiao Tong University (SJTU)
This survey reviews how to adapt large language models to educational systems, summarizing progress on the education-related capabilities of LLMs and discussing the potential and challenges of building such systems, offering insights for future research.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08664v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2401.08664.md) |
| 12-26 | **Supervised Knowledge Makes Large Language Models Better In-context Learners**
Institutions: School of Engineering, Westlake University; Westlake Institute for Advanced Study; Peking University
The proposed SuperContext framework leverages supervised knowledge from task-specific fine-tuned SLMs to significantly improve the generalization and factuality of LLMs on natural language understanding and question answering. It represents an innovative way of folding the strengths of small models into LLMs to handle out-of-distribution data and minimize hallucinations.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15918v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.15918.md) |
| 12-26 | **Align on the Fly: Adapting Chatbot Behavior to Established Norms**
Institutions: Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory, The Hong Kong Polytechnic University
This work proposes a dynamic OPO method that collects legal and moral rules as external memory to constrain LLM behavior without further training, plus a scalable evaluation module that counters potential benchmark leakage and broadens the range of tested rules. Although the method is limited in inference efficiency and the retrieval model can be further optimized, extensive experiments on multiple evaluation datasets demonstrate its effectiveness.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15907v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.15907.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/GAIR-NLP/OPO)
|
| 12-26 | **Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models**
Institutions: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science, Peking University, Beijing, China
The HyKGE framework effectively addresses the accuracy and interpretability challenges that large language models face on complex medical questions, showing strong potential for medical applications and clear advantages in real-world scenarios.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15883v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.15883.md) |
| 12-26 | **Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models**
Institutions: University of Waterloo
This paper proposes LiT5-Distill and LiT5-Score, two sequence-to-sequence encoder-decoder models for effective zero-shot listwise reranking. The methods are competitive in effectiveness while removing the traditional reliance on large LLMs and external relevance labels, marking an optimization and advance in this area.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16098v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.16098.md)
[![GitHub](https://img.shields.io/badge/GitHub-View-brightgreen?logo=github)](https://github.com/castorini/LiT5)
|
| 12-26 | **KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph**
Institutions: Northeastern University, Neusoft AI Magic Technology Research, Neusoft Institute of Intelligent Medical Research
This paper introduces KnowledgeNavigator, a new framework that improves the reasoning process over knowledge graphs to address LLMs' performance limits on complex reasoning tasks. Experimental results confirm its effectiveness, and it holds promise for extending LLM applications to high-stakes, highly sensitive domains.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15880v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.1588.md) |
| 12-26 | **A Prompt Learning Framework for Source Code Summarization**
Institutions: Nanyang Technological University, Tencent Inc., Nanjing University
This paper proposes PromptCS, a novel prompt learning framework for source code summarization that generates high-quality summaries, reduces training cost, and releases its code for others to study.
|
[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16066v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-12/2312.16066.md) |
| 12-26 | **Aligning Large Language Models with Human Preferences through Representation Engineering**
Institutions: Fudan University