{"id":13845748,"url":"https://github.com/howie6879/liuli","last_synced_at":"2025-05-16T15:08:54.645Z","repository":{"id":40271806,"uuid":"355373703","full_name":"howie6879/liuli","owner":"howie6879","description":"一站式构建多源、干净、个性化的阅读环境(Build a multi-source, clean and personalized reading environment in one stop.)","archived":false,"fork":false,"pushed_at":"2023-07-25T04:33:12.000Z","size":2721,"stargazers_count":889,"open_issues_count":11,"forks_count":108,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-13T12:08:53.853Z","etag":null,"topics":["liuli","notes","novel","wechat"],"latest_commit_sha":null,"homepage":"https://liuli.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/howie6879.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-04-07T01:12:32.000Z","updated_at":"2025-04-24T07:23:53.000Z","dependencies_parsed_at":"2024-01-17T07:18:06.266Z","dependency_job_id":"103ed5eb-a1c7-400a-8a76-2fdf4413769f","html_url":"https://github.com/howie6879/liuli","commit_stats":null,"previous_names":["howie6879/2c"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howie6879%2Fliuli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howie6879%2Fliuli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howie6879%2Fliuli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howie6879%2Fliuli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/howie6879","download_url":"https://codeload.github.com/howie6879/liuli/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254553958,"owners_count":22090417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["liuli","notes","novel","wechat"],"created_at":"2024-08-04T17:03:35.001Z","updated_at":"2025-05-16T15:08:49.631Z","avatar_url":"https://github.com/howie6879.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"./.files/images/logo_pure.jpg\" width='120px' height='120px' alt=\"Liuli logo\" \u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eLiuli\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e📖 Build a multi-source, clean and personalized reading environment in one stop\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cstrong\u003e琉璃开净界，薜荔启禅关\u003c/strong\u003e\u003c/p\u003e\n\n\u003c!-- \u003cdiv align=center\u003e\u003cimg src=\".files/images/liuli_ads_word_cloud.jpg\"  width=\"100%\" alt=\"liuli_ads_word_cloud\" /\u003e\u003c/div\u003e --\u003e\n\n## ✨ Features\n\nWith `Liuli`, you get:\n\n- [x] Configuration-driven development with customizable input, processing, and output\n- [x] Content backup (cross-source supported): GitHub, MongoDB\n- [ ] Machine learning features: CAPTCHA recognition, ad classification, smart tagging\n- [ ] Reading-source management for building a knowledge management platform\n- [x] Technical support for the official examples\n\nUse cases:\n\n- [ ] Target monitoring\n- [x] WeChat official accounts:\n  - [x] **Ads**: [Build a clean, personalized reading environment for official accounts](https://mp.weixin.qq.com/s/NKnTiLixjB9h8fSd7Gq8lw)\n  - [x] **RSS subscriptions**: [Build a clean RSS feed of official-account articles with Liuli](https://mp.weixin.qq.com/s/rxoq97YodwtAdTqKntuwMA)\n- [x] Following book (novel) updates: [Follow and read novels with Liuli](https://mp.weixin.qq.com/s/RSVZFxiq8G7a51te4q93gQ)\n\n## 🍥 Usage\n\nTutorials [read before use]:\n - 
[01. Tutorial](./docs/01.使用教程.md)\n - [02. Environment variables](./docs/02.环境变量.md)\n - [03. Distributor configuration](./docs/03.分发器配置.md)\n - [04. Backup configuration](./docs/04.备份器配置.md)\n\nQuick start (make sure `Docker` is installed first):\n\n```shell\nmkdir liuli \u0026\u0026 cd liuli\n# Database directory\nmkdir mongodb_data\n# Task configuration directory\nmkdir liuli_config\nwget -O liuli_config/default.json https://raw.githubusercontent.com/howie6879/liuli/main/liuli_config/default.json\n# Configure pro.env; see docs/02.环境变量.md for details\nvim pro.env\n# Download docker-compose.yaml\nwget https://raw.githubusercontent.com/howie6879/liuli/main/docker-compose.yaml\n# Start the services\ndocker-compose up -d\n```\n\nTo install and run from source:\n\n```shell\n# Requires Python 3.7+\ngit clone https://github.com/liuli-io/liuli.git --depth=1\ncd liuli\n\n# Create the base environment\npipenv install --python={your_python3.7+_path} --dev  --skip-lock\n# Configure .env (see docs/02.环境变量.md), then start the scheduler\npipenv run dev_schedule\n```\n\nA successful start logs the following:\n\n```shell\nLoading .env environment variables...\n[2021:12:23 23:08:35] INFO  Liuli Schedule started successfully :)\n[2021:12:23 23:08:35] INFO  Liuli Schedule time: 00:00 06:00\n[2021:12:23 23:09:36] INFO  Liuli playwright 匹配公众号 老胡的储物柜(howie_locker) 成功! 正在提取最新文章: 我的周刊(第018期)\n[2021:12:23 23:09:39] INFO  Liuli 公众号文章持久化成功! 
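👉 老胡的储物柜\n[2021:12:23 23:09:40] INFO  Liuli 🤗 微信公众号文章更新完毕(1/1)\n```\n\nThe pushed notification looks like this:\n\n\u003cdiv align=center\u003e\u003cimg width=\"20%\" src=\"https://raw.githubusercontent.com/howie6879/oss/master/images/m3nJ61.png\" /\u003e\u003c/div\u003e\n\n## 🤔 How it works\n\nThe overall flow:\n\n\u003cdiv align=center\u003e\u003cimg src=\".files/images/liuli_process.svg\" width=\"85%\" alt=\"liuli_process\" /\u003e\u003c/div\u003e\n\nIn brief:\n\n- **Collector**: monitors the custom reading sources you follow, such as WeChat official accounts, books, or blogs, and feeds them into `Liuli` in a unified standard format;\n- **Processor**: applies custom processing to the content, for example an ad classifier trained with machine learning on historical ad data that tags articles automatically, or hook functions that run at specific points;\n- **Distributor**: relies on the interface layer for data requests and responses, offers per-user configuration, and distributes clean articles automatically according to that configuration to WeChat, DingTalk, Telegram, RSS clients, or even self-hosted websites;\n- **Backup**: backs up the processed articles, for example by persisting them to a database or GitHub.\n\nThis yields a clean reading environment. Beyond that, the collected data can support many other uses, so feel free to explore.\n\nDevelopment roadmap:\n\n- [x] [v0.2.0](https://github.com/liuli-io/liuli/projects/1): implement the basic features so that solutions for common scenarios are usable\n- [ ] [v0.3.0](https://github.com/liuli-io/liuli/projects/2): implement custom collectors, so that whatever the user can see can be collected\n\n## 🤖 Help\n\nTo improve the model's recognition accuracy, I hope everyone can contribute some ad samples. See the sample file [.files/datasets/ads.csv](.files/datasets/ads.csv), which uses the following format:\n\n| title        | url          | is_process |\n| ------------ | ------------ | ---------- |\n| Ad article title | Ad article link | 0          |\n\nFields:\n\n- title: the article title\n- url: the article link; for WeChat articles, please first verify that the link has not expired\n- is_process: whether the sample has been processed; enter `0` by default\n\nAn example:\n\n\u003cdiv align=center\u003e\u003cimg src=\".files/images/liuli_ads_csv_demo.jpg\"  width=\"100%\" alt=\"liuli_ads_csv_demo\" /\u003e\u003c/div\u003e\n\nThe same ad is usually placed on several official accounts, so please check whether the record already exists before adding it. Contributions from everyone are welcome: please open a [PR](https://github.com/howie6879/liuli/issues/4)!\n\n## 👀 Acknowledgements\n\nThanks to the following open-source projects:\n\n- [Flask](https://github.com/pallets/flask): web framework\n- [Vue](https://github.com/vuejs/core): progressive JavaScript framework\n- [Ruia](https://github.com/howie6879/ruia): asynchronous crawler framework (developed and used in-house)\n- [playwright](https://github.com/microsoft/playwright): browser-based data scraping\n\nOnly the core open-source dependencies are listed above; see the [Pipfile](./Pipfile) for the remaining third-party dependencies.\n\nEvery PR is a great help to the `Liuli` project. Many thanks to the following contributors (in no particular order):\n\n\u003c!-- To get src for img: https://api.github.com/users/username --\u003e\n\u003ca 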
href=\"https://github.com/howie6879\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/17047388?s=60\u0026v=4\" title=\"howie6879\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/AI-xiaofour\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/20813419?v=4\" title=\"AI-xiaofour\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/Xuenew\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/41135035?s=64\u0026v=4\" title=\"Xuenew\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/cn-qlg\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/15536545?s=64\u0026v=4\" title=\"cn-qlg\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/baboon-king\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/63645337?v=4\" title=\"baboon-king\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/123seven\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/42730681?v=4\" title=\"123seven\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/zyd16888\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/26684563?v=4\" title=\"zyd16888\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/LeslieLeung\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/22127499?v=4\" title=\"LeslieLeung\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/gclm\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/27618687?v=4\" title=\"gclm\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/showthesunli\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/3203516?v=4\" title=\"showthesunli\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/throughs\"\u003e\u003cimg 
src=\"https://avatars.githubusercontent.com/u/54225721?v=4\" title=\"throughs\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/LiuYi0526\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/50787709?v=4\" title=\"LiuYi0526\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/blue-troy\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/12729455?v=4\" title=\"blue-troy\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/didnhdj2\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/115675424?v=4\" title=\"didnhdj2\" width=\"40\" height=\"40\" \u003e\u003c/a\u003e\n\n## 👉 About\n\nJoin the discussion (follow the account to join the group):\n\n\u003cdiv align=center\u003e\u003cimg src=\"https://raw.githubusercontent.com/howie6879/oss/master/images/wechat_howie.png\"  width=\"85%\" alt=\"img\" /\u003e\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowie6879%2Fliuli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhowie6879%2Fliuli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowie6879%2Fliuli/lists"}