{"id":19349054,"url":"https://github.com/benature/bib-catcher","last_synced_at":"2025-04-23T06:30:46.184Z","repository":{"id":113393732,"uuid":"554381581","full_name":"Benature/bib-catcher","owner":"Benature","description":"Get bibtex of multiple references in a single line text, by python scraping Google Scholar.","archived":false,"fork":false,"pushed_at":"2024-02-09T09:24:32.000Z","size":160,"stargazers_count":34,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T09:22:24.311Z","etag":null,"topics":["bib","bibtex","google-scholar"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Benature.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-19T18:09:22.000Z","updated_at":"2025-03-21T09:53:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"43812f6b-fe6d-4df4-ba49-624a10eda7ae","html_url":"https://github.com/Benature/bib-catcher","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Benature%2Fbib-catcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Benature%2Fbib-catcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Benature%2Fbib-catcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Benature%2Fbib-catcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
Benature","download_url":"https://codeload.github.com/Benature/bib-catcher/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250384700,"owners_count":21421779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bib","bibtex","google-scholar"],"created_at":"2024-11-10T04:24:28.869Z","updated_at":"2025-04-23T06:30:45.867Z","avatar_url":"https://github.com/Benature.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- \u003ca href=\"https://github.com/Benature/bib-catcher\"\u003e\u003cimg src=\"https://i.328888.xyz/2022/12/10/f9HqU.png\" height=\"150\" align=\"right\"\u003e\u003c/a\u003e --\u003e\n\n# bib Catcher 🫳\n\n- Parse a paper's references into a `.bib` file and load it in Zotero.app.\n  - supports paper (citekey) and url\n- convert cite index (e.g. `[11]`) to its citekey (e.g. `[[@bib2022catcher]]`)\n\n## Features\n\n- Catch bibtex\n    ```shell\n    python catcher.py \u003ccitekey/doi\u003e [-i true/false]\n    ```\n    \n    - if `citekey/doi` is not provided, it will catch the most recently modified file in `input/`.\n    - `-i`, `--ignore_last_fail` (optional): ignore the citations that failed before\n\n- Convert reference index to citekey in wikilink format\n    `\u003ccitekey\u003e` is optional. 
If no citekey is provided, it will load the last caught paper.\n    ```shell\n    python converter.py \u003ccitekey\u003e\n    ```\n\n    - if `citekey` is not provided, it will convert the last paper that was caught\n\n\n    example:\n    ```diff\n    - Another line of recent work focuses on designing compact data structures [19,27,44] with tradeoffs between accuracy and resource footprints. \n    - Next, we compare the total storage overhead of SyNDB to that of NetSight [32]. \n    - Recent studies [65] have observed high utilization only across a few switch ports during congestion events.\n    + Another line of recent work focuses on designing compact data structures ([[al2008scalable]], [[ghorbani2017drill]], [[li2019deter]]) with tradeoffs between accuracy and resource footprints. \n    + Next, we compare the total storage overhead of SyNDB to that of [[handigol2014know|NetSight]]. \n    + Recent studies ([[zhang2017high]]) have observed high utilization only across a few switch ports during congestion events.\n    ```\n\n- REST API server *(via Flask)*\n    ```shell\n    python api.py\n    ```\n\n- Generate reference relationship graph\n    ```shell\n    python echarts.py\n    ```\n\n## Usage\n\n[English | [中文](#bib-捕手-🫳)]\n\n```shell\npip install -r requirements.txt\n```\n\nCopy the **Reference** text from the paper and paste it into a file in the `input/` folder. \nFor example, I pasted the following text into `input/benson2010network.txt` (\"benson2010network\" is the citekey of the demo paper)\n\n```txt\n[1] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, pages 63–74, 2008.\n[2] M. Al-Fares, S. Radhakrishnan, B. Raghavan, W. College, N. Huang, and A. Vahdat. Hedera: Dynamic flow scheduling for data center networks. In Proceedings of NSDI 2010, San Jose, CA, USA, April 2010. [3] T. Benson, A. Anand, A. Akella, and M. Zhang. Understanding Data Center Traffic Characteristics. 
In Proceedings of Sigcomm Workshop: Research on Enterprise Networks, 2009. \n[4] T. Benson, A. Anand, A. Akella, and M. Zhang. The case for fine-grained traffic engineering in data centers. In Proceedings of INM/WREN ’10, San Jose, CA, USA, April 2010. \n```\n\nThe text can all be on a single line. Note that each reference must be separated from the next by a space (` `), because the regular expression relies on it to split the references.\n\nRun the catcher to scrape the bibtex from Google Scholar.\n\n```shell\npython catcher.py benson2010network\n```\n\nAfter the catcher finishes, the output files for benson2010network are in `output/benson2010network/` and `recent/`.\n- `ref.bib`: The bibtex of every reference that was found is saved here; copy it for later use (in Zotero).\n- `fail.txt`: The references that could not be found in Google Scholar. (Maybe they are just webpages.)\n- `title.txt`: The references, one per line, so you can check whether the regular expression parsed the **Reference** text as you want.\n- `title.csv`: A table containing the index of each reference in the paper and the reference's name. It may be helpful when you want to know exactly which reference the main body cites, without scrolling back to the end of the paper (**Reference**).\n\n### Notes\n\n- There is a file (`base/all.csv`) that contains every reference caught so far, so that the code can avoid searching Google Scholar for the same paper again.\n- If the `output/` folder contains too many references, you can quickly get the latest output in `recent/`.\n\n\n---\n# bib 捕手 🫳\n\n**中文说明不再更新，最新说明请看上文英文版本**\n\n[[English](#bib-catcher-🫳) | 中文]\n\n## 使用\n\n```shell\npip install -r requirements.txt\n```\n\n复制文末的引用文献，黏贴到 `input/` 目录下的一个文件。比如我把下述文本粘贴在了 `input/benson2010network.txt` （“benson2010network” 是示例论文的 citekey）\n\n```txt\n[1] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, pages 63–74, 2008.\n[2] M. Al-Fares, S. Radhakrishnan, B. Raghavan, W. 
College, N. Huang, and A. Vahdat. Hedera: Dynamic flow scheduling for data center networks. In Proceedings of NSDI 2010, San Jose, CA, USA, April 2010. [3] T. Benson, A. Anand, A. Akella, and M. Zhang. Understanding Data Center Traffic Characteristics. In Proceedings of Sigcomm Workshop: Research on Enterprise Networks, 2009. \n[4] T. Benson, A. Anand, A. Akella, and M. Zhang. The case for fine-grained traffic engineering in data centers. In Proceedings of INM/WREN ’10, San Jose, CA, USA, April 2010. \n```\n\n这里的文本可以全在同一行，但是注意每一项引用都要有空格（` `）区分。否则正则表达式切分的时候会失败。\n\n运行捕手，从谷歌学术中抓取对应文献的 bibtex。\n\n```shell\npython catcher.py benson2010network\n```\n运行结束后，benson2010network 的输出文件放在了 `output/benson2010network/` 和 `recent/` 目录下。\n- `ref.bib`: 所有成功找到的文献 bibtex 都在这里。可以用来导入进 Zotero 中。\n- `fail.txt`: 在谷歌学术中搜索失败的文献清单（可能那条文献只是一个网页）。\n- `title.txt`: 正则表达式解析的结果，每行一项引用。可以在这里检查正则表达式的切分是否符合预期。\n- `title.csv`: 一个表，存储了引用在文章的序号和对应的名字。当你在正文中想知道引用编号对应哪一个具体的文献时，可以直接看这个表，就不需要滚动到文末。\n\n### 其他说明\n\n- 有一个文件 `base/all.csv` 存储了所有的文献 bibtex 查找历史记录，这样可以避免在谷歌学术中重复搜索相同的已知文献。\n- 如果 `output/` 目录放置了太多的文献资料，最近一次的输出信息可以直接在 `recent/` 中找到。","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenature%2Fbib-catcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenature%2Fbib-catcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenature%2Fbib-catcher/lists"}