{"id":16446043,"url":"https://github.com/linjer/arxiv-client-py","last_synced_at":"2025-08-30T10:41:57.057Z","repository":{"id":232103913,"uuid":"783483579","full_name":"linjer/arxiv-client-py","owner":"linjer","description":"Structured Python3 client for the arXiv API","archived":false,"fork":false,"pushed_at":"2024-04-12T01:17:03.000Z","size":55,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-08T05:48:36.985Z","etag":null,"topics":["arxiv","python3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linjer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-08T01:31:26.000Z","updated_at":"2024-04-09T16:47:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"898a11bb-f418-4c28-a3f0-a1787c6594c4","html_url":"https://github.com/linjer/arxiv-client-py","commit_stats":null,"previous_names":["linjer/arxiv-client-py"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/linjer/arxiv-client-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linjer%2Farxiv-client-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linjer%2Farxiv-client-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linjer%2Farxiv-client-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linjer%2Farxiv-client-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/linjer","download_url":"https://codeload.github.com/linjer/arxiv-client-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linjer%2Farxiv-client-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272839668,"owners_count":25001862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arxiv","python3"],"created_at":"2024-10-11T09:46:15.877Z","updated_at":"2025-08-30T10:41:57.018Z","avatar_url":"https://github.com/linjer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# arxiv-client\n\nPython3 client for the [arXiv API](https://info.arxiv.org/help/api/user-manual.html).\nInstall package [`arxiv_client`](https://pypi.org/project/arxiv-client/) from PyPI.\n\nThis differs from the pre-existing [arxiv.py](https://github.com/lukasschwab/arxiv.py) project \nin that it further abstracts away the arXiv API so you do not need to learn to construct\nquery strings. 
The overall goal is to enable users to skip reading the API docs entirely.\n\n## Basic Features\n\n- Simple structured queries\n- Comprehensive entity models, with documentation\n  - For example, see the [Category](src/arxiv_client/category.py) enum for arXiv's category taxonomy\n- Fully type-annotated\n\n## Usage\n\n### Daily RSS Feed\n\n```py\nimport arxiv_client as arx\n\n\nclient = arx.Client()\narticles = client.rss_by_subject(arx.Subject.COMPUTER_SCIENCE)\n```\n\n### Search\n\n```py\nimport arxiv_client as arx\n\n\ncategories = [arx.Category.CS_AI, arx.Category.CS_CL, arx.Category.CS_IR]\nclient = arx.Client()\narticles = client.search(arx.Query(keywords=[\"llm\"], categories=categories, max_results=10))\nfor article in articles:\n    print(article)\n```\n\n### Structured Search Query Logic\n\nWhen using the structured `Query` fields, multiple values within a single field are combined using `OR`, \nand multiple fields are combined using `AND`.\n\n#### Searchable Fields\n\nThe `Query` object accepts the following field filters:\n\n- `keywords`: terms across all fields\n- `title_keywords`: terms in the article title\n- `author_names`: names in the author list\n- `categories`: arXiv subject categories\n- `abstract_keywords`: terms in the article abstract\n- `comment_keywords`: terms in the author-provided comment\n- `article_ids`: arXiv article IDs\n- `custom_params`: custom query string\n\n#### Example\n\n```py\nfrom arxiv_client import Category, Query\n\nQuery(keywords=[\"llm\"], categories=[Category.CS_AI, Category.CS_IR], max_results=5)\n# Query(\n#     keywords=['llm'],\n#     title_keywords=[],\n#     author_names=[],\n#     categories=[\u003cCategory.CS_AI: 'cs.AI'\u003e, \u003cCategory.CS_IR: 'cs.IR'\u003e],\n#     abstract_keywords=[],\n#     comment_keywords=[],\n#     article_ids=[],\n#     custom_params=None,\n#     sort_criterion=SortCriterion(sort_by=\u003cSortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'\u003e, sort_order=\u003cSortOrder.DESC: 'descending'\u003e),\n#     start=0,\n#     
max_results=5\n# )\n```\n\nThis results in the following query logic:\n\n```\n(\"llm\") in any field AND (cs.AI OR cs.IR) in the categories\n```\n\nSee the [Query](src/arxiv_client/query.py) class for more information.\n\n### Custom Search Queries\n\nIf the provided simple query logic is insufficient, the `Query` object accepts a hand-written query string through the `custom_params` attribute. You do not need to URL-encode this value.\n\nSee [arXiv Query Construction](https://info.arxiv.org/help/api/user-manual.html#51-details-of-query-construction) for more information on building your own queries.\n\n#### Example\n\n```py\nfrom arxiv_client import Category, Query\n\ncustom = f\"cat:{Category.CS_AI.value} ANDNOT cat:{Category.CS_RO.value}\"\nQuery(keywords=[\"paged attention\", \"attention window\"], custom_params=custom)\n# Query(\n#     keywords=['paged attention', 'attention window'],\n#     title_keywords=[],\n#     author_names=[],\n#     categories=[],\n#     abstract_keywords=[],\n#     comment_keywords=[],\n#     article_ids=[],\n#     custom_params='cat:cs.AI ANDNOT cat:cs.RO',\n#     sort_criterion=SortCriterion(sort_by=\u003cSortBy.LAST_UPDATED_DATE: 'lastUpdatedDate'\u003e, sort_order=\u003cSortOrder.DESC: 'descending'\u003e),\n#     start=0,\n#     max_results=10\n# )\n```\n\nThis results in the following query logic:\n\n```\n(\"paged attention\" OR \"attention window\") in any field AND (cs.AI AND NOT cs.RO) in the categories\n```\n\nEquivalent query string:\n\n```\n(all:\"paged attention\" OR all:\"attention window\") AND (cat:cs.AI ANDNOT cat:cs.RO)\n```\n\n## Known Issues\n\nThe arXiv search API is unreliable, especially for large queries.\n\nThe API sometimes returns incomplete results or no entries at all,\neven though the response itself is valid. 
See this [GitHub issue](https://github.com/lukasschwab/arxiv.py/issues/43)\nfor discussion on the topic.\n\nIf you encounter this problem, the following may help:\n\n- Reduce the page size; `100` seems to have a relatively high success rate\n- Increase the paging retry and delay parameters\n- Break up large queries into smaller queries\n\nRetries often help with the issue, but are sometimes insufficient.\nIf you need more reliable access to large query results, consider looking into\nthe [arXiv Bulk Data Access](https://info.arxiv.org/help/bulk_data.html) options.\n\n## Development\n\nThis project uses [hatch](https://hatch.pypa.io/latest/) for project management.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinjer%2Farxiv-client-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinjer%2Farxiv-client-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinjer%2Farxiv-client-py/lists"}