{"id":21426718,"url":"https://github.com/j535d165/pyalex","last_synced_at":"2025-05-15T10:07:44.087Z","repository":{"id":64736387,"uuid":"557541347","full_name":"J535D165/pyalex","owner":"J535D165","description":"A Python library for OpenAlex (openalex.org)","archived":false,"fork":false,"pushed_at":"2025-04-07T20:18:34.000Z","size":153,"stargazers_count":238,"open_issues_count":5,"forks_count":30,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-15T10:07:06.602Z","etag":null,"topics":["openalex","openalexapi","publications","research","scholarly-articles","scholarly-metadata","science","utrecht-university"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/J535D165.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-10-25T21:49:11.000Z","updated_at":"2025-05-13T11:19:38.000Z","dependencies_parsed_at":"2024-03-25T20:28:05.780Z","dependency_job_id":"dfdb1e12-2e27-4995-8079-1f55264af4c1","html_url":"https://github.com/J535D165/pyalex","commit_stats":{"total_commits":102,"total_committers":8,"mean_commits":12.75,"dds":0.08823529411764708,"last_synced_commit":"016998b2799ec046462b577f1e3c88f52ae24060"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fpyalex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fpyalex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/J535D165%2Fpyalex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fpyalex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/J535D165","download_url":"https://codeload.github.com/J535D165/pyalex/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254319721,"owners_count":22051074,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["openalex","openalexapi","publications","research","scholarly-articles","scholarly-metadata","science","utrecht-university"],"created_at":"2024-11-22T21:43:25.157Z","updated_at":"2025-05-15T10:07:39.071Z","avatar_url":"https://github.com/J535D165.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"PyAlex - a Python wrapper for OpenAlex\" src=\"https://github.com/J535D165/pyalex/raw/main/pyalex_repocard.svg\"\u003e\n\u003c/p\u003e\n\n# PyAlex\n\n![PyPI](https://img.shields.io/pypi/v/pyalex) [![DOI](https://zenodo.org/badge/557541347.svg)](https://zenodo.org/badge/latestdoi/557541347)\n\n\nPyAlex is a Python library for [OpenAlex](https://openalex.org/). OpenAlex is\nan index of hundreds of millions of interconnected scholarly papers, authors,\ninstitutions, and more. OpenAlex offers a robust, open, and free [REST API](https://docs.openalex.org/) to extract, aggregate, or search scholarly data.\nPyAlex is a lightweight and thin Python interface to this API. 
PyAlex tries to\nstay as close as possible to the design of the original service.\n\nThe following features of OpenAlex are currently supported by PyAlex:\n\n- [x] Get single entities\n- [x] Filter entities\n- [x] Search entities\n- [x] Group entities\n- [x] Search filters\n- [x] Select fields\n- [x] Sample\n- [x] Pagination\n- [x] Autocomplete endpoint\n- [x] N-grams\n- [x] Authentication\n\nWe aim to cover the entire API, and we are looking for help. Pull requests are welcome.\n\n## Key features\n\n- **Pipe operations** - PyAlex can chain multiple operations in a sequence. This lets developers write readable queries. For examples, see [code snippets](#code-snippets).\n- **Plaintext abstracts** - OpenAlex [doesn't include plaintext abstracts](https://docs.openalex.org/api-entities/works/work-object#abstract_inverted_index) due to legal constraints. PyAlex can convert the inverted abstracts into [plaintext abstracts on the fly](#get-abstract).\n- **Permissive license** - OpenAlex data is CC0 licensed :raised_hands:. 
PyAlex is published under the MIT license.\n\n## Installation\n\nPyAlex requires Python 3.8 or later.\n\n```sh\npip install pyalex\n```\n\n## Getting started\n\nPyAlex offers support for all [Entity Objects](https://docs.openalex.org/api-entities/entities-overview): [Works](https://docs.openalex.org/api-entities/works), [Authors](https://docs.openalex.org/api-entities/authors), [Sources](https://docs.openalex.org/api-entities/sources), [Institutions](https://docs.openalex.org/api-entities/institutions), [Topics](https://docs.openalex.org/api-entities/topics), [Publishers](https://docs.openalex.org/api-entities/publishers), and [Funders](https://docs.openalex.org/api-entities/funders).\n\n```python\nfrom pyalex import Works, Authors, Sources, Institutions, Topics, Publishers, Funders\n```\n\n### The polite pool\n\n[The polite pool](https://docs.openalex.org/how-to-use-the-api/rate-limits-and-authentication#the-polite-pool) has much\nfaster and more consistent response times. To get into the polite pool,\nset your email:\n\n```python\nimport pyalex\n\npyalex.config.email = \"mail@example.com\"\n```\n\n### Max retries\n\nBy default, PyAlex raises an error at the first failure when querying the OpenAlex API. Set `max_retries` to a number higher than 0 to let PyAlex retry when an error occurs. `retry_backoff_factor` affects the delay between consecutive retries, and `retry_http_codes` lists the HTTP status codes that trigger a retry.\n\n```python\nfrom pyalex import config\n\nconfig.max_retries = 0\nconfig.retry_backoff_factor = 0.1\nconfig.retry_http_codes = [429, 500, 503]\n```\n\n### Get single entity\n\nGet a single Work, Author, Source, Institution, Concept, Topic, Publisher or Funder from OpenAlex by the\nOpenAlex ID, or by DOI or ROR.\n\n```python\nWorks()[\"W2741809807\"]\n\n# same as\nWorks()[\"https://doi.org/10.7717/peerj.4375\"]\n```\n\nThe result is a `Work` object, which is very similar to a dictionary. 
Find the available fields with `.keys()`.\n\nFor example, get the open access status:\n\n```python\nWorks()[\"W2741809807\"][\"open_access\"]\n```\n\n```python\n{'is_oa': True, 'oa_status': 'gold', 'oa_url': 'https://doi.org/10.7717/peerj.4375'}\n```\n\nThe same works for Authors, Sources, Institutions, Concepts, and Topics:\n\n```python\nAuthors()[\"A5027479191\"]\nAuthors()[\"https://orcid.org/0000-0002-4297-0502\"]  # same\n```\n\n#### Get random\n\nGet a [random Work, Author, Source, Institution, Concept, Topic, Publisher or Funder](https://docs.openalex.org/how-to-use-the-api/get-single-entities/random-result).\n\n```python\nWorks().random()\nAuthors().random()\nSources().random()\nInstitutions().random()\nTopics().random()\nPublishers().random()\nFunders().random()\n```\n\n#### Get abstract\n\nOnly for Works. Request a work from the OpenAlex database:\n\n```python\nw = Works()[\"W3128349626\"]\n```\n\nAll attributes are available as documented under [Works](https://docs.openalex.org/api-entities/works/work-object), as well as `abstract` (only if `abstract_inverted_index` is not None). This human-readable abstract is created on the fly.\n\n```python\nw[\"abstract\"]\n```\n\n```python\n'Abstract To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. 
We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.'\n```\n\nPlease respect the legal constraints when using this feature.\n\n### Get lists of entities\n\n```python\nresults = Works().get()\n```\n\nFor lists of entities, you can also `count` the number of records found\ninstead of returning the results. This also works for search queries and\nfilters.\n\n```python\nWorks().count()\n# 10338153\n```\n\nFor lists of entities, you can return the result as well as the metadata. By default, only the results are returned.\n\n```python\ntopics = Topics().get()\n```\n\n```python\nprint(topics.meta)\n{'count': 65073, 'db_response_time_ms': 16, 'page': 1, 'per_page': 25}\n```\n\n#### Filter records\n\n```python\nWorks().filter(publication_year=2020, is_oa=True).get()\n```\n\nwhich is identical to:\n\n```python\nWorks().filter(publication_year=2020).filter(is_oa=True).get()\n```\n\n#### Nested attribute filters\n\nSome attribute filters are nested and separated with dots by OpenAlex. 
For\nexample, filter on [`authorships.institutions.ror`](https://docs.openalex.org/api-entities/works/filter-works).\n\nIn case of nested attribute filters, use a dict to build the query.\n\n```python\nWorks()\n  .filter(authorships={\"institutions\": {\"ror\": \"04pp8hn57\"}})\n  .get()\n```\n\n#### Search entities\n\nOpenAlex reference: [The search parameter](https://docs.openalex.org/api-entities/works/search-works)\n\n```python\nWorks().search(\"fierce creatures\").get()\n```\n\n#### Search filter\n\nOpenAlex reference: [The search filter](https://docs.openalex.org/api-entities/works/search-works#search-a-specific-field)\n\n```python\nAuthors().search_filter(display_name=\"einstein\").get()\n```\n\n```python\nWorks().search_filter(title=\"cubist\").get()\n```\n\n```python\nFunders().search_filter(display_name=\"health\").get()\n```\n\n\n#### Sort entity lists\n\nOpenAlex reference: [Sort entity lists](https://docs.openalex.org/api-entities/works/get-lists-of-works#page-and-sort-works).\n\n```python\nWorks().sort(cited_by_count=\"desc\").get()\n```\n\n#### Select\n\nOpenAlex reference: [Select fields](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/select-fields).\n\n```python\nWorks().filter(publication_year=2020, is_oa=True).select([\"id\", \"doi\"]).get()\n```\n\n#### Sample\n\nOpenAlex reference: [Sample entity lists](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sample-entity-lists).\n\n```python\nWorks().sample(100, seed=535).get()\n```\n\n#### Logical expressions\n\nOpenAlex reference: [Logical expressions](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists#logical-expressions)\n\nInequality:\n\n```python\nSources().filter(works_count=\"\u003e1000\").get()\n```\n\nNegation (NOT):\n\n```python\nInstitutions().filter(country_code=\"!us\").get()\n```\n\nIntersection (AND):\n\n```python\nWorks().filter(institutions={\"country_code\": [\"fr\", \"gb\"]}).get()\n\n# 
same\nWorks().filter(institutions={\"country_code\": \"fr\"}).filter(institutions={\"country_code\": \"gb\"}).get()\n```\n\nUnion (OR):\n\n```python\nWorks().filter(institutions={\"country_code\": \"fr|gb\"}).get()\n```\n\n#### Paging\n\nOpenAlex offers two methods for paging: [basic (offset) paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging#basic-paging) and [cursor paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging#cursor-paging). Both methods are supported by PyAlex.\n\n##### Cursor paging (default)\n\nUse the method `paginate()` to paginate results. Each returned page is a list\nof records, with a maximum of `per_page` (default 25). By default,\nthe `n_max` argument of `paginate` is set to 10000. Set it to `None` to retrieve all\nresults.\n\n```python\nfrom pyalex import Authors\n\npager = Authors().search_filter(display_name=\"einstein\").paginate(per_page=200)\n\nfor page in pager:\n    print(len(page))\n```\n\n\u003e Looking for an easy method to iterate the records of a pager?\n\n```python\nfrom itertools import chain\nfrom pyalex import Authors\n\nquery = Authors().search_filter(display_name=\"einstein\")\n\nfor record in chain(*query.paginate(per_page=200)):\n    print(record[\"id\"])\n```\n\n##### Basic paging\n\nSee limitations of [basic paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging#basic-paging) in the OpenAlex documentation.\n\n```python\nfrom pyalex import Authors\n\npager = Authors().search_filter(display_name=\"einstein\").paginate(method=\"page\", per_page=200)\n\nfor page in pager:\n    print(len(page))\n```\n\n\n### Autocomplete\n\nOpenAlex reference: [Autocomplete entities](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/autocomplete-entities).\n\nAutocomplete a string:\n```python\nfrom pyalex import autocomplete\n\nautocomplete(\"stockholm resilience centre\")\n```\n\nAutocomplete a string to get a specific type of 
entities:\n```python\nfrom pyalex import Institutions\n\nInstitutions().autocomplete(\"stockholm resilience centre\")\n```\n\nYou can also use the filters to autocomplete:\n```python\nfrom pyalex import Works\n\nr = Works().filter(publication_year=2023).autocomplete(\"planetary boundaries\")\n```\n\n\n### Get N-grams\n\nOpenAlex reference: [Get N-grams](https://docs.openalex.org/api-entities/works/get-n-grams).\n\n\n```python\nWorks()[\"W2023271753\"].ngrams()\n```\n\n\n### Serialize\n\nAll results from PyAlex can be serialized. For example, save the results to a JSON file:\n\n```python\nimport json\nfrom pathlib import Path\nfrom pyalex import Work, Works\n\nwith open(Path(\"works.json\"), \"w\") as f:\n    json.dump(Works().get(), f)\n\nwith open(Path(\"works.json\")) as f:\n    works = [Work(w) for w in json.load(f)]\n```\n\n## Code snippets\n\nA list of awesome use cases of the OpenAlex dataset.\n\n### Cited publications (works referenced by this paper, outgoing citations)\n\n```python\nfrom pyalex import Works\n\n# the work to extract the referenced works of\nw = Works()[\"W2741809807\"]\n\nWorks()[w[\"referenced_works\"]]\n```\n\n### Citing publications (other works that reference this paper, incoming citations)\n\n```python\nfrom pyalex import Works\nWorks().filter(cites=\"W2741809807\").get()\n```\n\n### Get works of a single author\n\n```python\nfrom pyalex import Works\n\nWorks().filter(author={\"id\": \"A2887243803\"}).get()\n```\n\n### Dataset publications in the global south\n\n```python\nfrom pyalex import Works\n\n# group dataset works from global south institutions by country code\nw = Works() \\\n  .filter(institutions={\"is_global_south\":True}) \\\n  .filter(type=\"dataset\") \\\n  .group_by(\"institutions.country_code\") \\\n  .get()\n\n```\n\n### Most cited publications in your organisation\n\n```python\nfrom pyalex import Works\n\nWorks() \\\n  .filter(authorships={\"institutions\": {\"ror\": \"04pp8hn57\"}}) \\\n  .sort(cited_by_count=\"desc\") \\\n  .get()\n\n```\n\n## 
Experimental\n\n### Authentication\n\nOpenAlex is currently experimenting with authenticated requests. Authenticate your requests with:\n\n```python\nimport pyalex\n\npyalex.config.api_key = \"\u003cMY_KEY\u003e\"\n```\n\n## Alternatives\n\nR users can use the excellent [OpenAlexR](https://github.com/ropensci/openalexR) library.\n\n## License\n\n[MIT](/LICENSE)\n\n## Contact\n\n\u003e This library is a community contribution. The authors of this Python library aren't affiliated with OpenAlex.\n\nThis library is maintained by [J535D165](https://github.com/J535D165) and [PeterLombaers](https://github.com/PeterLombaers).\nFeel free to reach out with questions, remarks, and suggestions. The\n[issue tracker](/issues) is a good starting point. You can also reach out via\n[jonathandebruinos@gmail.com](mailto:jonathandebruinos@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj535d165%2Fpyalex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fj535d165%2Fpyalex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj535d165%2Fpyalex/lists"}