{"id":50519791,"url":"https://github.com/diffbot/diffbot-python","last_synced_at":"2026-06-12T02:01:21.589Z","repository":{"id":13580157,"uuid":"16272753","full_name":"diffbot/diffbot-python","owner":"diffbot","description":"Python client library for Diffbot APIs","archived":false,"fork":false,"pushed_at":"2026-06-02T00:09:26.000Z","size":98,"stargazers_count":124,"open_issues_count":0,"forks_count":39,"subscribers_count":14,"default_branch":"main","last_synced_at":"2026-06-03T03:24:20.278Z","etag":null,"topics":["crawler","knowledge-graph","natural-language-processing","web-data","web-data-extraction"],"latest_commit_sha":null,"homepage":"https://docs.diffbot.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"googlemaps/android-samples","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/diffbot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2014-01-27T08:00:13.000Z","updated_at":"2026-05-27T16:51:49.000Z","dependencies_parsed_at":"2022-08-31T00:42:06.978Z","dependency_job_id":null,"html_url":"https://github.com/diffbot/diffbot-python","commit_stats":null,"previous_names":["diffbot/diffbot-python"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/diffbot/diffbot-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diffbot%2Fdiffbot-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diffbot%2Fdiffbot-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/diffbot%2Fdiffbot-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diffbot%2Fdiffbot-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/diffbot","download_url":"https://codeload.github.com/diffbot/diffbot-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diffbot%2Fdiffbot-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34225351,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","knowledge-graph","natural-language-processing","web-data","web-data-extraction"],"created_at":"2026-06-03T03:06:49.405Z","updated_at":"2026-06-12T02:01:21.583Z","avatar_url":"https://github.com/diffbot.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Diffbot Python Library\n\nPython client library for [Diffbot](https://www.diffbot.com) APIs.\n\n\n## Installation\n\nInstall the [standalone CLI binary](#standalone-binary) for [agentic use](#how-to-use-with-an-agent):\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/diffbot/diffbot-python/main/install.sh | sh\n```\n\nIf you prefer, the full Python library can also be installed with 
pip:\n\n```bash\npython3 -m pip install diffbot-python\n```\n\nFor local development:\n\n```bash\npip install -e \".[dev]\"\n```\n\n## Usage\n\n### Authentication\n\nThe CLI and the library can share a single credential. The token always has to be\npassed to the client explicitly, but `resolve_token()` gives you the same lookup the\nCLI uses, in this order:\n\n1. An explicit token passed to `resolve_token(token)`.\n2. The `DIFFBOT_API_TOKEN` environment variable.\n3. A `DIFFBOT_API_TOKEN=...` line in `~/.diffbot/credentials`.\n\nSet it once and it works for both the CLI and your scripts. Either export it:\n\n```bash\nexport DIFFBOT_API_TOKEN=\u003cTOKEN\u003e\n```\n\n…or write it to the shared credentials file (handy for keeping it out of your shell environment):\n\n```bash\nmkdir -p ~/.diffbot\nprintf 'DIFFBOT_API_TOKEN=%s\\n' '\u003cTOKEN\u003e' \u003e ~/.diffbot/credentials\nchmod 600 ~/.diffbot/credentials\n```\n\nWith either in place, resolve the token and pass it to the client:\n\n```python\nfrom diffbot import Diffbot, resolve_token\n\ndb = Diffbot(token=resolve_token())  # from env var or ~/.diffbot/credentials\ndata = db.extract(\"https://www.example.com\")\n```\n\n### Extract structured content\n```python\nfrom diffbot import Diffbot\n\ndb = Diffbot(token=\"YOUR_TOKEN\")\ndata = db.extract(\"https://www.example.com\")\n```\n\n### Ask Diffbot LLM\n```python\nfrom diffbot import Diffbot\n\ndb = Diffbot(token=\"YOUR_TOKEN\")\nfor chunk in db.ask([{\"role\": \"user\", \"content\": \"What's the capital of France?\"}]):\n    print(chunk, end=\"\")\n```\n\n### Crawl a site for structured content\n```python\nfrom diffbot import Diffbot\n\ndb = Diffbot(token=\"YOUR_TOKEN\")\nfor event in db.crawl(\"https://www.example.com\", hops=1):\n    print(event)\n```\n\n### Query the Knowledge Graph\n```python\nfrom diffbot import Diffbot\n\ndb = Diffbot(token=\"YOUR_TOKEN\")\nresults = db.dql('type:Organization name:\"Diffbot\"')\n```\n\n### Web Search\n```python\nfrom 
diffbot import Diffbot\n\ndb = Diffbot(token=\"YOUR_TOKEN\")\nresults = db.web_search(\"diffbot knowledge graph\")\nfor r in results[\"search_results\"]:\n    print(r[\"score\"], r[\"title\"], r[\"pageUrl\"])\n    print(r[\"content\"])\n```\n\n### Entities (NLP)\n```python\nfrom diffbot import Diffbot\n\ndb = Diffbot(token=\"YOUR_TOKEN\")\nresult = db.entities(\"Apple CEO Tim Cook announced record quarterly earnings.\")\nfor entity in result[\"entities\"]:\n    print(entity[\"name\"], entity.get(\"type\"), entity.get(\"id\"))\nprint(\"sentiment:\", result.get(\"sentiment\"))\n```\n\n## Async Usage\n\n### Extract structured content\n```python\nimport asyncio\nfrom diffbot import DiffbotAsync\n\nasync def main():\n    async with DiffbotAsync(token=\"YOUR_TOKEN\") as db:\n        data = await db.extract(\"https://www.example.com\")\n        print(data)\n\nasyncio.run(main())\n```\n\n### Ask Diffbot LLM\n```python\nimport asyncio\nfrom diffbot import DiffbotAsync\n\nasync def main():\n    async with DiffbotAsync(token=\"YOUR_TOKEN\") as db:\n        async for chunk in db.ask([{\"role\": \"user\", \"content\": \"What's the capital of France?\"}]):\n            print(chunk, end=\"\")\n\nasyncio.run(main())\n```\n\n### Crawl a site for structured content\n```python\nimport asyncio\nfrom diffbot import DiffbotAsync\n\nasync def main():\n    async with DiffbotAsync(token=\"YOUR_TOKEN\") as db:\n        async for event in db.crawl(\"https://www.example.com\", hops=1):\n            print(event)\n\nasyncio.run(main())\n```\n\n### Query the Knowledge Graph\n```python\nimport asyncio\nfrom diffbot import DiffbotAsync\n\nasync def main():\n    async with DiffbotAsync(token=\"YOUR_TOKEN\") as db:\n        results = await db.dql('type:Organization name:\"Diffbot\"')\n        print(results)\n\nasyncio.run(main())\n```\n\n### Web Search\n```python\nimport asyncio\nfrom diffbot import DiffbotAsync\n\nasync def main():\n    async with DiffbotAsync(token=\"YOUR_TOKEN\") as db:\n        
results = await db.web_search(\"diffbot knowledge graph\")\n        for r in results[\"search_results\"]:\n            print(r[\"score\"], r[\"title\"], r[\"pageUrl\"])\n            print(r[\"content\"])\n\nasyncio.run(main())\n```\n\n### Entities (NLP)\n```python\nimport asyncio\nfrom diffbot import DiffbotAsync\n\nasync def main():\n    async with DiffbotAsync(token=\"YOUR_TOKEN\") as db:\n        result = await db.entities(\"Apple CEO Tim Cook announced record quarterly earnings.\")\n        for entity in result[\"entities\"]:\n            print(entity[\"name\"], entity.get(\"type\"), entity.get(\"id\"))\n        print(\"sentiment:\", result.get(\"sentiment\"))\n\nasyncio.run(main())\n```\n\n## CLI\n\nThis library also includes a CLI exposed as the `db` command.\n\nTo make `db` available from anywhere, install it as an isolated tool with [uv](https://docs.astral.sh/uv/):\n\n```bash\nuv tool install .\n```\n\nThis drops a `db` executable into `~/.local/bin` (ensure it is on your `PATH`). Use `--force` to reinstall or upgrade after changes, or `--editable` to have source edits take effect immediately. Alternatively, a plain `pip install .` (or `pip install -e .`) also installs the `db` entry point into the active environment.\n\n### Standalone binary\n\nEvery release also ships a self-contained `db` binary for Linux (x86_64 and aarch64) and macOS (Apple Silicon) as a Python-free option. 
The installer detects your platform, verifies the SHA256 checksum, and installs (or upgrades) `db` into `~/.local/bin`:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/diffbot/diffbot-python/main/install.sh | sh\n```\n\nPin a specific release or install location with flags (or the `DB_VERSION` / `DB_INSTALL_DIR` environment variables); re-running the installer upgrades an existing install in place:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/diffbot/diffbot-python/main/install.sh | sh -s -- --version v0.2.1 --bin-dir ~/bin\n```\n\n### How to use\n\n```bash\nexport DIFFBOT_API_TOKEN=your-token-here\n\ndb extract https://www.example.com\ndb ask \"What's the capital of France?\"\ndb crawl https://www.example.com --hops 1\ndb crawl-list-jobs\ndb crawl-delete-job crawl-1234567890\ndb web-search \"diffbot knowledge graph\"\ndb web-search \"diffbot knowledge graph\" -n 5 -f json\ndb entities \"Apple CEO Tim Cook announced record quarterly earnings.\"\ndb entities \"Apple CEO Tim Cook announced record quarterly earnings.\" -f dql\n```\n\n### How to use with an agent\nOnce installed, this library works alongside [`diffbot-skills`](https://github.com/diffbot/diffbot-skills) to give your agent full access to Diffbot's tools for structuring web knowledge. Diffbot Agent Skills also unlocks additional skills, such as crafting DQL from natural language.\n\n`diffbot-skills` will pick up or install this library automatically. 
\n\n\n## Tests\n\nRun the mock test suite:\n```bash\npython -m pytest\n```\n\nRun live integration tests against the real API (requires a valid token).\nThe token is resolved the same way as everywhere else — the `DIFFBOT_API_TOKEN`\nenvironment variable or `~/.diffbot/credentials`:\n```bash\nDIFFBOT_API_TOKEN=your_token python -m pytest -m live\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiffbot%2Fdiffbot-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiffbot%2Fdiffbot-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiffbot%2Fdiffbot-python/lists"}