{"id":28477655,"url":"https://github.com/dense-analysis/dank","last_synced_at":"2026-04-12T16:06:05.557Z","repository":{"id":296696819,"uuid":"958733924","full_name":"dense-analysis/dank","owner":"dense-analysis","description":"Dense Analysis Network Knowledge","archived":false,"fork":false,"pushed_at":"2025-05-26T17:50:06.000Z","size":15,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-19T17:09:59.189Z","etag":null,"topics":["ai","clickhouse","knowledge-graph","python","redis","scraping","scraping-websites"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dense-analysis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-01T17:13:50.000Z","updated_at":"2025-07-25T19:37:23.000Z","dependencies_parsed_at":"2025-06-01T20:50:45.258Z","dependency_job_id":"72497ca4-9e76-4a14-9f9b-b5087148fa90","html_url":"https://github.com/dense-analysis/dank","commit_stats":null,"previous_names":["dense-analysis/dank"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dense-analysis/dank","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dense-analysis%2Fdank","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dense-analysis%2Fdank/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dense-analysis%2Fdank/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dense-analys
is%2Fdank/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dense-analysis","download_url":"https://codeload.github.com/dense-analysis/dank/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dense-analysis%2Fdank/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271749142,"owners_count":24814136,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","clickhouse","knowledge-graph","python","redis","scraping","scraping-websites"],"created_at":"2025-06-07T17:08:20.032Z","updated_at":"2026-04-12T16:06:05.547Z","avatar_url":"https://github.com/dense-analysis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DANK - Dense Analysis Network Knowledge\n\nDANK is a Dense Analysis project focused on collecting and analyzing live data\nfrom the public Internet. It uses API access, web scraping, RSS feeds, and\nsemantic indexing tools to ingest external content in real time. It applies\nsentiment analysis, semantic clustering, and AI models to build structured\ninsights about the world, including trends, public perception, and evolving\nnarratives. 
The goal is to automate contextual understanding and surface\nrelevant knowledge as it emerges.\n\n## Requirements\n\n- Python 3.13\n- uv\n- ClickHouse (local server)\n\n## ClickHouse setup\n\n1. Install ClickHouse: https://clickhouse.com/docs/en/install\n2. Start the ClickHouse server (systemd or `clickhouse server`).\n3. Create the schema:\n\n```\n~/clickhouse/clickhouse client --multiquery \u003c schema.sql\n```\n\nThe schema uses the `dank` database by default. Adjust `config.toml` if you\nneed a different database name.\n\n## Configuration\n\nConfiguration lives in `config.toml` and should not be committed. Example:\n\n```toml\nsources = [\n  { domain = \"x.com\", accounts = [\"example\"] },\n  \"blog.codinghorror.com\",\n]\n\n[clickhouse]\nhost = \"localhost\"\nport = 8123\ndatabase = \"dank\"\nusername = \"default\"\npassword = \"\"\nsecure = false\nuse_http = true\n\n[x]\nusername = \"your-x-username\"\npassword = \"your-x-password\"\nmax_posts = 200\nmax_scrolls = 20\nscroll_pause_seconds = 1.5\n\n[storage]\ndata_dir = \"data\"\nmax_asset_bytes = 10485760\n\n[browser]\n# Optional: full path or command name for a Chromium-based browser.\nexecutable_path = \"thorium-browser\"\n# Optional: extra time to wait for the browser to start.\nconnection_timeout = 1.0\n# Optional: connection retry count for slow browser startups.\nconnection_max_tries = 30\n\n[email]\n# Optional: IMAP settings for OTP codes.\nhost = \"imap.example.com\"\nusername = \"you@example.com\"\npassword = \"your-imap-password\"\nport = 993\n\n[logging]\n# Optional: file path for scrape/process logs.\nfile = \"dank.log\"\n# Optional: logging level (DEBUG, INFO, WARNING, ERROR).\nlevel = \"INFO\"\n```\n\n`sources` controls which domains to scrape and process. 
Each entry can provide\naccounts for account-based sources like `x.com`.\n\nIf a domain has no specific configuration, DANK scrapes the root of the\ndomain to discover RSS feeds to read from.\n\n`browser.executable_path` sets the browser binary to launch. If unset, DANK\nwill try common Chromium locations.\n\n`storage.max_asset_bytes` caps asset downloads (bytes). Larger assets are\nskipped but still recorded.\n\nWhen X prompts for a one-time code, DANK will poll the IMAP inbox for messages\nfrom `x.com` that arrived after the login attempt and extract the confirmation\ncode.\n\nIf the browser is slow to start, increase\n`browser.connection_timeout` or `browser.connection_max_tries`.\n\n`logging.file` controls where scrape/process logs are written. Relative paths\nare resolved from the current working directory.\n\n## Usage\n\nDANK offers the following commands.\n\n* `uv run scrape` -- Scrape the web for data\n    * Pass `--domains` to scrape only matching domains from `sources`,\n      for example `--domains '^x\\\\.com$'`.\n* `uv run process` -- Process previously scraped data\n    * The `--age` argument can be given a duration to process, for example\n      `6hours` or `2days`.\n* `uv run clickhouse-query` -- Run queries on the database\n    * You can only run `SELECT`, `SHOW`, or `EXPLAIN` queries through this tool\n    * Query results are well formatted and easy to read\n    * Query results are truncated unless you pass `--full`\n* `uv run embed-text \"your text\"` -- Print an embedding vector\n    * Output is a JSON `list[float]` for easy copy/paste into other tools.\n* `uv run download-embedding-model` -- Download and cache the embedding model\n    * Pass `--model` to choose another Hugging Face model id.\n* `uv run web` -- Start a simple web server to view content.\n    * Pass `--no-reload` to disable hot code reloading.\n    * Supports search filters for domain/account and a days-back slider.\n\n## Testing\n\n* `uv run pytest` -- Run 
default test suite.\n* `uv run pytest -m embeddings -s` -- Run real-model embedding checks.\n    * These tests are skipped by default and require the model cache.\n    * Includes per-case similarity and margin output for each model.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdense-analysis%2Fdank","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdense-analysis%2Fdank","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdense-analysis%2Fdank/lists"}