{"id":18248104,"url":"https://github.com/stopka/fedicrawl","last_synced_at":"2025-04-04T15:32:05.028Z","repository":{"id":75836141,"uuid":"441181586","full_name":"Stopka/fedicrawl","owner":"Stopka","description":"Collect feeds to follow on Fediverse nodes.","archived":false,"fork":false,"pushed_at":"2023-01-07T20:03:25.000Z","size":561,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-04T09:44:39.290Z","etag":null,"topics":["crawler","docker","fediverse","nodejs","prisma","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stopka.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-12-23T13:00:14.000Z","updated_at":"2024-09-20T18:46:55.000Z","dependencies_parsed_at":"2023-09-21T20:15:00.334Z","dependency_job_id":null,"html_url":"https://github.com/Stopka/fedicrawl","commit_stats":{"total_commits":38,"total_committers":1,"mean_commits":38.0,"dds":0.0,"last_synced_commit":"4fbfce7f12b42d9c08f997e857184be6be807953"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stopka%2Ffedicrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stopka%2Ffedicrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stopka%2Ffedicrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stopka%2Ffedicrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
Stopka","download_url":"https://codeload.github.com/Stopka/fedicrawl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247202992,"owners_count":20900887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","docker","fediverse","nodejs","prisma","typescript"],"created_at":"2024-11-05T09:35:36.502Z","updated_at":"2025-04-04T15:32:00.019Z","avatar_url":"https://github.com/Stopka.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FediCrawl\n\nCollect feeds to follow on Fediverse nodes.\n\nApp crawls from node to node (starting from the seed node) and searches for feeds (accounts or channels) and for other peered nodes.\n\nDiscovered nodes are added to database and are queued for next search.\nNew nodes are processed preferentially. 
If there are no new nodes, the data of the oldest node is refreshed.\n\nNode data is retrieved using the NodeInfo API.\n\nA node's public feeds and peered nodes are retrieved only for the supported software listed below, using each application's public API.\n\n## Supported Fediverse software\nFor now, only the following Fediverse apps are supported:\n* [Mastodon](https://joinmastodon.org/)\n* [Pleroma](https://pleroma.social/#featured-instances) (hopefully after release of version 4.2)\n* [PeerTube](https://joinpeertube.org/)\n* [Misskey](https://join.misskey.page/)\n\nData providers for more apps will probably be added soon (pull requests are welcome).\n\n## Config\n\nConfiguration is done using environment variables:\n\n| Variable                       | Description                                                                                         | Default value / Example value             |\n|--------------------------------|-----------------------------------------------------------------------------------------------------|-------------------------------------------|\n| `ELASTIC_URL`                  | URL of the Elasticsearch server                                                                     | `http://elastic:9200`                     |\n| `ELASTIC_USER`                 | Username for the Elasticsearch server                                                               | `elastic`                                 |\n| `ELASTIC_PASSWORD`             | Password for the Elasticsearch server                                                               | _empty_                                   |\n| `SEED_NODE_DOMAIN`             | Domain of the first node to search users and other nodes on                                         | `mastodon.social,mastodon.online`         |\n| `REATTEMPT_MINUTES`            | _Optional_, How many minutes to wait before the next refresh attempt if a node refresh fails        | `60`                                      |\n| `REFRESH_HOURS`                
| _Optional_, How often (in hours) node info should be refreshed                                      | `120`                                     |\n| `WAIT_FOR_JOB_MINUTES`         | _Optional_, How many minutes should the thread sleep if there are no nodes to refresh               | `60`                                      |\n| `DEFAULT_TIMEOUT_MILLISECONDS` | _Optional_, How many milliseconds should HTTP wait for node API response on refresh                 | `10000`                                   |\n| `SEED_TIMEOUT_MILLISECONDS`    | _Optional_, How many milliseconds should HTTP wait for node API response on refresh of seed domains | _value of `DEFAULT_TIMEOUT_MILLISECONDS`_ |\n| `BANNED_DOMAINS`               | _Optional_, Domains not to index (even with subdomains)                                             | _empty_                                   |\n| `CRAWLING_VERSION`             | _Optional_, Increasing this number can enforce recrawling of the whole index                        | `0`                                       |\n| `MAX_CRAWLING_DEPTH`           | _Optional_, Limits how far the Fediverse is indexed from seed nodes                                 | _empty_                                   |\n| `TZ`                           | _Optional_, Timezone                                                                                | `UTC`                                     |\n\n## Deploy\nThe app is designed to run in a Docker container and be deployed using docker-compose. 
\nMore info can be found in the [FediSearch example docker-compose](https://github.com/Stopka/fedisearch-compose) project.\n\nFor searching in collected feeds, there is a companion server app, [FediSearch](https://github.com/Stopka/fedisearch).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstopka%2Ffedicrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstopka%2Ffedicrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstopka%2Ffedicrawl/lists"}