{"id":50206814,"url":"https://github.com/Norconex/crawler","last_synced_at":"2026-06-11T17:00:32.765Z","repository":{"id":7044746,"uuid":"8323037","full_name":"Norconex/crawler","owner":"Norconex","description":"Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.","archived":false,"fork":false,"pushed_at":"2026-06-11T05:31:33.000Z","size":19055,"stargazers_count":202,"open_issues_count":24,"forks_count":71,"subscribers_count":30,"default_branch":"main","last_synced_at":"2026-06-11T07:05:48.079Z","etag":null,"topics":["collector-fs","collector-http","crawler","crawlers","filesystem-crawler","flexible","java","search-engine","web-crawler"],"latest_commit_sha":null,"homepage":"https://opensource.norconex.com/crawlers","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Norconex.png","metadata":{"files":{"readme":"README.md","changelog":"changelogs/README.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2013-02-20T21:50:42.000Z","updated_at":"2026-06-11T05:10:56.000Z","dependencies_parsed_at":"2023-10-12T17:05:11.255Z","dependency_job_id":"42d81640-ee12-4fac-874b-d21eb6d0eba6","html_url":"https://github.com/Norconex/crawler","commit_stats":{"total_commits":922,"total_committers":13,"mean_commits":70.92307692307692,"dds":0.0585683297180043,"last_synced_commit":"5a49d5c1e0d7c9101e5d8ba50a5e416d8f27d8b9"},"previous_names":["norcone
x/collector-http","norconex/crawler"],"tags_count":50,"template":false,"template_full_name":null,"purl":"pkg:github/Norconex/crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Norconex%2Fcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Norconex%2Fcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Norconex%2Fcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Norconex%2Fcrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Norconex","download_url":"https://codeload.github.com/Norconex/crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Norconex%2Fcrawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34208761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collector-fs","collector-http","crawler","crawlers","filesystem-crawler","flexible","java","search-engine","web-crawler"],"created_at":"2026-05-26T01:57:49.266Z","updated_at":"2026-06-11T17:00:32.760Z","avatar_url":"https://github.com/Norconex.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"# Norconex 
Crawlers\n\nNorconex web and file system crawlers are full-featured crawlers (or spiders) that can manipulate and store collected data in a repository of your choice (e.g., a search engine). They are very flexible, powerful, easy to extend, and portable. They can be used from the command line with file-based configuration on any OS or embedded into Java applications using well-documented APIs.\n\nVisit the website for binary downloads and documentation:\nhttps://opensource.norconex.com/crawlers/\n\n## Are you on the right branch?\n\nThis branch holds version 4 code, which is still in development.\n\n**For the latest stable release of Norconex Web Crawler, use the [version 3 branch](https://github.com/Norconex/crawlers/tree/3.x-branch).**\n\n# UPCOMING: Crawler V4 Stack\n\nThe default `main` branch holds code for the upcoming version 4 crawler stack. It is now a mono-repo containing all Norconex crawler-related projects previously maintained in separate repos. All projects in this mono-repo will now be released simultaneously and share the same version number.\n\nUntil v4 is officially released, this branch should not be considered stable.\n\n## Projects\n\n[![Java CI with Maven](https://github.com/Norconex/crawlers/actions/workflows/maven-ci-cd.yaml/badge.svg)](https://github.com/Norconex/crawlers/actions/workflows/maven-ci-cd.yaml)\n\n| Folder                          | Artifact Id                       | Build                                                                                                                                                                                                                                                               |\n| ------------------------------- | --------------------------------- | 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| crawler/core/                   | nx-crawler-core                   | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-crawler-core\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-crawler-core)                                     |\n| crawler/fs/                     | nx-crawler-fs                     | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-crawler-fs\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-crawler-fs)                                         |\n| crawler/web/                    | nx-crawler-web                    | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-crawler-web\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-crawler-web)                                       |\n| importer/                       | nx-importer                       | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-importer\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-importer)                                             |\n| committer/apachekafka/          | nx-committer-apachekafka          | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-apachekafka\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-apachekafka)                   |\n| committer/azurecognitivesearch/ | 
nx-committer-azurecognitivesearch | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-azurecognitivesearch\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-azurecognitivesearch) |\n| committer/core/                 | nx-committer-core                 | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-core\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-core)                                 |\n| committer/idol/                 | nx-committer-idol                 | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-idol\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-idol)                                 |\n| committer/elasticsearch/        | nx-committer-elasticsearch        | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-elasticsearch\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-elasticsearch)               |\n| committer/neo4j/                | nx-committer-neo4j                | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-neo4j\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-neo4j)                               |\n| committer/solr/                 | nx-committer-solr                 | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-solr\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-solr)                                
 |\n| committer/sql/                  | nx-committer-sql                  | [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.norconex.crawler%3Anx-committer-sql\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=com.norconex.crawler%3Anx-committer-sql)                                   |\n| 🪦 committer/amazoncloudsearch/ | nx-committer-amazoncloudsearch    | Deprecated                                                                                                                                                                                                                                                          |\n\nAll projects in this repository share the same Maven group id:\n\n    com.norconex.crawler\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNorconex%2Fcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNorconex%2Fcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNorconex%2Fcrawler/lists"}