{"id":15346685,"url":"https://github.com/centic9/file-type-detection","last_synced_at":"2025-10-09T09:32:58.728Z","repository":{"id":56561220,"uuid":"61863648","full_name":"centic9/file-type-detection","owner":"centic9","description":"A small tool to use Apache Tika to determine the mime-type of all files in a directory","archived":false,"fork":false,"pushed_at":"2025-08-24T07:47:03.000Z","size":417,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-08-24T14:26:24.850Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/centic9.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"centic9"}},"created_at":"2016-06-24T06:59:20.000Z","updated_at":"2025-08-24T07:47:06.000Z","dependencies_parsed_at":"2023-02-16T00:45:26.038Z","dependency_job_id":"ddeb9e8c-16ee-4241-b84b-843c69a9e400","html_url":"https://github.com/centic9/file-type-detection","commit_stats":{"total_commits":106,"total_committers":4,"mean_commits":26.5,"dds":0.07547169811320753,"last_synced_commit":"9a4bbe800cfbf0851063f8812f4058949e723c7b"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/centic9/file-type-detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centic9%2Ffile-type-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centic9%2Ffile-type-detection/tags","releases_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centic9%2Ffile-type-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centic9%2Ffile-type-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/centic9","download_url":"https://codeload.github.com/centic9/file-type-detection/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centic9%2Ffile-type-detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001117,"owners_count":26083022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-01T11:26:00.595Z","updated_at":"2025-10-09T09:32:58.692Z","avatar_url":"https://github.com/centic9.png","language":"Java","funding_links":["https://github.com/sponsors/centic9"],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/centic9/file-type-detection.svg)](https://travis-ci.org/centic9/file-type-detection) [![Gradle Status](https://gradleupdate.appspot.com/centic9/file-type-detection/status.svg?branch=master)](https://gradleupdate.appspot.com/centic9/file-type-detection/status)\n\nThis is a small tool to use [Apache Tika](http://tika.apache.org) to detect the mime-type of files 
in a\ndirectory and produce JSON output that can be used for further processing.\n\nThe JSON is printed to stdout. Summary and error information is printed to stderr,\nso a typical invocation redirects stdout to a file via `\u003e file-types.txt`.\n\n#### Getting started\n\n##### Grab it\n\n    git clone https://github.com/centic9/file-type-detection.git\n    cd file-type-detection\n\n##### Build it\n\n    ./gradlew check installDist\n\n##### Run it\n\n    build/install/file-type-detection/bin/file-type-detection \u003cdirectory\u003e \u003e file-types.txt\n\n### How it works\n\nThe actual code is quite small: it uses the `DirectoryWalker` from\n[Apache Commons IO](https://commons.apache.org/proper/commons-io/) to\nsearch the provided directories and invokes a handler for each file that is found.\n\nThe handler submits a `Runnable` to an `Executor` backed by a thread pool; the `Runnable` performs the\ndetection of the file type via Apache Tika.\n\nThe asynchronous handling allows the file system to be scanned in\nparallel with the file detection logic.\n\n### Helper for extracting text from files\n\nAs Tika is very good at text extraction as well, this project also provides a small\ntool to extract text from any file type that Tika supports.\n\nRun the following Java application: `org.dstadler.filesearch.ExtractText`\n\n### Support this project\n\nIf you find this tool useful and would like to support it, you can [sponsor the author](https://github.com/sponsors/centic9).\n\n### Licensing\n\n   Copyright 2013-2022 Dominik Stadler\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentic9%2Ffile-type-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcentic9%2Ffile-type-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentic9%2Ffile-type-detection/lists"}