https://github.com/centic9/file-type-detection
A small tool to use Apache Tika to determine the mime-type of all files in a directory
- Host: GitHub
- URL: https://github.com/centic9/file-type-detection
- Owner: centic9
- License: apache-2.0
- Created: 2016-06-24T06:59:20.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-08-11T15:18:40.000Z (about 2 months ago)
- Last Synced: 2024-08-11T16:34:23.811Z (about 2 months ago)
- Language: Java
- Size: 358 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
[![Build Status](https://travis-ci.org/centic9/file-type-detection.svg)](https://travis-ci.org/centic9/file-type-detection) [![Gradle Status](https://gradleupdate.appspot.com/centic9/file-type-detection/status.svg?branch=master)](https://gradleupdate.appspot.com/centic9/file-type-detection/status)
This is a small tool to use [Apache Tika](http://tika.apache.org) to detect the mime-type of files in a
directory and produce JSON output that can be used for further processing.

The JSON is printed to stdout. Summary/Error information is printed to stderr.
So a typical invocation will redirect stdout to a file via `> file-types.txt`.

#### Getting started
##### Grab it
    git clone https://github.com/centic9/file-type-detection.git
    cd file-type-detection

##### Build it

    ./gradlew check installDist

#### Run it

    build/install/file-type-detection/bin/file-type-detection > file-types.txt

### How it works
The actual code is quite small. It uses the `DirectoryWalker` from
[Apache Commons IO](https://commons.apache.org/proper/commons-io/) to
search the provided directories and invokes a handler for each file that is found.

The handler submits a `Runnable` to an `Executor` backed by a thread-pool, which performs the
detection of the file-type via Apache Tika.

The asynchronous handling allows the file-system scan to run in
parallel with the file-type detection.

### Helper for extracting text from files
As Tika is very good at text-extraction as well, this project also provides a small
tool to extract text from any file-type that Tika supports.

Run the following Java application: `org.dstadler.filesearch.ExtractText`
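
To make the walk-and-detect flow described under "How it works" more concrete, here is a minimal sketch that uses only the JDK. It is not the project's code: `Files.walk` stands in for Commons IO's `DirectoryWalker`, `Files.probeContentType` stands in for Apache Tika, and the names `FileTypeSketch`, `detectAll`, and `detect` are hypothetical. It prints tab-separated lines rather than JSON, and only mirrors the split between results on stdout and a summary on stderr.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

// Hypothetical sketch, not the project's code: Files.walk stands in for
// Commons IO's DirectoryWalker and Files.probeContentType for Apache Tika.
class FileTypeSketch {

    // Walks 'root' and detects the type of every regular file on a thread pool.
    static Map<Path, String> detectAll(Path root, int threads)
            throws IOException, InterruptedException {
        Map<Path, String> types = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try (Stream<Path> files = Files.walk(root)) {
            // The walking thread only schedules work; detection runs on the pool,
            // so scanning the file system and detecting types overlap.
            files.filter(Files::isRegularFile)
                 .forEach(file -> executor.execute(() -> types.put(file, detect(file))));
        } finally {
            // Stop accepting new tasks; already queued tasks still run.
            executor.shutdown();
        }
        if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("detection did not finish in time");
        }
        return types;
    }

    // Stand-in detector: falls back to a generic type when nothing is known.
    static String detect(Path file) {
        try {
            String type = Files.probeContentType(file);
            return type != null ? type : "application/octet-stream";
        } catch (IOException e) {
            return "application/octet-stream";
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, String> types = detectAll(root, 4);
        // Results go to stdout, the summary goes to stderr.
        types.forEach((file, type) -> System.out.println(file + "\t" + type));
        System.err.println("Detected " + types.size() + " files");
    }
}
```

Because the walker only enqueues tasks, a slow detection on one file does not stall the directory scan. The pool size of 4 is an arbitrary choice for the sketch.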
### Support this project
If you find this tool useful and would like to support it, you can [Sponsor the author](https://github.com/sponsors/centic9)
### Licensing
Copyright 2013-2022 Dominik Stadler
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.