Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ribugent/perl-apache-tika

A Perl interface to the Apache Tika api
https://github.com/ribugent/perl-apache-tika

perl perl-module tika tika-api

Last synced: about 2 months ago
JSON representation

A Perl interface to the Apache Tika api

Awesome Lists containing this project

README

        

# NAME

Apache::Tika - A perl interface to Apache Tika API

# SYNOPSIS

use Apache::Tika

my $tika = Apache::Tika->new();

# Extract metadata and text from a pdf file
open my $fh, '<:raw', '/local/file.pdf';
my $pdf = do { local $/; <$fh> };
close $fh;

my $meta = $tika->meta($pdf);
my $text = $tika->tika($pdf);

# Extract text from a website
my $response = LWP::UserAgent->get('http://some.web.site');
my $text = $tika->tika(
$r->decoded_content('charset' => 'none'),
$r->headers->header('content-type')
);

# DESCRIPTION

This module provide Apache Tika api support

# CONSTRUCTOR

- Apache::Tika->new(%options)

This constructs `Apache::Tika` object. You can specify the following options

- url

Apache Tika server url (defaults to http://localhost:9998)

- ua

Custom useragent

# METHODS

The following api methods are available, to get more information about method responses visit [http://wiki.apache.org/tika/TikaJAXRS](http://wiki.apache.org/tika/TikaJAXRS)

- $tika->meta($bytes, $contentType)
- $tika->rmeta($bytes, $contentType, $format)
- $tika->tika($bytes, $contentType)
- $tika->detect\_stream($bytes)
- $tika->language\_stream($bytes)

The $bytes parameter is always required and must contain the data to send to the server.
The $contentType is optional, but if know the $bytes content-type (p.e. "text/html; charset=iso-8") you can send it to improve the tika response.

# SEE ALSO

[Apache Tika](http://wiki.apache.org/tika/TikaJAXRS)