https://github.com/kuroda/extr_text
An Elixir library for extracting text and metadata from docx/xlsx/pptx files.
- Host: GitHub
- URL: https://github.com/kuroda/extr_text
- Owner: kuroda
- License: mit
- Created: 2021-11-19T09:03:05.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-03-17T16:46:27.000Z (almost 3 years ago)
- Last Synced: 2024-04-14T07:49:31.577Z (9 months ago)
- Topics: elixir-lang, excel, microsoft, ooxml
- Language: Elixir
- Size: 132 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
# ExtrText
[![ExtrText version](https://img.shields.io/hexpm/v/extr_text.svg)](https://hex.pm/packages/extr_text)
[![Hex.pm](https://img.shields.io/hexpm/dt/extr_text.svg)](https://hex.pm/packages/extr_text)

*ExtrText* is an Elixir library for extracting text and metadata from `.docx`, `.xlsx` and `.pptx` files.
## Usage
```elixir
iex> docx = File.read!("example.docx")
iex> {:ok, texts} = ExtrText.get_texts(docx)
iex> texts
[
["Paragraph 1", "Paragraph 2", "Paragraph 3"]
]
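iex> flat = List.flatten(texts)  # sketch using only the standard library, not an ExtrText function
["Paragraph 1", "Paragraph 2", "Paragraph 3"]
iex> Enum.join(flat, "\n")       # assumes the nested-list shape shown above; joins into one plain-text string
"Paragraph 1\nParagraph 2\nParagraph 3"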
iex> {:ok, metadata} = ExtrText.get_metadata(docx)
iex> metadata
%ExtrText.Metadata{
created: ~U[2021-11-19 22:25:20Z],
creator: "John Doe",
description: "",
keywords: "",
language: "ja-JP",
last_modified_by: "John Doe",
modified: ~U[2021-11-22 21:24:43Z],
revision: 2,
subject: "",
title: "Example"
}
```

## Installation
Add `:extr_text` to your `mix.exs`:
```elixir
defp deps do
[
{:extr_text, "~> 0.3.1"}
]
end
```

Then, run `mix deps.get`.
## Limitations
* `ExtrText.get_texts/1` extracts numbers and dates from Excel files without their display formatting.
  For example, a date displayed as `3-Jan-20` in Excel is extracted as `2020-01-03`.

## Acknowledgments
This project is inspired by [ranguba/chupa-text](https://github.com/ranguba/chupa-text),
a Ruby gem.

## Author
Tsutomu Kuroda
## License
[MIT license](./MIT_LICENSE.txt)