{"id":15454587,"url":"https://github.com/kuroda/extr_text","last_synced_at":"2026-03-05T03:31:07.411Z","repository":{"id":45625493,"uuid":"429732535","full_name":"kuroda/extr_text","owner":"kuroda","description":"An Elixir library for extracting text and metadata from docx/xlsx/pptx files.","archived":false,"fork":false,"pushed_at":"2022-03-17T16:46:27.000Z","size":135,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-04-14T07:49:31.577Z","etag":null,"topics":["elixir-lang","excel","microsoft","ooxml"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kuroda.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-19T09:03:05.000Z","updated_at":"2021-12-04T05:22:16.000Z","dependencies_parsed_at":"2022-09-10T18:03:29.035Z","dependency_job_id":null,"html_url":"https://github.com/kuroda/extr_text","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuroda%2Fextr_text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuroda%2Fextr_text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuroda%2Fextr_text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuroda%2Fextr_text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kuroda","download_url":"https://codeload.github.com/kuroda/extr_text/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github
.com","kind":"github","repositories_count":243933437,"owners_count":20370986,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir-lang","excel","microsoft","ooxml"],"created_at":"2024-10-01T22:04:11.498Z","updated_at":"2026-03-05T03:31:07.404Z","avatar_url":"https://github.com/kuroda.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ExtrText\n\n[![ExtrText version](https://img.shields.io/hexpm/v/extr_text.svg)](https://hex.pm/packages/extr_text)\n[![Hex.pm](https://img.shields.io/hexpm/dt/extr_text.svg)](https://hex.pm/packages/extr_text)\n\n*ExtrText* is an Elixir library for extracting text and meta information from `.docx`, `.xlsx` and `.pptx` files.\n\n## Usage\n\n```elixir\niex\u003e docx = File.read!(\"example.docx\")\niex\u003e {:ok, texts} = ExtrText.get_texts(docx)\niex\u003e texts\n[\n  [\"Paragraph 1\", \"Paragraph 2\", \"Paragraph 3\"]\n]\niex\u003e {:ok, metadata} = ExtrText.get_metadata(docx)\niex\u003e metadata\n%ExtrText.Metadata{\n  created: ~U[2021-11-19 22:25:20Z],\n  creator: \"John Doe\",\n  description: \"\",\n  keywords: \"\",\n  language: \"ja-JP\",\n  last_modified_by: \"John Doe\",\n  modified: ~U[2021-11-22 21:24:43Z],\n  revision: 2,\n  subject: \"\",\n  title: \"Example\"\n}\n```\n\n## Installation\n\nAdd `:extr_text` to your `mix.exs`:\n\n```elixir\n  defp deps do\n    [\n      {:extr_text, \"~\u003e 0.3.2\"}\n    ]\n  end\n```\n\nThen, run `mix deps.get`.\n\n## Limitations\n\n* The function `ExtrText.get_texts/1` extracts numbers and dates without format from an Excel file.\n  For example, even if 
the date is displayed as `3-Jan-20` in Excel,\n  it is extracted as `2020-01-03`.\n\n## Acknowledgments\n\nThis project is inspired by [ranguba/chupa-text](https://github.com/ranguba/chupa-text),\na Ruby gem.\n\n## Author\n\n[Tsutomu Kuroda](\u003cmailto:t-kuroda@coregenik.com\u003e)\n\n## License\n\n[MIT license](./MIT_LICENSE.txt)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuroda%2Fextr_text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkuroda%2Fextr_text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuroda%2Fextr_text/lists"}