{"id":13484237,"url":"https://github.com/benbalter/word-to-markdown","last_synced_at":"2025-05-13T18:14:39.837Z","repository":{"id":15289062,"uuid":"18018625","full_name":"benbalter/word-to-markdown","owner":"benbalter","description":"A ruby gem to liberate content from Microsoft Word documents","archived":false,"fork":false,"pushed_at":"2025-01-08T20:41:27.000Z","size":1267,"stargazers_count":1505,"open_issues_count":10,"forks_count":157,"subscribers_count":44,"default_branch":"main","last_synced_at":"2025-04-25T17:55:55.626Z","etag":null,"topics":["converter","libreoffice","markdown","microsoft-word","ruby","word"],"latest_commit_sha":null,"homepage":"https://word2md.com","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benbalter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":".github/funding.yml","license":"LICENSE.md","code_of_conduct":"docs/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"patreon":"benbalter"}},"created_at":"2014-03-22T20:03:23.000Z","updated_at":"2025-04-23T05:24:54.000Z","dependencies_parsed_at":"2024-02-16T20:27:02.589Z","dependency_job_id":"6ad217c6-b4d2-4833-b45a-62f72286c1db","html_url":"https://github.com/benbalter/word-to-markdown","commit_stats":{"total_commits":330,"total_committers":17,"mean_commits":19.41176470588235,"dds":0.09999999999999998,"last_synced_commit":"582256077f021d1721d28296acaa3e8fda98e3ba"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benbalter%2Fword-to-markdown","tags_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/benbalter%2Fword-to-markdown/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benbalter%2Fword-to-markdown/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benbalter%2Fword-to-markdown/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benbalter","download_url":"https://codeload.github.com/benbalter/word-to-markdown/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254000885,"owners_count":21997443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["converter","libreoffice","markdown","microsoft-word","ruby","word"],"created_at":"2024-07-31T17:01:21.147Z","updated_at":"2025-05-13T18:14:39.813Z","avatar_url":"https://github.com/benbalter.png","language":"Ruby","funding_links":["https://patreon.com/benbalter"],"categories":["Ruby","Uncategorized","Documentation and Presentation","Markdown Processors","Convert to Markdown Tools"],"sub_categories":["Uncategorized","Microsoft Word to Markdown"],"readme":"# Word to Markdown converter\n\nA Ruby gem to liberate content from [the jail that is Word documents](http://ben.balter.com/2012/10/19/we-ve-been-trained-to-make-paper/#jailbreaking-content)\n\n[![CI](https://github.com/benbalter/word-to-markdown/actions/workflows/ci.yml/badge.svg)](https://github.com/benbalter/word-to-markdown/actions/workflows/ci.yml) [![Gem Version](https://badge.fury.io/rb/word-to-markdown.png)](http://badge.fury.io/rb/word-to-markdown) [![Inline 
docs](http://inch-ci.org/github/benbalter/word-to-markdown.png)](http://inch-ci.org/github/benbalter/word-to-markdown) [![Build status](https://ci.appveyor.com/api/projects/status/x2gnsfvli3q47a2e/branch/master?svg=true)](https://ci.appveyor.com/project/benbalter/word-to-markdown/branch/master) [![Maintainability](https://api.codeclimate.com/v1/badges/aae0d67ea7db185f1595/maintainability)](https://codeclimate.com/github/benbalter/word-to-markdown/maintainability) [![Test Coverage](https://api.codeclimate.com/v1/badges/aae0d67ea7db185f1595/test_coverage)](https://codeclimate.com/github/benbalter/word-to-markdown/test_coverage)\n\n## The problem\n\n\u003e Our default content publishing workflow is terribly broken. [We've all been trained to make paper](http://ben.balter.com/2012/10/19/we-ve-been-trained-to-make-paper/), yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow remains relatively unchanged since it was conceived in the early 80s.\n\u003e\n\u003e I'm asked regularly by government employees — knowledge workers who fire up a desktop word processor as the first step to any project — for an automated pipeline to convert Microsoft Word documents to [Markdown](http://guides.github.com/overviews/mastering-markdown/), the *lingua franca* of the internet, but as my recent foray into building [just such a converter](http://word-to-markdown.herokuapp.com/) proves, it's not that simple.\n\u003e\n\u003e Markdown isn't just an alternative format. 
Markdown forces you to write for the web.\n\n**[Read more](http://ben.balter.com/2014/03/31/word-versus-markdown-more-than-mere-semantics/)**\n\n## Just want to convert a Microsoft Word (or Google) document to Markdown?\n\nYou can use this **[hosted service](https://word2md.com/)** (or check out [its source](https://github.com/benbalter/word-to-markdown-server)).\n\n## Install\n\nYou'll need to install [LibreOffice](http://www.libreoffice.org/). Then:\n\n```bash\ngem install word-to-markdown\n```\n\n## Usage\n\n```ruby\nrequire 'word-to-markdown'\n\nfile = WordToMarkdown.new(\"/path/to/document.docx\")\n=\u003e \u003cWordToMarkdown path=\"/path/to/document.docx\"\u003e\n\nfile.to_s\n=\u003e \"# Test\\n\\n This is a test\"\n\nfile.document.tree\n=\u003e \u003cNokogiri Document\u003e\n```\n\n### Command line usage\n\nOnce you've installed the gem, it's just:\n\n```\n$ w2m path/to/document.docx\n```\n\n*Outputs the resulting markdown to stdout*\n\n## Supports\n\n* Paragraphs\n* Numbered lists\n* Unnumbered lists\n* Nested lists\n* Italic\n* Bold\n* Explicit headings (e.g., selected as \"Heading 1\" or \"Heading 2\")\n* Implicit headings (e.g., text with a larger font size relative to paragraph text)\n* Images\n* Tables\n* Hyperlinks\n\n## Requirements and configuration\n\nWord-to-markdown requires `soffice`, a command-line interface to LibreOffice that works on Linux, Mac, and Windows. 
To install soffice, see [the LibreOffice documentation](https://www.libreoffice.org/get-help/install-howto/).\n\n## Testing\n\n```\nscript/cibuild\n```\n\n## Docker\n\nFirst, create the `Gemfile.lock` by installing the dependencies:\n\n```\nbundle install\n```\n\nEverything you need to run the executable locally:\n\n```\ndocker-compose build\ndocker-compose run --rm app bundle exec w2m --help\ndocker-compose run --rm app bundle exec w2m test/fixtures/em.docx\n```\n\n## Hosted service\n\n[Word-to-markdown-server](https://github.com/benbalter/word-to-markdown-server) contains a lightweight server for converting Word Documents as a service. A live version runs at [word2md.com](https://word2md.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenbalter%2Fword-to-markdown","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenbalter%2Fword-to-markdown","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenbalter%2Fword-to-markdown/lists"}