https://github.com/badbye/docxpy

A pure python based utility to extract text and images from docx files.

Host: GitHub
URL: https://github.com/badbye/docxpy
Owner: badbye
License: mit
Fork: true (ankushshah89/python-docx2txt)
Created: 2017-03-02T06:58:15.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2022-10-23T00:09:40.000Z (over 3 years ago)
Last Synced: 2025-09-25T15:27:50.693Z (9 months ago)
Topics: docx, python, python3
Language: Python
Homepage:
Size: 46.9 KB
Stars: 5
Watchers: 1
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt


docxpy
======

|image0| |PyPI|

This project is forked from ankushshah89/python-docx2txt.

A new feature has been added: extracting hyperlinks and their corresponding
text.
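
As an illustration of the general technique (a minimal sketch using only the Python standard library, not docxpy's own code), hyperlinks can be read straight from the archive. A .docx file is a ZIP package in which each ``w:hyperlink`` element of ``word/document.xml`` carries an ``r:id`` attribute, and that id is resolved to a URL in ``word/_rels/document.xml.rels``. The helper name ``extract_hyperlinks`` is hypothetical.

.. code:: python

    import zipfile
    import xml.etree.ElementTree as ET

    # Namespace URIs from the Office Open XML (ECMA-376) specification.
    W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


    def extract_hyperlinks(docx):
        """Hypothetical helper (not the docxpy API): return (text, url) pairs.

        ``docx`` may be a file path or a binary file-like object.
        """
        with zipfile.ZipFile(docx) as zf:
            body = ET.fromstring(zf.read("word/document.xml"))
            rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))

        # Map relationship ids such as "rId5" to their targets (the URLs).
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels.iter(f"{{{PKG_NS}}}Relationship")
        }

        links = []
        for link in body.iter(f"{{{W_NS}}}hyperlink"):
            rid = link.get(f"{{{R_NS}}}id")
            if rid is None:
                # Internal link to a bookmark (w:anchor): no external URL.
                continue
            # The visible text is the concatenation of every w:t inside the link.
            text = "".join(t.text or "" for t in link.iter(f"{{{W_NS}}}t"))
            links.append((text, targets.get(rid)))
        return links

Hyperlinks that jump to a bookmark inside the document use ``w:anchor`` instead of ``r:id``; this sketch skips them because they have no external URL.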

It is a pure Python-based utility to extract text from docx files. The
code is taken and adapted from python-docx. In addition, it can extract
**text** from headers, footers and **hyperlinks**, and it can now also
extract **images**.
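
To show roughly how header, body and footer text can be pulled out, here is a minimal standard-library sketch of the general approach. It is not docxpy's implementation, and ``extract_text`` and ``part_text`` are hypothetical names. It assumes the conventional part names ``word/header1.xml``, ``word/document.xml`` and ``word/footer1.xml`` (the authoritative list lives in ``word/_rels/document.xml.rels``), and it joins the ``w:t`` text of each run, turning ``w:tab`` into a tab and ``w:br``/``w:cr`` into a line break.

.. code:: python

    import re
    import zipfile
    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


    def part_text(xml_bytes):
        """Return the plain text of one WordprocessingML part, one line per paragraph."""
        paragraphs = []
        for p in ET.fromstring(xml_bytes).iter(W + "p"):
            pieces = []
            for run in p.iter(W + "r"):
                # Only direct children of a run carry content; this skips
                # tab-stop definitions (w:pPr/w:tabs/w:tab) in paragraph properties.
                for child in run:
                    if child.tag == W + "t":
                        pieces.append(child.text or "")
                    elif child.tag == W + "tab":
                        pieces.append("\t")
                    elif child.tag in (W + "br", W + "cr"):
                        pieces.append("\n")
            paragraphs.append("".join(pieces))
        return "\n".join(paragraphs)


    def extract_text(docx):
        """Hypothetical helper (not the docxpy API): headers, then body, then footers."""
        with zipfile.ZipFile(docx) as zf:
            names = zf.namelist()
            # Assumes conventional names such as word/header1.xml and word/footer1.xml.
            headers = sorted(n for n in names if re.fullmatch(r"word/header\d+\.xml", n))
            footers = sorted(n for n in names if re.fullmatch(r"word/footer\d+\.xml", n))
            parts = headers + ["word/document.xml"] + footers
            return "\n".join(part_text(zf.read(n)) for n in parts)

The sketch does not special-case text boxes nested inside a paragraph, whose text would be emitted twice.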

How to install?
---------------

.. code:: bash

    pip install docxpy

How to run?
-----------

a. From the command line:

.. code:: bash

    # extract text
    docx2txt file.docx

    # extract text and images
    docx2txt -i /tmp/img_dir file.docx

b. From Python:

.. code:: python

    import docxpy

    file = 'file.docx'

    # extract text
    text = docxpy.process(file)

    # extract text and write images in /tmp/img_dir
    text = docxpy.process(file, "/tmp/img_dir")

    # if you want the hyperlinks
    doc = docxpy.DOCReader(file)
    doc.process()  # process file
    hyperlinks = doc.data['links']
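
For a rough idea of what writing images to a directory involves, note that a .docx package conventionally stores embedded pictures as plain files under ``word/media/``. The sketch below copies them out using only the standard library. ``extract_images`` is a hypothetical helper, not part of docxpy, and it does not attempt to reproduce docxpy's own output naming.

.. code:: python

    import os
    import zipfile


    def extract_images(docx, img_dir):
        """Hypothetical helper (not the docxpy API): copy word/media/* into img_dir.

        Returns the list of written file paths.
        """
        os.makedirs(img_dir, exist_ok=True)
        written = []
        with zipfile.ZipFile(docx) as zf:
            for name in zf.namelist():
                # Skip non-media parts and directory entries.
                if not name.startswith("word/media/") or name.endswith("/"):
                    continue
                # Keep only the base name, flattening any subfolders.
                dest = os.path.join(img_dir, os.path.basename(name))
                with open(dest, "wb") as out:
                    out.write(zf.read(name))
                written.append(dest)
        return written

Using only the base name of each archive entry flattens any subfolders under ``word/media/`` into ``img_dir``.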

.. |image0| image:: https://travis-ci.org/badbye/docxpy.svg?branch=master

.. |PyPI| image:: https://img.shields.io/pypi/pyversions/docxpy.svg?style=flat-square
