https://github.com/openscilab/dmeta
Remove Metadata from Microsoft Office Files
anonymization docx metadata metadata-editor microsoft-excel microsoft-office microsoft-powerpoint microsoft-word pptx xlsx
- Host: GitHub
- URL: https://github.com/openscilab/dmeta
- Owner: openscilab
- License: mit
- Created: 2023-09-23T17:12:13.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-07T21:07:39.000Z (9 months ago)
- Last Synced: 2025-01-07T22:22:31.505Z (9 months ago)
- Topics: anonymization, docx, metadata, metadata-editor, microsoft-excel, microsoft-office, microsoft-powerpoint, microsoft-word, pptx, xlsx
- Language: Python
- Homepage:
- Size: 730 KB
- Stars: 17
- Watchers: 0
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Authors: AUTHORS.md
README
----------
## Overview
DMeta is an open-source Python package that removes metadata from Microsoft Office files.
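For context, `.docx`, `.pptx`, and `.xlsx` files are ZIP archives, and document properties such as the author and last editor are typically stored in the `docProps/core.xml` part. The sketch below is not DMeta's implementation. It builds a tiny stand-in archive in memory and lists the properties found there, to show the kind of data at stake. The sample values and the helper name `read_core_properties` are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Illustrative core-properties part; real files usually contain more fields.
SAMPLE_CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:creator>Jane Doe</dc:creator>'
    '<cp:lastModifiedBy>John Roe</cp:lastModifiedBy>'
    '</cp:coreProperties>'
)


def read_core_properties(source):
    """Return {tag: text} for each element in docProps/core.xml (hypothetical helper)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    # Drop the "{namespace}" prefix that ElementTree puts on each tag.
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


# Build a stand-in archive in memory (not a valid Word document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", SAMPLE_CORE_XML)
buffer.seek(0)

print(read_core_properties(buffer))
```

The same function accepts a path to a real Office file, which makes it a quick way to compare a document before and after cleaning.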
## Installation
### PyPI
- Check [Python Packaging User Guide](https://packaging.python.org/installing/)
- Run `pip install dmeta==0.4`
### Source code
- Download [Version 0.4](https://github.com/openscilab/dmeta/archive/v0.4.zip) or [Latest Source](https://github.com/openscilab/dmeta/archive/dev.zip)
- Run `pip install .`

## Usage
### In Python
⚠️ Use `in_place` to apply the changes directly to the original file.

⚠️ The `in_place` flag is `False` by default.
#### Clear metadata for a .docx file in place
```python
import os
from dmeta.functions import clear

# Path to the document whose metadata should be cleared
DOCX_FILE_PATH = os.path.join(os.getcwd(), "sample.docx")
clear(DOCX_FILE_PATH, in_place=True)
```
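Because `in_place=True` overwrites the original file, it can be useful to keep a copy first. The following is a minimal sketch of that idea using only the standard library. The helper `run_with_backup` and the `.bak` naming are assumptions for illustration and are not part of DMeta.

```python
import shutil
from pathlib import Path


def run_with_backup(path, action):
    """Copy `path` to `<name>.bak` next to it, then call `action(path)`.

    Returns the backup path so the original bytes can be restored if needed.
    """
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)  # copies contents and timestamps
    action(str(source))
    return backup


# Intended use with DMeta (requires `pip install dmeta`):
#   from dmeta.functions import clear
#   run_with_backup("sample.docx", lambda p: clear(p, in_place=True))
```

Note that the backup still contains the original metadata, so it should be deleted or kept out of version control once the cleaned file has been checked.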
#### Clear metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
```python
from dmeta.functions import clear_all
clear_all()
```
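Before running a bulk operation, it can help to preview which files in a directory carry one of the three supported extensions. The sketch below does that with the standard library. The helper `find_office_files` is hypothetical and not part of DMeta, and DMeta's own selection rules may differ (for example in how it treats upper-case extensions or temporary Office files).

```python
from pathlib import Path

OFFICE_EXTENSIONS = {".docx", ".pptx", ".xlsx"}


def find_office_files(directory="."):
    """Return sorted names of .docx/.pptx/.xlsx files directly in `directory`."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.suffix.lower() in OFFICE_EXTENSIONS
    )


if __name__ == "__main__":
    # Preview the candidates in the current working directory.
    for name in find_office_files():
        print(name)
```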
#### Update metadata for a .pptx file in place
```python
import os
from dmeta.functions import update

# Path to the JSON file describing the metadata values to set
CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json")
PPTX_FILE_PATH = os.path.join(os.getcwd(), "sample.pptx")
update(CONFIG_FILE_PATH, PPTX_FILE_PATH, in_place=True)
```
#### Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
```python
import os
from dmeta.functions import update_all

# Path to the JSON file describing the metadata values to set
CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json")
update_all(CONFIG_FILE_PATH)
```

### CLI
⚠️ You can use `dmeta` or `python -m dmeta` to run this program.

⚠️ Use `--inplace` to apply the changes directly to the original file.
#### Clear metadata for a .docx file in place
```console
dmeta --clear "./test_a.docx" --inplace
```
#### Clear metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
```console
dmeta --clear-all
```
#### Update metadata for a .xlsx file in place
```console
dmeta --update "./test_a.xlsx" --config "./config.json" --inplace
```
#### Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
```console
dmeta --update-all --config "./config.json"
```
#### Version
```console
dmeta -v
dmeta --version
```
#### Info
```console
dmeta --info
```

### Dmeta as pre-commit hook
To ensure that **no Microsoft Office files ever enter your repo with embedded metadata**, you can use Dmeta’s built-in pre-commit hooks.
#### 1. Install the pre-commit framework
If you don’t already have it:
```bash
pip install pre-commit
```

#### 2. Add Dmeta to your project’s .pre-commit-config.yaml
In your project root, create or update `.pre-commit-config.yaml`:
```yaml
repos:
  - repo: https://github.com/openscilab/dmeta.git
    rev: v0.4  # minimum v0.4 or commit SHA
    hooks:
      - id: clear-metadata
```
* `rev`: must exactly match a tag that supports pre-commit hooks (v0.4 at minimum) or the commit SHA where the targeted `.pre-commit-hooks.yaml` exists.

#### 3. Install the hook
```bash
pre-commit install # or pre_commit install (in windows)
```

Now, every time you `git commit`, Dmeta will automatically clear metadata from any Microsoft files in-place.
#### ⚠️ Important: Clean Before You Commit
Do **not** stage or add Microsoft Office files **before** removing their metadata.
If you run `git add` on Office files that still contain embedded metadata, the pre-commit hook will attempt to clean them **in-place**, which modifies the files after they’ve been staged. As a result, **Git will block the commit** because the content has changed mid-process.
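To see whether any Office files are already staged before committing, one option is to ask Git for the staged paths and filter them by extension. This is a sketch under stated assumptions: the helper names `office_paths` and `staged_office_files` are hypothetical and not part of DMeta, and the second function must be run inside a Git repository.

```python
import subprocess
from pathlib import PurePosixPath

OFFICE_EXTENSIONS = {".docx", ".pptx", ".xlsx"}


def office_paths(paths):
    """Keep only paths whose extension marks a supported Office file."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in OFFICE_EXTENSIONS]


def staged_office_files():
    """List Office files with staged changes (run inside a Git repository)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -z separates paths with NUL bytes, so unusual file names survive intact.
    return office_paths([p for p in result.stdout.split("\0") if p])
```

If `staged_office_files()` returns anything, those files can be unstaged, cleaned, and staged again before committing.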
#### ✅ Suggested Correct Workflow
1. Let the hook run automatically on earlier commits that didn’t add Office files, or run it manually with `pre-commit run clear-metadata --all-files`.
2. Then:
```bash
git add
git commit -m "Your message"
```

## Supported files
| File format | Supported |
| ---------------- | ---------------- |
| Microsoft Word (.docx) | ✅ |
| Microsoft PowerPoint (.pptx) | ✅ |
| Microsoft Excel (.xlsx) | ✅ |

## Issues & bug reports
Just file an issue and describe the problem, and we'll check it ASAP. You can also send an email to [dmeta@openscilab.com](mailto:dmeta@openscilab.com "dmeta@openscilab.com").
- Please complete the issue template
You can also join our Discord server.

## Acknowledgments
The [Python Software Foundation (PSF)](https://www.python.org/psf/) awarded a grant that partially supported version 0.4 of the DMeta library.
[PSF](https://www.python.org/psf/) is the organization behind Python. Their mission is to promote, protect, and advance the Python programming language and to support and facilitate the growth of a diverse and international community of Python programmers.

## Show your support
### Star this repo
Give a ⭐️ if this project helped you!
### Donate to our project
If you like our project, and we hope that you do, please consider supporting us. Our project is not for profit and never will be. We need the money only so that we can continue doing what we do ;-) .