https://github.com/ssciwr/mailcom

Recognize and pseudonymize named entities in emails
anonymization data-privacy llm-inference pseudonymization text-preprocessing

Host: GitHub
URL: https://github.com/ssciwr/mailcom
Owner: ssciwr
License: mit
Created: 2022-04-08T06:59:17.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2025-04-17T15:28:58.000Z (about 2 months ago)
Last Synced: 2025-04-17T21:24:28.371Z (about 1 month ago)
Topics: anonymization, data-privacy, llm-inference, pseudonymization, text-preprocessing
Language: Python
Homepage: https://ssciwr.github.io/mailcom/
Size: 16 MB
Stars: 1
Watchers: 2
Forks: 1
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# mailcom

A tool that parses the body of an email (eml file) and retains only the text, with names removed, for French or Spanish emails.

# Installation

Install using pip:

`python -m pip install mailcom`

You will also need to download the French and Spanish models for spaCy and Stanza using the provided script. Run this in the terminal:

`./get-models.sh`

For an overview of the available languages and models, check the [spaCy](https://spacy.io/usage/models) website.

# Usage

The package uses spaCy to split the text into sentences, based on the default language models, and transformers for named entity recognition (NER).
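The replacement step that follows NER can be illustrated with a minimal sketch. It is not mailcom's actual code: the function name `pseudonymize` and the list of pseudonyms are hypothetical, and the entity dictionaries are assumed to follow the shape returned by a transformers token-classification pipeline with an aggregation strategy (keys `entity_group`, `start`, `end`). Only person entities (`PER`) are replaced, and the same name always receives the same pseudonym.

```python
def pseudonymize(text, entities, pseudonyms=("Claude", "Dominique", "Camille")):
    """Replace every PER span in `text` with a pseudonym (hypothetical helper).

    `entities` is assumed to be a list of dicts with the keys
    "entity_group", "start" and "end", as produced by an aggregated
    transformers NER pipeline. Other keys are ignored.
    """
    persons = [e for e in entities if e["entity_group"] == "PER"]

    # Assign pseudonyms in reading order so repeated names stay consistent.
    mapping = {}
    for ent in sorted(persons, key=lambda e: e["start"]):
        name = text[ent["start"]:ent["end"]]
        if name not in mapping:
            mapping[name] = pseudonyms[len(mapping) % len(pseudonyms)]

    # Replace from right to left so earlier character offsets remain valid.
    out = text
    for ent in sorted(persons, key=lambda e: e["start"], reverse=True):
        name = text[ent["start"]:ent["end"]]
        out = out[:ent["start"]] + mapping[name] + out[ent["end"]:]
    return out


# Hand-written entities standing in for real NER output.
sample = "Bonjour Marie, Pierre et Marie."
found = [
    {"entity_group": "PER", "start": 8, "end": 13},
    {"entity_group": "PER", "start": 15, "end": 21},
    {"entity_group": "PER", "start": 25, "end": 30},
]
print(pseudonymize(sample, found))  # Bonjour Claude, Dominique et Claude.
```

Working from the character offsets rather than from the entity text avoids accidentally replacing a name that appears inside another word.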

Currently, you have to set the language and eml file directory manually at the top of `parse.py`; the default directory is `data/in`. Then run `python parse.py`. After the run, the output can be found in `data/out`.
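To show what reading the eml files from an input directory involves, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the contents of `parse.py`: the helper names `extract_body` and `read_eml_dir` are hypothetical, and the sketch assumes that only the plain-text body part is of interest.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_body(raw: bytes) -> str:
    """Return the plain-text body of a raw eml message, or "" if there is none."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body picks the best matching body part; here only text/plain is accepted.
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def read_eml_dir(in_dir: str) -> dict:
    """Map each *.eml file name in `in_dir` to its plain-text body."""
    return {
        path.name: extract_body(path.read_bytes())
        for path in sorted(Path(in_dir).glob("*.eml"))
    }
```

Calling `read_eml_dir("data/in")` would then return one text body per eml file, ready for sentence splitting and NER.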
