{"id":22756327,"url":"https://github.com/spamscope/mail-parser","last_synced_at":"2025-05-14T03:06:13.300Z","repository":{"id":45006517,"uuid":"66961987","full_name":"SpamScope/mail-parser","owner":"SpamScope","description":"Tokenizer for raw mails","archived":false,"fork":false,"pushed_at":"2024-10-24T07:16:58.000Z","size":4755,"stargazers_count":367,"open_issues_count":1,"forks_count":87,"subscribers_count":10,"default_branch":"develop","last_synced_at":"2024-10-24T08:56:37.111Z","etag":null,"topics":["docker","docker-image","mail","mail-analyzer","mail-parser","mailparser","outlook","python","python3","security"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/mail-parser","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SpamScope.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["fedelemantuano"]}},"created_at":"2016-08-30T17:19:05.000Z","updated_at":"2024-10-24T07:19:36.000Z","dependencies_parsed_at":"2024-03-27T00:31:22.190Z","dependency_job_id":"01149d7a-1cfb-44d5-87c4-42ed69954586","html_url":"https://github.com/SpamScope/mail-parser","commit_stats":{"total_commits":313,"total_committers":17,"mean_commits":18.41176470588235,"dds":0.07348242811501593,"last_synced_commit":"6e6a30cdd34dad0653bebf05ac9a49ea14b222cc"},"previous_names":[],"tags_count":70,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpamScope%2Fmail-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/SpamScope%2Fmail-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpamScope%2Fmail-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SpamScope%2Fmail-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SpamScope","download_url":"https://codeload.github.com/SpamScope/mail-parser/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248154939,"owners_count":21056541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-image","mail","mail-analyzer","mail-parser","mailparser","outlook","python","python3","security"],"created_at":"2024-12-11T07:12:50.488Z","updated_at":"2025-04-10T03:40:51.485Z","avatar_url":"https://github.com/SpamScope.png","language":"Python","readme":"[![PyPI - Version](https://img.shields.io/pypi/v/mail-parser)](https://pypi.org/project/mail-parser/)\n[![Coverage Status](https://coveralls.io/repos/github/SpamScope/mail-parser/badge.svg?branch=develop)](https://coveralls.io/github/SpamScope/mail-parser?branch=develop)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/mail-parser?color=blue)](https://pypistats.org/packages/mail-parser)\n\n\n![SpamScope](https://raw.githubusercontent.com/SpamScope/spamscope/develop/docs/logo/spamscope.png)\n\n# mail-parser\n\nmail-parser is not only a wrapper for the [email](https://docs.python.org/2/library/email.message.html) module of the Python Standard Library.\nIt gives you an easy way to go from a raw mail to a Python object that you can use in your 
code.\nIt's the key module of [SpamScope](https://github.com/SpamScope/spamscope).\n\nmail-parser can parse the Outlook email format (.msg). To use this feature, you need to install the `libemail-outlook-message-perl` package. For Debian-based systems:\n\n```\n$ apt-get install libemail-outlook-message-perl\n```\n\nFor more details:\n\n```\n$ apt-cache show libemail-outlook-message-perl\n```\n\nmail-parser supports Python 3.\n\n\n# Apache 2 Open Source License\nmail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.\n\n## Support the project\nIf you find this project useful, you can support it by donating any amount you want. All donations are greatly appreciated and help maintain and develop the project.\n\n[![Donate](https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif \"Donate\")](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=VEPXYP745KJF2)\n\n\u003ca href=\"bitcoin:bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32\"\u003e\n  \u003cimg src=\"https://github.com/SpamScope/mail-parser/blob/develop/docs/images/Bitcoin%20SpamScope.jpg?raw=true\" alt=\"Bitcoin\" width=\"200\"\u003e\n\u003c/a\u003e\n\nBitcoin Address: `bc1qxhz3tghztpjqdt7atey68s344wvmugtl55tm32`\n\n# mail-parser on Web\n - [Splunk app](https://splunkbase.splunk.com/app/4129/)\n - [FreeBSD port](https://www.freshports.org/mail/py-mail-parser/)\n - [Arch User Repository](https://aur.archlinux.org/packages/mailparser/)\n - [REMnux](https://docs.remnux.org/discover-the-tools/analyze+documents/email+messages#mail-parser)\n\n# Description\n\nmail-parser takes as input a raw email and generates a parsed object. 
The properties of this object have the same names as the\n[RFC headers](https://www.iana.org/assignments/message-headers/message-headers.xhtml):\n\n  - bcc\n  - cc\n  - date\n  - delivered_to\n  - from\\_ (not `from`, because it is a Python keyword)\n  - message_id\n  - received\n  - reply_to\n  - subject\n  - to\n\nThere are other properties to get:\n  - body\n  - body html\n  - body plain\n  - headers\n  - attachments\n  - sender IP address\n  - to domains\n  - timezone\n\nThe `attachments` property is a list of objects. Every object has the following keys:\n  - binary: true if the attachment is binary\n  - charset\n  - content_transfer_encoding\n  - content-disposition\n  - content-id\n  - filename\n  - mail_content_type\n  - payload: attachment payload in base64\n\nTo get custom headers, replace \"-\" with \"\\_\".\nExample for header `X-MSMail-Priority`:\n\n```\n$ mail.X_MSMail_Priority\n```\n\nThe `received` header is parsed and split into hops. The fields supported are:\n - by\n - date\n - date_utc\n - delay (between two hops)\n - envelope_from\n - envelope_sender\n - for\n - from\n - hop\n - with\n\n\nmail-parser can detect defects in a mail:\n  - [defects](https://docs.python.org/2/library/email.message.html#email.message.Message.defects): mail with parts that are not RFC compliant\n\nEvery property has a JSON and a raw counterpart that you can get with:\n - name_json\n - name_raw\n\nExample:\n\n```\n$ mail.to (Python object)\n$ mail.to_json (JSON)\n$ mail.to_raw (raw header)\n```\n\nThe command line tool uses the JSON format.\n\n## Defects\nThese defects can be used to evade the antispam filter. 
One example is a mail with a malformed boundary, which can hide an illegitimate epilogue (often malware).\nThis library can extract these epilogues.\n\n\n# Authors\n\n## Main Author\n**Fedele Mantuano**: [LinkedIn](https://www.linkedin.com/in/fmantuano/)\n\n\n# Installation\n\nClone the repository\n\n```\ngit clone https://github.com/SpamScope/mail-parser.git\n```\n\nand install mail-parser with `setup.py`:\n\n```\n$ cd mail-parser\n\n$ python setup.py install\n```\n\nor use `pip`:\n\n```\n$ pip install mail-parser\n```\n\n# Usage in a project\n\nImport the `mailparser` module:\n\n```\nimport mailparser\n\nmail = mailparser.parse_from_bytes(byte_mail)\nmail = mailparser.parse_from_file(f)\nmail = mailparser.parse_from_file_msg(outlook_mail)\nmail = mailparser.parse_from_file_obj(fp)\nmail = mailparser.parse_from_string(raw_mail)\n```\n\nThen you can get all parts:\n\n```\nmail.attachments: list of all attachments\nmail.body\nmail.date: datetime object in UTC\nmail.defects: defects (parts that are not RFC compliant)\nmail.defects_categories: only defects categories\nmail.delivered_to\nmail.from_\nmail.get_server_ipaddress(trust=\"my_server_mail_trust\")\nmail.headers\nmail.mail: tokenized mail in an object\nmail.message: email.message.Message object\nmail.message_as_string: message as string\nmail.message_id\nmail.received\nmail.subject\nmail.text_plain: only text plain mail parts in a list\nmail.text_html: only text html mail parts in a list\nmail.text_not_managed: all not managed text (check the warning logs to find content subtype)\nmail.to\nmail.to_domains\nmail.timezone: returns the timezone, offset from UTC\nmail.mail_partial: returns only the main parts of the email\n```\n\nIt's possible to write the attachments to disk with the method:\n\n```\nmail.write_attachments(base_path)\n```\n\n# Usage from command-line\n\nIf you installed mailparser with `pip` or `setup.py`, you can use it from the command line.\n\nThese are all the switches:\n\n```\nusage: mailparser [-h] (-f FILE | -s STRING | -k)\n      
             [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}] [-j] [-b]\n                   [-a] [-r] [-t] [-dt] [-m] [-u] [-c] [-d] [-o]\n                   [-i Trust mail server string] [-p] [-z] [-v]\n\nWrapper for email Python Standard Library\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -f FILE, --file FILE  Raw email file (default: None)\n  -s STRING, --string STRING\n                        Raw email string (default: None)\n  -k, --stdin           Enable parsing from stdin (default: False)\n  -l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}, --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}\n                        Set log level (default: WARNING)\n  -j, --json            Show the JSON of parsed mail (default: False)\n  -b, --body            Print the body of mail (default: False)\n  -a, --attachments     Print the attachments of mail (default: False)\n  -r, --headers         Print the headers of mail (default: False)\n  -t, --to              Print the to of mail (default: False)\n  -dt, --delivered-to   Print the delivered-to of mail (default: False)\n  -m, --from            Print the from of mail (default: False)\n  -u, --subject         Print the subject of mail (default: False)\n  -c, --receiveds       Print all receiveds of mail (default: False)\n  -d, --defects         Print the defects of mail (default: False)\n  -o, --outlook         Analyze Outlook msg (default: False)\n  -i Trust mail server string, --senderip Trust mail server string\n                        Extract a reliable sender IP address heuristically\n                        (default: None)\n  -p, --mail-hash       Print mail fingerprints without headers (default:\n                        False)\n  -z, --attachments-hash\n                        Print attachments with fingerprints (default: False)\n  -sa, --store-attachments\n                        Store attachments on disk (default: False)\n  -ap ATTACHMENTS_PATH, --attachments-path 
ATTACHMENTS_PATH\n                        Path where store attachments (default: /tmp)\n  -v, --version         show program's version number and exit\n\nIt takes as input a raw mail and generates a parsed object.\n```\n\nExample:\n\n```shell\n$ mailparser -f example_mail -j\n```\n\nThis example will show you the tokenized mail as pretty-printed JSON.\n\nFrom [raw mail](https://gist.github.com/fedelemantuano/5dd702004c25a46b2bd60de21e67458e) to\n[parsed mail](https://gist.github.com/fedelemantuano/e958aa2813c898db9d2d09469db8e6f6).\n\n\n# Exceptions\n\nException hierarchy of mail-parser:\n\n```\nMailParserError: Base MailParser Exception\n|\n\\── MailParserOutlookError: Raised with Outlook integration errors\n|\n\\── MailParserEnvironmentError: Raised when the environment is not correct\n|\n\\── MailParserOSError: Raised when there is an OS error\n|\n\\── MailParserReceivedParsingError: Raised when a received header cannot be parsed\n```\n\n# Development\nThe first step is to install the development environment:\n\n```\n$ python3.10 -m virtualenv venv\n$ source venv/bin/activate\n$ pip install -e \".[dev, test]\"\n```\n\nThe second step is to run the tests:\n\n```\n$ make unittest\n```\n\nThen you can try to run the command line tool:\n\n```\n$ mail-parser -f tests/mails/mail_malformed_3 -j\n```\n\nIf everything works, you can start developing.\n","funding_links":["https://github.com/sponsors/fedelemantuano","https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif","https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=VEPXYP745KJF2"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspamscope%2Fmail-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspamscope%2Fmail-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspamscope%2Fmail-parser/lists"}