https://github.com/scrapy/xtractmime

https://mimesniff.spec.whatwg.org/ implementation for Python

Host: GitHub
URL: https://github.com/scrapy/xtractmime
Owner: scrapy
License: bsd-3-clause
Created: 2021-05-31T11:15:55.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-01-16T15:47:22.000Z (almost 2 years ago)
Last Synced: 2025-09-05T07:23:38.334Z (4 months ago)
Language: Python
Size: 8.18 MB
Stars: 13
Watchers: 8
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# xtractmime

`xtractmime` is a [BSD-licensed](https://opensource.org/licenses/BSD-3-Clause)
Python 3.7+ implementation of the [MIME Sniffing
Standard](https://mimesniff.spec.whatwg.org/).

Install from [`PyPI`](https://pypi.python.org/pypi/xtractmime):

```
pip install xtractmime
```

---

## Basic usage

Below are some simple examples of using `xtractmime.extract_mime`:

```python
>>> from xtractmime import extract_mime
>>> extract_mime(b'Sample text content')
b'text/plain'
>>> extract_mime(b'', content_types=(b'text/html',))
b'text/html'
```

The functions in `xtractmime.mimegroups` check whether a MIME type belongs to a specific MIME type group:

```python
>>> from xtractmime.mimegroups import is_html_mime_type, is_image_mime_type
>>> mime_type = b'text/html'
>>> is_html_mime_type(mime_type)
True
>>> is_image_mime_type(mime_type)
False
```

---

## API Reference

### function `xtractmime.extract_mime(*args, **kwargs) -> Optional[bytes]`

**Parameters:**

* `body: bytes`

* `content_types: Optional[Tuple[bytes]] = None`

* `http_origin: bool = True`

* `no_sniff: bool = False`

* `extra_types: Optional[Tuple[Tuple[bytes, bytes, Optional[Set[bytes]], bytes], ...]] = None`

* `supported_types: Set[bytes] = None`

Return the [MIME type essence](https://mimesniff.spec.whatwg.org/#mime-type-essence) (e.g. `text/html`) matching the input data, or
`None` if no match can be found.

The `body` parameter is the byte sequence whose MIME type is to be determined. `xtractmime` only considers the first few
bytes of the `body`; the exact number of bytes read is defined in the `xtractmime.RESOURCE_HEADER_BUFFER_LENGTH` constant.
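
Because only the leading bytes are inspected, a caller does not have to load a whole resource before sniffing it. The following is a minimal sketch of that idea, not part of `xtractmime`: `read_resource_header` and `HEADER_LENGTH` are hypothetical names, and the value 1445 is an assumption taken from the resource header length given in the MIME Sniffing Standard rather than read from the library.

```python
import io
from typing import BinaryIO

# Assumption: mirrors xtractmime.RESOURCE_HEADER_BUFFER_LENGTH. 1445 is the
# resource header length given by the MIME Sniffing Standard.
HEADER_LENGTH = 1445


def read_resource_header(stream: BinaryIO, length: int = HEADER_LENGTH) -> bytes:
    """Read at most `length` bytes from the start of a binary stream."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream reached before `length` bytes
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Hypothetical 5000-byte resource; only the leading bytes are kept.
header = read_resource_header(io.BytesIO(b"x" * 5000))
print(len(header))
```

The returned bytes could then be passed as `body` to `extract_mime`.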

`content_types` is a tuple of MIME types given in the resource metadata. For example, for resources retrieved via HTTP, pass the MIME types listed in the `Content-Type` header.

`http_origin` indicates if the resource has been retrieved via HTTP (`True`, default) or not (`False`).

`no_sniff` is a flag that is *`True`* if the user agent does not wish to
perform sniffing on the resource and *`False`* (the default) otherwise. Users may want to set
this parameter to *`True`* if the [`X-Content-Type-Options`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Content-Type-Options) response header is set to `nosniff`. For more information, see the standard's definition of the [no-sniff flag](https://mimesniff.spec.whatwg.org/#no-sniff-flag).
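
For HTTP responses, the `content_types`, `http_origin` and `no_sniff` arguments can all be derived from the response headers. The sketch below shows one way to do that. `sniffing_arguments` is a hypothetical helper, not part of `xtractmime`, and it assumes that headers arrive as `(name, value)` string pairs and that raw `Content-Type` values are acceptable as `content_types` entries.

```python
from typing import Dict, Iterable, Tuple


def sniffing_arguments(headers: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Build keyword arguments for extract_mime from HTTP response headers."""
    content_types = []
    no_sniff = False
    for name, value in headers:
        name = name.strip().lower()  # header names are case-insensitive
        if name == "content-type":
            # Assumption: the raw header value is passed through as bytes.
            content_types.append(value.strip().encode("latin-1"))
        elif name == "x-content-type-options":
            no_sniff = no_sniff or value.strip().lower() == "nosniff"
    return {
        "content_types": tuple(content_types) or None,  # None if header absent
        "http_origin": True,
        "no_sniff": no_sniff,
    }


kwargs = sniffing_arguments([
    ("Content-Type", "text/html; charset=utf-8"),
    ("X-Content-Type-Options", "nosniff"),
])
print(kwargs)
```

The resulting dictionary could be unpacked into the call, as in `extract_mime(body, **kwargs)`.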

`extra_types` is a tuple of patterns to support detecting additional MIME types. Each entry in the tuple should follow the format
**(Byte Pattern, Pattern Mask, Leading Bytes, MIME type)**:

* **Byte Pattern** is a byte sequence to compare with the first few bytes (`xtractmime.RESOURCE_HEADER_BUFFER_LENGTH`) of the `body`.

* **Pattern Mask** is a byte sequence that indicates the significance of **Byte Pattern** bytes: `b"\xff"` indicates the matching byte is strictly significant, `b"\xdf"` indicates that the byte is significant in an ASCII case-insensitive way, and `b"\x00"` indicates that the byte is not significant.

* **Leading Bytes** is a set of bytes to skip at the start of the content before matching begins, or `None` if no bytes should be skipped.

* **MIME type** is the MIME type to return if the pattern matches.

**Sample `extra_types`:**

```python
extra_types = ((b'test', b'\xff\xff\xff\xff', None, b'text/test'), ...)
```
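
To make the roles of the pattern, mask and leading bytes concrete, here is a minimal sketch of the pattern matching algorithm described in the MIME Sniffing Standard. It is an illustration only and not `xtractmime`'s internal code: `pattern_matches` is a hypothetical name, and representing the leading bytes as a set of single-byte `bytes` objects is an assumption.

```python
from typing import Optional, Set


def pattern_matches(
    data: bytes,
    pattern: bytes,
    mask: bytes,
    ignored: Optional[Set[bytes]] = None,
) -> bool:
    """Return True if `data` matches `pattern` under `mask`."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    ignored = ignored or set()
    # Skip leading bytes that belong to the ignored set.
    start = 0
    while start < len(data) and data[start:start + 1] in ignored:
        start += 1
    window = data[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False  # not enough input left to hold the pattern
    # A byte matches when (input byte AND mask byte) equals the pattern byte.
    return all((d & m) == p for d, m, p in zip(window, mask, pattern))


# Strict match: every mask byte is 0xff.
print(pattern_matches(b"test data", b"test", b"\xff\xff\xff\xff"))
# Case-insensitive match after skipping leading whitespace: the pattern
# letters are upper case and masked with 0xdf.
print(pattern_matches(b"  <html>", b"<HTML", b"\xff\xdf\xdf\xdf\xdf", {b" ", b"\t"}))
```

With a `b"\xdf"` mask byte, the corresponding pattern byte has to be the upper-case letter, because the mask clears the bit that distinguishes lower case from upper case in ASCII.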

---

**NOTE**

*Be careful while using the `extra_types` argument, as it may introduce some privilege escalation vulnerabilities for `xtractmime`. For more info, see [here](https://mimesniff.spec.whatwg.org/#ref-for-mime-type%E2%91%A1%E2%91%A8).*

---

The optional `supported_types` parameter is a set of all [MIME types supported by the user agent](https://mimesniff.spec.whatwg.org/#supported-by-the-user-agent). If `supported_types` is not
specified, all MIME types are assumed to be supported. Using this parameter can improve the performance of `xtractmime`.

### function `xtractmime.is_binary_data(input_bytes: bytes) -> bool`

Return *`True`* if the provided byte sequence contains any binary data bytes, else *`False`*.
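
The MIME Sniffing Standard defines a binary data byte as a byte in the ranges 0x00-0x08, 0x0B, 0x0E-0x1A or 0x1C-0x1F. The following sketch illustrates that definition. `contains_binary_data` and `BINARY_DATA_BYTES` are hypothetical names, and the code is not taken from `xtractmime`.

```python
# Binary data bytes as defined by the MIME Sniffing Standard:
# 0x00-0x08, 0x0B, 0x0E-0x1A and 0x1C-0x1F.
BINARY_DATA_BYTES = frozenset(
    [*range(0x00, 0x09), 0x0B, *range(0x0E, 0x1B), *range(0x1C, 0x20)]
)


def contains_binary_data(data: bytes) -> bool:
    """Return True if any byte of `data` is a binary data byte."""
    return any(byte in BINARY_DATA_BYTES for byte in data)


print(contains_binary_data(b"plain text\r\n"))  # tab, CR and LF are not binary
print(contains_binary_data(b"\x00\x01\x02"))
```

Note that common text control characters such as tab (0x09), line feed (0x0A), carriage return (0x0D) and escape (0x1B) fall outside these ranges.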

 

### MIME type group functions

The following functions return `True` if a given MIME type belongs to a certain
[MIME type group](https://mimesniff.spec.whatwg.org/#mime-type-groups), or
`False` otherwise:

```
xtractmime.mimegroups.is_archive_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_audio_video_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_font_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_html_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_image_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_javascript_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_json_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_scriptable_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_xml_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_zip_mime_type(mime_type: bytes) -> bool
```

**Example**

```python
>>> from xtractmime.mimegroups import is_html_mime_type, is_image_mime_type, is_zip_mime_type
>>> mime_type = b'text/html'
>>> is_html_mime_type(mime_type)
True
>>> is_image_mime_type(mime_type)
False
>>> is_zip_mime_type(mime_type)
False
```
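
Some groups are defined by a subtype suffix as well as by fixed essences, so a single MIME type can belong to several groups. The sketch below approximates three of the group definitions from the MIME Sniffing Standard to show the idea. The `looks_like_*` helpers are hypothetical and are not the functions shipped in `xtractmime.mimegroups`.

```python
def _normalize(essence: bytes) -> bytes:
    """Lower-case a b'type/subtype' essence and trim surrounding whitespace."""
    return essence.strip().lower()


def _subtype(essence: bytes) -> bytes:
    """Return the part after the slash, or b'' if there is no slash."""
    return _normalize(essence).partition(b"/")[2]


def looks_like_xml(essence: bytes) -> bool:
    # XML MIME type: subtype ends in "+xml", or essence is text/xml or application/xml.
    return _subtype(essence).endswith(b"+xml") or _normalize(essence) in {
        b"text/xml",
        b"application/xml",
    }


def looks_like_json(essence: bytes) -> bool:
    # JSON MIME type: subtype ends in "+json", or essence is application/json or text/json.
    return _subtype(essence).endswith(b"+json") or _normalize(essence) in {
        b"application/json",
        b"text/json",
    }


def looks_like_zip(essence: bytes) -> bool:
    # ZIP-based MIME type: subtype ends in "+zip", or essence is application/zip.
    return _subtype(essence).endswith(b"+zip") or _normalize(essence) == b"application/zip"


print(looks_like_xml(b"image/svg+xml"))
print(looks_like_json(b"application/ld+json"))
print(looks_like_zip(b"application/epub+zip"))
```

For example, `image/svg+xml` has the type `image` and a subtype ending in `+xml`, so under the standard's definitions it is both an image MIME type and an XML MIME type.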

## Changelog

See the [changelog](CHANGELOG.md).
