https://github.com/nktknshn/tgmount-ng
https://github.com/nktknshn/tgmount-ng
pyfuse3 python telegram telegram-client telethon vfs virtual-file-system
Last synced: 5 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/nktknshn/tgmount-ng
- Owner: nktknshn
- Created: 2023-01-28T18:05:58.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-04-25T15:38:40.000Z (about 3 years ago)
- Last Synced: 2025-09-11T16:02:45.281Z (9 months ago)
- Topics: pyfuse3, python, telegram, telegram-client, telethon, vfs, virtual-file-system
- Language: Python
- Homepage:
- Size: 516 KB
- Stars: 53
- Watchers: 5
- Forks: 5
- Open Issues: 11
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Overview
Creates virtual file system with files posted on telegram.
**VERY ALPHA SO FAR**
Table of Contents
=================
* [Installation](#installation)
* [Requirements](#requirements)
* [Basic usage](#requirements)
* [Mounting multiple entities](#mounting-multiple-entities)
* [Sample config](#sample-config)
* [Client commands](#client-commands)
* [mount](#tgmount-mount)
* [mount config](#tgmount-mount-config)
* [list dialogs](#tgmount-list-dialogs)
* [list documents](#tgmount-list-documents)
* [download](#tgmount-download)
* [Config file structure](#config-file-structure)
* [Playing flac and mp3 from a zip archive](#playing-flac-and-mp3-from-a-zip-archive)
* [Known bugs](#known-bugs)
## Requirements
- Linux
- Python 3.10 (not sure about 3.9 and less)
## Installation:
```
pip install tgmount
```
## Basic usage
To obtain your API id follow [official manual](https://core.telegram.org/api/obtaining_api_id). Running the program for the first time will require authentication.
```
$ export TGAPP=1234567:deadbeef0d04a3efe93e1af778773d6f0 TGSESSION=tgfs
```
To mount a channel/chat/group
```
tgmount mount tgmounttestingchannel ~/mnt/tgmount1/
```
To mount an entity that doesn't have a username you will need to get its id.
```bash
tgmount list dialogs | grep 'my friends private chat'
```
To mount zip files as directories use `UnpackedZip` producer
```
tgmount mount tgmounttestingchannel ~/mnt/tgmount1/ --producer UnpackedZip
```
Use config file to create a more complex vfs structure
```
tgmount mount tgmounttestingchannel ~/mnt/tgmount1/ --root-config examples/root_config.yaml
```
## Mounting multiple entities
To mount multiple entities use `mount-config` command
```
tgmount mount-config examples/config.yaml
```
### Sample config
```yaml
# can be overwritten by --mount-dir argument
mount_dir: /home/horn/mnt/tgmount1
client:
session: tgfs
api_id: 123
api_hash: deadbeed121212121
#
message_sources:
ru2chmu:
entity: ru2chmu
updates: False
limit: 1000
friends:
entity: -388004022
limit: 1000
caches:
memory1:
type: memory
capacity: 300MB
block_size: 128KB
root:
muzach:
# A document will not be mounted more than once when it appears in a
# different messages. `recursive` means this filter will also be applied
# down the folders tree
filter: { filter: OnlyUniqueDocs, recursive: True }
# Messages from `ru2chmu` will be used to produce content in the nested folders
source: { source: ru2chmu, recursive: True }
# creates subfolder named `music`
music:
filter:
# the directory will contain music and zip archives
Union: [MessageWithMusic, MessageWithZip]
# zip archives will be mounted as folders
producer: UnpackedZip
# using cache speeds up reading from the archives
cache: memory1
texts:
# messages with text
filter: MessageWithText
# this commands tgmount to treat messages with both document and text
# as text files
treat_as: MessageWithText
friends:
source: {source: friends, recursive: True}
music-by-senders:
producer:
# this producer creates a separate directory for every sender in the entity
BySender:
dir_structure:
# these directories will only contain music
filter: MessageWithMusic
liked-music:
# this directory will be containing all music with thumb up reaction
filter:
And:
- MessageWithMusic
- ByReaction:
reaction: 👍
images:
filter:
Union: [MessageWithCompressedPhoto, MessageWithDocumentImage]
```
More about config structure read in [Config file structure](#config-file-structure)
## Client commands
### tgmount mount
```
tgmount mount [--filter FILTER] [--root-config ROOT_CONFIG]
[--producer PRODUCER] [--offset-date OFFSET_DATE] [--offset-id OFFSET_ID]
[--max-id MAX_ID] [--min-id MIN_ID] [--wait_time WAIT_TIME] [--limit LIMIT]
[--reply-to REPLY_TO] [--from-user FROM_USER] [--reverse] [--mount-texts] [--no-updates]
[--debug-fuse] [--min-tasks MIN_TASKS] entity mount-dir
```
Define the structure of the mounted folder by one of these options
```
--producer PRODUCER
--root-config ROOT_CONFIG
```
Available producers:
```python
PlainDir # just a list of files (default)
UnpackedZip # PlainDir but zips are mounted as folders
BySender # files grouped in folders by sender
ByForward # forwarded files grouped by source entity
ByPerformer # music grouped by performers
ByReactions # files grouped by reaction
```
The following arguments work as described in [TelegramClient.get_messages](https://docs.telethon.dev/en/stable/modules/client.html#telethon.client.messages.MessageMethods.get_messages).
```
--filter [FILTER]
--offset-date OFFSET_DATE
--offset-id OFFSET_ID
--max-id MAX_ID
--min-id MIN_ID
--wait_time WAIT_TIME
--limit LIMIT
--reply-to REPLY_TO
--from-user FROM_USER
--reverse
```
Available [telegram filters](https://core.telegram.org/type/MessagesFilter):
```python
InputMessagesFilterDocument
InputMessagesFilterPhotos
InputMessagesFilterVideo
InputMessagesFilterPhotoVideo
InputMessagesFilterUrl
InputMessagesFilterGif
InputMessagesFilterVoice
InputMessagesFilterMusic
InputMessagesFilterRoundVoice
InputMessagesFilterRoundVideo
InputMessagesFilterMyMentions
```
Using these filter speeds up fetching process but these filter cannot be composed.
If you don't need updates
```
--no-updates
```
If you want to also to mount text messages as text files
```
--mount-texts
```
Other arguments
```
--debug-fuse
--min-tasks MIN_TASKS
```
### tgmount mount-config
```
tgmount mount-config [--mount-dir MOUNT_DIR] CONFIG_FILE MOUNT_DIR
```
### tgmount list dialogs
```
tgmount list dialogs
```
### tgmount list documents
```
tgmount list documents [--filter FILTER] [--offset-date OFFSET_DATE] [--offset-id OFFSET_ID]
[--max-id MAX_ID] [--min-id MIN_ID] [--wait_time WAIT_TIME] [--limit LIMIT]
[--reply-to REPLY_TO] [--from-user FROM_USER] [--reverse] [--json]
[--print-message] [--include-unsupported] [--only-unsupported] [--all-types]
[--only-unique-docs] entity
```
```--print-message```
Include stringified message object in the output
`--all-types`
Print all classes a message matches
`--only-unique-docs`
Exclude repeating documents
`--include-unsupported`
Include messages that are not supported for mounting
`--only-unsupported`
Print only them
`--json`
Print in json format
### tgmount download
```
tgmount download [--output-dir OUTPUT_DIR] [--keep-filename] [--request-size REQUEST_SIZE] entity ids [ids ...]
```
`--keep-filename`
Keep original filenames
`--output-dir`
Destination folder for files
`--request_size`
How much data to fetch per request
`entity`
Entity to download from
`ids`
Messages ids
Example:
```
tgmount download -O /tmp -R 256KB tgmounttestingchannel 532 11 51 18
```
Im combination with `list documents`
```bash
tgmount download ru_python $(tgmount list documents ru_python --filter InputMessagesFilterDocument --limit 10 --json | jq '.[]|.id') -O /tmp
```
## Config file structure
Config file has the following sections:
- `client`
- `message_sources`
- `caches`
- `root`
`caches` section is optional.
### Top level properties
```yaml
# optional. can be overwritten by --mount-dir argument
mount_dir: ~/mnt/tgmount
```
### client
Contains settings for the telegram client
```yaml
client:
# telethon session name
session: session_name
# telegram api credentials
api_id: int
api_hash: str
# optional field
request_size: 128KB
# optional field. Default: False
use_ipv6: True
```
### message_sources
A message source defines a list of messages that will be used in vfs tree construction. Every message source is a separate [TelegramClient.get_messages](https://docs.telethon.dev/en/stable/modules/client.html#telethon.client.messages.MessageMethods.get_messages) request. Message source is also subscribed to events of posting, removing and editing messages in the entity it is sourced from.
```yaml
message_sources:
# key defines id of the message source to reference in the `root` section
source1:
# channel/group/chat id to fetch messages from
# string or int
entity: tgmounttestingchannel
# all the following fields are optional
# whether to listen for updates. Default: true
updates: True
# Filter for message types. If not set all the messages types including text
# messages will be fetched
filter: MessageWithMusic
# limits the number of messages
limit: 1000
# format is `31/12/2023` or '31/12/2023 13:00'
offset_date: `31/12/2023`
offset_id: 0
min_id: 0
max_id: 0
wait_time: None
reply_to: int
from_user: str | int
reverse: False
```
### caches
Defines cache storages for documents. Cached parts of a document will not be fetched twice. Usually this is not needed because OS file system does caching by itself. Cache is needed in couple with `UnpackedZip` producer since the OS file system cache is not applied in case of using this producer.
```yaml
caches:
# the key defines cache id to be referenced in `root` section
cache1:
# currently only memory cache is supported
type: memory
# The size of the cache
capacity: 300MB
# optional block size, default: 128KB
block_size: 256KB
```
### root
This section defines the structure of the mounted folder.
```yaml
root:
# optional. sets the message source for the current directory. If this is not
# set and there is no recursive filter has been defined before, the folder
# will not contain any files
source: source1
source: {source: source1}
# sets the message source for the current and for nested folders
source: {source: source1, recursive: True}
# optional. sets a filter for the current folder. Default is no filter
filter: MessageWithMusic
filter: {filter: MessageWithMusic}
# sets a filter for the current folder and subfolders
filter: {filter: MessageWithMusic, recursive: True}
# sets a filter for the current folder and subfolders overwriting another
# recursive filter if any
filter: {filter: MessageWithMusic, overwright: True, recursive: True}
# the following combines multiple filters. Only messages that match every filter
# in the list will pass. The filter below allows all documents that
# that are not video, photo or audio and not a zip file
filter:
- MessageWithOtherDocument
- Not:
- ByExtension: .zip
# in one line
filter: {filter: [MessageWithOtherDocument, Not: {ByExtension: .zip}], overwright: True, recursive: True}
# defines a producer that controls the content of the folder.
# Default is PlainDir
producer: BySender
# producer may have properties
producer:
BySender:
dir_structure:
music:
filter: MessageWithMusic
voices:
filter: MessageWithVoice
use_get_sender: true
# sets a cache for the current folder
# referencing a cache defined in `caches` folder
cache: memory1
# dynamically creates a cache to use in this folder
cache:
type: memory
capacity: 300MB
# optional. wrapper that modifies the resulting content of the folder
wrapper: ExcludeEmptyDirs
# optional. Defines the priority of how to classify a message if multiple classes
# match its type. E.g. a message with both a document and a text message
treat_as: MessageWithText
# to define subfolders
documents:
# 'documents' folder will only contain the two following subfolders
docs_from_source1:
source: source1
filter: MessageWithDocument
docs_from_source2:
source: source2
filter: MessageWithDocument
```
#### source
Message source is a list of messages which is used to produce a directory content. Message source is initialized from get_messages() request and is updated by events of posting message, removing message and editing message in the corresponding entity.
Producer is subscribed to a message source and takes care of the directory it is responsible for. It manages it by adding and removing files and subfolders.
The content of a folder is defined by a combination of properties `source`, `filter`, `producer` and `treat_as`.
This will create a tree of empty folders
```yaml
root:
everything:
photos:
texts:
round-and-voice:
rounds:
voices:
```
The config will result into
```
/everything
/photos
/texts
/round-and-voice
/round-and-voice/rounds
/round-and-voice/voices
```
To fill the directories with files we need to specify a source for every folder that is supposed to contain files
```yaml
root:
everything:
source: source1
photos:
source: source1
texts:
source: source1
round-and-voice:
rounds:
source: source1
voices:
source: source1
```
In result every directory that has `source` property will contain all the files from the specified source.
Let's add filters
```yaml
root:
everything:
# don't need filter here
source: source1
photos:
source: source1
filter: MessageWithCompressedPhoto
texts:
source: source1
filter: MessageWithText
treats_as: MessageWithText
round-and-voice:
rounds:
source: source1
filter: MessageWithKruzhochek
voices:
source: source1
filter: MessageWithVoice
```
As soon as the only source used in the structure is "source1" we can get rid of repeating it by using `recursive` property of `source`.
```yaml
root:
source: {source: source1, recursive: True}
everything:
filter: All
photos:
filter: MessageWithCompressedPhoto
texts:
filter: MessageWithText
treats_as: MessageWithText
round-and-voice:
rounds:
filter: MessageWithKruzhochek
voices:
filter: MessageWithVoice
```
Note that
1. The root itself will not contain any files because source with `recursive` flag doesn't trigger file producing
2. We had to specify `filter` in "everything" to trigger file producer. For the same effect we could have specified a producer instead.
```yaml
everything:
# triggers producing from the recursive source
producer: PlainDir
```
The complete rules:
A folder will be produced with content from a message source in cases when:
1. source is specified and it's not recursive
2. recursive source is in the context and `filter` property specified and it's not recursive
3. recursive source is in the context and `producer` prop is specified
#### filter
By message type:
```python
MessageWithDocument # Message with a document attached (message with compressed
# image doesn't match)
MessageWithCompressedPhoto # with a compressed image (photo)
MessageDownloadable # `MessageWithDocument` or `MessageWithCompressedPhoto`
MessageWithAnimated # stickers, gifs
MessageWithAudio # voices and music
MessageWithVoice # voice
MessageWithKruzhochek # round video
MessageWithDocumentImage # uncompressed image
MessageWithFilename # document with a filename attribute
MessageWithMusic # music
MessageWithVideo # round video, video documents, stickers, gifs
MessageWithVideoFile # video documents
MessageWithSticker # sticker
MessageWithOtherDocument # Any document that doesn't fall in the previous categories
MessageWithZip # zip file
MessageWithText # message with text message
MessageWithoutDocument # message with no document and no photo
MessageWithReactions # message with reactions
MessageForwarded # forwarded message
# Telegram filters
InputMessagesFilterPhotos # MessageWithCompressedPhoto
InputMessagesFilterVideo # MessageWithVideo
InputMessagesFilterPhotoVideo # MessageWithCompressedPhoto | MessageWithVideo
InputMessagesFilterDocument # MessageWithOtherDocument | MessageWithDocumentImage
InputMessagesFilterGif # MessageWithAnimated
InputMessagesFilterVoice # MessageWithVoice
InputMessagesFilterMusic # MessageWithMusic
InputMessagesFilterRoundVideo # MessageWithKruzhochek
InputMessagesFilterRoundVoice # MessageWithKruzhochek | MessageWithVoice
```
Other filters
```yaml
# Filter wrapper to reverse a filter.
Not: MessageWithReactions
# Combines multiple filters. If any matches
Union:
- MessageWithDocumentImage
- MessageWithCompressedPhoto
# Combines multiple filters. If every matches
And:
- MessageForwarded
- MessageWithVideo
# same as
filter: [MessageForwarded, MessageWithVideo]
# takes first `count` messages
First:
count: 10
# takes last `count` messages
Last:
count: 10
# Filter by a filename extension
ByExtension: .zip
# will only leave unique docs
OnlyUniqueDocs:
# optional. Control which document, first appeared or last appeared, will stay.
# default: first
picker: last
picker: first
# passthrough filter. Used to trigger tgmount to produce content in the folder
# or to reset recursive filter
All
# sequentially filters messages. E.g. last 10 unique documents
Seq:
- MessageWithDocument
- OnlyUniqueDocs
- Last: 10
# matches reactions
ByReaction:
reaction: 👍
# optional. default: 1
minimum: 5
```
#### producer
```python
PlainDir
BySender
ByForward
ByPerformer
ByReactions
SysInfo
UnpackedZip
```
## Playing flac and mp3 from a zip archive
1. Seeking in files which are stored in a zip archive only works by reading the
offset bytes.
2. id3v1 tags are stored in the end of a media file :)
https://github.com/quodlibet/mutagen/blob/master/mutagen/id3/_id3v1.py#L34
And most of the players try to read it. So just adding a mp3 or flac
to a player will fetch the whole file from the telegram cloud.
In current moment this is solved by custom read function for mp3 and flac files
in archives. The `read` call returns 4096 zero bytes when
1. less than `max_total_read = 128KB` bytes has been read from the file so far
2. `file_size - offset < distance_to_file_end = 16KB`
3. `size == 4096` (usually players read this amount looking for id3v1 (requires
further investigation to find a less hacky way))
See `FileContentZipFixingId3v1` class
To disable this behavior use `--no-fix-id3v1` argument with `mount` command.
In case of mounting a config set `fix_id3v1` property of `UnpackedZip` to False:
```yaml
producer: {UnpackedZip: {fix_id3v1: False}}
```
## Known bugs
- No updates received during reconnection
- Combination of `--filter`, `--offset-date` and `--reverse` always returns empty result