https://github.com/wonyoung-jang/logseq-tokenizer

Logseq Markdown Tokenizer is a Python application that tokenizes one or more Markdown files and estimates the cost of processing them with OpenAI models.

Host: GitHub
URL: https://github.com/wonyoung-jang/logseq-tokenizer
Owner: wonyoung-jang
License: mit
Created: 2024-03-04T22:09:03.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-03-07T00:25:32.000Z (over 1 year ago)
Last Synced: 2025-01-16T19:48:34.950Z (6 months ago)
Topics: logseq, markdown, openai-api, tiktoken
Language: Python
Homepage: https://wonyoungjang.org/logseq-markdown-tokenizer/
Size: 176 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# Logseq Markdown Tokenizer

This project provides a graphical interface that tokenizes the Markdown files in a selected directory, counts their characters, and estimates the cost of processing them with specific OpenAI models based on the token count. It is designed for Logseq's `pages` and `journals` folders but can tokenize any folder of Markdown files.
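Logseq keeps a graph's pages and daily journals as Markdown files in `pages/` and `journals/` subfolders. The sketch below shows one way to gather those files. It assumes the selected folder is a graph root laid out that way and falls back to every `.md` file under the folder otherwise. `collect_markdown_files` is a hypothetical helper for illustration, not the project's own code.

```python
from __future__ import annotations

from pathlib import Path

# Subfolders Logseq uses for a graph's content (assumed layout of the selected folder).
LOGSEQ_SUBFOLDERS = ("pages", "journals")


def collect_markdown_files(root: str | Path) -> list[Path]:
    """Return Markdown files from a Logseq graph, or from any folder.

    If `root` contains Logseq's pages/ or journals/ subfolders, only those are
    scanned; otherwise every *.md file below `root` is returned.
    """
    root = Path(root)
    logseq_dirs = [root / name for name in LOGSEQ_SUBFOLDERS if (root / name).is_dir()]
    search_dirs = logseq_dirs or [root]
    files: list[Path] = []
    for directory in search_dirs:
        files.extend(directory.rglob("*.md"))
    return sorted(files)
```

Restricting the scan to `pages/` and `journals/` skips Logseq's internal `logseq/` folder when a whole graph is selected, while still handling an arbitrary folder of notes.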

![Logseq Tokenizer](assets/logseq_tokenizer.png)

## Table of Contents

- [Languages Used](#languages-used)
- [Technologies Used](#technologies-used)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Features](#features)
- [Roadmap](#roadmap)
- [License](#license)

## Languages Used

- Python

## Technologies Used

- PySide6 for GUI
- tiktoken for tokenization
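The sketch below shows how the character and token counts could be produced with tiktoken's `get_encoding` and `Encoding.encode`. The `cl100k_base` encoding is an assumption: it is the encoding used by OpenAI's `text-embedding-ada-002` and `gpt-4` models, but the project may choose its encoding differently. The helper names are hypothetical. The counter is passed in as a plain function, so the counting logic can be exercised without tiktoken installed.

```python
from __future__ import annotations

from typing import Callable

TokenCounter = Callable[[str], int]


def make_tiktoken_counter(encoding_name: str = "cl100k_base") -> TokenCounter:
    """Build a token counter backed by tiktoken (imported lazily)."""
    import tiktoken  # requires `pip install tiktoken`; may download the BPE file on first use

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() encodes strings such as "<|endoftext|>" found in notes
    # as ordinary text instead of raising a ValueError.
    return lambda text: len(encoding.encode(text, disallowed_special=()))


def count_text(text: str, count_tokens: TokenCounter) -> dict[str, int]:
    """Return the character and token counts for one piece of Markdown."""
    return {"characters": len(text), "tokens": count_tokens(text)}
```

With tiktoken installed, `count_text(markdown, make_tiktoken_counter())` gives both counts for a file's contents. `tiktoken.encoding_for_model(...)` is an alternative when you would rather pick the encoding by model name.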

## Prerequisites

- Python
- PySide6
- tiktoken library

## Installation

1. Clone the repository:

```bash
git clone https://github.com/wonyoung-jang/logseq-tokenizer.git
```

2. Navigate to the cloned directory:

```bash
cd logseq-tokenizer
```

3. Install the required Python packages:

```bash
pip install -r requirements.txt
```

## Usage

1. Run the application:

```bash
python main.py
```

2. Click 'Select Folder to Tokenize' and choose the directory containing the Markdown files.
3. Enter the desired name for the output CSV file.
4. Click 'Start' to begin the tokenization process.
5. Check the generated CSV file for results.
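Steps 2 to 5 boil down to three tasks: walk the chosen folder, measure each Markdown file, and write one CSV row per file. The following is a minimal, GUI-free sketch of that flow under stated assumptions. The function name `tokenize_folder`, the column names, and UTF-8 file encoding are illustrative choices, not necessarily what `main.py` does. The token counter is any `str -> int` callable, for example one built on tiktoken.

```python
from __future__ import annotations

import csv
from pathlib import Path
from typing import Callable


def tokenize_folder(folder: str | Path, output_csv: str | Path,
                    count_tokens: Callable[[str], int]) -> int:
    """Write per-file character/token counts to a CSV; return the total token count.

    Hypothetical helper: the column names and layout are assumptions for illustration.
    """
    folder = Path(folder)
    total_tokens = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "characters", "tokens"])
        writer.writeheader()
        # Sorted so the CSV rows come out in a stable order.
        for path in sorted(folder.rglob("*.md")):
            text = path.read_text(encoding="utf-8")
            tokens = count_tokens(text)
            total_tokens += tokens
            writer.writerow({
                "file": path.relative_to(folder).as_posix(),
                "characters": len(text),
                "tokens": tokens,
            })
    return total_tokens
```

Returning the total alongside the per-file rows makes it straightforward to feed the result into a cost estimate.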

## Features

- GUI for easy interaction
- Tokenization of Markdown files
- Calculation of character counts
- Estimation of cost for using OpenAI models
- Output results to a CSV file
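Cost estimation comes down to multiplying the token count by a per-token price. The following is a minimal sketch under a few assumptions. `estimate_cost` is a hypothetical helper, and the price below is a placeholder, not a current OpenAI rate. The pricing unit is a parameter because prices may be quoted per 1,000 or per 1,000,000 tokens. `Decimal` avoids binary floating-point rounding in currency amounts.

```python
from decimal import Decimal


def estimate_cost(tokens: int, price_per_unit: Decimal, tokens_per_unit: int = 1_000) -> Decimal:
    """Estimated cost = tokens / tokens_per_unit * price_per_unit."""
    if tokens < 0:
        raise ValueError("token count cannot be negative")
    return Decimal(tokens) / Decimal(tokens_per_unit) * price_per_unit


# Placeholder price, for illustration only: not an actual OpenAI rate.
EXAMPLE_PRICE_PER_1K = Decimal("0.0001")

file_tokens = [1_200, 800, 500]  # hypothetical per-file token counts
total = estimate_cost(sum(file_tokens), EXAMPLE_PRICE_PER_1K)
print(total)  # → 0.00025
```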

![Example output](assets/example_output.png)

## Roadmap

- Support for additional file formats
- Integration with more OpenAI models/use cases beyond text embeddings
- Enhanced data visualization in the GUI
- Pre-processing content to remove stopwords before encoding
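The stopword item above is a planned feature, so the following is only one possible approach, not the project's design. Common words are dropped before the text is encoded, so they no longer count toward tokens or cost. The tiny stopword list and the `remove_stopwords` name are hypothetical. A real implementation would likely use a fuller list and take care with Logseq syntax such as `[[links]]`. This sketch keeps line breaks but collapses other whitespace, including bullet indentation.

```python
from __future__ import annotations

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "is", "in"})


def remove_stopwords(text: str, stopwords: frozenset[str] = STOPWORDS) -> str:
    """Drop whitespace-separated words found in `stopwords` (case-insensitive), line by line."""
    cleaned_lines = []
    for line in text.splitlines():
        words = [word for word in line.split() if word.lower() not in stopwords]
        cleaned_lines.append(" ".join(words))
    return "\n".join(cleaned_lines)
```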

## License

[MIT License](LICENSE)
