https://github.com/marekyggdrasil/polishify
Automatically convert documents in Polish from older encodings into UTF-8
conversion encoding encoding-convertors polish polish-language
- Host: GitHub
- URL: https://github.com/marekyggdrasil/polishify
- Owner: marekyggdrasil
- License: gpl-3.0
- Created: 2022-08-07T13:46:53.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-08-07T15:42:29.000Z (almost 3 years ago)
- Last Synced: 2025-01-14T17:38:13.600Z (4 months ago)
- Topics: conversion, encoding, encoding-convertors, polish, polish-language
- Language: Python
- Homepage:
- Size: 21.5 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# polishify
## Setup
Install with pip:
```sh
pip install polishify
```
## Usage
If you have some text in Polish whose characters look garbled, it is probably stored in a legacy encoding such as `windows-1250` or `iso-8859-2` rather than `utf-8`. To check a file named `sometext.txt`, run
```sh
polishify sometext.txt
```
and it will show you something like
```
detected encoding is: windows-1250
```
To convert the file to `utf-8`, pass an output path as the second argument:
```sh
polishify sometext.txt properly-encoded.txt
```
When calling it from a bash script you may not want any output, so the script supports a silent mode:
```sh
polishify sometext.txt properly-encoded.txt --silent
```
The package ships with a dataset of words containing Polish letters. To use your own dataset instead, pass a `dataset.json` file with `--dataset`:
```sh
polishify sometext.txt properly-encoded.txt --silent --dataset dataset.json
```
We also provide a tool that generates such a dataset file from a text:
```sh
polishify-extract sometext.txt dataset.json --encoding windows-1250
```
## Author
Made by [Marek Narożniak](https://mareknarozniak.com/), for the world and especially for people whose family members need subtitles in Polish and who want to bulk-convert their encodings. No warranty provided. Licensed under GPL-3.
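## Bulk conversion sketch
For the bulk-conversion use case mentioned above, the following is a minimal sketch rather than part of the package. It only uses the command-line form shown in the Usage section (`polishify INPUT OUTPUT --silent`). The helper names `build_commands` and `convert_all`, the `*.srt` pattern, and the directory layout are hypothetical choices made for illustration, and the sketch assumes that `polishify` is on your `PATH` and reports failures with a non-zero exit code.

```python
import subprocess
from pathlib import Path


def build_commands(src_dir, dst_dir, pattern="*.srt"):
    """Return one polishify argument list per matching file in src_dir.

    The pattern default ("*.srt") is an assumption for subtitle files.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    return [
        # Same shape as: polishify sometext.txt properly-encoded.txt --silent
        ["polishify", str(path), str(dst_dir / path.name), "--silent"]
        for path in sorted(src_dir.glob(pattern))
    ]


def convert_all(src_dir, dst_dir, pattern="*.srt"):
    """Run polishify once per matching file, writing results into dst_dir."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(src_dir, dst_dir, pattern):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, check=True)
```

Calling `convert_all("subtitles", "subtitles-utf8")` would then write a converted copy of every matching file into the second directory and leave the originals untouched. Building the argument lists in a separate function makes it possible to inspect them before anything is executed.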