https://github.com/marinko-peso/croatian-special-character-replacer
:card_index: Replacer for reverting Croatian special characters to their original form before saving the file in UTF-8
- Host: GitHub
- URL: https://github.com/marinko-peso/croatian-special-character-replacer
- Owner: marinko-peso
- License: mit
- Created: 2016-11-11T07:39:27.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-11-16T11:09:03.000Z (over 8 years ago)
- Last Synced: 2025-02-07T17:37:06.759Z (3 months ago)
- Topics: characters, croatia, python, special
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Croatian Special Character Replacer
Replacer for reverting Croatian special characters to their original form before saving the file in UTF-8. The primary use case is subtitles: subtitle files are often encoded in CP1250, and in that case certain Croatian characters (č, ć, đ, Č, Ć) are not displayed in their original form.
The characters š, ž, Š and Ž do not seem to be affected.
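A plausible cause, assuming the player or editor decodes the CP1250 bytes as Windows-1252 (CP1252): both code pages store š, ž, Š and Ž at the same byte values, but at the byte values CP1250 uses for č, ć, đ, Č and Ć, CP1252 has other letters. The sketch below checks this with Python's built-in codecs. `misread_letters` is a hypothetical helper, not part of the script, and Đ is included for completeness.

```python
# Sketch: which Croatian letters get garbled when text saved as CP1250
# is decoded as CP1252? The CP1252 reader is an assumption, not from the README.
CROATIAN_LETTERS = "čćđČĆĐšžŠŽ"


def misread_letters(letters: str, written: str = "cp1250", read: str = "cp1252") -> dict[str, str]:
    """Map each letter to what is shown when its bytes are decoded with the wrong code page.

    Letters that survive the round trip unchanged are left out.
    """
    garbled = {}
    for ch in letters:
        shown = ch.encode(written).decode(read)  # same byte, different code page
        if shown != ch:
            garbled[ch] = shown
    return garbled


if __name__ == "__main__":
    print(misread_letters(CROATIAN_LETTERS))
    # → {'č': 'è', 'ć': 'æ', 'đ': 'ð', 'Č': 'È', 'Ć': 'Æ', 'Đ': 'Ð'}
```

Under this assumption š, ž, Š and Ž drop out of the result because their byte values mean the same letter in both code pages.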
This script opens the file, replaces the affected characters with their original form, and re-saves the file as UTF-8.
### Usage
`python cro_char_replacer.py name_of_file_or_directory`
If you pass a directory instead of a file, the script detects this and processes every subtitle file it finds inside. It also descends into any subdirectories, as deep into the folder structure as needed.
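The workflow described above (open, replace, re-save as UTF-8, recurse into directories) can be sketched roughly as follows. This is not the actual `cro_char_replacer.py`: the setting names mirror the Customization section, but their values (a CP1252-garbled-to-Croatian table, `.srt`/`.sub` extensions) and the helpers `find_subtitle_files` and `fix_file` are hypothetical.

```python
import sys
from pathlib import Path

# Assumed values for illustration; the real script's settings may differ.
CHARS_TO_REPLACE = {"è": "č", "æ": "ć", "ð": "đ", "È": "Č", "Æ": "Ć"}
ALLOWED_FILE_TYPES = (".srt", ".sub")
SOURCE_ENCODING = "cp1252"  # assumption: decoding that exposes the garbled letters
DESTINATION_ENCODING = "utf-8"


def find_subtitle_files(target: Path) -> list[Path]:
    """Return [target] for a single file, or every allowed file under a directory, at any depth."""
    if target.is_file():
        return [target]
    # rglob("*") walks all subdirectories recursively
    return sorted(
        path
        for path in target.rglob("*")
        if path.is_file() and path.suffix.lower() in ALLOWED_FILE_TYPES
    )


def fix_file(path: Path) -> None:
    """Decode with SOURCE_ENCODING, restore the letters, re-save with DESTINATION_ENCODING."""
    # newline="" keeps the original line endings (subtitles often use CRLF)
    with open(path, encoding=SOURCE_ENCODING, newline="") as src:
        text = src.read()
    for garbled, original in CHARS_TO_REPLACE.items():
        text = text.replace(garbled, original)
    with open(path, "w", encoding=DESTINATION_ENCODING, newline="") as dst:
        dst.write(text)


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print("usage: python cro_char_replacer.py name_of_file_or_directory")
        return 1
    for path in find_subtitle_files(Path(argv[1])):
        fix_file(path)
        print(f"processed {path}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

If the files are known to be CP1250, decoding them with `cp1250` directly already yields the correct letters without any replacement table; a table like this is mainly useful when the garbled characters are already stored in the file itself.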
### Customization
You can adapt the script to any character replacement and any encoding by modifying the settings at the top of the file:
- `CHARS_TO_REPLACE`: characters to replace in the files.
- `ALLOWED_FILE_TYPES`: file types that are allowed to be processed.
- `SOURCE_ENCODING`: estimated encoding of the source file.
- `DESTINATION_ENCODING`: encoding applied to the file when it is saved after processing.
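As a hypothetical example of such a settings block, the replacement table can be derived from a pair of code pages instead of being typed by hand. The code-page pair (saved as CP1250, misread as CP1252), the extra names `CROATIAN_LETTERS`, `WRITTEN_AS` and `GARBLED_AS`, and the file extensions are assumptions, not the script's actual defaults.

```python
# Hypothetical settings block; every value here is an assumption.
CROATIAN_LETTERS = "čćđČĆĐšžŠŽ"
WRITTEN_AS = "cp1250"  # code page the subtitles were saved in
GARBLED_AS = "cp1252"  # code page they are wrongly decoded with

# garbled character -> original letter; letters whose byte value means the
# same thing in both code pages (š, ž, Š, Ž) are skipped
CHARS_TO_REPLACE = {}
for letter in CROATIAN_LETTERS:
    garbled = letter.encode(WRITTEN_AS).decode(GARBLED_AS)
    if garbled != letter:
        CHARS_TO_REPLACE[garbled] = letter

ALLOWED_FILE_TYPES = (".srt", ".sub")
SOURCE_ENCODING = GARBLED_AS  # decode so the garbled characters are visible
DESTINATION_ENCODING = "utf-8"
```

Deriving the table keeps it consistent with the chosen code pages: letters that survive unchanged drop out automatically, so only the characters that actually need replacing end up in `CHARS_TO_REPLACE`.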