https://github.com/hughp/keyboard-character-replacement
Small code for making .keylayout files readable and convertible into corpus text filters
- Host: GitHub
- URL: https://github.com/hughp/keyboard-character-replacement
- Owner: HughP
- Created: 2015-07-23T16:09:53.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-07-25T03:49:07.000Z (almost 10 years ago)
- Last Synced: 2025-02-15T06:27:02.179Z (3 months ago)
- Language: Shell
- Size: 141 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Keyboard-character-replacement
Small code for making .keylayout files readable and convertible into corpus text filters

## Scope and reason
This code is an independent section of code used in a larger task of keyboard analysis. The goal is to query .keylayout files (the default keyboard layout file format on OS X) for two purposes:
1. to find out which characters a keyboard is capable of creating
2. to find out which characters in a corpus cannot be created by a given keyboard, and how frequently language users must therefore find a different text input method.

## Lead-in
Assuming that we are working with .keylayout files (which are XML files), we need to copy the file and, in the copied file, replace certain characters with new characters. The reason is that the XML read functions of both CSVfix and Starlet choke on the encoded control characters. (XML 1.0 does not permit most control characters, even as character references; they only became legal, as character references, in XML 1.1.) Overall, Starlet does better and only chokes on U+0008, the backspace character, but the problem remains.
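To see which encoded control characters a given layout actually contains before replacing anything, a rough inventory can help. The following is only a sketch: the function name `list_control_refs` is made up for illustration, and it assumes the file writes control characters as four-digit hexadecimal character references such as `&#x0008;`. It also relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Sketch: list the distinct C0 control-character references (U+0000..U+001F)
# found in a .keylayout file.
# Assumption: references are written as four-digit hex, e.g. &#x0008;
list_control_refs() {
  # $1: path to the .keylayout file
  # -o prints only the matching part of each line; sort -u removes duplicates
  grep -o '&#x00[01][0-9A-Fa-f];' "$1" | sort -u
}
```

Called as `list_control_refs SomeLayout.keylayout` (a placeholder file name), this prints one reference per line, which gives the list of characters that need a replacement rule.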
The solution I have in mind is to change every control character to the Unicode glyph that visually represents it (the characters of the Control Pictures block). However, to do this I need to be able to read the input text as plain strings. For example, I need to change the encoded form of U+0001 (START OF HEADING), which appears in the file as something like `&#x0001;`, to '`␁`'.
This was accomplished with a `sed` script.
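As an illustration of the idea, and not the script shipped in this repository, a minimal version might look like the sketch below. The function name `filter_keylayout` is hypothetical, only four control characters are covered, and the four-digit hexadecimal reference form (`&#x0008;`) is an assumption about how the layout file encodes them. The original file is left untouched; the replacements are written to a copy.

```shell
#!/bin/sh
# Sketch: write a filtered copy of a .keylayout file in which selected
# control-character references are replaced by their Control Pictures glyphs.
# Assumption: references are written as four-digit hex, e.g. &#x0008;
filter_keylayout() {
  src=$1   # original .keylayout file (never modified)
  dst=$2   # filtered copy to create
  sed -e 's/&#x0001;/␁/g' \
      -e 's/&#x0008;/␈/g' \
      -e 's/&#x001[Bb];/␛/g' \
      -e 's/&#x007[Ff];/␡/g' \
      "$src" > "$dst"
}
```

A call such as `filter_keylayout SomeLayout.keylayout SomeLayout.filtered.keylayout` (both names are placeholders) produces the copy. Further control characters can be handled by adding one `-e` expression per character, pairing each reference with the glyph at U+2400 plus its code point.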