Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/glvnst/txtscan
command-line program that reports on general statistics and interesting characters inside text files
- Host: GitHub
- URL: https://github.com/glvnst/txtscan
- Owner: glvnst
- License: other
- Created: 2014-01-27T22:40:50.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-01-27T22:42:02.000Z (almost 11 years ago)
- Last Synced: 2024-08-08T00:43:13.642Z (6 months ago)
- Language: Python
- Size: 105 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- project-awesome - glvnst/txtscan - command-line program that reports on general statistics and interesting characters inside text files (Python)
README
# txtscan
A small command-line program that reports on general statistics and interesting characters inside text file(s).
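The general statistics in the examples below (byte count, line count, and line-ending tallies) are simple to compute. Here is a minimal sketch, not txtscan's actual source. The function name `general_stats` is hypothetical. The sketch also assumes that a line is anything terminated by `\n`, so an unterminated final line is not counted and a lone `\r` is not treated as a line ending.

```python
# Hypothetical sketch, not txtscan's actual source: reproduce the
# "Bytes", "Lines", and "Line Endings" figures from raw file contents.
# Assumption: a line is anything terminated by "\n", so an unterminated
# final line is not counted and a lone "\r" is not a line ending.


def general_stats(data: bytes) -> dict:
    """Return the byte count, line count, and line-ending tallies for data."""
    crlf = data.count(b"\r\n")      # Windows endings
    lf = data.count(b"\n") - crlf   # bare "\n" (UNIX / OS X) endings
    endings = {}
    if lf:
        endings["UNIX / OS X"] = lf
    if crlf:
        endings["Windows"] = crlf
    return {"bytes": len(data), "lines": lf + crlf, "endings": endings}


if __name__ == "__main__":
    # The bytes emitted by the README's `echo -e ... | txtscan -v -` example.
    sample = b"\nwhat?\tin blazes\nis happening\r\nin\bthis\ttext\n"
    print(general_stats(sample))
```

For that input the function returns 44 bytes, 4 lines, 3 `\n` endings and 1 `\r\n` ending. These are the same totals txtscan reports for it.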
## Examples
```
% txtscan 1.eml /etc/moduli
1.eml
Bytes: 97,910
Lines: 2,343
Interesting Characters:
'\t' (chr 9): 1 time
Line Endings:
\r\n (Windows): 2,343 times
/etc/moduli
Bytes: 242,153
Lines: 262
No Interesting Characters.
Line Endings:
\n (UNIX / OS X): 262 times
```

Standard input scanning works too:
```
% date | txtscan -
Bytes: 29
Lines: 1
No Interesting Characters.
Line Endings:
\n (UNIX / OS X): 1 time
```

Verbose mode reports all the positions of the "interesting characters":
```
% echo -e '\nwhat?\tin blazes\nis happening\r\nin\bthis\ttext' | txtscan -v -
L1 C5: '\t' (chr 9)
L3 C2: '\x08' (chr 8)
L3 C7: '\t' (chr 9)
Bytes: 44
Lines: 4
Interesting Characters:
'\x08' (chr 8): 1 time
'\t' (chr 9): 2 times
Line Endings:
\n (UNIX / OS X): 3 times
\r\n (Windows): 1 time
```

## What's "Interesting"?
txtscan only reports on **things that *aren't* in this list**:
- A-Z, a-z, 0-9
- space, newline (\n), carriage return (\r)
- comma, period, colon, semicolon, exclamation mark, question mark (',' and '.' and ':' and ';' and '!' and '?')
- greater than, less than ('<' and '>')
- forward slash, backslash ('/' and '\')
- these symbols: '@', '#', '$', '%', '^', '&', '*'
- the dash, underscore, equals, and plus symbols ('-', '_', '=', '+')
- open and close brackets and braces and parens: '[' and ']' and '{' and '}' and '(' and ')'
- single quotes and double quotes (' and ")
- pipe/vertical bar ('|')

**Note that both tab and the euro symbol are considered "interesting".**
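The rule above amounts to a membership test against a fixed set of characters. Here is a minimal sketch, not txtscan's actual source, that also reports positions in the style of verbose mode. The names `BORING`, `is_interesting`, and `scan_interesting` are hypothetical. Judging from the verbose example, line and column numbers are zero-based: the tab in `what?\tin blazes`, on the second line, is reported as `L1 C5`. By this list, the backtick and tilde are interesting as well.

```python
# Hypothetical sketch, not txtscan's actual source: classify characters using
# the "not interesting" list and report zero-based positions the way the
# README's verbose example does.
import string
from collections import Counter

# Characters txtscan does NOT report, transcribed from the README's list.
BORING = set(
    string.ascii_letters
    + string.digits
    + " \n\r"        # space, newline, carriage return
    + ",.:;!?"
    + "<>"
    + "/\\"          # forward slash, backslash
    + "@#$%^&*"
    + "-_=+"
    + "[]{}()"
    + "'\""          # single and double quotes
    + "|"
)


def is_interesting(ch: str) -> bool:
    """True for anything outside the list, e.g. tab, backtick, tilde, euro sign."""
    return ch not in BORING


def scan_interesting(text: str):
    """Return (positions, counts); positions are zero-based (line, column, char)."""
    positions = []
    counts = Counter()
    for line_no, line in enumerate(text.split("\n")):
        for col, ch in enumerate(line):
            if is_interesting(ch):
                positions.append((line_no, col, ch))
                counts[ch] += 1
    return positions, counts


if __name__ == "__main__":
    sample = "\nwhat?\tin blazes\nis happening\r\nin\bthis\ttext\n"
    positions, counts = scan_interesting(sample)
    for line_no, col, ch in positions:
        print(f"L{line_no} C{col}: {ch!r} (chr {ord(ch)})")
    for ch, n in sorted(counts.items()):
        print(f"{ch!r} (chr {ord(ch)}): {n} time{'s' if n != 1 else ''}")
```

Running it prints the same three position lines and the same per-character totals as the verbose example above. Carriage returns are part of the list, so the `\r` in a Windows line ending is never reported as interesting.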
## Help Message
```
usage: txtscan [-h] [-v] input [input ...]

Print information about the characters in the input text

positional arguments:
  input          The file(s) to scan. Use '-' for standard input.

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Verbosely report on each interesting character on each line
```

## License
The license is BSD-like. See LICENSE.txt for details.
## To Do
- Although this is primarily aimed at scanning ASCII files, it would be nice (and probably not too hard) to add Unicode support, **particularly support for finding Unicode errors**
- Add options to exclude certain characters from reporting or to scan only for certain characters