https://github.com/glvnst/txtscan

# txtscan

A small command-line program that reports on general statistics and interesting characters inside text file(s).

## Examples

    % txtscan 1.eml /etc/moduli
    1.eml
    Bytes: 97,910
    Lines: 2,343
    Interesting Characters:
    '\t' (chr 9): 1 time
    Line Endings:
    \r\n (Windows): 2,343 times
    /etc/moduli
    Bytes: 242,153
    Lines: 262
    No Interesting Characters.
    Line Endings:
    \n (UNIX / OS X): 262 times
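
The summary fields in the output above (bytes, lines, line endings) can be approximated in a few lines of Python. This is a minimal sketch, not txtscan's actual code: `summarize` and `LINE_ENDING` are hypothetical names, and counting an unterminated final line as a line is an assumption the README does not confirm.

```python
import re
from collections import Counter

# Match CRLF first so "\r\n" is counted once, not as "\r" plus "\n".
LINE_ENDING = re.compile(rb"\r\n|\n|\r")

def summarize(data: bytes):
    """Return (byte_count, line_count, ending_counts) for raw file contents."""
    endings = Counter(LINE_ENDING.findall(data))
    line_count = sum(endings.values())
    # Assumption: a trailing line without a terminator still counts as a line.
    if data and not data.endswith((b"\n", b"\r")):
        line_count += 1
    return len(data), line_count, endings

size, lines, endings = summarize(b"one\r\ntwo\r\nthree\n")
print(f"Bytes: {size:,}")   # thousands separator, as in the output above
print(f"Lines: {lines:,}")
for ending, count in sorted(endings.items()):
    print(f"{ending!r}: {count:,}")
```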

Standard input scanning works too:

    % date | txtscan -

    Bytes: 29
    Lines: 1
    No Interesting Characters.
    Line Endings:
    \n (UNIX / OS X): 1 time

Verbose mode reports all the positions of the "interesting characters":

    % echo -e '\nwhat?\tin blazes\nis happening\r\nin\bthis\ttext' | txtscan -v -

    L1 C5: '\t' (chr 9)
    L3 C2: '\x08' (chr 8)
    L3 C7: '\t' (chr 9)
    Bytes: 44
    Lines: 4
    Interesting Characters:
    '\x08' (chr 8): 1 time
    '\t' (chr 9): 2 times
    Line Endings:
    \n (UNIX / OS X): 3 times
    \r\n (Windows): 1 time
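
In the example above, the tab in `what?\tin blazes` is reported at `L1 C5`, which suggests that both line and column numbers are zero-based. The sketch below shows one way such positions could be computed. It is not txtscan's actual code: `find_positions` and `flag_control` are hypothetical names, and `flag_control` is only a stand-in rule for this demo (the real rule is described under "What's "Interesting"?").

```python
import re

# Split on CRLF, LF, or a lone CR; the terminators themselves are dropped.
LINE_BREAK = re.compile(r"\r\n|\n|\r")

def find_positions(text, is_interesting):
    """Yield (line, column, char) for each flagged character, zero-based."""
    for line_no, line in enumerate(LINE_BREAK.split(text)):
        for col_no, ch in enumerate(line):
            if is_interesting(ch):
                yield line_no, col_no, ch

def flag_control(ch):
    # Stand-in rule for this demo only: flag ASCII control characters.
    return ord(ch) < 32

# Same text that the echo command above feeds to txtscan.
sample = "\nwhat?\tin blazes\nis happening\r\nin\bthis\ttext\n"
for line_no, col_no, ch in find_positions(sample, flag_control):
    print(f"L{line_no} C{col_no}: {ch!r} (chr {ord(ch)})")
```

Passing the rule in as a function keeps the position logic separate from the definition of "interesting", so a different character set can be swapped in without touching the loop.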

## What's "Interesting"?

txtscan only reports on **things that *aren't* in this list**:

- A-Z, a-z, 0-9
- space, newline (\n), carriage return (\r)
- comma, period, colon, semicolon, exclamation mark, question mark (',' and '.' and ':' and ';' and '!' and '?')
- less than, greater than ('<' and '>')
- forward slash, backslash ('/' and '\')
- these symbols: '@', '#', '$', '%', '^', '&', '*'
- the dash, underscore, equals, and plus symbols ('-', '_', '=', '+')
- open and close brackets and braces and parens: '[' and ']' and '{' and '}' and '(' and ')'
- single quotes and double quotes (' and ")
- pipe/vertical bar ('|')

**Note that both tab and the euro symbol are considered "interesting".**
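
The list above can be expressed as a set lookup. The following is a sketch that assumes the list is exhaustive; `UNINTERESTING` and `count_interesting` are hypothetical names, not identifiers taken from txtscan. Read literally, the list also leaves out the backtick and the tilde, so this sketch flags them; the README does not say whether txtscan itself does.

```python
import string
from collections import Counter

# Transcribed from the README list, one group per bullet.
UNINTERESTING = frozenset(
    string.ascii_letters + string.digits  # A-Z, a-z, 0-9
    + " \n\r"                             # space, newline, carriage return
    + ",.:;!?"
    + "<>"
    + "/\\"
    + "@#$%^&*"
    + "-_=+"
    + "[]{}()"
    + "'\""
    + "|"
)

def count_interesting(text):
    """Count every character that is not in the uninteresting set."""
    return Counter(ch for ch in text if ch not in UNINTERESTING)

counts = count_interesting("price:\t5€ ~ok\n")
for ch, n in sorted(counts.items()):
    print(f"{ch!r} (chr {ord(ch)}): {n}")
```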

## Help Message

    usage: txtscan [-h] [-v] input [input ...]

    Print information about the characters in the input text

    positional arguments:
      input          The file(s) to scan. Use '-' for standard input.

    optional arguments:
      -h, --help     show this help message and exit
      -v, --verbose  Verbosely report on each interesting character on each line
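
The layout of this help message matches what Python's `argparse` module generates. The following is a sketch of a parser that would accept the same arguments, assuming txtscan uses `argparse`; `build_parser` is a hypothetical name, and the README does not show the actual source.

```python
import argparse

def build_parser():
    """Build a parser that accepts the arguments listed in the help message."""
    parser = argparse.ArgumentParser(
        prog="txtscan",
        description="Print information about the characters in the input text",
    )
    # nargs="+" requires at least one input and renders as "input [input ...]".
    parser.add_argument(
        "input",
        nargs="+",
        help="The file(s) to scan. Use '-' for standard input.",
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Verbosely report on each interesting character on each line",
    )
    return parser

# A lone "-" is parsed as a positional value, not as an option.
args = build_parser().parse_args(["-v", "1.eml", "-"])
print(args.verbose, args.input)
```

Recent Python versions title the second group `options:` rather than `optional arguments:`, so the generated text can differ slightly from the message shown above.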

## License

The license is BSD-like. See LICENSE.txt for details.

## To Do

- Although this is primarily aimed at scanning ASCII files, it would be nice (and probably not too hard) to add Unicode support, **particularly support for finding Unicode errors**
- Add options to exclude certain characters from reporting or to scan only for certain characters