https://github.com/npryce/code-words
Extract individual (natural-language) words from source code
- Host: GitHub
- URL: https://github.com/npryce/code-words
- Owner: npryce
- License: gpl-2.0
- Created: 2013-01-17T22:34:22.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2014-10-19T21:06:46.000Z (about 10 years ago)
- Last Synced: 2024-10-31T22:42:24.312Z (11 days ago)
- Language: Shell
- Size: 5.89 MB
- Stars: 63
- Watchers: 5
- Forks: 17
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: COPYING
Code Words
==========

Get a handle on unfamiliar code by extracting and visualising the natural language programmers used when writing it.

![Board Game Example](https://raw.github.com/npryce/code-words/master/examples/multiplayer-board-game.png)
An example generated from a multiplayer boardgame written in Java.

Usage
-----

    <language>-code <source-dir>* | code-to-words -k <stop-keyword-file> ... -s <stop-word-file> ... | wordcloud -o <output-file>.png

E.g.

    java-code project/src/ | code-to-words -k java-keywords -s cargo-cult-java-stop-words | wordcloud -o project.png

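
The filtering order described in this section can be sketched with standard tools. The function below is a hypothetical stand-in, not the repository's `code-to-words` script: the name `sketch_code_to_words`, the identifier regular expression, the camel-case rule, and the sample keyword and stop-word files are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the filtering order; NOT the real code-to-words script.
# Usage: ... | sketch_code_to_words KEYWORD_FILE STOP_WORD_FILE
sketch_code_to_words() {
    local keyword_file=$1 stop_word_file=$2
    # 1. Pull identifier-like tokens out of the input, one per line
    #    (assumed identifier syntax: letters, digits, underscores).
    LC_ALL=C grep -oE '[A-Za-z_][A-Za-z0-9_]*' |
    # 2. Drop whole identifiers that appear in the keyword file.
    grep -vxFf "$keyword_file" |
    # 3. Split at camel-case boundaries and underscores.
    LC_ALL=C sed -E 's/([a-z0-9])([A-Z])/\1 \2/g; s/_+/ /g' |
    tr -s ' ' '\n' |
    # 4. Normalise to lowercase.
    tr '[:upper:]' '[:lower:]' |
    # 5. Drop stop words, then any empty lines left over from splitting.
    grep -vxFf "$stop_word_file" |
    grep .
}

# Sample files: one word per line, as the real tools require.
workdir=$(mktemp -d)
printf '%s\n' public class void return int > "$workdir/keywords"
printf '%s\n' get set > "$workdir/stop-words"

printf '%s\n' 'public class BoardGame { int getPlayerCount() { return player_count; } }' |
    sketch_code_to_words "$workdir/keywords" "$workdir/stop-words"
# Prints, one word per line: board game player count player count

rm -rf "$workdir"
```

Note that `get` is removed only by the stop-word file, because it becomes a separate word only after `getPlayerCount` has been split. Piping the output through `sort | uniq -c | sort -rn` gives a quick word-frequency listing.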

Keyword files (passed with -k) and stop-word files (passed with -s) must have a single word per
line.

The words in keyword files are filtered out after identifiers
have been extracted from the source code but before any further processing.

The words in stop-word files are filtered out after the identifiers
have been split into separate words at underscores or camel-case
boundaries and normalised to lowercase.

The wordcloud command has the following options:

* -o _output-file_: output file name (image type is determined from the extension)
* -s _width_x_height_: size (width and height) of the output image

Languages supported
-------------------

* C: `c-code`
  * `c-keywords`: most C keywords
  * `c-primitive-type-keywords`: ignores basic C types (int, char, etc.)
* C++: `c++-code`
  * `c++-keywords`: most C++ keywords
  * `c-primitive-type-keywords`: ignores basic C types (int, char, etc.)
* C#: `csharp-code`
  * `csharp-keywords`: most C# keywords
  * `c-primitive-type-keywords`: ignores basic C types (int, char, etc.)
* Haskell: `haskell-code`
  * `haskell-keywords`
* HTML: `html-text`
  * no stop words file provided. Stop words files for various natural languages can be found on the web.
* Java: `java-code`
  * `java-keywords`: most keywords
  * `java-primitive-type-keywords`: ignores primitive types
  * `cargo-cult-java-stop-words`: ignores get, set, bean etc. Use with the -s flag.
* JavaScript: `javascript-code`
  * `javascript-keywords`: ignores keywords and reserved words (from ECMA-262 Edition 3)
  * `java-primitive-type-keywords`: ignores primitive types
  * `nodejs-globals-keywords`: ignores node.js globals
* Python: `python-code`
  * `python-keywords`: most keywords
* Ruby: `ruby-code`
  * `ruby-keywords`
* Scala: `scala-code`
  * `scala-keywords`
* PHP: `php-code`
  * `php-keywords`: shows some keywords that may be the result of poor programming practice.
  * `php-strict-keywords`: ignores all keywords
* Smalltalk: `smalltalk-code`
  * `smalltalk-keywords`: ignores keywords

Examples
--------

Example visualisations of various applications are in the examples/ directory.

Dependencies
------------

To extract text from source code:

* Bash
* GNU Sed
* Grep
* Awk

To extract text from HTML:

* w3m

To visualise the results:

* Java 1.6

It should work on any desktop Linux. It does not yet work on MacOS unless you install the GNU command-line tools. If you install GNU sed as `gsed` the script will use it.

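
To check in advance whether a usable sed is installed, something like the following may help. It is a hypothetical helper, not taken from the repository's scripts: the name `pick_gnu_sed` and the banner check are assumptions. It simply prefers `gsed` when present and otherwise tests whether plain `sed` identifies itself as GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of a GNU sed binary, preferring gsed.
# How the project's own scripts detect gsed is not shown here.
pick_gnu_sed() {
    local candidate banner
    for candidate in gsed sed; do
        command -v "$candidate" >/dev/null 2>&1 || continue
        # GNU sed answers --version with a banner naming itself;
        # BSD sed rejects the option, so the banner stays empty.
        banner=$("$candidate" --version 2>/dev/null | head -n 1)
        case $banner in
            *"not GNU"*) ;;                        # e.g. BusyBox sed
            *GNU*) echo "$candidate"; return 0 ;;
        esac
    done
    echo "GNU sed not found; install it (for example as gsed)" >&2
    return 1
}

if SED=$(pick_gnu_sed); then
    echo "Using GNU sed: $SED"
fi
```

On a typical desktop Linux this reports `sed`; on a Mac with the GNU tools installed under a `g` prefix it would report `gsed`.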

To compile the Java wordcloud generator:

* JDK 1.6
* GNU Make