https://github.com/simpleitk/commentspellcheck

This package spell checks the comments in source code.

Host: GitHub
URL: https://github.com/simpleitk/commentspellcheck
Owner: SimpleITK
License: apache-2.0
Created: 2020-12-03T20:11:59.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2025-04-09T19:27:40.000Z (about 1 year ago)
Last Synced: 2025-04-12T13:08:26.400Z (about 1 year ago)
Language: Python
Homepage:
Size: 171 KB
Stars: 5
Watchers: 5
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE


# CommentSpellCheck

![python testing](https://github.com/SimpleITK/CommentSpellCheck/actions/workflows/python-app.yml/badge.svg)

The CommentSpellCheck (CSC) package provides a script that automatically
spell checks the comments of a code base. It was originally developed to
be run on the SimpleITK and ITK code bases.

Here is how it is typically run:

```shell
python comment_spell_check.py --exclude Ancillary $SIMPLEITK_SOURCE_DIR/Code
```

This command recursively finds all the '.h' files in a directory,
extracts the C/C++ comments from the code, and runs a spell checker on them.
The **'--exclude'** flag tells the script to ignore any file that has
'Ancillary' in its full path name. The flag accepts any regular
expression.
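The file-selection step can be sketched in plain Python as follows. This is a minimal illustration, not CSC's code: the function name `find_source_files` and its parameters are hypothetical, and we assume, as the description suggests, that each exclude pattern is matched with `re.search` anywhere in the file's full path.

```python
import re
from pathlib import Path


def find_source_files(root, suffixes=(".h",), exclude_patterns=()):
    """Recursively collect files under `root` whose suffix is in `suffixes`,
    skipping any file whose full path matches one of `exclude_patterns`.

    Hypothetical helper for illustration only; not CSC's actual API.
    """
    excludes = [re.compile(p) for p in exclude_patterns]
    matches = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files with an unwanted suffix.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        full_path = str(path.resolve())
        # Like '--exclude', a pattern may match anywhere in the full path.
        if any(rx.search(full_path) for rx in excludes):
            continue
        matches.append(path)
    return matches
```

For example, `find_source_files("Code", exclude_patterns=["Ancillary"])` would select the same kind of files as the command shown above.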

In addition to pyenchant's English dictionary, we use the words in
**additional_dictionary.txt**. These words are proper names and
technical terms harvested by hand from the SimpleITK and ITK code bases.
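The combined lookup can be illustrated without pyenchant by treating each dictionary as a set of words. This sketch assumes **additional_dictionary.txt** holds one word per line; `load_word_list` and `is_known` are hypothetical helpers, and a plain set stands in for pyenchant's English dictionary. (Within pyenchant itself, `enchant.DictWithPWL` is one way to pair a language dictionary with a personal word list; we do not claim CSC does it that way.)

```python
from pathlib import Path


def load_word_list(path):
    """Read one word per line, ignoring blank lines (assumed file format)."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word:
            words.add(word)
    return words


def is_known(word, base_words, extra_words):
    # Case-sensitive membership test against both word sets.
    return word in base_words or word in extra_words
```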

If a word is not found in the dictionaries, we try two additional checks.

1. If the word starts with a known prefix, the prefix is removed
   and the remaining word is checked against the dictionary. The prefixes
   used by default are **'sitk'**, **'itk'**, and **'vtk'**. Additional
   prefixes can be specified with the **'--prefix'** command line argument.

2. We attempt to split the word by capitalization and check each
   sub-word against the dictionary. This method is an attempt to detect
   camel-case words such as 'GetArrayFromImage', which is split into
   'Get', 'Array', 'From', and 'Image'. Camel-case words are very commonly
   used for code elements.
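The two fallback checks can be sketched as below. This follows our own reading of the description, not CSC's implementation: the helper names are hypothetical, prefixes are matched case-sensitively, and the camel-case split uses a regular expression that also keeps acronym runs such as 'HTTP' together.

```python
import re

# Default prefixes listed in the README; '--prefix' would extend this tuple.
DEFAULT_PREFIXES = ("sitk", "itk", "vtk")


def strip_prefix(word, prefixes=DEFAULT_PREFIXES):
    """Return `word` with the first matching prefix removed, or None."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix):
            return word[len(prefix):]
    return None


def split_camel_case(word):
    """Split e.g. 'GetArrayFromImage' into ['Get', 'Array', 'From', 'Image']."""
    # A capitalized or lowercase run, or an all-caps run not followed by lowercase.
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)


def check_word(word, is_known, prefixes=DEFAULT_PREFIXES):
    """Accept a word if it, its de-prefixed form, or all camel-case parts are known."""
    if is_known(word):
        return True
    stripped = strip_prefix(word, prefixes)
    if stripped is not None and is_known(stripped):
        return True
    parts = split_camel_case(word)
    return bool(parts) and all(is_known(part) for part in parts)
```

Because the two checks run independently here, a word such as 'itkImageFilter' is rejected when neither 'ImageFilter' nor 'itk' is in the dictionary, even if 'Image' and 'Filter' are.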

The script can also process other file types. With the **'--suffix'**
option, the following file types are available: Python (.py), C/C++
(.c/.cxx), Text (.txt), reStructuredText (.rst), Markdown (.md), Ruby (.ruby),
and Java (.java). Note that reStructuredText files are treated as standard
text. Consequently, all markup keywords that are not actual words will
need to be added to the additional/exception dictionary.
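As an illustration of per-file-type comment extraction, the sketch below pulls `#` comments out of Python source with the standard-library `tokenize` module. We do not assume CSC extracts comments this way; `python_comments` is a hypothetical helper, and docstrings are not `COMMENT` tokens, so they are not returned.

```python
import io
import tokenize


def python_comments(source):
    """Return (line_number, text) for each '#' comment in Python source.

    The tokenizer understands string literals, so a '#' inside a string
    is not mistaken for a comment.
    """
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments
```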

## Dictionary notes

By default, on Linux and Mac systems, pyenchant uses [GNU aspell](http://aspell.net/)
as the underlying dictionary. The spell checking is case sensitive. While
aspell allows arbitrary characters in a dictionary word, CSC may split a
word at non-alphanumeric characters. This split can occur when the word
as a whole is not found in the dictionary.

If a dictionary word has non-alphanumeric characters, CSC prints a warning.
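The splitting fallback and the warning can be sketched as follows. The helper names are hypothetical, "non-alphanumeric" is approximated by the regular expression `[\W_]` (which in Python 3 treats Unicode letters and digits as alphanumeric and the underscore as a separator), and a Python warning stands in for whatever message CSC actually prints. In this sketch a dictionary entry containing such characters can still match a whole token, but never one of the pieces produced by the split.

```python
import re
import warnings

# Characters treated as separators: anything that is not a letter or digit.
NON_ALNUM = re.compile(r"[\W_]+")


def split_non_alnum(token):
    """Split a token at runs of non-alphanumeric characters."""
    return [part for part in NON_ALNUM.split(token) if part]


def check_token(token, is_known):
    """Accept the whole token, or else every non-empty piece after splitting."""
    if is_known(token):
        return True
    parts = split_non_alnum(token)
    return bool(parts) and all(is_known(part) for part in parts)


def warn_non_alnum_entries(words):
    """Warn about, and return, dictionary entries with non-alphanumeric characters."""
    flagged = [w for w in words if NON_ALNUM.search(w)]
    for w in flagged:
        warnings.warn(f"dictionary word {w!r} contains non-alphanumeric characters")
    return flagged
```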
