https://github.com/coderefinery/word-count
Example project we use in the reproducibility lesson.
- Host: GitHub
- URL: https://github.com/coderefinery/word-count
- Owner: coderefinery
- License: MIT
- Created: 2019-06-11T21:40:48.000Z
- Default Branch: main
- Last Pushed: 2024-09-19T06:08:25.000Z
- Last Synced: 2025-09-10T04:46:43.150Z
- Language: Python
- Homepage:
- Size: 850 KB
- Stars: 8
- Watchers: 3
- Forks: 53
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[Binder](https://mybinder.org/v2/gh/coderefinery/word-count/HEAD)
# Word count example
This example project will count words in a given text and plot a bar chart of the 10
most common words.
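
As a rough illustration of the counting step, the sketch below tallies word frequencies with `collections.Counter` and returns the most common words. It is not the code in `count.py`; the function names `count_words` and `top_words` and the tokenization rule (lowercased runs of letters and apostrophes) are assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # Assumption: a "word" is a run of ASCII letters and apostrophes.
    # The project's own script may tokenize differently.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def top_words(text, n=10):
    """Return the n most common (word, count) pairs, most frequent first."""
    return count_words(text).most_common(n)


if __name__ == "__main__":
    # Made-up sample text, used only to exercise the functions.
    sample = "The isles are far, and the sea is wide. The sea is cold."
    for word, count in top_words(sample, n=3):
        print(word, count)
```

The pairs returned by `top_words` are the kind of data a bar chart of the 10 most common words would be drawn from.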

## Dependencies
See `environment.yml`.
## Usage
In this example we wish to:
1. Analyze word frequencies using [code/count.py](https://github.com/coderefinery/word-count/blob/main/code/count.py)
for 4 books (they are all in the [data](https://github.com/coderefinery/word-count/tree/main/data) directory).
2. Plot a bar chart of the most common words using [code/plot.py](https://github.com/coderefinery/word-count/blob/main/code/plot.py).
For one book (`isles.txt`) use the scripts like this:
```
$ python code/count.py data/isles.txt > statistics/isles.data
$ python code/plot.py --data-file statistics/isles.data --plot-file plot/isles.png
```
To run these scripts for all books, you can collect the calls in one bash script and run it with `bash run_all.sh`.
To go one step further with less code, you can loop over all known book titles in a bash script and run it with `bash run_all_loop.sh`.
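
The looping idea can be sketched as follows. This is written in Python, not bash, and it is not the contents of `run_all_loop.sh`; the helper names `count_command`, `plot_command`, and `run_all` are hypothetical. It assumes it is run from the repository root and that every book follows the same `data/<title>.txt` naming as `isles.txt`.

```python
import subprocess
import sys


def count_command(title):
    """Build the command that counts words for one book (prints to stdout)."""
    return [sys.executable, "code/count.py", f"data/{title}.txt"]


def plot_command(title):
    """Build the command that plots the counts for one book."""
    return [
        sys.executable, "code/plot.py",
        "--data-file", f"statistics/{title}.data",
        "--plot-file", f"plot/{title}.png",
    ]


def run_all(titles):
    """Run the count and plot steps for every title, stopping on the first failure."""
    for title in titles:
        # Writing stdout to a file mirrors the shell redirect
        # `> statistics/<title>.data` used in the commands above.
        with open(f"statistics/{title}.data", "w") as out:
            subprocess.run(count_command(title), stdout=out, check=True)
        subprocess.run(plot_command(title), check=True)
```

Calling `run_all(["isles"])` would repeat the two commands shown above for `isles.txt`; the remaining book titles would need to be added to the list.
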
### Workflow
The workflow is implemented with Snakemake in the `Snakefile`.
### Tests
End-to-end tests are provided in the test directory.
## Acknowledgement
Inspired by and derived from https://hpc-carpentry.github.io/hpc-python/
which is distributed under
[Creative Commons Attribution license (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
## CodeRefinery workshop
We use this example in the CodeRefinery workshop in this lesson:
- https://coderefinery.github.io/reproducible-research/