Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lasupernova/Thesis-and-Whatsapp-Chat-Word-Cloud-Generator

Python CLI tool to generate customized word clouds from documents, especially large documents such as dissertations, master and bachelor thesis
https://github.com/lasupernova/Thesis-and-Whatsapp-Chat-Word-Cloud-Generator

Last synced: 9 days ago
JSON representation

Python CLI tool to generate customized word clouds from documents, especially large documents such as dissertations, master and bachelor thesis

Awesome Lists containing this project

README

        

# Word Cloud Generator
Python CLI tool to generate customized word clouds from documents, especially large documents such as dissertations, master and bachelor thesis (or export your __WhatsApp Chat__ and use it on your conversations with different people!). Depending on `wordcloud` and `nltk`

Don't want to work with the command line? Use the jupyter notebook instead (see instructions and examples below)



![Example of word cloud, Low Height](example_output/example6_width1500_height100.png?raw=true "Custom settings")
### **Example text file for practice:**

Saved as `example.txt`. This is a text file containing the book "The Count of Monte Cristo".

### **Usage:**

`python generate_cloud.py`


By default text information is taken from a file called "doc.txt", so be sure to move a copy of your thesis to you working directory and to rename it to "doc.txt".
Alternatively, use a command line argument to change the name of the input file.


### **Usage - Word Cloud from WhatsApp Chat Exports:**

`python generate_cloud.py - whatsapp`


This will pre-process the WhatsApp chat export file, to exclude dates and other text-parts added by WhatsApp to generate export file (e.g. "Media omitted" text that is inserted inplace of media sent).



### **Customization**:

A number of different parameters can be customized:

Parameter | Command Line Argument | Type
------------ | ------------- | -------------
Name of input file | -file_path | string
Text color | -hue | integer
Stopwords | -sw
(NOTE: these stopword will not replace
generic stopwords but will be added) | list
Background Color | -bg | string
Image Width (pixel) | -w | integer
Image Heigt (pixel) | -height | integer
Maximum number of words to display | -maxwords | integer
Ratio of words to display horizontally | -h_ratio | integer
(from 0-1)
Saturation | -s | integer
(from 0-100)
Lightness | -l | integer
(from 0-100)
File name to store output | -o | string
(NOTE: should end with '.png')
Words to replace in text | -x1 | string
(NOTE: can be multiple strings)
(NOTE: always needs to be used together with _-x2_)
Substitutes for words passed in -x1 | -x2 | string
(NOTE: can be multiple strings)
(NOTE: always needs to be used together with _-x1_)
WhatsApp export-file usage | -whatsapp | simply add "-whatsapp"
Use when a WhatsApp chat export file is used as text
Matrix Effect | -matrix | simply add "-matrix"
The program will then automatically ste all parameters for a matrix-like word cloud
_(see below for example)_

**Example**:

`python generate_cloud.py -file_path my_thesis_final_version.txt -bg black -h_ratio 0.6 -o wordcloud_thesis.png`

- This example will take a text file named 'my_thesis_final_version.txt' and save the wordcloud to 'wordcloud_thesis.png'. The word cloud will have a black background and only 60% of the words will be displayed horizontally (and 40% vertically).




### **Alternative: Jupyter Notebook**:

If you don't want to use the command line, you can use the Jupiter Notebook instead:

- Install [Jupyter Notebook](https://test-jupyter.readthedocs.io/en/latest/install.html)
- Download Github repository
- Open Notebook
- replace _example.txt_ with the name of your text file / thesis (in the notebook); or save your file in the same folder as the jupyter notebook and rename it _example.txt_
- go to `Cell` - click `Run all`
- check you working directory: the word cloud image should be saved there now under a name similar to **wc_Size1500_1000_hslColorH322** (unless you changed the parameter for the output)




### **Examples**:

A few examples of different custom settings and the results:


* __Regular usage:__ `python generate_cloud.py`




Let's change 'count' to 'Simon Basset' ( ...looking at you __Bridgerton__... ) and use a black background


* __Custom usage:__ `python generate_cloud.py -x1 count -x2 Simon_Hastings -f example.txt -o bridgerton2.png -bg black`


I only replaced one word (count -> simon hastings), but multiple words can be replaced at the same time.
E.g: `-x1 count Monte_Cristo -x2 simon_hastings London` changes "count" to "simon hastings" and "Monte Cristo" to "London".
Note that words that belong together, such as "Monte Cristo", should be connected with an underscore.




* __Matrix usage:__ `python generate_cloud.py -matrix`


Automatically created word cloud with matrix-like style. This specific word cloud was generated using the "-whatsapp" option using a WhatsApp chat export file and I used -x1/-x2 in order to censor names and addresses. You can still specify "-whatsapp", and the input (-f) and output (-o) files.




__Custom usage:__

* Left (saturation and lightness adjusted): `python generate_cloud.py -s 25 -l 90`


* Right (allow for random word colors): `python generate_cloud.py -hue None`