{"id":29873126,"url":"https://github.com/cod3licious/textcatvis","last_synced_at":"2026-02-26T21:45:55.490Z","repository":{"id":57474623,"uuid":"84851863","full_name":"cod3licious/textcatvis","owner":"cod3licious","description":"tools to analyze a collection of texts and identify relevant words","archived":false,"fork":false,"pushed_at":"2018-05-20T17:37:08.000Z","size":297,"stargazers_count":12,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-30T23:03:12.614Z","etag":null,"topics":["machine-learning","natural-language-processing","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cod3licious.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-13T16:53:20.000Z","updated_at":"2023-02-10T13:57:33.000Z","dependencies_parsed_at":"2022-09-10T04:04:44.158Z","dependency_job_id":null,"html_url":"https://github.com/cod3licious/textcatvis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cod3licious/textcatvis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cod3licious%2Ftextcatvis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cod3licious%2Ftextcatvis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cod3licious%2Ftextcatvis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cod3licious%2Ftextcatvis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cod3licious","download_url":"http
s://codeload.github.com/cod3licious/textcatvis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cod3licious%2Ftextcatvis/sbom","scorecard":{"id":295366,"data":{"date":"2025-08-11","repo":{"name":"github.com/cod3licious/textcatvis","commit":"2d1befbb27606bc9f965a0a76e205d8d56ee321a"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Code-Review","score":0,"reason":"Found 0/16 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source 
repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-17T19:21:49.111Z","repository_id":57474623,"created_at":"2025-08-17T19:21:49.111Z","updated_at":"2025-08-17T19:21:49.111Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29873624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T21:05:00.265Z","status":"ssl_error","status_checked_at":"2026-02-26T20:57:13.669Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","natural-language-processing","visualization"],"created_at":"2025-07-30T22:16:01.638Z","updated_at":"2026-02-26T21:45:55.474Z","avatar_url":"https://github.com/cod3licious.png","language":"Python","readme":"textcatvis\n==========\n\nFaced with a collection of texts, sorted into the categories \"C1\"-\"C23\" and no idea what those could be? Got a dump of text documents and need to figure out what they are about and which of those you should have a closer look at?\nCode is here to help!\n\nThis repository contains tools that help you get a quick overview of a text dataset by creating word clouds of the relevant words for each class or identified cluster, as well as code to highlight these words in the individual texts, e.g.
to better understand classifier decisions. Further details can be found in the corresponding paper (short_ and long_ version).\n\nIf any of this code was helpful for your research, please consider citing it: ::\n\n    @article{horn2017exploring,\n      title     = {Exploring text datasets by visualizing relevant words},\n      author    = {Horn, Franziska and Arras, Leila and Montavon, Gr{\\'e}goire and M{\\\"u}ller, Klaus-Robert and Samek, Wojciech},\n      journal   = {arXiv preprint arXiv:1707.05261},\n      year      = {2017}\n    }\n\n\nor ::\n\n    @article{horn2017discovering,\n      title     = {Discovering topics in text datasets by visualizing relevant words},\n      author    = {Horn, Franziska and Arras, Leila and Montavon, Gr{\\'e}goire and M{\\\"u}ller, Klaus-Robert and Samek, Wojciech},\n      journal   = {arXiv preprint arXiv:1707.06100},\n      year      = {2017}\n    }\n\n.. _short: http://arxiv.org/abs/1707.06100\n.. _long: http://arxiv.org/abs/1707.05261\n\n\nThe code is intended for research purposes. It was written for Python 2.7 but should, in theory, also run on newer Python 3 versions; there are no guarantees, though, so please open an issue if you find a bug!\n\nquick start\n-----------\nTo install, either download the code from here and add the textcatvis folder to your ``$PYTHONPATH``, or install only the library components via pip: ::\n\n    $ pip install textcatvis\n\n\nIf you have text data available as a collection of ``.txt`` files, either in a single folder or in multiple folders (if the texts are already sorted into different categories), you can call the script ``analyze_relevantwords.py`` with the path to the folder (or the parent directory of multiple folders) to load this data and create word clouds for it.\n\ntextcatvis library components\n-----------------------------\n\ndependencies: numpy, scipy, matplotlib, sklearn, wordcloud, nlputils_\n\n.. 
_nlputils: https://github.com/cod3licious/nlputils\n\n- ``data_utils.py``: contains a function to load a text dataset (organized as a folder with one subdirectory per class, each containing ``.txt`` documents) in the form required by the other functions.\n- ``cluster.py``: contains a function to cluster a collection of text documents with the DBSCAN algorithm from sklearn.\n- ``check_query.py``: contains functions to formulate queries and check how often a term occurs in texts of a given category.\n- ``vis_utils.py``: contains functions to create the word clouds and highlight relevant words in individual texts.\n- ``distinctive_words.py``: contains code to examine a text dataset and identify \"distinctive words\" by comparing how often a word occurs in one category versus in all other categories.\n- ``visualize_relevantwords.py``: contains three functions that generate word clouds and highlight words in individual documents based on tf-idf features, distinctive words, and the classification scores obtained with a linear SVM.\n\nexamples\n--------\n\n- ``analyze_relevantwords.py``: can be called with the path to a dataset to carry out the analysis for this dataset, e.g., to create word clouds for the different classes.\n- in ``experiments_cancer.py``, the above-mentioned tools are tested on the `cancer papers dataset`_ to create the results reported in the paper. (You need to download this dataset first.)\n- in ``experiments_nytimes.py``, the above-mentioned tools are tested on articles downloaded with the NYTimes API. (Make sure you have an API key stored in ``nytimes_apikey.txt``.)\n\n.. 
_`cancer papers dataset`: https://github.com/cod3licious/cancer_papers\n\nIf you have any questions, please don't hesitate to send me an `email \u003cmailto:cod3licious@gmail.com\u003e`_. And of course, if you find any bugs or want to contribute other improvements, pull requests are very welcome!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcod3licious%2Ftextcatvis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcod3licious%2Ftextcatvis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcod3licious%2Ftextcatvis/lists"}