{"id":15014126,"url":"https://github.com/hlasse/textdescriptives","last_synced_at":"2025-05-15T04:06:37.815Z","repository":{"id":38197553,"uuid":"236710916","full_name":"HLasse/TextDescriptives","owner":"HLasse","description":"A Python library for calculating a large variety of metrics from text","archived":false,"fork":false,"pushed_at":"2024-12-16T09:15:24.000Z","size":23305,"stargazers_count":338,"open_issues_count":0,"forks_count":25,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-15T04:06:26.832Z","etag":null,"topics":["dependency-distance","descriptive-statistics","nlp","python","readability","readability-scores","spacy","spacy-extension","statistics","syntactic-analysis"],"latest_commit_sha":null,"homepage":"https://hlasse.github.io/TextDescriptives/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HLasse.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json"}},"created_at":"2020-01-28T10:37:59.000Z","updated_at":"2025-05-13T20:26:30.000Z","dependencies_parsed_at":"2023-10-12T22:15:10.400Z","dependency_job_id":"18750596-7173-416a-888d-fc4016f06af0","html_url":"https://github.com/HLasse/TextDescriptives","commit_stats":{"total_commits":675,"total_committers":16,"mean_commits":42.1875,"dds":"0.47111111111111115","last_synced_commit":"eb2a66a8e3dfbacb6d70ee618fce78ad642595dd"},"previous_names":[],"tags_count":41,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/HLasse%2FTextDescriptives","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HLasse%2FTextDescriptives/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HLasse%2FTextDescriptives/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HLasse%2FTextDescriptives/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HLasse","download_url":"https://codeload.github.com/HLasse/TextDescriptives/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270646,"owners_count":22042859,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dependency-distance","descriptive-statistics","nlp","python","readability","readability-scores","spacy","spacy-extension","statistics","syntactic-analysis"],"created_at":"2024-09-24T19:45:13.543Z","updated_at":"2025-05-15T04:06:32.800Z","avatar_url":"https://github.com/HLasse.png","language":"Python","readme":"\n\u003ca href=\"https://github.com/HLasse/TextDescriptives\"\u003e\u003cimg src=\"https://github.com/HLasse/TextDescriptives/raw/main/docs/_static/icon.png\" width=\"175\" height=\"175\" align=\"right\" /\u003e\u003c/a\u003e\n\n\n# TextDescriptives\n\n[![spacy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)\n[![github actions pytest](https://github.com/hlasse/textdescriptives/actions/workflows/tests.yml/badge.svg)](https://github.com/hlasse/textdescriptives/actions)\n[![github actions 
docs](https://github.com/hlasse/textdescriptives/actions/workflows/documentation.yml/badge.svg)](https://hlasse.github.io/TextDescriptives/)\n[![status](https://joss.theoj.org/papers/06447337ee61969b5a64de484199df24/status.svg)](https://joss.theoj.org/papers/06447337ee61969b5a64de484199df24)\n[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/HLasse/textdescriptives)\n\nA Python library for calculating a large variety of metrics from text(s) using spaCy v3 pipeline components and extensions.\n\n# 🔧 Installation\n`pip install textdescriptives`\n\n# 📰 News\n\n* We now have a TextDescriptives-powered web app, so you can extract and download metrics without writing a single line of code! Check it out [here](https://huggingface.co/spaces/HLasse/textdescriptives)\n* Version 2.0 is out with a new API, a new component, updated documentation, and tutorials! Components are now called by `textdescriptives/{metric_name}`. A new `coherence` component calculates the semantic coherence between sentences. See the [documentation](https://hlasse.github.io/TextDescriptives/) for tutorials and more information!  \n\n\n\n# ⚡ Quick Start\n\nUse `extract_metrics` to quickly extract your desired metrics. To see the available metrics, run:\n```python\nimport textdescriptives as td\ntd.get_valid_metrics()\n# {'quality', 'readability', 'all', 'descriptive_stats', 'dependency_distance', 'pos_proportions', 'information_theory', 'coherence'}\n```\n\nSet the `spacy_model` parameter to specify which spaCy model to use; otherwise, TextDescriptives will auto-download an appropriate one based on `lang`. If `lang` is set, `spacy_model` is not necessary, and vice versa.\n\nSpecify which metrics to extract in the `metrics` argument; `None` extracts all metrics.\n\n```py\nimport textdescriptives as td\n\ntext = \"The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. 
Much that once was is lost, for none now live who remember it.\"\n# will automatically download the relevant model (`en_core_web_lg`) and extract all metrics\ndf = td.extract_metrics(text=text, lang=\"en\", metrics=None)\n\n# specify the spaCy model and which metrics to extract\ndf = td.extract_metrics(text=text, spacy_model=\"en_core_web_lg\", metrics=[\"readability\", \"coherence\"])\n```\n\n\n## Usage with spaCy\n\nTo integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are *descriptive_stats*, *readability*, *dependency_distance*, *pos_proportions*, *information_theory*, *coherence*, and *quality*, each prefixed with `textdescriptives/`.\n\nTo add all components, use the shorthand `textdescriptives/all`.\n\n```py\nimport spacy\nimport textdescriptives as td\n# load your favourite spaCy model (remember to install it first, e.g. with `python -m spacy download en_core_web_sm`)\nnlp = spacy.load(\"en_core_web_sm\")\nnlp.add_pipe(\"textdescriptives/all\")\ndoc = nlp(\"The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. 
Much that once was is lost, for none now live who remember it.\")\n\n# access some of the values\ndoc._.readability\ndoc._.token_length\n```\n\nTextDescriptives includes convenience functions for extracting metrics from a `Doc` to a Pandas DataFrame or a dictionary.\n\n```py\ntd.extract_dict(doc)\ntd.extract_df(doc)\n```\n|      | text                      | first_order_coherence | second_order_coherence | pos_prop_DET | pos_prop_NOUN | pos_prop_AUX | pos_prop_VERB | pos_prop_PUNCT | pos_prop_PRON | pos_prop_ADP | pos_prop_ADV | pos_prop_SCONJ | flesch_reading_ease | flesch_kincaid_grade |    smog | gunning_fog | automated_readability_index | coleman_liau_index |     lix |  rix | n_stop_words | alpha_ratio | mean_word_length | doc_length | proportion_ellipsis | proportion_bullet_points | duplicate_line_chr_fraction | duplicate_paragraph_chr_fraction | duplicate_5-gram_chr_fraction | duplicate_6-gram_chr_fraction | duplicate_7-gram_chr_fraction | duplicate_8-gram_chr_fraction | duplicate_9-gram_chr_fraction | duplicate_10-gram_chr_fraction | top_2-gram_chr_fraction | top_3-gram_chr_fraction | top_4-gram_chr_fraction | symbol_#_to_word_ratio | contains_lorem ipsum | passed_quality_check | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | proportion_unique_tokens | n_characters | n_sentences |\n| ---: | :------------------------ | --------------------: | ---------------------: | -----------: | ------------: | -----------: | ------------: | -------------: | ------------: | -----------: | -----------: | -------------: | ------------------: | -------------------: | ------: | ----------: | --------------------------: | -----------------: | ------: | 
---: | -----------: | ----------: | ---------------: | ---------: | ------------------: | -----------------------: | --------------------------: | -------------------------------: | ----------------------------: | ----------------------------: | ----------------------------: | ----------------------------: | ----------------------------: | -----------------------------: | ----------------------: | ----------------------: | ----------------------: | ---------------------: | :------------------- | :------------------- | -----------------------: | ----------------------: | -------------------------------------: | ------------------------------------: | ----------------: | ------------------: | ---------------: | -------------------: | ---------------------: | ------------------: | -----------------------: | -------------------------: | ----------------------: | -------: | --------------: | -----------------------: | -----------: | ----------: |\n|    0 | The world is changed(...) |              0.633002 |               0.573323 |     0.097561 |      0.121951 |    0.0731707 |      0.170732 |       0.146341 |      0.195122 |    0.0731707 |    0.0731707 |      0.0487805 |             107.879 |           -0.0485714 | 5.68392 |     3.94286 |                    -2.45429 |          -0.708571 | 12.7143 |  0.4 |           24 |    0.853659 |          2.95122 |         41 |                   0 |                        0 |                           0 |                                0 |                      0.232258 |                      0.232258 |                             0 |                             0 |                             0 |                              0 |               0.0580645 |                0.174194 |                       0 |                      0 | False                | False                |                  1.77524 |                0.553188 |                               0.457143 |                             0.0722806 |           3.28571 |        
           3 |          1.54127 |                    7 |                      6 |             3.09839 |                  1.08571 |                          1 |                0.368117 |       35 |              23 |                 0.657143 |          121 |           5 |\n\n\n\n# 📖 Documentation\n\nTextDescriptives has detailed documentation as well as a series of Jupyter notebook tutorials.\nAll the tutorials are located in the `docs/tutorials` folder and can also be found on the documentation website.\n\n\n| Documentation              |                                                                                    |\n| -------------------------- | ---------------------------------------------------------------------------------- |\n| 📚 **[Getting started]**    | Guides and instructions on how to use TextDescriptives and its features.           |\n| 👩‍💻 **[Demo]**               | A live demo of TextDescriptives.                                                   |\n| 😎 **[Tutorials]**          | Detailed tutorials on how to make the most of TextDescriptives.                    |\n| 📰 **[News and changelog]** | New additions, changes and version history.                                        |\n| 🎛 **[API References]**     | The detailed reference for TextDescriptives' API, including function documentation |\n| 📄 **[Paper]**              | The preprint of the TextDescriptives paper.                                        
|\n\n[Paper]: https://arxiv.org/abs/2301.02057\n[Tutorials]: https://hlasse.github.io/TextDescriptives/tutorial.html\n[Getting started]: https://hlasse.github.io/TextDescriptives/usingthepackage.html\n[API References]: https://hlasse.github.io/TextDescriptives/index.html\n[News and changelog]: https://hlasse.github.io/TextDescriptives/news.html\n[Demo]: https://huggingface.co/spaces/HLasse/textdescriptives\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhlasse%2Ftextdescriptives","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhlasse%2Ftextdescriptives","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhlasse%2Ftextdescriptives/lists"}