{"id":13468705,"url":"https://github.com/neomatrix369/nlp_profiler","last_synced_at":"2025-04-05T06:07:10.533Z","repository":{"id":44879269,"uuid":"293235950","full_name":"neomatrix369/nlp_profiler","owner":"neomatrix369","description":"A simple NLP library allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.","archived":false,"fork":false,"pushed_at":"2024-05-12T20:59:55.000Z","size":3709,"stargazers_count":242,"open_issues_count":18,"forks_count":37,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-29T05:09:21.567Z","etag":null,"topics":["google-colab","grammar-checks","hacktoberfest","jupyter","kaggle-kernels","natural-language-processing","nlp","nlp-keywords-extraction","nlp-library","nlp-machine-learning","nlp-parsing","nlp-profiler","profiler","profiling","profiling-datasets","text-mining"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neomatrix369.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"neomatrix369","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-09-06T08:37:33.000Z","updated_at":"2024-12-30T22:25:23.000Z","dependencies_parsed_at":"2024-10-22T10:21:12.011Z","dependen
cy_job_id":null,"html_url":"https://github.com/neomatrix369/nlp_profiler","commit_stats":{"total_commits":542,"total_committers":8,"mean_commits":67.75,"dds":0.09225092250922506,"last_synced_commit":"de2922cc9b61c69251ddd4a40addacb82548dd1e"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neomatrix369%2Fnlp_profiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neomatrix369%2Fnlp_profiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neomatrix369%2Fnlp_profiler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neomatrix369%2Fnlp_profiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neomatrix369","download_url":"https://codeload.github.com/neomatrix369/nlp_profiler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247294536,"owners_count":20915340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["google-colab","grammar-checks","hacktoberfest","jupyter","kaggle-kernels","natural-language-processing","nlp","nlp-keywords-extraction","nlp-library","nlp-machine-learning","nlp-parsing","nlp-profiler","profiler","profiling","profiling-datasets","text-mining"],"created_at":"2024-07-31T15:01:17.087Z","updated_at":"2025-04-05T06:07:10.511Z","avatar_url":"https://github.com/neomatrix369.png","language":"Python","funding_links":["https://github.com/sponsors/neomatrix369"],"categories":["Python
"],"sub_categories":[],"readme":"# NLP Profiler \n\n||| [![Gitter](https://badges.gitter.im/nlp_profiler/community.svg)](https://gitter.im/nlp_profiler/community?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge) |||\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![GitHub actions](https://github.com/neomatrix369/nlp_profiler/workflows/end-to-end-flow/badge.svg)](https://github.com/neomatrix369/nlp_profiler/actions?workflow=end-to-end-flow)\n[![Code coverage](https://codecov.io/gh/neomatrix369/nlp_profiler/branch/master/graph/badge.svg)](https://codecov.io/gh/neomatrix369/nlp_profiler)\n[![Sourcery](https://img.shields.io/badge/Sourcery-enabled-brightgreen)](https://sourcery.ai) \n[![Codeac](https://static.codeac.io/badges/2-293235950.svg \"Codeac.io\")](https://app.codeac.io/github/neomatrix369/nlp_profiler)\n[![PyPI version](https://badge.fury.io/py/nlp-profiler.svg)](https://badge.fury.io/py/nlp-profiler) \n[![Python versions](https://img.shields.io/pypi/pyversions/nlp_profiler.svg)](https://pypi.org/project/nlp_profiler/) \n[![PyPi stats](https://img.shields.io/pypi/dm/nlp_profiler.svg?label=pypi%20downloads\u0026logo=PyPI\u0026logoColor=white)](https://pypistats.org/packages/nlp_profiler)\n[![Downloads](https://static.pepy.tech/personalized-badge/nlp-profiler?period=total\u0026units=international_system\u0026left_color=black\u0026right_color=orange\u0026left_text=Downloads)](https://pepy.tech/project/nlp-profiler)\n\n\nA simple NLP library that allows profiling datasets with one or more text columns. \n\nNLP Profiler returns either high-level insights or low-level/granular statistical information about the text when given a dataset and a column name containing text data, in that column. 
\n\nIn short: Think of it as using the `pandas.DataFrame.describe()` function or running [Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) on your data frame, but for datasets containing text columns rather than the usual columnar datasets.\n\n# Table of contents\n\n- **Community/Chat/Communication:** [![Gitter](https://badges.gitter.im/nlp_profiler/community.svg)](https://gitter.im/nlp_profiler/community?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n- [What do you get from the library?](#what-do-you-get-from-the-library)\n- [Requirements](#requirements)\n- [Getting started](#getting-started)\n  - [Installation](#installation)\n  - [Usage](#usage)\n  - [Developer guide](#developer-guide)\n  - [Demo and presentations](#demo-and-presentations)\n- [Notebooks](#notebooks)\n- [Screenshots](#screenshots)\n- [Credits and supporters](#credits-and-supporters)\n- [Changes](#changes)\n- [License](#license)\n- [Contributing](#contributing)\n\n---\n\n## What do you get from the library?\n\n- Pass in a Pandas dataframe and the name of a column containing text.\n- You get back a new dataframe with various features about the parsed text per row.\n  - High-level: sentiment analysis, objectivity/subjectivity analysis, spelling quality check, grammar quality check, ease of readability check, etc.\n  - Low-level/granular: number of characters in the sentence, number of words, number of emojis, etc.\n- Descriptive statistics can then be drawn from the numerical data in the resulting dataframe by calling `describe()` on it.\n\nSee screenshots under the [Jupyter](#Jupyter) section and also under [Screenshots](#screenshots) for further illustrations.\n\nUnder the hood it makes use of a number of libraries that are popular in the AI and ML communities, but we can extend its functionality by replacing or adding other libraries as well.\n\nA simple [notebook](#notebooks) has been provided to illustrate the usage of the 
library.\n\n**_Please join the [Gitter.im community](https://gitter.im/nlp_profiler/community) and say \"hello\" to us, share your feedback, and have fun with us._**\n\n**Note:** _This is a new endeavour and it may have rough edges, i.e. NLP Profiler in its current version probably cannot do many things yet. Many of these gaps are opportunities we can work on and plug as we go along using it. Please provide constructive feedback to help improve this library. One recent example of this is [scaling with larger datasets](https://github.com/neomatrix369/nlp_profiler/issues/2#issuecomment-696675059)._\n\n## Requirements\n\n- Python 3.7.x or higher.\n- Dependencies described in `requirements.txt`.\n- For high-level features, including grammar checks:\n  - faster processor\n  - higher RAM capacity\n  - 1 to 3 GB of working disk space (depending on the dataset size)\n- (Optional)\n  - Jupyter Lab (on your local machine).\n  - Google Colab account.\n  - Kaggle account.\n  - Grammar check functionality:\n    - Internet access\n    - Java 8 or higher\n  \n## Getting started\n\n### Installation\n\n**For Conda/Miniconda environments:**\n\n```bash\nconda config --set pip_interop_enabled True\npip install \"spacy \u003e= 2.3.0,\u003c3.0.0\"         # in case spacy is not present\npython -m spacy download en_core_web_sm\n\n# now follow any of the installation options below\n```\n\n**For Kaggle environments:**\n\n```bash\npip uninstall typing      # the typing package can cause issues on Kaggle, so remove it\n```\n\n_Follow any of the remaining installation steps but avoid using `-U` with `pip install`, as this can also cause issues on Kaggle._\n\n**From PyPI:**\n\n```bash\npip install -U nlp_profiler\n```\n\n**From the GitHub repo:**\n\n```bash\npip install -U git+https://github.com/neomatrix369/nlp_profiler.git@master\n```\n\n**From the source:**\n\nFor library development purposes, see the [Developer guide](#developer-guide).\n\n### 
Usage\n\n```python\nimport nlp_profiler.core as nlpprof\n\n# dataset is a Pandas dataframe; 'text_column' is the name of its text column\nnew_text_column_dataset = nlpprof.apply_text_profiling(dataset, 'text_column')\n```\n\nor \n\n```python\nfrom nlp_profiler.core import apply_text_profiling\n\nnew_text_column_dataset = apply_text_profiling(dataset, 'text_column')\n```\n\nSee the [Notebooks](./notebooks/README.md) section for further illustrations.\n\n### Developer guide\n\nSee the [Developer guide](developer-guide.md) to learn how to build, test, and contribute to the library.\n\n### Demo and presentations\n\nWatch a short demo of the NLP Profiler library in one of these talks:\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://youtu.be/sdPOyqMfK7M?t=2274\"\u003e\u003cimg alt=\"Demo of the NLP Profiler library (Abhishek talks #6)\" src=\"https://user-images.githubusercontent.com/1570917/88474968-8fb48980-cf23-11ea-944d-0a1069174ede.png\"\u003e\u003c/a\u003e or find the rest of the \u003ca href=\"https://www.youtube.com/watch?v=sdPOyqMfK7M\"\u003etalk here\u003c/a\u003e, or see the \u003ca href=\"https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/presentations/awesome-ai-ml-dl/02-abhishektalks-2020/README.md\"\u003eslides\u003c/a\u003e\u003c/td\u003e\n  \u003ctd align=\"center\"\u003e\u003ca href=\"https://youtu.be/wHIcQWeOugI?t=808\"\u003e\u003cimg alt=\"Demo of the NLP Profiler library (NLP Zurich talk)\" src=\"https://secure.meetupstatic.com/photos/event/5/7/3/highres_492541395.jpeg\"\u003e\u003c/a\u003e or find the rest of the \u003ca href=\"https://www.youtube.com/watch?v=wHIcQWeOugI\"\u003etalk here\u003c/a\u003e, or see the \u003ca href=\"https://github.com/neomatrix369/nlp_profiler/blob/master/presentations/01-nlp-zurich-2020/README.md\"\u003eslides\u003c/a\u003e\u003c/td\u003e\n  \n  \u003c/tr\u003e\n\u003c/table\u003e\n\n## Notebooks\n\nAfter successful installation of the library, RESTART Jupyter kernels or Google Colab runtimes for the changes to take effect.\n\nSee 
[Notebooks](./notebooks/README.md) for usage and further details.\n\n## Screenshots\n\nSee [Screenshots](./notebooks/README.md#screenshots)\n\n## Credits and supporters\n\nSee [CREDITS_AND_SUPPORTERS.md](./CREDITS_AND_SUPPORTERS.md)\n\n## Changes\n\nSee [CHANGELOG.md](./CHANGELOG.md)\n\n## License\n\nRefer to the [licensing](LICENSE.md) (and warranty) policy.\n\n## Contributing\n\nContributions are welcome!\n\nPlease have a look at the [CONTRIBUTING](CONTRIBUTING.md) guidelines.\n\nPlease share the library with the wider community (and get credited for it)!\n\n---\n\nGo to the [NLP page](https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/natural-language-processing/README.md)\u003cbr/\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneomatrix369%2Fnlp_profiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneomatrix369%2Fnlp_profiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneomatrix369%2Fnlp_profiler/lists"}