{"id":13691694,"url":"https://github.com/KxSystems/nlp","last_synced_at":"2025-05-02T15:32:51.233Z","repository":{"id":53996828,"uuid":"135463911","full_name":"KxSystems/nlp","owner":"KxSystems","description":"Natural-language processing library","archived":false,"fork":false,"pushed_at":"2024-10-10T16:04:35.000Z","size":1251,"stargazers_count":18,"open_issues_count":1,"forks_count":21,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-05-01T08:48:45.292Z","etag":null,"topics":["clustering","dataset","embedpy","kdb","natural-language-processing","nlp","parsing","python","q","vector"],"latest_commit_sha":null,"homepage":"https://code.kx.com/q/ml","language":"q","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KxSystems.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-05-30T15:36:47.000Z","updated_at":"2024-10-10T16:04:40.000Z","dependencies_parsed_at":"2024-04-08T01:57:59.360Z","dependency_job_id":"4b52e259-35fe-44fb-909f-2fa0d0bcec40","html_url":"https://github.com/KxSystems/nlp","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Fnlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Fnlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Fnlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Fnlp/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/KxSystems","download_url":"https://codeload.github.com/KxSystems/nlp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252063139,"owners_count":21688655,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","dataset","embedpy","kdb","natural-language-processing","nlp","parsing","python","q","vector"],"created_at":"2024-08-02T17:00:49.145Z","updated_at":"2025-05-02T15:32:51.227Z","avatar_url":"https://github.com/KxSystems.png","language":"q","funding_links":[],"categories":["Training"],"sub_categories":[],"readme":"# ⚠️ **This repository is outdated!** ⚠️\n\n---\n\nSince 9th October 2024 (the 4.0.0 release of the ML Toolkit), this project has been merged into the ml toolkit mono-repo. For the latest updates and active development, please visit [https://github.com/KxSystems/ml](https://github.com/KxSystems/ml). \n\nThis repository is preserved only to maintain old links and project history but will no longer be actively maintained.\n\n---\n\n# Natural Language Processing\n\n## Introduction\n\nNatural language processing (NLP) can be used to answer a variety of questions about unstructured text, as well as facilitating open-ended exploration. It can be applied to datasets such as emails, online articles and comments, tweets and novels. Although the source is text, transformations are applied to convert this data to vectors, dictionaries and symbols which can be handled very effectively by q. 
Many operations such as searching, clustering, and keyword extraction can all be done using very simple data structures, such as feature vectors.\n\n## Features\n\nThe NLP library lets users parse datasets using the spacy model from Python, which runs tokenisation, sentence detection, part-of-speech tagging and lemmatization. In addition to parsing, users can cluster text documents together using different clustering algorithms like MCL, K-means and radix. You can also run sentiment analysis, which indicates whether a word has a positive or negative sentiment.\n\n## Requirements\n- kdb+\u003e=? v3.5 64-bit\n- Anaconda Python 3.x\n- [embedPy](https://github.com/KxSystems/embedPy)\n\n#### Dependencies\nThe following Python packages are required:\n  1. numpy\n  2. beautifulsoup4\n  3. spacy \n\n* Tests were run using spacy version 2.2.1\n\nInstall these packages with\n\npip\n```bash\npip install -r requirements.txt\n```\nor with conda\n```bash\nconda install -c conda-forge --file requirements.txt\n```\n\n* Download the English model using ```python -m spacy download en```\n\nOther languages that spacy supports can be found at https://spacy.io/usage/models#languages\n\nTo use the languages in the alpha stage of development in spacy, the following steps can be taken:\n\nTo download the Chinese model, jieba must be installed\n\npip\n```bash\npip install jieba\n```\n\nTo download the Japanese model, mecab must be installed\n\npip\n```bash\npip install mecab-python3\n```\n\n* spacy_hunspell is not a requirement to run these scripts, but can be installed using the following methods\n\nLinux\n```bash\nsudo apt-get install libhunspell-dev hunspell\npip install spacy_hunspell\n```\n\nmac\n```bash\nwget https://iweb.dl.sourceforge.net/project/wordlist/speller/2019.10.06/hunspell-en_US-2019.10.06.zip;\nunzip hunspell-en_US-2019.10.06; sudo mv en_US.dic en_US.aff /Library/Spelling/; \nbrew install hunspell;\nexport C_INCLUDE_PATH=/usr/local/include/hunspell;\nsudo ln -sf 
/usr/local/lib/libhunspell-1.7.a /usr/local/lib/libhunspell.a;\nsudo ln -sf /usr/local/Cellar/hunspell/1.7.0_2/lib/libhunspell-1.7.dylib /usr/local/Cellar/hunspell/1.7.0_2/lib/libhunspell.dylib;\nCFLAGS=$(pkg-config --cflags hunspell) LDFLAGS=$(pkg-config --libs hunspell) pip install hunspell==0.5.0\n```\n\nAt the moment spacy_hunspell does not support installation for windows. More information can be found at https://github.com/tokestermw/spacy_hunspell\n\n## Installation\nRun tests with\n\n```bash\nq test.q\n```\n\nPlace the library file in `$QHOME` and load into a q instance using \n\n```q\nq)\\l nlp/nlp.q\nq).nlp.loadfile`:init.q\nLoading init.q\nLoading code/utils.q\nLoading code/regex.q\nLoading code/sent.q\nLoading code/parser.q\nLoading code/time.q\nLoading code/date.q\nLoading code/email.q\nLoading code/cluster.q\nLoading code/nlp_code.q\nq).nlp.findTimes\"I went to work at 9:00am and had a coffee at 10:20\"\n09:00:00.000 \"9:00am\" 18 24\n10:20:00.000 \"10:20\"  45 50\n```\n\n### Docker\n\nIf you have [Docker installed](https://www.docker.com/community-edition) you can alternatively run:\n\n    $ docker run -it --name mynlp kxsys/nlp\n    kdb+ on demand - Personal Edition\n    \n    [snipped]\n    \n    I agree to the terms of the license agreement for kdb+ on demand Personal Edition (N/y): y\n    \n    If applicable please provide your company name (press enter for none): ACME Limited\n    Please provide your name: Bob Smith\n    Please provide your email (requires validation): bob@example.com\n    KDB+ 3.5 2018.04.25 Copyright (C) 1993-2018 Kx Systems\n    l64/ 4()core 7905MB kx 0123456789ab 172.17.0.2 EXPIRE 2018.12.04 bob@example.com KOD #0000000\n\n    Loading code/utils.q\n    Loading code/regex.q\n    Loading code/sent.q\n    Loading code/parser.q\n    Loading code/time.q\n    Loading code/date.q\n    Loading code/email.q\n    Loading code/cluster.q\n    Loading code/nlp_code.q\n    q).nlp.findTimes\"I went to work at 9:00am and had a coffee at 
10:20\"\n    09:00:00.000 \"9:00am\" 18 24\n    10:20:00.000 \"10:20\"  45 50\n    \n\n**N.B.** [instructions regarding headless/presets are available](https://github.com/KxSystems/embedPy/docker/README.md#headlesspresets)\n\n**N.B.** [build instructions for the image are available](docker/README.md)\n\n\n\n## Documentation\n\nDocumentation is available on the [nlp](https://code.kx.com/v2/ml/nlp/) homepage.\n\n \n\n## Status\n  \nThe nlp library is still in development and is available here as a beta release.  \nIf you have any issues, questions or suggestions, please write to ai@kx.com.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKxSystems%2Fnlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKxSystems%2Fnlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKxSystems%2Fnlp/lists"}