{"id":13401300,"url":"https://github.com/dselivanov/text2vec","last_synced_at":"2025-05-16T07:05:05.029Z","repository":{"id":35939219,"uuid":"40227854","full_name":"dselivanov/text2vec","owner":"dselivanov","description":"Fast vectorization, topic modeling, distances and GloVe word embeddings in R.","archived":false,"fork":false,"pushed_at":"2024-08-16T02:54:30.000Z","size":48415,"stargazers_count":862,"open_issues_count":25,"forks_count":133,"subscribers_count":52,"default_branch":"master","last_synced_at":"2025-05-12T07:52:52.163Z","etag":null,"topics":["glove","latent-dirichlet-allocation","natural-language-processing","text-mining","topic-modeling","vectorization","word-embeddings","word2vec"],"latest_commit_sha":null,"homepage":"http://text2vec.org","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dselivanov.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-08-05T06:09:14.000Z","updated_at":"2025-05-11T05:39:15.000Z","dependencies_parsed_at":"2023-01-16T09:31:21.646Z","dependency_job_id":"e0612405-a10d-4527-ba26-6372331e8e3b","html_url":"https://github.com/dselivanov/text2vec","commit_stats":{"total_commits":582,"total_committers":20,"mean_commits":29.1,"dds":"0.38144329896907214","last_synced_commit":"e3b9865057ba8dae713badbf71ca5160b7a6efa6"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Ftext2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2F
text2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Ftext2vec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dselivanov%2Ftext2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dselivanov","download_url":"https://codeload.github.com/dselivanov/text2vec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254485056,"owners_count":22078767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["glove","latent-dirichlet-allocation","natural-language-processing","text-mining","topic-modeling","vectorization","word-embeddings","word2vec"],"created_at":"2024-07-30T19:01:01.137Z","updated_at":"2025-05-16T07:05:00.016Z","avatar_url":"https://github.com/dselivanov.png","language":"R","readme":"**text2vec** is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP). 
\n\nGoals we aimed to achieve in developing `text2vec`:\n\n* **Concise** - expose as few functions as possible\n* **Consistent** - expose unified interfaces, so there is no need to learn a new interface for each task\n* **Flexible** - make complex tasks easy to solve\n* **Fast** - maximize efficiency per single thread, transparently scale to multiple threads on multicore machines\n* **Memory efficient** - use streams and iterators, and avoid keeping data in RAM where possible\n\nSee the [API](http://text2vec.org/api.html) section for details.\n\n# Performance\n\n![htop](http://text2vec.org/images/htop.png)\n\nThis package is efficient because it is carefully written in C++, which also means that text2vec is memory friendly. Some parts are fully parallelized using OpenMP. \n\nOther embarrassingly parallel tasks (such as vectorization) can use any fork-based parallel backend on UNIX-like machines. They can achieve near-linear scalability with the number of available cores. \n\nFinally, a streaming API means that users do not have to load all the data into RAM. \n\n\n# Contributing\n\nThe package has an [issue tracker on GitHub](https://github.com/dselivanov/text2vec/issues) where I file feature requests and notes for future work. Any ideas are appreciated.\n\nContributors are welcome. You can help by: \n\n- testing and leaving feedback on the [GitHub issue tracker](https://github.com/dselivanov/text2vec/issues) (preferably) or directly by e-mail\n- forking and contributing (check our [code style guide](https://github.com/dselivanov/text2vec/wiki/Code-style-guide)). 
Vignettes, docs, tests, and use cases are very welcome\n- giving me a star on the [project page](https://github.com/dselivanov/text2vec) :-)\n\n# License\n\nGPL (\u003e= 2)\n\n","funding_links":[],"categories":["R","Natural Language Processing","Packages","函式庫"],"sub_categories":["Libraries","書籍"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdselivanov%2Ftext2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdselivanov%2Ftext2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdselivanov%2Ftext2vec/lists"}