[![PyPI version shields.io](https://img.shields.io/pypi/v/skifts.svg)](https://pypi.python.org/pypi/skifts/)
[![Generic badge](https://img.shields.io/badge/Python-3.6+-blue.svg)](#)
[![Generic badge](https://img.shields.io/badge/OS-Linux%20|%20macOS%20|%20Windows-blue.svg)](#)

# [skifts](https://github.com/rtmigo/skifts_py#readme)

Search for the most relevant documents containing words from the query.

```python3
from skifts import SkiFts

query = ['A', 'B']

documents = [
    ['N', 'A', 'M'],  # matching features: 'A'
    ['C', 'B', 'A'],  # matching features: 'A', 'B'
    ['X', 'Y']  # no matching features
]

for doc_index in SkiFts(documents).search(query):
    print(documents[doc_index])
```

The search will return `['C', 'B', 'A']` and `['N', 'A', 'M']`, in that
order. `['X', 'Y']` shares no words with the query and is not returned.

It doesn't have to be text. Words are simply `str` instances, and documents
are unordered collections of these strings. Documents are ranked by word
frequency, word rarity and how closely they match the query.

## Install

```bash
pip3 install skifts
```

<details>
  <summary>Other options</summary>

### From GitHub (staging branch)
```bash
pip3 install git+https://github.com/rtmigo/skifts_py#egg=skifts
```
</details>

## Use for full-text search

Finding documents that contain words from the query.

```python3
from skifts import SkiFts

# three documents, one per row
documents = [
    ["wait", "mister", "postman"],
    ["please", "mister", "postman", "look", "and", "see"],
    ["oh", "yes", "wait", "a", "minute", "mister", "postman"]
]

fts = SkiFts(documents)

# find and print the most relevant documents, best match first
for doc_index in fts.search(['postman', 'wait']):
    print(documents[doc_index])
```

Words inside the `documents` list are treated as ready-made feature identifiers.
If your text needs preprocessing such as tokenization or stemming, do it before
passing the documents to `SkiFts`.

The ranking takes into account how often each word occurs in a document and how
rare the word is across the whole corpus. Word order and the distance between
words do not matter.

## Implementation details

The search relies on the [scikit-learn](https://scikit-learn.org) library:
documents and the query are weighted with
[tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) and ranked by
[cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).

## See also

The [gifts](https://github.com/rtmigo/gifts_py#readme) package implements the
same search, but in pure Python with no binary dependencies.
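The tf-idf and cosine-similarity ranking mentioned under *Implementation details* can be illustrated without scikit-learn. What follows is a minimal pure-Python sketch, not the code skifts actually runs. It assumes weights modeled on scikit-learn's default `TfidfVectorizer` settings (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, L2 normalization), so exact scores may differ from the library's. The function names `idf_weights`, `tfidf`, `cosine` and `search` are hypothetical. Documents with zero similarity are dropped, which matches the intro example where `['X', 'Y']` is not returned.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical helper names; a sketch modeled on scikit-learn defaults,
# not skifts' own implementation.
Vector = Dict[str, float]


def idf_weights(documents: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed idf, ln((1 + n) / (1 + df)) + 1: rarer words weigh more."""
    n = len(documents)
    # df counts each word at most once per document
    df = Counter(word for doc in documents for word in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def tfidf(doc: Sequence[str], idf: Dict[str, float]) -> Vector:
    """Raw term counts times idf, L2-normalized; words unknown to the corpus are dropped."""
    counts = Counter(w for w in doc if w in idf)
    vec = {w: c * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    """For L2-normalized sparse vectors, the dot product is the cosine similarity."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def search(query: Sequence[str], documents: Sequence[Sequence[str]]) -> List[int]:
    """Indices of documents sharing at least one word with the query, best first."""
    idf = idf_weights(documents)
    q = tfidf(query, idf)
    scores = [(cosine(q, tfidf(doc, idf)), i) for i, doc in enumerate(documents)]
    matches = [(s, i) for s, i in scores if s > 0]
    # highest score first; ties keep the original document order
    matches.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in matches]


query = ['A', 'B']
documents = [['N', 'A', 'M'], ['C', 'B', 'A'], ['X', 'Y']]
print(search(query, documents))  # [1, 0]
```

On the intro example this sketch ranks `['C', 'B', 'A']` above `['N', 'A', 'M']`: the former contains both query words, and `'B'` appears in fewer documents than `'A'`, so it carries a higher idf weight.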