{"id":23437295,"url":"https://github.com/pvnieo/searchy","last_synced_at":"2025-04-09T18:41:37.963Z","repository":{"id":54604093,"uuid":"114547580","full_name":"pvnieo/searchy","owner":"pvnieo","description":"Implementation of a search engine on the cacm and CS276 (Stanford) collections.","archived":false,"fork":false,"pushed_at":"2019-08-17T12:22:47.000Z","size":38077,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-15T11:23:28.168Z","etag":null,"topics":["boolean-search","cacm","python-3","search-engine","stanford-corpus","vector-space-model"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pvnieo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-17T15:31:04.000Z","updated_at":"2022-08-08T11:31:09.000Z","dependencies_parsed_at":"2022-08-13T21:10:26.550Z","dependency_job_id":null,"html_url":"https://github.com/pvnieo/searchy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pvnieo%2Fsearchy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pvnieo%2Fsearchy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pvnieo%2Fsearchy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pvnieo%2Fsearchy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pvnieo","download_url":"https://codeload.github.com/pvnieo/searchy/tar.gz/refs/heads/master","host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248089934,"owners_count":21045994,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["boolean-search","cacm","python-3","search-engine","stanford-corpus","vector-space-model"],"created_at":"2024-12-23T13:44:43.833Z","updated_at":"2025-04-09T18:41:37.940Z","avatar_url":"https://github.com/pvnieo.png","language":"Jupyter Notebook","readme":"# Search engine\n\n[![Build Status](https://travis-ci.org/pvnieo/searchy.svg?branch=master)](https://travis-ci.org/pvnieo/searchy)\n\nImplementation of a search engine for a collection of files.\n\n## Installation\n\nSearchy runs on Python \u003e= 3.6. Use pip to install the dependencies:\n```\npip3 install -r requirements.txt\n```\n\nInstall the dependencies required by nltk with the following command:\n```\npython3 -c \"import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet');\"\n```\n\n## Usage\n\nUse the `searchy.py` script to index a collection:\n```\nusage: searchy.py [-h] [-q QUERY] [-m {bool,vect}]\n                  [-n {cos,dice,jaccard,overlap}] [-t THRESHOLD]\n                  [-w {f,tfidf,nf}] [-s] [-f] [--no-cache]\n                  collection\n\nBuilds a search engine on a collection of documents\n\npositional arguments:\n  collection            Path to collection file (CACM format), directory or\n                        url to zip\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -q QUERY, --query QUERY\n                        Execute a 
search query\n  -m {bool,vect}, --model {bool,vect}\n                        Search engine model\n  -n {cos,dice,jaccard,overlap}, --norm {cos,dice,jaccard,overlap}\n                        Vectorial search norm\n  -t THRESHOLD, --threshold THRESHOLD\n                        Vectorial search norm threshold\n  -w {f,tfidf,nf}, --weighting {f,tfidf,nf}\n                        Vectorial weighting method\n  -s, --silent          Disable verbose mode\n  -f, --force           Force re-indexing overwrite cache\n  --no-cache            Disable disk cache\n```\n\n## Usage example\n\n### Vector model\n\nQueries are sentences. Here we search the CACM collection.\n\n```\n$ ./searchy.py data/CACM/cacm.all\n```\n```\nLoading data/CACM/cacm.all\nUsing cache 64f76a63\n  documents \t 3204\n  tokens \t 113754\n  terms \t 5961\nmemory: 0.42 mb\n🔍  \u003e Processes and Proofs of Theorems and Programs\n -----\n 3079. An Algorithm for Reasoning About Equality [93.99%]\n -----\n.T\nAn Algorithm for Reasoning About Equality\n.W\nA simple technique for reasoning about equalities\nthat is fast and complete for ground formulas\n...\n -----\n 3140. Social Processes and Proofs of Theorems and Programs [93.87%]\n -----\n.T\nSocial Processes and Proofs of Theorems and Programs\n.W\nIt is argued that formal verifications of\nprograms, no matter how obtained, will not play the\nsame key role in the development of computer science and software\nengineering as proofs do in mathematics.  
Furthermore the absence\n...\n\ntotal results: 260     2.94 s\n```\n\nTo load the Stanford collection quickly, download it and extract it into the `dumps/pa1-data/pa1-data` folder\nso that the structure looks like this:\n```\ndumps/pa1-data/pa1-data/0\ndumps/pa1-data/pa1-data/1\n...\ndumps/pa1-data/pa1-data/9\n```\nThen load it with searchy:\n```\n$ ./searchy.py dumps/pa1-data\n```\n\nAlternatively, pass the URL directly as the argument, which performs the previous steps automatically.\n```\n$ ./searchy.py http://web.stanford.edu/class/cs276/pa/pa1-data.zip\n```\n\n### Boolean model\n\nQueries must use the following boolean format: `(word1 \u0026 word2) | ~word3`.\nThe allowed boolean operators are: `\u0026` (and), `|` (or), `~` (negation).\n\n```\n$ ./searchy.py -m bool data/CACM/cacm.all\n```\n```\nLoading data/CACM/cacm.all\nUsing cache 64f76a63\n  documents \t 3204\n  tokens \t 113754\n  terms \t 5961\nmemory: 0.42 mb\n🔍  \u003e processes \u0026 Proofs \u0026 theorems \u0026 programs\n -----\n 3140. Social Processes and Proofs of Theorems and Programs [100.00%]\n -----\n.T\nSocial Processes and Proofs of Theorems and Programs\n.W\nIt is argued that formal verifications of\nprograms, no matter how obtained, will not play the\nsame key role in the development of computer science and software\nengineering as proofs do in mathematics.  Furthermore the absence\nof continuity, the inevitability of change, and the complexity of\nspecification of significantly many real programs make the form\nal verification process difficult to justify and manage.  
It is felt\nthat ease of formal verification should not dominate program\nlanguage design.\n.K\nFormal mathematics, mathematical proofs,\nprogram verification, program specification\n2.10 4.6 5.24\n\ntotal results: 1     2.96 s\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpvnieo%2Fsearchy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpvnieo%2Fsearchy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpvnieo%2Fsearchy/lists"}