{"id":22166124,"url":"https://github.com/alextanhongpin/spam-api","last_synced_at":"2026-04-21T09:32:55.914Z","repository":{"id":79115258,"uuid":"120277403","full_name":"alextanhongpin/spam-api","owner":"alextanhongpin","description":"Microservices for spam filtering system","archived":false,"fork":false,"pushed_at":"2018-03-23T09:59:42.000Z","size":29342,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-07T09:54:29.768Z","etag":null,"topics":["python","scikit-learn"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alextanhongpin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-05T08:40:31.000Z","updated_at":"2020-04-19T21:44:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"f706291c-997f-4ade-a5ab-9abd32599d1f","html_url":"https://github.com/alextanhongpin/spam-api","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/alextanhongpin/spam-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fspam-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fspam-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fspam-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fspam-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/alextanhongpin","download_url":"https://codeload.github.com/alextanhongpin/spam-api/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alextanhongpin%2Fspam-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32085500,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-21T06:27:27.065Z","status":"ssl_error","status_checked_at":"2026-04-21T06:27:21.250Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","scikit-learn"],"created_at":"2024-12-02T05:18:07.265Z","updated_at":"2026-04-21T09:32:55.888Z","avatar_url":"https://github.com/alextanhongpin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spam API\n\nMicroservices API for a spam filtering system.\n\n## Abstract\n\nOne of the goals of this repository is to design an approach for building machine learning systems.\n\n## To run\n\n```bash\n$ python -m main\n\n# Note: running the file directly will not work, because the package imports fail to resolve\n$ python main.py\n```\n\n## Flows\n\n- Prepare text data\n  - removal of stop words\n  - lemmatization\n- Feature extraction process\n- Training the classifiers\n- Checking performance\n\n\n## Pickled\n\nTo view the size of the pickled file:\n\n```bash\n$ du -h *.pkl\n```\n\n## Tips\n\nAt 
first, it may be tempting to build the feature extractor into your pipeline:\n\n```python\nfrom sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer\nfrom sklearn.naive_bayes import GaussianNB\nfrom sklearn.pipeline import Pipeline\npipeline = Pipeline([('vect', CountVectorizer(stop_words='english')),\n                      ('tfidf', TfidfTransformer()),\n                      ('gaussian_nb', GaussianNB())])  # GaussianNB requires dense input; the sparse TF-IDF output must be densified first\n```\n\nNote, however, that this is only convenient while training a single model. For prediction, you need to reuse the same fitted feature extractor. Also, when training multiple classifiers, every pipeline repeats the feature extraction process, which is not optimal.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falextanhongpin%2Fspam-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falextanhongpin%2Fspam-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falextanhongpin%2Fspam-api/lists"}