{"id":19066478,"url":"https://github.com/ssandra102/machine-learning-api","last_synced_at":"2026-04-18T01:32:43.329Z","repository":{"id":177582575,"uuid":"660602366","full_name":"ssandra102/Machine-Learning-API","owner":"ssandra102","description":"ML API to perform OCR text extraction on receipt images and push the extracted then classified data to firebase.","archived":false,"fork":false,"pushed_at":"2023-06-30T11:57:30.000Z","size":809,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-22T03:17:05.050Z","etag":null,"topics":["flask","pytesseract-ocr","python","svc-model"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssandra102.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-30T11:35:02.000Z","updated_at":"2023-09-13T15:28:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"738d7daf-4490-4a12-8fc9-98ffb3b7498b","html_url":"https://github.com/ssandra102/Machine-Learning-API","commit_stats":null,"previous_names":["ssandra102/machine-learning-api"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ssandra102/Machine-Learning-API","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssandra102%2FMachine-Learning-API","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssandra102%2FMachine-Learning-API/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssandra102%2FMachine-Learning-API/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssandra102%2FMachine-Learning-API/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssandra102","download_url":"https://codeload.github.com/ssandra102/Machine-Learning-API/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssandra102%2FMachine-Learning-API/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31953515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"ssl_error","status_checked_at":"2026-04-18T00:39:20.671Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask","pytesseract-ocr","python","svc-model"],"created_at":"2024-11-09T00:56:55.335Z","updated_at":"2026-04-18T01:32:43.306Z","avatar_url":"https://github.com/ssandra102.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Machine-Learning-API\nA Flask API, written in Python, that extracts text from Indian itemized receipts. The extracted items are classified into one of 21 categories (see the `data` dictionary in main.py).\u003cbr\u003e\nThe receipt image is fetched from Firebase Storage, and the cumulative sum of the categorised items, along with their respective categories, is pushed to Firebase Realtime Database. \u003cbr\u003e\n\n## Files\n1. main.py - the API is implemented in this file. Run it with:\n```\npython main.py\n## or\npython3 main.py\n```\n\u003cbr\u003eThe frontend is a placeholder webpage that displays the text \"Hello World\". \u003cbr\u003e\u003cbr\u003e\n2. Fetch_images.py - contains the Firebase configuration details, which you get by creating a new project in Firebase.\u003cbr\u003e\u003cbr\u003e\n3. serviceAccount.json - contains the configuration details for the Firebase Storage database. \u003cbr\u003e\n*Note*: replace the configuration details in Fetch_images.py and serviceAccount.json with your own.\u003cbr\u003e\u003cbr\u003e\n4. requirements.txt - lists the libraries used in the project.\u003cbr\u003e\u003cbr\u003e\n5. SVC_model.pkl - a pickle file used to categorise receipt items. It is an SVC model for multi-class text classification, with 3 pre-processing steps applied to the text. These are coded as a pipeline with the following steps: stopword removal, Porter stemming, and TF-IDF vectorisation.\u003cbr\u003e\u003cbr\u003e\n6. Categorization.ipynb - notebook with all the steps used to develop the SVM model, i.e. SVC_model.pkl.\u003cbr\u003e\u003cbr\u003e\n\n## Dataset\nDATA1.csv - 11179 rows with 3 columns: Indian product description, sub-category, and category.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssandra102%2Fmachine-learning-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssandra102%2Fmachine-learning-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssandra102%2Fmachine-learning-api/lists"}