{"id":15175565,"url":"https://github.com/sfuller14/semantic-consensus","last_synced_at":"2025-06-13T13:09:36.197Z","repository":{"id":175993589,"uuid":"654813249","full_name":"sfuller14/semantic-consensus","owner":"sfuller14","description":"E-commerce intelligent search platform. Pinecone/Devpost Hackathon 2023.","archived":false,"fork":false,"pushed_at":"2023-09-01T01:05:39.000Z","size":26113,"stargazers_count":16,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-07T19:39:19.059Z","etag":null,"topics":["cohere","openai-api","pinecone","recommendation-system","semantic-search-engine","streamlit"],"latest_commit_sha":null,"homepage":"http://ecommerce-recsys.us-east-2.elasticbeanstalk.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sfuller14.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-17T03:11:41.000Z","updated_at":"2024-11-05T21:14:27.000Z","dependencies_parsed_at":"2024-12-03T17:57:29.723Z","dependency_job_id":"5e26d1a9-fcca-48cc-8e5d-a811359deb25","html_url":"https://github.com/sfuller14/semantic-consensus","commit_stats":{"total_commits":118,"total_committers":3,"mean_commits":"39.333333333333336","dds":0.5084745762711864,"last_synced_commit":"23eca5172e4e3dd517480714204337c07e6cd645"},"previous_names":["sfuller14/semantic-consensus"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sfuller14/semantic-consensus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfuller14%2Fsemantic-co
nsensus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfuller14%2Fsemantic-consensus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfuller14%2Fsemantic-consensus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfuller14%2Fsemantic-consensus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sfuller14","download_url":"https://codeload.github.com/sfuller14/semantic-consensus/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sfuller14%2Fsemantic-consensus/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259650956,"owners_count":22890385,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cohere","openai-api","pinecone","recommendation-system","semantic-search-engine","streamlit"],"created_at":"2024-09-27T12:39:29.301Z","updated_at":"2025-06-13T13:09:36.176Z","avatar_url":"https://github.com/sfuller14.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Commercial Consensus\n\nPinecone/Devpost Hackathon June 2023  \n- Try it out: [Commercial Consensus](http://ecommerce-recsys.us-east-2.elasticbeanstalk.com)  (hosted on AWS)\n- [Execution flow diagrams](#execution-flow)\n\n## Demo\n\n![demogif](https://github.com/sfuller14/public_ref/blob/master/recsys.gif)\n\n## The Problem\n\nTraditional implementations of collaborative filtering, content-based filtering, and graph-based recommendation methods rely heavily on structured, tabular 
data. However, this approach is fraught with limitations due to the widespread missing and inconsistent data inherent to third-party seller platforms:  \n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eExample of inconsistent data availability for two products in the same category:\u003c/p\u003e\n\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"https://github.com/sfuller14/semantic-consensus/assets/54780092/43f1c875-05bd-419f-9bbe-1e005dbad521\" width=\"450\" /\u003e\n  \u003cimg src=\"https://github.com/sfuller14/semantic-consensus/assets/54780092/1d0f548d-4f87-409b-bae3-408c51bcc7a1\" width=\"450\" /\u003e \n\u003c/p\u003e\n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eMissing data across our full dataset:\u003c/p\u003e\n\n![image](https://github.com/sfuller14/semantic-consensus/assets/54780092/fd218f4d-5b0a-4acd-93b0-9893f8c6530f)\n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eEven when data is available, it is often heterogeneous:\u003c/p\u003e\n\n![Screenshot 2023-06-26 at 9 00 26 AM](https://github.com/sfuller14/semantic-consensus/assets/54780092/863814ae-15af-4595-8ccb-220c89d08d65)\n\n---\n\nThis data quality issue hampers the effectiveness of recommendation systems, thereby reducing platform revenue generation as well as impeding optimal user experience.\n\n## The Solution\n\nCommercial Consensus approaches this problem by harnessing the latent information within customer reviews. By performing vector similarity search on an embedding space reduced by traditional tabular filters, the system presents a basic approach to mitigating the longstanding problem of data quality in e-commerce platforms. Utilizing Pinecone's vector search engine over indexed OpenAI embeddings in coordination with Cohere's reranking endpoint, the platform performs a hybrid (tabular + semantic) search and a conversational interface to tap into the previously inaccessible body of knowledge available in customer reviews. 
\n\n## Features\n\n### [Enhanced](#technical-appendix) Search\n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003ePersonalized search results using metadata \u0026 namespace filters + co.rerank()\u003c/p\u003e\n\n![Search Example](https://github.com/sfuller14/semantic-consensus/assets/54780092/27f4c830-c869-4f6b-a859-77fb87b68f6e)\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eHover over the '?' icon to see the most similar review to your query.\u003c/p\u003e\n\n---\n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003e'View' page contains detailed product specs and relevant reviews\u003c/p\u003e\n\n![2](https://github.com/sfuller14/semantic-consensus/assets/54780092/2b1af8b9-d7a4-47f7-972f-638fd9ae792a)\n\n---\n### Intelligent Chat Interface\n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eAccess aspect-based sentiments from reviews\u003c/p\u003e\n\n![Chat Example](https://github.com/sfuller14/semantic-consensus/assets/54780092/ddf82542-d5cf-4d92-ab88-25e50a8831ff)\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eCustom pinecone.query() + cohere.rerank() + openai.ChatCompletions() chain.\u003c/p\u003e\n\n---\n\n\u003cp align=\"center\" style=\"font-size:10px;\"\u003eGeneration using both aggregated reviews and product specs.\u003c/p\u003e\n\n![Screenshot 2023-06-25 at 8 53 58 PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/bf75e849-8d9f-4981-bfcf-5c17616869bf)\n\n\n## Appendix\n\n### Execution Flow\n\n1) User enters a query and presses 'Search':\n\n![Screenshot 2023-06-28 at 9 36 34 PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/c118d859-6adc-4bbe-8ff8-bc0745ba356b)\n\n2) User clicks 'View' on a product:\n\n![Screenshot 2023-06-28 at 9 37 57 PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/1f7f62a4-c590-48ab-a707-666200672a06)\n     \n3) User enters a question in the 'Chat' tab:\n\n![Screenshot 2023-06-28 at 9 39 22 
PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/70f010d8-26f8-462c-b937-8857d3b3409f)\n\n\n#### Product Title Example\n\n![Screenshot 2023-06-28 at 8 16 23 PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/a76b56a2-097e-4402-bfa7-7639baef65dd)\n\n\nThis is a product of e-commerce sellers optimizing their product titles to facilitate lexical search in the presence of variably-populated data fields. We're able to exploit this practice by including this title in the LLM prompt.\n\n### Re-ranking\n\nAs demonstrated in the diagrams above, the output of each cosine similarity search on the stored ```text-embedding-ada-002```-embedded dataset (i.e., each call to pinecone.query()) is followed by a re-rank. \n\nRe-ranking is a widely-used step in modern search engines. It is generally run on the results of a lighter-weight lexical search (like TF-IDF or BM25) to refine the results. Re-ranking using BERT variants has achieved state-of-the-art search results in recent years:\n- [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/pdf/2004.12832.pdf)\n\n- [Passage Re-ranking with BERT](https://arxiv.org/pdf/1901.04085.pdf)\n\nCohere recently introduced their [rerank endpoint](https://txt.cohere.com/rerank/): \n\u003cp float=\"center\"\u003e\n  \u003cimg src=\"https://github.com/sfuller14/semantic-consensus/assets/54780092/04641b8a-5745-4fe5-bd04-d18a8db7f353\" width=\"350\" /\u003e\n\u003c/p\u003e\n\n---\n\nWhile ```pinecone.query()``` without re-ranking was often sufficient for simple and well-formed queries, certain query formations (like specific negation expressions) led to undesirable results. Adding re-ranking also generally appeared to improve matching on longer reviews; however, in some cases this is not necessarily desirable (i.e., re-ranking led to longer reviews being prioritized while a more succinct match would be preferred for display on the home page). 
__In other cases (specifically during RAG chaining), the longer reviews led to significantly better output.__ More testing is needed here.\n\n__A few examples of using ```pinecone.query()``` alone vs. ```pinecone.query()```+```cohere.rerank()```:__\n\n![Screenshot 2023-06-26 at 9 37 22 PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/3f564654-ff9e-4d95-ae0a-1c187f4d6658)\n\nIn the above, notice that both reviews mentioning BSOD in the re-ranked results go on to say that they resolved it. \n\n![Screenshot 2023-06-26 at 11 08 17 PM](https://github.com/sfuller14/semantic-consensus/assets/54780092/4e209d2a-1749-4312-bd98-f00e757522c0)\n\n---\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsfuller14%2Fsemantic-consensus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsfuller14%2Fsemantic-consensus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsfuller14%2Fsemantic-consensus/lists"}