{"id":20170560,"url":"https://github.com/chenglongchen/tensorflow-xnn","last_synced_at":"2025-04-03T03:09:28.005Z","repository":{"id":78924753,"uuid":"122476882","full_name":"ChenglongChen/tensorflow-XNN","owner":"ChenglongChen","description":"4th Place Solution for Mercari Price Suggestion Competition on Kaggle using DeepFM variant.","archived":false,"fork":false,"pushed_at":"2018-07-24T14:05:05.000Z","size":1118,"stargazers_count":283,"open_issues_count":0,"forks_count":76,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-24T08:34:47.757Z","etag":null,"topics":["ctr","ctr-prediction","deep-ctr","deepfm","factorization-machines","fm","kaggle-competition","snapshot-ensemble"],"latest_commit_sha":null,"homepage":"https://www.kaggle.com/c/mercari-price-suggestion-challenge","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChenglongChen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-22T12:41:33.000Z","updated_at":"2025-03-15T06:20:41.000Z","dependencies_parsed_at":"2023-08-20T16:47:33.198Z","dependency_job_id":null,"html_url":"https://github.com/ChenglongChen/tensorflow-XNN","commit_stats":{"total_commits":10,"total_committers":1,"mean_commits":10.0,"dds":0.0,"last_synced_commit":"6534a832f5b4461cbdf1ebbdf5620a0cd80f0aa9"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenglongChen%2Ftensorflow-XNN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Chengl
ongChen%2Ftensorflow-XNN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenglongChen%2Ftensorflow-XNN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenglongChen%2Ftensorflow-XNN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChenglongChen","download_url":"https://codeload.github.com/ChenglongChen/tensorflow-XNN/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246927835,"owners_count":20856198,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ctr","ctr-prediction","deep-ctr","deepfm","factorization-machines","fm","kaggle-competition","snapshot-ensemble"],"created_at":"2024-11-14T01:19:38.868Z","updated_at":"2025-04-03T03:09:27.986Z","avatar_url":"https://github.com/ChenglongChen.png","language":"Python","readme":"# tensorflow-XNN\n\n4th Place Solution for [Mercari Price Suggestion Challenge on Kaggle](https://www.kaggle.com/c/mercari-price-suggestion-challenge)\n\n## The Challenge\nBuild a model to suggest the price of a product on Mercari. The model must complete training (including all the preprocessing, feature extraction and model training steps) and inference within 1 hour, using only a 4-core CPU, 16GB RAM, and 1GB disk. 
The data include unstructured text (product title \u0026 description) and structured fields, e.g., product category and shipping flag.\n\n## Summary\nHighlights of our method are as follows:\n\n* minimal preprocessing, with a focus on end-to-end learning with multi-field inputs, e.g., textual and categorical;\n* a hybrid NN consisting of four major components, i.e., embed, encode, attend and predict, with FastText and NN-based FM used as building blocks;\n* pure bagging of NNs of the same architecture via snapshot ensemble;\n* efficiency achieved via various approaches, e.g., lazynadam optimization, fasttext encoding and average pooling, and snapshot ensemble.\n\n### Model Architecture\n![fig/architecture.png](fig/architecture.png)\n\nThe slides of our solution are available [here](./doc/Mercari_Price_Suggesion_Competition_ChenglongChen_4th_Place.pdf).\n\n## About this project\nThis is the 4th text mining competition I have attended on Kaggle. The other three are:\n\n* [CrowdFlower Search Results Relevance Competition](https://www.kaggle.com/c/crowdflower-search-relevance), 1st Place\n* [Home Depot Product Search Relevance Competition](https://www.kaggle.com/c/home-depot-product-search-relevance), 3rd Place\n* [The Hunt for Prohibited Content Competition](http://www.kaggle.com/c/avito-prohibited-content), 4th Place\n\nIn these previous competitions, I used general ML-based methods, i.e., data cleaning, feature engineering (see the solutions of [CrowdFlower](https://github.com/ChenglongChen/Kaggle_CrowdFlower) and [HomeDepot](https://github.com/ChenglongChen/Kaggle_HomeDepot) for how many features were engineered), VW/XGBoost training, and massive ensembling. \n\nSince I have been working on CTR \u0026 KBQA based on deep learning and embedding models for some time, I decided to give this competition a shot. 
With the data of this competition, I have experimented with various ideas, such as NN-based FM and snapshot ensemble.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenglongchen%2Ftensorflow-xnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchenglongchen%2Ftensorflow-xnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchenglongchen%2Ftensorflow-xnn/lists"}