{"id":21995687,"url":"https://github.com/tohtsky/myfm","last_synced_at":"2025-07-23T11:39:10.115Z","repository":{"id":39652294,"uuid":"229400350","full_name":"tohtsky/myFM","owner":"tohtsky","description":"A Python/C++ implementation of Bayesian Factorization Machines","archived":false,"fork":false,"pushed_at":"2025-07-11T12:43:20.000Z","size":1509,"stargazers_count":55,"open_issues_count":2,"forks_count":14,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-07-11T12:43:24.444Z","etag":null,"topics":["bayesian-inference","collaborative-filtering","factorization-machine","factorization-machines","gibbs-sampler","gibbs-sampling-algorithm","ordinal-regression","regression-models"],"latest_commit_sha":null,"homepage":"https://myfm.readthedocs.io","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tohtsky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-21T08:40:28.000Z","updated_at":"2025-02-11T03:28:11.000Z","dependencies_parsed_at":"2025-01-10T13:34:06.076Z","dependency_job_id":"b51fd371-d564-4c3b-a536-21aff7515e94","html_url":"https://github.com/tohtsky/myFM","commit_stats":{"total_commits":144,"total_committers":2,"mean_commits":72.0,"dds":0.01388888888888884,"last_synced_commit":"b9ba70ea38d9370d3ad50a9d25b2ff825eaa30ef"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/tohtsky/myFM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tohtsky%2FmyFM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
tohtsky%2FmyFM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tohtsky%2FmyFM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tohtsky%2FmyFM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tohtsky","download_url":"https://codeload.github.com/tohtsky/myFM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tohtsky%2FmyFM/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266671047,"owners_count":23966096,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","collaborative-filtering","factorization-machine","factorization-machines","gibbs-sampler","gibbs-sampling-algorithm","ordinal-regression","regression-models"],"created_at":"2024-11-29T21:18:09.574Z","updated_at":"2025-07-23T11:39:10.103Z","avatar_url":"https://github.com/tohtsky.png","language":"C++","readme":"# myFM\n[![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org)\n[![pypi](https://img.shields.io/pypi/v/myfm.svg)](https://pypi.python.org/pypi/myfm)\n[![GitHub 
license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/tohtsky/myFM)\n[![Build](https://github.com/tohtsky/myFM/workflows/Build%20wheel/badge.svg?branch=main)](https://github.com/tohtsky/myfm)\n[![Read the Docs](https://readthedocs.org/projects/myfm/badge/?version=stable)](https://myfm.readthedocs.io/en/stable/)\n[![codecov](https://codecov.io/gh/tohtsky/myfm/branch/main/graph/badge.svg?token=kLgOKTQqcV)](https://codecov.io/gh/tohtsky/myfm)\n\n\nmyFM is an implementation of Bayesian [Factorization Machines](https://ieeexplore.ieee.org/abstract/document/5694074/) based on Gibbs sampling, which I believe is a wheel worth reinventing.\n\nCurrently this supports most options of the libFM MCMC engine, such as:\n\n- Grouping of input variables (the `-meta` option of [libFM](https://github.com/srendle/libfm))\n- The relational data format (see the paper [\"Scaling Factorization Machines to relational data\"](https://dl.acm.org/citation.cfm?id=2488340))\n\nThere are also functionalities not present in libFM:\n\n- A Gibbs sampler for ordered probit regression [5] implementing the Metropolis-within-Gibbs scheme of [6].\n- Variational inference for regression and binary classification.\n\nA tutorial and reference documentation are provided at https://myfm.readthedocs.io/en/latest/.\n\n# Installation\n\nThe package is pip-installable.\n\n```\npip install myfm\n```\n\nThere are binaries for major operating systems.\n\nIf you are working on a less common OS/architecture, pip will attempt to build myFM from source (you need a decent C++ compiler!). 
In that case, in addition to installing the Python dependencies (`numpy`, `scipy`, `pandas`, ...), the above command will automatically download Eigen (version 3.4.0) into its build directory and use it during the build.\n\n# Examples\n\n## A toy example\n\nThis example is taken from [pyfm](https://github.com/coreylynch/pyFM) with some modifications.\n\n```python\nimport myfm\nfrom sklearn.feature_extraction import DictVectorizer\nimport numpy as np\ntrain = [\n\t{\"user\": \"1\", \"item\": \"5\", \"age\": 19},\n\t{\"user\": \"2\", \"item\": \"43\", \"age\": 33},\n\t{\"user\": \"3\", \"item\": \"20\", \"age\": 55},\n\t{\"user\": \"4\", \"item\": \"10\", \"age\": 20},\n]\nv = DictVectorizer()\nX = v.fit_transform(train)\nprint(X.toarray())\n# prints:\n# [[ 19.   0.   0.   0.   1.   1.   0.   0.   0.]\n#  [ 33.   0.   0.   1.   0.   0.   1.   0.   0.]\n#  [ 55.   0.   1.   0.   0.   0.   0.   1.   0.]\n#  [ 20.   1.   0.   0.   0.   0.   0.   0.   1.]]\ny = np.asarray([0, 1, 1, 0])\nfm = myfm.MyFMClassifier(rank=4)\nfm.fit(X, y)\nfm.predict(v.transform({\"user\": \"1\", \"item\": \"10\", \"age\": 24}))\n```\n\n## A MovieLens-100k example\n\nThis example requires `pandas` and `scikit-learn`. `movielens100k_loader` is present in `examples/movielens100k_loader.py`.\n\nYou will be able to obtain a result comparable to SOTA algorithms such as GC-MC. 
See `examples/ml-100k.ipynb` for the detailed version.\n\n```python\nimport numpy as np\nfrom sklearn.preprocessing import OneHotEncoder\n\nimport myfm\nfrom myfm.utils.benchmark_data import MovieLens100kDataManager\n\ndata_manager = MovieLens100kDataManager()\ndf_train, df_test = data_manager.load_rating_predefined_split(\n    fold=3\n)  # Note the dependence on the fold\n\ndef test_myfm(df_train, df_test, rank=8, grouping=None, n_iter=100, samples=95):\n    explanation_columns = [\"user_id\", \"movie_id\"]\n    ohe = OneHotEncoder(handle_unknown=\"ignore\")\n    X_train = ohe.fit_transform(df_train[explanation_columns])\n    X_test = ohe.transform(df_test[explanation_columns])\n    y_train = df_train.rating.values\n    y_test = df_test.rating.values\n    fm = myfm.MyFMRegressor(rank=rank, random_seed=114514)\n\n    if grouping:\n        # specify how the columns of X_train are grouped\n        group_shapes = [len(category) for category in ohe.categories_]\n        assert sum(group_shapes) == X_train.shape[1]\n    else:\n        group_shapes = None\n\n    fm.fit(\n        X_train,\n        y_train,\n        group_shapes=group_shapes,\n        n_iter=n_iter,\n        n_kept_samples=samples,\n    )\n    prediction = fm.predict(X_test)\n    rmse = ((y_test - prediction) ** 2).mean() ** 0.5\n    mae = np.abs(y_test - prediction).mean()\n    print(\"rmse={rmse}, mae={mae}\".format(rmse=rmse, mae=mae))\n    return fm\n\n\n# basic regression\ntest_myfm(df_train, df_test, rank=8)\n# rmse=0.90321, mae=0.71164\n\n# with grouping\nfm = test_myfm(df_train, df_test, rank=8, grouping=True)\n# rmse=0.89594, mae=0.70481\n```\n\n## Examples for the relational data format\n\nBelow is a toy MovieLens-like example that utilizes the relational data format proposed in [3].\n\nThis example, however, is too simplistic to exhibit the computational advantage of this data format. 
For an example with drastically reduced computational complexity, see `examples/ml-100k-extended.ipynb`.\n\n```python\nimport pandas as pd\nimport numpy as np\nfrom myfm import MyFMRegressor, RelationBlock\nfrom sklearn.preprocessing import OneHotEncoder\n\nusers = pd.DataFrame([\n    {'user_id': 1, 'age': '20s', 'married': False},\n    {'user_id': 2, 'age': '30s', 'married': False},\n    {'user_id': 3, 'age': '40s', 'married': True}\n]).set_index('user_id')\n\nmovies = pd.DataFrame([\n    {'movie_id': 1, 'comedy': True, 'action': False},\n    {'movie_id': 2, 'comedy': False, 'action': True},\n    {'movie_id': 3, 'comedy': True, 'action': True}\n]).set_index('movie_id')\n\nratings = pd.DataFrame([\n    {'user_id': 1, 'movie_id': 1, 'rating': 2},\n    {'user_id': 1, 'movie_id': 2, 'rating': 5},\n    {'user_id': 2, 'movie_id': 2, 'rating': 4},\n    {'user_id': 2, 'movie_id': 3, 'rating': 3},\n    {'user_id': 3, 'movie_id': 3, 'rating': 3},\n])\n\nuser_ids, user_indices = np.unique(ratings.user_id, return_inverse=True)\nmovie_ids, movie_indices = np.unique(ratings.movie_id, return_inverse=True)\n\nuser_ohe = OneHotEncoder(handle_unknown='ignore').fit(users.reset_index())  # include user_id as a feature\nmovie_ohe = OneHotEncoder(handle_unknown='ignore').fit(movies.reset_index())\n\nX_user = user_ohe.transform(\n    users.reindex(user_ids).reset_index()\n)\nX_movie = movie_ohe.transform(\n    movies.reindex(movie_ids).reset_index()\n)\n\nblock_user = RelationBlock(user_indices, X_user)\nblock_movie = RelationBlock(movie_indices, X_movie)\n\nfm = MyFMRegressor(rank=2).fit(None, ratings.rating.values, X_rel=[block_user, block_movie])\n\nprediction_df = pd.DataFrame([\n    dict(user_id=user_id, movie_id=movie_id,\n         user_index=user_index, movie_index=movie_index)\n    for user_index, user_id in enumerate(user_ids)\n    for movie_index, movie_id in enumerate(movie_ids)\n])\npredicted_rating = fm.predict(None, [\n    RelationBlock(prediction_df.user_index, X_user),\n    RelationBlock(prediction_df.movie_index, X_movie)\n])\n\nprediction_df['prediction'] = predicted_rating\n\nprint(\n    prediction_df.merge(ratings.rename(columns={'rating': 'ground_truth'}), how='left')\n)\n```\n\n# References\n\n1. Rendle, Steffen. \"Factorization machines.\" 2010 IEEE International Conference on Data Mining. IEEE, 2010.\n1. Rendle, Steffen. \"Factorization machines with libFM.\" ACM Transactions on Intelligent Systems and Technology (TIST) 3.3 (2012): 57.\n1. Rendle, Steffen. \"Scaling factorization machines to relational data.\" Proceedings of the VLDB Endowment. Vol. 6. No. 5. VLDB Endowment, 2013.\n1. Bayer, Immanuel. \"fastFM: A library for factorization machines.\" arXiv preprint arXiv:1505.00641 (2015).\n1. Albert, James H., and Siddhartha Chib. \"Bayesian analysis of binary and polychotomous response data.\" Journal of the American Statistical Association 88.422 (1993): 669-679.\n1. Albert, James H., and Siddhartha Chib. \"Sequential ordinal modeling with applications to survival data.\" Biometrics 57.3 (2001): 829-836.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftohtsky%2Fmyfm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftohtsky%2Fmyfm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftohtsky%2Fmyfm/lists"}