{"id":13713099,"url":"https://github.com/wepe/dive-into-ml-system","last_synced_at":"2025-09-04T12:33:39.999Z","repository":{"id":87469933,"uuid":"136014663","full_name":"wepe/dive-into-ml-system","owner":"wepe","description":"Dive into machine learning system, start from reinventing the wheel.","archived":false,"fork":false,"pushed_at":"2018-06-18T02:55:42.000Z","size":1594,"stargazers_count":231,"open_issues_count":2,"forks_count":28,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-30T10:31:52.653Z","etag":null,"topics":["ctypes","eigen","machine-learning-systems","openmp"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wepe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-04T11:23:45.000Z","updated_at":"2025-04-27T11:38:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"d3165167-bd9b-4c5f-b7b1-555847305cb0","html_url":"https://github.com/wepe/dive-into-ml-system","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/wepe/dive-into-ml-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wepe%2Fdive-into-ml-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wepe%2Fdive-into-ml-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wepe%2Fdive-into-ml-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wepe%2Fdive-into-ml-system/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wepe","download_url":"https://codeload.github.com/wepe/dive-into-ml-system/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wepe%2Fdive-into-ml-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273609551,"owners_count":25136595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-04T02:00:08.968Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ctypes","eigen","machine-learning-systems","openmp"],"created_at":"2024-08-02T23:01:27.449Z","updated_at":"2025-09-04T12:33:39.920Z","avatar_url":"https://github.com/wepe.png","language":"C++","funding_links":[],"categories":["Table of Contents"],"sub_categories":[],"readme":"## RTFSC\n\nStarting two or even three years ago, more and more people have been switching careers into algorithm work. Industry has done its part too, releasing a steady stream of data mining competitions, which have become the first hands-on lesson for many newcomers. If the role of academia is to drive algorithmic innovation, then the role of such competitions is to put that innovation into practice and to test what actually works in the real world. In reality, however, I have found few competitions like that. Most are just a way for the organizers to promote themselves, and the problem types are so dated that contestants only need to `import xgboost as xgb`. In my experience competing over the past year or two, `import xgboost as xgb` is indeed very effective, while modifying algorithms from scratch and reinventing the wheel ends up being futile or yielding very little. If you agree, click `star` in the upper right corner.\n\nWhat does this rambling have to do with this repo? Many of my followers found me through my earlier competition code; most are probably students, and some may be in a competition right now. I'd like to offer a little personal advice: don't repeat `import xgboost as xgb` or `import lightgbm as lgb` day after day. Work on something with a higher barrier to entry, such as cutting-edge algorithms from academia or efficient engineering implementations of machine learning systems. Machine learning touches a great many fields, and you and I both still have a lot to learn, so let's stop repeatedly writing `import 
xxx`.\n\nThis small project started as a short talk for my research group before I graduated, an introduction to how a machine learning package is implemented. I am now open-sourcing it in the hope that it helps beginners. Out of laziness I have not written full documentation, so anyone interested will have to make do with reading the code. Trust Linus: the code is the best documentation.\n\n## Low-level implementation and high-level invocation of a machine learning algorithm\n\nUsing logistic regression, one of the simplest machine learning algorithms, as the example, this project walks through the low-level C++ implementation and the high-level Python calls, and covers basic `ctypes` usage.\n\n## Source layout\n\n- `src/`: the C++ implementation of logistic regression. The main sources are `lr.cc` and `utils.cc`; `python_wrapper.cc` implements a few helper functions and exposes a C-style interface to Python.\n- `python-package`: calls the C functions from Python via `ctypes`. `lr/model.py` wraps the relevant functions, and `example.py` is a concrete example.\n\n## Dependencies\n\n- Eigen\n\n## Usage\n\n- Build the shared library `liblr.so`:\n\n```\ng++ -fPIC -shared -fopenmp -o liblr.so python_wrapper.cc lr.cc utils.cc\n```\n\n- Copy it into the package directory: `cp liblr.so python-package/lr/`\n\n- Run `python example.py`:\n\n```python\n\nfrom lr import model\nimport numpy as np\n\n# custom metric function, mean accuracy\ndef mean_accuracy(label,pred,size):\n    num_pos,hit_pos = 0.0,0.0\n    num_neg,hit_neg = 0.0,0.0\n    for i in range(size):\n        if label[i]==1.0:\n            num_pos += 1.0\n            if pred[i]\u003e0.5:\n                hit_pos += 1.0\n\n        if label[i]==0.0:\n            num_neg += 1.0\n            if pred[i]\u003c=0.5:\n                hit_neg += 1.0\n    print \"pos-accuracy:{0:.5f},neg-accuracy:{1:.5f}\".format(hit_pos/num_pos,hit_neg/num_neg)\n    return 0.5*hit_pos/num_pos + 0.5*hit_neg/num_neg\n\n\nfeatures = np.load('features.dat')\nlabels = np.load('labels.dat')\nprint features.shape,labels.shape\n\nclf = model(max_iter=1000,alpha=0.01,l2_lambda=0.5,tolerance=0.01)\nclf.fit(features,labels,batch_size=1024,early_stopping_round=100,metric=mean_accuracy)\nprint clf.predict(features[:30])\n\nclf.save(\"/home/wepon/lr.model\")\nclf1 = model()\nclf1.load(\"/home/wepon/lr.model\")\nprint clf1.predict(features[:30])\n\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwepe%2Fdive-into-ml-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwepe%2Fdive-into-ml-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwepe%2Fdive-into-ml-system/lists"}