{"id":18786510,"url":"https://github.com/tetsuok/arowpp","last_synced_at":"2025-04-13T13:11:46.051Z","repository":{"id":24675222,"uuid":"28085896","full_name":"tetsuok/arowpp","owner":"tetsuok","description":"AROW++ An implementation of the efficient confidence-weighted classifier","archived":false,"fork":false,"pushed_at":"2021-01-09T03:01:34.000Z","size":569,"stargazers_count":11,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-10T04:55:12.949Z","etag":null,"topics":["cpp","machinelearning"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tetsuok.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-12-16T12:24:34.000Z","updated_at":"2021-01-09T15:19:36.000Z","dependencies_parsed_at":"2022-08-23T05:01:12.302Z","dependency_job_id":null,"html_url":"https://github.com/tetsuok/arowpp","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tetsuok%2Farowpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tetsuok%2Farowpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tetsuok%2Farowpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tetsuok%2Farowpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tetsuok","download_url":"https://codeload.github.com/tetsuok/arowpp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","reposito
ries_count":248717238,"owners_count":21150389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","machinelearning"],"created_at":"2024-11-07T20:51:46.672Z","updated_at":"2025-04-13T13:11:46.020Z","avatar_url":"https://github.com/tetsuok.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"arowpp\n======\n\nIntroduction\n------------\n\narowpp (AROW++) is a simple and efficient implementation of the Adaptive\nRegularization of Weights (AROW) online learning algorithm for binary\nclassification. AROW is efficient for learning tasks, such as natural\nlanguage processing tasks, in which the data is high-dimensional and extremely\nsparse. AROW is an extension of the confidence-weighted (CW) algorithm [Dredze+\n2008] and can achieve good performance with a few\niterations.\n\nFeatures\n--------\n\n- Written in C++ with STL\n- Robust to non-separable data\n- Uses sparse vector representation\n- Can handle several hundred thousand training examples and feature dimensions\n\n\nBuild Instructions\n------------------\n\n#### Software Requirements\n\nWe have tested our code on Ubuntu Linux 10.04 (x86_64) and OS X 10.7.3 with the following packages installed.\n\n- GNU C++ compiler (developed with g++ 4.4.3, Apple's g++ 4.2.1) and Apple's clang 3.0.\n- [Bazel](http://bazel.io/) for building the library and binaries.\n- Google C++ Testing Framework (Optional. 
This is required only for unit tests.)\n\nInstallation\n------------\n\n    $ git clone https://github.com/tetsuok/arowpp.git\n    $ cd arowpp\n    $ bazel build //:arow_learn //:arow_test\n\nTo run the unit tests, run `bazel test //:arow_unittest`.\n\nUsage\n-----\n\n#### Data format\n\nAROW++ accepts the same representation of training data as [SVMlight](http://svmlight.joachims.org/)\nuses. This format is well suited to large sparse feature\nvectors. The format of the training and test data files is:\n\n(BNF-like representation)\n\n    \u003cclass\u003e .=. +1 | -1\n    \u003cfeature\u003e .=. integer (\u003e=1)\n    \u003cvalue\u003e .=. real\n    \u003cline\u003e .=. \u003cclass\u003e \u003cfeature\u003e:\u003cvalue\u003e \u003cfeature\u003e:\u003cvalue\u003e ... \u003cfeature\u003e:\u003cvalue\u003e\n\nHere’s an example of such a file:\n\n    +1 201:1 3148:1 3983:1 4882:1\n    -1 874:1 3652:1 3963:1 6179:1\n    -1 1331:1 3084:1 3957:1 4514:1\n    -1 643:1 1870:1 3957:1 4367:1\n\n\nTraining\n--------\n\nUse the `arow_learn` command.\n\n    $ arow_learn -i int -r float -s train_file model\n\nwhere `train_file` is the training data you need to prepare in\nadvance. `arow_learn` writes the trained model to the file `model`.\n\nThree main options control training:\n\n- -i: Number of training iterations. The default is 1, but\n       the AROW algorithm can achieve good performance with a few\n       iterations.\n\n- -r: Regularization parameter (default 0.1). You can tune\n       this parameter for your data.\n\n- -s: Shuffle training examples if this option is set (default: no\n       shuffling). 
The result of the AROW algorithm depends on the order of the training\n       data.\n\n`arow_learn` outputs the following information:\n\n    $ arow_learn train1 model1\n    Number of features: 1355191\n    Number of examples: 15000\n    Number of updates:  9052\n    Done.\n    time: 3.778 sec.\n\n\nTesting\n-------\n\nUse the `arow_test` command.\n\n    $ arow_test test_file model\n\nwhere `test_file` is the test data, and `model` is the trained model\nfile generated by `arow_learn`.\n\nHere is a typical output of `arow_test`:\n\n    $ arow_test test1 model1\n    Accuracy 96.537% (4823/4996)\n    (Answer, Predict): (t,p):2480 (t,n):80 (f,p):93 (f,n):2343\n    time: 1.097 sec.\n\n\nReferences\n----------\n\n- [Crammer+ 2009] K. Crammer, A. Kulesza, and M. Dredze, Adaptive Regularization of Weight Vectors. In Advances in Neural Information Processing Systems (NIPS), 2009.\n- [Dredze+ 2008] M. Dredze, K. Crammer, and F. Pereira, Confidence-weighted linear classification. In Proc. of the 25th International Conference on Machine Learning (ICML), 2008.\n\n\nAcknowledgements\n----------------\n\nThis program was originally based on a Java implementation written\nby Masashi Tsubosaka. Thanks to Daisuke Okanohara for his\n[oll](https://code.google.com/p/oll/) tool, which has been a good\nreference for the development of AROW++. I would also like to thank\nTaku Kudo, from whose software I learned how to design C/C++ APIs:\n[MeCab](https://code.google.com/p/mecab/) and\n[zinnia](http://zinnia.sourceforge.net/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftetsuok%2Farowpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftetsuok%2Farowpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftetsuok%2Farowpp/lists"}