{"id":13418479,"url":"https://github.com/meta-toolkit/meta","last_synced_at":"2025-03-15T03:31:20.220Z","repository":{"id":13771377,"uuid":"16466317","full_name":"meta-toolkit/meta","owner":"meta-toolkit","description":"A Modern C++ Data Sciences Toolkit","archived":false,"fork":false,"pushed_at":"2023-04-17T08:37:52.000Z","size":31854,"stargazers_count":689,"open_issues_count":55,"forks_count":233,"subscribers_count":62,"default_branch":"master","last_synced_at":"2024-07-31T22:43:08.906Z","etag":null,"topics":["c-plus-plus","graph-algorithms","inverted-index","language-modeling","nlp","nlp-parsing","pos-tag","search-engine","text-analysis","text-analytics","text-classification","word-embeddings"],"latest_commit_sha":null,"homepage":"https://meta-toolkit.org","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/meta-toolkit.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.mit","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2014-02-02T23:54:34.000Z","updated_at":"2024-07-19T17:42:02.000Z","dependencies_parsed_at":"2022-09-23T15:20:52.303Z","dependency_job_id":"c226fa6f-7d7a-49e8-a71b-61b11330356f","html_url":"https://github.com/meta-toolkit/meta","commit_stats":null,"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-toolkit%2Fmeta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-toolkit%2Fmeta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-toolkit%2Fmeta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-to
olkit%2Fmeta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/meta-toolkit","download_url":"https://codeload.github.com/meta-toolkit/meta/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243681024,"owners_count":20330152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","graph-algorithms","inverted-index","language-modeling","nlp","nlp-parsing","pos-tag","search-engine","text-analysis","text-analytics","text-classification","word-embeddings"],"created_at":"2024-07-30T22:01:02.750Z","updated_at":"2025-03-15T03:31:20.214Z","avatar_url":"https://github.com/meta-toolkit.png","language":"C++","readme":"# MeTA: ModErn Text Analysis\n\nPlease visit our [web page][meta-website] for information and tutorials\nabout MeTA!\n\n### Build Status (by branch)\n- master: [![Build\n  Status](https://travis-ci.org/meta-toolkit/meta.svg?branch=master)](https://travis-ci.org/meta-toolkit/meta)\n  [![Windows Build\n  Status](https://ci.appveyor.com//api/projects/status/github/meta-toolkit/meta?svg=true\u0026branch=master)](https://ci.appveyor.com/project/skystrife/meta)\n- develop: [![Build\n  Status](https://travis-ci.org/meta-toolkit/meta.svg?branch=develop)](https://travis-ci.org/meta-toolkit/meta)\n  [![Windows Build\n  Status](https://ci.appveyor.com//api/projects/status/github/meta-toolkit/meta?svg=true\u0026branch=develop)](https://ci.appveyor.com/project/skystrife/meta)\n\n# Outline\n- [Intro](#intro)\n    - [Documentation](#documentation)\n    - [Tutorials](#tutorials)\n    - 
[Citing](#citing)\n- [Project Setup](#project-setup)\n    - [Mac OS X](#mac-os-x-build-guide)\n    - [Ubuntu](#ubuntu-build-guide)\n    - [Arch Linux](#arch-linux-build-guide)\n    - [Fedora](#fedora-build-guide)\n    - [CentOS](#centos-build-guide)\n    - [EWS/EngrIT](#ewsengrit-build-guide) (this is UIUC-specific)\n    - [Windows](#windows-build-guide)\n    - [Generic Setup Notes](#generic-setup-notes)\n\n# Intro\n\nMeTA is a modern C++ data sciences toolkit featuring\n\n - text tokenization, including deep semantic features like parse trees\n - inverted and forward indexes with compression and various caching strategies\n - a collection of ranking functions for searching the indexes\n - topic models\n - classification algorithms\n - graph algorithms\n - language models\n - CRF implementation (POS-tagging, shallow parsing)\n - wrappers for liblinear and libsvm (including libsvm dataset parsers)\n - UTF8 support for analysis on various languages\n - multithreaded algorithms\n\n## Documentation\n\nDoxygen documentation can be found [here][doxygen].\n\n## Tutorials\n\nWe have walkthroughs for a few different parts of MeTA on the\n[MeTA homepage][meta-website].\n\n## Citing\n\nIf you used MeTA in your research, we would greatly appreciate a citation for\nour ACL demo paper:\n\n```latex\n@InProceedings{meta-toolkit,\n  author    = {Massung, Sean and Geigle, Chase and Zhai, Cheng{X}iang},\n  title     = {{MeTA: A Unified Toolkit for Text Retrieval and Analysis}},\n  booktitle = {Proceedings of ACL-2016 System Demonstrations},\n  month     = {August},\n  year      = {2016},\n  address   = {Berlin, Germany},\n  publisher = {Association for Computational Linguistics},\n  pages     = {91--96},\n  url       = {http://anthology.aclweb.org/P16-4016}\n}\n```\n\n# Project setup\n\n## Mac OS X Build Guide\nMac OS X 10.6 or higher is required. 
You may have success with 10.5, but\nthis is not tested.\n\nYou will need to have [homebrew][homebrew] installed, as well as the\nCommand Line Tools for Xcode (homebrew requires these as well, and it will\nprompt for them during install, or you can install them with `xcode-select\n--install` on recent versions of OS X).\n\nOnce you have homebrew installed, run the following commands to get the\ndependencies for MeTA:\n\n```bash\nbrew update\nbrew install cmake jemalloc lzlib icu4c\n```\n\nTo get started, run the following commands:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\nCXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Release -DICU_ROOT=/usr/local/opt/icu4c\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n## Ubuntu Build Guide\n\nThe directions here depend greatly on your installed version of Ubuntu. To\ncheck what version you are on, run the following command:\n\n```bash\ncat /etc/issue\n```\n\nBased on what you see, you should proceed with one of the following guides:\n\n- [Ubuntu 12.04 LTS Build Guide](#ubuntu-1204-lts-build-guide)\n- [Ubuntu 14.04 LTS Build Guide](#ubuntu-1404-lts-build-guide)\n- [Ubuntu 15.10 Build Guide](#ubuntu-1510-build-guide)\n\nIf your version is less than 12.04 LTS, your operating system is not\nsupported (even by your vendor!) and you should upgrade to at least 12.04\nLTS (or 14.04 LTS, if possible).\n\n### Ubuntu 12.04 LTS Build Guide\nBuilding on Ubuntu 12.04 LTS requires more work than its more up-to-date\n14.04 sister, but it can be done relatively easily. 
You will, however, need\nto install a newer C++ compiler from a ppa, and switch to it in order to\nbuild meta. We will also need to install a newer CMake version than is\nnatively available.\n\nStart by running the following commands to get the dependencies that we\nwill need for building MeTA.\n\n```bash\n# this might take a while\nsudo apt-get update\nsudo apt-get install python-software-properties\n\n# add the ppa that contains an updated g++\nsudo add-apt-repository ppa:ubuntu-toolchain-r/test\nsudo apt-get update\n\n# this will probably take a while\nsudo apt-get install g++ g++-4.8 git make wget libjemalloc-dev zlib1g-dev\n\nwget http://www.cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.sh\nsudo sh cmake-3.2.0-Linux-x86_64.sh --prefix=/usr/local\n```\n\nDuring CMake installation, you should agree to the license and then say \"n\"\nto including the subdirectory. You should be able to run the following\ncommands and see the following output:\n\n```bash\ng++-4.8 --version\n```\n\nshould print\n\n    g++-4.8 (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1\n    Copyright (C) 2013 Free Software Foundation, Inc.\n    This is free software; see the source for copying conditions.  There is NO\n    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nand\n\n```bash\n/usr/local/bin/cmake --version\n```\n\nshould print\n\n    cmake version 3.2.0\n\n    CMake suite maintained and supported by Kitware (kitware.com/cmake).\n\nOnce the dependencies are all installed, you should be ready to build. 
Run\nthe following commands to get started:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\nCXX=g++-4.8 /usr/local/bin/cmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n### Ubuntu 14.04 LTS Build Guide\nUbuntu 14.04 has a recent enough GCC for building MeTA, but we'll need to\nadd a ppa for a more recent version of CMake.\n\nStart by running the following commands to install the dependencies for\nMeTA.\n\n```bash\n# this might take a while\nsudo apt-get update\nsudo apt-get install software-properties-common\n\n# add the ppa for cmake\nsudo add-apt-repository ppa:george-edison55/cmake-3.x\nsudo apt-get update\n\n# install dependencies\nsudo apt-get install g++ cmake libicu-dev git libjemalloc-dev zlib1g-dev\n```\n\nOnce the dependencies are all installed, you should double check your\nversions by running the following commands.\n\n```bash\ng++ --version\n```\n\nshould output\n\n    g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2\n    Copyright (C) 2013 Free Software Foundation, Inc.\n    This is free software; see the source for copying conditions.  There is NO\n    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nand\n\n```bash\ncmake --version\n```\n\nshould output\n\n    cmake version 3.2.2\n\n    CMake suite maintained and supported by Kitware (kitware.com/cmake).\n\nOnce the dependencies are all installed, you should be ready to build. 
Run\nthe following commands to get started:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\ncmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n### Ubuntu 15.10 Build Guide\nUbuntu's non-LTS desktop offering in 15.10 has enough modern software in\nits repositories to build MeTA without much trouble. To install the\ndependencies, run the following commands.\n\n```bash\nsudo apt update\nsudo apt install g++ git cmake make libjemalloc-dev zlib1g-dev\n```\n\nOnce the dependencies are all installed, you should be ready to build. Run\nthe following commands to get started:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\ncmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n## Arch Linux Build Guide\nArch Linux consistently has the most up-to-date packages due to its rolling\nrelease setup, so it's often the easiest platform to get set up on.\n\nTo install the dependencies, run the following commands.\n\n```bash\nsudo pacman -Sy\nsudo pacman -S clang cmake git icu libc++ make jemalloc zlib\n```\n\nOnce the dependencies are all installed, you should be ready to build. 
Run\nthe following commands to get started:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\nCXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n## Fedora Build Guide\n\nThis has been tested with Fedora 22+ (the oldest currently supported Fedora\nas of the time of writing). You may have success with earlier versions, but\nthis is not tested. (If you're on an older version of Fedora, use `yum`\ninstead of `dnf` for the commands given below.)\n\nTo get started, install some dependencies:\n\n```bash\n# These may be already installed\nsudo dnf install make git wget gcc-c++ jemalloc-devel cmake zlib-devel\n```\n\nYou should be able to run the following commands and see the following\noutput:\n\n```bash\ng++ --version\n```\n\nshould print\n\n    g++ (GCC) 5.3.1 20151207 (Red Hat 5.3.1-2)\n    Copyright (C) 2015 Free Software Foundation, Inc.\n    This is free software; see the source for copying conditions.  There is NO\n    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nand\n\n```bash\ncmake --version\n```\n\nshould print\n\n    cmake version 3.3.2\n\n    CMake suite maintained and supported by Kitware (kitware.com/cmake).\n\n\nOnce the dependencies are all installed, you should be ready to build. 
Run\nthe following commands to get started:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\ncmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system with the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\n## CentOS Build Guide\nMeTA can be built in CentOS 7 and above. CentOS 7 comes with a recent\nenough compiler (GCC 4.8.5), but too old a version of CMake. We'll thus\ninstall the compiler and related libraries from the package manager and\ninstall our own more recent `cmake` ourselves.\n\n```bash\n# install build dependencies (this will probably take a while)\nsudo yum install gcc gcc-c++ git make wget zlib-devel epel-release\nsudo yum install jemalloc-devel\n\nwget http://www.cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.sh\nsudo sh cmake-3.2.0-Linux-x86_64.sh --prefix=/usr/local --exclude-subdir\n```\n\nYou should be able to run the following commands and see the following\noutput:\n\n```bash\ng++ --version\n```\n\nshould print\n\n    g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)\n    Copyright (C) 2015 Free Software Foundation, Inc.\n    This is free software; see the source for copying conditions.  There is NO\n    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nand\n\n```bash\n/usr/local/bin/cmake --version\n```\n\nshould print\n\n    cmake version 3.2.0\n\n    CMake suite maintained and supported by Kitware (kitware.com/cmake).\n\nOnce the dependencies are all installed, you should be ready to build. 
Run\nthe following commands to get started:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\n/usr/local/bin/cmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n## EWS/EngrIT Build Guide\n**Note:** Please don't do this if you are able to get MeTA working in **any\nother possible way**, as the EWS filesystem has a habit of being\n**unbearably slow** and increasing compile times by several orders of\nmagnitude. For example, comparing the `cmake`, `make`, and `unit-test`\nsteps on my desktop vs. EWS gives the following:\n\n| system         | `cmake` time | `make` time | `unit-test` time |\n| -------------- |  ----------- | ----------- | ---------------- |\n| my desktop     | 0m7.523s     | 2m30.715s   | 0m36.631s        |\n| EWS            | 1m28s        | 11m28.473s  | 1m25.326s        |\n\n\nIf you are on a machine managed by Engineering IT at UIUC, you should\nfollow this guide. These systems have software that is much too old for\nbuilding MeTA, but EngrIT has been kind enough to package updated versions\nof research software as modules. 
The modules provided for GCC and CMake are\nrecent enough to build MeTA, so it is actually mostly straightforward.\n\nTo set up your dependencies (**you will need to do this every time you log\nback in to the system**), run the following commands:\n\n```bash\nmodule load gcc\nmodule load cmake/3.5.0\n```\n\nOnce you have done this, double check your versions by running the\nfollowing commands.\n\n```bash\ng++ --version\n```\n\nshould output\n\n    g++ (GCC) 5.3.0\n    Copyright (C) 2015 Free Software Foundation, Inc.\n    This is free software; see the source for copying conditions.  There is NO\n    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nand\n\n```bash\ncmake --version\n```\n\nshould output\n\n    cmake version 3.5.0\n\n    CMake suite maintained and supported by Kitware (kitware.com/cmake).\n\nIf your versions are correct, you should be ready to build. To get started,\nrun the following commands:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta/\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\nCXX=`which g++` CC=`which gcc` cmake ../ -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n## Windows Build Guide\n\nMeTA can be built on Windows using the MinGW-w64 toolchain with gcc. 
We\nstrongly recommend using [MSYS2][msys2] as this makes fetching the compiler\nand related libraries significantly easier than it would be otherwise, and\nit tends to have very up-to-date packages relative to other similar MinGW\ndistributions.\n\n**Note:** If you find yourself confused or lost by the instructions below,\nplease refer to our [visual setup guide for\nWindows](https://meta-toolkit.org/windows-setup-guide.html) which includes\nscreenshots for every step, including updating MSYS2 and the MinGW-w64\ntoolchain.\n\nTo start, [download the installer][msys2] for MSYS2 from the linked\nwebsite and follow the instructions on that page. Once you've got it\ninstalled, you should use the MinGW shell to start a new terminal, in which\nyou should run the following commands to download dependencies and related\nsoftware needed for building:\n\n```bash\npacman -Syu git make patch mingw-w64-x86_64-{gcc,cmake,icu,jemalloc,zlib} --force\n```\n\n(the `--force` is needed to work around a bug with the latest MSYS2\ninstaller as of the time of writing.)\n\nThen, exit the shell and launch the \"MinGW-w64 Win64\" shell. You can obtain\nthe toolkit and get started with:\n\n```bash\n# clone the project\ngit clone https://github.com/meta-toolkit/meta.git\ncd meta\n\n# set up submodules\ngit submodule update --init --recursive\n\n# set up a build directory\nmkdir build\ncd build\ncp ../config.toml .\n\n# configure and build the project\ncmake .. -G \"MSYS Makefiles\" -DCMAKE_BUILD_TYPE=Release\nmake\n```\n\nYou can now test the system by running the following command:\n\n```bash\n./unit-test --reporter=spec\n```\n\nIf everything passes, congratulations! MeTA seems to be working on your\nsystem.\n\n[msys2]: https://msys2.github.io/\n\n## Generic Setup Notes\n\n - There are rules for clean, tidy, and doc. 
**After you run the `cmake`\n   command once, you will be able to just run `make` as usual** when you're\n   developing---it'll detect when the CMakeLists.txt file has changed and\n   rebuild Makefiles if it needs to.\n\n - To compile in debug mode, just replace `Release` with `Debug` in the\n   appropriate `cmake` command for your OS above and rebuild using `make`\n   after.\n\n - Don't hesitate to reach out on [the forum][forum] if you encounter\n   problems getting set up. We routinely build with a wide variety of\n   compilers and operating systems through our continuous integration\n   setups ([travis-ci][travis-ci] for Linux and OS X and\n   [Appveyor][appveyor] for Windows), so we can be fairly certain that\n   things should build on nearly all major platforms.\n\n[homebrew]: http://brew.sh\n[forum]: https://forum.meta-toolkit.org\n[travis-ci]: https://travis-ci.org/meta-toolkit/meta\n[appveyor]: https://ci.appveyor.com/project/skystrife/meta\n[meta-website]: https://meta-toolkit.org\n[doxygen]: https://meta-toolkit.org/doxygen/namespaces.html\n","funding_links":[],"categories":["TODO scan for Android support in followings","Machine Learning","\u003ca name=\"cpp\"\u003e\u003c/a\u003eC++","Models","C++","进程间通信","函式庫","No longer maintained","Packages"],"sub_categories":["Latent Dirichlet Allocation (LDA) [:page_facing_up:](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)","机器学习","Tools","[Tools](#tools-1)","Speech Recognition","書籍","Python (and Python Notebooks)","Libraries"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-toolkit%2Fmeta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmeta-toolkit%2Fmeta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-toolkit%2Fmeta/lists"}