{"id":15029819,"url":"https://github.com/mit-nlp/mitie","last_synced_at":"2025-05-14T12:12:11.181Z","repository":{"id":15611300,"uuid":"18347668","full_name":"mit-nlp/MITIE","owner":"mit-nlp","description":"MITIE: library and tools for information extraction","archived":false,"fork":false,"pushed_at":"2025-01-04T23:54:38.000Z","size":10959,"stargazers_count":2939,"open_issues_count":18,"forks_count":538,"subscribers_count":186,"default_branch":"master","last_synced_at":"2025-04-11T04:59:49.536Z","etag":null,"topics":["c-plus-plus","information-extraction","java","machine-learning","natural-language-processing","python"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mit-nlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-04-01T22:47:53.000Z","updated_at":"2025-04-08T01:40:30.000Z","dependencies_parsed_at":"2022-06-26T10:31:35.212Z","dependency_job_id":null,"html_url":"https://github.com/mit-nlp/MITIE","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-nlp%2FMITIE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-nlp%2FMITIE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-nlp%2FMITIE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-nlp%2FMITIE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mit-nlp","download_url":"https://codeload.github.com/mit-nlp/MITIE/tar.gz/refs/heads/master","host":{"name":"GitHub","url"
:"https://github.com","kind":"github","repositories_count":248345273,"owners_count":21088244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","information-extraction","java","machine-learning","natural-language-processing","python"],"created_at":"2024-09-24T20:11:41.909Z","updated_at":"2025-04-11T04:59:55.536Z","avatar_url":"https://github.com/mit-nlp.png","language":"C++","readme":"MITIE: MIT Information Extraction\n=====\n\nThis project provides free (even for commercial use)\n[state-of-the-art](../../wiki/Evaluation) information extraction\ntools. The current release includes tools for performing [named entity\nextraction](http://blog.dlib.net/2014/04/mitie-completely-free-and-state-of-art.html) \nand [binary relation detection](http://blog.dlib.net/2014/07/mitie-v02-released-now-includes-python.html) \nas well as tools for training custom extractors and relation detectors.  \n\nMITIE is built on top of [dlib](http://dlib.net), a high-performance machine-learning library[1]. MITIE makes use of several state-of-the-art techniques including the use of distributional word embeddings[2] and Structural Support Vector Machines[3].  MITIE offers several pre-trained models providing varying levels of support for English, Spanish, and German trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, [Wikipedia, Freebase](https://github.com/mit-nlp/MITIE/releases/download/v0.4/freebase_wikipedia_binary_relation_training_data_v1.0.tar.bz2), and Gigaword). 
The core MITIE software is written in C++, but bindings for several other programming languages including Python, R, Java, C, and MATLAB let users quickly integrate MITIE into their own applications.\n\nOutside projects have created API bindings for [OCaml](https://github.com/travisbrady/omitie), \n[.NET](https://github.com/BayardRock/MITIE-Dot-Net), [.NET Core](https://github.com/slamj1/MitieNetCore), [PHP](https://github.com/ankane/mitie-php), and\n[Ruby](https://github.com/ankane/mitie).  There is also an [interactive tool](https://github.com/Sotera/mitie-trainer) for labeling data and training MITIE.\n\n# Using MITIE\n\nMITIE's primary API is a C API which is documented in the\n[mitie.h](mitielib/include/mitie.h) header file.  Beyond this, there are many\n[example programs](examples/) showing how to use MITIE from C, C++, Java, R, or Python 2.7.\n\n### Initial Setup\n\nBefore you can run the provided examples you will need to download the trained\nmodel files which you can do by running:\n```\nmake MITIE-models\n```\nor by simply downloading the [MITIE-models-v0.2.tar.bz2](https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2.tar.bz2)\nfile and extracting it in your MITIE folder.  Note that the Spanish and German models are supplied in \nseparate downloads.  So if you want to use the Spanish NER model then download [MITIE-models-v0.2-Spanish.zip](https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2-Spanish.zip) and\nextract it into your MITIE folder.  Similarly for the German model: [MITIE-models-v0.2-German.tar.bz2](https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2-German.tar.bz2)\n\n### Using MITIE from the command line\n\nMITIE comes with a basic streaming NER tool.  
So you can tell MITIE to\nprocess each line of a text file independently and output marked up text with the command:\n\n```\ncat sample_text.txt | ./ner_stream MITIE-models/english/ner_model.dat  \n```\n\nThe ner_stream executable can be compiled by running `make` in the top level MITIE folder or\nby navigating to the [tools/ner_stream](tools/ner_stream) folder and running `make` or using \nCMake to build it which can be done with the following commands:\n```\ncd tools/ner_stream\nmkdir build\ncd build\ncmake ..\ncmake --build . --config Release\n```\n\n### Compiling MITIE as a shared library\n\nOn a UNIX-like system, this can be accomplished by running `make` in the top level MITIE folder or\nby running:\n```\ncd mitielib\nmake\n```\nThis produces shared and static library files in the mitielib folder.  Or you can use\nCMake to compile a shared library by typing:\n```\ncd mitielib\nmkdir build\ncd build\ncmake ..\ncmake --build . --config Release --target install\n```\n\nEither of these methods will create a MITIE shared library in the mitielib folder. \n\n### Compiling MITIE using OpenBLAS\n\nIf you compile MITIE using cmake then it will automatically find and use any optimized BLAS\nlibraries on your machine.  However, if you compile using regular make then you have\nto manually locate your BLAS libraries or dlib will default to its built-in, but slower, BLAS\nimplementation.   Therefore, to use OpenBLAS when compiling without cmake, locate `libopenblas.a` and `libgfortran.a`, then\nrun `make` as follows:\n```\ncd mitielib \nmake BLAS_PATH=/path/to/libopenblas.a LIBGFORTRAN_PATH=/path/to/libgfortran.a\n```\nNote that if your BLAS libraries are not in standard locations cmake will fail to find them.  
However,\nyou can tell it what folder to look in by replacing `cmake ..` with a statement such as:\n```\ncmake -DCMAKE_LIBRARY_PATH=/home/me/place/i/put/blas/lib ..\n```\n\n### Using MITIE from a Python 2.7 program\n\nOnce you have built the MITIE shared library, you can go to the [examples/python](examples/python) folder\nand simply run any of the Python scripts.  Each script is a tutorial explaining some aspect of\nMITIE: [named entity recognition and relation extraction](examples/python/ner.py), \n[training a custom NER tool](examples/python/train_ner.py), or \n[training a custom relation extractor](examples/python/train_relation_extraction.py).\n\nYou can also install ``mitie`` directly from GitHub with this command:\n``pip install git+https://github.com/mit-nlp/MITIE.git``.\n\n\n### Using MITIE from R\n\nMITIE can be installed as an R package. See the [README](tools/R-binding) for more details.\n\n### Using MITIE from a C program\n\nThere are example C programs in the [examples/C](examples/C) folder.  To compile any of them you simply\ngo into those folders and run `make`.  Or use CMake like so:\n```\ncd examples/C/ner\nmkdir build\ncd build\ncmake ..\ncmake --build . --config Release\n```\n\n### Using MITIE from a C++ program\n\nThere are example C++ programs in the [examples/cpp](examples/cpp) folder.  To compile any of them you simply\ngo into those folders and run `make`.  Or use CMake like so:\n```\ncd examples/cpp/ner\nmkdir build\ncd build\ncmake ..\ncmake --build . --config Release\n```\n\n### Using MITIE from a Java program\n\nThere is an example Java program in the [examples/java](examples/java) folder.  Before you can run it you\nmust compile MITIE's Java interface which you can do like so:\n```\ncd mitielib/java\nmkdir build\ncd build\ncmake ..\ncmake --build . --config Release --target install\n```\nThat will place a javamitie shared library and jar file into the mitielib folder.  
Once you have those\ntwo files you can run the example program in examples/java by running run_ner.bat if you are on Windows or\nrun_ner.sh if you are on a POSIX system like Linux or OS X.\n\nAlso note that you must have SWIG 1.3.40 or newer, CMake 2.8.4 or newer, and the Java JDK installed to compile the MITIE interface.  Finally, note that if you are using 64bit Java on Windows then you will need to use a command like:\n```\ncmake -G \"Visual Studio 10 Win64\" ..\n```\ninstead of  `cmake ..` so that Visual Studio knows to make a 64bit library.\n\n### Running MITIE's unit tests\n\nYou can run a simple regression test to validate your build.  Do this by running\nthe following command from the top level MITIE folder:\n\n```\nmake test\n```\n\n`make test` both builds the example programs and downloads the required\nexample models.  If you require a non-standard C++ compiler, change\n`CC` in `examples/C/makefile` and in `tools/ner_stream/makefile`.\n\n\n# Precompiled Python 2.7 binaries\n\nWe have built Python 2.7 binaries packaged with sample models for 64bit Linux and Windows (both 32 and 64 bit versions of Python).  You can download the precompiled package here: [Precompiled MITIE 0.2](https://github.com/mit-nlp/MITIE/releases/download/v0.4/mitie-v0.2-python-2.7-windows-or-linux64.zip)\n\n\n# Precompiled Java 64bit binaries\n\nWe have built Java binaries for the 64bit JVM which work on Linux and Windows.  You can download the precompiled package here: [Precompiled Java MITIE 0.3](https://github.com/mit-nlp/MITIE/releases/download/v0.4/mitie-java-v0.3-windows64-or-linux64.zip).  In the file is an examples/java folder.  You can run the example by executing the provided .bat or .sh file.\n\n# Citing MITIE\n\nThere isn't any paper specifically about MITIE. However, since MITIE is\nbasically just a thin wrapper around dlib please cite dlib's JMLR paper if you\nuse MITIE in your research:\n\n```\nDavis E. King. Dlib-ml: A Machine Learning Toolkit. 
Journal of Machine Learning Research 10, pp. 1755-1758, 2009\n\n@Article{dlib09,\n  author = {Davis E. King},\n  title = {Dlib-ml: A Machine Learning Toolkit},\n  journal = {Journal of Machine Learning Research},\n  year = {2009},\n  volume = {10},\n  pages = {1755-1758},\n}\n```\n\n# License\n\nMITIE is licensed under the Boost Software License - Version 1.0 - August 17th, 2003.  \n\nPermission is hereby granted, free of charge, to any person or organization\nobtaining a copy of the software and accompanying documentation covered by\nthis license (the \"Software\") to use, reproduce, display, distribute,\nexecute, and transmit the Software, and to prepare derivative works of the\nSoftware, and to permit third-parties to whom the Software is furnished to\ndo so, all subject to the following:\n\nThe copyright notices in the Software and this entire statement, including\nthe above license grant, this restriction and the following disclaimer,\nmust be included in all copies of the Software, in whole or in part, and\nall derivative works of the Software, unless such copies or derivative\nworks are solely in the form of machine-executable object code generated by\na source language processor.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT\nSHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE\nFOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,\nARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER\nDEALINGS IN THE SOFTWARE.\n\n# References\n\n[1] Davis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009.\n\n[2] Paramveer Dhillon, Dean Foster and Lyle Ungar, Eigenwords: Spectral Word Embeddings, Journal of Machine Learning Research (JMLR), 16, 2015.\n\n[3] T. Joachims, T. 
Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning, 77(1):27-59, 2009.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-nlp%2Fmitie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmit-nlp%2Fmitie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-nlp%2Fmitie/lists"}