https://github.com/openaire/iis
Information Inference Service of the OpenAIRE system
- Host: GitHub
- URL: https://github.com/openaire/iis
- Owner: openaire
- License: apache-2.0
- Created: 2015-09-11T05:55:08.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2025-12-16T11:02:08.000Z (3 months ago)
- Last Synced: 2025-12-20T00:27:29.484Z (3 months ago)
- Topics: big-data, data-mining, data-processing-system, hadoop, iis, information-inference, openaire, spark, text-mining
- Language: Java
- Homepage:
- Size: 69 MB
- Stars: 22
- Watchers: 16
- Forks: 12
- Open Issues: 94
Metadata Files:
- Readme: README.markdown
- License: LICENSE
- Notice: NOTICE
README
# About
Information Inference Service (IIS) is a flexible data processing system for handling big data, based on Apache Hadoop technologies. It is a subsystem of the OpenAIRE system ([www.openaire.eu](http://www.openaire.eu) is its public web front-end); see Fig.1 for a high-level overview.

**Fig.1**: The center of the OpenAIRE system is the Information Space system, which stores all information available in the system. IIS ingests data from Information Space, runs processing workflows, and produces inferred data which, in turn, is ingested by Information Space.
The goal of OpenAIRE is to provide an infrastructure for gathering, processing (including de-duplication), and providing unified access to research-related data (papers, datasets, researchers, projects, etc.). The goal of IIS is to provide data and text mining functionality for the OpenAIRE system. In practice, IIS defines data processing workflows that connect various modules, each with a well-defined input and output. A high-level overview of IIS can be found in the paper ["Information Inference in Scholarly Communication Infrastructures: The OpenAIREplus Project Experience", Procedia Computer Science, vol. 38, 2014, 92-99](http://www.sciencedirect.com/science/article/pii/S1877050914013763).
IIS was initially developed during the [OpenAIREplus](http://cordis.europa.eu/project/rcn/100079_en.html) project and has been further extended during the [OpenAIRE2020](http://cordis.europa.eu/project/rcn/194062_en.html) project.
The original code was migrated to GitHub from the [D-NET](http://www.d-net.research-infrastructures.eu/) SVN repository. The public read-only interface of that repository is available at [https://svn-public.driver.research-infrastructures.eu/driver/dnet40/modules/](https://svn-public.driver.research-infrastructures.eu/driver/dnet40/modules/), and this is where you can find the history of the code base before the migration (IIS-related Maven projects are the ones matching the glob pattern `*-iis-*`).
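The idea of a workflow as a chain of modules with well-defined inputs and outputs can be illustrated with a minimal, self-contained Java sketch. This is not how IIS implements its workflows (the actual definitions live in `iis-wf`); the `ProcessingModule` interface, both example modules, and the sample text are hypothetical and exist only to show how typed inputs and outputs let modules be connected safely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WorkflowSketch {

    /** Hypothetical module abstraction: input type I, output type O. */
    interface ProcessingModule<I, O> extends Function<I, O> {
        /** Connects this module's output to the next module's input. */
        default <R> ProcessingModule<I, R> then(ProcessingModule<O, R> next) {
            return input -> next.apply(this.apply(input));
        }
    }

    // Hypothetical module: splits raw document text into lowercase tokens.
    static final ProcessingModule<String, List<String>> TOKENIZE = text ->
            Arrays.stream(text.toLowerCase().split("\\W+"))
                    .filter(token -> !token.isEmpty())
                    .collect(Collectors.toList());

    // Hypothetical module: keeps tokens that look like six-digit identifiers.
    static final ProcessingModule<List<String>, List<String>> FIND_SIX_DIGIT_IDS = tokens ->
            tokens.stream()
                    .filter(token -> token.matches("\\d{6}"))
                    .collect(Collectors.toList());

    // The "workflow": the compiler checks that output and input types line up.
    static final ProcessingModule<String, List<String>> WORKFLOW =
            TOKENIZE.then(FIND_SIX_DIGIT_IDS);

    public static void main(String[] args) {
        // Made-up sample text; the numbers are not real identifiers.
        System.out.println(WORKFLOW.apply("Funded by grant 123456 and grant 654321."));
        // prints [123456, 654321]
    }
}
```

Because each module declares its input and output types, connecting two modules whose types do not match fails at compile time rather than at run time.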
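As a rough illustration of how the `*-iis-*` glob pattern selects project names, the following sketch filters a list of names with the JDK's built-in glob matcher. The directory names are invented for the example and are not taken from the SVN repository.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class IisModuleFilter {

    // Glob from the README: any name containing "-iis-".
    private static final PathMatcher IIS_GLOB =
            FileSystems.getDefault().getPathMatcher("glob:*-iis-*");

    /** Returns only the names that match the *-iis-* glob pattern. */
    static List<String> iisModules(List<String> names) {
        return names.stream()
                .filter(name -> IIS_GLOB.matches(Paths.get(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical directory names, for illustration only.
        List<String> names = List.of("dnet-iis-example", "dnet-other-module", "iis-example");
        System.out.println(iisModules(names)); // prints [dnet-iis-example]
    }
}
```

Note that a name such as `iis-example` does not match, because the pattern requires a hyphen on both sides of `iis`.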
# Content of the most important subdirectories and files
- `docs` - basic documentation
- `iis-core` - generic common utilities used by other projects
- `iis-common` - OpenAIRE-related common utilities
- `iis-wf` - definitions of workflows used in the system
- `CONTRIBUTORS.markdown` - list of contributors to the project
# License
The code is licensed under the Apache License, version 2.0. We also use third-party code from other projects whose licenses are compatible with it. This third-party code can be found in directories with names starting with `iis-3rdparty-`; each directory corresponds to a different source project.