
https://github.com/dataoneorg/onto-dataonejava

Clone of Bitbucket ndigiuseppe/dataonejava ontology coverage checker

This readme explains the files and organization of the DataONEjava component of the ontology coverage checker.

Package:
owlOntologies: Note that for these classes to work, the OWLapi jar file NEEDS to be in the build path. This file is included as owlapi-distribution-3.4.4-bin.jar.
classes:
CoverageAnalyzer: This class generates the coverage score for a particular ontology. Call it through the calculateScore(String corpOnt, String testOnt) method, passing in two strings, each a path to an OWL ontology file. It computes a score for the classes, the equivalences, and the subclasses, along with an overall score. It currently weights these three components equally; the weights can be changed on the three lines with comments that say "determine the X in testOnt". Besides calculating the scores, it prints them to the terminal (for ease of reading) and saves them in a file in the same directory as the ontology under test, named after that ontology with CoverageScores.txt appended (e.g., if the ontology under test is /home/user/ontologies/Test.owl, the scores are saved in /home/user/ontologies/TestCoverageScores.txt). The remaining methods return raw intermediate scores and are available mainly for testing purposes.
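
The equal weighting and the output-file naming described above can be sketched in plain Java. This is a hypothetical illustration, not the project's CoverageAnalyzer: the class CoverageScoreSketch and its methods are invented for this example, and treating the overall score as the mean of the three component scores is an assumption (the text above only says the components are weighted equally).

```java
import java.io.File;

/**
 * Hypothetical sketch (not the project's CoverageAnalyzer) of two details:
 * equal weighting of the component scores and the name of the scores file.
 */
class CoverageScoreSketch {

    // Equal weights for classes, equivalences, and subclasses.
    // Assumption: the overall score is the weighted mean of the three.
    static final double CLASS_WEIGHT = 1.0 / 3;
    static final double EQUIV_WEIGHT = 1.0 / 3;
    static final double SUBCLASS_WEIGHT = 1.0 / 3;

    /** Combines the three component scores using the weights above. */
    static double overallScore(double classScore, double equivScore, double subclassScore) {
        return CLASS_WEIGHT * classScore
             + EQUIV_WEIGHT * equivScore
             + SUBCLASS_WEIGHT * subclassScore;
    }

    /** Maps e.g. /home/user/ontologies/Test.owl to /home/user/ontologies/TestCoverageScores.txt. */
    static String scoresPath(String testOntPath) {
        File testOnt = new File(testOntPath);
        String name = testOnt.getName();
        int dot = name.lastIndexOf('.');
        // Drop the extension (".owl") if there is one.
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // A null parent (bare file name) yields a path relative to the working directory.
        File out = new File(testOnt.getParentFile(), base + "CoverageScores.txt");
        return out.getPath();
    }
}
```

For example, scoresPath("/home/user/ontologies/Test.owl") returns "/home/user/ontologies/TestCoverageScores.txt" on a Unix-style file system. Keeping the weights as named constants mirrors the idea of adjusting them on three clearly marked lines.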
CreateOntologyFromThesaurus: This class reads in a normalized corpus and, using a thesaurus, generates an OWL ontology that represents that corpus. It treats synonyms as equivalent classes, and it adds subclass relationships where the synonym relationship is not symmetric: for example, if cow is a synonym of steer but steer is not a synonym of cow, we say that all steers are cows but not all cows are steers, so steer is a subclass of cow. Run this class through its main(String[] args) method with two arguments: the absolute path to the corpus and the absolute path where the generated ontology should be stored. This generates an entire ontology, with equivalences and subclasses. If you have an existing ontology and only want to add some classes, use the addToOntology(String ontPath, String className) method to add a specific class (and its implications) to the specified ontology. Finally, to merge two ontologies, use the mergeOntology(String firstOntPath, String secondOntPath, String outputPath) method: the first two parameters are the paths to the existing ontologies and the third is the output path.
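
The synonym rule above can be sketched with plain strings in place of OWLapi objects. This is a hypothetical sketch, not the project's implementation: SynonymRelationSketch and its methods are invented names, the thesaurus is assumed to be a map from headword to synonym list (the structure ThesaurusManager builds), and it reads the rule as "mutual synonyms become equivalent classes, one-directional synonyms become subclasses".

```java
import java.util.*;

/**
 * Hypothetical sketch of deriving equivalence and subclass pairs from a
 * headword -> synonyms map. Not the project's CreateOntologyFromThesaurus.
 */
class SynonymRelationSketch {

    /** True if word appears in the synonym list of head. */
    static boolean isSynonym(Map<String, List<String>> thesaurus, String head, String word) {
        return thesaurus.getOrDefault(head, List.of()).contains(word);
    }

    /** Pairs {a, b} meaning "a is equivalent to b" (each listed as the other's synonym). */
    static List<String[]> equivalences(Map<String, List<String>> thesaurus) {
        List<String[]> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : thesaurus.entrySet()) {
            String head = e.getKey();
            for (String syn : e.getValue()) {
                // Record each mutual pair once, in lexicographic order.
                if (isSynonym(thesaurus, syn, head) && head.compareTo(syn) < 0) {
                    result.add(new String[] {head, syn});
                }
            }
        }
        return result;
    }

    /** Pairs {sub, sup} meaning "sub is a subclass of sup". */
    static List<String[]> subclasses(Map<String, List<String>> thesaurus) {
        List<String[]> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : thesaurus.entrySet()) {
            String head = e.getKey();
            for (String syn : e.getValue()) {
                // syn is a synonym of head but not vice versa,
                // so head is a subclass of syn (steer -> cow).
                if (!isSynonym(thesaurus, syn, head)) {
                    result.add(new String[] {head, syn});
                }
            }
        }
        return result;
    }
}
```

With steer → [cow], cow → [cattle], and cattle → [cow], subclasses returns the single pair {steer, cow} and equivalences returns the single pair {cattle, cow}.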
MyOwlOntologyManager: This class wraps various parts of the OWLapi. Treat it as a library: create an object and then call its methods. Note that, because of the "unique" way OWL ontologies are stored, each ontology you want to access should have its own MyOwlOntologyManager object to manage it. The library can create and load ontologies, get the classes (or the names of the classes) in an ontology, add axioms, determine whether one class is a subclass of another, and add classes to an existing ontology.
ThesaurusManager: This class manages the thesaurus for the CreateOntologyFromThesaurus class and should not be altered. It reads in the synonym file and builds a HashMap whose keys are the headwords and whose values are lists of their synonyms. Typically the class is instantiated, the thesauri are read in, and then getSynonyms(String key) is called.
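
A minimal stand-in for the ThesaurusManager workflow (read the synonym file into a map, then call getSynonyms) might look like the following. The class name ThesaurusSketch and its read method are hypothetical, and the line format (headword first, then synonyms, comma-separated) is an assumption; the actual layout of the synonym files is not documented here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.*;

/**
 * Hypothetical stand-in for ThesaurusManager. Assumes one entry per line in
 * the form "headword,synonym1,synonym2,..." (an assumption, not the real format).
 */
class ThesaurusSketch {
    // Keys are headwords; values are the synonyms listed for them.
    private final Map<String, List<String>> synonyms = new HashMap<>();

    /** Reads every line of the (assumed) comma-separated synonym file. */
    void read(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",");
                String head = fields[0].trim();
                if (head.isEmpty()) continue; // skip blank lines
                List<String> syns = new ArrayList<>();
                for (int i = 1; i < fields.length; i++) {
                    String word = fields[i].trim();
                    if (!word.isEmpty()) syns.add(word);
                }
                // A later line with the same headword replaces the earlier entry.
                synonyms.put(head, syns);
            }
        }
    }

    /** Returns the synonyms of a headword, or an empty list if it is unknown. */
    List<String> getSynonyms(String key) {
        return synonyms.getOrDefault(key, Collections.emptyList());
    }
}
```

Reading from a java.io.Reader keeps the sketch easy to exercise with a StringReader; a FileReader pointed at a synonym file works the same way.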
unitTests:
classes:
CoverageAnalyzerTest: JUnit tests for the CoverageAnalyzer class. They use a few simple inputs with expected outputs that can also be confirmed manually.
CreateOntologyFromThesaurusTest: JUnit tests for the CreateOntologyFromThesaurus class. They test its various methods with simple files and inputs whose results can be confirmed manually.

Directories:
ontologies: This directory holds all the ontologies we want to test. For example, if we are testing the SWEET ontology set, it holds all of those OWL files.
unitTestData: This directory holds all the sample inputs and outputs for the JUnit tests. Keeping them here prevents them from being confused with other ontologies or accidentally edited. Future test case inputs should be put here.
stemmedOntologies: This directory holds the same ontologies as the "ontologies" directory, but with all the classes stemmed. These are the ontologies that should be used during coverage evaluation; the unstemmed originals are kept for reference.
synonyms: This directory holds the thesaurus files. There should be three: GenEnglishSynCompendium.txt (1), GenEnglishSynCompendiumStemmed.txt (2), and mergedSynCompendiumStemmed.txt (3). The first is the listing as we received it from other researchers. The second contains all the words of the original file, stemmed. The third combines all the values (synonyms) for each headword. Only the third should be used. The directory also contains MethodNotes.txt, a readme from the outside researchers describing the first file.
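
One way to produce a file in which the values for each headword are combined is sketched below. This rests on an assumption about what "combined" means here (all synonym lists for a repeated headword merged into one, with duplicates removed and first-seen order kept); SynonymMergeSketch is a hypothetical name, and entries are passed as String arrays with the headword first.

```java
import java.util.*;

/**
 * Hypothetical sketch of combining the synonym lists of repeated headwords.
 * Not the procedure actually used to build mergedSynCompendiumStemmed.txt.
 */
class SynonymMergeSketch {

    /** Each entry is {headword, synonym1, synonym2, ...}. */
    static Map<String, List<String>> merge(List<String[]> entries) {
        // LinkedHashMap/LinkedHashSet keep first-seen order of headwords and synonyms.
        Map<String, LinkedHashSet<String>> combined = new LinkedHashMap<>();
        for (String[] entry : entries) {
            LinkedHashSet<String> syns =
                combined.computeIfAbsent(entry[0], k -> new LinkedHashSet<>());
            for (int i = 1; i < entry.length; i++) {
                syns.add(entry[i]); // repeated synonyms are ignored by the set
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        combined.forEach((head, syns) -> result.put(head, new ArrayList<>(syns)));
        return result;
    }
}
```

For instance, merging {cow, cattle, steer} with {cow, steer, heifer} yields a single entry cow → [cattle, steer, heifer].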