{"id":22302657,"url":"https://github.com/dataoneorg/onto-dataonepython","last_synced_at":"2025-03-26T00:29:52.787Z","repository":{"id":13506750,"uuid":"16197635","full_name":"DataONEorg/onto-dataonepython","owner":"DataONEorg","description":"Clone of Bitbucket ndigiuseppe/dataonepython ontology coverage project","archived":false,"fork":false,"pushed_at":"2014-01-24T07:18:43.000Z","size":32968,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-01-30T21:17:15.977Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataONEorg.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-01-24T07:09:34.000Z","updated_at":"2014-05-21T19:03:53.000Z","dependencies_parsed_at":"2022-09-05T01:50:41.662Z","dependency_job_id":null,"html_url":"https://github.com/DataONEorg/onto-dataonepython","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fonto-dataonepython","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fonto-dataonepython/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fonto-dataonepython/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataONEorg%2Fonto-dataonepython/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataONEorg","download_url":"https://codeload.github.com/DataONEorg/onto-dataonepython/tar.gz/refs/heads/master",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245565380,"owners_count":20636276,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-03T18:40:35.680Z","updated_at":"2025-03-26T00:29:52.765Z","avatar_url":"https://github.com/DataONEorg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This document describes the various Python packages for the DataONE ontology coverage project.\n\nPackages:\n\tcorpusFetcher:\n\t\tThis package fetches a corpus and normalizes it.  It also normalizes a thesaurus and performs part-of-speech tagging.  However, to use the part-of-speech package (partOfSpeachTagger), you must first download and install the Natural Language Toolkit (NLTK).  
Installation instructions are at http://nltk.org/install.html\n\n\tclasses:\n\t\tfetchCorpus: The main script to call; it fetches the corpus from a remote location and normalizes it completely, running each normalization step in the correct order\n\t\tget_metadata: Calls cn.dataone.org to retrieve various corpus files and stores them\n\t\tlovins: A stemmer that implements the Lovins stemming algorithm\n\t\tpaicehusk: A stemmer that implements the Paice/Husk stemming algorithm\n\t\tporter: A stemmer that implements the Porter stemming algorithm\n\t\tporter2: A stemmer that implements the Porter2 (Snowball English) stemming algorithm\n\t\tremoveNumbers: Normalization step that removes words containing too few English letters\n\t\tremovePunct: Normalization step that removes all punctuation\n\t\tremoveStop: Normalization step that removes stop words\n\t\tremoveUpper: Normalization step that converts all text to lower case\n\t\tstemWords: Normalization step that stems all words\n\n\t\tThese scripts form a pipeline: each step reads the corpus file produced by the previous step and writes a new file to the \"data\" directory in the parent directory.  The final product is a file called finishedCorpus_6.txt\n\n\tOntologyWorker:\n\tclasses:\n\t\tfetchOntology: Takes a list of URLs and downloads the ontologies from the web.  It is written specifically to gather the SWEET ontologies and is not generalized.\n\t\tontologyStemmer: Stems the class names in an ontology and then overwrites the file.  Pass in, as a parameter, a directory containing OWL ontologies (only OWL files are stemmed)\n\n\tpartOfSpeachTagger:\n\tclasses:\n\t\tPoSTagger: Takes a string as an argument and returns a list of (word, PoS tag) tuples.  
It filters out everything except nouns, adverbs, adjectives, and verbs\n\n\tthesaurusFixer:\n\tclasses:\n\t\tMergeKeyValuePairs: Stemming a thesaurus file can cause different keys to become identical; this script merges those keys and their corresponding values within a file.  Pass in 2 arguments (the input and output paths); otherwise it falls back to a hardcoded path that likely does not exist on your system\n\t\tThesaurusStemmer: Takes a thesaurus file and stems all the words within it.  Pass in 2 arguments (the input and output paths); otherwise it falls back to a hardcoded path that likely does not exist on your system\n\n\twordNet:\n\tclasses:\n\t\twordNetHandler: Uses WordNet to generate synonyms for a single word (i.e., a string containing one word).  However, because the synonyms WordNet generated proved poor for this purpose, it is not used.\n\nDirectories:\n\tdata:\n\t\tThis directory contains a variety of folders, including the corpus at each stage of normalization.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataoneorg%2Fonto-dataonepython","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdataoneorg%2Fonto-dataonepython","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataoneorg%2Fonto-dataonepython/lists"}