{"id":21455180,"url":"https://github.com/phenopolis/pheno4j","last_synced_at":"2025-06-16T13:06:27.381Z","repository":{"id":49202150,"uuid":"68394964","full_name":"phenopolis/pheno4j","owner":"phenopolis","description":"Pheno4j: a graph based HPO to NGS database","archived":false,"fork":false,"pushed_at":"2023-06-13T22:50:45.000Z","size":21798,"stargazers_count":31,"open_issues_count":5,"forks_count":4,"subscribers_count":9,"default_branch":"master","last_synced_at":"2023-10-20T22:21:18.769Z","etag":null,"topics":["neo4j"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phenopolis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-09-16T16:11:13.000Z","updated_at":"2022-05-11T16:41:15.000Z","dependencies_parsed_at":"2022-09-06T04:40:59.870Z","dependency_job_id":"93746282-4bab-43fa-bf19-92787c1be5f1","html_url":"https://github.com/phenopolis/pheno4j","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phenopolis%2Fpheno4j","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phenopolis%2Fpheno4j/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phenopolis%2Fpheno4j/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phenopolis%2Fpheno4j/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/phenopolis","download_url":"https://codeload.github.com/phenopolis/pheno4j/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https
://github.com","kind":"github","repositories_count":226003010,"owners_count":17558157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["neo4j"],"created_at":"2024-11-23T05:10:50.637Z","updated_at":"2024-11-23T05:10:51.185Z","avatar_url":"https://github.com/phenopolis.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/phenopolis/pheno4j.svg?branch=master)](https://travis-ci.org/phenopolis/pheno4j)\n[![Coverage Status](https://coveralls.io/repos/github/phenopolis/pheno4j/badge.svg?branch=master)](https://coveralls.io/github/phenopolis/pheno4j?branch=master)\n\u003c!-- Sajid fix this :-)\n[![Quality Gate](https://sonarqube.com/api/badges/gate?key=com.graph%3Adb)](https://sonarqube.com/dashboard/index/com.graph%3Adb)\n--\u003e\n\n# Pheno4j: a graph based HPO to NGS database\nAuthor: Sajid Mughal\n\nPaper published:\nhttps://www.ncbi.nlm.nih.gov/pubmed/28633344\n\nPresentation videos:\n* https://skillsmatter.com/skillscasts/10611-pheno4j-a-gene-to-phenotype-graph-database\n* https://www.youtube.com/watch?v=257GarPLym4\n\n## Purpose\nPheno4J takes genetic and phenotype data in JSON, VCF and CSV format and converts them into CSV files that represent nodes and relationships, which are then used to populate the Pheno4J graph with [the Neo4j bulk CSV import tool](https://neo4j.com/docs/operations-manual/current/tutorial/import-tool/).\n\n## Public datasets\nOnly two publicly available datasets are required:\n* [Human Phenotype Ontology](http://purl.obolibrary.org/obo/hp.obo)\n* [OMIM HPO-Gene 
mapping](http://compbio.charite.de/jenkins/job/hpo.annotations.monthly/lastStableBuild/artifact/annotation/ALL_SOURCES_ALL_FREQUENCIES_diseases_to_genes_to_phenotypes.txt)\n\n## User-specified datasets\n\nThe following example datasets are referenced in [config.properties](https://github.com/phenopolis/pheno4j/blob/master/src/main/resources/config.properties):\n* VCF file containing genotypes ([example](https://github.com/phenopolis/pheno4j/blob/master/src/test/resources/genotypes.vcf))\n* VEP JSON file ([example](https://github.com/phenopolis/pheno4j/blob/master/src/test/resources/VEP.json))\n* Individuals with their HPO terms, as a CSV file ([example](https://github.com/phenopolis/pheno4j/blob/master/src/test/resources/person_phenotypes.csv))\n\n## Pheno4J schema overview\n\n![Pheno4J schema overview](https://github.com/sajid-mughal/pheno4j/blob/master/docs/Figure_1.png?raw=true)\n\n## Installation\n### Local Installation with Exemplar Data\n\nThe local version cannot efficiently handle very large datasets, since it does not expose configuration of the page cache and JVM heap size.\nIt should therefore be used for testing.\n\n#### Prerequisites\n- Java 1.8\n- Maven 3\n\n#### Build Graph and Start up Neo4j on test data ###\nDownload the code, build the database, load the test data referenced in [config.properties](https://github.com/phenopolis/pheno4j/blob/master/src/main/resources/config.properties) and start the server on port 7474:\n\n```\ngit clone https://github.com/phenopolis/pheno4j.git\ncd pheno4j\nmvn clean compile -P build-graph,run-neo4j\n```\n\nOnce the server is running, it can be queried either through the web interface at http://localhost:7474/ or by using [curl](https://curl.haxx.se/) to send HTTP requests from the command line (see next section).\n\n#### Run Example Queries with curl\nThe curl HTTP queries return JSON, so the responses can be parsed with [jq](https://stedolan.github.io/jq/).\n\nFor example, to count the variants shared between person1 and 
person2:\n```\ncurl -H \"Content-Type: application/json\" -d '{\n\"query\": \"WITH [$p1,$p2] AS persons MATCH (p:Person)\u003c-[]-(v:GeneticVariant) WHERE p.personId IN persons WITH v, count(*) as c, persons WHERE c = size(persons) RETURN count(v.variantId);\",\n\"params\":{\"p1\":\"person1\",\"p2\":\"person2\"}\n}' http://localhost:7474/db/data/cypher\n```\n\nTo get the IDs of persons carrying variant 22-51171497-G-A:\n```\ncurl -H \"Content-Type: application/json\" -d '{\n\"query\": \"MATCH (gv:GeneticVariant)-[]-\u003e(p:Person) WHERE gv.variantId = $var RETURN p.personId;\",\n\"params\":{\"var\":\"22-51171497-G-A\"}\n}' http://localhost:7474/db/data/cypher\n```\n\nMore Cypher queries are available [here](https://github.com/phenopolis/pheno4j/blob/master/docs/Cypher-Queries.md).\n\n#### Running Pheno4J on your own data\n\n[Documentation here](https://github.com/phenopolis/pheno4j/blob/master/docs/Additional-Documentation.md#loading-manually-created-files).\n\n### Server Installation\n\nThe server installation can scale to very large datasets, as it allows configuration of the JVM heap size and page cache.\n\n#### Prerequisites\n- Java 1.8\n- Neo4j installation: download it from https://neo4j.com/download/community-edition/ and extract the archive. 
The extracted directory will be referred to as **$NEO4J_HOME**.\n\n#### Deploy code\nRun the following in the checkout directory; it generates a zip file, \"graph-bundle.zip\", in the target folder:\n```\nmvn clean package\n```\nCopy `graph-bundle.zip` to your target server and unzip it.\n\n#### Update config file to reference your input data ###\nIn the `conf` folder of the extracted zip, update [config.properties](https://github.com/phenopolis/pheno4j/blob/master/src/main/resources/config.properties) to reference your input data.\n\n#### Run the GraphDatabaseBuilder ###\nThis step converts all the input data into CSV files, which are then loaded into a Neo4j database using Neo4j's ImportTool.\nConstraints and indexes are then created.\nIn the `lib` folder of the extracted zip, run the following:\n```\njava -cp '*:../conf/' com.graph.db.GraphDatabaseBuilder\n```\n#### Link the generated database above to your Neo4j installation\n```\ncd $NEO4J_HOME/data/databases\nln -s ${output.folder}/graph-db/data/databases/graph.db graph.db\n```\nHere `${output.folder}` is a placeholder for the output folder defined in [config.properties](https://github.com/phenopolis/pheno4j/blob/master/src/main/resources/config.properties); substitute the actual path.\n\n#### Update Neo4j config\nIdeally, as much of the data as possible should be held in memory ([See here for more information](https://neo4j.com/docs/operations-manual/current/performance/)).\nSet `dbms.memory.pagecache.size` in `$NEO4J_HOME/conf/neo4j.conf` to the total size of the store files, `$NEO4J_HOME/data/databases/graph.db/*store.db*`.\n\n#### Start Neo4j\n```\ncd $NEO4J_HOME/bin\n./neo4j start\n```\n#### Run 'warmup' query\nThis query touches the entire graph, so that all the data stored on disk is loaded into memory. 
([See here for more information](https://neo4j.com/developer/kb/warm-the-cache-to-improve-performance-from-cold-start/)).\nFor our data, this takes up to 10 minutes.\n```\nMATCH (n)\nOPTIONAL MATCH (n)-[r]-\u003e()\nRETURN count(n.prop) + count(r.prop);\n```\n#### Additional Steps\nIf you would like your application tier to connect to the Neo4j instance and send it database requests, you can change the instance's password as shown below; `{PORT}` is the port in `dbms.connector.http.listen_address` in `$NEO4J_HOME/conf/neo4j.conf`.\nThe following command sets the password to `1`:\n```\ncurl -H \"Content-Type: application/json\" -X POST -d '{\"password\":\"1\"}' -u neo4j:neo4j http://{HOST}:{PORT}/user/neo4j/password\n```\n\n## Example Cypher Queries\nExamples can be found [here](https://github.com/phenopolis/pheno4j/blob/master/docs/Cypher-Queries.md).\n\n## Further reading\n[Additional Documentation](docs/Additional-Documentation.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphenopolis%2Fpheno4j","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphenopolis%2Fpheno4j","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphenopolis%2Fpheno4j/lists"}