{"id":20221630,"url":"https://github.com/phcerdan/insightjournal-dev","last_synced_at":"2025-10-27T06:01:57.331Z","repository":{"id":66616658,"uuid":"181562632","full_name":"phcerdan/insightjournal-dev","owner":"phcerdan","description":"Insightjournal development scripts","archived":false,"fork":false,"pushed_at":"2022-12-09T19:38:35.000Z","size":2420,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-03T12:16:16.284Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phcerdan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-04-15T20:42:48.000Z","updated_at":"2021-08-26T16:42:38.000Z","dependencies_parsed_at":"2023-03-09T17:00:36.645Z","dependency_job_id":null,"html_url":"https://github.com/phcerdan/insightjournal-dev","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/phcerdan/insightjournal-dev","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phcerdan%2Finsightjournal-dev","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phcerdan%2Finsightjournal-dev/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phcerdan%2Finsightjournal-dev/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phcerdan%2Finsightjournal-dev/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/phcerdan","download_url":"https://codeload.github.com/phcerdan/insightjournal-dev/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phcerdan%2Finsightjournal-dev/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261064611,"owners_count":23104728,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T06:53:54.590Z","updated_at":"2025-10-27T06:01:57.325Z","avatar_url":"https://github.com/phcerdan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"InsightJournal-dev\n------------------\n\nThis repository deals with cleaning the existing database (PostgreSQL) and extracting its data\nto json objects per publication. (A list of authors/reviewers/active persons can also be queried.)\n\nFor PostgreSQL development I was using DBeaver (Eclipse based).\nThe only IDE feature I relied on is the export/copy of data.\nThe queries output jsonb objects. Use Shift-Ctrl-C (advanced copy)\nto add a comma `,` between the different rows. See gif:\n\n![Exporting jsonb to valid json in DBeaver](./docs/ExportDataDBeaver.gif)\n\nExtracting bitstreams\n---------------------\n\n[python-extract-bitstreams.py](./python-scripts/python-extract-bitstreams.py) is used to download\nall the bitstreams/blobs associated with a publication from the current Insight-Journal site. 
Each publication can have different revisions.\nAny data shared between revisions is duplicated in each folder so that every revision is an independent, publishable state\n(think of them as releases).\nThe bitstreams are uploaded to a Girder instance at [data.kitware](https://data.kitware.com/#collection/5cb75e388d777f072b41e8db):\n```bash\ngirder-client --api-url https://data.kitware.com/api/v1 --api-key xxxxxxxxxxxxxxxxx upload 5cc782658d777f072b7834a2 ./bitstreams_folder\n```\nThe bitstreams are not stored in the database, so be sure to back up the data regularly.\n\n[python-extract-logo.py](./python-scripts/python-extract-logo.py) is used to extract logos/data blobs directly from the database. They are still\nstored there (and also externally).\n\n[remove_duplicateName.py](./python-scripts/remove_duplicatedName.py):\nWhen creating revisions that carry their own copies of the data, different files can end up with the same name. We append the string `_duplicateNameX` to the name\nand then check whether those files are actually identical (in that case we keep only one file) or are indeed different (in that case we keep them all).\nSome results of this script can be seen in [removed_duplicated_names_stats.txt](./python-scripts/removed_duplicated_names_stats.txt)\n\nCreating metadata.json per publication\n--------------------------------------\n\nThe sql queries are stored in the folder [sql-scripts](./sql-scripts/).\nThe original approach was to use [create_json_per_publication_from_sql.py](./python-scripts/create_json_per_publication_from_sql.py).\nBut some sql queries require post-processing in the form of a `sed` (I was using vim) substitution.\nSo the recommended approach is to first generate the json data per query and export it to a file (see the DBeaver export procedure earlier in this document). 
These per-query json files are stored in [cleaned_json_data](./cleaned_json_data) (with the post-processing substitution already applied).\n\nThe final goal of generating a `metadata.json` per publication is then achieved by [merge_json_data_by_publication_id.py](python-scripts/merge_json_data_by_publication_id.py).\n\n\nCheck [readme.sql](./readme.sql) for all the steps taken to clean the database. They can be executed all at once (and in a reproducible manner) from [clean_database.sql](./sql-scripts/clean_database.sql), starting from a dumped PostgreSQL database of the live site.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphcerdan%2Finsightjournal-dev","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphcerdan%2Finsightjournal-dev","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphcerdan%2Finsightjournal-dev/lists"}