{"id":13693293,"url":"https://github.com/EDS-APHP-legacy/pySyntheticDatasetGenerator","last_synced_at":"2025-05-02T21:31:51.887Z","repository":{"id":90724311,"uuid":"86330980","full_name":"EDS-APHP-legacy/pySyntheticDatasetGenerator","owner":"EDS-APHP-legacy","description":"Generate relational fictive dataset from a simple yaml description","archived":false,"fork":false,"pushed_at":"2019-04-24T18:48:30.000Z","size":733,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-14T03:03:30.980Z","etag":null,"topics":["data-generator","database","faker","generator","synthetic-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EDS-APHP-legacy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-03-27T12:10:21.000Z","updated_at":"2022-01-21T13:16:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"57161f0e-6254-4fbb-8b57-12ed275bee80","html_url":"https://github.com/EDS-APHP-legacy/pySyntheticDatasetGenerator","commit_stats":null,"previous_names":["eds-aphp/pysyntheticdatasetgenerator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EDS-APHP-legacy%2FpySyntheticDatasetGenerator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EDS-APHP-legacy%2FpySyntheticDatasetGenerator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EDS-APHP-legacy%2FpySyntheticDatasetGenerator/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EDS-APHP-legacy%2FpySyntheticDatasetGenerator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EDS-APHP-legacy","download_url":"https://codeload.github.com/EDS-APHP-legacy/pySyntheticDatasetGenerator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252108876,"owners_count":21696154,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-generator","database","faker","generator","synthetic-data"],"created_at":"2024-08-02T17:01:08.060Z","updated_at":"2025-05-02T21:31:51.413Z","avatar_url":"https://github.com/EDS-APHP-legacy.png","language":"Python","funding_links":[],"categories":["Process-driven methods"],"sub_categories":["Tabular"],"readme":"# SyntheticDatasetGenerator\r\n\r\n## Dependencies\r\n\r\n```\r\npip install -r requirements.txt\r\n```\r\n- python \u003e=3.6\r\n- liquibase in the path (only if documentation generation is needed)\r\n- postgresql database (only if both documentation generation \u0026 data loading are needed)\r\n\r\n## Run\r\n\r\n1. Push your base file into input\r\n1. Fill in config/sdgen.yaml\r\n1. 
Run `make clean run ddl`\r\n\r\n\r\n## Install\r\n\r\n```\r\nmake\r\n```\r\n\r\n## Test\r\n\r\n```\r\nmake test\r\n```\r\n\r\n## Principle\r\n\r\n# Goal\r\n\r\n- input:\r\n    - a config file in yaml\r\n    - existing csv files (e.g. terminologies)\r\n- output: \r\n    - one or more reproducible data csv files\r\n\r\n# Csv Format (input \u0026 output)\r\n\r\nCsv files must respect:\r\n- encoding: utf-8\r\n- separator: \";\"\r\n- quote: False\r\n- header: None\r\n- date format: \"%Y-%m-%d\"\r\n- datetime format: \"%Y-%m-%d %H:%M:%S\"\r\n- strings must not contain the separator (\";\")\r\n\r\n# Table\r\n\r\n- Primary Keys: each table must have an integer primary key\r\n- Foreign Keys: each must reference a primary key.\r\n\r\n# Yaml Configuration\r\n\r\n- Table\r\n  - existing table must have an \"input\" defined\r\n  - generated table must not have \"input\" defined\r\n- Fields Class\r\n  - sequence: an autoincrement from 1 to \"table.tableSize\"\r\n  - simple (percentNull)\r\n    - integer (begin, end)\r\n    - bigint (begin, end)\r\n    - date (begin, end)\r\n    - time (begin, end)\r\n    - varchar (begin, end)\r\n    - regexp (pattern)\r\n    - real (begin, end, decimal)\r\n  - lookup (fk, table, field)\r\n\r\n\r\n# Data Type\r\n\r\n- JDBC Type     Java Type\r\n- CHAR          String\r\n- VARCHAR       String\r\n- LONGVARCHAR   String\r\n- NUMERIC       java.math.BigDecimal\r\n- DECIMAL       java.math.BigDecimal\r\n- BIT           boolean\r\n- BOOLEAN       boolean\r\n- TINYINT       byte\r\n- SMALLINT      short\r\n- INTEGER       int\r\n- BIGINT        long\r\n- REAL          float\r\n- FLOAT         double\r\n- DOUBLE        double\r\n- BINARY        byte[]\r\n- VARBINARY     byte[]\r\n- LONGVARBINARY byte[]\r\n- DATE          java.sql.Date\r\n- TIME          java.sql.Time\r\n- TIMESTAMP     java.sql.Timestamp\r\n- CLOB          Clob\r\n- BLOB          Blob\r\n- ARRAY         Array\r\n- DISTINCT      mapping of underlying type\r\n- STRUCT        Struct\r\n- REF    
       Ref\r\n- DATALINK      java.net.URL\r\n- JAVA_OBJECT   underlying Java class\r\n\r\n# Generate schema\r\n\r\n```\r\nmake load doc\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEDS-APHP-legacy%2FpySyntheticDatasetGenerator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEDS-APHP-legacy%2FpySyntheticDatasetGenerator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEDS-APHP-legacy%2FpySyntheticDatasetGenerator/lists"}