{"id":18296543,"url":"https://github.com/fiedsch/datamanagement","last_synced_at":"2025-04-05T12:32:00.064Z","repository":{"id":62504599,"uuid":"49772323","full_name":"fiedsch/datamanagement","owner":"fiedsch","description":"Data management helpers (PHP-CLI)","archived":false,"fork":false,"pushed_at":"2024-12-11T07:56:27.000Z","size":142,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-04T10:08:54.359Z","etag":null,"topics":["csv-data","data","datamanagement","helper","php"],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fiedsch.png","metadata":{"files":{"readme":"Readme.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-01-16T12:23:23.000Z","updated_at":"2024-12-11T07:56:31.000Z","dependencies_parsed_at":"2024-08-25T07:47:37.006Z","dependency_job_id":"34b913c2-045c-4a5a-91ba-426d659f6510","html_url":"https://github.com/fiedsch/datamanagement","commit_stats":{"total_commits":110,"total_committers":2,"mean_commits":55.0,"dds":"0.018181818181818188","last_synced_commit":"887ec38b8d1b2c55e1c2e7a1b15d9ddafaba1797"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fiedsch%2Fdatamanagement","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fiedsch%2Fdatamanagement/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fiedsch%2Fdatamanagement/releases","manifests_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fiedsch%2Fdatamanagement/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fiedsch","download_url":"https://codeload.github.com/fiedsch/datamanagement/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247339022,"owners_count":20923004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv-data","data","datamanagement","helper","php"],"created_at":"2024-11-05T14:41:37.032Z","updated_at":"2025-04-05T12:31:57.493Z","avatar_url":"https://github.com/fiedsch.png","language":"PHP","readme":"# Datamanagement Tools\n\nPHP classes and helpers for managing data read from text files\n \n * Data\\FileReader  read text files\n * Data\\CsvFileReader read CSV files\n * Data\\FixedWidthReader reads text files that contain data in fixed width columns\n * Data\\Helper helper functions like `SC()` that converts from spreadsheet column name to index of array \n generated by (e.g.) `CsvFileReader-\u003egetLine()`\n \n \n## Examples\n \n### Work on CSV data\n \n```php\n\u003c?php\n\nrequire __DIR__ . '/../vendor/autoload.php';\n\nuse Fiedsch\\Data\\File\\CsvReader;\n \ntry {\n \n  $reader = new CsvReader(\"testdata.csv\", \";\");\n\n  // Read and handle all lines containing data.\n\n  while (($line = $reader-\u003egetLine()) !== null) {\n    // ignore empty lines (i.e. 
lines containing no data)\n    if (!$reader-\u003eisEmpty($line)) {\n      print_r($line);\n    }\n  }\n  // $reader-\u003eclose(); // not needed as it will be automatically called when there are no more lines\n\n} catch (Exception $e) {\n    print $e-\u003egetMessage() . \"\\n\";\n}\n```\n\n#### Features\n\nAs of v0.3.2 the typical boilerplate \"open file, read every non-empty line, close file\" \ncan be written more concisely by passing the optional parameter to `getLine()`:\n \n ```php\n \u003c?php\n \n   while (($line = $reader-\u003egetLine(Reader::SKIP_EMPTY_LINES)) !== null) {\n       print_r($line);\n   }\n   \n ```\n \n \n### Data augmentation\n \n \n```php\n\u003c?php\n \nrequire __DIR__ . '/../vendor/autoload.php';\n \nuse Fiedsch\\Data\\File\\CsvReader;\nuse Fiedsch\\Data\\File\\Reader; // for Reader::SKIP_EMPTY_LINES (namespace assumed from the other File classes)\nuse Fiedsch\\Data\\Augmentation\\Augmentor;\nuse Fiedsch\\Data\\Augmentation\\Provider\\TokenServiceProvider;\nuse Fiedsch\\Data\\File\\CsvWriter;\n  \ntry {\n\n  $augmentor = new Augmentor();\n \n  $augmentor-\u003eregister(new TokenServiceProvider());\n  \n  $augmentor-\u003eaddRule('token', function (Augmentor $augmentor, $data) {\n     return [ 'token' =\u003e $augmentor['token']-\u003egetUniqueToken() ];\n   });\n  \n   $reader = new CsvReader(\"testdata.csv\", \";\");\n   \n   $writer = new CsvWriter(\"testdata.augmented.txt\", \"\\t\");\n   \n   $header_written = false;\n   \n   while (($line = $reader-\u003egetLine(Reader::SKIP_EMPTY_LINES)) !== null) {\n     $result = $augmentor-\u003eaugment($line);\n     if (!$header_written) {\n        $writer-\u003eprintLine(array_merge(['input_line'], array_keys($result), $reader-\u003egetHeader()));\n        $header_written = true;\n     }\n     $writer-\u003eprintLine(array_merge([$reader-\u003egetLineNumber()], $result, $line));\n   }\n   \n   $writer-\u003eclose();\n \n } catch (Exception $e) {\n     print $e-\u003egetMessage() . 
\"\\n\";\n }\n ```\n \n ### Creating Tokens\n \n Method one: let the `TokenCreator` ensure that the tokens are unique:\n ```php\n \u003c?php\n  \n require __DIR__ . '/../vendor/autoload.php';\n  \n use Fiedsch\\Data\\Utility\\TokenCreator;\n use Fiedsch\\Data\\File\\Writer;\n\n\n$creator = new TokenCreator(10, TokenCreator::UPPER);\n\n$output = new Writer('mytokens.txt');\n$numTokens = 1000;\n\nwhile ($numTokens-- \u003e 0) {\n  $token = $creator-\u003egetUniqueToken();\n  $output-\u003eprintLine([$token]);\n}\n$output-\u003eclose();\n```\n\nMethod two: generate the tokens first and then check whether they are unique. This might be faster and use \nfewer resources for large numbers of tokens:\n\n ```php\n  // same as above, except \n  // $token = $creator-\u003egetUniqueToken();\n  // becomes\n  $token = $creator-\u003ecreateToken();\n```\nCheck that the generated tokens are unique:\n```bash\necho \"if both lines show the same number, there were no duplicate tokens\"\nwc -l mytokens.txt\nsort mytokens.txt | uniq | wc -l\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffiedsch%2Fdatamanagement","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffiedsch%2Fdatamanagement","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffiedsch%2Fdatamanagement/lists"}