{"id":39496960,"url":"https://github.com/clarin-eric/resource-families-html-generator","last_synced_at":"2026-01-18T05:43:14.663Z","repository":{"id":53027657,"uuid":"209677301","full_name":"clarin-eric/resource-families-html-generator","owner":"clarin-eric","description":"Script for generating html table containing data about corpora from .csv file","archived":false,"fork":false,"pushed_at":"2024-10-30T13:47:09.000Z","size":2525,"stargazers_count":3,"open_issues_count":0,"forks_count":4,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-09-10T03:14:15.580Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clarin-eric.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-09-20T01:13:48.000Z","updated_at":"2024-10-30T13:46:33.000Z","dependencies_parsed_at":"2023-02-12T02:30:31.665Z","dependency_job_id":"e27196a2-9f0d-48fd-9a6f-f5f9801f65fd","html_url":"https://github.com/clarin-eric/resource-families-html-generator","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/clarin-eric/resource-families-html-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fresource-families-html-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fresource-families-html-generator/t
ags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fresource-families-html-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fresource-families-html-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clarin-eric","download_url":"https://codeload.github.com/clarin-eric/resource-families-html-generator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clarin-eric%2Fresource-families-html-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28531366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-18T05:43:14.606Z","updated_at":"2026-01-18T05:43:14.655Z","avatar_url":"https://github.com/clarin-eric.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"ClarTable\n=========\n\nInstallation\n------------\nWorks with Python `^3.9`\n```bash\ngit clone git@github.com:clarin-eric/resource-families-html-generator.git # via SSH or\ngit clone https://github.com/clarin-eric/resource-families-html-generator.git # via HTTPS\ncd ./resource-families-html-generator/\npip install 
.\n```\n\nAbout\n-----\n*ClarTable* is a Python module for generating an HTML presentation layer for tabular data from a .csv file.\n\n### Usage\n\n#### Locally:\n```bash\nusage: python -m rfhg [-h] -i PATH -r PATH -o PATH\n\nCreate html table from given data and rules. \nTo navigate static resources within the module prepend `static.` \nto the path, eg. `-r static.rules/rules.json`\n\noptional arguments:\n  -h, --help  show this help message and exit\n  -i PATH     path to a .csv file or folder with .csv files\n  -r PATH     path to a .json file with rules\n  -o PATH     path to file where output html table will be generated\n```\n\n\n#### Via CI:\nThe HTML tables for resource families can be generated via GitHub. Push new .csv files to `/resouce_families`; after processing, they will appear in the gh-pages branch.   \n\n### CSV format\nTo create an HTML table from a .csv file with the default rules, the file must contain __all of the following columns__ (order is not important). Note that column names are case sensitive. If you need the generator to consider additional columns, contact \u003cmichal@clarin.eu\u003e or adjust __rules.json__.\n\nMake sure that your .csv files __use ; (semicolon)__ as the column separator. \n\nA single cell may contain multiple paragraphs or structures split with the __#SEP__ separator. In the example below, the Description cell consists of 3 paragraphs. 
Some cells depend on others: the Buttons cell holds 2 button names split with the separator, and Buttons_URL holds the respective URLs in the same order.\n\nCorpus | Corpus_URL | Language | Size | Annotation | Licence | Description | Buttons | Buttons_URL | Publication | Publication_URL | Note\n-------|------------|----------|------|------------|---------|-------------|---------|-------------|-------------|-----------------|-------\nExample Corpus Name | www.examplaryurl.com | English | 100 million tokens | tokenised, PoS-tagged, lemmatised | CC-BY | First examplary sentence #SEPSecond examplary sentence to be started from new line #SEPExample with ```\u003ca href=\"http://some.url\"\u003ehyperlink\u003c/a\u003e``` in it | Concordancer#SEPDownload | https://www.concordancer.com/ #SEPhttps://www.download.com | Smith et al. (3019) | https://publication.url | Note text to be displayed in button field\n\nResulting table:\n![Example table](docs/media/example.png)\n\n### Table titles and ordering\nThe table title is derived from the .csv file name in the format X-table_title.csv, where X is the index used for table ordering. \nTables can be grouped into sections by storing them in an intermediate directory within the corpora folder; directory names follow the same indexing convention as .csv files.\nFor example, a corpora folder with the structure:\n```bash\nHistorical corpora\n├── 1-Historical corpora in the CLARIN infrastructure\n│   ├── 1-Monolingual corpora.csv\n│   └── 2-Multilingual corpora.csv\n└── 2-Other historical corpora\n    ├── 1-Monolingual corpora.csv\n    └── 2-Multilingual corpora.csv\n```\nwill produce:\n\n![Example corpora](docs/media/corpora.png)\n\n### Rules format\nRules are written as nested JSON objects composed of tags and fields. 
\nGiven rule:\n```javascript\n{\"tags\": [\n\t{\"tag\": \"\u003ctable class=\\\"table\\\" cellspacing=\\\"2\\\"\u003e\", \"tags\": [\n\t\t{\"tag\": \"\u003cthead\u003e\", \"tags\": [\n\t\t\t{\"tag\": \"\u003ctr\u003e\", \"tags\": [\n\t\t\t\t{\"tag\": \"\u003cth\u003e\", \"text\": \"Corpus name\"}\n\t\t\t]}\t\n\t\t]},\n\t\t{\"tag\": \"\u003ctbody\u003e\", \"tags\": [\n\t\t\t{\"tag\": \"\u003ctr\u003e\", \"tags\": [\n\t\t\t\t{\"tag\": \"\u003ctd valign=\\\"top\\\"\u003e\", \"tags\": [\n\t\t\t\t\t{\"tag\": \"\u003cp\u003e\", \"fields\": [\n\t\t\t\t\t\t{\"text\": \"\u003cstrong\u003eField data\u003c/strong\u003e will be inserted here: %s\", \"columns\": [\"column_name_in_csv_file\"]}\n\t\t\t\t\t]}\n\t\t\t\t]}\n\t\t\t]}\n\t\t]}\n\t]}\n]}\n```\n\nThe generated HTML table with names of corpora, assuming the .csv file has only 2 rows:\n```html\n\u003ctable class =\"table\" cellspacing=\"2\"\u003e\n        \u003cthead\u003e\n                \u003ctr\u003e\n                        \u003cth valign=\"top\"\u003eCorpus name\n                        \u003c/th\u003e\n                \u003c/tr\u003e\n        \u003c/thead\u003e\n        \u003ctbody\u003e\n                \u003ctr\u003e\n                        \u003ctd valign=\"top\"\u003e\n                                \u003cp\u003e\n                                \u003cstrong\u003eField data\u003c/strong\u003e will be inserted here: NKJP 2.1.4\n                                \u003c/p\u003e\n                        \u003c/td\u003e\n                \u003c/tr\u003e\n        \u003c/tbody\u003e\n        \u003ctbody\u003e\n                \u003ctr\u003e\n                        \u003ctd valign=\"top\"\u003e\n                                \u003cp\u003e\n                                \u003cstrong\u003eField data\u003c/strong\u003e will be inserted here: Common Crawl\n                                \u003c/p\u003e\n                        \u003c/td\u003e\n                \u003c/tr\u003e\n        
\u003c/tbody\u003e\n\u003c/table\u003e\n\n```\n\u003ctable class =\"table\" cellspacing=\"2\"\u003e\n        \u003cthead\u003e\n                \u003ctr\u003e\n                        \u003cth valign=\"top\"\u003eCorpus name\n                        \u003c/th\u003e\n                \u003c/tr\u003e\n        \u003c/thead\u003e\n        \u003ctbody\u003e\n                \u003ctr\u003e\n                        \u003ctd valign=\"top\"\u003e\n                                \u003cp\u003eSome text here\n                                \u003cstrong\u003eField data\u003c/strong\u003e will be inserted here: NKJP 2.1.4\n                                \u003c/p\u003e\n                        \u003c/td\u003e\n                \u003c/tr\u003e\n        \u003c/tbody\u003e\n        \u003ctbody\u003e\n                \u003ctr\u003e\n                        \u003ctd valign=\"top\"\u003e\n                                \u003cp\u003eSome text here\n                                \u003cstrong\u003eField data\u003c/strong\u003e will be inserted here: Common Crawl\n                                \u003c/p\u003e\n                        \u003c/td\u003e\n                \u003c/tr\u003e\n        \u003c/tbody\u003e\n\u003c/table\u003e\n\n\n\n\\\u003ctbody\\\u003e tag encloses tags and fields for row creation, only tags nested within \\\u003ctbody\\\u003e ... \\\u003c/tbody\\\u003e can contain \"fields\": []\n\n\t\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclarin-eric%2Fresource-families-html-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclarin-eric%2Fresource-families-html-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclarin-eric%2Fresource-families-html-generator/lists"}