{"id":50915327,"url":"https://github.com/open-sci/2021-2022-la-chouffe-code","last_synced_at":"2026-06-16T14:02:32.047Z","repository":{"id":46215596,"uuid":"486367730","full_name":"open-sci/2021-2022-la-chouffe-code","owner":"open-sci","description":"The repository for the team La Chouffe of the Open Science course a.a. 2021/20212","archived":false,"fork":false,"pushed_at":"2022-08-10T10:15:59.000Z","size":197407,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-03-10T06:56:52.203Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/open-sci.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-27T22:20:10.000Z","updated_at":"2022-05-23T08:48:25.000Z","dependencies_parsed_at":"2022-08-12T12:40:59.621Z","dependency_job_id":null,"html_url":"https://github.com/open-sci/2021-2022-la-chouffe-code","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/open-sci/2021-2022-la-chouffe-code","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-sci%2F2021-2022-la-chouffe-code","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-sci%2F2021-2022-la-chouffe-code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-sci%2F2021-2022-la-chouffe-code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-sci%2F2021-2022-la-chouffe-code/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/open-sci","download_url":"https://codeload.github.com/open-sci/2021-2022-la-chouffe-code/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-sci%2F2021-2022-la-chouffe-code/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34408788,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-16T02:00:06.860Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-16T14:02:31.120Z","updated_at":"2026-06-16T14:02:32.034Z","avatar_url":"https://github.com/open-sci.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 2021-2022-la-chouffe-code\nThe repository for the team La Chouffe of the Open Science course a.a. 2021/2022\n\n## Information about the Project\n\n### Data Management Plan\n- Venditti Giulia, Catizone Chiara, \u0026 Brembilla Davide. (2022). La Chouffe - Data Management Plan (0.0.3). Zenodo. https://doi.org/10.5281/zenodo.6570286\n\n### Protocol introducing the methodology\n- Davide Brembilla, Chiara Catizone, \u0026 Giulia Venditti. (2022). PROTOCOL – Availability of Open Access Metadata from Open Journals – A case study in DOAJ and Crossref V.4. Protocol. protocols.io. 
https://doi.org/10.17504/protocols.io.kxygxz7ywv8j/v4\n\n### Software developed\n- GiuliaVenditti, dbrembilla, ChiaraCati, \u0026 Silvio Peroni. (2022). open-sci/2021-2022-la-chouffe-code: v.0.0.1 (prerelease). Zenodo. https://doi.org/10.5281/zenodo.6857310\n\n\n### Data Gathered\n- Davide Brembilla, Chiara Catizone, \u0026 Giulia Venditti. (2022). La Chouffe Dataset (0.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6562909\n\n### Article Presenting the Research\n- Davide Brembilla, Chiara Catizone \u0026 Giulia Venditti. (2022). Availability of Article Metadata from Open Journals – A case study in DOAJ and Crossref. https://doi.org/10.5281/zenodo.6570290\n\n\n### Slides supporting the presentation\n- Chiara Catizone, Davide Brembilla, \u0026 Giulia Venditti. (2022, May 25). Presentation La Chouffe team. Zenodo. https://doi.org/10.5281/zenodo.6579263\n\n## Software requirements\n\nTested on Python \u003e 3.9.\n\nrequests==2.27.1\nrequests-cache == 0.9.4\ntqdm==4.62.3\nbackoff==2.0.1\npandas == 1.4.2\n\nYou can install these with \u003ccode\u003epip install -r requirements.txt\u003c/code\u003e\n\n## Launching the software\n\nTo use this software from the command line, you first need to download both the journal and the article dumps from the [DOAJ](https://doaj.org/docs/public-data-dump/).\n\nSpecifications of the computer used for the Estimated Time Allocated (ETA) values:\n- Laptop Lenovo Ideapad 5\n- Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz 8 core\n- 8 GB RAM\n- Windows 10 64 bit\n\nThese are the commands used to create the final dump:\n\n\u003ccode\u003epy -m batches_cleaner \"path/to/articles/dump\"\u003c/code\u003e\nETA: 30s\n\n\u003ccode\u003epy -m main \"cleaned\"\u003c/code\u003e\nETA: about 1h per batch (In our case: ca. 
78h)\n\n\u003ccode\u003epy -m stats \"temp/completed\"\u003c/code\u003e\nETA: 5m\n\n\u003ccode\u003epy -m journal_cleaner \"path/to/journal/dump\"\u003c/code\u003e\nETA: 1m\n\n\u003ccode\u003epy -m populator \"stats\"\u003c/code\u003e\nETA: 1,30h\n\nFinally, the pickle file was created from the Python interpreter:\n\n\u003ccode\u003epy # open the Python shell\u003c/code\u003e  \u003cbr\u003e\n\u003ccode\u003eimport pandas as pd\nfrom stats import get_all_in_dir\ncsv_files = get_all_in_dir('results','csv')\ndf = pd.concat([pd.read_csv(file, encoding='utf8') for file in csv_files])\ndf.to_pickle('result.pkl')\u003c/code\u003e\nETA: 10m\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-sci%2F2021-2022-la-chouffe-code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopen-sci%2F2021-2022-la-chouffe-code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-sci%2F2021-2022-la-chouffe-code/lists"}