{"id":20794144,"url":"https://github.com/sohaamir/osf_digest","last_synced_at":"2025-10-27T14:25:32.907Z","repository":{"id":233938292,"uuid":"788059643","full_name":"sohaamir/osf_digest","owner":"sohaamir","description":"Get a summary of OSF Preprints using the BART CNN","archived":false,"fork":false,"pushed_at":"2024-05-25T15:19:10.000Z","size":226,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-18T11:16:12.767Z","etag":null,"topics":["api","bart-large-cnn","huggingface","osf","python","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sohaamir.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-17T17:44:36.000Z","updated_at":"2024-05-25T15:19:13.000Z","dependencies_parsed_at":"2024-04-28T19:52:33.578Z","dependency_job_id":null,"html_url":"https://github.com/sohaamir/osf_digest","commit_stats":null,"previous_names":["sohaamir/osf_digest"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sohaamir%2Fosf_digest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sohaamir%2Fosf_digest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sohaamir%2Fosf_digest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sohaamir%2Fosf_digest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sohaam
ir","download_url":"https://codeload.github.com/sohaamir/osf_digest/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243137219,"owners_count":20242184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","bart-large-cnn","huggingface","osf","python","transformers"],"created_at":"2024-11-17T16:13:46.470Z","updated_at":"2025-10-27T14:25:27.869Z","avatar_url":"https://github.com/sohaamir.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OSF Preprint Digest\nOSF Preprint Digest is a command-line tool that retrieves preprints from the Open Science Framework (OSF) and generates concise summaries of them using the [BART-large-CNN model](https://huggingface.co/facebook/bart-large-cnn). 
I was surprised to find that there was (to my knowledge at least) no readily available tool that summarises published articles in this way, so I decided to make one myself 🙂\n\n## Functionality\n- The tool lets the user specify the date interval for abstracts (i.e., how many days before the request to cover), as well as the disciplines.\n- Separate API calls are made for the abstracts and the author lists, which are then combined.\n- It filters the preprints to include only those with English abstracts.\n\n## Installation\nClone the repository:\n```bash\ngit clone https://github.com/sohaamir/osf_digest.git\n```\n\nChange to the project directory:\n```bash\ncd osf_digest\n```\n\nInstall the required dependencies:\n```bash\npip install -r requirements.txt\n```\n\nThis installs the following packages:\n\n```bash\nrequests\ntqdm\nunidecode\nlangdetect\ntransformers\ntorch\npython-dotenv\n```\n\nInstall the CLI tool from the project root:\n```bash\npip install -e .\n```\n\n## Configuration\nTo run the script, you need to set up the necessary API tokens, which are then loaded into the script using `dotenv`.\n\nTo do this:\n- Create a file named `.env` in the project root directory.\n- Open the `.env` file and add the following lines:\n\n```bash\nOSF_TOKEN=your_osf_token\nHF_TOKEN=your_hf_token\n```\n\nReplace `your_osf_token` with your OSF API token and `your_hf_token` with your Hugging Face API token. 
Both are freely available on the respective websites; you just need to create an account.\n\nYour folder structure should look (something) like this:\n\n```\n├── LICENSE\n├── README.md\n├── cli\n│   ├── __init__.py\n│   ├── __pycache__\n│   │   ├── __init__.cpython-311.pyc\n│   │   └── cli.cpython-311.pyc\n│   ├── cli.py\n│   └── instructions.md\n├── data\n│   ├── csv\n│   │   └── 2024-05-25_16-11-31_Psychiatry_Cognitive Neuroscience_7day_preprints.csv\n│   └── json\n│       ├── 2024-05-25_16-11-31_Cognitive Neuroscience_7day_preprints.json\n│       └── 2024-05-25_16-11-31_Psychiatry_7day_preprints.json\n├── osf_digest.egg-info\n│   ├── PKG-INFO\n│   ├── SOURCES.txt\n│   ├── dependency_links.txt\n│   ├── entry_points.txt\n│   ├── requires.txt\n│   └── top_level.txt\n├── output\n│   └── digests\n│       └── 2024-05-25_16-13-48_Psychiatry_Cognitive Neuroscience_7day_summary.csv\n├── request_preprints.py\n├── requirements.txt\n└── setup.py\n```\n\n(I have included some example outputs in the repository, but feel free to delete them. 
The actual CLI code is contained within `/cli/cli.py`, but the 'barebones' Python code is also provided in the `request_preprints.py` script in the repository's root.)\n\n## How the script works\n\nWe first import a number of modules and packages:\n\n```python\nimport requests\nimport json\nfrom datetime import datetime, timedelta\nfrom tqdm import tqdm\nfrom unidecode import unidecode\nfrom concurrent.futures import ThreadPoolExecutor, as_completed\nimport os\nimport time\nfrom langdetect import detect\nimport csv\nfrom transformers import BartTokenizer, BartForConditionalGeneration, pipeline\nimport torch\nfrom dotenv import load_dotenv\n```\n\n- `requests` to get the information from OSF\n- `datetime` to set the length of time for the request to cover\n- `tqdm` for progress bars when pulling the information\n- `unidecode` to sort out pesky characters (most commonly in authors' names)\n- `concurrent.futures` to run tasks concurrently and speed things up\n- `langdetect` to remove non-English abstracts\n- `transformers` and `torch` to run the BART model\n- `dotenv` to load the environment variables (our OSF and Hugging Face tokens)\n\n## Usage\nTo run the OSF Preprint Digest tool, use the following command:\n\n```bash\nosf_digest --disciplines \u003cdiscipline1\u003e \u003cdiscipline2\u003e ... 
--days \u003cdays\u003e --pagesize \u003cpagesize\u003e --max_length \u003cmax_length\u003e --min_length \u003cmin_length\u003e\n```\n\nArguments\n```bash\n--disciplines: List of disciplines to retrieve preprints from (default: ['Psychiatry'])\n--days: Number of days back from today to summarize (default: 7, range: 0-365)\n--pagesize: Number of preprints to retrieve per request (default: 100)\n--max_length: Maximum length of the generated summary (default: 170)\n--min_length: Minimum length of the generated summary (default: 30)\n```\n\nExample\n```bash\nosf_digest --disciplines \"Psychiatry\" \"Neuroscience\" --days 14 --pagesize 50 --max_length 200 --min_length 50\n```\nThis command will retrieve preprints from the \"Psychiatry\" and \"Neuroscience\" disciplines published in the last 14 days, with a page size of 50 preprints per request. The generated summaries will have a maximum length of 200 and a minimum length of 50.\n\nAll disciplines on OSF are supported; please visit the OSF website for a complete list. You can see the disciplines assigned to a preprint on its webpage.\n\n### Output\nThe OSF Preprint Digest tool generates the following output files:\n\n- JSON files containing the retrieved preprints, saved in the data/json directory (a single entry is shown as an example):\n\n```json\n{\n        \"id\": \"j726r\",\n        \"title\": \"How visual experience shapes body representation\",\n        \"authors\": [\n            \"Shahzad, Iqra\",\n            \"Occelli, Valeria\",\n            \"Giraudet, Eleonore\",\n            \"Azanon, Elena\",\n            \"Longo, Matthew\",\n            \"Moruaux, Andre\",\n            \"Collignon, Olivier\"\n        ],\n        \"abstract\": [\n            \"We do not have a veridical representation of our body in our mind. For instance,\",\n            \"tactile distances of equal measure along the medial-lateral axis of our limbs are generally perceived\",\n            \"as larger than those running along the proximal-distal axis. 
This anisotropy in tactile distances reflects\",\n            \"distortions in body-shape representation, such that the body parts are perceived as wider than they\",\n            \"are. While the origin of such anisotropy remains unknown, it has been suggested that visual\",\n            \"experience could partially play a role in its manifestation. To causally test the role of\",\n            \"visual experience on body shape representation, we investigated tactile distance perception in sighted and early\",\n            \"blind individuals comparing medial-lateral and proximal-distal tactile distances of stimuli presented on the ventral and\",\n            \"dorsal part of the forearm, wrist, and hand. Overestimation of distances in the medial-lateral over\",\n            \"proximal-distal body axes were found in both sighted and blind people, but the magnitude of\",\n            \"the anisotropy was significantly reduced in the forearms of blind people. We conclude that tactile\",\n            \"distance perception is mediated by similar mechanisms in both sighted and blind people, but that\",\n            \"visual experience can modulate the tactile distance anisotropy.\"\n        ],\n        \"date_published\": \"2024-04-20T16:07:35.153014\",\n        \"license\": \"563c1cf88c5e4a3877f9e96c\",\n        \"disciplines\": [\n            \"Psychiatry\"\n        ],\n        \"tags\": [\n            \"blindness\",\n            \"tactile distance anisotropy\",\n            \"touch\",\n            \"vision\"\n        ]\n    },\n```\n\n- A CSV file containing the preprint summaries, saved in the data/csv directory:\n\n| title                                                        | authors                                                      | abstract                                                     | disciplines |\n| ------------------------------------------------------------ | ------------------------------------------------------------ | 
------------------------------------------------------------ | ----------- |\n| The architecture of spontaneous thoughts  and experiences: a graph theory approach. | Coppola, Peter; Sikka, Pilleriin; Valli, Katja; Tuominen, Jarno;  Revonsuo, Antti; Loukola, Ville; Bernstein, Ryan; , nanna.strif; Kirberg,  Manuela; Ezquerro-Nassar, Alejandro; Windt, Jennifer; Noreika, Valdas; mota,  natalia; Bekinschtein, Tristan | The language people use in everyday life provides a window into the mind.  Mind-wandering and dreams have been thought to reflect unique individual  differences and mental health. Here we use a large dataset of mind-wandering  (n=1619) and dream (n=1434) reports from 176 individuals in conjunction with  graph theory applied to natural language. We find that dream reports have a  more complex structure, while mind-wandering reports have fewer word  repetitions and more verbose structure, with essential nodal points in the  narrative flux. Dream reports tend to have more thematic repetitions, local  cliques, and global integration. Capitalising on a repeated measures design,  we found that the structure of dream and mind-wandering reports contains  individual-specific information. Finally, we find that word centrality in  dreams is predictive of depression symptoms. Thus, this approach is sensitive  to individual differences, quantitatively differentiates two distinct  contents of consciousness, and seems promising for cost-effective analyses of  large naturalistically occurring qualitative datasets. | Psychiatry  |\n| How visual experience shapes body  representation            | Shahzad, Iqra; Occelli, Valeria; Giraudet, Eleonore; Azanon, Elena;  Longo, Matthew; Moruaux, Andre; Collignon, Olivier | We do not have a veridical representation of our body in our mind. For  instance, tactile distances of equal measure along the medial-lateral axis of  our limbs are generally perceived as larger than those running along the  proximal-distal axis. 
This anisotropy in tactile distances reflects  distortions in body-shape representation, such that the body parts are  perceived as wider than they are. While the origin of such anisotropy remains  unknown, it has been suggested that visual experience could partially play a  role in its manifestation. To causally test the role of visual experience on  body shape representation, we investigated tactile distance perception in  sighted and early blind individuals comparing medial-lateral and  proximal-distal tactile distances of stimuli presented on the ventral and  dorsal part of the forearm, wrist, and hand. Overestimation of distances in  the medial-lateral over proximal-distal body axes were found in both sighted  and blind people, but the magnitude of the anisotropy was significantly  reduced in the forearms of blind people. We conclude that tactile distance  perception is mediated by similar mechanisms in both sighted and blind  people, but that visual experience can modulate the tactile distance  anisotropy. | Psychiatry  |\n| Towards understanding and halting  legacies of trauma        | Taylor, William; Korobkova, Laura; Bhinderwala, Nabeel; Dias, Brian G | Echoes of natural and anthropogenic traumas not only reverberate within  the physiology, biology, and neurobiology of the generation directly exposed  to them but also within the biology of future generations. With the intent of  understanding this phenomenon, significant efforts have sought to establish  multi-generational legacies of experiences like stress, chemical exposures,  nutritional impoverishment, and chemosensory experiences. From these studies,  we are gaining new appreciation for how legacies of trauma come to be  bequeathed to future generations. This review first outlines principles that  merit attention in the study of multi-generational legacies of trauma. Next,  it discusses causes and consequences that allow for such legacies to  perpetuate across generations. 
Finally, we discuss silver linings of such  legacies and how legacies of flourishing can be engineered. In summary, this  review synthesizes our current understanding of the concept, causes and  consequences of legacies of trauma and looks to opportunities to halt them. | Psychiatry  |\n|                                                              |                                                              |                                                              |             |\n\n- A CSV file containing the discipline-wise summaries, saved in the output/digests directory (the summaries generated by the BART model):\n\n| Title                                                        | Authors                                                      | Summary                                                      | Discipline |\n| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------- |\n| The architecture of spontaneous thoughts and  experiences: a graph theory approach. | Coppola, Peter; Sikka, Pilleriin; Valli, Katja; Tuominen, Jarno;  Revonsuo, Antti; Loukola, Ville; Bernstein, Ryan; , nanna.strif; Kirberg,  Manuela; Ezquerro-Nassar, Alejandro; Windt, Jennifer; Noreika, Valdas; mota,  natalia; Bekinschtein, Tristan | The language people use in everyday life provides a window into the mind.  Mind-wandering and dreams have been thought to reflect unique individual  differences and mental health. We find that dream reports have a more complex  structure. We also find that word centrality in dreams is predictive of  depression symptoms. | Psychiatry |\n| How visual experience shapes body  representation            | Shahzad, Iqra; Occelli, Valeria; Giraudet, Eleonore; Azanon, Elena;  Longo, Matthew; Moruaux, Andre; Collignon, Olivier | We do not have a veridical representation of our body in our mind. 
For  instance, tactile distances of equal measure along the medial-lateral axis of  our limbs are generally perceived as larger than those running along the  proximal-distal axis. This anisotropy in tactile distances reflects  distortions in body-shape representation. | Psychiatry |\n| Towards understanding and halting  legacies of trauma        | Taylor, William; Korobkova, Laura; Bhinderwala, Nabeel; Dias, Brian G | Echoes of natural and anthropogenic traumas reverberate within the  physiology, biology, and neurobiology of the generation directly exposed to  them. With the intent of understanding this phenomenon, significant efforts  have sought to establish multi-generational legacies of experiences like  stress, chemical exposures, nutritional impoverishment, and chemosensory  experiences. | Psychiatry |\n\n## License\nThis project is licensed under the MIT License. See the LICENSE file for more information.\n\n## Contributing\nContributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.\n\n## Acknowledgements\nThis tool utilizes the Open Science Framework (OSF) API for retrieving preprints.\nThe preprint summaries are generated using the BART-large-CNN model from Hugging Face.\n\n## Contact\nIf you have any questions or need further assistance, please contact me at axs2210@bham.ac.uk or open a thread!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsohaamir%2Fosf_digest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsohaamir%2Fosf_digest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsohaamir%2Fosf_digest/lists"}