{"id":13567089,"url":"https://github.com/digitalmethodsinitiative/4cat","last_synced_at":"2026-01-22T18:57:20.084Z","repository":{"id":37446663,"uuid":"150257402","full_name":"digitalmethodsinitiative/4cat","owner":"digitalmethodsinitiative","description":"The 4CAT Capture and Analysis Toolkit provides modular data capture \u0026 analysis for a variety of social media platforms.","archived":false,"fork":false,"pushed_at":"2026-01-20T15:39:12.000Z","size":73014,"stargazers_count":354,"open_issues_count":78,"forks_count":64,"subscribers_count":12,"default_branch":"master","last_synced_at":"2026-01-20T19:39:13.477Z","etag":null,"topics":["digitalmethods","python","scraping","social-media","textanalysis"],"latest_commit_sha":null,"homepage":"https://4cat.nl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/digitalmethodsinitiative.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json","notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-09-25T11:52:37.000Z","updated_at":"2026-01-20T15:39:16.000Z","dependencies_parsed_at":"2025-04-17T10:58:01.551Z","dependency_job_id":"d8e74c8a-1b61-4aeb-b435-b053ff8472ea","html_url":"https://github.com/digitalmethodsinitiative/4cat","commit_stats":null,"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"purl":"pkg:github/digitalmethodsinitiative/4cat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalmethodsinitiative%2F4cat","tags_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/digitalmethodsinitiative%2F4cat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalmethodsinitiative%2F4cat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalmethodsinitiative%2F4cat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/digitalmethodsinitiative","download_url":"https://codeload.github.com/digitalmethodsinitiative/4cat/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalmethodsinitiative%2F4cat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28668721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T17:07:18.858Z","status":"ssl_error","status_checked_at":"2026-01-22T17:05:02.040Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["digitalmethods","python","scraping","social-media","textanalysis"],"created_at":"2024-08-01T13:02:23.358Z","updated_at":"2026-01-22T18:57:20.074Z","avatar_url":"https://github.com/digitalmethodsinitiative.png","language":"Python","readme":"# ![](https://github.com/digitalmethodsinitiative/4cat/tree/master/common/assets/logo_readme.png) 4CAT: Capture and Analysis Toolkit\n\n[![DOI: 
10.5117/CCR2022.2.007.HAGE](https://zenodo.org/badge/DOI/10.5117/ccr2022.2.007.hage.svg)](https://doi.org/10.5117/CCR2022.2.007.HAGE)\n[![DOI: 10.5281/zenodo.4742622](https://zenodo.org/badge/DOI/10.5281/zenodo.4742622.svg)](https://doi.org/10.5281/zenodo.4742622)\n[![License: MPL 2.0](https://img.shields.io/badge/license-MPL--2.0-informational)](https://github.com/digitalmethodsinitiative/4cat/blob/master/LICENSE)\n[![Requires Python 3.11](https://img.shields.io/badge/py-v3.11-blue)](https://www.python.org/)\n[![Docker image status](https://github.com/digitalmethodsinitiative/4cat/actions/workflows/docker_latest.yml/badge.svg)](https://github.com/digitalmethodsinitiative/4cat/actions/workflows/docker_latest.yml)\n\n\u003cp align=\"center\"\u003e\u003cimg alt=\"A screenshot of 4CAT, displaying its 'Create Dataset' interface\" src=\"common/assets/screenshot1.png\"\u003e\u003cimg alt=\"A screenshot of 4CAT, displaying a network visualisation of a dataset\" src=\"common/assets/screenshot2.png\"\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e4CAT has a website at \u003ca href=\"https://4cat.nl\"\u003e4cat.nl\u003c/a\u003e.\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003ca href=\"https://bsky.app/profile/4cat.nl\"\u003eFollow 4CAT on Bluesky\u003c/a\u003e for updates.\u003c/p\u003e\n4CAT is a research tool that can be used to analyse and process data from\nonline social platforms. Its goal is to make the capture and analysis of data\nfrom these platforms accessible to people through a web interface, without\nrequiring any programming or web scraping skills. 
Our target audience is\nresearchers, students and journalists interested in using Digital Methods in their\nwork.\n\nIn 4CAT, you create a dataset from a given platform according to a given set of\nparameters; the result of this (usually a CSV or JSON file containing matching items)\ncan then be downloaded or analysed further with a suite of analytical\n'processors', which range from simple frequency charts to more advanced analyses\nsuch as the generation and visualisation of word embedding models.\n\n4CAT has a (growing) number of supported data sources corresponding to popular\nplatforms that are part of the tool, but you can also [add additional data\nsources](https://github.com/digitalmethodsinitiative/4cat/wiki/How-to-make-a-data-source)\nusing 4CAT's Python API. The following data sources are currently supported\nactively and can be used to collect data with 4CAT directly:\n\n* 4chan and 8kun\n* Bluesky\n* Telegram\n* TikTok (from a list of TikTok post URLs)\n* Tumblr\n\nThe following platforms are supported through \n[Zeeschuimer](https://github.com/digitalmethodsinitiative/zeeschuimer), with \nwhich you can collect data to import into 4CAT for analysis:\n\n* 9gag\n* Douyin\n* Gab\n* Imgur\n* Instagram (posts)\n* LinkedIn\n* Pinterest\n* Threads\n* Truth.social\n* TikTok (posts and comments)\n* X/Twitter\n* Xiaohongshu\n\nIt is also possible to upload data collected with other tools as CSV files, or \nzip archives of media files (i.e. video, images, and audio). 
The following \ntools are explicitly supported but other data can also be uploaded as long as \nit is formatted as CSV or uses a common media file format:\n\n* Facebook and Instagram (via [CrowdTangle](https://www.crowdtangle.com) or [Facepager](https://github.com/strohne/Facepager) exports)\n* YouTube videos and comments (via the [YouTube Data Tools](https://ytdt.digitalmethods.net/))\n* Weibo (via [Bazhuayu](https://www.bazhuayu.com/))\n\nA number of other platforms have built-in support that is untested, or requires\ne.g. special API access. You can view the [data sources in our wiki](https://github.com/digitalmethodsinitiative/4cat/wiki/Available-data-sources) or review [the data\nsources' code](https://github.com/digitalmethodsinitiative/4cat/tree/master/datasources)\nin the GitHub repository.\n\n## Installation\nYou can install 4CAT locally or on a server via Docker or manually. For easiest installation, we recommend copying our [`docker-compose.yml`](https://raw.githubusercontent.com/digitalmethodsinitiative/4cat/master/docker-compose.yml) file and [`.env`](https://raw.githubusercontent.com/digitalmethodsinitiative/4cat/master/.env) file, and running this terminal command in the folder where those files have been saved:\n\n```\ndocker-compose up -d\n```\n\nIn-depth instructions on both Docker installation and manual installation can be found [in our\nwiki](https://github.com/digitalmethodsinitiative/4cat/wiki/Installing-4CAT). 
A video walkthrough of installing 4CAT via Docker can be found on [YouTube here](https://youtu.be/oWsB7bvNfOY).\n\nCurrently, scraping 4chan, 8chan, and 8kun requires additional steps; please see the wiki.\n\nPlease check our\n[issues](https://github.com/digitalmethodsinitiative/4cat/issues) and create\none if you experience any problems (pull requests are also very welcome).\n\n### Upgrading 4CAT\nInstructions on upgrading 4CAT from previous versions [can be found in our wiki](https://github.com/digitalmethodsinitiative/4cat/wiki/Upgrading-4CAT).\n\n## Modules\n4CAT is a modular tool and easy to extend. The following two folders in the \nrepository are of interest for this: \n\n- `datasources`: Data source definitions. This is a set of configuration\n  options, database definitions and Python scripts to process this data with.\n  If you want to set up your own data sources, refer to the\n  [wiki](https://github.com/digitalmethodsinitiative/4cat/wiki/How-to-make-a-data-source).\n- `processors`: A collection of data processing scripts that can plug into\n  4CAT to manipulate or process datasets created with 4CAT. There is an API\n  you can use to [make your own\n  processors](https://github.com/digitalmethodsinitiative/4cat/wiki/How-to-make-a-processor).\n\n## Credits \u0026 License\n4CAT was created at [OILab](https://oilab.eu) and the\n[Digital Methods Initiative](https://www.digitalmethods.net) at the University\nof Amsterdam. The tool was inspired by\n[DMI-TCAT](https://wiki.digitalmethods.net/Dmi/ToolDmiTcat), a tool with\ncomparable functionality that can be used to scrape and analyse Twitter data.\n\n4CAT development is supported by the Dutch [PDI-SSH](https://pdi-ssh.nl/en/)\nfoundation through the [CAT4SMR project](https://cat4smr.humanities.uva.nl/).\n\n4CAT is licensed under the Mozilla Public License, 2.0. 
Refer to the `LICENSE`\nfile for more information.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalmethodsinitiative%2F4cat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigitalmethodsinitiative%2F4cat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalmethodsinitiative%2F4cat/lists"}