{"id":15107975,"url":"https://github.com/mischau8/spevktator","last_synced_at":"2025-10-23T02:31:46.383Z","repository":{"id":58509402,"uuid":"528987562","full_name":"MischaU8/spevktator","owner":"MischaU8","description":"An open source investigation tool to collect and analyse public VK community wall posts","archived":false,"fork":false,"pushed_at":"2022-09-11T08:06:09.000Z","size":260,"stargazers_count":37,"open_issues_count":1,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-30T17:12:22.972Z","etag":null,"topics":["datasets","datasette","osint","python","sql","sqlite","vk","vkontakte"],"latest_commit_sha":null,"homepage":"https://spevktator.io/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MischaU8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-08-25T19:15:55.000Z","updated_at":"2024-06-11T12:12:12.000Z","dependencies_parsed_at":"2023-01-18T03:30:26.667Z","dependency_job_id":null,"html_url":"https://github.com/MischaU8/spevktator","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MischaU8%2Fspevktator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MischaU8%2Fspevktator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MischaU8%2Fspevktator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MischaU8%2Fspevktator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MischaU8","download_url":"https://codeload.githu
b.com/MischaU8/spevktator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237769067,"owners_count":19363250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datasets","datasette","osint","python","sql","sqlite","vk","vkontakte"],"created_at":"2024-09-25T21:43:32.360Z","updated_at":"2025-10-23T02:31:41.042Z","avatar_url":"https://github.com/MischaU8.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spevktator: OSINT analysis tool for VK\n\n[![Python](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/datasette/blob/main/LICENSE)\n[![Test](https://github.com/MischaU8/spevktator/actions/workflows/test.yml/badge.svg)](https://github.com/MischaU8/spevktator/actions/workflows/test.yml)\n\n## Team Members\n- Mischa - [GitHub profile](https://github.com/MischaU8)\n- Morsaki - [Medium blog](https://medium.com/@rosa.noctis532)\n\n## Tool Description\nSpevktator provides a combined live feed of 5 popular Russian news channels on VK, along with translations, sentiment analysis and visualisation tools, all of which are accessible online from anywhere (or offline, if you prefer). 
We currently have an archive of over 67,000 posts, dating back to the beginning of February 2022.\n\nIt was originally created to help research domestic Russian propaganda narratives, but it can also act as a monitoring hub for VK media content, allowing researchers and journalists to stay up to date on disinformation, even as chaotic events unfold. For an example, see *[Documenting Russian Coverage of the ZNPP](research/Russian%20Coverage%20of%20the%20ZNPP.md)* by Morsaki.\n\nAdvanced researchers can run this tool locally against their own research targets and perform detailed analysis offline through an SQL interface or an Observable notebook.\n\n## Online Demo\n\nIn our public demo, we collect posts from 5 popular Russian news channels on VK (`life`, `mash`, `nws_ru`, `ria` and `tassagency`).\n\nExplore their posts, together with sentiment analysis, metrics and English translation:\n\nhttps://spevktator.io/vk/posts_mega_view\n\nSome more examples:\n\n- [How often is \"Ukraine\" mentioned per week, together with average sentiment and total number of views?](\nhttps://spevktator.io/vk?sql=select+strftime%28%27%25Y-%25W%27%2C+date_utc%29+as+week%2C+count%28*%29+as+nr_posts%2C+round%28avg%28sentiment%29%2C+2%29+as+avg_sentiment%2C+sum%28views%29+from+posts_mega_view+where+text_en+like+%27%25Ukraine%25%27+group+by+week+order+by+week#g.mark=circle\u0026g.x_column=week\u0026g.x_type=ordinal\u0026g.y_column=nr_posts\u0026g.y_type=quantitative\u0026g.color_column=avg_sentiment\u0026g.size_column=sum(views)\n)\n- [Which weapon systems are most often 
mentioned?](\nhttps://spevktator.io/vk?sql=with+himars+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25HIMARS%25%22+group+by+day%0D%0A%29%2C%0D%0Amlrs+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25MLRS%25%22+group+by+day%0D%0A%29%2C%0D%0Asam+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25S-300%25%22+group+by+day%0D%0A%29%2C%0D%0Acombined+as+%28%0D%0A++++select+%22HIMARS%22+as+weapon_type%2C+%2A+from+himars%0D%0A++++union+select+%22MLRS%22%2C+%2A+from+mlrs%0D%0A++++union+select+%22SAM%22%2C+%2A+from+sam%0D%0A%29+select+%2A+from+combined+order+by+day%0D%0A\u0026_hide_sql=1#g.mark=bar\u0026g.x_column=day\u0026g.x_type=temporal\u0026g.y_column=cnt\u0026g.y_type=quantitative\u0026g.color_column=weapon_type\n)\n- [Which aircraft are most often 
mentioned?](\nhttps://spevktator.io/vk?sql=with+mig29+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25MiG-29%25%22+group+by+day%0D%0A%29%2C%0D%0Amig31+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25MiG-31%25%22+group+by+day%0D%0A%29%2C%0D%0Asu25+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25Su-25%25%22+group+by+day%0D%0A%29%2C%0D%0Asu35+as+%28%0D%0A++++select+date%28date_utc%29+as+day%2C+count%28%2A%29+as+cnt+from+posts+p+join+posts_translation+pt+on+p.id+%3D+pt.id+where+text_en+like+%22%25Su-35%25%22+group+by+day%0D%0A%29%2C%0D%0Acombined+as+%28%0D%0A++++select+%22MiG-29%22+as+aircraft%2C+%2A+from+mig29%0D%0A++++union+select+%22MiG-31%22%2C+%2A+from+mig31%0D%0A++++union+select+%22Su-25%22%2C+%2A+from+su25%0D%0A++++union+select+%22Su-35%22%2C+%2A+from+su35%0D%0A%29+select+%2A+from+combined+order+by+day%0D%0A\u0026_hide_sql=1#g.mark=bar\u0026g.x_column=day\u0026g.x_type=temporal\u0026g.y_column=cnt\u0026g.y_type=quantitative\u0026g.color_column=aircraft\n)\n- [When is the \"Moskva cruiser\" in the news?](\nhttps://spevktator.io/vk?sql=select+date%28date_utc%29+as+day%2C+count%28*%29+from+posts+p+join+posts_translation+t+on+p.id%3Dt.id+where+t.rowid+in+%28select+rowid+from+posts_translation_fts+where+posts_translation_fts+match+escape_fts%28%3Asearch%29%29+group+by+day+order+by+day+limit+101\u0026search=Moskva+cruiser#g.mark=bar\u0026g.x_column=day\u0026g.x_type=ordinal\u0026g.y_column=count(*)\u0026g.y_type=quantitative\n)\n- What are related entities to [ЗАЭС](https://spevktator.io/vk/related_entities_ru?entity_name=ЗАЭС\u0026_hide_sql=1) (or in English [ZNPP](https://spevktator.io/vk/related_entities_en?entity_name=ZNPP\u0026_hide_sql=1))\n- 
[Coverage of \"hackers\" by Russian media on VK](https://observablehq.com/@mischau8/coverage-of-hackers-by-russian-media-on-vk), an analysis using an Observable notebook.\n\n## Installation\n\nTo install and run Spevktator locally, you need at least Python 3.9 and a few Python libraries, which you can install with `pip`.\n\n### Development build (cloning git master branch):\n\n```bash\ngit clone https://github.com/MischaU8/spevktator.git\ncd spevktator\n```\n\nRecommended: Take a look at [venv](https://docs.python.org/3/tutorial/venv.html). This tool provides isolated Python environments, which are more practical than installing packages systemwide. It also allows installing packages without administrator privileges.\n\nInstall the Python dependencies (this will take a while):\n\n```bash\npip3 install .\n```\n\nTo get you started, download and decompress our VK SQLite database dump (~26MB). This includes all public VK wall posts by `life`, `mash`, `nws_ru`, `ria` and `tassagency` posted between `2022-02-01` and `2022-09-04`. 
You can also scrape your own data; see below.\n\n```bash\nwget -v -O data/vk.db.xz https://spevktator.io/static/vk_2022-09-04_lite.db.xz\nxz -d data/vk.db.xz\n```\n\n## Usage\n\nSpevktator uses the open source multi-tool [Datasette](https://datasette.io/) for exploring and publishing the collected data.\nRun the Datasette server to explore the collected posts:\n\n```bash\ndatasette data/\n```\n\nVisit the web interface at http://127.0.0.1:8001 or explore our public demo at https://spevktator.io/\n\nLearn more about Datasette and SQL at https://datasette.io/tutorials\n\n## Scraping your own data\n\nAfter following the above installation instructions, you can use the command line tool `spevktator` to collect your own datasets from VK and save them to a SQLite database.\n\n### Generic command line usage\n\n```bash\n$ spevktator --help\n\nUsage: spevktator [OPTIONS] COMMAND [ARGS]...\n\n  Save wall posts from VK communities to a SQLite database\n\nOptions:\n  --version  Show the version and exit.\n  --help     Show this message and exit.\n\nCommands:\n  backfill                Retrieve the backlog of wall posts from the VK...\n  extract-named-entities  Extract named-entities from text\n  fetch                   Retrieve all wall posts from the VK communities...\n  install                 Download and install models, create database\n  listen                  Continuously retrieve all wall posts from the...\n  rescrape                Rescrape HTML pages from the scrape_log\n  sentiment               Perform dostoevsky (RU) sentiment analysis on...\n  stats                   Show statistics for the given database\n  translate-entities      Translate entities from RU to EN-US\n  translate-posts         Translate posts from RU to EN-US\n```\n\n### Inspect the status of an existing database\n\n```bash\n$ spevktator stats data/vk.db\n\ndomain        nr_posts  first                last\n----------  ----------  -------------------  -------------------\nlife        
     26125  2022-01-31T21:05:00  2022-09-03T15:45:00\nmash              3309  2022-01-31T17:52:00  2022-08-31T15:01:00\nnws_ru            3528  2022-01-31T13:00:00  2022-08-31T20:05:00\nria              10198  2022-01-31T22:03:00  2022-09-01T05:01:00\ntassagency       23890  2022-01-31T22:45:00  2022-09-01T05:15:00\n```\n\n### Install RuSentiment models and create a (new) database\n\n```bash\n$ spevktator install data/myproject.db\n\nDownloading Dostoevsky sentiment model... DONE\nCreating database...DONE\n```\n\n### Continuously listen for new posts to the channels (domains) on VK\n\nYou can specify one or more domains (the VK jargon for channels / groups) to monitor:\n\n```bash\n$ spevktator listen data/myproject.db vkusnoitochka\n\nScraping VK domain 'vkusnoitochka'... https://m.vk.com/vkusnoitochka\nPOST vkusnoitochka/-213845894_28 2022-09-01T13:27:00 added\nPOST vkusnoitochka/-213845894_27 2022-08-29T16:33:00 added\nPOST vkusnoitochka/-213845894_26 2022-08-08T18:03:00 added\nPOST vkusnoitochka/-213845894_25 2022-08-06T21:25:00 added\nPOST vkusnoitochka/-213845894_24 2022-08-06T21:23:00 added\n2022-09-03 18:51:32.327117 posts_added=5 last_post_added=True earliest_post_date=2022-08-06T21:23:00 page: 1 / 5\nExtracting named-entities up to 5 posts...\n  [####################################]  100%\n0 extracted out of 5 posts\nnext url will be https://m.vk.com/vkusnoitochka?offset=5\u0026own=1\nScraping VK domain 'vkusnoitochka'... https://m.vk.com/vkusnoitochka?offset=5\u0026own=1\nPOST vkusnoitochka/-213845894_23 2022-08-06T21:23:00 added\nPOST vkusnoitochka/-213845894_22 2022-07-10T21:07:00 added\n```\n\nOptional command-line arguments for `listen` are:\n- `--deepl-auth-key` (or `DEEPL_AUTH_KEY` env variable) to provide your DeepL translation API key. 
\n- `--spevktator-proxy` (or `SPEVKTATOR_PROXY` env variable) to set the HTTP / HTTPS proxy used to connect to VK.\n\n### Fetch historic posts \u0026 backfill your database\n\nSome other `spevktator` commands to fetch historic posts from VK:\n\n- `backfill` - Retrieve the backlog of wall posts from VK until a certain date. See `spevktator backfill --help` for available options to restrict the data to be downloaded.\n- `fetch` - Retrieve all wall posts from the VK communities. See `spevktator fetch --help` for available options to restrict the data to be downloaded.\n\n## Additional Information\n\n### Potential next steps\n\n- Expose more VK post data (thumbnail images, videos, comments)\n- Expose which channels to monitor through the UI\n- Annotation (tags / comments) of posts\n- UI notification when data has been updated\n- User authentication for non-public information \u0026 configuration UI\n- More robust installation instructions for various platforms (Windows, Docker)\n- Packaging and distribution via PyPI\n- Integration with https://observablehq.com/ notebooks\n\n### Current limitations\n\n- Only passive monitoring is performed and no VK account is used, so private groups won’t be scraped.\n- Comments and other personal information aren’t collected due to GDPR.\n- Sentiment prediction is based on RuSentiment and is of moderate quality.\n- Post metrics (shares, likes, views) are only tracked for a limited duration (last 5 posts).\n- Post text longer than 2500 characters is not translated.\n- Limited error handling and data loss recovery.\n- The user interface requires SQL knowledge for more advanced usage.\n\n### Motivation for design / architecture 
decisions\n\nConducting keyword searches on local data is far superior to any online search: I no longer need to worry about revealing details of my investigation to a third party. The online web interface is provided for demo purposes, but is not required.\n\nSetting up a data pipeline isn’t trivial. Besides the raw data, a lot of value is added by optional related data such as viewer metrics, sentiment, translation and named-entity extraction.\n\nThis tool is modular: the data can be exported in various file formats (CSV, TSV, JSON) through [sqlite-utils](https://sqlite-utils.datasette.io/) while being stored in a powerful and accessible database format (SQLite). Instead of reinventing the wheel for data exploration and visualisation, it builds on existing open source tooling, such as Datasette.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischau8%2Fspevktator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmischau8%2Fspevktator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischau8%2Fspevktator/lists"}