{"id":19380029,"url":"https://github.com/graphistry/dots","last_synced_at":"2025-07-19T03:32:30.857Z","repository":{"id":224713980,"uuid":"762131422","full_name":"graphistry/dots","owner":"graphistry","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-12T07:48:41.000Z","size":25381,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-24T16:50:28.697Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graphistry.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-23T06:32:11.000Z","updated_at":"2024-04-12T06:39:11.000Z","dependencies_parsed_at":"2024-03-18T05:30:09.055Z","dependency_job_id":"296a828f-ac70-4edc-8edc-6b197732a7c5","html_url":"https://github.com/graphistry/dots","commit_stats":null,"previous_names":["dcolinmorgan/mlx_grph","dcolinmorgan/dt_os","dcolinmorgan/dots","graphistry/dots"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/graphistry/dots","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphistry%2Fdots","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphistry%2Fdots/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphistry%2Fdots/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphistry%2Fdots/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graphistry","download_url":"https://codelo
ad.github.com/graphistry/dots/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphistry%2Fdots/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265883613,"owners_count":23843792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T09:12:05.980Z","updated_at":"2025-07-19T03:32:30.840Z","avatar_url":"https://github.com/graphistry.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Current Events Scraper \u0026 Featurizer\n\nUsing the OpenSearch and Google News APIs, this tool pulls news stories and extracts features from the text. The features are then stored in a CSV file.\n\nIt can gather stories from multiple sources and languages. GNews maxes out at ~3000 stories per day; OpenSearch has no limit.  
OpenSearch uses scroll and slice to pull a large number of stories.\n\nClone the current version \u0026 run [dots_feat.py](https://github.com/dcolinmorgan/dots/blob/main/dots/dots_feat.py)\n--------------------------------------------------\nrequirements:\n  pytest,\n  pyarrow,\n  spacy,\n  python-dotenv,\n  bs4,\n  pandas,\n  scikit-learn,\n  transformers,\n  torch,\n  opensearch-py,\n  requests,\n  nltk,\n  numpy,\n  graphistry[umap-learn],\n  umap-learn,\n  validators,\n  pytesseract,\n  selenium,\n  webdriver_manager,\n  undetected_chromedriver,\n  gliner\n \n### The example below pulls 100 OpenSearch gNews stories and writes the features of each, in addition to its location and date, to a file\n\n```bash\n    git clone https://github.com/graphistry/dots\n    # enter the cloned repository so the relative script path resolves\n    cd dots\n    python dots/dots_feat.py -n 100 -e 0 -d 0 -o dots_drba_feats.csv\n    python dots/dots_feat.py -n 100 -e 1 -d 0 -o dots_gpy_feats.csv\n    python dots/dots_feat.py -n 100 -e 2 -d 0 -o dots_glnr_feats.csv\n```\n\n\u003e\"'Gaza Strip', '16-01-2024', \",\"['neighborhoods', 'rebels', 'widespread famine', 'egypt', 'disease']\" \u003cbr\u003e\n\u003e\"'Miseno, Campania, Italy', '16-01-2024', \",\"['disasters', 'mount vesuvius', 'ancient cataclysm', 'costruzione', 'beach']\"\u003cbr\u003e\n\u003e\"'Clarendon, Clarendon, Jamaica', '16-01-2024', \",\"['new bowen', 'fight', 'whatsapp', 'st catherine', 'jamaica']\"\u003cbr\u003e\n\u003e\"'Philadelphia, Pennsylvania, United States', '16-01-2024', \",\"['meteorologists', 'snow shovels', 'snowstorm', 'accuweather alerts', 'accuweather meteorologists']\"\u003cbr\u003e\n\u003e\"'New Bedford, Massachusetts, United States', '16-01-2024', \",\"['massachusetts law', 'saturday', 'ariel dorsey', 'traffic', 'united states']\"\u003cbr\u003e\n\u003e\"'Corofin, Clare, Ireland', '16-01-2024', \",\"['emergency services', 'breathing', 'rescue service', 'firefighters', 'afternoon']\"\u003cbr\u003e\n\u003e\"'United States', '16-01-2024', \",\"['preparedness', 'earthquake', 'quake', 'morning', 
'disaster']\"\u003cbr\u003e\n\u003e\"'Syria', '16-01-2024', \",\"['neighboring countries', 'early recovery', 'cholera', 'symptom', 'mohamad katoub']\"\u003cbr\u003e\n\u003e\"'Iceland', '16-01-2024', \",\"['lava flows', 'evacuation', 'eruptions', 'jóhannesson', 'lúðvík pétursson']\"\u003cbr\u003e\n\n\nHere is an example, produced every day via `gh_actions`, of parsing gNews stories and extracting features:\n [Feature Table](DOTS/output/lobstr3_dots_feats.csv) and [Full Table](DOTS/output/full_lobstr3_dots_feats.csv)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphistry%2Fdots","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraphistry%2Fdots","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphistry%2Fdots/lists"}