Invader
============
### Invader is a simple Python module for grabbing data from websites, with JavaScript support!

Invader is based on BeautifulSoup and dryscrape.

---

Dependencies
============
* **[Requests](http://docs.python-requests.org/en/master/)**
* **[Beautiful Soup 4](https://www.crummy.com/software/BeautifulSoup/)**
* **[Dryscrape](https://github.com/niklasb/dryscrape)**

Getting Started
============
* Install all dependencies if you haven't already:
```
$ sudo pip install requests
```
```
$ sudo apt-get install python-bs4
$ sudo pip install beautifulsoup4
```
```
$ sudo apt-get install qt5-default libqt5webkit5-dev build-essential python-lxml python-pip xvfb
$ sudo pip install dryscrape
```
* Install invader:
```
$ sudo pip install invader
```

Example of grabbing a list of items:

```python
from invader import Invader

url = 'https://duckduckgo.com/?q=python&t=hb&ia=web'
invader = Invader(url, js=True)

res = invader.take_list('#links .result', {
    'title': ['.result__a', 'text'],
    'src': ['.result__a', 'href']
})

print(res)
```
The response is a list of dictionaries, each containing a search result's title and link URL:

```json
[
    {"title": "Welcome to Python.org", "src": "https://www.python.org/"},
    {"title": "Python (programming language) - Wikipedia", "src": "https://en.wikipedia.org/wiki/Python_%28programming_language%29"},
    {"title": "Python | Codecademy", "src": "https://www.codecademy.com/learn/python"}
]
```

More usage **[examples](https://github.com/rootKot/invader/tree/master/examples)** are available in the repository.


Documentation
============

First, import the `Invader` class from `invader`.
Then create an `Invader` instance, passing the website's URL and, if JavaScript support is needed, `js=True`.

```python
from invader import Invader
invader = Invader('http://some.site', js=True)
```

The website's content is then fetched and stored in the instance.

### **Public functions**

### take(selector_list)
Use this function when you need a specific piece of data from a page: for example, the title from a forum topic page, or a list of all image sources.
**take()** receives one list argument: the first element is the CSS selector of an HTML element, and the second is what to take from it (`'text'` or an attribute name). It returns a string or a list of results.

```python
res = invader.take(['.content .topic-title', 'text'])
```
In this example, we get the text of the element with the class `topic-title`.
You can also take an attribute value from the element:

```python
res = invader.take(['.content .topic-title a', 'href'])
```
The result will be:

```
http://some.site/link
```


### take_list(wrapper, fields_dict)
Use this function when you need information about each item on a page, for example every product on a shopping site.
**take_list()** receives two arguments. The first is a string with the CSS selector of the item wrapper element.
The second is a dictionary whose keys map to a selector and the thing to take from it (`text`, `src`, `href`, etc.).

```python
res = invader.take_list('.products-wrap > a', {
    'img_url': ['.pr-item-wrap > img', 'src'],
    'title': ['.pr-title', 'text']
})
```
The response is a list of dictionaries, each containing an item's `img_url` and `title`:

```json
[
  {"img_url": "/files/items/30735/icon_219x270.jpg", "title": "Поло  Vit 16 9713tr"},
  {"img_url": "/files/items/30734/icon_219x240.jpg", "title": "Поло  Vit 16 9713tr"}
]
```

You can also pass `None` as the first argument if the items have no wrapper element; the fields are then collected one by one.
**Warning!** Be careful in that case:
make sure every item contains all of the HTML elements you want to get. Otherwise the order breaks and the results will be wrong.

### screenshot(path)
If `js=True`, requests go through a virtual browser using dryscrape,
so you can take a screenshot of the website you visited.
Pass the path where the screenshot should be saved, if needed.
```python
invader = Invader('https://google.com', js=True)
invader.screenshot('/var/www/screenshots/')
```
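The warning above about passing `None` to `take_list()` comes down to how fields get paired with items. The sketch below does not use invader at all; it assumes, without confirmation from invader's source, that the wrapper-less mode collects each field across the whole page and pairs the results by position. It uses the standard library's `xml.etree.ElementTree` path expressions in place of CSS selectors, on hypothetical well-formed markup where the second item has no image. The functions `scoped()` and `unscoped()` are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product markup: the second item has no image.
PAGE = """
<div class="products-wrap">
  <div class="item"><img src="/a.jpg"/><span class="pr-title">Item A</span></div>
  <div class="item"><span class="pr-title">Item B</span></div>
  <div class="item"><img src="/c.jpg"/><span class="pr-title">Item C</span></div>
</div>
"""

root = ET.fromstring(PAGE.strip())


def scoped(root):
    """With a wrapper: read each item in isolation; a missing field becomes None."""
    items = []
    for wrapper in root.findall(".//div[@class='item']"):
        img = wrapper.find(".//img")
        title = wrapper.find(".//span[@class='pr-title']")
        items.append({
            "img_url": img.get("src") if img is not None else None,
            "title": title.text if title is not None else None,
        })
    return items


def unscoped(root):
    """Without a wrapper (assumed behaviour): collect each field page-wide,
    then pair the lists up by position."""
    img_urls = [img.get("src") for img in root.findall(".//img")]
    titles = [span.text for span in root.findall(".//span[@class='pr-title']")]
    # zip() stops at the shorter list, so every pair after the gap is shifted.
    return [{"img_url": u, "title": t} for u, t in zip(img_urls, titles)]


print(scoped(root))
print(unscoped(root))
```

With a wrapper, the missing image simply shows up as `None` for Item B. Without one, the positional pairing gives Item B the image `/c.jpg` and drops Item C entirely, which is the kind of misalignment the warning describes.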