{"id":13563538,"url":"https://github.com/gushonorato/mechanize","last_synced_at":"2025-10-21T17:38:44.842Z","repository":{"id":53911198,"uuid":"185530842","full_name":"gushonorato/mechanize","owner":"gushonorato","description":"Build web scrapers and automate interaction with websites in Elixir with ease!","archived":false,"fork":false,"pushed_at":"2022-08-14T21:13:40.000Z","size":413,"stargazers_count":30,"open_issues_count":4,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-12T00:38:29.604Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gushonorato.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-08T04:56:06.000Z","updated_at":"2023-07-25T08:12:27.000Z","dependencies_parsed_at":"2022-08-13T04:01:16.044Z","dependency_job_id":null,"html_url":"https://github.com/gushonorato/mechanize","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gushonorato%2Fmechanize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gushonorato%2Fmechanize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gushonorato%2Fmechanize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gushonorato%2Fmechanize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gushonorato","download_url":"https://codeload.github.com/gushonorato/mechanize/tar.gz/refs/heads/master","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":223030589,"owners_count":17076459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:01:20.396Z","updated_at":"2025-10-21T17:38:39.550Z","avatar_url":"https://github.com/gushonorato.png","language":"Elixir","funding_links":[],"categories":["HTTP"],"sub_categories":[],"readme":"# ATTENTION: Retirement notice\n\nI am announcing the retirement of Mechanize. The modern web now relies heavily on JavaScript and single-page applications, which demand a fully automated or headless browser to handle dynamic content effectively. Because of this, the community never showed significant interest in Mechanize. Without that engagement, I find no reason to continue active development and support.\n\n# Mechanize [![Build Status](https://travis-ci.org/gushonorato/mechanize.svg?branch=master)](https://travis-ci.org/gushonorato/mechanize) [![Coverage Status](https://coveralls.io/repos/github/gushonorato/mechanize/badge.svg?branch=master)](https://coveralls.io/github/gushonorato/mechanize?branch=master)\n\nBuild web scrapers and automate interaction with websites in Elixir with ease!\n\nOne of Mechanize's main design goals is to enable developers to easily create concurrent web scrapers without the computing cost of using headless browsers. Mechanize is heavily inspired by the [Ruby](https://github.com/sparklemotion/mechanize) version of [Mechanize](https://metacpan.org/release/WWW-Mechanize). 
It features:\n\n- Follow hyperlinks\n- Scrape data easily using CSS selectors\n- Populate and submit forms\n- Follow and track 3xx redirects\n- Follow meta-refresh\n- Automatically store and send cookies (TODO)\n- Proxy support (TODO)\n- Keep track of the sites you have visited as a history (TODO)\n- File upload (TODO)\n- Obey robots.txt (TODO)\n\n## Installation\n\n\u003e **Warning:** This library is in active development and the public API will probably change. Use it carefully in production systems.\n\nThe package can be installed by adding `mechanize` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:mechanize, \"~\u003e 0.1\"}\n  ]\nend\n```\n\n## Getting started\n\nThis guide teaches you the most basic tasks in Mechanize: fetching pages, clicking links, filling out and submitting\nforms, and scraping data.\n\n### Fetching a page\n\nFirst you'll have to start Mechanize:\n\n```elixir\nalias Mechanize.Browser\n\nbrowser = Browser.new()\n```\n\nOr using a more verbose alternative:\n\n```elixir\n{:ok, browser} = Browser.start_link()\n```\n\nNow we'll use the browser we've started to fetch a page.  Let's fetch Google\nwith our mechanize browser:\n\n```elixir\npage = Browser.get!(browser, \"https://www.google.com\")\n```\n\nWhat just happened?  We told mechanize to go pick up Google's main page.\nMechanize followed any redirects that Google may have sent. The browser gave us back a page that we can use to scrape data, find links to click, or find forms to fill out.\n\nNext, let's try finding some links to click.\n\n### Finding Links\n\nMechanize returns a page struct whenever you get a page, post, or submit a\nform. 
Now that we've fetched Google's homepage, let's try listing all of the links:\n\n```elixir\nalias Mechanize.Page\nalias Mechanize.Page.Element\n\npage\n|\u003e Page.links()\n|\u003e Enum.each(fn link -\u003e\n  IO.puts Element.text(link)\nend)\n```\n\nWe can list the links, but Mechanize gives a few shortcuts to help us find a\nlink to click on.  Let's say we wanted to click the link whose text is 'News'. Normally, we would have to do this:\n\n```elixir\nalias Mechanize.Page\nalias Mechanize.Page.Element\nalias Mechanize.Page.Link\n\npage\n|\u003e Page.links()\n|\u003e Enum.filter(fn link -\u003e Element.text(link) == \"News\" end)\n|\u003e List.first()\n|\u003e Link.click!()\n```\n\nBut Mechanize gives us a shortcut.  Instead we can do this:\n\n```elixir\nalias Mechanize.Page\nalias Mechanize.Page.Link\n\npage\n|\u003e Page.link_with(text: \"News\")\n|\u003e Link.click!()\n```\n\nOr even shorter, with just one line:\n\n```elixir\nalias Mechanize.Page\n\nPage.click_link!(page, text: \"News\")\n```\n\nYou're probably thinking \"there could be multiple links with that text!\", and you would be correct!  If you use the plural form, you can access the list. If you wanted to click on the second news link, you could do this:\n\n```elixir\nalias Mechanize.Page\nalias Mechanize.Page.Link\n\npage\n|\u003e Page.links_with(text: \"News\")\n|\u003e Enum.at(1) # zero-based, so this is the second match\n|\u003e Link.click!()\n```\n\nWe can even find a link whose href matches a regular expression:\n\n```elixir\nalias Mechanize.Page\n\nPage.link_with(page, href: ~r/something/)\n```\n\nOr chain them together to find a link with certain text and certain href:\n\n```elixir\nalias Mechanize.Page\n\nPage.link_with(page, text: \"News\", href: \"/news\")\n```\n\nNow that we know how to find and click links, let's try something more complicated like filling out a form.\n\n### Filling out forms\n\nLet's continue with our Google example.\n\nIf we look at the HTML of the page, we can see that there is one form named 'f' that has a couple of buttons and a few fields. 
You can see this by saving the page in a file and opening it in your favorite text editor.\n\n```elixir\nFile.write!(\"google.html\", page)\n```\n\nNow that we know the name of the form, let's fetch it off the page:\n\n```elixir\nform = Page.form_with(page, name: \"f\")\n```\n\nSo let's set the form field named 'q' on the form to 'elixir mechanize':\n\n```elixir\n# Rebind `form`: Elixir data is immutable, so fill_text returns an updated form\nform = Form.fill_text(form, name: \"q\", with: \"elixir mechanize\")\n```\n\nNow we can submit the form by 'pressing' the submit button:\n\n```elixir\nForm.click_button!(form, text: \"Google Search\")\n```\n\nWhat we just did was equivalent to putting text in the search field and\nclicking the 'Google Search' button.\n\nAnother way to do that is typing in the text field and pressing the Return key. We can also simulate that by using the `submit!` function instead of `click_button!`:\n\n```elixir\nForm.submit!(form)\n```\n\nLet's take a look at the code all together:\n\n```elixir\nalias Mechanize.{Browser, Page, Form}\n\nb = Browser.new(follow_meta_refresh: true)\n    |\u003e Browser.put_user_agent(:mac_safari)\n\nb\n|\u003e Browser.get!(\"https://www.google.com\")\n|\u003e Page.form_with(name: \"f\")\n|\u003e Form.fill_text(name: \"q\", with: \"elixir mechanize\")\n|\u003e Form.submit!() # or Form.click_button!(text: \"Google Search\")\n```\n\nBefore we go on to screen scraping, let's take a look at forms a little more\nin depth.  Unless you want to skip ahead!\n\n### Advanced Form techniques\n\nIn this section, I want to touch on using the different types of input fields\npossible with a form.  Password and textarea fields can be treated just like\ntext input fields.  Select fields are very similar to text fields, but they\nhave many options associated with them.  
If you select one option, mechanize\nwill de-select the other options (unless it is a multi select!).\n\nFor example, let's select an `option` with text \"Option 1\" on a `select` with `name=\"select1\"`.\n\n```elixir\nForm.select(form, name: \"select1\", option: \"Option 1\")\n```\n\nWe can also select an `option` by an attribute; in this case, we'll select by the `value` attribute:\n\n```elixir\nForm.select(form, name: \"select1\", option: [value: \"1\"])\n```\n\nOr select the third option of a `select` (note that Mechanize uses a zero-based index):\n\n```elixir\nForm.select(form, name: \"select1\", option: 2)\n```\n\nNow let's take a look at `checkboxes` and `radio buttons`.  To select a `checkbox`, just check it like this:\n\n```elixir\nForm.check_checkbox(form, name: \"box\", value: \"yes\")\n```\n\n`Radio buttons` are very similar to `checkboxes`, but they know how to uncheck other `radio buttons` of the same name. Just check a `radio button` like you would a `checkbox`:\n\n```elixir\nForm.check_radio_button(form, name: \"box\", value: \"yes\")\n```\n\n### Scraping Data\n\nAfter you have used Mechanize to navigate to the page you need, scrape it using the `Page.search/2` function:\n\n```elixir\nbrowser\n|\u003e Browser.get!(\"http://example.com/\")\n|\u003e Page.search(\"p.posted\")\n```\n\n## Example\n\n### Google (Print results from SERP)\n\n```elixir\nalias Mechanize.{Browser, Page, Form}\nalias Mechanize.Page.Element\n\nb =\n  Browser.new(follow_meta_refresh: true)\n  |\u003e Browser.put_user_agent(:mac_safari)\n\ninitial_page = Browser.get!(b, \"https://google.com\")\n\nserp =\n  initial_page\n  |\u003e Page.form_with(name: \"f\")\n  |\u003e Form.fill_text(name: \"q\", with: \"elixir mechanize\")\n  |\u003e Form.submit!()\n\nserp\n|\u003e Page.search(\".kCrYT \u003e a .vvjwJb\") # Selects each search result element\n|\u003e Enum.map(\u0026Element.text/1) # Extracts search result title text\n|\u003e Enum.with_index(1)\n|\u003e Enum.each(fn {result, index} 
-\u003e IO.puts(\"#{index}. #{result}\") end)\n```\n\n## Author\nCopyright © 2020 by Gustavo Honorato (gustavohonorato@gmail.com)\n\n## License\nThis library is distributed under the MIT license. Please see the LICENSE file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgushonorato%2Fmechanize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgushonorato%2Fmechanize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgushonorato%2Fmechanize/lists"}