https://github.com/davidesantangelo/webinspector

Ruby gem to inspect a web page completely. It scrapes a given URL and returns its meta tags, links, images, and more.

Host: GitHub
URL: https://github.com/davidesantangelo/webinspector
Owner: davidesantangelo
License: mit
Created: 2015-04-26T16:14:59.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-04-07T14:34:19.000Z (almost 8 years ago)
Last Synced: 2024-05-01T20:55:39.960Z (9 months ago)
Topics: inspector, ruby, rubygem, scraper, web-inspector
Language: Ruby
Homepage:
Size: 31.3 KB
Stars: 290
Watchers: 14
Forks: 26
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Webinspector

Ruby gem to inspect a web page completely. It scrapes a given URL and returns its title, description, meta tags, links, images, and more.



## See it in action!

You can try WebInspector live at this little demo: [https://scrappet.herokuapp.com](https://scrappet.herokuapp.com)

## Installation

Add this line to your application's Gemfile:

```ruby

gem 'webinspector'

```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install webinspector

## Usage

Initialize a WebInspector instance for a URL, like this:

```ruby

page = WebInspector.new('http://davidesantangelo.com')

```

## Accessing response status and headers

You can check the status and headers from the response like this:

```ruby

page.response.status  # 200

page.response.headers # { "server"=>"apache", "content-type"=>"text/html; charset=utf-8", "cache-control"=>"must-revalidate, private, max-age=0", ... }

```

## Accessing inspected data

You can see the data like this:

```ruby

page.url                 # URL of the page

page.scheme              # Scheme of the page (http, https)

page.host                # Hostname of the page (like, davidesantangelo.com, without the scheme)

page.port                # Port of the page

page.title               # title of the page from the head section, as string

page.description         # description of the page

page.links               # every link found

page.images              # every image found

page.meta                # metatags of the page

```
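
The `url`, `scheme`, `host`, and `port` accessors correspond to the standard parts of a URL. As a rough illustration only, and not the gem's own implementation, the same parts can be pulled out of a URL string with Ruby's standard `URI` library. The helper name `url_parts` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper, not part of webinspector: splits a URL string
# into the parts that the accessors above describe.
def url_parts(url)
  uri = URI.parse(url)
  {
    url: uri.to_s,
    scheme: uri.scheme, # "http" or "https"
    host: uri.host,     # hostname without the scheme
    port: uri.port      # falls back to the scheme's default port
  }
end

parts = url_parts('http://davidesantangelo.com')
puts parts[:scheme]
puts parts[:host]
puts parts[:port]
```

When the URL carries no explicit port, `URI` reports the default for the scheme, which is a reasonable guess at what a page inspector would show but is not confirmed for `page.port`.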

## Accessing meta tags

```ruby

page.meta                 # metatags of the page

page.meta['description']  # meta description

page.meta['keywords']     # meta keywords

```
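
The lookups above suggest that `page.meta` can be read like a Hash keyed by tag name. Assuming that, and assuming a missing tag yields `nil`, a small helper can turn the keywords string into a clean list. `keyword_list` is a hypothetical helper, and the plain Hash below stands in for `page.meta` so the sketch runs without a network call.

```ruby
# Hypothetical helper, not part of webinspector: turns a comma-separated
# keywords value into an array, tolerating a missing tag.
def keyword_list(meta)
  meta['keywords'].to_s.split(',').map(&:strip).reject(&:empty?)
end

# Stand-in for page.meta (assumed to behave like a Hash with String keys).
meta = { 'description' => 'Personal site', 'keywords' => 'ruby, scraper , web' }

p keyword_list(meta)
p keyword_list({})
```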

## Find words (as array)

```ruby

page.find(["word1", "word2"]) # returns {"word1"=>3, "word2"=>1}

```
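
To make the shape of that result concrete, here is a minimal plain-Ruby sketch of counting words in a block of text. It is a stand-in for the idea, not the gem's code: the name `count_words`, the case-insensitive matching, and the whole-word tokenizing are all assumptions, and the gem's actual matching rules may differ.

```ruby
# Hypothetical stand-in for the idea behind page.find: counts whole-word,
# case-insensitive occurrences of each requested word in a text.
def count_words(text, words)
  # Split the text into lowercase alphanumeric tokens.
  tokens = text.downcase.scan(/[[:alnum:]]+/)
  # Build a word => count hash, keeping the caller's original spelling as key.
  words.each_with_object({}) do |word, counts|
    counts[word] = tokens.count(word.downcase)
  end
end

text = 'Ruby is fun. Ruby gems make ruby scraping easy.'
result = count_words(text, %w[ruby gems python])
p result
```

Words that never appear are reported with a count of zero here; whether `page.find` does the same is not stated in this README.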

## Contributors

  * Steven Shelby ([@stevenshelby](https://github.com/stevenshelby))

  * Sam Nissen ([@samnissen](https://github.com/samnissen))

## License

The webinspector gem is released under the MIT License.

## Contributing

1. Fork it ( https://github.com/[my-github-username]/webinspector/fork )

2. Create your feature branch (`git checkout -b my-new-feature`)

3. Commit your changes (`git commit -am 'Add some feature'`)

4. Push to the branch (`git push origin my-new-feature`)

5. Create a new Pull Request
