https://github.com/gottfrois/link_thumbnailer

Ruby gem that fetches images and metadata from a given URL, much like the link previews on popular social websites.

# LinkThumbnailer

[![Code Climate](https://codeclimate.com/github/gottfrois/link_thumbnailer.png)](https://codeclimate.com/github/gottfrois/link_thumbnailer)
[![Build Status](https://travis-ci.org/gottfrois/link_thumbnailer.png?branch=master)](https://travis-ci.org/gottfrois/link_thumbnailer)
[![Gem Version](https://badge.fury.io/rb/link_thumbnailer.svg)](http://badge.fury.io/rb/link_thumbnailer)

Ruby gem that generates image thumbnails from a given URL. It ranks them and gives you back an object containing the images and information about the website. It works like the Facebook link previewer.

The demo application is [here](http://link-thumbnailer-demo.herokuapp.com/)!
The source code of the demo application is hosted [here](https://github.com/gottfrois/link_thumbnailer_demo)!

## Features

- Dead simple.
- Supports the [OpenGraph](http://ogp.me/) protocol.
- Finds and sorts the images that best represent what the page is about.
- Finds and rates the description that best represents what the page is about.
- Allows a custom class so you can rate the website descriptions yourself.
- Supports blacklisting of image URLs (advertisements).
- Works with and without Rails.
- Fully customizable.
- Fully tested.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'link_thumbnailer'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install link_thumbnailer

If you are using Rails, you can generate the configuration file with:

    $ rails g link_thumbnailer:install

This will add `link_thumbnailer.rb` to `config/initializers/`.

## Usage

Run `irb` and require the gem:

```ruby
require 'link_thumbnailer'
```

The gem handles regular websites as well as websites that use the [OpenGraph](http://ogp.me/) protocol.

```ruby
object = LinkThumbnailer.generate('http://stackoverflow.com')
=> #<LinkThumbnailer::Models::Website ...>

object.title
=> "Stack Overflow"

object.favicon
=> "//cdn.sstatic.net/stackoverflow/img/favicon.ico?v=038622610830"

object.description
=> "Q&A for professional and enthusiast programmers"

object.images.first.src.to_s
=> "http://cdn.sstatic.net/stackoverflow/img/[email protected]?v=fde65a5a78c6"
```
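The favicon in the session above is a protocol-relative URL (it starts with `//`), so it needs a scheme before it can be fetched on its own. A minimal sketch of one way to resolve it against the page URL using Ruby's standard `uri` library follows. The helper name `absolute_url` is hypothetical and is not part of LinkThumbnailer.

```ruby
require 'uri'

# Hypothetical helper (not part of LinkThumbnailer): resolve a relative or
# protocol-relative URL against the URL of the page it was found on.
def absolute_url(page_url, candidate)
  URI.join(page_url, candidate).to_s
end

page    = 'http://stackoverflow.com'
favicon = '//cdn.sstatic.net/stackoverflow/img/favicon.ico?v=038622610830'

# Prints the favicon URL with the page's scheme applied.
puts absolute_url(page, favicon)
```

Because `URI.join` follows the standard reference-resolution rules, the same helper should also handle root-relative paths such as `/favicon.ico`.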

LinkThumbnailer's `generate` method returns an instance of `LinkThumbnailer::Models::Website` that responds to `to_json` and `as_json` as you would expect:

```ruby
object.to_json
=> "{\"url\":\"http://stackoverflow.com\",\"title\":\"Stack Overflow\",\"description\":\"Q&A for professional and enthusiast programmers\",\"images\":[{\"src\":\"http://cdn.sstatic.net/stackoverflow/img/[email protected]?v=fde65a5a78c6\",\"size\":[316,316],\"type\":\"png\"}]}"
```
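If the JSON is handed to another process, it can be read back with Ruby's standard `json` library. The sketch below assumes a payload shaped like the output above. The image URL is a placeholder rather than real data, and the variable names are only illustrative.

```ruby
require 'json'

# Sample payload shaped like the `to_json` output above.
# The image URL is a placeholder, not real data.
payload = <<~JSON
  {"url":"http://stackoverflow.com","title":"Stack Overflow",
   "description":"Q&A for professional and enthusiast programmers",
   "images":[{"src":"http://example.com/icon.png","size":[316,316],"type":"png"}]}
JSON

# symbolize_names gives symbol keys, which read more naturally in Ruby.
preview       = JSON.parse(payload, symbolize_names: true)
first_image   = preview[:images].first
width, height = first_image[:size]

puts "#{preview[:title]}: #{first_image[:src]} (#{width}x#{height})"
```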

## Configuration

LinkThumbnailer comes with default configuration values. You can change the defaults by overriding them in a Rails initializer.

In `config/initializers/link_thumbnailer.rb`:

```ruby
LinkThumbnailer.configure do |config|
  # Number of redirects to follow before raising an exception when trying to parse the given URL.
  #
  # config.redirect_limit = 3

  # Set the user agent.
  #
  # config.user_agent = 'link_thumbnailer'

  # Enable or disable SSL verification.
  #
  # config.verify_ssl = true

  # The amount of time in seconds to wait for a connection to be opened.
  # If the HTTP object cannot open a connection in this many seconds,
  # it raises a Net::OpenTimeout exception.
  #
  # See http://www.ruby-doc.org/stdlib-2.1.1/libdoc/net/http/rdoc/Net/HTTP.html#open_timeout
  #
  # config.http_open_timeout = 5

  # List of blacklisted URLs you want to skip when searching for images.
  #
  # config.blacklist_urls = [
  #   %r{^http://ad\.doubleclick\.net/},
  #   %r{^http://b\.scorecardresearch\.com/},
  #   %r{^http://pixel\.quantserve\.com/},
  #   %r{^http://s7\.addthis\.com/}
  # ]

  # List of attributes you want LinkThumbnailer to fetch on a website.
  #
  # config.attributes = [:title, :images, :description, :videos, :favicon]

  # List of procedures used to rate the website description. Add your custom class
  # here. See the wiki for more details on how to build your own graders.
  #
  # config.graders = [
  #   ->(description) { ::LinkThumbnailer::Graders::Length.new(description) },
  #   ->(description) { ::LinkThumbnailer::Graders::HtmlAttribute.new(description, :class) },
  #   ->(description) { ::LinkThumbnailer::Graders::HtmlAttribute.new(description, :id) },
  #   ->(description) { ::LinkThumbnailer::Graders::Position.new(description, weight: 3) },
  #   ->(description) { ::LinkThumbnailer::Graders::LinkDensity.new(description) }
  # ]

  # Minimum description length for a website.
  #
  # config.description_min_length = 25

  # Regex of words considered positive when rating the website description.
  #
  # config.positive_regex = /article|body|content|entry|hentry|main|page|pagination|post|text|blog|story/i

  # Regex of words considered negative when rating the website description.
  #
  # config.negative_regex = /combx|comment|com-|contact|foot|footer|footnote|masthead|media|meta|outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|widget|modal/i

  # Number of images to fetch. Fetching too many images will be slow.
  # Note that LinkThumbnailer only ranks the fetched images against each other,
  # meaning that there could be a "better" image on the page.
  #
  # config.image_limit = 5

  # Whether you want LinkThumbnailer to return image size and type or not.
  # Setting this value to false will increase performance, since LinkThumbnailer
  # then does not have to fetch the size and type of each image.
  #
  # config.image_stats = true

  # Whether you want LinkThumbnailer to raise an exception if the Content-Type of the HTTP response
  # is not HTML or XML.
  #
  # config.raise_on_invalid_format = false

  # Sets the number of concurrent HTTP connections that can be opened to fetch image information such as size and type.
  #
  # config.max_concurrency = 20

  # Sets the default encoding.
  #
  # config.encoding = 'utf-8'
end
```
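To make the `blacklist_urls` option more concrete, here is a minimal sketch of what filtering candidate image URLs against a list of patterns could look like. It illustrates the idea only and is not the gem's implementation. The helper `reject_blacklisted` and the sample URLs are hypothetical, while the patterns are copied from the commented defaults above.

```ruby
# Patterns copied from the commented defaults in the initializer above.
BLACKLIST = [
  %r{^http://ad\.doubleclick\.net/},
  %r{^http://b\.scorecardresearch\.com/},
  %r{^http://pixel\.quantserve\.com/},
  %r{^http://s7\.addthis\.com/}
].freeze

# Hypothetical helper: drop every URL matched by at least one pattern.
def reject_blacklisted(urls, patterns)
  urls.reject { |url| patterns.any? { |pattern| pattern.match?(url) } }
end

# Made-up candidate URLs for illustration.
candidates = [
  'http://ad.doubleclick.net/banner.gif',
  'http://example.com/images/hero.png'
]

p reject_blacklisted(candidates, BLACKLIST)
```

Note that these default patterns are anchored to `http://`, so an `https://` URL on the same host would not match them. If you write your own patterns, something like `%r{^https?://...}` may be a better fit.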

Or at runtime:

```ruby
object = LinkThumbnailer.generate('http://stackoverflow.com', redirect_limit: 5, user_agent: 'foo')
```

Note that runtime options will override default global configuration.
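The override behaviour can be pictured as a hash merge in which per-call options win over the global defaults. The sketch below only illustrates that precedence rule and makes no claim about how the gem implements it internally. `DEFAULTS` and `effective_options` are hypothetical names, and the values mirror the commented defaults above.

```ruby
# Illustrative defaults; the values mirror the commented configuration above.
DEFAULTS = {
  redirect_limit: 3,
  user_agent: 'link_thumbnailer',
  verify_ssl: true
}.freeze

# Hypothetical helper: per-call options take precedence over the defaults,
# and any option that is not overridden keeps its default value.
def effective_options(defaults, **overrides)
  defaults.merge(overrides)
end

options = effective_options(DEFAULTS, redirect_limit: 5, user_agent: 'foo')
p options
```

`Hash#merge` returns a new hash, so the global defaults stay untouched for the next call.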

See [Configuration Options Explained](https://github.com/gottfrois/link_thumbnailer/wiki/Configuration-options-explained) for more details on each configuration option.
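As a rough intuition for `positive_regex` and `negative_regex`, the sketch below scores an HTML `class` or `id` value by whether it looks like main content or like page chrome. The scoring rule (+1 / -1) and the method `attribute_score` are assumptions made for illustration and are not the gem's actual grader. The two regexes are copied from the defaults above.

```ruby
POSITIVE = /article|body|content|entry|hentry|main|page|pagination|post|text|blog|story/i
NEGATIVE = /combx|comment|com-|contact|foot|footer|footnote|masthead|media|meta|outbrain|promo|related|scroll|shoutbox|sidebar|sponsor|shopping|tags|tool|widget|modal/i

# Hypothetical scoring rule, not the gem's actual grader:
# +1 when the value looks like main content, -1 when it looks like page chrome.
def attribute_score(value)
  score = 0
  score += 1 if POSITIVE.match?(value)
  score -= 1 if NEGATIVE.match?(value)
  score
end

# Made-up class names for illustration.
%w[article-body sidebar-widget hero].each do |css_class|
  puts "#{css_class}: #{attribute_score(css_class)}"
end
```

A value that matches both patterns, or neither, ends up with a neutral score under this rule.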

## Exceptions

LinkThumbnailer defines a list of custom exceptions you may want to rescue in your code. All the following exceptions inherit from `LinkThumbnailer::Exceptions`:

* `RedirectLimit` -- raised when the redirect limit defined in the configuration is reached
* `BadUriFormat` -- raised when the given URL is not a valid HTTP URL
* `FormatNotSupported` -- raised when the `Content-Type` of the HTTP response is not supported (not `html`)

You can rescue from any LinkThumbnailer exceptions using the following code:

```ruby
begin
  LinkThumbnailer.generate('http://foo.com')
rescue LinkThumbnailer::Exceptions => e
  # do something
end
```
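In application code it is often convenient to wrap this pattern in a small helper that returns `nil` when a preview cannot be built. The sketch below is one possible shape under stated assumptions: `preview_or_nil` is a hypothetical name, and the fetcher and error class are injected so the example runs without the gem or network access.

```ruby
# Hypothetical wrapper: `fetcher` is any callable that takes a URL, and
# `error_class` is the exception family to rescue and turn into nil.
def preview_or_nil(url, fetcher:, error_class: StandardError)
  fetcher.call(url)
rescue error_class => e
  warn "Preview failed for #{url}: #{e.class}: #{e.message}"
  nil
end

# Stand-in fetchers so the sketch runs without the gem installed.
failing_fetcher = ->(_url) { raise ArgumentError, 'simulated failure' }
working_fetcher = ->(url) { { url: url, title: 'Example' } }

p preview_or_nil('http://foo.com', fetcher: failing_fetcher)
p preview_or_nil('http://foo.com', fetcher: working_fetcher)
```

With the gem loaded, the call would presumably look like `preview_or_nil(url, fetcher: LinkThumbnailer.method(:generate), error_class: LinkThumbnailer::Exceptions)`, which keeps unrelated errors from being silently swallowed.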

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Run the specs (`bundle exec rspec spec`)
4. Commit your changes (`git commit -am 'Added some feature'`)
5. Push to the branch (`git push origin my-new-feature`)
6. Create new Pull Request