https://github.com/jdblischak/faviconplease

Find URL for a website's favicon

---
output: github_document
---

# faviconPlease

[![CRAN status](https://www.r-pkg.org/badges/version/faviconPlease)](https://cran.r-project.org/package=faviconPlease)
[![R-CMD-check](https://github.com/jdblischak/faviconPlease/workflows/R-CMD-check/badge.svg)](https://github.com/jdblischak/faviconPlease/actions)

```{r description, results='asis', echo=FALSE}
cat(read.dcf("DESCRIPTION", fields = "Description"))
```

```{r example}
library(faviconPlease)
faviconPlease("https://github.com/")
```

Also check out my [blog post on faviconPlease][blog-post] for more background
and examples.

[blog-post]: https://blog.jdblischak.com/posts/faviconplease/

## Installation

Install latest release from CRAN:

```{r installation-cran, eval=FALSE}
install.packages("faviconPlease")
```

Install development version from GitHub:

```{r installation-github, eval=FALSE}
install.packages("remotes")
remotes::install_github("jdblischak/faviconPlease")
```

## Code of Conduct

Please note that the faviconPlease project is released with a [Contributor Code
of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.

## Default strategy

By default, `faviconPlease()` uses the following strategy to find the URL to the
favicon for a given website. It stops once it finds a URL and returns it.

1. Download the HTML file and search its `<head>` for any `<link>` elements with
`rel="icon"` or `rel="shortcut icon"`.

1. Download the HTML file at the root of the server (i.e. discard the path) and
search its `<head>` for any `<link>` elements with `rel="icon"` or
`rel="shortcut icon"`.

1. Attempt to download a file called `favicon.ico` at the root of the server.
This is the default location that a browser looks if the HTML file does not
specify an alternative location in a `<link>` element. If the file `favicon.ico`
is successfully downloaded, then this URL is returned.

1. If the above steps fail, as a fallback, use the [favicon service][ddg-icon]
provided by the search engine [DuckDuckGo][ddg]. This provides a sensible
default for websites that don't have a favicon or whose favicon can't easily be
found.

[ddg]: https://duckduckgo.com/
[ddg-icon]: https://duckduckgo.com/duckduckgo-help-pages/privacy/favicons/
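
The steps above amount to trying a list of functions in order and stopping at
the first non-empty result. The chunk below is a minimal sketch of that control
flow, not the package's actual implementation. `firstFavicon()`,
`pageHasNoIcon()`, `rootHasIco()`, `exampleFallback()`, and the
`icons.example.com` URL are hypothetical stand-ins so that the chunk runs
without network access. It assumes, as described later in this README, that a
favicon function returns `""` when it finds nothing and that a fallback function
receives only the server.

```{r strategy-sketch}
# Hypothetical helper illustrating the control flow only
firstFavicon <- function(scheme, server, path, functions, fallback) {
  for (f in functions) {
    favicon <- f(scheme, server, path)
    # Stop at the first function that returns a non-empty URL
    if (nzchar(favicon)) {
      return(favicon)
    }
  }
  # Nothing found: use the fallback (a function of the server, or a fixed URL)
  if (is.function(fallback)) fallback(server) else fallback
}

# Stand-in that mimics a page without a <link rel="icon"> element ...
pageHasNoIcon <- function(scheme, server, path) ""
# ... on a server that does provide /favicon.ico
rootHasIco <- function(scheme, server, path) {
  sprintf("%s://%s/favicon.ico", scheme, server)
}
# Placeholder fallback service (not a real URL)
exampleFallback <- function(server) {
  sprintf("https://icons.example.com/%s.ico", server)
}

found <- firstFavicon("https", "github.com", "/",
                      functions = list(pageHasNoIcon, rootHasIco),
                      fallback = exampleFallback)
notFound <- firstFavicon("https", "github.com", "/",
                         functions = list(pageHasNoIcon),
                         fallback = exampleFallback)
```

In this sketch `found` comes from the second stand-in function, while `notFound`
comes from the placeholder fallback because no function returned a URL.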

## Extending faviconPlease

The default strategy above is designed to reliably get you a favicon URL for
most websites. However, you can customize it as needed.

### Change the fallback to use Google's favicon service

The default fallback function is `faviconDuckDuckGo()`. To instead use Google's
favicon service, you can set the argument `fallback = faviconGoogle`.

Note that neither DuckDuckGo nor Google has every favicon you might expect, and
availability can change over time. You can see some examples in my [blog
post][blog-post]. Fortunately, both return a generic favicon when they don't
have the one you requested.
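
As a usage sketch, the call below swaps in the Google fallback using only the
argument and function named above. It needs network access, and the URL it
returns depends on what the website and the service provide at the time, so no
output is shown.

```{r fallback-google, eval=FALSE}
library(faviconPlease)
# Same search as the default, but fall back to Google instead of DuckDuckGo
favicon <- faviconPlease("https://github.com/", fallback = faviconGoogle)
```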

### Use a custom fallback function

You can use your own custom fallback function instead. It must accept one
argument, which is the server, e.g. `"github.com"`. The easiest approach would
be to copy-paste one of the existing fallback functions and modify it to use
your alternative favicon service.

```{r custom-fallback}
args(faviconDuckDuckGo)
body(faviconDuckDuckGo)
```
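
A custom fallback can be as small as the sketch below. The function name
`faviconExampleService()` and the host `favicons.example.com` are placeholders
for illustration, not a real service. The only grounded requirement is the one
stated above: the function accepts the server as its single argument.

```{r custom-fallback-sketch}
# Hypothetical fallback: "favicons.example.com" is a placeholder, not a real service
faviconExampleService <- function(server) {
  sprintf("https://favicons.example.com/%s.ico", server)
}
faviconExampleService("github.com")
```

It could then be supplied as `fallback = faviconExampleService` in a call to
`faviconPlease()`.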

### Use a custom fallback favicon

If you have a URL to a generic favicon file that you would like to use as a
fallback, you can directly pass this as a character vector. It could also be a
path to an image file on the server where your app is running.
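
For instance, assuming your app serves a generic icon at the placeholder path
`images/generic-favicon.png` (a made-up location for illustration), the call
might look like this:

```{r fallback-character, eval=FALSE}
library(faviconPlease)
# Placeholder path; replace with a real URL or a path your app can serve
genericFavicon <- "images/generic-favicon.png"
favicon <- faviconPlease("https://github.com/", fallback = genericFavicon)
```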

### Change the order of the favicon functions

The default strategy first checks the `<head>` for a link to the favicon file
and then checks for the availability of the file `favicon.ico`. You can change
this order, or only perform one of them, by changing the argument `functions`
passed to `faviconPlease()`. It should be a list of functions.

```{r order, eval=FALSE}
# default
functions = list(faviconLink, faviconIco)
# Switch the order
functions = list(faviconIco, faviconLink)
# Only search <head>
functions = list(faviconLink)
# Only check for favicon.ico
functions = list(faviconIco)
# Skip the favicon functions entirely and just use the fallback
functions = NULL
```
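
The fragments above show only the argument. Written out as complete calls, two
of the variations might look like the following sketch. Both calls need network
access, and the variable names are only for illustration.

```{r order-full-call, eval=FALSE}
library(faviconPlease)
# Check for favicon.ico first, then search <head>
icoFirst <- faviconPlease("https://github.com/",
                          functions = list(faviconIco, faviconLink))
# Skip both favicon functions and go straight to the default fallback
fallbackOnly <- faviconPlease("https://github.com/", functions = NULL)
```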

### Use a custom favicon function

You can also create your own custom favicon function to pass to
`faviconPlease()`. It must accept at least 3 arguments: it will be passed the
URL's scheme (e.g. `"https"`), server (e.g. `"github.com"`), and path (e.g.
`"/jdblischak/faviconPlease"`), in that order. Your function should return the
URL to a favicon, or an empty string, `""`, if it can't find one.

```{r faviconLink-signature}
# Favicon functions must accept at least 3 positional arguments
args(faviconLink)
```

As a concrete example, here is a custom function for searching for `favicon.ico`
on Ubuntu 20.04, which has increased security settings (see troubleshooting
section below).

```{r faviconIcoUbuntu20}
faviconIcoUbuntu20 <- function(scheme, server, path) {
  faviconIco(scheme, server, path, method = "wget",
             extra = c("--no-check-certificate",
                       "--ciphers=DEFAULT:@SECLEVEL=1"))
}
```

It calls `faviconIco()` with the specific settings needed by `download.file()`
to work on Ubuntu 20.04. You could then use your custom function instead of the
default `faviconIco()` by calling `faviconPlease()` with `functions =
list(faviconLink, faviconIcoUbuntu20)`.

Note that the example function `faviconIcoUbuntu20()` will likely fail on
Windows, macOS, and Ubuntu versions prior to 20.04.
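
A custom favicon function does not have to download anything. As a second,
purely illustrative sketch, the function below consults a small lookup table of
servers whose favicon location is already known. The names `knownFavicons` and
`faviconKnown()` and the `intranet.example.com` entry are hypothetical. The
sketch follows the contract described above: three positional arguments, and
`""` when nothing is found.

```{r faviconKnown}
# Hypothetical lookup table; the entry is a placeholder
knownFavicons <- c(
  "intranet.example.com" = "https://intranet.example.com/static/logo.png"
)

faviconKnown <- function(scheme, server, path) {
  if (server %in% names(knownFavicons)) {
    # Drop the name so that a plain character string is returned
    return(unname(knownFavicons[server]))
  }
  # Signal "not found" so that the next function in the list is tried
  ""
}

faviconKnown("https", "intranet.example.com", "/")
faviconKnown("https", "github.com", "/")
```

Placing it first, e.g. `functions = list(faviconKnown, faviconLink,
faviconIco)`, would let the table take precedence over the network-based
searches.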

## Troubleshooting

Unfortunately it's not easy to make this foolproof for all operating systems
and all websites. Here are some known issues:

1. `download.file()`, used by `faviconIco()`, is known to have cross-platform
   issues. Thus the official documentation in `?download.file` recommends:

   > Setting the `method` should be left to the end user.

   Accordingly, `faviconIco()` exposes the arguments `method`, `extra`, and
   `headers`, which are passed directly to `download.file()`. Alternatively you
   can set the global options `"download.file.method"` or
   `"download.file.extra"`.

1. Ubuntu 20.04 increased its default security settings for downloading files
   from the internet ([details][openssl-ticket]). Unfortunately many websites
   have not updated their SSL certificates to comply with the increased security
   restrictions. `faviconLink()` has a workaround for this situation, but
   `faviconIco()` does not. As an example, here's how you could detect the
   availability of `favicon.ico` for the Ensembl website on Ubuntu 20.04.

   ```{r ubuntu20, eval=FALSE}
   faviconIco("https", "www.ensembl.org", "",
              method = "wget", extra = c("--no-check-certificate",
                                         "--ciphers=DEFAULT:@SECLEVEL=1"))
   ```

   Alternatively, if it's an option for you, you could avoid this workaround by
   using the previous Ubuntu LTS release 18.04. Also note that the above
   command will fail on Ubuntu 18.04 because the default `wget` installed
   doesn't have the argument `--ciphers`.

[openssl-ticket]: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1864689
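
The global options mentioned in the first troubleshooting item can be set for a
session and restored afterwards. The chunk below is a sketch that reuses the
`wget` flags from the Ubuntu 20.04 example. It assumes `wget` is installed, and
the variable names `old` and `methodInEffect` are only for illustration. Keep in
mind that `--no-check-certificate` turns off certificate verification, so it is
best limited to cases where you need it.

```{r download-options}
# Set the options for the current session and remember the previous values
old <- options(download.file.method = "wget",
               download.file.extra = c("--no-check-certificate",
                                       "--ciphers=DEFAULT:@SECLEVEL=1"))
methodInEffect <- getOption("download.file.method")

# ... calls to faviconPlease() or faviconIco() would go here ...

# Restore the previous settings
options(old)
```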