https://github.com/hrbrmstr/ssdeepr
Context Triggered Piecewise Hash Computation Using 'ssdeep' for R
r rstats ssdeep
- Host: GitHub
- URL: https://github.com/hrbrmstr/ssdeepr
- Owner: hrbrmstr
- License: other
- Created: 2020-03-02T14:36:39.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-03-12T11:18:44.000Z (about 5 years ago)
- Last Synced: 2025-01-16T02:47:28.953Z (3 months ago)
- Topics: r, rstats, ssdeep
- Language: HTML
- Homepage:
- Size: 1.45 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - hrbrmstr/ssdeepr - Context Triggered Piecewise Hash Computation Using 'ssdeep' for R (HTML)
README
---
output: rmarkdown::github_document
editor_options:
  chunk_output_type: console
---
```{r pkg-knitr-opts, include=FALSE}
hrbrpkghelpr::global_opts()
```

```{r badges, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::stinking_badges()
```

```{r description, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::yank_title_and_description()
```

## What's Inside The Tin
The following functions are implemented:
```{r ingredients, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::describe_ingredients()
```

## Installation
You'll need `libfuzzy` installed and available for linking. See the ssdeep project documentation for platform support.
On Ubuntu/Debian you can do:
```shell
sudo apt install libfuzzy-dev
```

On macOS you can do:
```shell
brew install ssdeep
```

The library works on Windows; I just need to do some manual labor for that.
Package installation:
```{r install-ex, results='asis', echo=FALSE, cache=FALSE}
hrbrpkghelpr::install_block()
```

## Usage

```{r lib-ex}
library(ssdeepr)

# current version
packageVersion("ssdeepr")
```
- `index.html` is a static copy of a blog main page with a bunch of HTML elements holding article snippets
- `index1.html` is the same file as `index.html` with a changed cache timestamp at the end
- `index2.html` is the same file as `index.html` with one article snippet removed
- `RMacOSX-FAQ.html` is the CRAN 'R for Mac OS X FAQ'

```{r u-01}
system.file("extdat", package="ssdeepr") %>%
  list.files(full.names = TRUE, pattern = "html$", include.dirs = FALSE) %>%
  hash_file() -> hashes

hashes

hash_compare(hashes$hash[1], hashes$hash[1])
hash_compare(hashes$hash[1], hashes$hash[2])
hash_compare(hashes$hash[1], hashes$hash[3])
hash_compare(hashes$hash[1], hashes$hash[4])
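
# A hedged sketch, not taken from the package docs: score every pair of files
# at once instead of one call per pair. Assumes hash_compare() accepts two
# single hash strings and returns one numeric score, as the calls above suggest.
scores <- outer(
  seq_along(hashes$hash), seq_along(hashes$hash),
  Vectorize(function(i, j) hash_compare(hashes$hash[i], hashes$hash[j]))
)

# Rows and columns follow the order of `hashes`
scores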
```

Works with connections, too. All three hashes should be the same if the Wikipedia page hasn't changed since the local copies in the package were made.

Using `hash_con()` has the advantage of not requiring the entire contents of the target blob to be read into memory, in exchange for a tiny speed cost for most files (see Benchmarks).

NOTE: retrieving the URL contents with different user-agent strings and/or with JavaScript enabled will likely generate different content and, thus, a different hash.

```{r u-02}
(k1 <- hash_con(url("https://en.wikipedia.org/wiki/Donald_Knuth",
                    header = setNames(splashr::ua_ios_safari, "User-Agent"))))

(k2 <- hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))))

(k3 <- hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))))
hash_compare(k1, k2)
hash_compare(k1, k3)
hash_compare(k2, k3)
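
# A hedged sketch, not taken from the package docs: the same idea on a
# throwaway local file, so it runs without network access. Assumes hash_con()
# returns a single hash string and hash_file() returns a data frame with a
# `hash` column, as the earlier calls suggest.
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("line %05d of sample text", 1:5000), tmp)

h_con <- hash_con(file(tmp))      # streamed through a connection
h_fil <- hash_file(tmp)$hash[1]   # hashed by path

# Same bytes went in both ways, so a high similarity score is expected
hash_compare(h_con, h_fil)

unlink(tmp)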
```

### Benchmarks

```{r u-03}
microbenchmark::microbenchmark(
  con = hash_con(file(system.file("knuth", "local.html", package = "ssdeepr"))),
  gzc = hash_con(gzfile(system.file("knuth", "local.gz", package = "ssdeepr"))),
  fil = hash_file(system.file("knuth", "local.html", package = "ssdeepr"))
)
```

## ssdeepr Metrics

```{r cloc, echo=FALSE}
cloc::cloc_pkg_md()
```

## Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.