https://github.com/seratch/gyotaku
Saving complete web pages by using Selenium Web Driver
- Host: GitHub
- URL: https://github.com/seratch/gyotaku
- Owner: seratch
- Created: 2012-06-06T09:10:23.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2012-06-19T02:40:39.000Z (over 12 years ago)
- Last Synced: 2024-08-25T00:53:46.653Z (3 months ago)
- Language: Scala
- Homepage:
- Size: 225 KB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Gyotaku - 魚拓(ぎょたく)
## What's this?
Gyotaku is a simple tool to completely save web pages.
## Requirements
- Mac OS/Linux/Windows
- Java Runtime Environment
- Firefox

## Usage
### Get Gyotaku
Download gyotaku.zip and unzip it.
https://github.com/seratch/gyotaku/downloads
### Invoke Gyotaku
Using Gyotaku UI (Swing Application) is the easiest way.
```
./gyotaku_ui
```

![screen_shot](https://github.com/seratch/gyotaku/raw/master/img/gyotaku_screen_shot.png)
### Authentication
To save a page that requires authentication, supply your own Selenium WebDriver script that logs in first and then returns the driver.
#### input/tumblr-login.scala
Add the following source code:
```scala
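import org.openqa.selenium._

// Start Firefox and open the Tumblr login page.
val driver = new firefox.FirefoxDriver
driver.get("https://www.tumblr.com/login")
// Fill in the login form and submit it.
driver.findElement(By.id("signup_email")).sendKeys("YOUR_EMAIL")
driver.findElement(By.id("signup_password")).sendKeys("YOUR_PASSWORD")
driver.findElement(By.id("signup_form")).submit()
// The script's last expression is the logged-in WebDriver instance.
driver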
```

#### input/tumblr.yml
```yml
name: tumblr-dashboard
url: http://www.tumblr.com/dashboard
driver: { path: input/tumblr-login.scala }
```

## Configuration
```yml
name: example
url: http://www.example.com/
driver: input/login_operation.scala
charset: UTF-8
prettify: false
replaceNoDomainOnly: false
```

### name
The name of the gyotaku (the saved page). It is used as the directory name under the output directory.
### url
The URL of the page to save.
### driver
Specifies how to create the `org.openqa.selenium.WebDriver` instance.
`FirefoxDriver` is used if this setting is omitted.
```yml
driver:
  path: path/to/driver.scala
```

### charset
The charset used for the downloaded HTML and the modified CSS files.
Defaults to "UTF-8" if omitted.
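
As a rough illustration of what this setting controls, the sketch below writes HTML text to a file in a configurable charset and falls back to UTF-8 when none is given. The object `CharsetSketch`, the method `saveHtml`, and its parameters are hypothetical names chosen for this example; they are not taken from Gyotaku's source.

```scala
import java.nio.charset.Charset
import java.nio.file.{Files, Path}

object CharsetSketch {
  // Hypothetical helper: encodes `html` with the configured charset,
  // falling back to UTF-8 when the setting is omitted (None).
  def saveHtml(target: Path, html: String, charset: Option[String] = None): Path = {
    val cs = Charset.forName(charset.getOrElse("UTF-8"))
    // Creates the file, or truncates it if it already exists.
    Files.write(target, html.getBytes(cs))
  }
}
```

Calling `CharsetSketch.saveHtml(path, html)` mirrors an omitted `charset`, while `CharsetSketch.saveHtml(path, html, Some("Shift_JIS"))` mirrors an explicit one.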
### prettify
Whether to clean up the HTML with HtmlCleaner.
Defaults to `false` if omitted.
### replaceNoDomainOnly
Whether to replace URLs in the HTML/CSS only when they do not start with `http://` or `https://`.
Defaults to `true` if omitted.
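
The rule above can be sketched as a small predicate. This is only an illustration of the documented behavior, not Gyotaku's actual code: `UrlReplaceSketch`, `hasNoDomain`, and `shouldReplace` are hypothetical names, and treating every URL as replaceable when the flag is `false` is an assumption inferred from the description.

```scala
object UrlReplaceSketch {
  // True when the URL does not start with "http://" or "https://",
  // for example "/css/site.css" or "img/logo.png".
  def hasNoDomain(url: String): Boolean =
    !(url.startsWith("http://") || url.startsWith("https://"))

  // replaceNoDomainOnly = true (the documented default): only URLs
  // without a domain are replaced.
  // replaceNoDomainOnly = false: assumed here to mean every URL is replaced.
  def shouldReplace(url: String, replaceNoDomainOnly: Boolean = true): Boolean =
    !replaceNoDomainOnly || hasNoDomain(url)
}
```

Under these assumptions, `UrlReplaceSketch.shouldReplace("/css/site.css")` is `true`, `UrlReplaceSketch.shouldReplace("http://www.example.com/a.png")` is `false`, and passing `replaceNoDomainOnly = false` makes both `true`.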