Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/minyk/nifi-headlessbrowser-processor
Headless browser processor for Apache Nifi
- Host: GitHub
- URL: https://github.com/minyk/nifi-headlessbrowser-processor
- Owner: minyk
- Created: 2016-05-31T10:50:14.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-10-12T19:46:44.000Z (over 3 years ago)
- Last Synced: 2024-03-18T23:54:36.850Z (3 months ago)
- Topics: browser, nifi, processor
- Language: Java
- Size: 26.4 KB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
Lists
- awesome-nifi - minyk/nifi-headlessbrowser-processor - Returns the page source in its current state to FlowFile, including any DOM updates that occurred after page load (Processors and Bundles / Mailing List Best Of)
README
Nifi Headless Browser Processor
================================

**Currently, only the `Url Provided` configuration has been tested.**

* Returns the page source in its current state to FlowFile, including any DOM updates that occurred after page load.
* Uses [JBrowserDriver](https://github.com/MachinePublishers/jBrowserDriver).

# Prerequisites
* JRE with JavaFX
  * OpenJDK 8 does not include JavaFX.
  * Use Oracle JDK or Zulu JDK FX instead.
* `fontconfig` package installed on the OS
  * `yum install fontconfig` or `apt install fontconfig`

# Configurations
Most configuration properties are used to build the JBrowserDriver instance.
* Configurations
  * Host: Hostname or IP address for the browser.
  * Url Provided: If `true`, the processor reads the target from the `Page URL` property. If `false`, the incoming FlowFile must contain the URL.
  * Page URL: URL to process. Used only when `Url Provided` is `true`.
  * Timezone: Timezone for the browser, selected from a drop-down list.
  * Port Range: Port range for JBrowserDriverServer. This range should be a multiple of three.
  * ~~Javascript: Script to run after page loading. Currently, Expression Language (EL) is not supported.~~
    * Removed for now due to a timing issue.
* Relationships
  * success: Success relationship of this processor. The FlowFile contains the page source of the input URL.
  * failure: Failure relationship of this processor.
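
The README does not show how the `Url Provided`, `Page URL` and `Port Range` properties are interpreted in code. The Java sketch below is a hypothetical illustration, not the processor's actual implementation. The names `HeadlessBrowserConfigSketch`, `resolveTargetUrl`, `portCount` and `isValidPortRange` are invented. It assumes that when `Url Provided` is `false` the URL arrives as UTF-8 FlowFile content, that only `http`/`https` targets are accepted, and that `Port Range` is written as an inclusive `start-end` pair whose port count must be a multiple of three.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helpers sketching how "Url Provided", "Page URL" and
 * "Port Range" could be interpreted. Not the processor's actual code.
 */
final class HeadlessBrowserConfigSketch {

    /**
     * Returns the target URL: the "Page URL" property when urlProvided is true,
     * otherwise the FlowFile content decoded as UTF-8 (an assumption).
     * Throws IllegalArgumentException for missing, malformed or non-http(s) URLs.
     */
    static String resolveTargetUrl(boolean urlProvided, String pageUrlProperty, byte[] flowFileContent) {
        String candidate;
        if (urlProvided) {
            candidate = pageUrlProperty;
        } else {
            candidate = flowFileContent == null ? null : new String(flowFileContent, StandardCharsets.UTF_8);
        }
        if (candidate == null || candidate.trim().isEmpty()) {
            throw new IllegalArgumentException("No target URL available");
        }
        String url = candidate.trim(); // drop trailing newlines, e.g. from a text FlowFile
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + url, e);
        }
        String scheme = uri.getScheme();
        if (scheme == null || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            throw new IllegalArgumentException("Unsupported URL scheme: " + url);
        }
        return url;
    }

    /** Number of ports in an inclusive "start-end" range, e.g. "10000-10005" has 6. */
    static int portCount(String range) {
        if (range == null) {
            throw new IllegalArgumentException("Port range is missing");
        }
        String[] parts = range.trim().split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected start-end, got: " + range);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int start = Integer.parseInt(parts[0].trim());
        int end = Integer.parseInt(parts[1].trim());
        if (start < 1 || end > 65535 || start > end) {
            throw new IllegalArgumentException("Invalid port range: " + range);
        }
        return end - start + 1;
    }

    /** True when the range parses and holds a multiple of three ports. */
    static boolean isValidPortRange(String range) {
        try {
            return portCount(range) % 3 == 0;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] body = "https://example.com/\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(resolveTargetUrl(false, null, body)); // https://example.com/
        System.out.println(isValidPortRange("10000-10005"));     // true (6 ports)
        System.out.println(isValidPortRange("10000-10003"));     // false (4 ports)
    }
}
```

Validating these values before any browser is started would make it easy to route bad input straight to `failure`, though the actual processor may handle this differently.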

# TODOs

- [ ] Test the `Url Provided: false` configuration.
- [ ] Add attributes to the result FlowFile.
  - [x] Source URL
  - [ ] Page title
  - [ ] Etc.
- [ ] Add the ability to execute JavaScript after page loading.