{"id":19078975,"url":"https://github.com/tomtom/websitary","last_synced_at":"2025-04-30T05:23:43.169Z","repository":{"id":873949,"uuid":"615259","full_name":"tomtom/websitary","owner":"tomtom","description":"A ruby-based script that monitors webpages, rss feeds, podcasts etc.","archived":false,"fork":false,"pushed_at":"2018-12-18T09:28:44.000Z","size":55,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-18T20:49:52.153Z","etag":null,"topics":["rss","ruby","website-monitor","websites"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomtom.png","metadata":{"files":{"readme":"README.rdoc","changelog":"History.txt","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2010-04-17T17:33:48.000Z","updated_at":"2022-04-22T09:52:08.000Z","dependencies_parsed_at":"2022-08-16T11:15:25.206Z","dependency_job_id":null,"html_url":"https://github.com/tomtom/websitary","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom%2Fwebsitary","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom%2Fwebsitary/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom%2Fwebsitary/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomtom%2Fwebsitary/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomtom","download_url":"https://codeload.github.com/tomtom/websitary/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":
"github","repositories_count":251646315,"owners_count":21620909,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["rss","ruby","website-monitor","websites"],"created_at":"2024-11-09T02:12:57.421Z","updated_at":"2025-04-30T05:23:43.133Z","avatar_url":"https://github.com/tomtom.png","language":"Ruby","readme":"websitary by Thomas Link\nhttp://rubyforge.org/projects/websitiary/\n\nThis ruby-based script monitors webpages, rss feeds, podcasts etc. and \nreports what's new. It reuses other programs to do the actual work. By \ndefault, it works on an ASCII basis, i.e. it runs diff on the output of \ntext-based webbrowsers like w3m, lynx, or links. With the help of some \nfriends, it also works with HTML. Maybe it is of help for some of you.\n\nPlease see the requirements section below.\n\n\n== DESCRIPTION:\nwebsitary (formerly known as websitiary with an extra \"i\") monitors \nwebpages, rss feeds, podcasts etc. It reuses other programs (w3m, diff \netc.) to do most of the actual work. By default, it works on an ASCII \nbasis, i.e. with the output of text-based webbrowsers like w3m (or lynx, \nlinks etc.), as the output can easily be post-processed. It can also work \nwith HTML and highlight new items. This script was originally planned as \na ruby-based websec replacement.\n\nBy default, this script will use w3m to dump HTML pages and then run \ndiff over the current page and the previous backup. Some pages are \nbetter viewed with lynx or links. 
Downloaded documents (HTML or ASCII) \ncan be post-processed (e.g., filtered through some ruby block that \nextracts elements via hpricot and the like). Please see the \nconfiguration options below to find out how to change this globally or \nfor a single source.\n\nThis user manual is also available as\nPDF[http://websitiary.rubyforge.org/websitary.pdf].\n\n\n== FEATURES/PROBLEMS:\n* Handle webpages, rss feeds (optionally save attachments in podcasts \n  etc.)\n* Compare webpages with previous backups\n* Display differences between the current version and the backup\n* Provide hooks to post-process the downloaded documents and the diff\n* Display a one-page report summarizing all news\n* Automatically open the report in your favourite web-browser\n* Experimental: Download webpages at defined intervals and generate \n  incremental diffs.\n\nISSUES, TODO:\n* With HTML output, changes are presented on one single page, which \n  means that pages with different encodings cause problems.\n* Improved support for robots.txt (test it)\n* The use of :website_below and :website is hardly tested (please \n  report errors).\n* download =\u003e :body_html tries to rewrite references (a, img), which may \n  fail on certain kinds of urls (please report errors).\n* When using :body_html for download, it may happen that some \n  JavaScript code is stripped, which breaks some JavaScript-generated \n  links.\n* The --log command-line option will create a new instance of the logger and \n  thus reset any previous options related to the logging level.\n\nNOTE: The script was previously called websitiary but was renamed (from \n0.2 on) to websitary (without the superfluous i).\n\n\n=== Caveat\nThe script also includes experimental support for monitoring whole \nwebsites. Basically, this script supports robots.txt directives (see \nrequirements), but this is hardly tested and may not work in some cases.\n\nWhile it is okay to ignore robots.txt for your own websites, it is not \nfor others. 
Please make sure that the webpages you run this program on \nallow such a use.  Some webpages disallow the use of any automatic \ndownloader or offline reader in their user agreements.\n\n\n== SYNOPSIS:\n\n=== Usage\nExample:\n  # Run \"profile\"\n  websitary profile\n  \n  # Edit \"~/.websitary/profile.rb\"\n  websitary --edit=profile\n  \n  # View the latest report\n  websitary -ereview\n  \n  # Refetch all sources regardless of :days and :hours restrictions\n  websitary -signore_age=true\n  \n  # Create html and rss reports for my websites\n  websitary -fhtml,rss mysites\n  \n  # Add a url to the quicklist profile\n  websitary -eadd http://www.example.com\n\nFor example output see:\n* html[http://deplate.sourceforge.net/websitary.html]\n* rss[http://deplate.sourceforge.net/websitary.rss]\n* text[http://deplate.sourceforge.net/websitary.txt]\n\n\n=== Configuration\nProfiles are plain ruby files (with the '.rb' suffix) stored in \n~/.websitary/.\n\nThe profile \"config\" (~/.websitary/config.rb) is always loaded if \navailable.\n\nThere are two special profile names:\n\n-::\n    Read URLs from STDIN.\n\u003ctt\u003e__END__\u003c/tt\u003e::\n    Read the profile contained in the script source after the __END__ \n    line.\n\n\n==== default 'PROFILE1', 'PROFILE2' ...\nSet the default profile(s). The default is: quicklist\n\nExample:\n  default 'my_profile'\n\n\n==== diff 'CMD \"%s\" \"%s\"'\nUse this shell command to make the diff.\nThe two %s placeholders will be replaced with the old and the new filename.\n\ndiff is used by default.\n\n\n==== download 'CMD \"%s\"'\nUse this shell command to download a page.\n%s will be replaced with the url.\n\nw3m is used by default.\n\nExample:\n  download 'lynx -dump \"%s\"'\n\n\n==== downloadprocess lambda {|text| ...}\nUse this ruby snippet to post-process what was downloaded. 
Return the \nnew text.\n\n\n==== edit 'CMD \"%s\"'\nUse this shell command to edit a profile. %s will be replaced with the filename.\n\nvi is used by default.\n\nExample:\n  edit 'gvim \"%s\"\u0026'\n\n\n==== option TYPE, OPTION =\u003e VALUE\nSet a global option.\n\nTYPE can be one of:\n\u003ctt\u003e:diff\u003c/tt\u003e::\n  Generate a diff\n\u003ctt\u003e:diffprocess\u003c/tt\u003e::\n  Post-process a diff (if necessary)\n\u003ctt\u003e:format\u003c/tt\u003e::\n  Format the diff for output\n\u003ctt\u003e:download\u003c/tt\u003e::\n  Download webpages\n\u003ctt\u003e:downloadprocess\u003c/tt\u003e::\n  Post-process downloaded webpages\n\u003ctt\u003e:page\u003c/tt\u003e::\n  The :format field defines the format of the final report. Here VALUE \n  is a format string that takes 3 variables as arguments: report title, \n  toc, contents.\n\u003ctt\u003e:global\u003c/tt\u003e::\n  Set a \"global\" option.\n\nDOWNLOAD is a symbol.\n\nVALUE is either a format string or a block of code (of class Proc).\n\nExample:\n  set :download, :foo =\u003e lambda {|url| get_url(url)}\n\n\n==== global OPTION =\u003e VALUE\nThis is the same as \u003ctt\u003eoption :global, OPTION =\u003e VALUE\u003c/tt\u003e.\n\nKnown global options:\n\n\u003ctt\u003e:canonic_filename =\u003e BLOCK(FILENAME)\u003c/tt\u003e::\n  Rewrite filenames as they are stored in the mtimes register. This may \n  be useful if you want to use the same repository on several computers \n  in different locations etc.\n\n\u003ctt\u003e:encoding =\u003e OUTPUT_DOCUMENT_ENCODING\u003c/tt\u003e::\n  The default is 'ISO-8859-1'.\n\n\u003ctt\u003e:downloadhtml =\u003e SHORTCUT\u003c/tt\u003e::\n  The default shortcut for downloading plain HTML.\n\n\u003ctt\u003e:file_url =\u003e BLOCK(FILENAME)\u003c/tt\u003e::\n  Rewrite a filename as it is used for creating file urls to local \n  copies in the output. 
This may be useful if you want to use the same \n  repository on several computers in different locations etc.\n\n\u003ctt\u003e:filename_size =\u003e N\u003c/tt\u003e::\n  The max filename size. If a filename becomes longer, md5 encoding will \n  be used for local copies in the cache.\n\n\u003ctt\u003e:toggle_body =\u003e BOOLEAN\u003c/tt\u003e::\n  If true, make a news body collapsible on mouse-clicks (sort of).\n\n\u003ctt\u003e:proxy =\u003e STRING\u003c/tt\u003e, \u003ctt\u003e:proxy =\u003e ARRAY\u003c/tt\u003e::\n  The proxy. (currently only supported by mechanize)\n\n\u003ctt\u003e:user_agent =\u003e STRING\u003c/tt\u003e::\n  Set the user agent (only for certain queries).\n\n\n==== output_format FORMAT, output_format [FORMAT1, FORMAT2, ...]\nSet the output format.\nFormat can be one of:\n\n* html\n* text, txt (this only works with text-based downloaders)\n* rss (proof of concept only;\n  it requires :rss[:url] to be set to the url where the rss feed will \n  be published, using the \u003ctt\u003eoption :rss, :url =\u003e URL\u003c/tt\u003e \n  configuration command; you either have to use a text-based downloader \n  or add \u003ctt\u003e:rss_format =\u003e 'html'\u003c/tt\u003e to the url options)\n\n\n==== set OPTION =\u003e VALUE; set TYPE, OPTION =\u003e VALUE; unset OPTIONS\n(Un)Set an option for the following source commands.\n\nExample:\n  set :download, :foo =\u003e lambda {|url| get_url(url)}\n  set :days =\u003e 7, :sort =\u003e true\n  unset :days, :sort\n\n\n==== source URL(S), [OPTIONS]\nOptions\n\n\u003ctt\u003e:cols =\u003e FROM..TO\u003c/tt\u003e::\n  Use only these columns from the output (used after applying the :lines \n  option)\n\n\u003ctt\u003e:depth =\u003e INTEGER\u003c/tt\u003e::\n  In conjunction with a :website type of :download option, fetch urls up \n  to this depth.\n\n\u003ctt\u003e:diff =\u003e \"CMD\", :diff =\u003e SHORTCUT\u003c/tt\u003e::\n  Use this command to make the diff for this page. 
Possible values for \n  SHORTCUT are :webdiff (useful in conjunction with :download =\u003e :curl, \n  :wget, or :body_html) and :websec_webdiff (use websec's webdiff tool); \n  :body_html, :website_below, :website, and :openuri are synonyms for \n  :webdiff.\n  NOTE: Since version 0.3, :webdiff is mapped to websitary's own \n  htmldiff class (which can also be used as a stand-alone script). Before \n  0.3, websitary used websec's webdiff script, which is now mapped to \n  :websec_webdiff.\n\n\u003ctt\u003e:diffprocess =\u003e lambda {|text| ...}\u003c/tt\u003e::\n  Use this ruby snippet to post-process this diff.\n\n\u003ctt\u003e:download =\u003e \"CMD\", :download =\u003e SHORTCUT\u003c/tt\u003e::\n  Use this command to download this page. For possible values for \n  SHORTCUT see the section on shortcuts below.\n\n\u003ctt\u003e:downloadprocess =\u003e lambda {|text| ...}\u003c/tt\u003e::\n  Use this ruby snippet to post-process what was downloaded. This is the \n  place where, e.g., hpricot can be used to extract certain elements \n  from the HTML code.\n  Example:\n    lambda {|text| Hpricot(text).at('div#content').inner_html}\n\n\u003ctt\u003e:format =\u003e \"FORMAT %s STRING\", :format =\u003e SHORTCUT\u003c/tt\u003e::\n  The format string for the diff text. The default (the :diff shortcut) \n  wraps the output in +pre+ tags. :webdiff, :body_html, :website_below, \n  :website, and :openuri will simply add a newline character.\n\n\u003ctt\u003e:iconv =\u003e ENCODING\u003c/tt\u003e::\n  If set, use iconv to convert the page body into the summary's document \n  encoding (see the 'global' section). 
Websitary currently isn't able to \n  automatically determine and convert encodings.\n\n\u003ctt\u003e:timeout =\u003e SECONDS\u003c/tt\u003e::\n  When using openuri, download the page with a timeout.\n\n\u003ctt\u003e:hours =\u003e HOURS, :days =\u003e DAYS\u003c/tt\u003e::\n  Don't download the file unless it's older than that.\n\n\u003ctt\u003e:days_of_month =\u003e DAY..DAY, :mdays =\u003e DAY..DAY\u003c/tt\u003e::\n  Download only once per month within a certain range of days (e.g., \n  15..31 ... Check once after the 15th). The argument can also be an \n  array (e.g., [1, 15]) or an integer.\n\n\u003ctt\u003e:days_of_week =\u003e DAY..DAY, :wdays =\u003e DAY..DAY\u003c/tt\u003e::\n  Download only once per week within a certain range of days (e.g., 1..2 \n  ... Check once on Monday or Tuesday; Sunday = 0). The argument can \n  also be an array (e.g., [1, 2]) or an integer.\n\n\u003ctt\u003e:daily =\u003e true\u003c/tt\u003e::\n  Download only once a day.\n\n\u003ctt\u003e:ignore_age =\u003e true\u003c/tt\u003e::\n  Ignore any :days and :hours settings. This is useful in some cases \n  when set on the command line.\n\n\u003ctt\u003e:lines =\u003e FROM..TO\u003c/tt\u003e::\n  Use only these lines from the output\n\n\u003ctt\u003e:match =\u003e REGEXP\u003c/tt\u003e::\n  When recursively walking a website, follow only links that match this \n  regexp.\n\n\u003ctt\u003e:rss_rewrite_enclosed_urls =\u003e true\u003c/tt\u003e::\n  If true, replace urls in the rss feed item description pointing to the \n  enclosure with a file url pointing to the local copy\n\n\u003ctt\u003e:rss_enclosure =\u003e true|\"DIRECTORY\"\u003c/tt\u003e::\n  If true, save rss feed enclosures in \n  \"~/.websitary/attachments/RSS_FEED_NAME/\". If a string, use this as \n  the destination directory. Only enclosures of new items will be saved -- \n  i.e. 
when downloading a feed for the first time, no enclosures will be \n  saved.\n\n\u003ctt\u003e:rss_find_enclosure =\u003e BLOCK\u003c/tt\u003e::\n  Certain RSS-feeds embed enclosures in the description. Use this option \n  to scan the description (a Hpricot document) for a URL that is then saved \n  as an enclosure if the :rss_enclosure option is set.\n  Example:\n      source 'http://www.example.com/rss',\n        :title =\u003e 'Example',\n        :use =\u003e :rss, :rss_enclosure =\u003e true,\n        :rss_find_enclosure =\u003e lambda {|item, doc| (doc / 'img').map {|e| e['src']}[0]}\n\n\u003ctt\u003e:rss_format (default: \"plain_text\")\u003c/tt\u003e::\n    When the output format is :rss, create rss item descriptions as plain text.\n\n\u003ctt\u003e:rss_format_local_copy =\u003e FORMAT_STRING | BLOCK\u003c/tt\u003e::\n    By default a hypertext reference to the local copy of an RSS \n    enclosure is added to the entry. Sometimes you may want to display \n    something inline (e.g. an image). You can then use this option to \n    define a format string (one field = the local copy's file url).\n\n\u003ctt\u003e:show_initial =\u003e true\u003c/tt\u003e::\n    Include initial copies in the report (may not always work properly). \n    This can also be set as a global option.\n\n\u003ctt\u003e:sleep =\u003e SECS\u003c/tt\u003e::\n    Wait SECS seconds (float or integer) before downloading the page.\n\n\u003ctt\u003e:sort =\u003e true, :sort =\u003e lambda {|a,b| ...}\u003c/tt\u003e::\n  Sort lines in output\n\n\u003ctt\u003e:strip =\u003e true\u003c/tt\u003e::\n  Strip empty lines\n\n\u003ctt\u003e:title =\u003e \"TEXT\"\u003c/tt\u003e::\n  Display TEXT instead of the URL\n\n\u003ctt\u003e:use =\u003e SYMBOL\u003c/tt\u003e::\n  Use SYMBOL for any other option. I.e. 
\u003ctt\u003e:download =\u003e :body_html, \n  :diff =\u003e :webdiff\u003c/tt\u003e can be abbreviated as \u003ctt\u003e:use =\u003e \n  :body_html\u003c/tt\u003e (because for :diff, :body_html is a synonym for \n  :webdiff).\n\nThe order of age constraints is:\n:hours \u003e :daily \u003e :wdays \u003e :mdays \u003e :days \u003e :months.\nI.e. if :wdays is set, :mdays, :days, or :months are ignored.\n\n\n==== view 'CMD \"%s\"'\nUse this shell command to view the output (usually an HTML file).\n%s will be replaced with the filename.\n\nw3m is used by default.\n\nExample:\n  view 'gnome-open \"%s\"' # Gnome Desktop\n  view 'kfmclient \"%s\"'  # KDE\n  view 'cygstart \"%s\"'   # Cygwin\n  view 'start \"%s\"'      # Windows\n  view 'firefox \"%s\"'\n\n\n=== Shortcuts for use with :use, :download and other options\n\u003ctt\u003e:w3m\u003c/tt\u003e::\n  Use w3m for downloading the source. Use diff for generating diffs.\n\n\u003ctt\u003e:lynx\u003c/tt\u003e::\n  Use lynx for downloading the source. Use diff for generating diffs.\n  Lynx doesn't try to recreate the layout of a page like w3m or links \n  do. As a result the output IMHO sometimes deviates from the original \n  design but is better suited for being post-processed in some \n  situations.\n\n\u003ctt\u003e:links\u003c/tt\u003e::\n  Use links for downloading the source. Use diff for generating diffs.\n\n\u003ctt\u003e:curl\u003c/tt\u003e::\n  Use curl for downloading the source. Use webdiff for generating diffs.\n\n\u003ctt\u003e:wget\u003c/tt\u003e::\n  Use wget for downloading the source. Use webdiff for generating diffs.\n\n\u003ctt\u003e:openuri\u003c/tt\u003e::\n  Use open-uri for downloading the source. Use webdiff for generating \n  diffs. This doesn't handle cookies and the like.\n\n\u003ctt\u003e:mechanize\u003c/tt\u003e::\n  Use mechanize (must be installed) for downloading the source. Use \n  webdiff for generating diffs. 
This calls the URL's :mechanize property \n  (a lambda that takes 3 arguments: URL, agent, page =\u003e HTML as string) \n  to post-process the page (or, if not available, use the page body's \n  HTML).\n\n\u003ctt\u003e:text\u003c/tt\u003e::\n  This requires hpricot to be installed. Use open-uri for downloading \n  and hpricot for converting HTML to plain text. This still requires \n  diff as an external helper.\n\n\u003ctt\u003e:body_html\u003c/tt\u003e::\n  This requires hpricot to be installed. Use open-uri for downloading \n  the source, use only the body. Use webdiff for generating diffs. Try \n  to rewrite references (a, img) so that they point to the webpage. By \n  default, this will also strip tags like script, form, object ...\n\n\u003ctt\u003e:website\u003c/tt\u003e::\n  Use :body_html to download the source. Follow all links referring to \n  the same host with the same file suffix. Use webdiff for generating \n  diffs.\n\n\u003ctt\u003e:website_below\u003c/tt\u003e::\n  Use :body_html to download the source. Follow all links referring to \n  the same host and a file below the top directory with the same file \n  suffix. Use webdiff for generating diffs.\n\n\u003ctt\u003e:website_txt\u003c/tt\u003e::\n  Use :website to download the source but convert the output to plain \n  text.\n\n\u003ctt\u003e:website_txt_below\u003c/tt\u003e::\n  Use :website_below to download the source but convert the output to \n  plain text.\n\n\u003ctt\u003e:rss\u003c/tt\u003e::\n  Download an rss feed, show changed items.\n\n\u003ctt\u003e:opml\u003c/tt\u003e::\n  Experimental. Download the rss feeds registered in opml. No support \n  for atom yet. \n\n\u003ctt\u003e:img\u003c/tt\u003e::\n  Download an image and display it in the output if it has changed \n  (according to diff). You can use hpricot to extract an image from an \n  HTML source. 
(See the 'Daily Image' source in the example configuration below.)\n\nAny shortcuts relying on :body_html will also try to rewrite any \nreferences so that the links point to the webpage.\n\n\n\n=== Example configuration file for demonstration purposes\n\n  # Daily\n  set :days =\u003e 1\n  \n  # Use lynx instead of the default downloader (w3m).\n  source 'http://www.example.com', :days =\u003e 7, :download =\u003e :lynx\n  \n  # Use the HTML body and process via webdiff.\n  source 'http://www.example.com', :use =\u003e :body_html,\n    :downloadprocess =\u003e lambda {|text| Hpricot(text).at('div#content').inner_html}\n  \n  # Download a podcast\n  source 'http://www.example.com/podcast.xml', :title =\u003e 'Podcast',\n    :use =\u003e :rss,\n    :rss_enclosure =\u003e '/home/me/podcasts/example'\n  \n  # Check an rss feed.\n  source 'http://www.example.com/news.xml', :title =\u003e 'News', :use =\u003e :rss\n  \n  # Get rss feed info from an opml file (EXPERIMENTAL).\n  # @cfgdir is most likely '~/.websitary'.\n  source File.join(@cfgdir, 'news.opml'), :use =\u003e :opml\n  \n  \n  # Weekly\n  set :days =\u003e 7\n  \n  # Consider the page body only from the 10th line downwards.\n  source 'http://www.example.com', :lines =\u003e 10..-1, :title =\u003e 'My Page'\n  \n  \n  # Bi-weekly\n  set :days =\u003e 14\n  \n  # Use these urls with the default options.\n  source \u003c\u003cURLS\n  http://www.example.com\n  http://www.example.com/page.html\n  URLS\n  \n  # Make HTML diffs and highlight occurrences of a word\n  source 'http://www.example.com',\n    :title =\u003e 'Example',\n    :use =\u003e :body_html,\n    :diffprocess =\u003e highlighter(/word/i)\n  \n  # Download the whole website below this path (only pages with \n  # html-suffix), wait 30 secs between downloads.\n  # Download only php and html pages\n  # Follow links 2 levels deep\n  source 'http://www.example.com/foo/bar.html',\n    :title =\u003e 'Example -- Bar',\n    :use =\u003e :website_below, :sleep =\u003e 30,\n    :match =\u003e /\\.(php|html)\\b/, :depth 
=\u003e 2\n  \n  # Download images from some kind of daily-image site (check the user \n  # agreement first, if this is allowed). This may require some ruby \n  # hacking in order to extract the right url.\n  source 'http://www.example.com/daily_image/', :title =\u003e 'Daily Image',\n    :use =\u003e :img,\n    :download =\u003e lambda {|url|\n      rv = nil\n      # Read the HTML.\n      html = open(url) {|io| io.read}\n      # This check is probably unnecessary as the failure to read \n      # the HTML document would most likely result in an \n      # exception.\n      if html\n        # Parse the HTML document.\n        doc = Hpricot(html)\n        # The following could actually be simplified using xpath \n        # or css search expressions. This isn't the most elegant \n        # solution but it works with any value of ALT.\n        # This downloads the image \u003cimg src=\"...\" alt=\"Current Image\"\u003e\n        # Check all img tags in the HTML document.\n        for e in doc.search(%{//img})\n          # Is this the image we're looking for?\n          if e['alt'] == \"Current Image\"\n            # Make relative urls absolute\n            img = rewrite_href(e['src'], url)\n            # Get the actual image data\n            rv = open(img, 'rb') {|io| io.read}\n            # Exit the for loop\n            break\n          end\n        end\n        rv\n      end\n    }\n  \n  \n  unset :days\n\n\n\n=== Commands for use with the -e command-line option\nMost of these commands require you to name a profile on the command \nline. You can define default profiles with the \"default\" configuration \ncommand.\n\nIf no command is given, \"downdiff\" is executed.\n\nadd::\n    Add the URLs given on the command line to the quicklist profile. 
\n    ATTENTION: The following arguments on the command line are URLs, not \n    profile names.\n\naggregate::\n    Retrieve information and save changes for later review.\n\nconfiguration::\n    Show the fully qualified configuration of each source.\n\ndowndiff::\n    Download and show differences (DEFAULT)\n\nedit::\n    Edit the profile given on the command line (use vi by default)\n\nlatest::\n    Show the latest copies of the sources from the profiles given \n    on the command line.\n\nls::\n    List number of aggregated diffs.\n\nrebuild::\n    Rebuild the latest report.\n\nreview::\n    Review the latest report (just show it with the browser)\n\nshow::\n    Show previously aggregated items. A typical use would be to \n    periodically run in the background a command like\n        websitary -eaggregate newsfeeds\n    and then\n        websitary -eshow newsfeeds\n    to review the changes.\n\nunroll::\n    Undo the latest fetch.\n\n\n\n== TIPS:\n=== Ruby\nThe profiles are regular ruby sources that are evaluated in the context \nof the configuration object (Websitary::Configuration). Find out more \nabout ruby at:\n* http://www.ruby-lang.org/en/documentation/\n* http://www.ruby-doc.org/docs/ProgrammingRuby/ (especially \n  the \n  language[http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html] \n  chapter)\n\n\n=== Cygwin\nMixing native Windows apps and cygwin apps can cause problems. The \nfollowing settings (e.g. 
in ~/.websitary/config.rb) can be used to configure \na native Windows editor and browser:\n\n  # Use the default Windows programs (as if double-clicked)\n  view '/usr/bin/cygstart \"%s\"'\n  \n  # Translate the profile filename and edit it with a native Windows editor\n  edit 'notepad.exe $(cygpath -w -- \"%s\")'\n  \n  # Rewrite cygwin filenames for use with a native Windows browser\n  option :global, :file_url =\u003e lambda {|f| f.sub(/\\/cygdrive\\/.+?\\/.websitary\\//, '')}\n\n\n=== Windows\nBackslashes usually have to be escaped by backslashes -- or use slashes. \nI.e. instead of 'c:\\foo\\bar' write either 'c:\\\\foo\\\\bar' or \n'c:/foo/bar'.\n\n\n== REQUIREMENTS:\nwebsitary is a ruby-based application. You thus need a ruby \ninterpreter.\n\nWhether you actually need the following libraries and applications \ndepends on how you use websitary.\n\nBy default this script expects the following applications to be \npresent:\n\n* diff\n* vi (or some other editor)\n\nand one of:\n\n* w3m[http://w3m.sourceforge.net/] (default)\n* lynx[http://lynx.isc.org/]\n* links[http://links.twibright.com/]\n\nThe use of :websec_webdiff as the :diff application requires \nwebsec[http://baruch.ev-en.org/proj/websec/] (or at \nSavannah[http://savannah.nongnu.org/projects/websec/]) to be installed. \nBy default, websitary uses its own htmldiff class/script, which is less \nwell tested and may return inferior results in comparison with websec's \nwebdiff. In conjunction with :body_html, :openuri, or :curl, this will \ngive you colored HTML diffs.\n\nFor downloading HTML, you need one of these:\n\n* open-uri (should be part of ruby)\n* hpricot[http://code.whytheluckystiff.net/hpricot] (used e.g. 
by \n  :body_html, :website, and :website_below)\n* curl[http://curl.haxx.se/]\n* wget[http://www.gnu.org/software/wget/]\n\nThe following ruby libraries are needed in conjunction with :body_html- \nand :website-related shortcuts:\n\n* hpricot[http://code.whytheluckystiff.net/hpricot] (parse HTML, use \n  only the body etc.)\n* robot_rules.rb[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/177589] \n  for parsing robots.txt\n\nI personally would suggest choosing the following setup:\n\n* w3m[http://w3m.sourceforge.net/]\n* hpricot[http://code.whytheluckystiff.net/hpricot]\n* robot_rules.rb[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/177589]\n\n\n== INSTALL:\n=== Use rubygems\nRun\n\n    gem install websitary\n\nThis will download the package and install it.\n\n\n=== Use the zip\nThe zip[http://rubyforge.org/frs/?group_id=4030] contains a file \nsetup.rb that does the work. Run\n\n    ruby setup.rb\n\n\n=== Initial Configuration\nPlease check the requirements section above and get the extra libraries \nneeded:\n* hpricot\n* robot_rules.rb\n\nThese can be installed as follows:\n\n  # Install hpricot\n  gem install hpricot\n  \n  # Install robot_rules.rb\n  wget http://www.rubyquiz.com/quiz64_sols.zip\n  # Check the correct path to site_ruby first!\n  unzip -p quiz64_sols.zip \"solutions/James Edward Gray II/robot_rules.rb\" \u003e /lib/ruby/site_ruby/1.8/robot_rules.rb\n  rm quiz64_sols.zip\n\nYou might then want to create a profile ~/.websitary/config.rb that is \nloaded on every run. 
In this profile you could set the default output \nviewer and profile editor, as well as a default profile.\n\nExample:\n\n  # Load standard.rb if no profile is given on the command line.\n  default 'standard'\n  \n  # Use cygwin's cygstart to view the output with the default HTML \n  # viewer\n  view '/usr/bin/cygstart \"%s\"'\n  \n  # Use Windows gvim from cygwin ruby, which is why we convert the path \n  # first\n  edit 'gvim $(cygpath -w -- \"%s\")'\n\nWhere these configuration files reside may differ. If the environment \nvariable $HOME is defined, the default is $HOME/.websitary/ unless one \nof the following directories exists, which will then be used instead:\n\n* $USERPROFILE/websitary (on Windows)\n* SYSCONFDIR/websitary (where SYSCONFDIR usually is /etc, but you can \n  run ruby to find out more:\n  \u003ctt\u003eruby -e \"p Config::CONFIG['sysconfdir']\"\u003c/tt\u003e)\n\nIf neither directory exists and no $HOME variable is defined, the \ncurrent directory will be used.\n\nNow check out the configuration commands in the Synopsis section.\n\n\n== LICENSE:\nwebsitary Webpage Monitor\nCopyright (C) 2007-2008 Thomas Link\n\nThis program is free software; you can redistribute it and/or modify\nit under the terms of the GNU General Public License as published by\nthe Free Software Foundation; either version 2 of the License, or\n(at your option) any later version.\n\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
See the\nGNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License\nalong with this program; if not, write to the Free Software\nFoundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  \nUSA\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomtom%2Fwebsitary","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomtom%2Fwebsitary","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomtom%2Fwebsitary/lists"}