{"id":28511174,"url":"https://github.com/zone-eu/webalizer","last_synced_at":"2025-10-24T04:43:57.189Z","repository":{"id":73686385,"uuid":"412187092","full_name":"zone-eu/webalizer","owner":"zone-eu","description":null,"archived":false,"fork":false,"pushed_at":"2021-10-04T12:55:24.000Z","size":574,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-20T02:41:35.776Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zone-eu.png","metadata":{"files":{"readme":"README","changelog":"CHANGES","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-09-30T18:52:02.000Z","updated_at":"2021-10-04T13:04:13.000Z","dependencies_parsed_at":"2023-02-25T10:46:20.643Z","dependency_job_id":null,"html_url":"https://github.com/zone-eu/webalizer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zone-eu/webalizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zone-eu%2Fwebalizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zone-eu%2Fwebalizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zone-eu%2Fwebalizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zone-eu%2Fwebalizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zone-eu","download_url":"https://codeload.github.com/zone-eu/webalizer/tar.gz/refs/heads/master","s
bom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zone-eu%2Fwebalizer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267058433,"owners_count":24029021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-08T23:37:42.160Z","updated_at":"2025-10-24T04:43:56.445Z","avatar_url":"https://github.com/zone-eu.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"The Webalizer - A web server log file analysis tool\nCopyright 1997-2013 by Bradford L. Barrett\n\nDistributed under the GNU GPL.  See the files \"COPYING\" and\n\"Copyright\" supplied with the distribution for additional info.\n\n\nWhat is The Webalizer?\n----------------------\n\nThe Webalizer is a web server log file analysis program which produces\nusage statistics in HTML format for viewing with a browser.  The results\nare presented in both columnar and graphical format, which facilitates\ninterpretation.  Yearly, monthly, daily and hourly usage statistics are\npresented, along with the ability to display usage by site, URL, referrer,\nuser agent (browser), search string, entry/exit page, username and country\n(some information is only available if supported and present in the log\nfiles being processed).  
Processed data may also be exported into most\ndatabase and spreadsheet programs that support tab-delimited data formats.\n\nThe Webalizer supports CLF (common log format) log files, as well as\nCombined log formats as defined by NCSA and others, and variations\nof these, which it attempts to handle intelligently.  In addition, The\nWebalizer supports wu-ftpd xferlog (FTP) formatted logs, squid proxy logs\nand W3C extended format logs.\n\nGzip-compressed logs may be used as input directly.  Any log filename\nthat ends with a '.gz' extension will be assumed to be in gzip format and\nuncompressed on the fly as it is being read.  The Webalizer now also has\nthe ability to handle BZip2-compressed logs, if enabled at compile time.\nSimilar to gzipped logs, any log filename that ends with a '.bz2' will be\nassumed to be in bzip2 format and uncompressed on the fly as it is being\nread.\n\nFor sites that do not enable hostname lookups (DNS resolution) on their\nweb servers (and have only IP addresses in their logs), The Webalizer\nprovides its own internal DNS lookup capability as well as geolocation\nservices (GeoDB).  The optional GeoIP library from MaxMind Inc. is also\nsupported and may be used instead of the native GeoDB database.\n\nA utility program, \"The Webalizer (DNS) Cache file Manager\", or 'wcmgr',\nis also provided, which allows the creation and manipulation of the DNS\ncache files used and produced by The Webalizer.  See the file DNS.README\nfor additional information regarding DNS support.\n\nThis documentation applies to The Webalizer Version 2.23.\n\nRunning the Webalizer\n---------------------\n\nThe Webalizer was designed to be run from a Unix command line prompt or\nas a cron job.  There are several command line options which will modify\nthe results it produces, and configuration files can be used as well.\nThe format of the command line is:\n\nwebalizer [options ...] 
[log-file]\n\nWhere 'options' can be one or more of the supported command line\nswitches described below.  'log-file' is the name of the log file\nto process (see below for more detailed information).  If a dash\n(\"-\") is specified for the log-file name, STDIN will be used.\n\n\nOnce executed, the general flow of the program is as follows:\n\no A default configuration file is scanned for.  A file named\n  'webalizer.conf' is searched for in the current directory, and if\n  found, its configuration data is parsed.  If the file is not\n  present in the current directory, the file '/etc/webalizer.conf'\n  is searched for and, if found, is used instead.\n\no Any command line arguments given to the program are parsed.  This\n  may include the specification of a configuration file, which is\n  processed at the time it is encountered.\n\no If a log file was specified, it is opened and made ready for\n  processing.  If no log file was given, or the filename '-' is\n  specified on the command line, STDIN is used for input.\n\no If an output directory was specified, the program does a 'chdir' to\n  that directory in preparation for generating output.  If no output\n  directory was given, the current directory is used.\n\no If a non-zero number of DNS child processes was specified, they\n  will be started, and the specified log file will be processed,\n  either creating or updating the specified DNS cache file.\n\no If no hostname was given, the program attempts to get the hostname\n  using a uname system call.  If that fails, 'localhost' is used.\n\no A history file is searched for.  This file keeps previous month\n  totals used on the main index.html page.  
The default file is\n  named 'webalizer.hist', kept in the specified output directory,\n  but may be changed using the \"HistoryName\" configuration file\n  keyword.\n\no If incremental processing was specified, a data file is searched for\n  and loaded if found, containing the 'internal state' data of the\n  program at the end of a previous run.  The default file is named\n  'webalizer.current', kept in the specified output directory, but\n  may be changed using the \"IncrementalName\" configuration file keyword.\n\no Main processing begins on the log file.  If the log spans multiple\n  months, a separate HTML document is created for each month.\n\no After main processing, the main 'index.html' page is created, which\n  has totals by month and links to each month's HTML document.\n\no A new history file is saved to disk, which includes totals generated\n  by The Webalizer during the current run.\n\no If incremental processing was specified, a data file is written that\n  contains the 'internal state' data at the end of this run.\n\n\nIncremental Processing\n----------------------\n\nVersion 1.2x of The Webalizer adds incremental run capability.  Simply\nput, this allows processing large log files by breaking them up into\nsmaller pieces, and processing these pieces instead.  What this means\nin real terms is that you can now rotate your log files as often as you\nwant, and still be able to produce monthly usage statistics without the\nloss of any detail.  This is accomplished by saving and restoring all\nrelevant internal data to a disk file between runs.  Doing so allows the\nprogram to 'start where it left off', so to speak, and allows the\npreservation of detail from one run to the next.\n\nSome special precautions need to be taken when using the incremental\nrun capability of The Webalizer.  Configuration options should not be\nchanged between runs, as that could cause corruption of the internal\nstored data.  
For example, changing the MangleAgents level will cause\ndifferent representations of user agents to be stored, producing invalid\nresults in the user agents section of the report.  If you need to change\nconfiguration options, do it at the end of the month after normal\nprocessing of the previous month and before processing the current month.\nYou may also want to delete the 'webalizer.current' file (or\nwhatever name was specified using the \"IncrementalName\" configuration\noption).\n\nThe Webalizer also attempts to prevent data duplication by keeping\ntrack of the timestamp of the last record processed.  This timestamp\nis then compared to current records being processed, and any records\nthat were logged prior to that timestamp are ignored.  This, in\ntheory, should allow you to re-process logs that have already been\nprocessed, or process logs that contain a mix of processed/not yet\nprocessed records, and not produce duplication of statistics.  The\nonly time this may break is if you have duplicate timestamps in two\nseparate log files... any records in the second log file that do have\nthe same timestamp as the last record in the previous log file processed\nwill be discarded as if they had already been processed.  There are\nlots of ways to prevent this, however; for example, stopping the web\nserver before rotating logs will prevent this situation.  This setup\nalso necessitates that you always process logs in chronological order;\notherwise, data loss will occur as a result of the timestamp compare.\n\n\nOutput Produced\n---------------\n\nThe Webalizer produces several reports (html) and graphics for each\nmonth processed.  In addition, a summary page is generated for the\ncurrent and previous months (up to 12), a history file is created\nand, if incremental mode is used, the current month's processed data\nis saved.  The exact locations and names of these files can be changed\nusing configuration files and command line options.  
The files produced\n(default names) are:\n\nindex.html              - Main summary page (extension may be changed)\nusage.png               - Yearly graph displayed on the main index page\nusage_YYYYMM.html       - Monthly summary page (extension may be changed)\nusage_YYYYMM.png        - Monthly usage graph for specified month/year\ndaily_usage_YYYYMM.png  - Daily usage graph for specified month/year\nhourly_usage_YYYYMM.png - Hourly usage graph for specified month/year\nsite_YYYYMM.html        - All sites listing (if enabled)\nurl_YYYYMM.html         - All urls listing (if enabled)\nref_YYYYMM.html         - All referrers listing (if enabled)\nagent_YYYYMM.html       - All user agents listing (if enabled)\nsearch_YYYYMM.html      - All search strings listing (if enabled)\nwebalizer.hist          - Previous month history (may be changed)\nwebalizer.current       - Incremental Data (may be changed)\nsite_YYYYMM.tab         - tab delimited sites file\nurl_YYYYMM.tab          - tab delimited urls file\nref_YYYYMM.tab          - tab delimited referrers file\nagent_YYYYMM.tab        - tab delimited user agents file\nuser_YYYYMM.tab         - tab delimited usernames file\nsearch_YYYYMM.tab       - tab delimited search string file\n\nThe yearly (index) report shows statistics for a 12-month period, and\nlinks to each month.  The monthly report has detailed statistics for\nthat month with additional links to any URLs and referrers found.\nThe various totals shown are explained below.\n\nHits\n\n  Any request made to the server that is logged is considered a 'hit'.\nThe requests can be for anything... html pages, graphic images, audio\nfiles, CGI scripts, etc...  Each valid line in the server log is\ncounted as a hit.  
This number represents the total number of requests\nthat were made to the server during the specified report period.\n\nFiles\n\n  Some requests made to the server require that the server then send\nsomething back to the requesting client, such as an HTML page or graphic\nimage.  When this happens, it is considered a 'file' and the files\ntotal is incremented.  The relationship between 'hits' and 'files' can\nbe thought of as 'incoming requests' and 'outgoing responses'.\n\nPages\n\n  Pages are, well, pages!  Generally, any HTML document, or anything\nthat generates an HTML document, would be considered a page.  This\ndoes not include the other stuff that goes into a document, such as\ngraphic images, audio clips, etc...  This number represents the number\nof 'pages' requested only, and does not include the other 'stuff' that\nis in the page.  What actually constitutes a 'page' can vary from\nserver to server.  The default action is to treat anything with the\nextension '.htm', '.html' or '.cgi' as a page.  A lot of sites will\nprobably define other extensions, such as '.phtml', '.php3' and '.pl'\nas pages as well.  Some people consider this number as the number of\n'pure' hits... I'm not sure if I totally agree with that viewpoint.\nSome other programs (and people :) refer to this as 'Pageviews'.\n\nSites\n\n  Each request made to the server comes from a unique 'site', which can\nbe referenced by a name or, ultimately, an IP address.  The 'sites'\nnumber shows how many unique IP addresses made requests to the server\nduring the reporting time period.  This DOES NOT mean the number of\nunique individual users (real people) that visited, which is impossible\nto determine using just logs and the HTTP protocol (however, this\nnumber might be about as close as you will get).\n\nVisits\n\n  Whenever a request is made to the server from a given IP address\n(site), the amount of time since a previous request by the address\nis calculated (if any).  
If the time difference is greater than a\npre-configured 'visit timeout' value (or the address has never made a request before),\nit is considered a 'new visit', and this total is incremented (both\nfor the site, and the IP address).  The default timeout value is 30\nminutes (can be changed), so if a user visits your site at 1:00 in\nthe afternoon, and then returns at 3:00, two visits would be registered.\nNote: in the 'Top Sites' table, the visits total should be discounted\non 'Grouped' records, and thought of as the \"Minimum number of visits\"\nthat came from that grouping instead.  Note: Visits only occur on\nPageType requests, that is, for any request whose URL is one of the\n'page' types defined with the PageType and PagePrefix options, and not\nexcluded by the OmitPage option.  Due to the limitations of the HTTP\nprotocol, log rotations and other factors, this number should not be\ntaken as absolutely accurate; rather, it should be considered a pretty\nclose \"guess\".\n\nKBytes\n\n  The KBytes (kilobytes) value shows the amount of data, in KB, that\nwas sent out by the server during the specified reporting period.  This\nvalue is generated directly from the log file, so it is up to the\nweb server to produce accurate numbers in the logs (some web servers\ndo stupid things when it comes to reporting the number of bytes).  
In\ngeneral, this should be a fairly accurate representation of the amount\nof outgoing traffic the server had, regardless of the web server's\nreporting quirks.\n\nNote: A kilobyte is 1024 bytes, not 1000 :)\n\nTop Entry and Exit Pages\n\n  The Top Entry and Exit tables give a rough estimate of what URLs\nare used to enter your site, and what the last pages viewed are.\nBecause of limitations in the HTTP protocol, log rotations, etc...\nthese numbers should be considered a good \"rough guess\" of the actual\nfigures, but they will give a good indication of the overall trend in\nwhere users come into, and exit, your site.\n\n\nCommand Line Options\n--------------------\n\nThe Webalizer supports many different configuration options that will\nalter the way the program behaves and generates output.  Most of these\ncan be specified on the command line, while some can only be specified\nin a configuration file. The command line options are listed below,\nwith references to the corresponding configuration file keywords.\n\n--------------------------------------------------------------------------\n\nGeneral Options\n---------------\n\n-h        Display all available command line options and exit.\n\n-v        Be Verbose.  This will cause the program to print additional\n          information at run time.  It is the same as specifying\n          \"Quiet no\", \"ReallyQuiet no\" and \"Debug yes\" config options.\n\n-V        Display the program version and exit.  Additional program\n          specific information will be displayed if 'verbose' mode is\n          also used (e.g. '-vV'), which can be useful when submitting\n          bug reports.\n\n-d        Display additional 'debugging' information for errors and\n          warnings produced during processing.  This normally would\n          not be used except to determine why you are getting all those\n          errors and want to see the actual data.  
Normally The\n          Webalizer will just tell you it found an error, not the\n          actual data.  This option will display the data as well.\n          Config file keyword: Debug\n\n-F        Specify the log file type to process.  Normally, The\n          Webalizer expects to find a valid CLF or Combined format\n          web server log file.  This option allows you to process\n          wu-ftpd xferlogs, squid and W3C formatted web logs as well.\n          Values can be 'clf', 'ftp', 'squid' or 'w3c', with\n          'clf' being the default.  Only the first character needs\n          to be specified (eg: -Fs will process a squid log).\n          Config file keyword: LogType\n\n-f        Fold out-of-sequence log records back into analysis by\n          treating them as if they were the same date/time as the\n          last good record.  Normally, out-of-sequence log records\n          are ignored.  If you run Apache, don't worry about this.\n          Config file keyword: FoldSeqErr\n\n-i        Ignore history file.  USE WITH CAUTION.  This causes The\n          Webalizer to ignore any existing history file produced from\n          previous runs and generate its output from scratch.  The\n          effect will be as if The Webalizer is being run for the\n          first time: any previous statistics on the main index.html\n          (yearly) web page will be lost (although the HTML documents,\n          if any, will not be deleted).\n          Config file keyword: IgnoreHist\n\n-b        Ignore incremental data file.  USE WITH CAUTION.  This causes\n          The Webalizer to ignore any existing incremental (state) data\n          file produced by previous runs.  By ignoring the incremental\n          data file, all previous processing for the current month will\n          be lost, and those logs must be re-processed.\n          Config file keyword: IgnoreState\n\n-p        Preserve state (incremental processing).  
This allows the\n          processing of partial logs in increments.  At the end of\n          the program, all relevant internal data is saved, so that\n          it may be restored the next time the program is run.  This\n          allows sites that must rotate their logs more than once a\n          month to still be able to use The Webalizer, and not worry\n          about having to gather and feed an entire month's logs to\n          the program at the end of the month.  See the section on\n          \"Incremental Processing\" above for additional information.\n          The default is to not perform incremental processing.  Use\n          this command line option to enable the feature.\n          Config file keyword: Incremental\n\n-q        Quiet mode.  Normally, The Webalizer will produce various\n          messages while it runs, letting you know what it's doing.\n          This option will suppress those messages.  It should be\n          noted that this WILL NOT suppress errors and warnings, which\n          are output to STDERR.\n          Config file keyword: Quiet\n\n-Q        ReallyQuiet mode.  This allows suppression of _all_ messages\n          generated by The Webalizer, including warnings and errors.\n          Useful when The Webalizer is run as a cron job.\n          Config file keyword: ReallyQuiet\n\n-T        Display timing information.  The Webalizer keeps track of the\n          time it begins and ends processing, and normally displays the\n          total processing time at the end of each run.  If quiet mode\n          (-q or 'Quiet yes' in configuration file) is specified, this\n          information is not displayed.  This option forces the display\n          of timing totals if quiet mode has been specified; otherwise,\n          it is redundant and will have no effect.\n          Config file keyword: TimeMe\n\n-c file   This option specifies a configuration file to use.  
Configuration\n          files allow greater control over how The Webalizer behaves, and\n          there are several ways to use them.  As of version 0.98, The\n          Webalizer searches for a default configuration file in the\n          current directory named \"webalizer.conf\", and if not found,\n          will search in the /etc/ directory for a file of the same name.\n          In addition, you may specify a configuration file to use with\n          this command line option.\n\n-n name   This option specifies the hostname for the reports generated.\n          The hostname is used in the title of all reports, and is also\n          prepended to URLs in the reports.  This allows The Webalizer\n          to be run on log files for 'virtual' web servers or web servers\n          that are different from the machine the reports are located on,\n          and still allows clicking on the URLs to go to the proper\n          location.  If a hostname is not specified, either on the\n          command line or in a configuration file, The Webalizer attempts\n          to determine the hostname using a 'uname' system call.  If this\n          fails, \"localhost\" will be used as the hostname.\n          Config file keyword: HostName\n\n-o dir    This option specifies the output directory for the reports.\n          If not specified here or in a configuration file, the current\n          default directory will be used for output.\n          Config file keyword: OutputDir\n\n-x name   This option allows the generated pages to have an extension\n          other than '.html', which is the default.  Do not include the\n          leading period ('.') when you specify the extension.\n          Config file keyword: HTMLExtension\n\n-P name   Specify the file extensions for 'pages'.  
Pages (sometimes\n          called 'PageViews') are normally html documents and CGI\n          scripts that display the whole page, not just parts of it.\n          Some systems will need to define a few more, such as 'phtml',\n          'php3' or 'pl' in order to have them counted as well.  The\n          default is 'htm*' and 'cgi' for web logs and 'txt' for ftp.\n          Config file keyword: PageType\n\n-O name   Specify URLs which are not counted as 'pages'.  Requests\n          matching one of these URLs will not be counted as a page, even\n          if they have an extension matching one of the PageTypes defined\n          above or have no extension at all.\n          Config file keyword: OmitPage\n\n-t name   This option specifies the title string for all reports.  This\n          string is used, in conjunction with the hostname (if not blank),\n          to produce the actual title.  If not specified, the default of\n          \"Usage Statistics for\" will be used.\n          Config file keyword: ReportTitle\n\n-Y        Suppress Country graph.  Normally, The Webalizer produces\n          country statistics in both Graph and Columnar forms.  This\n          option will prevent the Country Graph from being generated.\n          Config file keyword: CountryGraph\n\n-G        Suppress hourly graph.  Normally, The Webalizer produces\n          hourly statistics in both Graph and Columnar forms.  This\n          option will prevent only the Hourly Graph from being generated.\n          Config file keyword: HourlyGraph\n\n-H        Suppress Hourly statistics.  Normally, The Webalizer produces\n          hourly statistics in both Graph and Columnar forms.  This\n          option will prevent only the Hourly Statistics table from\n          being generated.\n          Config file keyword: HourlyStats\n\n-K num    Specify how many months should be displayed in the main index\n          (yearly summary) table.  Default is 12 months.  
Can be set to\n          anything between 12 and 120 months (1 to 10 years).\n          Config file keyword: IndexMonths\n\n-k num    Specify how many months should be displayed in the main index\n          (yearly summary) graph.  Default is 12 months.  Can be set to\n          anything between 12 and 72 months (1 to 6 years).\n          Config file keyword: GraphMonths\n\n-L        Disable Graph Legends.  The color-coded legends displayed on\n          the in-line graphs can be disabled with this option.  The\n          default is to display the legends.\n          Config file keyword: GraphLegend\n\n-l num    Graph Lines.  Specify the number of background reference\n          lines displayed on the in-line graphics produced.  The default\n          is 2 lines, but this can range anywhere from zero ('0') for\n          no lines, up to 20 lines (looks funny!).\n          Config file keyword: GraphLines\n\n-P name   Page type.  This is the extension of files you consider to\n          be pages for Pages calculations (sometimes called 'pageviews').\n          The default is 'htm*' and 'cgi' (plus whatever HTMLExtension\n          you specified if it is different).  Don't use a period!\n\n-m num    Specify a 'visit timeout'.  Visits are calculated by looking at\n          the time difference between the current and last request made\n          by a specific host.  If the difference is greater than the\n          visit timeout value, the request is considered a new visit.\n          This value is specified in seconds.  The default\n          is 30 minutes (1800).\n          Config file keyword: VisitTimeout\n\n-M num    Mangle user agent names.  Normally, The Webalizer will keep\n          track of the user agent field verbatim.  Unfortunately, there are\n          a ton of different names that user agents go by, and the field\n          also reports other items such as machine type and OS used. 
For\n          example, Netscape 4.03 running on Windows 95 will report a\n          different string than Netscape 4.03 running on Windows NT, so even\n          though they are the same browser type, they will be considered\n          two totally different browsers by The Webalizer.  For that\n          matter, Netscape 4.0 running on Windows NT will report different\n          names if one is run on an Alpha and the other on an Intel\n          processor!  Internet Exploder is even worse, as it reports itself\n          as if it were Netscape, and you have to search the given string a\n          little deeper to discover that it is really MSIE!  In order to\n          consolidate generic browser types, this option will cause The\n          Webalizer to 'mangle' the user agent field.  There are 6 levels\n          that can be specified, each producing a different level of\n          detail.  Level 5\n          displays only the browser name (MSIE or Mozilla) and the major\n          version number.  Level 4 will also display the minor version\n          number (single decimal place).  Level 3 will display the minor\n          version number to two decimal places.  Level 2 will add any\n          sub-level designation (such as Mozilla/3.01Gold or MSIE 3.0b).\n          Level 1 will also attempt to add the system type.  The default\n          Level 0 will disable name mangling and leave the user agent\n          field unmodified, producing the greatest amount of detail.\n          Configuration file keyword: MangleAgents\n\n-g num    This option allows you to specify the level of domain name\n          grouping to be performed.  The numeric value represents the\n          level of grouping, and can be thought of as the 'number of\n          dots' to be displayed.  
The default value of 0 disables any\n          domain name grouping.\n          Configuration file keyword: GroupDomains\n\n-D name   This allows the specification of a DNS Cache file name.  This\n          filename MUST be specified if you have DNS lookups enabled\n          (using the -N command line switch or DNSChildren configuration\n          keyword).  The filename is relative to the default output\n          directory if an absolute path is not specified (ie: starts\n          with a leading '/').  This option is only available if DNS\n          support was enabled at compile time; otherwise, an 'Invalid\n          Keyword' error will be generated.  See the DNS.README file\n          for additional information regarding DNS lookups.\n          Configuration file keyword: DNSCache\n\n-N num    Number of DNS child processes to use for reverse DNS lookups.\n          If specified, a DNSCache name MUST be specified also.  If you\n          do not wish a DNS cache file to be generated, specify a value\n          of zero ('0') to disable it.  This does not prevent using an\n          existing cache file, only the generation of one at run time.\n          See the DNS.README file for additional information.\n          Configuration file keyword: DNSChildren\n\n-j        Enable native GeoDB geolocation services.\n          Configuration file keyword: GeoDB\n\n-J name   Specify an alternate GeoDB database filename to use.  This\n          shouldn't normally be needed.  
If used, the filename 'name'\n          is relative to the specified output directory unless an\n          absolute name is given (ie: starts with a leading '/').\n          Configuration file keyword: GeoIPDatabase\n\n-z name   Specify the location of the country flag graphics and enable\n          their display in the top country table.  The directory name\n          is relative to the output directory unless an absolute path\n          is specified (ie: starts with a leading '/').\n          Configuration file keyword: FlagDir\n\nHide Options\n------------\n\nThe following options take a string argument to use as a comparison\nfor matching.  Except for the IndexAlias option, the string argument\ncan be plain text, or plain text that either starts or ends with the\nwildcard character '*'.\n\nFor example:\n\nGiven the string \"yourmama/was/here\", the arguments \"was\", \"*here\" and\n\"your*\" will all produce a match.\n\n\n-a name   This option allows hiding of user agents (browsers) from the\n          \"Top User Agents\" table in the report.  This option really\n          isn't too useful, as there are a zillion different names that\n          current browsers go by, depending on where they were obtained;\n          however, you might have some particular user agents that hit\n          your site a lot that you would like to exclude from the list.\n          You must have a web server that includes user agents in its\n          log files for this option to be of any use.  In addition, it\n          is useless if you disable the user agent table in the\n          report (see the -A command line option or \"TopAgents\"\n          configuration file keyword). You can specify as many of these\n          as you want on the command line.  
The wildcard character '*'\n          can be used either in front of or at the end of the string.\n          (ie: Mozilla/4.0* would match anything that starts with the\n          string \"Mozilla/4.0\").\n          Config file keyword: HideAgent\n\n-r name   This option allows hiding of referrers from the \"Top Referrers\"\n          table in the report.  Referrers are URLs, either on your own\n          local site or a remote site, that referred the user to a URL\n          on your web server.  This option is normally used to hide\n          your own server from the table, as your own pages are usually\n          the top referrers to your own pages (well, you get the idea).\n          You must have a web server that includes referrer information\n          in the log files for this option to be of any use.  In addition,\n          it is useless if you disable the referrers table in the\n          report (see the -R command line option or \"TopReferrers\"\n          configuration file keyword).  You can specify as many of these\n          as you like on the command line.\n          Config file keyword: HideReferrer\n\n-s name   This option allows hiding of sites from the \"Top Sites\" table\n          in the report.  Normally, you will only want to hide your own\n          domain name from the report, as it usually is one of the top\n          sites to visit your web server.  This option is of no use if\n          you disable the top sites table in the report (see the -S\n          command line option or \"TopSites\" configuration file keyword).\n          Config file keyword: HideSite\n\n-X        This causes all individual sites to be hidden, so that only\n          grouped sites are displayed in the report.\n          Config file keyword: HideAllSites\n\n-u name   This option allows hiding of URLs from the \"Top URLs\" table\n          in the report.  
Normally, this option is used to hide images,\n          audio files and other objects your web server dishes out that\n          would otherwise clutter up the table.  This option is of no\n          use if you disable the top URLs table in the report (see the\n          -U command line option or \"TopURLs\" configuration file keyword).\n          Config file keyword: HideURL\n\n-I name   This option allows you to specify additional index.html aliases.\n          The Webalizer usually strips the string 'index.*' from URLs\n          before processing (unless disabled using the 'DefaultIndex'\n          config option), which has the effect of turning a URL such\n          as /somedir/index.html into just /somedir/ which is really the\n          same URL and should be treated as such.  This option allows you\n          to specify _additional_ strings that are to be treated the same\n          way.  Use with care; improper use could cause unexpected results.\n          For example, if you specify the alias string of 'home', a URL\n          such as /somedir/homepages/brad/home.html would be converted\n          into just /somedir/ which probably isn't what was intended.\n          This option is useful if your web server uses a default index\n          page other than the standard 'index.html' or 'index.htm',\n          such as 'home.html' or 'homepage.html'.  The string specified\n          is searched for _anywhere_ in the URL, so \"home.htm\" would\n          turn both \"/somedir/home.htm\" and \"/somedir/home.html\" into\n          just \"/somedir/\".  Wildcards are _not_ allowed with this option.\n          Config file keyword: IndexAlias\n\nTable Size Options\n------------------\n\n-e num    This option specifies the number of entries to display in the\n          \"Top Entry Pages\" table.  
To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopEntry\n\n-E num    This option specifies the number of entries to display in the\n          \"Top Exit Pages\" table.  To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopExit\n\n-A num    This option specifies the number of entries to display in the\n          \"Top User Agents\" table.  To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopAgents\n\n-C num    This option specifies the number of entries to display in the\n          \"Top Countries\" table.  To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopCountries\n\n-R num    This option specifies the number of entries to display in the\n          \"Top Referrers\" table.  To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopReferrers\n\n-S num    This option specifies the number of entries to display in the\n          \"Top Sites\" table.  To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopSites\n\n-U num    This option specifies the number of entries to display in the\n          \"Top URLs\" table.  To disable the table, use a value of\n          zero (0).\n          Config file keyword: TopURLs\n\n--------------------------------------------------------------------------\n\n\nCONFIGURATION FILES\n-------------------\n\nThe Webalizer allows configuration files to be used in order to simplify\nlife for all.  There are several ways that configuration files are accessed\nby the Webalizer.  When The Webalizer first executes, it looks for a\ndefault configuration file named \"webalizer.conf\" in the current directory,\nand if not found there, will look for \"/etc/webalizer.conf\".  In addition,\nconfiguration files may be specified on the command line with the '-c'\noption.  
There are lots of different ways you can combine the use of\nconfiguration files and command line options to produce various results.\nThe Webalizer always looks for and reads configuration options from a\ndefault configuration file before doing anything else.  Because of this,\nyou can override options found in the default file by use of additional\nconfiguration files specified on the command line or command line options\nthemselves.  If you specify a configuration file on the command line, you\ncan override options in it with additional command line options that follow.\nFor example, most users will want to create the default file\n/etc/webalizer.conf and place options in it to specify the hostname, log\nfile, table options, etc.  At the end of the month when a different log\nfile is to be used (the end of month log), you can run The Webalizer as\nusual, but put the different filename on the end of the command line, which\nwill override the log file specified in the configuration file.  It should\nbe noted that you cannot override some configuration file options by the\nuse of command line arguments.  For example, if you specify \"Quiet yes\" in\na configuration file, you cannot override this with a command line argument,\nas the command line option only _enables_ the feature (-q option).\n\nThe configuration files are standard ASCII text files that may be created\nor edited using any standard editor.  Blank lines and lines that begin\nwith a pound sign ('#') are ignored.  Any other lines are considered to\nbe configuration lines, and have the form \"Keyword Value\", where the\n'Keyword' is one of the currently available configuration keywords defined\nbelow, and 'Value' is the value to assign to that particular option.  Any\ntext found after the keyword up to the end of the line is considered the\nkeyword's value, so you should not place anything else on the line after\nthe value itself (a trailing comment, for instance, would become part of\nthe value).  
The\nfile \"sample.conf\" provided with the distribution contains lots of useful\ndocumentation and examples as well.  It should be noted that you do not\nhave to use any configuration files at all, in which case default values\nwill be used (which should be sufficient for most sites).\n\n--------------------------------------------------------------------------\n\nGeneral Configuration Keywords\n------------------------------\n\nLogFile       This defines the log file to use.  It should be a fully qualified\n              name (ie: contain the path), but relative names will work as\n              well.  If not specified, the log file defaults to STDIN.\n\nLogType       This specifies the log file type being used.  Normally, The\n              Webalizer processes web logs in either CLF or Combined format.\n              You may also process wu-ftpd xferlog formatted logs, squid\n              proxy logs or W3C formatted web logs by setting the appropriate\n              type using this keyword.  Values may be 'clf', 'ftp',\n              'squid' or 'w3c'.  Ensure that you specify the proper file type;\n              otherwise you will be presented with a long stream of 'invalid\n              record' messages when The Webalizer is run ;)\n              Command line argument: -F\n\nOutputDir     This defines the output directory to use for the reports.  If\n              it is not specified, the current directory is used.\n              Command line argument: -o\n\nHistoryName   Allows specification of a history path/filename if desired.\n              The default is to use the file named 'webalizer.hist', kept\n              in the normal output directory (OutputDir above).  
Any name\n              specified is relative to the normal output directory unless\n              an absolute path name is given (ie: starts with a '/').\n\nReportTitle   This specifies the title to use for the generated reports.\n              It is used in conjunction with the hostname (unless blank)\n              to produce the final report titles.  If not defined, the\n              default of \"Usage Statistics for\" is used.\n              Command line argument: -t\n\nHostName      This defines the hostname.  The hostname is used in the\n              report title as well as being prepended to URLs in the\n              \"Top URLs\" table.  This allows The Webalizer to be run\n              on \"virtual\" web servers, or servers that do not reside\n              on the local machine, and allows clicking on the URL to\n              go to the right place.  If not specified, The Webalizer\n              attempts to get the hostname via a 'uname' system call,\n              and if that fails, will default to \"localhost\".\n              Command line argument: -n\n\nUseHTTPS      Causes the links in the 'Top URLs' table to use 'https://'\n              instead of the default 'http://' prefix.  Not much use if\n              you run a mix of secure/insecure servers on your machine.\n              Only useful if you run the analysis on a secure server's\n              logs and want the links in the table to work properly.\n\nHTAccess      Enables the creation of a default .htaccess file in the\n              output directory.  If enabled, the file will be created\n              (with a single \"DirectoryIndex\" directive), unless one\n              already exists.  The default is 'no', which disables the\n              creation of any .htaccess files.\n\nQuiet         This allows you to enable or disable informational messages\n              while The Webalizer is running.  The values for this keyword\n              can be either 'yes' or 'no'.  
Using \"Quiet yes\" will suppress these\n              messages, while \"Quiet no\" will enable them.  The default\n              is 'no' if not specified, which will allow The Webalizer\n              to display informational messages.  It should be noted that\n              this option has no effect on Warning or Error messages that\n              may be generated, as they go to STDERR.\n              Command line argument: -q\n\nReallyQuiet   This allows all generated output to be suppressed, including\n              warning and error messages.  The values for this keyword\n              can be either 'yes' or 'no', with 'no' being the default.\n              Command line argument: -Q\n\nTimeMe        This allows you to display timing information regardless of\n              any \"quiet mode\" specified.  Useful only if you did in fact\n              tell The Webalizer to be quiet either by using the -q command\n              line option or the \"Quiet\" keyword; otherwise, timing stats\n              are normally displayed anyway.  Values may be either 'yes'\n              or 'no', with the default being 'no'.\n              Command line argument: -T\n\nGMTTime       This keyword allows timestamps to be displayed in GMT (UTC)\n              time instead of local time.  Normally The Webalizer will\n              display timestamps in the time zone of the local machine\n              (ie: PST or EDT).  Values may be either 'yes' or 'no'.\n              Default is 'no'.\n\nDebug         This tells The Webalizer to display additional information\n              when it encounters Warnings or Errors.  Normally, The\n              Webalizer will just tell you it found a bad record or\n              field.  This option will enable the display of the actual\n              data that produced the Warning or Error as well.  
Useful\n              only if you start getting lots of Warnings or Errors and\n              want to determine the cause.  Values may be either 'yes'\n              or 'no', with the default being 'no'.\n              Command line argument: -d\n\nIgnoreHist    This suppresses the reading of a history file.  USE WITH\n              EXTREME CAUTION, as the history file is how The Webalizer\n              keeps track of previous months.  The effect of this option\n              is as if The Webalizer was being run for the very first\n              time, and any previous data is discarded.  Values may be\n              either 'yes' or 'no', with the default being 'no'.\n              Command line argument: -i\n\nIgnoreState   This suppresses the reading of an existing incremental\n              data file.  USE WITH EXTREME CAUTION!  By ignoring an\n              existing incremental data file, all previous processing\n              for the current month will be lost, and those logs must\n              be re-processed.  Values may be 'yes' or 'no', with the\n              default being 'no'.\n              Command line argument: -b\n\nFoldSeqErr    Allows log records that are out of sequence to be folded\n              back into the analysis by treating them as if they had\n              the same date/time as the last good record.  Normally,\n              out-of-sequence log records are simply ignored.  If you\n              run Apache, don't worry about this.\n\nVisitTimeout  Sets the 'visit timeout' value.  Visits are determined by\n              looking at the time difference between the current and last\n              request made by a specific site.  If the difference in time\n              is greater than the visit timeout value, the request is\n              considered a new visit.  
The value is in seconds,\n              and defaults to 30 minutes (1800).\n              Command line argument: -m\n\nPageType      Allows you to define the 'page' type extension.  Normally,\n              people consider HTML and CGI scripts as 'pages'.  This\n              option allows you to specify which extensions you consider\n              pages.  Default is 'htm*' and 'cgi' for web logs, and\n              'txt' for ftp logs.\n              Command line argument: -P\n\nPagePrefix    Allows all requests with a specified prefix to be considered\n              as 'pages'.  This is useful if you want everything under\n              /documents to be treated as pages no matter what their\n              extension is.  It is also useful if you have CGI scripts\n              with PATH_INFO.\n\nOmitPage      Prevents specified URLs from being counted as pages under\n              any circumstance, even if they match a PageType or\n              PagePrefix as defined above.\n\nGraphLegend   Enable/disable the display of color-coded legends on the\n              produced graphs.  Default is 'yes', to display them.\n              Command line argument: -L\n\nGraphLines    Specify the number of background reference lines to display\n              on produced graphs.  The default is 2.  To disable the use\n              of background lines, use zero ('0').\n              Command line argument: -l\n\nIndexMonths   Specify the number of months to display in the main index\n              (yearly summary) table.  Default is 12 months.  Can be set\n              to anything between 12 and 120 months (1 to 10 years).\n              Command line argument: -K\n\nYearHeaders   Enable/disable the display of year headers in the main index\n              (yearly summary) table.  If enabled, year headers will be\n              shown when the table is displaying more than 16 months' worth\n              of data.  Values can be 'yes' or 'no'.  
Default is 'yes'.\n\nGraphMonths   Specify the number of months to display in the main index\n              (yearly summary) graph.  Default is 12 months.  Can be set\n              to anything between 12 and 72 months (1 to 6 years).\n              Command line argument: -k\n\nCountryGraph  This keyword is used to either enable or disable the creation\n              and display of the Country Usage graph.  Values may be either\n              'yes' or 'no', with the default being 'yes'.\n              Command line argument: -Y\n\nCountryFlags  Enables or disables the display of flags in the top country\n              table.  If enabled, the default directory 'flags' directly\n              under the output directory will be used unless a different\n              path is specified with the 'FlagDir' option below.\n              Command line argument: -zflags\n\nFlagDir       Specifies the location of flag graphics.  If not specified,\n              the default is in the 'flags' directory directly under the\n              output directory being used for the reports.  If specified,\n              the display of flags will be enabled by default.\n              Command line argument: -z\n\nDailyGraph    This keyword is used to either enable or disable the creation\n              and display of the Daily Usage graph.  Values may be either\n              'yes' or 'no', with the default being 'yes'.\n\nDailyStats    This keyword is used to either enable or disable the creation\n              and display of the Daily Usage statistics table.  Values may\n              be either 'yes' or 'no', with the default being 'yes'.\n\nHourlyGraph   This keyword is used to either enable or disable the creation\n              and display of the Hourly Usage graph.  
Values may be either\n              'yes' or 'no', with the default being 'yes'.\n              Command line argument: -G\n\nHourlyStats   This keyword is used to either enable or disable the creation\n              and display of the Hourly Usage statistics table.  Values may\n              be either 'yes' or 'no', with the default being 'yes'.\n              Command line argument: -H\n\nIndexAlias    This allows additional 'index.html' aliases to be defined.\n              Normally, The Webalizer scans for and strips the string\n              \"index.\" from URLs before processing them (unless disabled\n              using the DefaultIndex config option below).  This turns a\n              URL such as /somedir/index.html into just /somedir/ which\n              is really the same URL.  This keyword allows _additional_\n              names to be treated in the same fashion for sites that use\n              different default names, such as \"home.html\".  The string\n              is scanned for anywhere in the URL, so care should be used\n              if and when you define additional aliases.  For example,\n              if you were to use an alias such as 'home', the URL\n              /somedir/homepages/brad/home.html would be turned into just\n              /somedir/ which probably isn't the intended result.  Instead,\n              you should specify 'home.htm', which would correctly\n              turn the URL into /somedir/homepages/brad/ as intended.\n              It should also be noted that specified aliases are scanned\n              for in EVERY log record, so a large number of aliases will\n              noticeably degrade performance, as each record has to be\n              scanned for every alias defined.  You don't have to\n              specify 'index.' 
as\n              it is always the default (unless disabled with the config\n              option \"DefaultIndex\" described below).\n              Command line argument: -I\n\nDefaultIndex  This option is used to enable/disable the use of \"index.\" as\n              a default index name to be stripped from the end of a URL.\n              Most sites should not need to use this option; however, some\n              may find it useful, particularly those whose default index\n              file name is something different, or those sites that use\n              'index.php' or similar URLs to generate dynamic content.\n              This option does not affect any of the names that may be\n              defined using the IndexAlias option; those names will\n              still function as described.  Values may be 'yes' or 'no',\n              with 'yes' being the default.\n\nMangleAgents  The MangleAgents keyword specifies the level of user agent\n              name mangling, if any.  There are 6 levels that may be specified,\n              each producing a different level of detail displayed.  Level 5\n              displays only the browser name (MSIE or Mozilla) and the major\n              version number.  Level 4 adds the minor version (single\n              decimal place).  Level 3 adds the minor version to two decimal\n              places.  Level 2 will also add any sub-level designation\n              (such as Mozilla/3.01Gold or MSIE 3.0b).  Level 1 will also\n              attempt to add the system type.  The default level 0 leaves\n              the user agent field unmodified and produces the greatest\n              amount of detail.\n              Command line argument: -M\n\nSearchEngine  This keyword allows specification of search engines and\n              their query strings.  
Search strings are obtained from\n              the referrer field in the record, and in order to work\n              properly, The Webalizer needs to know what query strings\n              different search engines use.  The SearchEngine keyword\n              allows you to specify the search engine and the query string\n              to parse the search string from.  The line is formatted\n              as \"SearchEngine engine-string query-string\", where\n              'engine-string' is a substring used to match the search\n              engine, such as \"yahoo.com\" or \"altavista\".  The\n              'query-string' is the unique query string that is added\n              to the URL for the search engine, such as \"search=\" or\n              \"MT=\", with the actual search strings appended to the\n              end.  There is no command line option for this keyword.\n\nSearchCaseI   The SearchCaseI option specifies whether search strings\n              should be lowercased (case insensitive) or not.  Since most\n              search engines use case insensitive searches (ie: a\n              search for \"Hello\" is the same as \"HELLO\" or \"hello\"),\n              converting to lowercase, which is the default, improves\n              keyword accuracy.  If desired, case sensitivity can\n              be forced with this option.  The value can be 'yes' or\n              'no', with 'yes' (case insensitive) being the default.\n\nIncremental   This allows incremental processing to be enabled or disabled.\n              Incremental processing allows processing partial logs without\n              the loss of detail data from previous runs in the same month.\n              This feature saves the 'internal state' of the program so that\n              it may be restored in following runs.  
See the section above\n              titled \"Incremental Processing\" for additional information.\n              The value may be 'yes' or 'no', with the default being 'no'.\n              Command line argument: -p\n\nIncrementalName\n              Allows specification of the incremental data filename if\n              desired.  Normally, the file named 'webalizer.current' is\n              used, kept in the standard output directory.  If specified,\n              filenames are relative to the standard output directory,\n              unless an absolute name is given (ie: starts with '/').\n\nStripCGI      Determines whether CGI variables should be stripped from the\n              end of URLs or not.  Normally, these variables are removed\n              from URLs to improve accuracy; however, some sites may wish\n              to preserve them (particularly highly dynamic sites).\n              Values may be either 'yes' or 'no', with 'yes' being the\n              default.\n\nTrimSquidURL  Allows squid log URLs to be reduced in granularity by\n              truncating them after a specified number of '/' path\n              separators after the http:// portion.  A value of 1 will\n              cause all URLs to be summarized by domain only.  The\n              default value is zero (0), which leaves URLs unmodified.\n\nDNSCache      Specifies the DNS cache filename.  This name is relative\n              to the default output directory unless an absolute name\n              is given (ie: starts with '/').  See the DNS.README file\n              for additional information.\n              Command line argument: -D\n\nDNSChildren   The number of DNS child processes to run in order to\n              create/update the DNS cache file.  If specified, the DNS\n              cache filename must also be specified (see above).  Use\n              a value of zero ('0') to disable.  
See the DNS.README\n              file for additional information.\n              Command line argument: -N\n\nCacheIPs      Specifies whether unresolved addresses should also be cached\n              in the DNS database.  If enabled, unresolved IP addresses\n              will be stored along with resolved addresses.  This may\n              be useful on sites that have lots of unresolved IPs\n              visiting, so that they are not looked up each time the\n              program is run.  Values may be 'yes' or 'no'.  Default is 'no'.\n\nCacheTTL      Specifies the Time To Live (TTL) value for cached DNS\n              entries in days.  Default value is 7 (1 week).  Can be\n              any value between 1 and 100.\n\nGeoDB         Controls the use of the native GeoDB geolocation services\n              provided by The Webalizer.  Values may be 'yes' or 'no',\n              with 'no' being the default.\n              Command line argument: -j\n\nGeoDBDatabase Specifies an alternate GeoDB database filename to use.\n              This is relative to the output directory being used unless\n              an absolute path is given (ie: starts with a '/').\n              Command line argument: -J\n\nGeoIP         Controls the use of GeoIP geolocation services.  If The\n              Webalizer was compiled with GeoIP support, it is used by\n              default.  Values may be 'yes' or 'no'.  Default is 'yes'.\n              Command line argument: -w\n\nGeoIPDatabase Specifies an alternate GeoIP database filename to use.\n              This name is relative to the default output directory\n              unless an absolute name is given (ie: starts with '/').\n              Command line argument: -W\n\n\nTop Table Keywords\n------------------\n\nTopAgents     This allows you to specify how many \"Top\" user agents are\n              displayed in the \"Top User Agents\" table.  The default\n              is 15.  
If you do not want to display user agent statistics,\n              specify a value of zero (0).  The display of user agents\n              will only work if your web server includes this information\n              in its log file (ie: a combined log format file).\n              Command line argument: -A\n\nAllAgents     Will cause a separate HTML page to be generated for all\n              normally visible User Agents.  A link will be added to\n              the bottom of the \"Top User Agents\" table if enabled.\n              Value can be either 'yes' or 'no', with 'no' being the\n              default.\n\nTopCountries  This allows you to specify how many \"Top\" countries are\n              displayed in the \"Top Countries\" table.  The default is\n              30.  If you want to disable the countries table, specify\n              a value of zero (0).\n              Command line argument: -C\n\nTopReferrers  This allows you to specify how many \"Top\" referrers are\n              displayed in the \"Top Referrers\" table.  The default is\n              30.  If you want to disable the referrers table, specify\n              a value of zero (0).  The display of referrer information\n              will only work if your web server includes this information\n              in its log file (ie: a combined log format file).\n              Command line argument: -R\n\nAllReferrers  Will cause a separate HTML page to be generated for all\n              normally visible Referrers.  A link will be added to the\n              \"Top Referrers\" table if enabled.  Value can be either\n              'yes' or 'no', with 'no' being the default.\n\nTopSites      This allows you to specify how many \"Top\" sites are\n              displayed in the \"Top Sites\" table.  
The default is 30.\n              If you want to disable the sites table, specify a value\n              of zero (0).\n              Command line argument: -S\n\nTopKSites     Identical to TopSites, except for the 'by KByte' table.\n              Default is 10.  No command line switch for this one.\n\nAllSites      Will cause a separate HTML page to be generated for all\n              normally visible Sites.  A link will be added to the\n              bottom of the \"Top Sites\" table if enabled.  Value can\n              be either 'yes' or 'no', with 'no' being the default.\n\nTopURLs       This allows you to specify how many \"Top\" URLs are\n              displayed in the \"Top URLs\" table.  The default is 30.\n              If you want to disable the URLs table, specify a value\n              of zero (0).\n              Command line argument: -U\n\nTopKURLs      Identical to TopURLs, except for the 'by KByte' table.\n              Default is 10.  No command line switch for this one.\n\nAllURLs       Will cause a separate HTML page to be generated for all\n              normally visible URLs.  A link will be added to the\n              bottom of the \"Top URLs\" table if enabled.  Value can\n              be either 'yes' or 'no', with 'no' being the default.\n\nTopEntry      Allows you to specify how many \"Top Entry Pages\" are\n              displayed in the table.  The default is 10.  If you\n              want to disable the table, specify a value of zero (0).\n              Command line argument: -e\n\nTopExit       Allows you to specify how many \"Top Exit Pages\" are\n              displayed in the table.  The default is 10.  If you\n              want to disable the table, specify a value of zero (0).\n              Command line argument: -E\n\nTopSearch     Allows you to specify how many \"Top Search Strings\" are\n              displayed in the table.  The default is 20.  
If you\n              want to disable the table, specify a value of zero (0).\n              Only works with combined log format files (ie: those that\n              contain referrer information).\n\nTopUsers      This allows you to specify how many \"Top\" usernames are\n              displayed in the \"Top Usernames\" table.  Usernames are\n              only available if you use HTTP authentication on your\n              web server, or when processing wu-ftpd xferlogs.  The\n              default value is 20.  If you want to disable the Username\n              table, specify a value of zero (0).\n\nAllUsers      Will cause a separate HTML page to be generated for all\n              normally visible usernames.  A link will be added to the\n              bottom of the \"Top Usernames\" table if enabled.  Value\n              can be either 'yes' or 'no', with 'no' being the default.\n\nAllSearchStr  Will cause a separate HTML page to be generated for all\n              normally visible Search Strings.  A link will be added\n              to the bottom of the \"Top Search Strings\" table if\n              enabled.  Value can be either 'yes' or 'no', with 'no'\n              being the default.\n\n\nHide Object Keywords\n--------------------\n\nThese keywords allow you to hide user agents, referrers, sites, URLs\nand usernames from the various \"Top\" tables.  The values for these keywords\nare the same as those used in their command line counterparts.  You\ncan specify as many of these as you want without limit.  Refer to the\nsection above on \"Command Line Options\" for a description of the string\nformatting used as the value.  Values cannot exceed 80 characters in\nlength.\n\nHideAgent     This allows specified user agents to be hidden from the\n              \"Top User Agents\" table.  
Not very useful, since there are\n              a zillion different names that browsers go by today,\n              but it could be useful if there is a particular user agent\n              (ie: robots, spiders, real-audio, etc.) that hits your\n              site frequently enough to make it into the top user agent\n              listing.  This keyword is useless if 1) your log file does\n              not provide user agent information or 2) you disable the\n              user agent table.\n              Command line argument: -a\n\nHideReferrer  This allows you to hide specified referrers from the\n              \"Top Referrers\" table.  Normally, you would only specify\n              your own web server to be hidden, as it is usually the\n              top generator of references to your own pages.  Of course,\n              this keyword is useless if 1) your log file does not include\n              referrer information or 2) you disable the top referrers\n              table.\n              Command line argument: -r\n\nHideSite      This allows you to hide specified sites from the \"Top\n              Sites\" table.  Normally, you would only specify your own\n              web server or other local machines to be hidden, as they\n              are usually the highest hitters of your web site, especially\n              if their browsers' home pages point to it.\n              Command line argument: -s\n\nHideAllSites  This allows hiding all individual sites from the display,\n              which can be useful when a lot of groupings are being\n              used (since grouped records cannot be hidden).  
It is\n              particularly useful in conjunction with the GroupDomains\n              feature, but it can be useful in other situations as well.\n              Value can be either 'yes' or 'no', with 'no' the default.\n              Command line argument: -X\n\nHideURL       This allows you to hide URLs from the \"Top URLs\" table.\n              Normally, this is used to hide items such as graphic files,\n              audio files or other 'non-html' files that are transferred\n              to the visiting user.\n              Command line argument: -u\n\nHideUser      This allows you to hide Usernames from the \"Top Usernames\"\n              table.  Usernames are only available if you use http based\n              authentication on your web server.\n\n\nGroup Object Keywords\n---------------------\n\nThe Group* keywords allow object grouping based on Site, URL, Referrer,\nUser Agent and Usernames.  Combined with the Hide* keywords, you can\ncustomize exactly what will be displayed in the 'Top' tables.  For example,\nto only display totals for a particular directory, use a GroupURL and\nHideURL with the same value (ie: '/help/*').  Group processing is only\ndone after the individual record has been fully processed, so name mangling\nand site total updates have already been performed.  Because of this, groups\nare not counted in the main site total (as that would cause duplication).\nGroups can be displayed in bold and shaded as well.  Grouped records are\nnot, by default, hidden from the report.  This allows you to display a\ngrouped total, while still being able to see the individual records, even\nif they are part of the group.  If you want to hide the detail records,\nfollow the Group* directive with a Hide* one using the same value.  There\nare no command line switches for these keywords.  
The Group* keywords also\naccept an optional label to be displayed instead of the actual value used.\nThis label should be separated from the value by at least one whitespace\ncharacter, such as a space or tab character.  If the match string contains\nwhitespace (spaces or tabs), the string should be quoted, using either\nsingle or double quotes.  See the sample configuration file for examples.\n\nGroupReferrer Allows grouping Referrers.  Can be handy for some of the\n              major search engines that have multiple host names a\n              referral could come from.\n\nGroupURL      This keyword allows grouping URLs.  Useful for grouping\n              complete directory trees.\n\nGroupSite     This keyword allows grouping Sites.  Mostly used for\n              grouping top level domains and unresolved IP addresses\n              for local dial-ups, etc...\n\nGroupAgent    Groups User Agents.  A handy example of how you could use\n              this one is to use \"Mozilla\" and \"MSIE\" as the values for\n              the GroupAgent and HideAgent keywords.  Make sure you put\n              the \"MSIE\" one first, since MSIE user agent strings also\n              contain \"Mozilla\".\n\nGroupDomains  Allows automatic grouping of domains.  The numeric value\n              represents the level of grouping, and can be thought of\n              as 'the number of dots' to display.  A 1 will display\n              second level domains only (xxx.xxx), a 2 will display\n              third level domains (xxx.xxx.xxx) etc...  The default\n              value of 0 disables any domain grouping.\n              Command line argument: -g\n\nGroupUser     Allows grouping of usernames.  Combined with a group\n              name, this can be handy for displaying statistics on\n              a particular group of users without displaying their\n              real usernames.\n\nGroupShading  Allows shading of table rows for groups.  
Value can be\n              'yes' or 'no', with the default being 'yes'.\n\nGroupHighlight Allows bolding of table rows for groups.  Value can be\n               'yes' or 'no', with the default being 'yes'.\n\n\nIgnore/Include Object Keywords\n------------------------------\n\nThese keywords allow you to completely ignore log records when generating\nstatistics, or to force their inclusion regardless of ignore criteria.\nRecords can be ignored or included based on site, URL, user agent, referrer\nand username.  Be aware that by choosing to ignore records, the accuracy of\nthe generated statistics becomes skewed, making it impossible to produce\nan accurate representation of load on the web server.  These keywords\nbehave identically to the Hide* keywords above, where the value can have\na leading or trailing wildcard '*'.  These keywords, like the Hide* ones,\nhave an absolute limit of 80 characters for their values.  These keywords\ndo not have any command line switch counterparts, so they may only be\nspecified in a configuration file.  It should also be pointed out that\nusing the Ignore/Include combination to selectively exclude an entire\nsite while including a particular 'chunk' is _extremely_ inefficient,\nand should be avoided.  Try grep'ing the records into a separate file\nand processing that file instead.\n\nIgnoreSite    This allows specified sites to be completely ignored from\n              the generated statistics.\n\nIgnoreURL     This allows specified URLs to be completely ignored from\n              the generated statistics.  One use for this keyword would\n              be to ignore all hits to a 'temporary' directory where\n              development work is being done, but which is not accessible\n              to the outside world.\n\nIgnoreReferrer This allows records to be ignored based on the referrer\n               field.\n\nIgnoreAgent   This allows specified User Agent records to be completely\n              ignored from the statistics.  
May be useful if you really\n              don't want to see all those hits from MSIE :)\n\nIgnoreUser    This allows specified username records to be completely\n              ignored from the statistics.  Usernames can only be used\n              if you use http authentication on your server.\n\nIncludeSite   Force the record to be processed based on hostname.  This\n              takes precedence over the Ignore* keywords.\n\nIncludeURL    Force the record to be processed based on URL.  This takes\n              precedence over the Ignore* keywords.\n\nIncludeReferrer Force the record to be processed based on referrer.\n                This takes precedence over the Ignore* keywords.\n\nIncludeAgent  Force the record to be processed based on user agent.\n              This takes precedence over the Ignore* keywords.\n\nIncludeUser   Force the record to be processed based on username.\n              Usernames are only available if you use http based\n              authentication on your server.  This takes precedence over\n              the Ignore* keywords.\n\n\nDump Object Keywords\n--------------------\n\nThe Dump* keywords allow text files to be generated that can then be used\nfor import into most database, spreadsheet and other external programs.\nThe file is a standard tab delimited text file, meaning that each column\nis separated by a tab (0x09) character.  A header record may be included\nif required, using the 'DumpHeader' keyword.  Since these files contain\nall records that have been processed, including normally hidden records,\nan alternate location for the files can be specified using the 'DumpPath'\nkeyword; otherwise they will be located in the default output directory.\n\nDumpPath      Specifies an alternate location for the dump files.  The\n              default output location will be used otherwise.  
The value\n              is the path portion to use, and normally should be an\n              absolute path (ie: has a leading '/' character); however,\n              relative path names can be used as well, and will be\n              relative to the output directory location.\n\nDumpExtension Allows the dump filename extensions to be specified.  The\n              default extension is \"tab\", but may be changed with\n              this option.\n\nDumpHeader    Allows a header record to be written as the first record\n              of the file.  Value can be either 'yes' or 'no', with\n              the default being 'no'.\n\nDumpSites     Dump tab delimited sites file.  Value can be either 'yes'\n              or 'no', with the default being 'no'.  The filename used\n              is site_YYYYMM.tab (YYYY=year, MM=month).\n\nDumpURLs      Dump tab delimited url file.  Value can be either 'yes' or\n              'no', with the default being 'no'.  The filename used is\n              url_YYYYMM.tab (YYYY=year, MM=month).\n\nDumpReferrers Dump tab delimited referrer file.  Value can be either\n              'yes' or 'no', with the default being 'no'.  Filename\n              used is ref_YYYYMM.tab (YYYY=year, MM=month).  Referrer\n              information is only available if present in the log\n              file (ie: combined web server log).\n\nDumpAgents    Dump tab delimited user agent file.  Value can be either\n              'yes' or 'no', with the default being 'no'.  Filename\n              used is agent_YYYYMM.tab (YYYY=year, MM=month).  User\n              agent information is only available if present in the\n              log file (ie: combined web server log).\n\nDumpUsers     Dump tab delimited username file.  Value can be either\n              'yes' or 'no', with the default being 'no'.  Filename\n              used is user_YYYYMM.tab (YYYY=year, MM=month).  
The\n              username data is only available if processing a wu-ftpd\n              xferlog or http authentication is used on the web server\n              and that information is present in the log.\n\nDumpSearchStr Dump tab delimited search string file.  Value can be\n              either 'yes' or 'no', with the default being 'no'.\n              Filename used is search_YYYYMM.tab (YYYY=year, MM=month).\n              The search string data is only available if referrer\n              information is present in the log being processed and\n              recognized search engines were found and processed.\n\n\nHTML Generation Keywords\n------------------------\n\nThese keywords allow you to customize the HTML code that The Webalizer\nproduces, such as adding a corporate logo or links to other web pages.\nYou can specify as many of these keywords as you like, and they will be\nused in the order that they are found in the file.  Values cannot exceed\n80 characters in length, so you may have to break long lines up into two\nor more lines.  With the exception of HTMLExtension, there are no\ncommand line counterparts to these keywords.\n\nHTMLExtension  Allows generated pages to use something other than the\n               default 'html' extension for the filenames.  Do not\n               include the leading period ('.') when you specify the\n               extension.\n               Command line argument: -x\n\nHTMLPre        Allows code to be inserted at the very beginning of the\n               HTML files.  Defaults to the standard HTML 3.2 DOCTYPE\n               record.  Be careful not to include any HTML here, as it\n               is inserted _before_ the \u003cHTML\u003e tag in the file.  Use it\n               for server-side scripting capabilities, such as php3, to\n               insert scripting files and other directives.\n\nHTMLHead       Allows you to insert HTML code between the \u003cHEAD\u003e\u003c/HEAD\u003e\n               block.  There is no default.  
Useful for adding scripts\n               to the HTML page, such as JavaScript or php3, or even\n               just for adding a few META tags to the document.\n\nHTMLBody       This keyword defines HTML code to be placed immediately\n               after the \u003cHEAD\u003e section of the report, just before the\n               title and \"summary period/generated on\" lines.  If used,\n               the first HTMLBody line MUST include a \u003cBODY\u003e tag.  Put\n               whatever else you want in subsequent lines, but keep in\n               mind the placement of this code in relation to the title\n               and other aspects of the web page.  Some typical uses\n               are to change the page colors and possibly add a corporate\n               logo (graphic) in the top right.  If not specified, a\n               default \u003cBODY\u003e tag is used that defines page color, text\n               color and link colors (see the \"sample.conf\" file for an\n               example).\n\nHTMLPost       This keyword defines HTML code that is placed after the\n               title and \"summary period/generated on\" lines, just before\n               the initial horizontal rule \u003cHR\u003e tag.  Normally this keyword\n               isn't needed, but is provided in case you included a large\n               graphic or some other weird formatting tag in the HTMLBody\n               section that needs to be cleaned up or terminated before the\n               main report section.\n\nHTMLTail       This keyword defines HTML code that is placed at the bottom\n               right side of the report.  It is inserted in a \u003cTABLE\u003e section\n               between table data \u003cTD\u003e..\u003c/TD\u003e tags, and is top and right\n               aligned within the table.  
Normally this keyword is used to\n               provide a link back to your home page or insert a small\n               graphic at the bottom right of the page.\n\nHTMLEnd        This allows insertion of closing code at the very end of\n               the page.  The default is to put the closing \u003c/BODY\u003e and\n               \u003c/HTML\u003e tags.  If specified, you _must_ specify these tags\n               yourself.\n\nLinkReferrer   This specifies whether the referrers listed in the top\n               referrer table should be displayed as plain text, or as a\n               link to the referrer.  Values can be either 'yes' or 'no',\n               with 'no' being the default.\n\n\nGraph Color Commands\n--------------------\n\nThese keywords allow altering the colors used in the various graphs\nproduced by The Webalizer.  The value is specified as a standard HTML\nRGB hexadecimal color string, without the leading '#' character.  The\nvalue is case insensitive.  If not specified, the default color shown\nwill be used.\n\nColorHit      Color used for 'Hits'.   Default is '00805C' (green)\n\nColorFile     Color used for 'Files'.  Default is '0040FF' (blue)\n\nColorSite     Color used for 'Sites'.  Default is 'FF8000' (orange)\n\nColorKbyte    Color used for 'KBytes'. Default is 'FF0000' (red)\n\nColorPage     Color used for 'Pages'.  Default is '00E0FF' (cyan)\n\nColorVisit    Color used for 'Visits'. Default is 'FFFF00' (yellow)\n\nColorMisc     Color used for miscellaneous titles in various 'Top'\n              tables (not graphs).     Default is '00E0FF' (cyan)\n\nPieColor1     Pie Chart color #1.      Default is '800080' (purple)\n\nPieColor2     Pie Chart color #2.      Default is '80FFC0' (lt. green)\n\nPieColor3     Pie Chart color #3.      Default is 'FF00FF' (lt. purple)\n\nPieColor4     Pie Chart color #4.      
Default is 'FFC080' (tan)\n\n\n--------------------------------------------------------------------------\n\n\nNotes on Web Log Files\n----------------------\n\nThe Webalizer supports CLF log formats, which should work for just\nabout everyone.  If you want User Agent or Referrer information, you\nneed to make sure your web server supplies this information in its\nlog file, and in a format that the Webalizer can understand.  While\nThe Webalizer will try to handle many of the subtle variations in\nlog formats, some will not work at all.   Most web servers output\nCLF format logs by default.  For Apache, in order to produce the\nproper log format, add the following to the httpd.conf file:\n\nLogFormat \"%h %l %u %t \\\"%r\\\" %s %b \\\"%{Referer}i\\\" \\\"%{User-agent}i\\\"\"\n\nThis instructs the Apache web server to produce a 'combined' log\nthat includes the referrer and user agent information on the end of\neach record, enclosed in quotes (This is the standard recommended\nby both Apache and NCSA).   Netscape and other web servers have\nsimilar capabilities to alter their log formats.  (note: the above\nworks for apache servers up to V1.2.  V1.3 and higher now have additional\nways to specify log formats... refer to included documentation).\n\nNotes on FTP Log Files\n----------------------\n\nThe Webalizer supports ftp logs produced by wu-ftpd, proftpd and others,\nas a standard 'xferlog'.  To process an ftp log, you must either use the\n-Ff command line option or have \"LogType ftp\" in your configuration file.\nIt is recommended that you create a separate configuration file for ftp\nanalysis, since the values used for your web server will most likely not\nbe suited for ftp log analysis (ie: page types, hostname, etc.. 
should\nbe different).\n\nBecause of the differences between web and ftp logs, there are a few\nlimitations:\n\no Because there is no concept of a 'response code' in the ftp world, response\n  codes are restricted to either 200 (OK) or 206 (Partial Content), based\n  on the completion status found in xferlog (for wu-ftpd, 'i'=incomplete\n  and will generate a 206, 'c'=complete and will generate a 200).  If your\n  ftp server doesn't supply the completion status, all requests will be\n  assigned a response code of 200.  This allows the usage graph to display\n  all transfer requests (hits), and how many of those completed successfully\n  (files - ie: 200 response codes).\n\no Page totals won't accurately reflect reality, since there isn't really\n  the concept of a 'page' with regard to ftp services.  I have found that\n  setting the PageType value to \"README\", \"FIRST\", etc... seems to work\n  fairly well, however, and will give a pretty good indication of how\n  many 'non-binary' files were requested.  Of course, the content of your\n  ftp site will be different, so your results may vary.\n\no Visit totals also won't accurately reflect reality, since visits are\n  triggered on PageType requests (see above).  In most cases, you will\n  wind up with visits=sites.\n\no Entry/Exit pages will not be calculated for ftp logs.\n\no For obvious reasons, referrers and user agents are not supported.\n\no You _cannot_ analyze both web and ftp logs at the same time; they must\n  be done separately in different runs.\n\n\nNotes on Referrers\n------------------\n\nReferrers are weird critters... They take many shapes and forms, which makes\nthem much harder to analyze than a typical URL, which at least has some\nstandardization.  What is contained in the referrer field of your log\nfiles varies depending on many factors, such as which site made the referral,\nwhat type of system it comes from and how the actual referral was generated.\nWhy is this?  
Well, because a user can get to your site in many ways... They\nmay have your site bookmarked in their browser, they may simply type your\nsite's URL into their browser, they could have clicked on a link on some\nremote web page or they may have found your site through one of the many\nsearch engines and site indexes found on the web.  The Webalizer attempts\nto deal with all this variation in an intelligent way by normalizing the\nreferrer string in ways that make it easier to analyze.  Of course, if your\nweb server doesn't provide referrer information, you probably don't really\ncare and are asking yourself why you are reading this section...\n\nMost referrers will take the form of \"http://somesite.com/somepage.html\",\nwhich is what you will get if the user clicks on a link somewhere on the\nweb in order to get to your site.  Some will be a variation of this, and\nlook something like \"file:/some/such/sillyname\", which is a reference from\nan HTML document on the user's local machine.  Several variations of this\ncan be used, depending on what type of system the user has, whether he/she\nis on a local network, the type of network, etc...  To complicate things even\nmore, dynamic HTML documents and HTML documents that are generated by\nCGI scripts or external programs produce lots of extra information which\nis tacked on to the end of the referrer string in an almost infinite number\nof ways.  If the user just typed your URL into their browser or clicked on\na bookmark, the referrer field will be empty and will take the form \"-\".\n\nIn order to handle all these variations, The Webalizer processes the\nreferrer field as follows.  First, if the referrer string begins with \"http\",\nit assumes it is a normal referral and converts the \"http://\" and following\nhostname to lowercase in order to simplify hiding if desired.  
For example,\nthe referrer \"HTTP://WWW.MyHost.Com/This/Is/A/HTML/Document.html\" will become\n\"http://www.myhost.com/This/Is/A/HTML/Document.html\".  Notice that only the\n\"http://\" and hostname are converted to lower case; the rest of the\nreferrer field is left alone.  This follows standard convention, as the\nscheme (http) and hostname are case insensitive, while the\ndocument name portion is case sensitive.\n\nReferrers that came from search engines, dynamic HTML documents, CGI\nscripts and other external programs usually tack on additional information\nthat they used to create the page.  A common example of this can be found\nin referrals that come from search engines and site indexes common on the\nweb.  Sometimes, these referrer URLs can be several hundred characters\nlong and include all the information that the user typed in to search for\nyour site.  The Webalizer deals with this type of referrer by stripping\noff all the query information, which starts with a question mark '?'.\nFor example, the referrer \"http://search.yahoo.com/search?p=usa%26global%26link\"\nwill be converted to just \"http://search.yahoo.com/search\".\n\nWhen a user comes to your site by using one of their bookmarks or by\ntyping in your URL directly into their browser, the referrer field is\nblank, and looks like \"-\".  Most sites will get more of these referrals\nthan any other type.  The Webalizer converts this type of referral into\nthe string \"- (Direct Request)\".  This is done in order to make it easier\nto hide via a command line option or configuration file option.  This is\nbecause the character \"-\" is a valid character elsewhere in a referrer\nfield, and if not turned into something unique, could not be hidden without\npossibly hiding other referrers that shouldn't be.\n\n\nNotes on Character Escaping\n---------------------------\n\nThe HTTP protocol defines certain ways that URLs can look and behave.  
To\nsome extent, referrer fields follow most of the same conventions.  Character\nescaping is a technique by which non-printable or other non-ASCII (and even\nsome ASCII) characters can be used in a URL.  This is done by placing the\nhexadecimal value of the character in the URL, preceded by a percent sign '%'\n(a space, for instance, becomes \"%20\").  Since hex digits are themselves\nASCII characters, any character can be escaped to ensure only printable\nASCII characters are present in the URL.\nSome systems take this concept to the extreme and escape all sorts of stuff,\neven characters that don't need to be escaped.  To deal with this, The\nWebalizer will un-escape URLs and referrers before processing them.  For\nexample, the URL \"/www.webalizer.org/%7Efoo/bar.html\" is the same URL as\n\"/www.webalizer.org/~foo/bar.html\", a very common form of a URL to access\nusers' web pages.  If the URLs were not un-escaped, they would be treated as\ntwo separate documents, even though they are really one and the same.\n\n\nSearch String Analysis\n----------------------\n\nThe Webalizer will do a minimal analysis on referrer strings that\nit finds, looking for well known search string patterns.  Most of\nthe major search engines are supported, such as Yahoo!, AltaVista,\nLycos, etc...  Unfortunately, search engines are always changing\ntheir internal/CGI query formats, new search engines are coming\nonline every day, and detecting _all_ search strings is\nnearly impossible.  However, it should be accurate enough to give\na good indication of what users were searching for when they stumbled\nacross your site.  Note: as of version 1.31, search engines can now\nbe specified within a configuration file.  
See the sample.conf file\nfor examples of how to specify additional search engines.\n\n\nNotes on Visits/Entry/Exit Figures\n----------------------------------\n\nThe majority of data analyzed and reported on by The Webalizer is\nas accurate and correct as possible based on the input log file.\nHowever, due to the limitations of the HTTP protocol, the use of\nfirewalls, proxy servers, multi-user systems, the rotation of your\nlog files, and a myriad of other conditions, some of these numbers\ncannot be calculated with absolute accuracy.  In particular,\nVisits, Entry Pages and Exit Pages are subject to random errors\ndue to the above and other conditions.  The reason for this is\ntwofold: 1) log files are finite in size and time interval, and\n2) there is no way to tell multiple individual users apart\ngiven only an IP address.  Because log files are finite, they have\na beginning and ending, which can be represented as a fixed time\nperiod.  There is no way of knowing what happened prior to this\ntime period, nor is it possible to predict future events based on\nit.  Also, because it is impossible to tell individual users\napart, multiple users that have the same IP address all appear to\nbe a single user, and are treated as such.  This is most common where\ncorporate users sit behind a proxy/firewall to the outside world,\nand all requests appear to come from the same location (the address\nof the proxy/firewall itself).  Dynamic IP assignment (used with\ndial-up Internet accounts) also presents a problem, since the same\nuser will appear to come from multiple places.\n\nFor example, suppose two users visit your server from XYZ company,\nwhich has its network connected to the Internet by a proxy server\n'fw.xyz.com'.  All requests from the network look as though they\noriginated from 'fw.xyz.com', even though they were really initiated\nby two separate users on different PCs.  
The Webalizer would\nsee these requests as coming from the same location, and would record only\n1 visit, when in reality there were 2.  Because entry and exit\npages are calculated in conjunction with visits, this situation\nwould also record only 1 entry and 1 exit page, when in reality,\nthere should be 2.\n\nAs another example, say a single user at XYZ company is surfing\naround your website.  They arrive at 11:52pm the last day of\nthe month, and continue surfing until 12:30am, which is now a\nnew day (in a new month).  Since a common practice is to rotate\n(save then clear) the server logs at the end of the month, you\nnow have the user's visit logged in two different files (current\nand previous months).  Because of this (and the fact that the\nWebalizer clears history between months), the first page the\nuser requests after midnight will be counted as an entry page.\nThis is unavoidable, since it is the first request seen by that\nparticular IP address in the new month.\n\nFor the most part, the numbers shown for visits, entry and exit\npages are pretty good 'guesses', even though they may not be 100%\naccurate.  They do provide a good indication of overall trends,\nand shouldn't be far enough off from the real numbers to matter much.\nYou should probably consider them as the 'minimum' amount possible,\nsince the actual (real) values should always be equal to or greater\nin all cases.\n\n\nExporting Webalizer Data\n------------------------\n\nThe Webalizer now has the ability to dump all object tables to tab\ndelimited ASCII text files, which can then be imported into most\npopular database and spreadsheet programs.  The files are not normally\nproduced, as on some sites they could become quite large, and are only\nenabled by the use of the Dump* configuration keywords.  The filename\nextensions default to '.tab', but may be changed using the\n'DumpExtension' keyword.  
Since these files contain all items, even\nthose normally hidden, it may not be desirable to have them located\nin the output directory where they may be visible to normal web users.\nFor this reason, the 'DumpPath' configuration keyword is available,\nallowing these files to be placed somewhere outside the normal\nweb server document tree.  An optional 'header' record may be written\nto these files as well, and is useful when the data is to be imported\ninto a spreadsheet; databases will not normally need the header.  If\nenabled, the header is simply the column names as the first record of\nthe file, tab separated.\n\n\nLog files and The Webalizer\n---------------------------\n\nMost sites will choose to have The Webalizer run from cron at specified\nintervals.  Care should be taken to ensure that data is not lost as a\nresult of log file rotations.  A suggested practice is to rotate your\nweb server logs at the end of each month as close to midnight as possible,\nthen have The Webalizer process the 'end of month' log file before running\nstatistics on the new, current log.  On our systems, a shell script called\n'rotate_logs' is run at midnight at the end of each month.  
This script file\nlooks like:\n\n------------------------- file: rotate_logs ------------------------------\n#!/bin/sh\n\n# halt the server\nkill `cat /var/lib/httpd/logs/httpd.pid`\n\n# define backup names (one timestamp, so both archives match)\nSTAMP=`date +%y%m%d-%H%M%S`\nOLD_ACCESS_LOG=/var/lib/httpd/logs/old/access_log.$STAMP\nOLD_ERROR_LOG=/var/lib/httpd/logs/old/error_log.$STAMP\n\n# make end of month copy for analyzer\ncp /var/lib/httpd/logs/access_log /var/lib/httpd/logs/access_log.backup\n\n# move files to archive directory\nmv /var/lib/httpd/logs/access_log \"$OLD_ACCESS_LOG\"\nmv /var/lib/httpd/logs/error_log  \"$OLD_ERROR_LOG\"\n\n# restart web server\n/usr/sbin/httpd\n\n# compress the archived files\n/bin/gzip \"$OLD_ACCESS_LOG\"\n/bin/gzip \"$OLD_ERROR_LOG\"\n------------------------- end of file ------------------------------------\n\nThis script first stops the web server using a 'kill' command.  Apache\nkeeps the PID of the server in the file httpd.pid, so we use it as the\nargument for the kill.  Next, it defines some names for the backup files,\nwhich are simply the names of the files with the date and time appended\nto the end of them.  It then makes a copy of the access log, with\n'.backup' appended to its name, in the log directory, moves the current\nlog files to an archive directory (/var/lib/httpd/logs/old) and restarts\nthe server.  This setup allows the web server to be down for the minimum\namount of time needed, which is important for busy sites.  If you don't\nwant to stop the server, you can remove the initial 'kill' command, and\nreplace the '/usr/sbin/httpd' line with the\n\"kill -1 `cat /var/lib/httpd/logs/httpd.pid`\" command instead.\nOn most web servers, this will cause the server to restart and create\nthe new log files in the process...\n\nAt this point, we have made copies of the previous month's logs, the web\nserver is going about its business as usual, and we have all the time in\nthe world to do any additional processing we want.  
The last two\nlines of the script compress the archived logs using the GNU zip program\n(gzip).  Remember, we still have the '.backup' copy of the log, which we\ncan now run The Webalizer on without any further processing.\n\nNext, we define two crontab entries.  The first runs the above\n'rotate_logs' script at midnight on the first day of each month, the\nmoment the previous month ends.  The second runs The Webalizer on the\n'.backup' log file created above at 5 minutes after midnight.  This\ngives other end-of-month processing jobs a chance to run first, so we\ndon't bog the system down too much.  If you have a lot of end-of-month\nprocessing going on, you can change the timing to suit your needs.  The\ncrontab entries look something like:\n\n------------------------- crontab entries --------------------------------\n# Rotate web server logs and run monthly analysis\n0 0 1 * *       /usr/local/adm/rotate_logs\n5 0 1 * *       /usr/bin/webalizer -Q /var/lib/httpd/logs/access_log.backup\n------------------------- end of crontab ---------------------------------\n\nAs you can see, the log rotation occurs at midnight, and the analysis\nis done 5 minutes later.  Once you verify that The Webalizer ran\nsuccessfully, the access_log.backup file can be deleted, as it isn't\nneeded any more.  If you need to re-run the analysis, you still have\nthe compressed archive copy that the shell script created.  For the\nabove analysis to work properly, you should already have created an\n/etc/webalizer.conf configuration file suitable for your site, or else\nspecify configuration options or a configuration file on the crontab\ncommand line above.\n\nIf you want The Webalizer to run more often than once a month, you\ncan specify additional crontab entries to do this as well.  
Care should\nbe taken, however, to ensure that The Webalizer is not running when the\nend of month processing above occurs, or unpredictable results may\nhappen (such as an inability to rotate the logs due to a file lock).\nThe easiest way to avoid this is to run it on the half hour, with a\ncrontab entry like:\n\n30 * * * *      /usr/bin/webalizer\n\n\nReverse DNS Lookups\n-------------------\n\nThe Webalizer fully supports both IPv4 and IPv6 DNS lookups, and\nmaintains a cache of those lookups so that the same addresses do not\nneed to be looked up again in subsequent runs.  The cache file can be\ncreated at run-time, or may be created before running The Webalizer\nusing either the stand-alone 'webazolver' program or The Webalizer\n(DNS) Cache file Manager program 'wcmgr'.  To perform reverse lookups,\na DNS Cache file must be specified, either on the command line or in\na configuration file.  To create or update the cache file at run-time,\nthe number of DNS Children must also be specified, and can be anything\nbetween 1 and 100.  This specifies the number of child processes to\nbe forked, each of which will perform network DNS queries in order to\nlook up the addresses and update the cache.  Cached entries that are\nolder than a specified TTL (time to live) will be expired, and if\nencountered again in a log, will be looked up at that time in order\nto 'freshen' them (verify the name is still the same and update its\ntimestamp).  The default TTL is 7 days, but it may be set to anything\nbetween 1 and 100 days.  Using the 'wcmgr' program, entries may also\nbe marked as 'permanent', in which case they will persist (with an\ninfinite TTL) in the cache until manually removed.  See the file\nDNS.README for additional information.\n\n\nGeolocation Lookups\n-------------------\n\nThe Webalizer can perform geolocation lookups on IP addresses using\neither its own internal GeoDB database or, optionally, the GeoIP\ndatabase from MaxMind, Inc. (www.maxmind.com).  
If used,\nunresolved addresses will be searched for in the database, and their\ncountry of origin will be returned if found.  This actually produces\nmore accurate country information than DNS lookups, since DNS names\ninclude generic top-level domains (gTLDs) that do not necessarily map\nto a specific country (such as '.net' and '.com').  It is possible to\nuse both DNS lookups and geolocation lookups at the same time, in which\ncase any addresses that could not be resolved using DNS will then be\nlooked up in the database, greatly reducing the number of\n'Unknown/Unresolved' entries in the generated reports.  The native\nGeoDB geolocation database provided by The Webalizer fully supports\nIPv4 and IPv6 lookups, is updated regularly, and is the preferred\ngeolocation method for use with The Webalizer.  The most current\nversion of the database can be obtained from our ftp site.\n\n\nLanguage Support\n----------------\n\nVersion 1.0x of The Webalizer added language support.  This\nsupport is only provided at compile time, in the form of an\ninclude file containing all the strings used by The Webalizer.\nThe source distribution contains all language files that were\navailable at the time, with English being the default, as\nthat is the only human language I speak fluently (my\nSpanish is very bad).  Several people have already expressed\na desire to translate it into various languages, and as\nI receive the language files, ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzone-eu%2Fwebalizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzone-eu%2Fwebalizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzone-eu%2Fwebalizer/lists"}