Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/zerodevx/static-sitemap-cli

CLI to generate XML sitemaps for static sites from local filesystem
https://github.com/zerodevx/static-sitemap-cli

cli sitemap-generator static-site xml

Last synced: 5 days ago
JSON representation

CLI to generate XML sitemaps for static sites from local filesystem

Awesome Lists containing this project

README

        

[![npm (tag)](https://img.shields.io/npm/v/static-sitemap-cli/latest)](https://www.npmjs.com/package/static-sitemap-cli)
[![npm](https://img.shields.io/npm/dm/static-sitemap-cli)](https://www.npmjs.com/package/static-sitemap-cli)

# static-sitemap-cli

> CLI to generate XML sitemaps for static sites from local filesystem.

Quick and easy CLI to generate [XML](https://www.sitemaps.org/protocol.html) or
[TXT](https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap#text) sitemaps by
searching your local filesystem for `.html` files. Automatically exclude files containing the
`noindex` meta. Can also be used as a Node module.

**NOTE:** This is the V2 branch. If you're looking for the older version, see the
[V1 branch](https://github.com/zerodevx/static-sitemap-cli/tree/v1). V2 contains **breaking
changes**. Find out what changed on the
[releases](https://github.com/zerodevx/static-sitemap-cli/releases) page.

## Install

```
$ npm i -g static-sitemap-cli
```

## Usage

```
$ sscli -b https://example.com -r public
```

This trawls the `public/` directory for files matching `**/*.html`, then parses each file for the
`noindex` robots meta tag - excluding that file if the tag exists - and finally generates both
`sitemap.xml` and `sitemap.txt` into the `public/` root.

See below for more usage [examples](#examples).

## Options

```
Usage: sscli [options]

CLI to generate XML sitemaps for static sites from local filesystem

Options:
-b, --base base URL (required)
-r, --root root working directory (default: ".")
-m, --match globs to match (default: ["**/*.html"])
-i, --ignore globs to ignore (default: ["404.html"])
-c, --changefreq comma-separated glob-changefreq pairs
-p, --priority comma-separated glob-priority pairs
--no-robots do not parse html files for noindex meta
--concurrent concurrent number of html parsing ops (default: 32)
--no-clean do not use clean URLs
--slash add trailing slash to all URLs
-f, --format sitemap format (choices: "xml", "txt", "both", default: "both")
-o, --stdout output sitemap to stdout instead
-v, --verbose be more verbose
-V, --version output the version number
-h, --help display help for command
```

#### HTML parsing

By default, all matched `.html` files are piped through a fast
[HTML parser](https://github.com/fb55/htmlparser2) to detect if the `noindex`
[meta tag](https://developers.google.com/search/docs/advanced/crawling/block-indexing#meta-tag) is
set - typically in the form of `` - in which case that file
is excluded from the generated sitemap. To disable this behaviour, pass option `--no-robots`.

For better performance, file reads are streamed in `1kb` chunks, and parsing stops immediately when
either the `noindex` meta, or the `` closing tag, is detected (the `` is not parsed).
This operation is performed concurrently with an
[async pool](https://github.com/rxaviers/async-pool) limit of 32. The limit can be tweaked using the
`--concurrent` option.

#### Clean URLs

Hides the `.html` file extension in sitemaps like so:

```
./rootDir/index.html -> https://example.com/
./rootDor/foo/index.html -> https://example.com/foo
./rootDor/foo/bar.html -> https://example.com/foo/bar
```

Enabled by default; pass option `--no-clean` to disable.

#### Trailing slashes

Adds a trailing slash to all URLs like so:

```
./rootDir/index.html -> https://example.com/
./rootDir/foo/index.html -> https://example.com/foo/
./rootDir/foo/bar.html -> https://example.com/foo/bar/
```

Disabled by default; pass option `--slash` to enable.

**NOTE:** Cannot be used together with `--no-clean`. Also, trailing slashes are
[always added](https://github.com/zerodevx/static-sitemap-cli/tree/v1#to-slash-or-not-to-slash) to
root domains.

#### Match or ignore files

The `-m` and `-i` flags allow multiple entries. By default, they are set to the `["**/*.html"]` and
`["404.html"]` respectively. Change the glob patterns to suit your use-case like so:

```
$ sscli ... -m '**/*.{html,jpg,png}' -i '404.html' 'ignore/**' 'this/other/specific/file.html'
```

#### Glob-[*] pairs

The `-c` and `-p` flags allow multiple entries and accept `glob-*` pairs as input. A `glob-*` pair
is a comma-separated pair of `,`. For example, a glob-changefreq pair may look like
this:

```
$ sscli ... -c '**,weekly' 'events/**,daily'
```

Latter entries override the former. In the above example, paths matching `events/**` have a `daily`
changefreq, while the rest are set to `weekly`.

#### Using a config file

Options can be passed through the `sscli` property in `package.json`, or through a `.ssclirc` JSON
file, or through other [standard conventions](https://github.com/davidtheclark/cosmiconfig).

## Examples

#### Dry-run sitemap entries

```
$ sscli -b https://x.com -f txt -o
```

#### Generate XML sitemap to another path

```
$ sscli -b https://x.com -r dist -f xml -o > www/sm.xml
```

#### Get subset of a directory

```
$ sscli -b https://x.com/foo -r dist/foo -f xml -o > dist/sitemap.xml
```

#### Generate TXT sitemap for image assets

```
$ sscli -b https://x.com -r dist -m '**/*.{jpg,jpeg,gif,png,bmp,webp,svg}' -f txt
```

## Programmatic Use

`static-sitemap-cli` can also be used as a Node module.

```js
import {
generateUrls,
generateXmlSitemap,
generateTxtSitemap
} from 'static-sitemap-cli'

const options = {
base: 'https://x.com',
root: 'path/to/root',
match: ['**/*html'],
ignore: ['404.html'],
changefreq: [],
priority: [],
robots: true,
concurrent: 32,
clean: true,
slash: false
}

generateUrls(options).then((urls) => {
const xmlString = generateXmlSitemap(urls)
const txtString = generateTxtSitemap(urls)
...
})
```

Using the XML sitemap generator by itself:

```js
import { generateXmlSitemap } from 'static-sitemap-cli'

const urls = [
{ loc: 'https://x.com/', lastmod: '2022-02-22' },
{ loc: 'https://x.com/about', lastmod: '2022-02-22' },
...
]

const xml = generateXmlSitemap(urls)
```

## Development

Standard Github [contribution workflow](https://github.com/firstcontributions/first-contributions)
applies.

#### Tests

Test specs are at `test/spec.js`. To run the tests:

```
$ npm run test
```

## License

ISC

## Changelog

Changes are logged in the [releases](https://github.com/zerodevx/static-sitemap-cli/releases) page.