{"id":51098266,"url":"https://github.com/instancezero/novelwriterextract","last_synced_at":"2026-06-24T08:32:12.566Z","repository":{"id":233226923,"uuid":"786304908","full_name":"instancezero/NovelWriterExtract","owner":"instancezero","description":"Extract metadata from a NovelWriter project into a data set (ODS, XLSX, or CSV).","archived":false,"fork":false,"pushed_at":"2025-11-03T04:24:16.000Z","size":71319,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-03T06:14:17.546Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/instancezero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-04-14T03:26:36.000Z","updated_at":"2025-11-03T04:24:20.000Z","dependencies_parsed_at":"2024-04-14T21:47:17.842Z","dependency_job_id":"04f4cc99-1907-46d4-8bcc-6587df566125","html_url":"https://github.com/instancezero/NovelWriterExtract","commit_stats":null,"previous_names":["instancezero/novelwriterextract"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/instancezero/NovelWriterExtract","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/instancezero%2FNovelWriterExtract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/instancezero%2FNovelWriterExtract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/instancezero%2FNovelWriterExtract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/instancezero%2FNovelWriterExtract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/instancezero","download_url":"https://codeload.github.com/instancezero/NovelWriterExtract/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/instancezero%2FNovelWriterExtract/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34724735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-24T02:00:07.484Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-24T08:32:11.931Z","updated_at":"2026-06-24T08:32:12.558Z","avatar_url":"https://github.com/instancezero.png","language":"PHP","funding_links":["https://buymeacoffee.com/alanlangford"],"categories":[],"sub_categories":[],"readme":"# A Metadata Extraction Tool for novelWriter\n\nNovelWriterExtract is a cross-platform command-line tool using a JSON configuration file that \nextracts and processes metadata from a novelWriter project (https://novelwriter.io),\nincluding the story metadata\n(as outlined in https://github.com/vkbo/novelWriter/discussions/1769).\nThe extracted data can be exported to a series of tables in OpenDocument 
Spreadsheet,\nCSV, HTML, or Microsoft Excel formats.\n\nCheck the release notes at the bottom of this file for information on updates.\n\nIt also extracts data stored in comments, and tag references like @char and @location.\n\nThe output format is determined by the extension of the filename specified on the command line.\n\n**Note**: if the output file format is CSV, \nonly scenes will be written since CSV files don't support multiple sheets.\n\nStand-alone binaries with no dependencies for Linux, Mac,\nand Windows can be found in the bin/ folder.\n\n## Main Features\n* Extracts scene metadata with the ability to specify which columns appear in the output.\nSince story metadata can have arbitrary identifiers, this allows you to extract different views\nof your novel.\n* Generates scene-by-scene timelines for selected characters.\n* Pulls data from the locations and character sections of the manuscript, including any story\nmetadata associated with those nodes, into separate character and location reference sheets.\n* Supports a flexible method of calculating relative timelines, even if your novel doesn't use\nEarth units for describing time.\nIf your story has multiple timelines, this facilitates doing a chronological sort on scenes\nand character timelines.\n* Can provide statistics on scene and word counts, broken down by scene status.\n* Has a built-in word and phrase frequency analysis tool, \ndesigned to help you spot places where a word or phrase is overused.\n\n## Usage\n\nBasic usage is `novelWriterExtract nw_project_folder output_file [format_file]`\n\nStarting with version 1.2, supplying an asterisk on the command line (`novelWriterExtract *`)\nwill cause the application to prompt you for the additional arguments.\n\nStarting with v1.1, the output filename supports two formatting commands:\n@z timezone_identifier@ and @d [php-date-format]@\n\nThe date specification in its simplest form, @d@, is replaced by the current date\nin the format yyyy-mm-dd. 
The optional format can be any valid PHP date/time string.\n\nThe timezone identifier (@z) is anything recognized by PHP, for example America/Toronto. \nIf no timezone is specified then UTC is used.\nThe timezone specification must precede the use of @d@ or it will have no effect. \n\nIf you're not running a binary, the code was written for PHP 8.4\nbut will probably run just fine in lesser versions.\n\n## Operation\n\nLike novelWriter, the extraction tool supports multiple scenes per document.\n\nAll novelWriter files are only read, never written to.\n\nStarting with version 2.7, novelWriter has implemented constructs of the form `%story.term`,\nsupporting data export from within the application directly,\nhowever NovelWriterExtract offers several filtering and format options not present in novelWriter.\n\nFuture changes to novelWriter might break this tool.\nPlease open an issue if that happens.\nNo warranties are explicit or implied, yada yada yada.\n\nIf you find NovelWriterExtract to be useful, please tell your fellow authors, editors, friends,\nco-workers, grocery store cashiers, and random people walking down the street.\nAfter all, if you're an author, they probably already think you're crazy.\n\n**If you want to support my work you can \n[buy me a coffee](https://buymeacoffee.com/alanlangford).\nEvery little bit helps and is greatly appreciated!**\n\n## Formats\n\nVersion 1.0.0 introduces the option to specify which terms should be extracted, \nalong with some other formatting options.\nIf no format file is specified, all terms are extracted from the project.\nThe format is defined in JSON (there's a highly specific sample in the `formats` folder).\n\nThe overall syntax is (each section is detailed below):\n\n```json lines\n{\n  \"characters\": true, // Array of columns or boolean\n  \"locations\": true,  // Array of columns or boolean\n  \"scenes\": [\n    // Column specifications can be just the column name or a more complex expression.\n    
\"column1\",\"column2\",\"...\" \n  ],\n  \"time\": {},         // Time unit specification. Details below.\n  \"timelines\": {},    // Setting related to character timeline generation.\n  \"wordCounts\": true, // Boolean\n  \"wrap\": 40          // Integer, the number of characters to wrap multi-line columns at.\n}\n```\n\nA column definition can be either the name of a %story term\nor the name of an @ reference in NovelWriter.\n\n### Characters\n\nIf the `characters` attribute is true (which is the default), \nNovelWriterExtract will generate a sheet that lists all the characters in the novelWriter project.\nIf it is set to false, no sheet will be produced.\n\nThe default columns in the character sheet are:\n\n* _sequence: a sequential character number.\n* name: The name of the note that contains the character information.\n* @tag: text from the @tag directive.\n* _folder: the name of the sub-folder the character is located in.\n* (Any character attributes from the related %story directives, sorted alphabetically by name.)\n* synopsis: text from the character's %synopsis or %short directive.\n\nIf an array is specified, it is a list of story attributes, \nwhich are included if they are used in the manuscript.  
\nTo illustrate, if you have defined attributes for a character's age, nickname, and build with \nconstructs like this:\n\n```\n%story.age: 30\n%story.nickname: The Claw\n%story.build: thin\n```\nThen the default output columns, with `characters` set to true, will be:\n* _sequence\n* name\n* tag\n* folder (only if there are character sub-folders)\n* age\n* build\n* nickname\n* synopsis\n\nIf `characters` is \\[nickname, age], then the columns will be:\n* _sequence\n* name\n* tag\n* folder (only if there are character sub-folders)\n* nickname\n* age\n* synopsis\n\nHere the build column is omitted and the column order has changed.\n\n\n### Locations\n\nDefault location columns are:\n\n* _sequence: a sequential location number.\n* name: The name of the note that contains the location information.\n* tag: text from the @tag directive.\n* _folder: the name of the folder this location is contained in.\n* synopsis: text from the location's %synopsis or %short directive.\n\nCustom %story columns can be displayed in the same way as in the character section.\n\n### Scenes\n\nBesides the @ tags and %story terms, these column names are available:\n\n* _active: The value of the active column in the document tree (yes/no).\n* _blank: an empty column.\n* _chron: A relative time (details below).\n* _novel: the name of the novel this scene is in.\n* _sequence: a sequential scene number in the novel.\n* _sla: A sentence length analysis (details below).\n* _slg: A sentence length graph (details below).\n* _status: The text value associated with the status icon in the document tree.\n* words: The number of words in the scene.\n\nA simple format file could look like this:\n\n```json\n{\n  \"scenes\": [\n    \"_sequence\", \"name\", \"@location\", \"@char\", \"synopsis\", \"words\"\n  ]\n}\n```\nBut columns can do much more. 
You can change the column heading from the default:\n\n```json\n{\n  \"scenes\": [\n    {\n      \"key\": \"@custom\",\n      \"heading\": \"Additional References\"\n    }\n  ]\n}\n```\n\nFor formats other than CSV, you can change the column alignment, number format,\nand highlight cells that contain the first mention of a value with the \"onFirst\" attribute:\n```json\n{\n  \"scenes\": [\n    {\n      \"key\": \"@char\",\n      \"style\": {\n        \"align\": \"center\",\n        \"onFirst\": true\n      }\n    },\n    {\n      \"key\": \"words\",\n      \"style\": {\n        \"numberFormat\": \"#,##0.\"\n      }\n    }\n  ]\n}\n```\nUnfortunately, HTML and CSV output formats aren't language sensitive,\nso it's not possible to get the European number style like 1.000,00.\nHowever, the extracts to ODS and XLSX should convert automatically.\n\nYou can break any attribute with a specific value into a new column, \nand highlight the first time the value appears, as in this example with locations:\n```json\n{\n  \"scenes\": [\n    {\n      \"heading\": \"Europe\",\n      \"key\": \"@location\",\n      \"test\": [\n        {\n          \"arg\": \"@location\",\n          \"op\": \"has\",\n          \"value\": [\"France\", \"Germany\", \"Italy\"]\n        }\n      ]\n    },\n    {\n      \"heading\": \"N. 
America\",\n      \"key\": \"@location\",\n      \"test\": [\n        {\n          \"arg\": \"@location\",\n          \"op\": \"has\",\n          \"value\": [\"Canada\", \"USA\", \"Mexico\"]\n        }\n      ]\n    }  \n  ]\n}\n```\n\nYou can break your main characters into separate columns\nand create a column for secondary characters:\n```json\n{\n  \"scenes\": [\n    {\n      \"heading\": \"Hero\",\n      \"test\": [\n        {\n          \"arg\": \"@char\",\n          \"op\": \"includes\",\n          \"value\": \"Suzie\"\n        }\n      ],\n      \"result\": \"Sue\"\n    },\n    {\n      \"heading\": \"Ally\",\n      \"test\": [\n        {\n          \"arg\": \"@char\",\n          \"op\": \"includes\",\n          \"value\": \"Mark\"\n        }\n      ],\n      \"result\": \"Mark\"\n    },\n    {\n      \"heading\": \"Villain\",\n      \"test\": [\n        {\n          \"arg\": \"@char\",\n          \"op\": \"includes\",\n          \"value\": \"Darth\"\n        }\n      ],\n      \"result\": \"Darth\"\n    },\n    {\n      \"key\": \"@char\",\n      \"heading\": \"Others\",\n      \"exclude\": [\n        \"Sue\",\n        \"Mark\",\n        \"Darth\"\n      ]\n    }\n  ]\n}\n```\n\n#### Relative time\n\nSometimes when a story has multiple timelines, it's useful to be able to look at the\nstory structure on an ascending timeline. The application provides two ways of doing this\nthrough the `%story.time` data.\n\nFixed time mode will attempt to parse a human-readable date/time string \nand convert it to a sortable ISO 8601 value. For example, \"%story.time: March 5, 2001 9:15pm\"\nshould result in a _chron value of \"2001-03-05T21:15\".\n\nRelative time is expressed in time units. 
These default to Earth units but can be customized.\n\nThe time mode is specified in the \"time\" section of the format file:\n\n```json lines\n{\n  \"time\": {\n    \"mode\": \"fixed|relative|off\" // Any value other than fixed or relative will be interpreted as off.\n  }\n}\n\n```\n\nPreset relative Earth units are:\n* No units or 'm': minutes.\n* 'h': hours of 60 minutes.\n* 'd': days of 24 hours.\n* 'w': weeks of 7 days.\n* 'mo': months of 30 days.\n* 'y': years of 12 months.\n\n\"%story.time: 15\" would represent 15 minutes after the start of the story.\n\"%story.time: 3mo\" represents three months in (a default month is fixed at 30 days).\n\nIt is possible to set a base time and then use it in time expressions. \nIn one scene you can define a base time: \"%story.time: prolog=-4.5y\" \nsets \"prolog\" to 4.5 years before \"time zero.\"\nYou can then use \"prolog\" as the basis for other times,\nso \"%story.time: prolog+6mo\" is six months after four and a half years in the past, or -4 years.\nThe only rule is that the base time must be defined before it is used.\n\nIf your story uses its own time system, you can accommodate this with custom units. \nTimes with no unit specification will be taken as unit time.\nEverything else is a multiple of that or another defined unit. \nUnits are specified as part of the time configuration: \n\n```json lines\n{\n  \"time\": {\n    \"mode\": \"relative\",\n    \"units\": {\n      \"zip\": 1,\n      \"blarg\": \"16zip\",\n      \"snarf\": \"128blarg\"\n    }\n  }\n}\n\n```\nIn this time system, the base unit is a zip. A \"blarg\" is 16 zips, and a \"snarf\" is 128 \"blargs\",\nor 2048 zips.\n\n\n#### Sentence Length Analysis\n\nThe `_sla` column produces a compressed representation of sentence lengths in the scene. 
\nThe first element is the number of sentences in the scene and the average number of sentences\nper paragraph, for example \"102@3.5:\" means there are 102 sentences in the scene \nand the average paragraph is 3.5 sentences long.\n\nFollowing the first element, there is one comma-separated string per paragraph.\nThe string starts with a P and the number of sentences in the paragraph and a colon.\nThe rest of the string characterizes the sentences in the paragraph by length.\nEach sentence is assigned an s if it contains fewer than five words,\nan m if it contains five to nineteen, and an l if it has 20 or more words.\nGroups of sentences with the same length are assigned a multiplier.\n\nFor example, the string \"P9:2l.6m.s\" means the paragraph has nine sentences, two long,\nfollowed by six medium and one short. Expanded, this would be \"P9:llmmmmmms\".\n\nWhile complex, this is designed to make it easier to detect sequences \nof sentences with the same length, like the six medium-length sentences in the example.\n\nWith this release, the criteria for sentence length are pre-set and fixed.\nI'll look at ways to change that in future releases.\n\n#### Sentence Length Graph\n\nThe `_slg` column is intended to provide the same kind of information \nas the sentence length analysis, but in a more visual way. \nEach sentence is represented by a vertical stack with eight possible levels:\n\n* One bar: 1–2 words.\n* Two bars: 3–5 words.\n* Three bars: 6–8 words.\n* Four bars: 9–11 words.\n* Five bars: 12–14 words.\n* Six bars: 15–17 words.\n* Seven bars: 18–20 words.\n* Eight bars: 21+ words.\n\nA space separates each paragraph. This results in output like this:\n\n▆█▄▄▇▆▄▆▄▆▂ ▃▄▇▆▂\n\n\n### Timelines\n\nThe timelines section lists the scenes that a named character appears in. By default,\nthe scene synopsis is listed, but this can be overridden by supplying a %story.of_{character_tag}\nline within the scene in novelWriter. 
The \"of_\" construct allows the author to relate the\nscene from the perspective of the named character.\n\nThe timeline specification can limit characters by the number of scenes they appear in:\n```json lines\n{\n  \"timelines\": {\n    \"minimum\": 4    // Sheets will not be generated for characters appearing in fewer than four scenes\n  }\n}\n```\nThe default minimum is zero, which will generate a sheet for every character.\n\nYou can also specify which characters to generate (with or without the minimum):\n```json lines\n{\n  \"timelines\": {\n    \"chars\": [\"Bob\", \"Shivanna\"],\n    \"minimum\": 4\n  }\n}\n```\nThis will only produce sheets for the two named characters if they appear in four or more scenes.\n\nBy default, character sheets include the %story.time and (if enabled) the relative time columns.\nYou can change this with the \"show\" option:\n```json lines\n{\n  \"timelines\": {\n    \"chars\": [\"Bob\", \"Shivanna\"],\n    \"minimum\": 4,\n    \"show\": [\"time\", \"_chron\"]    // Only output the named columns. Use an empty array for none.\n  }\n}\n```\n\n### Word and Phrase Use and Clustering Analysis\n\nThe `analysis` flag is a boolean true or false (default). When enabled, \nthe program will generate two tables on an Analysis sheet.\n\nNote that the analysis process is compute-intensive and will take some time to process.\n\nThe tables present frequency, \"Clumpiness\", and \"Average Clumpiness\". \nThe first reports on individual words; the second reports on phrases of two or three words.\n\n\"Clumpiness\" is a metric that is higher when occurrences of the word/phrase \nare closer to each other in a scene. 
The higher this number is, \nthe more likely that the word or phrase is repeated more than once in close proximity.\n\n### Word Counts\n\nThe ```wordCounts``` flag produces a sheet with statistics on the novel's scenes.\nThe sheet columns tally word counts and list the number of scenes,\nbroken down by active, inactive, and total.\nThe rows list this data by scene status with totals at the bottom.\n\nIf you don't want to see the word/scene count statistics,\nthey can be disabled in the JSON format specification.\n```json\n{\n  \"wordCounts\": false\n}\n\n```\n\n### Wrap\n\nThe \"wrap\" setting specifies the maximum width of a column in characters. The default is 40.\nThis does not apply to the CSV output file format.\n\n## Release Notes\n\n### 1.4.1 2026-05-11\n\nFixed:\n- A renaming issue caused the _chron column to come up blank.\n\n### 1.4.0 2026-05-08\n\nFixed:\n- A bug where custom story attributes weren't being reported.\n\nAdded:\n- Relative time calculations\n- Character timelines\n- Analysis tools\n\nChanged:\n- Improved default headers for custom story attributes.\nFor example, %story:my_thing will use \"My Thing\" as the header instead of \"My_thing\".\n- Column headers are now frozen so they don't scroll off the sheet.\n\n### 1.3.0 2025-11-28\n\n- Added the capability to extract characters and locations\n- Improved column width estimation\n- Major code re-work under the hood.\n- \"columns\" element renamed to \"scenes\". \"columns\" is still recognized for backwards compatibility.\n\n### 1.2.2 2025-11-02\n\n- Repeated references were also being separated by double line feeds. \nThis update uses a single line feed for references, two for synopsis/story comments.\n\n### 1.2.1 2025-10-23\n\n- novelWriter 2.8 will support repeated named comments (e.g. synopsis and the story tags),\nseparating multiple occurrences with two line feeds. 
This update supports that behaviour\n  (previously a repeated comment would overwrite any earlier ones.)\n\n### 1.2.0 2025-09-20\n\n- Improved the output when there are insufficient arguments on the command line.\n- If * is supplied as the first argument, the program will prompt for arguments. \n- Fixed a bug that was generating messy warnings.\n\n### 1.1.0 2025-09-09\n\nAdded:\n\n- Ability to embed date/time in output filename using @d@; set timezone with @z@ in output path.\n- Improved word counts. Counts now exclude those in novelWriter commands.\n- Better handling of files with multiple scenes, with individual word counts per scene.\n- A new _active column lists the scene active state.\n- A new _status column shows the scene status (using the text labels, not icons).\n- A new statistics table counts scene and word totals, broken out by status and active state.\n- It is now possible to set number formatting by column with the numberFormat style setting.\n\n### 1.0.0 2025-04-09\n\nInitial release with format feature to customize extracted data.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finstancezero%2Fnovelwriterextract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finstancezero%2Fnovelwriterextract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finstancezero%2Fnovelwriterextract/lists"}