
https://github.com/kyukyunyorituryo/AozoraEpub3

Aozora Bunko text → ePub3 converter

bookwalker epub3 ibooks java kindle kobo


Modified AozoraEpub3
============
Description of the Modified Version
------------
This is a fork that aims to follow the "[EBPAJ EPUB 3 File Creation Guide](http://ebpaj.jp/counsel/guide)" as closely as possible. When AozoraEpub3 is used for e-book publishing, files produced by the original version may not pass store review. This fork validates its EPUB output to make sure it displays without problems in many EPUB viewers.

Because of Java licensing issues, this version is built with AdoptOpenJDK. From https://adoptopenjdk.net/releases.html, choose OpenJDK 21 (LTS), the HotSpot JVM, and your OS, and install the JRE.

Downloads
============
Check the [releases page](https://github.com/kyukyunyorituryo/AozoraEpub3/releases) for the latest distribution.

Description
------------
This tool converts Aozora Bunko text files, including their annotation notes, into ePub 3 files. It can:
* Convert an Aozora Bunko text file and its images (or a zip containing them) to ePub 3
* Retrieve the HTML of a Web novel, save it in Aozora Bunko text format, and convert it to ePub 3
* Convert a zip/rar of images to ePub 3

Usage Notes
------------
Please use it at your own risk.

* Some notes are not yet supported.
* XHTML produced from notes outside the Aozora Bunko specification may contain errors and fail to display.
* When 4-byte characters are output, text after such a character may not be displayed on devices that do not support them. (If the 4-byte character conversion option is unchecked, the note is shown in small characters instead.)
Please report bugs, or notes that cannot be converted, on the distribution site.

Notes on Conversion
------------
Malformed comments, unsupported notes, and private (gaiji) characters that could not be converted are reported in the log during conversion; correct the source text accordingly.

- Notes outside the specification and some variant forms of notes are not supported.
- A gaiji note nested inside another gaiji note causes an error (no fix is planned). For example, if ※[#「姉」の正字、「女+※[#第3水準1-85-57]のつくり」 appears in the log, correct that part of the source text to ※[#「姉」の正字、U+59CA].
- Delete comment notes that contain other notes from the source text.
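
Nested gaiji notes like the one above can be found before conversion by tracking bracket depth. The following is a minimal sketch, not AozoraEpub3's own parser: the class `NestedGaijiCheck` is hypothetical, and it follows the half-width `※[#...]` notation used in this README, whereas actual Aozora Bunko files use full-width ［＃...］ brackets.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-check: reports gaiji notes (※[#...]) that contain another gaiji note. */
final class NestedGaijiCheck {

    private static final String OPEN = "※[#"; // README notation; real files use ※［＃

    static List<String> findNested(String line) {
        List<String> nested = new ArrayList<>();
        int start = line.indexOf(OPEN);
        while (start >= 0) {
            // Track bracket depth so an inner note's ']' does not close the outer note.
            int depth = 0;
            int end = -1;
            for (int i = start + 1; i < line.length() && end < 0; i++) {
                char c = line.charAt(i);
                if (c == '[') {
                    depth++;
                } else if (c == ']' && --depth == 0) {
                    end = i;
                }
            }
            if (end < 0) {
                break; // unterminated note: stop scanning this line
            }
            String note = line.substring(start, end + 1);
            if (note.indexOf(OPEN, 1) >= 0) {
                nested.add(note); // a second "※[#" inside means the note is nested
            }
            start = line.indexOf(OPEN, end + 1);
        }
        return nested;
    }

    public static void main(String[] args) {
        String line = "※[#「姉」の正字、「女+※[#第3水準1-85-57]のつくり」]";
        System.out.println(findNested(line).size()); // 1
    }
}
```

The check only reports the outer note; rewriting it, for example to a `U+` code, is still done by hand.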

System Requirements
------------
Java 21 or later (http://www.java.com/ja/)
AdoptOpenJDK, now Eclipse Temurin (https://adoptium.net/temurin/releases/)

Runs on Windows XP or later, Ubuntu, and Mac OS X.

How to use
------------
#### Installation
Unzip AozoraEpub3-*.zip into any folder.

#### Start
Double-click AozoraEpub3.jar to run it,
or run "java -jar AozoraEpub3.jar" from the console.
* If java is not on your PATH, specify its full path.
Example: "C:\Program Files\Eclipse Adoptium\jre-21.0.1.12-hotspot\bin\java.exe" -jar AozoraEpub3.jar

#### EPUB convert
Drag and drop one or more Aozora Bunko text files (extension txt or zip) onto the application window to convert them. (This is the same as opening them with "File Selection".)
An "Original Filename.epub" or "[author's name] Title.epub" file is generated in the same folder as the text file.
* Converting a zip that contains only images and no text produces an ePub file containing only images.

#### Converting Web novels
You can also drag and drop the URL, or a URL shortcut (.url), of a Web novel's table-of-contents page to retrieve and convert the novel. (Only sites that have a definition file under web/ are supported.)

Supported sites: "syosetu.com" (and related sites), "NEWVEL-LIBRARY", "FC2 Novels", "HAMELN", "Arcadia", "novelist.jp", "dNoVeLs", "Kakuyomu", and "novelup.plus".
https://syosetu.com , https://novel.fc2.com/ , https://syosetu.org/ , http://www.mai-net.net/ , http://novelist.jp/ , http://www.dnovels.net/ , https://kakuyomu.jp/ , https://novelup.plus/

Screen Settings
------------
#### Title
* In the text
Sets whether the title and author name appear at the start of the text, and in what order.
If there are three lines, the line after the title is treated as a subtitle and joined to the title.
The title and author name lines in the text are displayed in large characters.
Images and blank lines are skipped.
Select "First Publisher" to treat line 1 as the publisher.
* Filename Override
Takes the title and author name from a file name of the form "[author's name] Title.epub".
The style of the title and author lines in the body still follows the "In the text" setting.

#### Cover
* Cover
Specifies the cover image: [First illustration], [Same image as input file name (png, jpg)], [No cover], or a file path or URL.
[Same image as input file name (png, jpg)] uses an image with the same base name as the input file as the cover.
(Extensions are checked in this order: png, jpg, jpeg, gif.)
If no cover image is set but a cover.png, cover.jpg, or cover.jpeg file exists in the text file's folder, it is set as the cover on the pre-conversion check screen.
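
The cover lookup order described above can be sketched as follows. This is an illustration under assumptions, not the tool's code: `CoverLookup` and `findCover` are hypothetical names, and for brevity the `cover.*` fallback, which the tool applies on the pre-conversion check screen, is folded into the same method.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the cover lookup order; not AozoraEpub3's actual code. */
final class CoverLookup {

    // Extension order from the README: png, jpg, jpeg, gif.
    private static final List<String> SAME_NAME_EXTS = List.of("png", "jpg", "jpeg", "gif");
    // Fallback names from the README: cover.png | jpg | jpeg.
    private static final List<String> COVER_NAMES = List.of("cover.png", "cover.jpg", "cover.jpeg");

    static Optional<Path> findCover(Path textFile) {
        Path dir = textFile.toAbsolutePath().getParent();
        String name = textFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;

        // 1. An image with the same base name as the input file.
        for (String ext : SAME_NAME_EXTS) {
            Path candidate = dir.resolve(base + "." + ext);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        // 2. Otherwise a cover.* file in the same folder.
        for (String coverName : COVER_NAMES) {
            Path candidate = dir.resolve(coverName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cover-demo");
        Path text = Files.createFile(dir.resolve("novel.txt"));
        Files.createFile(dir.resolve("novel.gif"));
        Files.createFile(dir.resolve("novel.jpg"));
        System.out.println(findCover(text).map(Path::getFileName).orElse(null)); // novel.jpg
    }
}
```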

#### Page Output
* Cover
Adds a cover page (image at 100% width) at the beginning of the ePub.
Specify this when you want the cover to appear in Reader and similar devices.
(If an illustration in the text is used as the cover, it is moved to the beginning.)
* Title
Outputs the title, author name, and similar lines on a separate page, centered on the page (in vertical or horizontal writing).
* Table of Contents
Outputs a table of contents page.
You can choose vertical or horizontal writing.

* Extension
Specifies the extension of the output file.
Choose ".kepub.epub" for Kobo.
".fxl.kepub.epub" is the extension for Kobo fixed-layout books.
Select ".mobi" to convert the epub to mobi with Kindlegen.exe.
Select ".mobi + .epub" to also keep the unconverted epub file.
* Use title for output file name
Names the output file "[author's name] Title.epub".
If this is not set, "Original Filename.epub" is output.
* ePub file overwrite
If a file with the same name (Original Filename.epub) already exists, it is overwritten.
If unchecked, files whose output already exists are not converted.
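
The file-naming rules above can be expressed as a small function. It is a sketch: `OutputName.build` is a hypothetical helper, and the replacement of characters that Windows forbids in file names is an assumption, since this README does not say how such characters are handled.

```java
/** Hypothetical sketch of the output file naming rules; not AozoraEpub3's actual code. */
final class OutputName {

    static String build(String inputFileName, String author, String title,
                        boolean useTitle, String ext) {
        if (useTitle && title != null && !title.isEmpty()) {
            // "[author's name] Title" when an author is known, otherwise just the title.
            String name = (author == null || author.isEmpty()) ? title : "[" + author + "] " + title;
            return sanitize(name) + ext;
        }
        // Otherwise keep the input file name and swap its extension.
        int dot = inputFileName.lastIndexOf('.');
        String base = dot > 0 ? inputFileName.substring(0, dot) : inputFileName;
        return base + ext;
    }

    // Assumption: characters that Windows forbids in file names are replaced with '_'.
    private static String sanitize(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|]", "_");
    }

    public static void main(String[] args) {
        System.out.println(build("neko.txt", "Natsume Soseki", "I Am a Cat", true, ".epub"));
        // [Natsume Soseki] I Am a Cat.epub
        System.out.println(build("neko.txt", "Natsume Soseki", "I Am a Cat", false, ".kepub.epub"));
        // neko.kepub.epub
    }
}
```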

#### Destination
* Destination
Select "Same as Input" to output to the folder containing the input file.
To choose another destination, select "path specification" and enter the full path.

#### Conversion Setup
* Bookmark ID output
Adds an id in the kobo.1.1 format to the p tag of each line so that bookmarks work in Kobo kepub files.
It is not needed for anything other than Kobo kepub.
* 4-byte character conversion
If unchecked, 4-byte characters are replaced with 〓 and the note is displayed in small characters after it.
(Kobo has a problem where the rest of a line is not displayed after a 4-byte character.)
Reader displays 4-byte characters that are in JIS.
(However, kanji that Reader cannot display are shown as they are, and the small-character note is not displayed.)
* Vertical/Horizontal
Specifies vertical or horizontal writing for the body text.
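
Here "4-byte" means characters outside the Basic Multilingual Plane (U+10000 and above), which take four bytes in UTF-8 and a surrogate pair in Java strings. The sketch below shows the unchecked behavior under assumptions: the replacement mark is taken to be 〓 (the geta mark), and the `gaiji-note` class and `U+XXXXX` note text are hypothetical stand-ins for the small-character note the tool actually writes.

```java
/**
 * Hypothetical sketch: replaces characters above U+FFFF with 〓 followed by a small note.
 * The "gaiji-note" class and the "U+XXXXX" note text are assumptions for illustration.
 */
final class FourByteFallback {

    private static final char GETA = '\u3013'; // 〓 (geta mark)

    static String convert(String line) {
        StringBuilder out = new StringBuilder();
        line.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                // Code points U+10000 and above need 4 bytes in UTF-8.
                out.append(GETA)
                   .append("<span class=\"gaiji-note\">")
                   .append(String.format("U+%X", cp))
                   .append("</span>");
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x20B9F)) + "b";
        System.out.println(convert(text)); // a〓<span class="gaiji-note">U+20B9F</span>b
    }
}
```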

#### Transform
* Input character code
Specifies the character encoding of the input Aozora Bunko file. This is usually MS932 (Shift_JIS).
* File Selection
Selecting a file converts it in the same way as dragging and dropping it onto the text area.
* Pre-conversion Check
Displays a dialog where you can review and edit the title, author, and cover before converting.
The metadata is created from the edited title and author name.
The title and author lines in the body text are not changed.
For the cover image, you can choose to trim it or to keep the original image.
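
In Java terms, the input character code corresponds to a `Charset`. A minimal sketch with a hypothetical `ReadAozoraText` helper: "MS932" is an alias of Java's `windows-31j` charset, and stripping a UTF-8 byte-order mark is an assumption about what a reader of such files needs, not documented behavior of the tool.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing how the input character code maps to a Java charset. */
final class ReadAozoraText {

    // "MS932" is an alias of windows-31j, Microsoft's variant of Shift_JIS.
    static final Charset MS932 = Charset.forName("MS932");

    static List<String> readLines(Path file, String encoding) throws IOException {
        Charset cs = "UTF-8".equalsIgnoreCase(encoding) ? StandardCharsets.UTF_8 : MS932;
        List<String> lines = new ArrayList<>(Files.readAllLines(file, cs));
        // Java's UTF-8 decoder keeps a byte order mark; drop it so it does not reach the title.
        if (!lines.isEmpty() && lines.get(0).startsWith("\uFEFF")) {
            lines.set(0, lines.get(0).substring(1));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("aozora", ".txt");
        Files.writeString(tmp, "\u9752\u7A7A\u6587\u5EAB\r\n", MS932); // 青空文庫 encoded as MS932
        System.out.println(readLines(tmp, "MS932").size()); // 1
    }
}
```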

----
#### Picture Setting 1
* Illustration exclusion
Illustrations in the text are neither displayed nor stored in the ePub file. The cover image and gaiji (external character) images are still output.
* Screen size
Used to determine the screen aspect ratio and the display size of small images that are not enlarged.
* Cover size
The cover image is reduced to fit within this size.
* Image magnification
Sets the image width as a percentage of the screen, based on the image's pixel count and the screen resolution.
(If the aspect ratio changes when the screen is rotated, the image may extend past the bottom of the page.)
* Image wrap
Lets text wrap around the top and bottom of images in the text.
Only applies to images smaller than the specified size.
* Image Single Page
Sets the image size at which an image gets its own page: a page break is inserted before and after the image,
and it is output on a page showing only the image.
* Thumbnails view
Images smaller than the screen size are enlarged to fit the height or width of the screen.
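
The screen size, image magnification, and enlargement options above boil down to aspect-ratio arithmetic. This sketch is an interpretation, not the tool's formula: `FitToScreen` is hypothetical, `fit` scales an image to fit inside the screen while keeping its aspect ratio, and `widthPercent` assumes the magnification is the image width relative to the screen width, capped at 100%.

```java
/** Hypothetical sketch of fitting an image to the configured screen size. */
final class FitToScreen {

    /** Returns {width, height} after fitting the image inside the screen, keeping its aspect ratio. */
    static int[] fit(int imgW, int imgH, int screenW, int screenH, boolean enlargeSmall) {
        double scale = Math.min((double) screenW / imgW, (double) screenH / imgH);
        if (scale > 1.0 && !enlargeSmall) {
            scale = 1.0; // small images keep their original size unless enlargement is enabled
        }
        int w = Math.max(1, (int) Math.round(imgW * scale));
        int h = Math.max(1, (int) Math.round(imgH * scale));
        return new int[] {w, h};
    }

    /** Image width as a percentage of the screen width, capped at 100%. */
    static double widthPercent(int imgW, int screenW) {
        return Math.min(100.0, imgW * 100.0 / screenW);
    }

    public static void main(String[] args) {
        int[] size = fit(1200, 1600, 600, 800, false);
        System.out.println(size[0] + "x" + size[1]); // 600x800
        System.out.println(widthPercent(300, 600));  // 50.0
    }
}
```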
#### Picture Setting 2
* JPEG Image Quality When Reduced
JPEG compression quality used when images are scaled; 100 is the highest quality.
* Image reduction/rotation
Reduces images to at most the specified number of pixels (using bicubic scaling).
Set this when the device's screen size is limited.
Images can also be rotated to match the aspect ratios of the image and the screen.
* Margin removal
Removes the top, bottom, left, and right margins from images.
Available only for image-only zip/rar files.
PNG output takes longer to read and write, and the files are slightly larger because of the compression rate.
(Recommended settings: Horizontal 15%, Vertical 10-20%, White level 85-90%, Additional margin 0.5-1.0%)
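
Margin removal amounts to finding the bounding box of non-white pixels. A minimal sketch under assumptions: `MarginTrim` is hypothetical, brightness uses BT.601 luma weights, a pixel counts as content when its brightness is below the white level, and each side is trimmed by at most the given fraction of the image size; the additional-margin setting is not modeled.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical sketch of white-margin detection; not AozoraEpub3's algorithm. */
final class MarginTrim {

    static Rectangle contentBounds(BufferedImage img, double whiteLevel,
                                   double maxTrimX, double maxTrimY) {
        int w = img.getWidth();
        int h = img.getHeight();
        int threshold = (int) Math.round(whiteLevel * 255);
        int left = w, right = -1, top = h, bottom = -1;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                if (luma < threshold) { // darker than the white level: part of the content
                    left = Math.min(left, x);
                    right = Math.max(right, x);
                    top = Math.min(top, y);
                    bottom = Math.max(bottom, y);
                }
            }
        }
        if (right < 0) {
            return new Rectangle(0, 0, w, h); // blank page: keep everything
        }
        // Never trim more than the allowed fraction from each side.
        int maxX = (int) (w * maxTrimX);
        int maxY = (int) (h * maxTrimY);
        left = Math.min(left, maxX);
        top = Math.min(top, maxY);
        right = Math.max(right, w - 1 - maxX);
        bottom = Math.max(bottom, h - 1 - maxY);
        return new Rectangle(left, top, right - left + 1, bottom - top + 1);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 100; y++) {
            for (int x = 0; x < 100; x++) {
                boolean ink = x >= 20 && x < 80 && y >= 30 && y < 70;
                img.setRGB(x, y, ink ? 0x000000 : 0xFFFFFF);
            }
        }
        System.out.println(contentBounds(img, 0.9, 0.5, 0.5)); // x=20, y=30, width=60, height=40
    }
}
```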

----
#### Advanced Settings
* Full-width spaces within sentences
If a full-width space follows "？" or a similar character and the line wraps there, the space looks like a paragraph indent at the start of the next line, so the space is hidden.
* Blank line removal
Reduces each run of one or more consecutive blank lines by the specified number of lines.
At least one blank line is kept within the three lines after a heading line.
If a maximum is specified, runs of blank lines are reduced to at most that many lines.
* Indent
If a line does not start with a character such as "―", "（", "＜", or "［", a full-width space is added at the beginning of the line.
* Automatic Tate-chu-yoko
Two half-width digits and runs of 2 to 3 "!" and "?" characters are set horizontally within vertical text (Tate-chu-yoko).
In the settings, you can also apply it to single digits and three-digit numbers, and choose whether "!?" is set in Tate-chu-yoko or as a single character.
It is not applied when there is no full-width character immediately before or after (spaces in between are ignored), or inside a horizontal-text note.
* Comment output
Specifies how comment blocks delimited by lines of 50 or more "-" characters are output.
* Forced page break
When enabled, forces a page break once the specified number of bytes is exceeded.
This keeps each xhtml file in the ePub from growing too large in size and line count, which would make processing slow in Reader and similar devices.
No page break is inserted inside a block note such as an indentation block.
Every line: forces a page break at the next line once the specified number of bytes is exceeded.
Blank line: forces a page break at a run of blank lines of at least the specified count, once the specified number of bytes is exceeded.
Before heading: forces a page break before a table-of-contents heading line once the specified number of bytes is exceeded.
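
The basic automatic Tate-chu-yoko rule can be approximated with a regular expression. This is a simplified sketch, not AozoraEpub3's implementation: `AutoTcy` and the `tcy` CSS class are hypothetical, any non-ASCII character stands in for "full-width", and ignoring spaces, the optional one- and three-digit cases, and horizontal-text notes are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of automatic Tate-chu-yoko detection. */
final class AutoTcy {

    // Exactly two half-width digits, or a run of 2 to 3 '!'/'?' characters,
    // with a non-ASCII (assumed full-width) character directly before and after.
    private static final Pattern TCY = Pattern.compile(
            "(?<=[^\\x00-\\x7F])([0-9]{2}|[!?]{2,3})(?=[^\\x00-\\x7F])");

    static String apply(String line) {
        Matcher m = TCY.matcher(line);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, "<span class=\"tcy\">$1</span>");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("第12話")); // 第<span class="tcy">12</span>話
        System.out.println(apply("第123話")); // unchanged: three digits are not matched
    }
}
```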

----
#### TOC Settings
* Table of Contents Output
Maximum characters: sets the maximum number of characters in a table-of-contents entry; longer entries are truncated and "..." is appended.
Cover page: adds the cover page to the table of contents. It is not output if there is no cover image.
Join next line: if the chapter title is on the line after the heading, that line is joined to the table-of-contents entry.
Suppress consecutive headings: prevents runs of consecutive headings, such as those extracted automatically from a contents page in the text, from being added to the table of contents.
* Table of Contents Extraction
After page break: adds the first line after each page break to the table of contents.
Notes: adds the text of the selected heading notes to the table of contents. For block notes, only the following line is used (two lines when joined).
Chapter headings: automatically extracts numbered chapter names and adds them to the table of contents.
(Chapter ~/Chapter ~/Part ~/Part ~ Part ~ Section ~ Section/Chapter ~/Part ~/Chapter ~/Chapter/Prologue/Epilogue/Monologue/Introduction/Final Chapter/Interchapter/Change Chapter/Intermission)
Number only: adds lines containing only a number to the table of contents.
Number + heading: adds lines consisting of a number, spaces, and heading text to the table of contents.
Number in parentheses: adds lines containing only a number in parentheses ([] [] ()) to the table of contents.
Number in parentheses + heading: adds lines consisting of a parenthesized number, spaces, and heading text to the table of contents.
Other pattern: specifies a table-of-contents extraction pattern as a regular expression. It is matched against the line with leading and trailing blanks and note tags removed.
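
Chapter-heading extraction and the maximum-characters limit can be sketched together. The pattern here is an illustrative subset (第N章/話/部/節, プロローグ, エピローグ), not the list the tool uses, and `TocExtract` is hypothetical; like the "Other pattern" option it matches the line after stripping leading and trailing blanks, but it does not remove note tags.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of chapter-heading extraction for the table of contents. */
final class TocExtract {

    // 第 + number (ASCII, full-width, or kanji digits) + 章/話/部/節, or a fixed heading,
    // optionally followed by a space (ASCII or full-width) and a title.
    private static final Pattern CHAPTER = Pattern.compile(
            "^(?:第[0-9０-９一二三四五六七八九十百千]+[章話部節]|プロローグ|エピローグ)(?:[\\s\\u3000].*)?$");

    /** Returns the TOC label for a line, or null if the line is not a chapter heading. */
    static String tocLabel(String line, int maxChars) {
        String trimmed = line.strip(); // also strips the full-width space U+3000
        if (!CHAPTER.matcher(trimmed).matches()) {
            return null;
        }
        // Count code points, not chars, so surrogate pairs are never split.
        if (trimmed.codePointCount(0, trimmed.length()) <= maxChars) {
            return trimmed;
        }
        return trimmed.substring(0, trimmed.offsetByCodePoints(0, maxChars)) + "...";
    }

    public static void main(String[] args) {
        System.out.println(tocLabel("　第一章　吾輩は猫である", 5)); // 第一章　吾...
        System.out.println(tocLabel("本文の一行", 5));             // null
    }
}
```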

#### Style Settings
* Row Height
The height of a line, in characters. 1.8 leaves 0.8 characters between lines.
* Text Size
Specifies a scale factor to adjust the standard text size.
* Text margins (@ page margin)
Specifies the top, bottom, left, and right margins of the page.
* Text margins (html margin)
Specifies the top, bottom, left, and right margins of the page as margins on the html element.
This is used because @page does not work in Reader.

* Voiced/semi-voiced characters
You can choose to output them as is or to stack the voicing mark onto the character with position settings.
This does not apply within ruby.
Only confirmed to work in Reader, Kobo, and Kindle.

CUI Usage
------------
#### Running from the Command Line
Usage: java -cp AozoraEpub3.jar AozoraEpub3 \[-options] input_files(txt,zip)

**Options**
-h, --help
Show usage
-i, --ini
Loads settings (other than those given as command-line options) from the specified ini file
(Default values are used if no ini file, such as AozoraEpub3.ini, is specified)

-enc
Input file encoding \[MS932] (default) [UTF-8]
-t
Title type in the text \[0: Title - Author Name] (default) [1: Author Name - Title] [2: Title - Author Name (subtitle preference)] [3: Title Only] [4: None]
-c, --cover
Cover image \[0: First illustration] [1: Same image as file name] [File name or URL]
-tf
Use the input file name as the title

-d, --dst
Destination path
-ext
Output file extension \[.epub] (default) [.kepub.epub]
-of
Match the output file name to the input file name
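
To call the CUI from another Java program, the arguments can be assembled into a list for `ProcessBuilder`. A sketch with `CliCommand` as a hypothetical helper; it assumes the short options are written with a single dash (`-enc`, `-ext`, `-d`), as is usual for commons-cli, and it only builds the command without running it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds an AozoraEpub3 CUI command line; null skips an option. */
final class CliCommand {

    static List<String> build(String jar, String encoding, String ext, String dst, List<String> inputs) {
        List<String> cmd = new ArrayList<>(List.of("java", "-cp", jar, "AozoraEpub3"));
        if (encoding != null) {
            cmd.add("-enc");
            cmd.add(encoding); // MS932 (default) or UTF-8
        }
        if (ext != null) {
            cmd.add("-ext");
            cmd.add(ext);      // .epub (default) or .kepub.epub
        }
        if (dst != null) {
            cmd.add("-d");
            cmd.add(dst);      // destination path
        }
        cmd.addAll(inputs);    // input files (txt, zip)
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("AozoraEpub3.jar", "UTF-8", ".kepub.epub", "out", List.of("novel.txt"));
        System.out.println(String.join(" ", cmd));
        // Running it requires AozoraEpub3.jar in the working directory, e.g.:
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
    }
}
```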

File Description
------------
#### Program Files
* AozoraEpub3.jar
ePub 3 conversion tool
Double click or "java -jar AozoraEpub3.jar"
* AozoraEpub 3.ico
Use this icon when creating a shortcut (an icon cannot be set on the jar file itself)
* External libraries
The external libraries used (commons-cli, commons-compress, Velocity, JAI) are declared in the build.gradle file.

#### ePub 3 template
* template/*
ePub 3 template
* template/item/style/*.css
ePub 3 style

#### Conversion Configuration File
* chuki_tag_suf.txt
Converts forward-reference notes into start/end notes
* chuki_tag.txt
Converts notes into ePub tags
* chuki_alt.txt
Converts gaiji (private character) notes into alternate text
* chuki_utf.txt
Converts gaiji notes without a code into UTF-8 characters
* chuki_ivs.txt
Converts gaiji notes without a code into UTF-8 characters with IVS
* chuki_latin.txt
Converts Latin character notes into UTF-8
* replace.txt
Character substitution configuration file

#### Web novel configuration file
* web/domain_name/extract.txt
Web novel extraction definition file
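
Locating the definition file for a given novel URL can be sketched as follows. `WebDefinition` is hypothetical, and the sketch assumes the folder under `web/` is named after the URL's host name; check the folders shipped in `web/` for the actual naming.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch: finds web/<host>/extract.txt for a Web novel URL. */
final class WebDefinition {

    static Optional<Path> definitionFor(String url, Path webDir) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return Optional.empty();
        }
        // Assumption: the folder is named after the URL's host, e.g. web/kakuyomu.jp/
        Path def = webDir.resolve(host).resolve("extract.txt");
        return Files.isRegularFile(def) ? Optional.of(def) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path web = Files.createTempDirectory("web");
        Files.createDirectories(web.resolve("kakuyomu.jp"));
        Files.writeString(web.resolve("kakuyomu.jp").resolve("extract.txt"), "");
        System.out.println(definitionFor("https://kakuyomu.jp/works/1", web).isPresent()); // true
        System.out.println(definitionFor("https://example.com/", web).isPresent());        // false
    }
}
```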

#### Private Character Font File
* gaiji/*
Placing a single-character font file here displays the corresponding private (gaiji) character in that font.

Supported Notes
------------
#### Configuration file for basic notes
See chuki_tag.txt

* Support status by device
Horizontal-text notes are not supported on Kindle.

#### Handled as Special Cases in the Program
- Centering a page horizontally (left and right)
- Convert [#注記付き]○[#「△」の注記付き終わり] and [#「○」に「△」の注記] to ｜○《△》
- [#「○」に×傍点] → converted to ruby with the same number of characters as ○
- Suppress [#ここで字下げ終わり] for consecutive indentation
- Numeric calculation of indentation when indentation notes are combined, such as
[#ここから○字下げ、折り返して●字下げ][#ここから○字下げ、●字詰め]
- Combined indentation notes merge their classes (ruled, centered)
- Images: [#説明(ファイル名.拡張子)] ("description (filename.extension)") → `<img src="filename.extension"/>`
- Suppression of automatic Tate-chu-yoko in horizontal text
- Line breaks added for warichu (split inline notes)
- Page break before the "底本:" (source text) line, unless a page break already precedes it
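
The ｜○《△》 ruby notation that these notes are normalized to maps onto HTML ruby markup. A minimal sketch, not AozoraEpub3's converter: `RubySketch` and the exact `<ruby>`/`<rt>` output are assumptions. Without ｜, the base text is the run of kanji directly before 《, following Aozora Bunko's ruby convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: converts Aozora Bunko ruby notation to HTML <ruby> markup. */
final class RubySketch {

    // ｜base《reading》: the base text is marked explicitly with ｜ (U+FF5C).
    private static final Pattern EXPLICIT = Pattern.compile("｜([^｜《》]+)《([^《》]+)》");
    // base《reading》 without ｜: the base is the run of kanji directly before 《.
    private static final Pattern IMPLICIT = Pattern.compile("([\\p{IsHan}々〆ヶ]+)《([^《》]+)》");

    static String toHtml(String line) {
        // Explicit ruby first, so its base text is not re-split by the kanji rule.
        return replace(IMPLICIT, replace(EXPLICIT, line));
    }

    private static String replace(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String html = "<ruby>" + m.group(1) + "<rt>" + m.group(2) + "</rt></ruby>";
            m.appendReplacement(sb, Matcher.quoteReplacement(html));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("吾輩《わがはい》は猫である"));
        // <ruby>吾輩<rt>わがはい</rt></ruby>は猫である
        System.out.println(toHtml("ある｜日の事《ひのこと》"));
        // ある<ruby>日の事<rt>ひのこと</rt></ruby>
    }
}
```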

Supported Private and Special Characters
------------
* Gaiji notes that include a code are output as UTF-8 characters (both Unicode and JIS codes are accepted)
Aozora Bunko gaiji notes (さんずい and てへん are kanji radicals; 第3水準 means JIS X 0213 level 3):
※[#「さんずい+垂」、unicode6DB6]
※[#「さんずい+垂」、U+6DB6、235-7]
※[#「さんずい+垂」、UCS6DB6、235-7]
※[#「てへん+劣」、第3水準1-84-77]
Code-only gaiji notes:
※[#U+845b]
※[#u+845b-e0100]
※[#U+845b-U+e0100]
Gaiji notes without a code:
Note names are converted to UTF-8 using the correspondence tables (chuki_utf.txt, chuki_ivs.txt).
IVS characters can optionally be output.

* Gaiji notes that have no UTF-8 equivalent are output as alternate characters (chuki_alt.txt)

* Aozora Bunko special characters (《 》 ［ ］ 〔 〕 ｜ ＃ ※)
※[#始め二重山括弧、1-1-52] (opening double angle bracket) → 《
※[#終わり二重山括弧、1-1-53] (closing double angle bracket) → 》
※[#始め角括弧、1-1-46] (opening square bracket) → ［
※[#終わり角括弧、1-1-47] (closing square bracket) → ］
※[#始めきっこう(亀甲)括弧、1-1-44] (opening tortoise-shell bracket) → 〔
※[#終わりきっこう(亀甲)括弧、1-1-45] (closing tortoise-shell bracket) → 〕
※[#縦線、1-1-35] (vertical line) → ｜
※[#井げた、1-1-84] (number sign) → ＃
※[#米印、1-2-8] (reference mark) → ※

* The kunoji-ten repetition marks "／＼" and "／″＼" are output as UTF-8 characters (〳〵 and 〴〵)
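
Resolving the code-bearing and special-character notes above can be sketched with two regular expressions and a lookup table. This is not the tool's code: `GaijiResolver` is hypothetical, it uses the half-width `※[#...]` notation shown in this README, and notes it cannot resolve (such as JIS-only codes or code-less names) are left unchanged instead of being looked up in chuki_utf.txt or chuki_alt.txt.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of resolving gaiji notes written in this README's notation. */
final class GaijiResolver {

    private static final Pattern NOTE = Pattern.compile("※\\[#([^\\]]*)\\]");
    // U+xxxx, UCSxxxx or unicodexxxx, optionally followed by an IVS selector (-e0100 or -U+e0100).
    private static final Pattern CODE = Pattern.compile(
            "(?:[Uu]\\+|UCS|unicode)([0-9A-Fa-f]{4,6})(?:-(?:[Uu]\\+)?([0-9A-Fa-f]{4,6}))?");

    // Special-character notes from the table above: note name → character.
    private static final Map<String, String> SPECIAL = Map.of(
            "始め二重山括弧", "\u300A",          // 《
            "終わり二重山括弧", "\u300B",        // 》
            "始め角括弧", "\uFF3B",              // ［
            "終わり角括弧", "\uFF3D",            // ］
            "始めきっこう(亀甲)括弧", "\u3014",   // 〔
            "終わりきっこう(亀甲)括弧", "\u3015", // 〕
            "縦線", "\uFF5C",                    // ｜
            "井げた", "\uFF03",                  // ＃
            "米印", "\u203B");                   // ※

    /** Returns the character(s) for one note body, or null if this sketch cannot resolve it. */
    static String resolve(String body) {
        String special = SPECIAL.get(body.split("、", 2)[0]); // name = text before the first 、
        if (special != null) {
            return special;
        }
        Matcher m = CODE.matcher(body);
        if (!m.find()) {
            return null;
        }
        int base = Integer.parseInt(m.group(1), 16);
        if (!Character.isValidCodePoint(base)) {
            return null;
        }
        StringBuilder sb = new StringBuilder().appendCodePoint(base);
        if (m.group(2) != null) {
            int selector = Integer.parseInt(m.group(2), 16); // variation selector (IVS)
            if (!Character.isValidCodePoint(selector)) {
                return null;
            }
            sb.appendCodePoint(selector);
        }
        return sb.toString();
    }

    /** Replaces every resolvable ※[#...] note in a line and keeps the rest unchanged. */
    static String resolveAll(String line) {
        Matcher m = NOTE.matcher(line);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String ch = resolve(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(ch != null ? ch : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveAll("※[#「さんずい+垂」、U+6DB6、235-7]").codePointAt(0) == 0x6DB6); // true
        System.out.println(resolveAll("※[#始め角括弧、1-1-46]"));                                   // ［
    }
}
```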

Supported Original Notes
------------
- Circle from here
- Separator line
- Empty line
- Center
- Center
- Strikethrough
- Double strikethrough (same as strikethrough)
- Page left
- Page left
- Bottom left of page
- Bottom left of page
- Masatate

Unaddressed Notes
------------
- Corrections and "sic" (ママ) notes → ignored
- Left-side ruby
- In-line bottom alignment carried over to the next line
- Two-column layout

Scheduled Updates and Revision History
------------
See README_Changes.txt

License
------------
- Source code and binaries
GPL v3 ( http://www.gnu.org/licenses/gpl-3.0.html )

- Converted data
The copyright of a converted ePub file is the same as that of the input data.
Within the limits of that copyright, converted ePub files may be freely modified and distributed.