Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kiranandcode/html_gen
A simple HTML templating engine built using literate programming
https://github.com/kiranandcode/html_gen
html literate-programming rust template-engine
Last synced: 26 days ago
JSON representation
A simple HTML templating engine built using literate programming
- Host: GitHub
- URL: https://github.com/kiranandcode/html_gen
- Owner: kiranandcode
- License: mit
- Created: 2018-10-05T20:49:38.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-10-22T12:48:22.000Z (about 6 years ago)
- Last Synced: 2024-07-20T19:13:01.173Z (4 months ago)
- Topics: html, literate-programming, rust, template-engine
- Language: TeX
- Size: 376 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: readme.org
- License: LICENSE
Awesome Lists containing this project
README
* HtmlGen
Does the following sound familliar to you?A: "All I want is to template some files for a static site."
B: "Oh, okay, that's simply, just install npm, and packages yarn,gulp,react,express........."
A: "..."
Introducing HTMLGEN, a simple little standalone static templating engine for HTML.
Coded using literate programming.Now with 50 page book describing how to make your own HtmlGen!
Available at: https://github.com/Gopiandcode/html_gen/raw/master/html_gen.pdf
* Usage
The command line interface for the program is as follows.
#+begin_src shell
Usage:
htmlgen [OPTIONS] [BASEDIR]Simple templating engine for html documents
Positional arguments:
BASEDIR The project directory. If not specified, then --input,
--template and --output flags must be given.Optional arguments:
-h,--help Show this help message and exit
-o,--output OUTPUT Directory for the output files to be saved. It defaults
to BASEDIR/bin
-t,--template TEMPLATE
Directory to be searched to find templates. It defaults
to BASEDIR/template
-i,--input INPUT Directory in which the source files to be compiled are
located. It defaults to BASEDIR/bin
-e,--error ERROR Fail on the first undefined parameter
-d,--default DEFAULT Additional mapping for storing default values. If not
specified, the environment variable GOP_HTML_DEFAULTS
if defined, is used as a default.
#+end_srcTo use the program, first write a templated html fiile, with '{NAME}' for parameters:
#+begin_src html{title}
{style}{title}
{body}
#+end_src
Save this in the template directory of your project - we'll store it in './template/blog/simple.gop'.Then construct a mapping file in the '.gop' format (whitespace insensitive):
#+begin_src
#+template blog/simple.gop
title: A statically generated site without any JS¬
body:
I built this site without any JS baby.
I can put divswithin this templated content
without issue¬
style:
¬
#+end_srcThen we can run the application as "html_gen ."; this will produce the following html:
#+begin_src htmlA statically generated site without any JS
A statically generated site without any JS
I built this site without any JS baby.
I can put divswithin this templated content
without issue#+end_src
* Preamble
While the overall nature of this project is quite simple - just a bit of file loading and exports, we can leverage rust's ecosystem to make our development a little easier.** Crates
The crates we'll be using are as follows:
- *ArgParse* - This is a crate that I used a while back when making another command line application. It provides a very nice rustic interface over a library which produces command line interfaces compliant with most unix/linux standards.
an old fashioned regex.
#+begin_src rust :tangle src/main.rs :comments org
extern crate argparse;
#+end_src- *Regex* - We'll be taking advantage of this regex crate to make the parsing phase a little easier; while the stdlib provides some pretty useful string matching utilities, they don't quite match up to
#+begin_src rust :tangle src/main.rs :comments org
extern crate regex;
#+end_src** Standard Library Imports
We'll be using the path utilities provided by the standard library to help us navigate the filesystem in a cross platform way.
#+begin_src rust :tangle src/main.rs :comments org
use std::env;
use std::path;
use std::fs::File;
use std::io::Read;
use std::path::Path;
use std::io::Write;
#+end_src
** Module structure
We'll be splitting up our codebase as follows:#+begin_src rust :tangle src/main.rs :noweb yes :comments org
<>
#+end_src* Command Line Interface
Clearly this project is going to be a command line application, as the static generator will need to parse a document and construct the components.Using argparse - as imported in the preamble, we'll design a sweet and sexy interface to access our application. The main actions we'll allow a user to perform using this application will be as follows:
- *specify output folder* - by default the output of the compiled files are placed in ~./bin/~ dir, which is made if it does not exist.
- *specify template folder* - within a non-templated file, when a template reference is used, by default the application searches the
~./template/~ dir to resolve these references.
- *specify input folder* - by default the program searches ~./src/~ for the source files to be compiled#+begin_src rust :tangle src/main.rs :comments org :noweb yes
fn main() {
<>
}
#+end_srcWe'll set up some initial variables to hold the parameters from the command line.
#+name: high level interface
#+begin_src rust :comments noweb
let mut output_path = String::from("");
let mut template_path = String::from("");
let mut input_path = String::from("");
let mut base_dir : Option = None;
#+end_srcWe'll also need to setup an error strategy - this will require some additional data structures, so we'll leave it to the end.
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
<>
#+end_srcUsing argparse, we can implement this cmdline interface as follows:
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
let mut help_string : Vec = Vec::new();
{
let mut ap = argparse::ArgumentParser::new();
ap.set_description("Simple templating engine for html documents");
ap.refer(&mut output_path)
.add_option(&["-o","--output"],
argparse::Store,
"Directory for the output files to be saved. It defaults to BASEDIR/bin");ap.refer(&mut template_path)
.add_option(&["-t","--template"],
argparse::Store,
"Directory to be searched to find templates. It defaults to BASEDIR/template");ap.refer(&mut input_path)
.add_option(&["-i","--input"],
argparse::Store,
"Directory in which the source files to be compiled are located. It defaults to BASEDIR/bin");
ap.refer(&mut base_dir)
.add_argument("BASEDIR",
argparse::StoreOption,
"The project directory. If not specified, then --input, --template and --output flags must be given. ");<>
ap.print_help("htmlgen", &mut help_string);ap.parse_args_or_exit();
}
#+end_srcBefore we do anything, let's get a copy of the help string generated by ~argparse~ for the program.
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
let help_string = unsafe { String::from_utf8_unchecked(help_string) };
#+end_srcAdditionally, we'll convert the unwritten values to options.
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
let mut output_path = if output_path.is_empty() { None } else { Some(output_path) };
let mut template_path = if template_path.is_empty() { None } else { Some(template_path) };
let mut input_path = if input_path.is_empty() { None } else { Some(input_path) };
#+end_srcFollowing this, we do some error checking to ensure that everything is suitably specified.
If the base directory is not specified, then all other parameters must be specified - otherwise we exit.
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
if base_dir.is_none() && (output_path.is_none() || template_path.is_none() || input_path.is_none()) {
println!("{}", help_string);
::std::process::exit(-1);
}
#+end_srcWith that done, we can safely extract the paths.
As specified, our output and template paths take default values from the supplied ~BASEDIR~.
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
let (output_path, template_path, input_path) = if let Some(bd) = base_dir {
let bd = Path::new(&bd);
let error_string = format!("{:?} is not a valid path", bd);
let alt_output_path = bd.join(Path::new(&"bin")).to_str().expect(&error_string).to_owned();
let alt_template_path = bd.join(Path::new(&"template")).to_str().expect(&error_string).to_owned();
let alt_input_path = bd.join(Path::new(&"src")).to_str().expect(&error_string).to_owned();let output_path = output_path.unwrap_or_else(|| alt_output_path );
let template_path = template_path.unwrap_or_else(|| alt_template_path );
let input_path = input_path.unwrap_or_else(|| alt_input_path );(output_path, template_path, input_path)
} else {
(output_path.unwrap(), template_path.unwrap(), input_path.unwrap())
};
#+end_src* Core Logic
Now we've obtained the directory for the files to be stored, we can move on to the main logic of the program.
Fundamentaly the logic of this program can be split into two main components:
- Recursively descending the source directory, keeping track of the file structure.
#+name: modules
#+begin_src rust :comments noweb
mod crawler;
#+end_src
- Extracting the data from a given file
#+name: modules
#+begin_src rust :comments noweb
mod parser;
#+end_src
- generate a compiled html file from the template and save it to a folder
#+name: modules
#+begin_src rust :comments noweb
mod generator;
#+end_src#+name: high level interface
#+begin_src rust :comments noweb
let output_directory = Path::new(&output_path);
let input_directory = Path::new(&input_path);
let template_directory = Path::new(&template_path);
#+end_srcThus the high level execution of the system is as follows.
First we update the error strategy.
#+name: high level interface
#+begin_src rust :comments noweb :noweb yes
<>
#+end_srcThen we run the crawler and print the output. Done.
#+name: high level interface
#+begin_src rust :comments noweb
println!("{:?}", crawler::crawl_directories(&output_directory, &input_directory, &template_directory, &err_strat));
#+end_src** Parser Logic
Before we begin, we'll need the following packages in our parser:
#+begin_src rust :tangle src/parser.rs :noweb yes :comments org
use std::collections::HashMap;
use regex::Regex;
<>
#+end_src
Once again, our core specification for the parser is to extract a set of key value pairs. Our syntax will be of the following form:
#+begin_src
ID := (Sigma/{:, (, )})+
INTRO := #+template: Sigma+\n
MAPPING := ID: ((SIGMA/{¬})|\¬)* ¬
DOCUMENT := INTRO MAPPING*
#+end_src
Our parser will take in a string (the contents of the file), and return either a hashmap of values and a template name, or an error.
#+begin_src rust :tangle src/parser.rs :noweb yes :comments org
<>pub fn parse_source_string(source: &str)
-> Result<(String, HashMap),ParseError> {
<>
<>
}#[cfg(test)]
mod test {
use super::*;<>
}
#+end_src
Where a parsing error will be one of the following:
- **Template not found** - if the source file does not specify a template to be loaded
- **Invalid identifier** - if an identifier contains an invalid character.
- **Unterminated Body** - if a body does not have a valid terminator.
#+name: structures
#+begin_src rust :comments noweb
#[derive(Debug)]
pub enum ParseError {
TemplateNotFound,
InvalidIdentifier,
UnterminatedBody
}
#+end_src
For simplicity, we're making the parser as general as possible and opting to make failure as unlikely as possible.To do the parsing, first we start off by consuming the template directive, and failing if not present.
First, we check that the template contains a template directive - we're leaving resolving the template to a file to a later point.
#+name: source parsing code
#+begin_src rust :comments noweb
if !source.trim_left().starts_with("#+template:") {
return Err(ParseError::TemplateNotFound);
}
#+end_srcThis means that if a source does not start with a directive, its parsing will fail:
#+name: source parsing tests
#+begin_src rust :comments noweb
#[test]
fn must_start_with_template_directive() {
assert!(parse_source_string("temp-justkidding\n id:\n #+template:\n").is_err());
}
#+end_srcAfter this check, we can safetly consume the first part of the string.
#+name: source parsing code
#+begin_src rust :comments noweb
let source = source.trim_left().split_at(11).1;
#+end_srcNext, let's retrieve the actual template name - failing if it was not provided.
#+name: source parsing code
#+begin_src rust :comments noweb
let (raw_template_name, remaining_string) = split_at_pattern(source, "\n");
let template_name = raw_template_name.trim();
if template_name.is_empty() {
return Err(ParseError::TemplateNotFound);
}
#+end_srcThis also means that if a source does not provide a template name its parsing will fail:
#+name: source parsing tests
#+begin_src rust :comments noweb
#[test]
fn must_provide_template_name() {
assert!(parse_source_string("#+template: example\n").is_ok());
assert!(parse_source_string("#+template:\n").is_err());
assert!(parse_source_string("#+template: \n").is_err());
assert!(parse_source_string("#+template: \n \n").is_err());
assert!(parse_source_string("#+template: \t \n").is_err());
}
#+end_srcNow, our remaining task is to simply iterate through the remaining ~ID: DATA~ pairs, and accumulate these values into a hashmap - let's begin
by setting up an initial hashmap to store the files.
#+name: source parsing code
#+begin_src rust :comments noweb
let mut data : HashMap = HashMap::new();
#+end_src
Next, we'll define a simple loop to do the accumulation - it will use a reference to the hashmap, and the source:
#+name: source parsing code
#+begin_src rust :comments noweb :noweb yes
let mut completed = false;
let mut source = remaining_string;
let mut data = data;while !completed {
<>
}
#+end_src
To extract the keys and bodies, we'll be using a regex - it checks that the start of the string consists of non terminator characters,
followed by a colon.
#+name: source parsing regexes
#+begin_src rust :comments noweb :noweb yes
let key_regex = Regex::new("^[^¬:{}\\\\]*:").unwrap();
#+end_srcNow, inside the loop, we'll use the regex to extract the key values - for this purpose, we'll define a custom ~split_by_regex~ function,
which operates like the ~split_at_pattern~ function but uses the first match of a regex to split the input.#+name: source parsing utility functions
#+begin_src rust :comments noweb
fn split_at_regex<'a>(string: &'a str, pat: &Regex) -> (&'a str, &'a str) {
if let Some(m) = pat.find(string) {
string.split_at(m.end())
} else {
(&"", string)
}
}
#+end_src
Now, using this function, we can implement the key extraction.#+name: source pairs loop
#+begin_src rust :comments noweb
let (raw_key_name, remaining_string) = split_at_regex(source, &key_regex);
let key_name = raw_key_name.trim();
source = remaining_string;
#+end_srcNow due to the way we're extracting the values, bad input may lead to an incorrect parse - we'll try and avoid this by printing an error when the IDs are wrong:
#+name: source pairs loop
#+begin_src rust :comments noweb
if key_name.len() == 0 {
eprintln!("Invalid parse, found empty/malformed ID tag");
return Err(ParseError::InvalidIdentifier);
}
#+end_src
Due to the way we extract the ids, we also end up bringing the colon as well. Let's just remove it before proceeding:
#+name: source pairs loop
#+begin_src rust :comments noweb
let mut key_name = key_name.to_string();
key_name.pop();
let key_name = key_name.trim();
#+end_srcNow we can move on to extracting the data. Let's start by defining a regular expression to isolate specific syntax we wish to capture.
#+name: source parsing regexes
#+begin_src rust :comments noweb
let data_regex = Regex::new("^(\\\\¬|([^¬\\\\]|\\\\[^¬])*)*¬").unwrap();
#+end_srcThe regex we're using can be explained as follows; the outermost kleene closure captures the main constraint that the data should start from the start of the string and end at the first occurrance
of a terminating character.
#+begin_src regex
^ INTERNALS *¬
#+end_srcNext, for the contents of a body, we have to capture 2 main cases:
- When the character is normal and non interesting
- When the character is an escaped terminator.
#+begin_src regex
INTERNALS ::= (ESCAPED_TERMINATOR|NORMAL_CHARACTERS)
#+end_srcFor the escaped terminator case, we simply match on a backspace followed by a terminator.
#+begin_src regex
ESCAPED_TERMINATOR = \¬
#+end_srcIn the case of normal characters, either
- the character is neither a backslash or a terminator
- the character is a backslash and is followed by anything other than a terminator
#+begin_src regex
NORMAL_CHARACTERS = ([^¬\\\\]|\\\\[^¬])*
#+end_srcUsing this regex we can trivially extract the data, repeating the code for key extraction.
#+name: source pairs loop
#+begin_src rust :comments noweb
let (raw_data, remaining_string) = split_at_regex(source, &data_regex);
let src_data = raw_data.trim();
source = remaining_string;
#+end_srcWhile it is fine for data to be empty, we always require the user to provide the end character, so the string should never be 0.
#+name: source pairs loop
#+begin_src rust :comments noweb
if src_data.len() == 0 {
eprintln!("Invalid parse, found body with no terminating tag.");
return Err(ParseError::UnterminatedBody);
}
#+end_srcNow, as before, let's remove the terminating character.
#+name: source pairs loop
#+begin_src rust :comments noweb
let mut src_data = src_data.to_string();
src_data.pop();
let src_data = src_data.trim();
#+end_srcFinally, now we've extracted the id and the tag, we can simply put the values into our hashmap.
#+name: source pairs loop
#+begin_src rust :comments noweb
data.insert(key_name.to_string(), src_data.to_string());
#+end_srcNow, we also need to check for a terminating condition - we'll do this by checking if the remaining string, when trimmed, is empty.
#+name: source pairs loop
#+begin_src rust :comments noweb
if source.trim().is_empty() {
break;
}
#+end_srcFinally, now that string has been consumed, we can simply return the template name and the populated hashmap.
#+name: source parsing code
#+begin_src rust :comments noweb :noweb yes
Ok((template_name.to_string(), data))
#+end_srcAside: Notice, that during the parsing, we're using our own custom function to allow us to split by a pattern, a feature the
stdlib doesn't seem to provide.This utility function splits a string by the first occurance of a pattern returning a string up to the first occurrance
of the pattern and a string continuing from the pattern - the second string contains the text matching the pattern.
#+name: source parsing utility functions
#+begin_src rust :comments noweb
fn split_at_pattern<'a>(string: &'a str, pat: &str) -> (&'a str, &'a str) {
if let Some(ind) = string.find(pat) {
string.split_at(ind)
} else {
(&"", string)
}
}
#+end_src** Generator Logic
The generator takes in an input templated string and an associated mapping and returns a string in which the templates have been filled - it also takes in a paramter dictating how to respond to ill formed strings.We'll be importing the following libraries to make this thing work.
#+name: generator imports
#+begin_src rust :comments org
use std::collections::HashMap;
use regex::{Regex, Captures};
#+end_srcThe generator module follows the standard pattern.
#+begin_src rust :tangle src/generator.rs :noweb yes :comments org
<>
<>
<>
<>#[cfg(test)]
mod tests {
use super::*;<>
}
#+end_srcThe main utility provided by the generator is the main function that populates the templated string when given a mapping, additionally we must specify how the generator should respond when missing templates are found.
#+name: generator function
#+begin_src rust :comments org :noweb yes
pub fn generate_output(input: String, mapping: HashMap, fail_response: &GeneratorErrorStrategy) -> Result {
<>
}
#+end_srcThe strategies the generator should accept are:
- *Fail* - Error out if a parameter that is not in the mapping is found in the template; this is the default.
- *Ignore* - ignore any missing parameters.
- *Fixed* - replace any missing parameters with a fixed response
- *Default* - try a default mapping for the keyword, otherwise try one of the other strategies.
To implement this, we'll use two structures, one to represent the non-recursive cases, and the other for the default option.
#+name: generator structures
#+begin_src rust :comments org
#[derive(Clone,Debug,PartialEq)]
pub enum GeneratorErrorCoreStrategy {
Fail,
Ignore,
Fixed(String)
}
#+end_srcThus for the full enum, we can avoid having to mess with boxes.
#+name: generator structures
#+begin_src rust :comments org
pub enum GeneratorErrorStrategy {
Base(GeneratorErrorCoreStrategy),
Default(HashMap, GeneratorErrorCoreStrategy)
}
#+end_srcNow, the errors the templating function can return are partially based on the error response strategies.
- *Undefined Parameter* - An error when a paremeter with no mapping is found, and the strategy is sufficiently strict.
#+name: generator structures
#+begin_src rust :comments org
#[derive(Debug)]
pub enum GeneratorError {
UndefinedParameter
}
#+end_srcThe core logic of the generator is to use capture groups capabilities provided by the regex crate.
We'll reuse the same pattern as used in the parser, but wrap it in braces and capture the contents.
#+name: generator logic
#+begin_src rust :comments org
let parameter_regex = Regex::new("\\{([^¬:{}\\\\]*)\\}").unwrap();
#+end_srcBefore we run the regex, we'll need to set up some variables to capture lookup errors.
#+name: generator logic
#+begin_src rust :comments org
let mut lookup_failed = false;
#+end_srcNext, we'll run the regex on the input string.
#+name: generator logic
#+begin_src rust :comments org :noweb yes
let new_string = parameter_regex.replace_all(&input, |caps: &Captures| {
<>
});
#+end_srcIf a lookup failed, then we'll return an error.
#+name: generator logic
#+begin_src rust :comments org
if lookup_failed {
return Err(GeneratorError::UndefinedParameter);
}
#+end_srcOnce that's done we have the result string - it's a ~Cow~ though, so we just need to do a conversion before returning it.
#+name: generator logic
#+begin_src rust :comments org
Ok(new_string.to_string())
#+end_srcAll that's left is to define the replacement logic - if it matches, we can simply return the value stored in the hashmap.
#+name: generator replacement logic
#+begin_src rust :comments org :noweb yes
if let Some(value) = mapping.get(&caps[1]) {
value
} else {
<>
}
#+end_srcIf the lookup failes, our action depends on the error strategy we've chosen.
#+name: generator lookup failed
#+begin_src rust :comments org :noweb yes
match &fail_response {
GeneratorErrorStrategy::Base(strategy) => {
<>
}
GeneratorErrorStrategy::Default(mapping, strategy) => {
<>
}
}
#+end_srcFor the base case, we simply match on the specific strategy chosen to decide our action.
#+name: generator base strategy match
#+begin_src rust :comments org :noweb yes
match strategy {
GeneratorErrorCoreStrategy::Fail => {
<>
}
GeneratorErrorCoreStrategy::Ignore => {
<>
},
GeneratorErrorCoreStrategy::Fixed(text) => {
<>
}
}
#+end_srcIf the strategy is a fail fast case, then we still return an empty string, but we set the lookup failed
error, thereby ensuring that the result of the call is an error.
#+name: generator strategy fail case
#+begin_src rust :comments org
lookup_failed = true;
""
#+end_srcIf the strategy is an ignore case, we simply leave the parameter as it was.
#+name: generator strategy ignore case
#+begin_src rust :comments org
&caps[0]
#+end_srcFor the fixed case, we just return the fixed string.
#+name: generators strategy fixed case
#+begin_src rust :comments org
text
#+end_srcNow, for the default mapping case, we first check if the default mapping contains a value for the
parameter. If it does, we can simply return that value.
#+name: generator default strategy
#+begin_src rust :comments org :noweb yes
if let Some(value) = mapping.get(&caps[1]) {
value
} else {
<>
}
#+end_srcIf it doesn't, we simply match on the error strategy as previous.
#+name: generator default fail strategy
#+begin_src rust :comments org :noweb yes
<>
#+end_src** Crawler Logic
The core logic for the crawler is to descend the input directory, keeping track of the current path, pass each file through the parser, then pass on the generated mapping to the generator, along with a corresponding template file and output file.We'll be importing the following libraries for doing the core logic.
#+name: crawler imports
#+begin_src rust :comments org
use std::fs;
use std::io::Read;
use std::fs::File;
use std::path::Path;
use std::convert::AsRef;
#+end_srcWe'll also be bringing in the parsing function from the parser, and the generator function from the generator.
#+name: crawler imports
#+begin_src rust :comments org
use parser::{parse_source_string,ParseError};
use generator::{generate_output, GeneratorError, GeneratorErrorStrategy};
#+end_srcThe main structure for the crawler is as follows.
#+begin_src rust :tangle src/crawler.rs :noweb yes :comments org
<><>
<>
#+end_srcOur crawling function, takes as input the input directory, the output directory, the template directory and the error strategy for the generator.
#+name: crawler function
#+begin_src rust :noweb yes :comments org
pub fn crawl_directories(
output_directory: &P,
input_directory: &Q,
template_path: &R,
err_strat: &GeneratorErrorStrategy
) -> Result
where P : AsRef,
Q : AsRef,
R : AsRef {
<>
}
#+end_srcThe errors produced by the crawler are as follows.
- *ParseError* - When a parser occurs
- *GeneratorError* - when a generator occurs
- *TemplateNotFound* - When a template is not found
- *InputDirectoryError* - When the input directory does not exist
- *OutputDirectoryError* - When the output directory does not exist
#+name: crawler structures
#+begin_src rust :noweb yes :comments org
#[derive(Debug)]
pub enum CrawlError {
ParseError(ParseError),
GeneratorError(GeneratorError),
TemplateNotFound(String),
InputDirectoryError,
OutputFileError(String),
InputFileError(String),
}
#+end_srcBefore we begin, let's set up a counter to enumerate the number of files converted.
#+name: crawler main logic
#+begin_src rust :noweb yes :comments org
let mut file_count = 0;
#+end_srcFirst, we'll extract all the files in the input directory.
#+name: crawler main logic
#+begin_src rust :noweb yes :comments org
let input_files = input_directory.as_ref()
.read_dir()
.map_err(|_|
CrawlError::InputDirectoryError
)?;
for input_file in input_files {
<>
}
#+end_srcFor each file, we need to check its metadata.
#+name: crawler file logic
#+begin_src rust :noweb yes :comments org
let input_file = input_file.map_err(|e| CrawlError::InputFileError(format!("{:?}", e)))?;
let input_metadata = input_file.metadata().map_err(|e| CrawlError::InputFileError(format!("{:?}", e)))?;
let input_file_name = input_file.file_name();
let input_file_path = input_file.path();
let input_file_extension = input_file_path.extension().and_then(|ext| ext.to_str());
let input_file_base = input_file_path.file_stem().and_then(|stem| stem.to_str());
#+end_srcNow our next action is dependent on the type of entry - we'll need to do different things based on whether we find a file or a directory.
#+name: crawler file logic
#+begin_src rust :noweb yes :comments org
if input_metadata.is_dir() {
<>
} else if input_metadata.is_file() && (input_file_extension == Some("gop")) && (input_file_base.is_some()) {
<>
} else {
eprintln!("WARN: Encountered a non-template file (or non unicode path) during crawling the input directory {:?}", input_file);
}
#+end_srcNow, if the file is a directory, we do a recursive call, appending the directory name to the input path and output path
#+name: crawler directory logic
#+begin_src rust :noweb yes :comments org
let dir_name = Path::new(&input_file_name);
let new_output_dir = output_directory
.as_ref()
.join(&dir_name);
let new_input_dir = input_directory
.as_ref()
.join(&dir_name);#+end_src
Let's also make sure the new output directory actually exists to prevent any issues.
#+name: crawler directory logic
#+begin_src rust :noweb yes :comments org
fs::create_dir_all(&new_output_dir);
#+end_srcThen we can do the recursive step.
#+name: crawler directory logic
#+begin_src rust :noweb yes :comments org
let n_count = crawl_directories(
&new_output_dir,
&new_input_dir,
template_path,
err_strat
)?;
file_count += n_count;
#+end_srcOn the other hand, if the file is just a file, we first need to read the file.
#+name: crawler input file logic
#+begin_src rust :noweb yes :comments org
let input_text = {
let mut temp = String::new();
let mut file = File::open(input_file.path()).map_err(|e| CrawlError::InputFileError(format!("{:?}", e)))?;
file.read_to_string(&mut temp).map_err(|e| CrawlError::InputFileError(format!("{:?}", e)))?;
temp
};
#+end_srcNow we'll run the parser on this text.
#+name: crawler input file logic
#+begin_src rust :noweb yes :comments org
let (template_name, mapping) = parse_source_string(&input_text).map_err(|e| CrawlError::ParseError(e))?;
#+end_srcNow we need to read the template to a string.
#+name: crawler input file logic
#+begin_src rust :noweb yes :comments org
let template_path = template_path.as_ref().join(&Path::new(&template_name));
let template_text = {
let mut temp = String::new();
let mut file = File::open(template_path).map_err(|e| CrawlError::TemplateNotFound(format!("{:?}", e)))?;
file.read_to_string(&mut temp).map_err(|e| CrawlError::TemplateNotFound(format!("{:?}", e)))?;
temp
};
#+end_srcWith the template and the mapping, we can run the generator.
#+name: crawler input file logic
#+begin_src rust :noweb yes :comments org
let result = generate_output(
template_text,
mapping,
err_strat
).map_err(|e| CrawlError::GeneratorError(e))?;
#+end_srcBefore we write this to the output directory, we need to construct a new name for the file.
#+name: crawler input file logic
#+begin_src rust :noweb yes :comments org
let input_file_base = input_file_base.unwrap();
let mut new_file_name = String::from(input_file_base);
new_file_name.push_str(".html");
#+end_srcFinally, we can write this to the output directory.
#+name: crawler input file logic
#+begin_src rust :noweb yes :comments org
let output_path =
output_directory.as_ref().join(&Path::new(&new_file_name));
fs::write(&output_path, result)
.map_err(|e| CrawlError::OutputFileError(format!("{:?}", e)))?;
file_count += 1;
#+end_src#+name: crawler main logic
#+begin_src rust :noweb yes :comments org
Ok(file_count)
#+end_src* Error Strategy
Now for the final part of the application - implementing the error strategy from before.Before we do anything, we'll need to extend the capabilities of a prior structure - specifically the GeneratorErrorCoreStrategy, and
the capability to parse the element from a string.
#+name: generator structures
#+begin_src rust :comments org :noweb yes
impl FromStr for GeneratorErrorCoreStrategy {
type Err = ();
fn from_str(src: &str) -> Result {
return match src {
"fail" => Ok(GeneratorErrorCoreStrategy::Fail),
"ignore" => Ok(GeneratorErrorCoreStrategy::Ignore),
x => {
if let Some(ind) = src.find("=") {
if ind + 1 < src.len() {
let (txt, rem) = src.split_at(ind+1);
if txt == "fixed=" {
Ok(GeneratorErrorCoreStrategy::Fixed(rem.to_string()))
} else {
Err(())
}
} else {
Err(())
}
} else {
Err(())
}
},
};
}
}
#+end_srcAs you can see, we're referencing the ~FromStr~ trait which we'll need to import.
#+name: generator imports
#+begin_src rust :comments org
use std::str::FromStr;
#+end_srcNow let's just quickly add some tests to verify this actually works.
#+name: generator tests
#+begin_src rust :comments org :noweb yes
#[test]
fn from_st_works() {
assert_eq!(GeneratorErrorCoreStrategy::from_str("ignore"), Ok(GeneratorErrorCoreStrategy::Ignore));
assert_eq!(GeneratorErrorCoreStrategy::from_str("fail"), Ok(GeneratorErrorCoreStrategy::Fail));
assert_eq!(GeneratorErrorCoreStrategy::from_str("fixed=missing"), Ok(GeneratorErrorCoreStrategy::Fixed("missing".to_string())));
}
#+end_srcOkay, now onto the topic of determining an error response strategy.
We'll be doing this by splitting the concerns into two separate components - first identifying the core strategy and then identifying
the use of a default strategy or not.First for the core strategy, we'll set a default and then populate it.
#+name: high level error strategy
#+begin_src rust :comments org :noweb yes
let mut opt_strat = generator::GeneratorErrorCoreStrategy::Fail;
#+end_srcUsing the from string implementation we described earlier, we can parse this as follows.
#+name: high level error args
#+begin_src rust :comments org :noweb yes
ap.refer(&mut opt_strat)
.add_option(&["-e", "--error"],
argparse::Store,
"Fail on the first undefined parameter");
#+end_srcFor the default strategy we'll be using an optional value which we'll try and populate. If it isn't populated then we'll know that
there is no default strategy.
#+name: high level error strategy
#+begin_src rust :comments org :noweb yes
let mut def_strat = None;
#+end_srcOnce again, for the default we'll just try and populate the string.
#+name: high level error args
#+begin_src rust :comments org :noweb yes
ap.refer(&mut def_strat)
.add_option(&["-d", "--default"],
argparse::StoreOption,
"Additional mapping for storing default values. If not specified, the environment variable GOP_HTML_DEFAULTS if defined, is used as a default.");
#+end_srcIf none was provided we'll try and retrieve it from the environment under the key ~GOP_HTML_DEFAULTS~.
#+name: high level error update
#+begin_src rust :comments org :noweb yes
let def_strat = def_strat.or_else(|| env::var("GOP_HTML_DEFAULTS").ok());
#+end_srcFinally, we can construct the error strategy based on whether a default is provided.
#+name: high level error update
#+begin_src rust :comments org :noweb yes
let err_strat = match def_strat {
None =>
generator::GeneratorErrorStrategy::Base(opt_strat),
Some(path) => {
let mapping = {
<>
};
match mapping {
Some((name, map)) =>
generator::GeneratorErrorStrategy::Default(map, opt_strat),
None => {
eprintln!("Encountered error while reading default mapping at {:?}.", path);
generator::GeneratorErrorStrategy::Base(opt_strat)
}
}
}
};
#+end_srcNow all we've got to do is retrieve the mapping.
#+name: error strategy retrieve mapping
#+begin_src rust :comments org :noweb yes
let def_path = Path::new(&path);
if let Ok(mut file) = File::open(&def_path) {
let mut def_source = String::new();
if let Ok(_count) = file.read_to_string(&mut def_source) {
parser::parse_source_string(&def_source).ok()
} else {
None
}
} else {
None
}
#+end_src