Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
https://github.com/howerj/pickle

Improvements to picol: A TCL like interpreter suitable as an shell in an embedded system
https://github.com/howerj/pickle
c embedded interpreter picol tcl
Last synced: about 2 months ago
JSON representation
Improvements to picol: A TCL like interpreter suitable as an shell in an embedded system
Host: GitHub
URL: https://github.com/howerj/pickle
Owner: howerj
License: bsd-2-clause
Created: 2018-10-02T17:51:21.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-06-27T20:09:16.000Z (12 months ago)
Last Synced: 2024-01-29T06:17:51.678Z (5 months ago)
Topics: c, embedded, interpreter, picol, tcl
Language: C
Homepage:
Size: 511 KB
Stars: 22
Watchers: 6
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: LICENSE
Lists

AwesomeInterpreter - pickle
README

        % pickle(1) | A Small TCL like interpreter

# NAME

PICKLE - A Small and Embeddable TCL like interpreter and library

# SYNOPSES

pickle files...

pickle

# DESCRIPTION

	Author:     Richard James Howe / Salvatore Sanfilippo

	License:    BSD

	Repository: 

	Email:      [email protected]

	Copyright:  2007-2016 Salvatore Sanfilippo

	Copyright:  2018-2020 Richard James Howe

This is a copy, and modification, of a small interpreter written by Antirez in

about 500 lines of C, this interpreter is for a small TCL like language. The

blog post describing this interpreter can be found at

, along with the code itself at

. It does a surprising amount for such a

small amount of code. This project is a little bit bigger than the original at

around ~6000 lines.

## LICENSE

The files [pickle.c][] and [pickle.h][] are licensed under the 2 clause

[BSD License][], as are all the other files in this project.

## BUILDING

To build you will need a [C][] compiler and [Make][].

Type 'make' to build the executable 'pickle' (or 'pickle.exe') on Windows. To

run type 'make run', which will drop you into a pickle shell. 'make test' will

run the built in unit tests and the unit tests in [shell][].

## RUNNING

To run the project you will need to build it, the default makefile target will

do this, type:

	make

Or:

	make pickle

This will build the pickle library and then link this library with an example

program contained in [main.c][]. This example program is very simple and adds a

few commands to the interpreter that do not exist in the library ("gets", "puts",

"clock", "getenv", "exit", "source", "clock" and "heap"), this is the minimal

set of commands that are needed to get a usable shell up and running and do

performance optimization.

The executable 'pickle' that is built is quite simple, it executes all arguments

given to it on given to it on the command line as scripts. There are no

options, flags or detection of an interactive session with [isatty][]. This

makes usage of the interpreter in interactive sessions challenging, instead,

the language itself can be used to define a shell and process command line

arguments. This is done in a program called '[shell][]'. As mentioned it

contains the unit tests for the project, as well as other subprograms, and most

importantly it contains the interactive shell that reads a line at a time and

prints the result.

The code in [shell][] is quite large, if you do not want to use it an

incredibly minimal shell can be defined with the code:

	#!./pickle

	set r 0

	while { } {

		puts -nonewline "pickle> "

		set r [catch [list eval [gets]] v]

		puts "\[$r\] $v"

	}

For those experienced with [TCL][] some differences in the 'while' and 'gets'

command should be apparent.

## The Picol Language

The internals of the interpreter do not deviate much from the original

interpreter, so the document on the [picol][] language still applies. The

language is like a simplified version of [TCL][], where everything is

a command and the primary data structure is the string.

Some programmers seem to an obsessive interest in their language of choice,

do not become one of those programmers. This language, like any other

language, will not solve all of your problems and may be entirely unsuitable

for the task you want to achieve. It is up to you to evaluate whether this

language, and implementation of it, is suitable.

Language and implementation advantages:

* Small and compact, easy to integrate into a wide variety of platforms and

programs. (4000 [LoC][] is the limit for the core library in [pickle.c][]).

* Fairly good at string handling.

* Can be ported to a variety of platforms. There are few system dependencies

and the design of the program allows it to be ported to an embedded platform.

* Customizable.

* Suitable as a command language and shell.

Disadvantages:

* Everything is a string (math operations and data structure manipulation

will be slow).

* This is a Do-It-Yourself solution, it may require that you modify the library

itself and will almost certainly require that you define your own new commands

in [C][].

* Lacks Unicode/UTF-8 support.

* The language interpreter is not well tested and is likely to be insecure. If

you find a bug, please report it. It is however better tested that most

(re)implementations or extensions to the [picol][] interpreter and far less

likely to segfault or crash if you misuse the interpreter.

Potential Improvements:

* Many, many more commands could be written in order to make this interpreter

more usable.

* The following small libraries can be used to either extend or modify the

interpreter to suite your purposes:

  - UTF-8: 

  - Data Packing/Unpacking: 

  - Base-64: 

  - Line editing: 

  - Fixed point (Q16.16, signed) library: 

The interpreter is fairly small, on my 64-bit x86 machine the interpreter

weighs in at 100KiB (stripped, dynamically linked to C library, Linux ELF). The

interpreter can be configured so that it is even smaller, for example:

	On Debian Linux, x86_64, dyanmically linking against glibc, using

	gcc version 8.3.0, with version 4.1.4 of the interpreter:

	Size    Options/Notes

	100KiB  Normal target, optimized for speed (-O2).

	84KiB   No debugging (-DNDEBUG), optimized for speed (-O2).

	54KiB   No debugging (-DNDEBUG), 32-bit target (-m32), optimized

	        for size (-Os), stripped, No features disabled.

	34KiB   No debugging, 32-bit target, optimized for size, stripped,

	        with as many features disabled as possible.

	On Debian Linux, x86_64, statically linked against musl C library:

	147KiB  Normal target, optimized for speed, statically linked.

This is still larger than I would like it to be, the original picol interpreter

in the smallest configuration (32 bit target, optimized for size), comes in at

18KiB.

### The language itself

Picol, and [TCL][], are dynamic languages with only one real data type, the

string. This might seem inefficient but it is fine for a glue language whose

main purpose is to bind lots of things written in C together. It is similar to

[lisp][] in ways, it is [homoiconic][], and is simple with very little in the

way of syntax.

The following table sums up the different language constructs:

	string  called if first argument

	{ }     quote, used to prevent evaluation

	[ ]     command substitution

	" "     string

	$var    variable lookup

	\c      escape a character

	#       comment

	;       terminates a command

A Picol program consists of a series of commands and arguments to those

commands. Before a command is evaluated, variables are looked up and strings

substituted.

You may have noticed that things such as 'if' or 'while', and even procedure

definition, are not part of the languages syntax. Instead, they are built in

commands and are called like any other command.

Examples of commands:

	puts "Hello, World"

	"puts" "Hello, World"

	# prints "Hello, World"

	set cmd puts

	$cmd "Hello, World"

	# prints "Hello, World"

	set a pu

	set b ts

	$a$b "Hello, World"

	+ 2 2

	- 4 5

	if {bool 4} { puts "TRUE"}

	proc x {a b} { + $a $b }

	puts "x(3, 9) == [x 3 9]"

	# prints 4 to 10 inclusive

	set z 3

	while {< $z 10} { set z [+ $z 1]; puts $z }

To best understand the language, play around with it, and look at the source,

there really is not that much there.

### Internally Defined Commands

Picol defines the commands in this section internally, in a default build all

of the commands in this section will be available. There are some build options

to remove some commands (such as the string function, the math functions, and

the list functions).

The options passed to the command and type are indicated after the command, a

question mark suffix on an argument indicates an optional command, an ellipsis

indicates an optional series of arguments.

For some concrete examples of commands being run, see [unit.tcl][], which

contains [unit tests][] for the project.

* argv

'argv' is a variable, not a function, which should contain the arguments passed

to the pickle interpreter on the command line in a TCL list.

* set variable value?

Create a variable, or overwrite an existing variable, with a value. If only one

argument is given, it returns the value of that variable if it exists or an error

if it does not.

* if {condition} {true clause}  *OR*  if {condition} {true clause} else {false clause}

*if* is the command used to implement conditional execution of either one

clause, or one clause or (exclusive or) another clause. Like in every other

programming language ever (or more accurately the languages with more than one

user, the implementer).

* while {condition} {clause}

Keep executing the while clause whilst the condition is true (ie. is non-zero).

* break

Break out of a while loop. This will continue to break out of a things until

the return code is caught by a loop, or 'catch'.

* continue

Desist from executing the rest of the clause in a while loop, and go back to

testing the condition.

* proc identifier {argument list} {function body}

Create a new command with the name 'identifier', or function if you prefer,

with the arguments in 'argument list', and code to be executed in the 'function

body'. If the final command is not a 'return' then the result of the last

command is used.

There is a special case whereby the last argument in the argument list is

called 'args', if this is the case then the renaming arguments are concatenated

together and passed in to the function body. This allows variadic functions to

be created.

* return string? number?

Optionally return a string, optionally with an internal number that can affect

control flow.

* uplevel number strings...

Evaluate the 'strings...' in the scope indicated by 'number'. A special case

is '#0', which is the global context. The strings are concatenated together as

with if they have been run through the 'concat' command. A scope of 0 is the

current scope, of 1, the caller, of 2, the caller's caller, and so on. A '#'

prefix is meant to reverse the search and start from the global scope and work

down through the call stack, however only '#0' is supported.

* upvar number otherVar myVar

Form a link from myVar to otherVar in the scope specified by number. A

special case is '#0', which is the global context, see 'uplevel' for a

description of the scoping traversal rules implied by the number argument.

You may have noticed that 'upvar' and 'uplevel', which come from [TCL][], are

strange, very strange. No arguments from me.

* unset string

Unset a variable, removing it from the current scope.

* eval strings...

Concatenate a list of strings with a space in-between them, as with 'concat',

then evaluate the string, returning the result of the evaluation.

* apply {{arg-list} {body}} args

Applies an argument list to a function body, substituting the provided

arguments into the variables.

Examples:

	# Returns 4

	apply {{x} {* $x $x}} 2

	# Returns 7

	apply {{x y} {+ $x $y}} 3 4

It essential allows for anonymous functions to be made.

* mathematical operations

The following mathematical operations are defined:

'+', '-', '\*', '/', 'mod', '<', '<=', '>', '>=', '==', '!=',

'min', 'max', 'pow', and 'log'. It should be obvious what each one does.

It should be noted that because all variables are stored internally as strings,

mathematical operations are egregiously slow. Numbers are first converted to

strings, the operation performed, then converted back to strings. There are

also some bitwise operations; 'lshift', 'rshift', 'and', 'or', 'xor'. These

mathematical operations can accept a list integers. '&', '|' and '^' are

aliases for 'and', 'or' and 'xor' respectively. '&&' and '||' implement logical

'and' and 'or', *but all arguments are evaluated -- and it is not a bug!*.

There are also the following unary mathematical operators defined: 'not'/'!'

(logical negation), 'invert'/'~' (bitwise inversion), 'abs' (absolute value),

'bool' (turn number into a boolean 0 or 1), 'negate' (negate a number). '-' is

not defined as negate, as that symbol is already used for subtraction.

Numbers conversion is strict, an invalid number will not be silently converted

into a zero, or a string containing a part of a number will not become that

number, for example: "0", "-1" and "12" are valid numbers, whilst; "0a", "x",

"--2", "22x" are not.

* catch expr varname

This allows arbitrary codes to be caught, 'catch' evaluates an expression and

puts the return code into 'varname', the string returned is the result of the

evaluation of 'expr'.

* command item number *OR* command

This function is used to inspect the currently defined commands in the system.

If no arguments are given then the number of commands defined is returned. If

an item is given a number indicates which command that it applies to. Commands

are indexed by numbers. Defining new command may change the index of other

commands. Commands are either user defined or built in commands.

 - args: get a functions arguments (returns '{built-in pointer pointer}' for built in commands)

 - body: get a functions body (returns '{built-in pointer pointer}' for built in commands)

 - name: get a functions name

* join {list} string

Given a [TCL][] list, 'join' will flatten that list and return a string by

inserting a String in-between its elements. For example "join {a b c} ," yields

"a,b,c".

* conjoin string arguments\*

'conjoin' works the same as 'join' except instead of a list it joins its

arguments, for example:

	join {a b c} ,

	conjoin , a b c

Are equivalent.

* for {start} {test} {next} {body}

Implements a for loop.

* rename function-name new-name

Rename a function to 'new-name', this will fail if the function does not exist

or a function by the same name exists for the name we are trying to rename to.

A special case exists when the new-name is an empty string, the function gets

deleted.

* llength list

Get the length a list. A TCL list consists of a specially formatted string

argument, each element of that list is separated by either space or is a string

or quote. For example the following lists each contain three elements:

	"a b c"

	"a { b } c"

	"a \" b \" c"

The list is the basic higher level data structure in Pickle, and as you can

see, there is nothing special about them. They are just strings treated in a

special way. Processing these lists is incredibility inefficient as everything

is stored as a string - a list needs to be parsed before it can be manipulated

at all. This applies to all of the list functions. A more efficient, non-TCL

compatible, set of list functions could be designed, or the internals of the

library could be changed so they are more complex (which would help speeding

up the mathematical functions), but either option is undesirable for different

reasons.

* lindex list index

See 'llength'.

Index into a list, retrieving an element from that list. Indexing starts at

zero, the first element being the zeroth element.

* lrepeat number string

Repeat a string a number of times to form a list.

Examples:

	pickle> lrepeat 3 abc

	abc abc abc

	pickle> lrepeat 2 {x x}

	{x x} {x x}

* lset variable index value

Look up a variable containing a list and set the element specified by an index to

be equal to 'value'.

* linsert list index value

Insert a value into a list at a specified index, indices less than zero are

treated as zero and greater than the last element are appended to the end of

the list.

* lreplace list first last values...

Replace ranges of elements within a list, the function has a number of special

cases.

* lsort opts... list

This command sorts a list, it uses [insertion sort][] internally and lacks

many of the options of the full command. It does implement the following

options:

  - '-increasing' (default)

Sort the list in increasing order.

  - '-decreasing'

Sort the list in decreasing order.

  - '-ascii' (default)

The list is a series of strings that should be stored in [ASCII][] order.

  - '-integer'

The list is a series of numbers that should be sorted numerically.

* lreverse list

Reverse the elements in a list.

* lrange list lower upper

Extract a range from a list.

* lsearch opts... list pattern

The search command attempts to find a pattern within a list and if found it

returns the index as which the pattern was found within the list, or '-1' if

it was not found.

  - '-nocase'

Do a case insensitive search, beware this is ASCII only!

  - '-not'

Invert the selection, matching patterns that *do not* match.

  - '-exact'

Pattern is an exact string to search for.

  - '-integer'

The pattern is a number to search for.

  - '-glob' (default)

This subcommand uses the same [regex][] syntax (and engine) as the

'string match' subcommand, it is quite limited, and it is the default search

option.

  - '-inline'

Instead of returning the index, return the found element.

  - '-start index'

Start at the specified index instead of at zero.

* split string splitter

Split a string into a list, the value to split on is not a regular expression,

but a string literal. There is a special case where the value to split on is

the empty string, in this case it splits a string into a list of its

constituent characters.

* lappend variable values...

Append values to a list, stored in a variable, the function returns the newly

created list.

* list args...

Turn arguments into a list, arguments with spaces in them are quoted, the

list command returns the concatenation of the escaped elements.

* concat args...

Trim arguments before concatenating them into a string.

* reg opts... regex string

'reg' implements a small regular expression engine that can be used to extract

matches from text. It has a few options that can be passed to it, and a few

virtues; lazy, greedy and possessive.

 - -nocase

Ignore case when matching a string.

 - -start index

Set the start of the string to match from, numbers less than zero are treated

as zero, and numbers greater than the length of the string are treated as

referring to the end of the string.

 - -lazy

Match the shortest string possible.

 - -greedy (default)

Match the longest string possible.

 - -possessive

Match the longest string possible, with no backtracking. If backtracking is

necessary the match fails.

* unknown cmd args...

This command is *not* defined at startup, but can be defined by the user to

catch command-not-found exceptions.

When the interpreter encounters a command that has not been defined it attempts

to find the 'unknown' command and execute that. If it is not found, it performs

its default action, which is to throw an error and return an error string

indicating the command has not been found. If the 'unknown' command has been

found then it is executed with the command and its arguments being passed to

'unknown' as a list.

For example, defining:

	proc unknown {args} { system "$args" }

Would mean any command the interpreter does know know about will be executed by

the system shell, including its arguments, provided the *system* command is

defined.

If an unknown command is found within the unknown function then a generic error

message is returned instead.

* trace on *OR* trace off *OR* trace status

This command can be used to turn tracing on, off, or to query the status of

tracing. The [TCL trace command][] is quite powerful, this one is far more

limited.

* tracer cmd args...

This command *not* defined at startup, but can be defined by the user. This can

be used to trace the execution of the program.

The commands executed within *tracer* will not be traced.

* info subcommand args...

The 'info' command is used to query the status of the interpreter and supports

many subcommands. The subcommands that are supported are:

- commands match?

Match defaults to '\*'. Get a list of all defined commands filtered on 'match'.

- procs match?

Match defaults to '\*'. Get a list of all commands defined with 'proc' filtered

on 'match'.

- functions match?

Match defaults to '\*'. Get a list of all mathematical functions filtered on

'match'.

- locals match?

Match defaults to '\*'. Get a list of all defined locals filtered on 'match'.

- globals match?

Match defaults to '\*'. Get a list of all defined globals filtered on 'match'.

- level

Get the current 'level' of the interpreter, which is the degree of nesting or

scopes that exist relative to the top level scope. Entering a function

increases the level by one, for example.

- cmdcount

Get the number of commands executed since startup, this can be used as a crude

form of a performance counter if the command *clock* is not available.

- version

Return the version number of the interpreter in list format "major minor patch",

[semantic versioning](https://semver.org/) is used.

- complete line

Does the 'line' constitute a command that can be called (which may result in an

error)? Or 'does this line parse correctly'? "0" is returned if it cannot, "1"

is returned if it can.

- exists variable

Does 'variable' exist in the current scope, "0" is returned if it does not

whilst "1" is returned if it does.

- args name

Get the arguments of the named function. Functions that are defined in C will

returned the string 'built-in', otherwise a list is returned containing the

function arguments.

- body name

Get the body of the named function. Functions that are built in functions

defined in C will return a function pointer that represents that C function.

Functions defined with 'proc' will return the body of the function as a string.

- private name

Get the private data of a function.

- system attribute

The "system" subcommand is used to access various attributes that have

been set in the interpreter at compile time or due to the environment

that the system is compiled for.

Attributes that can be looked up are:

1. "pointer": size of a pointer in bits.

2. "number": size of a number in bits.

3. "recursion": recursion depth limit.

4. "length": maximum length of a string or -1 if string length is unlimited.

5. "min": minimum size of a signed number.

6. "max": maximum size of a signed number.

7. "string": are string operations defined?.

8. "maths": are math operations defined?.

9. "list": are list operations defined?.

10. "regex": are regular expression operations defined?.

11. "help": are help strings compiled in?.

12. "debugging": is debugging turned on?.

13. "strict": is strict numeric conversion turned on?.

#### String Operator

* string option arg *OR* string option arg arg *OR* string option arg arg arg

The 'string' command in [TCL][] implements nearly every string command you

could possibly want, however this version of 'string' is more limited and

behaves differently in many circumstances. 'string' also pulls in more standard

C library functions from '[ctype.h][]' and '[string.h][]'.

Some of the commands that are implemented:

  - string match -nocase? pattern String

This command is a primitive regular expression matcher, as available from

. What it lacks in functionality, safety and

usability, it makes up for by being only ten lines long (in the original). It

is meant more for wildcard expansion of file names (so '?' replaces the meaning

of '.' is most regular expression languages). '\\' is used as an escape

character, which escapes the next character.

The following operations are supported: '\*' (match any string) and '?' (match

any character). By default all patterns are anchored to match the entire

string, but the usual behavior can be emulated by prefixing the suffixing the

pattern with '\*'.

  - string trimleft  String Class?

If 'class' is empty, a white-space class is used. 'trimleft' removes leading

characters in a Class from the given String.

  - string trimright String Class?

If 'class' is empty, a white-space class is used. 'trimleft' removes trailing

characters in a class from the given String.

  - string trim      String Class?

If 'class' is empty, a white-space class is used. 'trimleft' removes both

leading and trailing characters in a class from the given String.

  - string length  String

Get the length of String. This is a simple byte length excluding an ASCII NUL

terminator.

  - string tolower String

Convert an ASCII String to lower case.

  - string toupper String

Convert an ASCII String to upper case.

  - string reverse String

Reverse a string.

  - string equal   String1 String2

Compare two strings for equality. Returns '1' if equal, '0' if not equal. This

comparison is case sensitive.

  - string compare String1 String2

Compare two strings.

  - string index   String Index

Retrieve a character from a String at the specified Index. The index starts at

zero for the first character up to the last character. Indices past the last

character return the last character. Negative indexes starting counting from

the last character (the last character being -1) and count downward, negative

indexes that go before the first character return the first character.

  - string is Class String

'is' determines whether a given String belongs to a Class. Most class tests

accept a zero length string as matching that class with a few exceptions. Most

class tests test that a class contains only certain characters (such as 'alpha'

which checks that a string only contains the characters 'a-z' and 'A-Z', or

'digit', which checks that a string only contains the characters '0-9'. Other

class tests test that a string matches a specific format, such as 'integer'

(which does not accept a zero length string), it excepts the string to contain

a decimal number with an optional '+' or '-' prefix.

Class can be:

    - [alnum][]

    - [alpha][]

    - [digit][]

    - [graph][]

    - [lower][]

    - [print][]

    - [punct][]

    - [space][]

    - [upper][]

    - [xdigit][]

    - ascii

    - [control][]

    - integer

Any other Class is invalid. Most classes are based on a C function (or macro)

available in the [ctype.h][] header.

  - string repeat String Count

Repeat a String 'Count' many times. 'Count' must be positive, inclusive of

zero.

  - string first Needle Haystack StartIndex?

Find a Needle in a Haystack, optionally starting from 'StartIndex'. The index

into the string where the first character of found of Needle in Haystack is

returned if the string has been found, negative one if it has not been found.

  - string ordinal String

Convert the first character in a string to a number that represents that

character.

  - string char Number

Convert a number to its character representation.

  - string hex2dec HexString

Convert a lower or uppercase hexadecimal number to its decimal representation.

  - string dec2hex Number

Convert a decimal number to its lowercase hexadecimal representation.

  - string hash String

Hash a string returning the hash of that string as a number.

  - string range String Index1 Index2

Create a sub-string from Index1 to Index2 from a String. If Index1 is greater

than Index2 an empty string is returned. If Index1 is less than zero, it is set

to zero, if Index2 is greater than the index of the last character, it is set

to the index of the last character. Indexing starts at zero and goes up to one

less than the strings length (or zero of empty string), which is the index of

the last character. The characters from Index1 to Index2 inclusive form the

sub-string.

 - string tr d set string *OR* string tr r set1 set2 string

Much like the Unix utility 'tr', this performs various translations given a set

(or two sets of characters). 'tr' can delete characters in the set of

characters in 'set' from 'string' if the option provided to it is 'd', or it

can perform a translation from one set to another if the 'r' specifier is

given. If the second set is larger than the first for the 'r' command the last

character applies to the rest of the characters in 'set2'.

Both 'r' and 'd' options can both have the additional specifier 'c', which

compliments the given 'set' or characters.

'r' can also have the 's' specifier, which will squeeze repeated characters in

the set

Example:

	proc lowercase {x} {

		string tr r abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ $x

	}

Which creates a function with the same functionality as 'string lowercase $x'.

 - string replace old-string first last new-string

This subcommands replaces a substring starting at 'first' and ending at 'last'.

The 'new-string' replaces the removed section of the 'old-string'.

* eq string string

Returns '0' is two strings are not equal and '1' if they are. Unlike '==' this

acts on the entire string.

* ne string string

Returns '1' is two strings are not equal and '0' if they are. Unlike '!=' this

acts on the entire string.

* incr variable number?

Increment a variable by 1, or by an optional value. 'incr' returns the

incremented variable. 'incr' being implemented in C is usually a lot more

efficient then defining 'incr' in TCL, like so:

	proc incr {x} { upvar 1 $x i; set i [+ $i 1] }

And it is used often in looping constructs.

* subst opts... string

Optionally perform substitutions on a string, controllable via three flags.

When you enter a string in a [TCL][] program substitutions are automatically

performed on that string, 'subst' can be used to perform a subset of those

substitutions (command execution, variable substitutions, or escape character

handling) a string.

 - -nobackslashes

Disable escape characters.

 - -novariables

Do not process variables.

 - -nocommands

Do not process command substitutions.

### Extension Commands

These commands are present in the [main.c][] file and have been added to the

interpreter by extending it. They deal with I/O.

* gets

Read in a new-line delimited string, returning the string on success, on End Of

File it returns 'EOF' with a return code of 'break'.

* puts *OR* puts string *OR* puts -nonewline string

Write a line to *stdout*, the option '-nonewline' may be specified, which

means no newline with be appended to the string.

If no string is given, then a single new line is printed out.

* getenv string

Retrieve an environment variable by the name 'string', returning it as a

string.

* exit *OR* exit number

Exit the program with a status of 0, or with the provided status number.

* clock seconds *OR* clock format time time-spec? *OR* clock clicks

A simplified version of the TCL command 'clock' the subcommands it supports

are:

 - clicks

Return the CPU clock.

 - seconds

Return the seconds since the Unix Epoch.

 - format time time-spec?

The format command a time in seconds since the Unix Epoch against an optional

time-specification (the default time specification is "%a %b %d %H:%M:%S %Z %Y").

The formatting is done entirely by the function [strftime][].

There are internal limits on this string length (512 bytes excluding the NUL

terminator).

* heap option

This command is useful for inspecting the size of the heap, it can report the

number of bytes allocator, the number of frees, the number of allocations, and

other statistics.

The options are:

- frees

This is the number of frees that have taken place, excluding freeing 'NULL'.

- allocations

This is the number of allocations that have taken place, including any

reallocations.

- total

This is the total number of bytes that have been allocated.

- reallocations

This is the number of reallocations that have been performed on an already

allocated pointer.

The "heap" command has another subcommand "fail-after", which is used for

internal testing purposes, it takes a number and after that many calls to

the allocation function it causes it to return a failure, which is fatal to

the interpreter (but should not cause a crash). You should not need to use

this subcommand. Calling this subcommand again resets the count until failure,

setting the count to zero disables deliberate failure. This feature could be

used as a crude watchdog, but it would be inadvisable to do so.

* source file-name

Read and then evaluate a file off of disk. This may fail because the file could

not be read or something when wrong during the evaluation.

## Compile Time Options

I am not a big fan of using the [C Preprocessor][] to define a myriad of

compile time options. It leads to messy and unreadable code.

That said the following compile time options are available:

* NDEBUG

If defined this will disable assertions. It will also disable unit tests

functions from being compiled. Assertions are used heavily to check that the

library is being used correctly and to check the libraries internals, this

applies both to the block allocation routines and pickle itself.

There are other compile time options within [pickle.c][] that control;

maximum string length and whether to use one, whether to provide the default

allocator or not, whether certain functions are to be made available to the

interpreter or not (such as the command 'string', the mathematical operators

and the list functions), and whether strict numeric conversion is used.

These options are semi-internal, they are subject to change and removal, you

should use the source to determine what they are and be aware that they may

change across releases.

Inevitably when an interpreter is made for a new language,

[readline][] (or [linenoise][]) integration is a build option, usually because

the author is tired of pressing the up arrow key and seeing '^\[\[A'. Naturally

this increases the complexity of the build system, adds more options, and adds

more code. Instead you can use [rlwrap][], or an alternative, as a wrapper

around your program.

## Custom Allocator / C Library usage

To aid in porting the system to embedded platforms, [pickle.c][] contains no

Input and Output functions (they are added in by registering commands in

[main.c][]). [pickle.c][] does include [stdio.h][], but only to access

[vsnprintf][]. The big problem with porting a string heavy language to an

embedded platform, unlike a language like [FORTH][], is memory allocation. It

is unavoidable that some kind of dynamic memory allocation is required. For

this purpose it is possible to provide your own allocator to the pickle

library. If an allocator is not provided, malloc will be used, you can remove

this from the initialization function in to stop your build system pulling in

your platforms allocator.

The block allocation library provided in [block.c][] can be optionally

used, but unlike [malloc][] will require tweaking to suite your purposes. The

maximum block size available to the allocator will also determine the maximum

string size that can be used by pickle.

Apart from [vsnprintf][], the other functions pulled in from the C

library are quite easy to implement. They include (but are not necessarily

limited to); strlen, memcpy, memchr, memset and abort.

## C API

The language can be extended by defining new commands in [C][] and registering

those commands with the *pickle\_command\_register* function. The internal

structures used are mostly opaque and can be interacted with from within the

language. As stated a custom allocator can be used and a block allocator is

provided, it is possible to do quite a bit with this scripting language whilst

only allocating about 32KiB of memory total on a 64-bit machine (for example

all of the unit tests and example programs run within that amount).

The C API is small and regular. All of the functions exported return the same

error codes and implementing an interpreter loop is trivial.

The language can be extended with new functions written in C, each function

accepts an integer length, and an array of pointers to ASCIIZ strings - much

like the 'main' function in C.

User defined commands can be registered with the 'pickle\_command\_register'

function. With in the user defined callbacks the 'pickle\_result\_set' family of

functions can be used. The callbacks passed to 'pickle\_command\_set' look

like this:

	typedef int (*pickle_func_t)(pickle_t *i, int argc, char **argv, void *privdata);

The callbacks accept a pointer to an instance of the pickle interpreter, and

a list of strings (in 'argc' and 'argv'). Arbitrary data may be passed to the

custom callback when the command is registered.

The function returns one of the following status codes:

	PICKLE_ERROR    = -1 (Throw an error until caught)

	PICKLE_OK       =  0 (Signal success, continue execution)

	PICKLE_RETURN   =  1 (Return out of a function)

	PICKLE_BREAK    =  2 (Break out of a while loop)

	PICKLE_CONTINUE =  3 (Immediately proceed to next iteration of while loop)

These error codes can affect the flow control within the interpreter. The

actual return string of the callback is set with 'pickle\_result\_set' functions.

Variables can be set either within or outside of the user defined callbacks

with the 'pickle\_var\_set' family of functions.

The pickle library does not come with many built in functions, and comes with

no Input/Output functions (even those available in the C standard library) to

make porting to non-hosted environments easier. The example test driver program

does add functions available in the standard library.

The following is the source code for a simple interpreter loop that reads a

line and then evaluates it:

	#include "pickle.h"

	#include 

	#include 

	static void *allocator(void *arena, void *ptr, size_t oldsz, size_t newsz) {

		if (newsz ==     0) { free(ptr); return NULL; }

		if (newsz  > oldsz) { return realloc(ptr, newsz); }

		return ptr;

	}

	static int prompt(FILE *f, int err, const char *value) {

		if (fprintf(f, "[%d]: %s\n> ", err, value) < 0)

			return -1;

		return fflush(f) < 0 ? -1 : 0;

	}

	int main(void) {

		pickle_t *p = NULL;

		if (pickle_new(&p, allocator, NULL) < 0)

			return 1;

		if (prompt(stdout, 0, "") < 0)

			return 1;

		for (char buf[512] = { 0 }; fgets(buf, sizeof buf, stdin);) {

			const char *r = NULL;

			const int er = pickle_eval(p, buf);

			if (pickle_result_get(p, &r) != PICKLE_OK)

				return 1;

			if (prompt(stdout, 0, r) < 0)

				return 1;

		}

		return pickle_delete(p);

	}

It should be obvious that the interface presented is not efficient for many

uses, treating everything as a string has a cost. It is however simple and

sufficient for many tasks.

While API presented in 'pickle.h' is small there are a few areas of

complication. They are: The memory allocation API, registering a command, the

getopt function and the unit tests. The most involved is the memory allocation

API and there is not too much to it, you do not even need to use it and can

pass a NULL to 'pickle\_new' if for the allocator argument if you want to use

the built in malloc/realloc/free based allocator (provided the library was

built with support for it).

It may not be obvious from the API how to go about designing functions to

integrate with the interpreter. The C API is deliberately kept as simple as

possible, more could be exported but there is a trade-off in doing so; it

places more of a burden on backwards compatibility, limits the development

of the library internals and makes the library more difficult to use. It is

always possible to hack your own personal copy of the library to suite your

purpose, the library is small enough that this should be possible.

The Pickle interpreter has no way of registering different types with it, the

string is king. As such, it is not immediately clear what the best way of

adding functionality that requires manipulating non-string data (such as

file handles or pointers to binary blobs) is. There are several ways of doing

this:

1. Convert the pointer to a string and add functions which deal with this string.

2. Put data into the private data field

3. Create a function which registers another function that contains private data.

Option '1' may seem natural, but it is much more error prone. It is possible

to pass the wrong string around and cause the program to crash. Option '2' is

limiting, the C portion of the program is entirely in control of what resources

get added, and only one handle to a resource can be controlled. Option '2' is

a good option for certain cases.

Option '3' is the most general and allows an arbitrary resource to be managed

by the interpreter. The idea is to create a function that acquires the resource

to be managed and registers a new function in the pickle global function

namespace with the resource in the private data field of the newly registered

function. The newly created function, a limited form of a closure, can then

perform operations on the handle. It can also cleanup the resource by release

the object in its private data field, and then deleting itself with the

 'pickle\_command\_rename' function. An example of this is the 'fopen' command,

it returns a closure which contains a file handle.

An example of using the 'fopen' command and the returned function from within

the pickle interpeter is:

	set fh [fopen file.txt rb]

	set line [$fh -gets]

	$fh -close

And an example of how this might be implemented in C is:

	int pickleCommandFile(pickle_t *i, int argc, char **argv, void *pd) {

		FILE *fh = (FILE*)pd;

		if (!strcmp(argv[1], "-close")) { /* delete self */

			fclose(fh);                                /* free handle */

			return pickle_command_rename(argv[0], ""); /* delete self */

		}

		if (!strcmp(argv[1], "-gets")) {

			char buf[512];

			fgets(buf, sizeof buf, fh);

			return pickle_result_set(i, "%s", buf);

		}

		return pickle_result_set(i, PICKLE_ERROR, "invalid option");

	}

	int pickleCommandFopen(pickle_t *i, int argc, char **argv, void *pd) {

		char name[64];

		FILE *fh = fopen(argv[1], argv[2]);

		sprintf(name, "%p", fh); /* unique name */

		pickle_command_register(i, name, pickleCommandFile, fh);

		return pickle_set_result(i, "%s", name);

	}

The code illustrates the point, but lacks the assertions, error checking,

and functionality of the real 'fopen' command. The 'pickleCommandFopen' should

be registered with 'pickle\_command\_rename', the 'pickleCommandFile' is not

as 'pickleCommandFopen' does the registering when needed.

It should be possible to implement the commands 'update', 'after' and 'vwait',

extending the interpreter with task management like behavior without any changes

to the API. It is possible to implement most commands, although it might

be awkward to do so. Cleanup is still a problem.

## Style Guide

Style/coding guide and notes, for the file [pickle.c][]:

- 'pickle\_' and snake\_case is used for exported functions/variables/types

- 'picol'  and camelCase  is used for internal functions/variables/types,

with a few exceptions, such as 'advance' and 'compare', which are internal

functions whose names are deliberately kept short.

- Use asserts wherever you can for as many preconditions, postconditions

and invariants that you can think of.

- Make sure you make your functions static unless they are meant to

be exported. You can use 'objdump -t | awk '$2 ~ /[Gg]/' to find

all global functions. 'objdump -t | grep '\\\*UND\\\*' can be used to

check that you are not pulling in functions from the C library you

do not intend as well.

- The core project is written strictly in C99 and uses only things

that can be found in the standard library, and only things that are

easy to implement on a microcontroller.

- Error messages should begin with the string 'Invalid', even it is

does not make the best grammatical sense, this is so error messages can

be grepped for easily.

The callbacks all have their 'argv' argument defined as 'char\*',

as they do not modify their arguments. However adding this in just adds

a lot of noise to the function definitions. Also see

.

## Notes

* Other implementations

There are other implementations of [TCL][] and other extensions of the original

[picol][] interpreter, here are a few:

- Picol Extension 

- TCL reimplementation 

- An entire list of implementations 

- Another Picol Extension 

- Partcl , a complete reimplementation, also

with a blog post  describing it.

And I am sure if you were to search  you would find more.

* Internal memory usage

One of the goals of the interpreter is low(ish) memory usage, there are a few

design decisions that go against this, along with the language itself, however

an effort has been made to make sure memory usage is kept low.

Some of the (internal) decisions made:

- The use of 'compact\_string\_t' where possible. This can be used to store a

  string within a union of a small character array or a pointer, this requires

  a bit be available elsewhere to store which is used.

- Compact, small, structures for structures that are used a lot; call frames (2

  pointers), variables (3 pointers and a bit-field), and commands (4 pointers).

- Linked-lists are used, which increase overall memory usage but mean large

  chunks of memory do not have to be allocated and reallocate for things like

  tables of functions and variables.

- The parser operates on a full program string and tokens to the string a

  indices into the string, which means a large [AST][] does not

  have to be assembled.

- Memory is allocated on the stack where possible, with the function

  'picolStackOrHeapAlloc' helping with this, it moves allocations to the

  heap if they become too large for the stack. This could conceivably be used

  for more allocations we do, for example when creating argument lists, but is

  currently just used for some unbounded string operations. (Of note, it might

  be worth creating memory pools for small arguments lists as they are

  generated fairly often, or even a pool of medium size buffers we could lock).

  This could also speed things up as we spend a lot of time allocating objects,

  the working set may be small but the number of temporary objects created is

  not, 'picolArgsGrow' could be targeted for this.

- The empty string could be treated specially by the interpreter and all empty

  strings never freed nor allocated.

- Also of note is that the interpreter is designed to gracefully handle out of

  memory conditions, it may not live up to this fact, but it is possible to

  test this by returning NULL in the allocator provided randomly.

- There dictionary is currently initialized at startup from a list of functions

  and their names, this takes up RAM and not ROM, a two level hashing function

  could be used instead. Calling 'rename' on these built-in functions could

  be handled with a single bit per entry separate from the built-in function

  hash table, or by disallowing 'rename' on built in functions. A not trivial

  (for an embedded system) amount of memory is used by this table. The

  trade-off is more lines of code, more complexity, and a bigger executable.

Some of the design decisions made that prevent and hamper memory usage and

things that could be done:

- Defined procedures are strings and Lack of Byte Code Compilation

It would be possible to include some simple complication on the procedures that

are stored, turning certain keywords into bytes codes that fall outside of the

[UTF-8][] and [ASCII][] character ranges, as well as removing runs of white

space and comments entirely. This would be possible to implement without

changing the interface and would both speed things up and reduce memory usage,

however it would increase the complexity of the implementation (perhaps by

about 500 LoC if that can be thought of as a proxy for complexity).

- Within [pickle.c][] there is a table of functions that are register on

  startup, registering means taking each entry in this array and entering it

  into a hash-table that contains all of the other defined procedures and

  built-in functions, there is obviously some redundancy here. It might be

  worth treating the built-in core functions specially with a binary search

  tree just for them instead of adding them into the hash table (this would

  complicate the 'rename' command as well).

* [vsnprintf][]

If you need an implementation of [vsnprintf][] the [Musl C library][] has one.

This is the most complicate C function in use from the standard library and the

one most likely not to be available in an embedded platform (although the base

software packages are getting better nowadays). It is not difficult to make

your own version of [vsnprintf][] function usable by this library as you do not

need to support all of the functionality library function, for example,

floating point numbers are not used within this library.

* Too big

The list functions are also far to complex, big, and error prone, they should

be rewritten.

It might be nice to go back to the original source, with what I know now, and

create a very small version of this library with a goal of compiling to under

30KiB. The 'micro' makefile target does this somewhat, or just starting from

scratch and making my own version. A smaller API could be made as well, there

really only needs to be; pickle\_new, pickle\_delete, pickle\_eval,

pickle\_command\_register, and pickle\_result\_set.

* A module system and some modules

This interpreter lacks a module system, there are a few small and simple

modules that could be integrated with the library quite easily, see;

Constant Data Base Library , A HTTP 1.1

client , Tiny compression routines

, and a fixed point arithmetic

library , and UTF-8 string handling

 These would have to be external modules

that could be integrated with this library.

The current project that attempts to remedy this is available at:

A proper module system would also allow Shared Objects / Dynamically Linked

Libraries to be loaded at run time into the interpreter. This complicates the

library, but a Lisp Interpreter where I have done this, see

.

* Things that are missing that will not be added.

 - The [coroutine][] words are missing, which will require interpreter

 support to implement efficiently. If you need coroutines (which are very

 useful) then access to the internals is needed.

 - Event handling in [vwait][] and [update][] and the like could be added

 later on.

 - The control structures 'foreach' and 'switch' - whilst very useful - will

 most likely not be added. There is no reason that they cannot be added as

 extensions however.

 - Lack of Floating point support, which should not really be expected either

 given the primary usage for this interpreter, as a command language in

 embedded devices.

## Interpreter Limitations

Known limitations of the interpreter include:

* Recursion Depth - 128, set via a compile time option.

* Maximum size of file - 2GiB

* 'clock' command has a limited string available for formatting (512 bytes).

[ASCII]: https://en.wikipedia.org/wiki/ASCII

[AST]: https://en.wikipedia.org/wiki/Abstract_syntax_tree

[BSD License]: https://en.wikipedia.org/wiki/BSD_licenses

[C Preprocessor]: https://en.wikipedia.org/wiki/C_preprocessor

[C]: https://en.wikipedia.org/wiki/C_%28programming_language%29

[FORTH]: https://en.wikipedia.org/wiki/Forth_(programming_language)

[Lua]: https://www.lua.org/

[MIT License]: https://en.wikipedia.org/wiki/MIT_License

[Make]: https://en.wikipedia.org/wiki/Make_(software)

[Musl C library]: https://git.musl-libc.org/cgit/musl/tree/src/stdio

[TCL trace command]: https://www.tcl.tk/man/tcl8.5/TclCmd/trace.htm

[TCL]: https://en.wikipedia.org/wiki/Tcl

[UTF-8]: https://en.wikipedia.org/wiki/UTF-8

[alnum]: http://www.cplusplus.com/reference/cctype/isalnum/

[alpha]: http://www.cplusplus.com/reference/cctype/isalpha/

[cdb]: https://github.com/howerj/cdb

[control]: http://www.cplusplus.com/reference/cctype/iscntrl/

[coroutine]: 

[ctype.h]: http://www.cplusplus.com/reference/cctype/

[digit]: http://www.cplusplus.com/reference/cctype/isdigit/

[graph]: http://www.cplusplus.com/reference/cctype/isgraph/

[homoiconic]: https://en.wikipedia.org/wiki/Homoiconicity

[insertion sort]: https://en.wikipedia.org/wiki/Insertion_sort

[linenoise]: https://github.com/antirez/linenoise

[lisp]: https://en.wikipedia.org/wiki/Lisp_(programming_language)

[loc]: https://en.wikipedia.org/wiki/Source_lines_of_code

[lower]: http://www.cplusplus.com/reference/cctype/islower/

[main.c]: main.c

[malloc]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation

[pickle.c]: pickle.c

[pickle.h]: pickle.h

[picol]: http://oldblog.antirez.com/post/picol.html

[print]: http://www.cplusplus.com/reference/cctype/isprint/

[punct]: http://www.cplusplus.com/reference/cctype/ispunct/

[python]: https://www.python.org/

[readline]: https://tiswww.case.edu/php/chet/readline/rltop.html

[readme.md]: readme.md

[regex]: http://c-faq.com/lib/regex.html

[rlwrap]: https://linux.die.net/man/1/rlwrap

[shell]: shell

[snprintf]: http://www.cplusplus.com/reference/cstdio/snprintf/

[space]: http://www.cplusplus.com/reference/cctype/isspace/

[stdin]: http://www.cplusplus.com/reference/cstdio/stdin/

[stdio.h]: http://www.cplusplus.com/reference/cstdio/

[stdout]: http://www.cplusplus.com/reference/cstdio/stdout/

[strftime]: http://www.cplusplus.com/reference/ctime/strftime/

[string.h]: http://www.cplusplus.com/reference/cstring/

[unit tests]: https://en.wikipedia.org/wiki/Unit_testing

[unit.tcl]: unit.tcl

[update]: https://www.tcl.tk/man/tcl8.4/TclCmd/update.htm

[upper]: http://www.cplusplus.com/reference/cctype/isupper/

[vsnprintf]: http://www.cplusplus.com/reference/cstdio/vsnprintf/

[vwait]: https://www.tcl.tk/man/tcl8.4/TclCmd/vwait.htm

[xdigit]: http://www.cplusplus.com/reference/cctype/isxdigit/

[isatty]: http://man7.org/linux/man-pages/man3/isatty.3.html