This design describes PicoPoe, a small programming language: small, but rich enough for most projects. It has no constructs such as for or while, although loops are still possible. The language is very close to assembly languages, but it is not assembly: it allows rich (numerical and other) expressions, and it is, in a sense, object-oriented. Read on to find out more.
Monospaced text in bold is used for reserved words, which should not be used for new identifiers. Monospaced text in italic is meant to be replaced by actual identifiers or other code.
Source code files have an extension of .pip.
There are two kinds of comments: single-line and multi-line. The first runs from a semicolon until the end of the line:
; single line comment
The second kind is text delimited by braces:
{ multiple line comment { nested comment }
the rest of the first comment }
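As a rough illustration of how the two comment styles could be handled, here is a minimal Python sketch that strips them from a source string. The function name strip_comments is hypothetical, and the sketch assumes that comment characters never occur inside string literals, which the text above does not address.

```python
def strip_comments(source: str) -> str:
    """Remove ';' line comments and nested '{ ... }' comments (illustrative only)."""
    out = []
    depth = 0  # current nesting level of brace comments
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "{":
            depth += 1  # entering a (possibly nested) brace comment
        elif ch == "}" and depth > 0:
            depth -= 1  # leaving one nesting level
        elif depth == 0:
            if ch == ";":
                # Skip to the end of the line; the newline itself is kept.
                while i < len(source) and source[i] != "\n":
                    i += 1
                continue
            out.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unterminated brace comment")
    return "".join(out)
```

The depth counter is what makes nesting work: the outer comment only ends when every inner opening brace has been matched.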
An identifier is a sequence of characters that begins with either an underscore or a letter (Unicode uppercase or lowercase) and is followed by more underscores or letters, or digits. Identifiers follow camel case notation. Variables begin with a lowercase letter. Constants begin with the character 'k'. Types and functions begin with uppercase letters.
Source files begin with the imports:
import file-name, ...
import library-name.module-name, ...
import library-name.module-name.file-name, ...
...
Imports serve two purposes: they let us use the entities declared in a source file without writing their full names, and they tell the compiler about those entities and their features, which helps it catch errors. file-name does not have an extension.
Next come the macros:
macro macro-name(argument, ...)
	replacement-text
	...
macro-name and arguments are just identifiers, and replacement-text is one or more lines of code. Notice how arguments are delimited by parentheses and how the macro contents are indented one tab in relation to the macro header. This indentation mechanism is featured in other parts of PicoPoe, and it allows us to remove unnecessary braces (or other delimiters) and semicolons at the end of lines. We usually indent our code, so why not make use of this phenomenon?
To invoke a macro we simply say:
macro-name(expression, ...)
and this bit of code will be replaced by the replacement text defined in the macro. Macro arguments are, of course, optional.
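To make the substitution concrete, here is a minimal Python sketch of textual macro expansion. The function expand_macro and the macros Square and Zero used to exercise it are hypothetical, and the sketch assumes that macro arguments contain no parentheses or nested commas.

```python
import re

def expand_macro(code, name, params, body):
    """Replace each call name(arg, ...) in code by body with the params substituted."""
    call = re.compile(r"\b" + re.escape(name) + r"\(([^()]*)\)")

    def substitute(match):
        text = match.group(1).strip()
        args = [a.strip() for a in text.split(",")] if text else []
        if len(args) != len(params):
            raise ValueError(f"{name} expects {len(params)} argument(s)")
        mapping = dict(zip(params, args))
        if not mapping:
            return body  # a macro without arguments expands to its body unchanged
        # Whole-word replacement in a single pass, so inserted arguments are not rescanned.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
        return pattern.sub(lambda m: mapping[m.group(1)], body)

    return call.sub(substitute, code)
```

With a macro Square(v) whose replacement text is v × v, the line area <- Square(side) expands to area <- side × side.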
Next in the source code file come one or more of protocols, wrappers, unions, or structs. Protocols are like this:
protocol protocol-name(argument:type, ...)
	operator-declaration
	array-declaration
	block-declaration
	...
The arguments are used for what other languages call generics and are optional. See below for block, array, and operator declarations (note that these are just declarations, not implementations). For instance:
protocol Map(Key:struct Comparable, Data:struct Object)
	block AddElement(k:Key, e:Data):Void
	block RemoveElement(k:Key):Void
	block GetElement(k:Key):Data
	block ElementExists(k:Key):Bool
Note the use of struct Comparable and struct Object in the generics arguments to the protocol. This use says that Key and Data are actually types, not data. If we said only Comparable and Object, without struct, Key and Data would be data, not types. We may then say:
data myObject:Map(String, UWord)
myObject.RemoveElement("myElement")
Wrappers are just (extended) typedefs. This is the syntax:
wrapper wrapper-name(argument:type, ...) <underlying-type>
	literal-implementation
	operator-implementation
	array-implementation
	block-implementation
	...
Again, arguments are for generics and optional. underlying-type is the old type name. wrapper-name is the new type name. For example, we may simply say:
wrapper Handle <**Object>
and, from then on, Handle and **Object will be synonyms. On the other hand, we have the option of completely redefining the interface of the underlying type, adding our own methods, operators, and more. We just cannot add new data members to a wrapper: we have to work within the underlying type's restrictions.
Unions are like this:
union union-name(argument:type, ...)
	data-declaration
	union-declaration
	struct-declaration
	...
union-name is optional. If absent, an anonymous union is being declared, and the identifiers inside the union must be different from the other identifiers where the union is declared, to avoid naming conflicts.
Structs inside unions and inside other structs are declared like this:
struct struct-name(argument:type, ...)
	data-declaration
	union-declaration
	struct-declaration
	...
Here is an example:
union Registers
	struct
		bytes[4] eax
		bytes[4] ebx
		bytes[4] ecx
		bytes[4] edx
	struct
		bytes[2] reserved
		bytes[2] ax
		bytes[2] reserved
		bytes[2] bx
		bytes[2] reserved
		bytes[2] cx
		bytes[2] reserved
		bytes[2] dx
	struct
		bytes[2] reserved
		byte ah
		byte al
		bytes[2] reserved
		byte bh
		byte bl
		bytes[2] reserved
		byte ch
		byte cl
		bytes[2] reserved
		byte dh
		byte dl
Note the use of the keyword reserved (reserved fields cannot be accessed) and of anonymous structs, similar to unions. Elsewhere we can say:
data r:Registers
r.eax <- r.bx × (r.cl / 2)
Finally, come the structs declared outside of other structs or unions. These are the classes, although they still use the keyword struct, like this:
struct struct-name(argument:type, ...) <category-name>
	equ-declaration
	data-declaration
	union-declaration
	struct-declaration
	literal-implementation
	operator-implementation
	array-implementation
	block-implementation
	initer-implementation
	...
category-name may be absent (in which case we don't write < >), or it may be the super class name, a protocol name, Private, or a general identifier beginning with an uppercase letter. Usually, a class is just a sequence of these struct declarations, each grouped under its own category. If the category name is absent, the class inherits from no super class. If it is absent or is the super class name, data may be added in this category. If it is a protocol name, the category must implement the methods and operators of that protocol. If it is the reserved word Private, the methods implemented in that category are not accessible to subclasses or other classes. A class may have only one category of each name, except Private, which may appear more than once in a class definition. Here follows an example:
struct Rectangle <GeometricFigure>
	data x:Real, y:Real, w:Real, h:Real
	block Init(x:Real, y:Real, w:Real, h:Real):Rectangle
		self.x <- x
		self.y <- y
		self.w <- w
		self.h <- h
		! self
	block Area:Real
		! w × h
	block MakeSquare(s:Real):Void
		w h <- s
struct Rectangle <ChangeOrigin>
	block MoveToOrigin:Void
		x y <- 0.0
	block CenterOnOrigin:Void
		x <- -w / 2.0
		y <- -h / 2.0
Here class Rectangle is defined across two categories so far: GeometricFigure (its super class) and ChangeOrigin, a protocol with two methods declared elsewhere in the program and implemented here. We could have added more protocol categories, or Private or general categories, but not another super category or an empty category. Exactly one super or empty category must be present, and it must be the first category of the class.
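The category rules can be summarised as a small checker. The following Python sketch is only one reading of those rules; the function check_categories, its error messages, and the category name Drawable used to exercise it are hypothetical. An empty string stands for a struct declared without < >.

```python
def check_categories(categories, superclass=None):
    """Return the rule violations for the ordered category names of one class."""
    errors = []
    # Categories that may carry data: the empty one and, if any, the super class.
    base_names = {""} if superclass is None else {"", superclass}
    base_count = sum(1 for c in categories if c in base_names)
    if base_count != 1:
        errors.append("exactly one super or empty category is required")
    if not categories or categories[0] not in base_names:
        errors.append("the super or empty category must come first")
    seen = set()
    for c in categories:
        # Private is the only category name that may be repeated.
        if c != "Private" and c in seen:
            errors.append(f"category {c or '<empty>'} appears more than once")
        seen.add(c)
    return errors
```

For the Rectangle example, check_categories(["GeometricFigure", "ChangeOrigin"], superclass="GeometricFigure") reports no violations, while swapping the two categories reports that the super category must come first.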
There are only four primitive data types, and they're all related:
bit
bits[unsigned-word]
byte
bytes[unsigned-word]
If we say:
data s:bit
data v:bits[7]
we end up with data occupying exactly one byte in memory. That is, bit fields are packed together, rather than each being padded out to its own byte as in other programming languages.
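The effect of packing on storage can be checked with a little arithmetic. The Python sketch below compares the packed size with the size obtained when every field starts on a byte boundary. The function names and the (kind, count) representation are hypothetical, and the sketch assumes fields are packed in declaration order with no alignment padding.

```python
def packed_size_bytes(fields):
    """Bytes needed when fields are packed bit by bit; fields is a list of (kind, count)."""
    total_bits = sum(count * (8 if kind == "byte" else 1) for kind, count in fields)
    return (total_bits + 7) // 8  # round up to a whole number of bytes

def padded_size_bytes(fields):
    """Bytes needed when every field starts on its own byte boundary."""
    return sum(count if kind == "byte" else (count + 7) // 8 for kind, count in fields)
```

For the bit plus bits[7] declarations, packed_size_bytes([("bit", 1), ("bit", 7)]) gives 1, whereas padded_size_bytes gives 2 for the same fields.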
If we wish to pass blocks around, we may do so with the block type:
(argument-type, ...)::(return-type, ...)
This is the signature of a block, indicating both its argument types as well as its return types. It may be used anywhere we expect a block name to be passed around.
Constants are declared in one of several ways. The easiest case is with just one constant:
static equ constant <- expression
This works if the constant is of the same type as the class. If not, we can use:
static equ constant1:type1 <- expression1, constant2:type2 <- expression2, ...
Any combination of these two cases is possible. static is optional; when present, the constant belongs to the class rather than to each instance. Enumerations are like this:
equ enum-name
	name1
	name2 name3
	name4 <- unsigned-word
	name5
	...
The identifier enum-name is optional, and, if absent, care must be taken to avoid naming conflicts. name1 is 0 (zero), name2 and name3 are on the same line and therefore synonyms (both equal to 1), name4 is initialized with a number, name5 is that number plus 1, and so on.
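The numbering rule can be sketched in a few lines of Python. The function number_enum and its input representation (one entry per source line, holding the names on that line and an optional explicit value) are hypothetical; the value 10 in the usage note is chosen only for illustration.

```python
def number_enum(lines):
    """Assign values to enum names; lines is a list of (names, explicit_value_or_None)."""
    values = {}
    next_value = 0  # the first name is 0 unless initialized explicitly
    for names, explicit in lines:
        if explicit is not None:
            next_value = explicit
        for name in names:
            values[name] = next_value  # names on the same line are synonyms
        next_value += 1
    return values
```

Given the lines name1, name2 name3, name4 <- 10, and name5, this yields name1 = 0, name2 = name3 = 1, name4 = 10, and name5 = 11.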
Data declarations are like this, for example:
static getter setter data data1:type1 <- expression1, data2:type2 <- expression2, ...
static, getter, and setter are optional. static says the data belongs to the class, not to its instances. getter allows the data to be read from outside of the class, like this:
x <- myObj.data
setter allows the data to be written to outside of its class:
myObj.data <- x
These are similar to the public/private mechanisms of other languages. The initializer expressions expression1, expression2, ... are also optional. A type is:
*type-name(generics)[array-size]
* indicates the data is in fact a pointer. We may have pointers to pointers to pointers... Generics were seen above, with the Map example. Arrays, if present, may be multidimensional, like this:
data myArray:UWord[10][20][30]
A literal is declared so:
literal regular-expression
	statement
	...
Here is an excerpt of an example:
struct Bool <Object>
	data value:bit
	literal false
		value <- 0
	literal true
		value <- 1
Operators come in several flavours:
prefixop symbol:return-type
	statement
	...
suffixop symbol:return-type
	statement
	...
linfixop symbol(argument:argument-type):return-type
	statement
	...
rinfixop symbol(argument:argument-type):return-type
	statement
	...
There are prefix operators, like ¬myVar, suffix operators like myVar--, and infix operators (left and right associative) like myVar ∈ mySet. symbol is any Unicode mathematical symbol, or combination (without spaces), with a few exceptions. Operators may be overloaded. In case of ambiguities, the compiler should select longest match first, followed by order of imports, followed by order of implementation. The order of implementation gives us the operator precedence, from highest to lowest.
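The disambiguation rule can be illustrated with a short Python sketch. It assumes the candidate operators are supplied as a list already sorted by order of imports and then by order of implementation; the function select_operator and the module names Core and Extras are hypothetical.

```python
def select_operator(text, pos, operators):
    """Pick the operator matching text at pos; operators is a list of (symbol, origin).

    The longest symbol wins. Among symbols of equal length, the earliest
    entry in the list wins, because only a strictly longer match replaces it.
    """
    best = None
    for symbol, origin in operators:
        if text.startswith(symbol, pos) and (best is None or len(symbol) > len(best[0])):
            best = (symbol, origin)
    return best
```

With the candidates - and -- from Core and another - from Extras, the input myVar-- at position 5 selects -- from Core, while myVar-x selects the - that was declared first.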
Array access is like the following:
arrayget(index:index-type):return-type
	statement
	...
arrayset(index:index-type, value:value-type):return-type
	statement
	...
We have already seen examples of blocks. They're just pieces of code, like this:
static block name(argument:argument-type, ...):return-type, ...
	statement
	...
Blocks may be overloaded, may or may not be static, and may return more than one value.
An initer is simply code that gets called at class load time, to initialize static data or perform some other code at that time. Initers are usually the last blocks to be implemented in the class.
initer
	statement
	...
The simplest statement is probably a data declaration. These are similar to the data declaration we saw above, but have no getter and no setter modifiers. They must appear after the block arguments and before all other statements. Next are the assignments (we already saw a few):
data1 data2 ... <- expression
And the associated return:
! expression1, expression2, ...
For example, the code:
block MyFunc(x:Real, y:Real):Real, Real
	! x / y, x \ y
quo rem <- MyFunc(10.0, 5.0)
places the quotient of dividing 10.0 by 5.0 in quo and the remainder of that division in rem. If a block returns no values, a return may come all by itself:
!
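The examples in this document use the assignment form in two ways: w h <- s gives the same value to several targets, and quo rem <- MyFunc(10.0, 5.0) spreads several returned values over several targets. The Python sketch below models one plausible reading of that behaviour. It is an assumption, not something the text states, and the function assign is hypothetical.

```python
def assign(targets, values):
    """Bind target names to values, modelling 'data1 data2 ... <- expression'.

    Assumed rule: a single value is copied to every target; otherwise the
    number of values must equal the number of targets.
    """
    if len(values) == 1:
        values = values * len(targets)  # copy the single value to each target
    if len(values) != len(targets):
        raise ValueError("number of values does not match number of targets")
    return dict(zip(targets, values))
```

Under this reading, assign(["w", "h"], [3.0]) binds both w and h to 3.0, and assign(["quo", "rem"], [2.0, 0.0]) binds quo to 2.0 and rem to 0.0.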
A statement may also be an expression (see below). Next come the labels and the gotos:
block name(argument:argument-type, ...):return-type, ...
	statement1
	...
@label
	statement2
	...
Labels are identifiers, and are preceded by the character @, indented at the same level as the block where they appear. Gotos are like this:
boolean -> label-if-true | label-if-false
The boolean expression is evaluated. If true, execution continues at the instruction following label label-if-true, and if false, at the instruction following label-if-false. The bit of code “| label-if-false” may be omitted and, if this is so, no jump occurs if the expression is false. The labels may be block calls. More generally, we have the following construct:
unsigned-word -> 0:label-if-0 | 1,3,5..9:label-if-1-3-5-6-7-8-9 | ... | label-if-default
Or the more readable:
unsigned-word -> 0:label-if-0
| 1,3,5..9:label-if-1-3-5-6-7-8-9
| ...
| label-if-default
This jumps to the labels according to the value of the unsigned-word expression. Again, the labels may be block calls. The numbers 0, 1 and so on may be enum constants. Note the use of ranges and of multiple values. Finally, there's the assembler statement:
block name(argument:argument-type, ...):return-type, ...
	statement1
	...
	asm(cpu)
		asm-statement1
		...
	@asm-label
		asm-statement2
		...
	statement2
	...
Notice how the @asm-label is indented at the same level as the asm() statement.
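Returning to the multi-way jump described earlier in this section, the selection of a label from value lists and ranges can be modelled in Python as follows. The functions parse_values and select_label are hypothetical, and the sketch assumes that ranges are inclusive (as the label name label-if-1-3-5-6-7-8-9 suggests) and that the first matching case wins.

```python
def parse_values(spec):
    """Turn a case specification such as '1,3,5..9' into a set of integers."""
    values = set()
    for part in spec.split(","):
        part = part.strip()
        if ".." in part:
            low, high = part.split("..")
            values.update(range(int(low), int(high) + 1))  # inclusive range
        else:
            values.add(int(part))
    return values

def select_label(value, cases, default=None):
    """Return the label of the first case containing value, else the default label."""
    for spec, label in cases:
        if value in parse_values(spec):
            return label
    return default  # None models 'no jump occurs'
```

With the cases 0:label-if-0 and 1,3,5..9:label-if-1-3-5-6-7-8-9, the value 7 selects the second label and the value 4 falls through to the default.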
Any expression may be enclosed in parentheses. It may be data, with an optional object before:
object.data
It may be a constant:
object.equ
enum-name.equ
Or a block call:
object.block(expression1, expression2, ...)
It may be a literal:
object.literal
It may be a struct or a union field:
struct.data
union.data
It may be the contents pointed to by a variable (we may have pointers to pointers to pointers...):
*data
It may be a meta access:
data@size
data@name
data@addr
data@type
Or an array access:
data[unsigned-word]
It may involve an operator:
prefix-op expression
expression suffix-op
expression infix-op expression
It may be a cast:
(expression, type)
It may be a primitive call:
pmtv(operation, expression1, expression2, ...)
It may be the ternary operator of other languages:
boolean ? expression-if-true : expression-if-false
It may be:
self
super
Or it may be the following structure used to initialize arrays, dictionaries or maps, or other complex structures:
[expression1a | expression1b | ..., expression2a | expression2b | ..., ...]
Finally, it may be the array:
args
used when a variable number of arguments in a block are declared (with ellipsis). We may say:
args.count
args[unsigned-word]
There is only one compiler directive, for conditional compilation:
#if condition
	statement1
	...
#elsif condition
	statement2
	...
#else
	statement3
	...
#fi
condition may be complex, combining symbols defined in the call to the compiler with the logical operators ¬, ∧, and ∨ and with parentheses.
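A condition of this kind can be evaluated with a small recursive-descent parser. The Python sketch below is an illustration only: the function evaluate_condition and the symbol names DEBUG, LINUX, and MACOS are hypothetical, and the precedence (¬ binds tightest, then ∧, then ∨) is an assumption, since the text does not specify one.

```python
import re

def evaluate_condition(condition, defined):
    """Evaluate a condition; a symbol is true when it is in the set 'defined'."""
    tokens = re.findall(r"[¬∧∨()]|\w+", condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "∨":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "∧":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "¬":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        token = take()
        if token is None or not re.fullmatch(r"\w+", token):
            raise ValueError("symbol expected")
        return token in defined

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token")
    return result
```

With only DEBUG defined, the condition DEBUG ∧ ¬(LINUX ∨ MACOS) is true; defining LINUX as well makes it false.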
Copyright © 2020 Rui Cuco. All rights reserved.