A High-Level Programming Language

So you enjoyed some of the features of PicoPoe, but want a more structured programming language? Then Lune is for you. It is very similar to PicoPoe, but it adds loops, if statements, and other structured statements. You can even define your own structured statements. Let's take a closer look.

Monospaced bold text marks reserved words, which must not be used as identifiers. Monospaced italic text is a placeholder, to be replaced by actual identifiers or other code.

Source Code

Source code files have an extension of .lune.

There are two kinds of comments: single-line and multi-line. The first runs from a semicolon until the end of the line:

; single line comment

The second kind is text delimited by braces. Such comments may be nested:

{ multiple line comment
  { nested comment }
  the rest of the first comment }

An identifier is a sequence of characters that begins with an underscore or a letter (Unicode uppercase or lowercase), followed by any number of underscores, letters, or digits. Identifiers follow camel-case notation: variables and functions begin with a lowercase letter, constants begin with the character 'k', and types begin with an uppercase letter.
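The identifier rules above can be sketched as a small Python checker. This is an illustration, not part of any Lune tool. The names is_identifier and naming_role are hypothetical. Two further assumptions apply: "digit" is read as a Unicode decimal digit, and a constant is taken to be 'k' followed by an uppercase letter (as in kMaxSize), so that ordinary lowercase names starting with k are not misclassified.

```python
def is_identifier(s: str) -> bool:
    """True if s starts with '_' or a Unicode letter and continues with
    underscores, letters, or decimal digits only."""
    if not s or not (s[0] == "_" or s[0].isalpha()):
        return False
    return all(c == "_" or c.isalpha() or c.isdecimal() for c in s[1:])


def naming_role(s: str) -> str:
    """Classify a valid identifier by the camel-case conventions (heuristic)."""
    if len(s) > 1 and s[0] == "k" and s[1].isupper():
        return "constant"              # assumption: kMaxSize-style names
    if s[0].isupper():
        return "type"
    if s[0].islower():
        return "variable or function"
    return "unclassified"              # e.g. a leading underscore: no rule given
```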

Source File Structure

Source files begin with the imports:

import file-name, ...
import library-name.module-name, ...
import library-name.module-name.file-name, ...
...

Imports serve two purposes. They let us refer to the entities declared in other source files without writing their full names, and they tell the compiler about those entities and their features, so that it can help prevent errors. file-name is written without the .lune extension.

After the imports, a source file contains one or more protocols, wrappers, unions, or structs. A protocol looks like this:

protocol protocol-name(argument:type, ...)
  operator-declaration
  array-declaration
  block-declaration
  ...

The arguments are optional and play the role of what other languages call generics. Block, array, and operator declarations are described below; note that a protocol only declares them and does not implement them. For instance:

protocol Map(Key:struct Comparable, Data:struct Object)
  block addElement(k:Key) withContents(e:Data):Void
  block removeElement(k:Key):Void
  block getElement(k:Key):Data
  block elementExists(k:Key):Bool

Note the use of struct Comparable and struct Object in the generic arguments of the protocol. It says that Key and Data are types, not data. Had we written only Comparable and Object, without struct, Key and Data would be data, not types. We may then say:

data myObject:Map(String, UWord)

myObject.removeElement("myElement")
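A rough analogue of the generic Map protocol can be written with Python's typing.Protocol. It is shown only to relate the idea to a familiar language. The snake_case method names and the dict-backed DictMap class are hypothetical. Python has no built-in Comparable bound, so Key is left unconstrained here.

```python
from typing import Dict, Generic, Protocol, TypeVar

K = TypeVar("K")   # plays the role of Key
D = TypeVar("D")   # plays the role of Data


class Map(Protocol[K, D]):
    """Declarations only, like the Lune protocol Map(Key, Data)."""
    def add_element(self, k: K, e: D) -> None: ...
    def remove_element(self, k: K) -> None: ...
    def get_element(self, k: K) -> D: ...
    def element_exists(self, k: K) -> bool: ...


class DictMap(Generic[K, D]):
    """One possible implementation of Map, backed by a Python dict."""
    def __init__(self) -> None:
        self._items: Dict[K, D] = {}

    def add_element(self, k: K, e: D) -> None:
        self._items[k] = e

    def remove_element(self, k: K) -> None:
        self._items.pop(k, None)   # assumption: removing a missing key is a no-op

    def get_element(self, k: K) -> D:
        return self._items[k]

    def element_exists(self, k: K) -> bool:
        return k in self._items


# Counterpart of: data myObject:Map(String, UWord)
my_object: Map[str, int] = DictMap()
my_object.add_element("myElement", 7)
my_object.remove_element("myElement")
```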

Wrappers are just (extended) typedefs. This is the syntax:

wrapper wrapper-name(argument:type, ...):underlying-type
  errors-declaration
  literal-implementation
  operator-implementation
  array-implementation
  block-implementation
  ...

Again, the arguments are optional and serve as generics. underlying-type is the existing type name, and wrapper-name is the new one. For example, we may simply say:

wrapper Handle:**Object

and from then on Handle and **Object are synonyms. Alternatively, we can completely redefine the interface of the underlying type, adding our own methods, operators, and more. What we cannot do is add new data members to a wrapper; we have to work within the underlying type's restrictions.
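The "new methods, but no new data" rule has a loose Python analogue: a subclass of a built-in type that declares an empty __slots__. This is an analogy only, and Celsius is a hypothetical example type. The subclass can add behaviour, still supports the underlying type's operations, and rejects any attempt to store extra per-instance data.

```python
class Celsius(float):
    """Wraps float: adds a method, but no data members of its own."""
    __slots__ = ()   # no __dict__, so instances cannot gain new attributes

    def to_fahrenheit(self) -> float:
        return float(self) * 9.0 / 5.0 + 32.0


c = Celsius(100.0)
```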

Unions are like this:

union union-name(argument:type, ...)
  data-declaration
  union-declaration
  struct-declaration
  ...

union-name and the arguments are optional. Without union-name the union is anonymous, and the identifiers inside it must differ from all other identifiers in the scope where the union is declared, to avoid naming conflicts.

Structs inside unions and inside other structs are declared like this:

struct struct-name(argument:type, ...)
  data-declaration
  union-declaration
  struct-declaration
  ...

Here, struct-name is optional too, as are the arguments. Here is an example:

union Registers
  struct
    data eax:bytes[4]
    data ebx:bytes[4]
    data ecx:bytes[4]
    data edx:bytes[4]
  struct
    data reserved:bytes[2]
    data ax:bytes[2]
    data reserved:bytes[2]
    data bx:bytes[2]
    data reserved:bytes[2]
    data cx:bytes[2]
    data reserved:bytes[2]
    data dx:bytes[2]
  struct
    data reserved:bytes[2]
    data ah:byte
    data al:byte
    data reserved:bytes[2]
    data bh:byte
    data bl:byte
    data reserved:bytes[2]
    data ch:byte
    data cl:byte
    data reserved:bytes[2]
    data dh:byte
    data dl:byte

Note the use of the keyword reserved (reserved fields cannot be accessed) and of anonymous structs, which work like anonymous unions. Elsewhere we can say:

data r:Registers

r.eax <- r.bx × (r.cl / 2)
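To see why the three anonymous structs overlap, the Registers union can be modelled in Python as one 16-byte buffer read through three views. This is a sketch, not how a Lune compiler must lay out memory. The byte order follows the layout in the example (the two reserved bytes precede ax, and ah precedes al), which amounts to a big-endian view. The helper names get32, set32, get16, and get8 are hypothetical, and integer division stands in for "/" on bytes.

```python
class Registers:
    """One shared 16-byte buffer, viewed as 4-, 2-, or 1-byte registers."""
    NAMES = "abcd"

    def __init__(self) -> None:
        self.mem = bytearray(16)                 # storage shared by all views

    def _base(self, reg: str) -> int:
        return 4 * self.NAMES.index(reg)         # eax at 0, ebx at 4, ...

    def get32(self, reg: str) -> int:            # eax, ebx, ecx, edx
        b = self._base(reg)
        return int.from_bytes(self.mem[b:b + 4], "big")

    def set32(self, reg: str, value: int) -> None:
        b = self._base(reg)
        self.mem[b:b + 4] = value.to_bytes(4, "big")

    def get16(self, reg: str) -> int:            # ax, bx, ...: skip 2 reserved bytes
        b = self._base(reg) + 2
        return int.from_bytes(self.mem[b:b + 2], "big")

    def get8(self, reg: str, half: str) -> int:  # half is "h" (ah) or "l" (al)
        return self.mem[self._base(reg) + (2 if half == "h" else 3)]


r = Registers()
r.set32("a", 0x12345678)
```

Writing eax changes ax, ah, and al too, because all four names refer to the same bytes.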

Finally come the structs declared outside any other struct or union. These are the classes, although they still use the keyword struct:

struct struct-name(argument:type, ...):category-name
  errors-declaration
  equ-declaration
  data-declaration
  union-declaration
  struct-declaration
  literal-implementation
  operator-implementation
  array-implementation
  block-implementation
  initer-implementation
  ...

category-name may be omitted (together with the colon), or it may be the superclass name, a protocol name, the keyword Private, or a general identifier beginning with an uppercase letter. A class is usually written as a sequence of these struct declarations, each grouped under its own category. If the category name is omitted, the class has no superclass. A category that is unnamed or named after the superclass may declare data. A category named after a protocol must implement that protocol's methods and operators. Methods implemented in a Private category are not accessible to other classes. A class may have only one category of each name, except Private, which may appear more than once in a class definition. Here is an example.

struct Rectangle:GeometricFigure
  data x:Real, y:Real, w:Real, h:Real

  block initWith(x:Real) and(y:Real) and(w:Real) and(h:Real):Rectangle
    self.x <- x
    self.y <- y
    self.w <- w
    self.h <- h
    return(self)

  block area:Real
    return(w × h)

  block makeSquare(s:Real):Void
    w h <- s

struct Rectangle:ChangeOrigin
  block moveToOrigin:Void
    x y <- 0.0

  block centerOnOrigin:Void
    x <- -w / 2.0
    y <- -h / 2.0

Here class Rectangle is defined across two categories so far: GeometricFigure (its superclass) and ChangeOrigin, a protocol whose two methods are declared elsewhere in the program and implemented here. We could add more protocol categories, or Private or general categories, but no further superclass or empty categories. A class has exactly one category that is either empty or named after the superclass, and that category must come first.
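The category rules can be summarised as a small validity check. This Python sketch is illustrative only. The function name and its inputs are hypothetical: categories is the list of category names in source order, with None for an empty category, and classes is the set of names known to be classes, so any other name is treated as a protocol or general category.

```python
from collections import Counter
from typing import List, Optional, Set


def category_problems(categories: List[Optional[str]], classes: Set[str]) -> List[str]:
    """Return rule violations for one class's categories (empty list = valid)."""
    problems = []
    # Exactly one category must be empty (None) or named after the superclass.
    base = [c for c in categories if c is None or c in classes]
    if len(base) != 1:
        problems.append("need exactly one empty or superclass category")
    # That category must be the first one.
    if not categories or not (categories[0] is None or categories[0] in classes):
        problems.append("first category must be the empty or superclass category")
    # Only Private may be repeated.
    counts = Counter(c for c in categories if c != "Private")
    for name, n in counts.items():
        if n > 1:
            problems.append(f"duplicate category {name!r}")
    return problems
```

For Rectangle, category_problems(["GeometricFigure", "ChangeOrigin"], {"GeometricFigure"}) reports no problems.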

Primitive Types

There are only four primitive data types, and they're all related:

bit
bits[unsigned-word]
byte
bytes[unsigned-word]

If we say:

data s:bit
data v:bits[7]

the two declarations together occupy exactly one byte of memory (1 + 7 bits). That is, bits and bytes are packed together, instead of each datum being padded to a byte boundary as in other programming languages.
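The packing of s:bit and v:bits[7] into a single byte amounts to the following shifts and masks. This Python sketch assumes that s occupies the most significant bit; the text does not specify the bit order, and pack and unpack are hypothetical helper names.

```python
def pack(s: int, v: int) -> int:
    """Pack a 1-bit field s and a 7-bit field v into one byte."""
    if s not in (0, 1) or not 0 <= v < 2**7:
        raise ValueError("field out of range")
    return (s << 7) | v          # assumption: s is the high bit


def unpack(byte: int):
    """Recover (s, v) from a packed byte."""
    return (byte >> 7) & 1, byte & 0x7F
```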

If we wish to pass blocks around, we may do so with the block type:

(argument-type, ...)->(return-type, ...)

This is the signature of a block, giving both its argument types and its return types. It may be used as a type wherever a block is passed around.
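Python's Callable type is a close analogue of a block signature. The sketch below writes (Real)->(Real) as Callable[[float], float]; a signature with several return types would correspond to a Callable returning a tuple. RealToReal, apply_twice, and halve are hypothetical names.

```python
from typing import Callable

# Analogue of the Lune block type (Real)->(Real)
RealToReal = Callable[[float], float]


def apply_twice(block: RealToReal, x: float) -> float:
    """Receive a block as a value and call it twice."""
    return block(block(x))


def halve(r: float) -> float:
    return r / 2.0
```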

Member Declarations

Errors are special kinds of enumerations, defined like this:

errors errors-name
  error1
  error2
  error3 error4
  ...

error3 and error4 are synonyms. Errors can only be used with the statements error() and iferror(), explained below. Here is an example of a list of errors:

errors FileError
  ReadAfterEOF
  DiskFull
  BadHandle
  FileAlreadyOpen

Constants can be declared in several ways. The simplest declares a single constant:

static equ constant <- expression

This works if the constant is of the same type as the class. If not, we may use:

static equ constant1:type1 <- expression1, constant2:type2 <- expression2, ...

The two forms may be combined in a single declaration. static is optional. With it, the constant belongs to the class; without it, the constant may differ from instance to instance. Enumerations look like this:

equ enum-name
  name1
  name2 name3
  name4 <- unsigned-word
  name5
  ...

The identifier enum-name is optional; if it is absent, care must be taken to avoid naming conflicts. name1 is 0 (zero). name2 and name3 are on the same line and are therefore synonyms (both equal to 1). name4 is initialized explicitly with a number, name5 is that number plus 1, and so on.
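The numbering rule can be expressed as a short Python function. Each source line is represented as a list of names plus an optional explicit value. This sketch assumes that an explicit value applies to every name on its line, a case the text does not cover. The function name enum_values is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def enum_values(lines: List[Tuple[List[str], Optional[int]]]) -> Dict[str, int]:
    """Assign values to enumeration names, one source line at a time."""
    values: Dict[str, int] = {}
    next_value = 0
    for names, explicit in lines:
        if explicit is not None:       # name <- unsigned-word
            next_value = explicit
        for name in names:             # names on one line are synonyms
            values[name] = next_value
        next_value += 1
    return values
```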

Data declarations are like this, for example:

static getter setter data data1:type1 <- expression1, data2:type2 <- expression2, ...

static, getter, and setter are optional. static says the data belongs to the class, not to its instances. getter allows the data to be read from outside of the class, like this:

x <- myObj.data

setter allows the data to be written to outside of its class:

myObj.data <- x
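For readers coming from Python, getter and setter correspond roughly to a read-only property and a read-write property. This is an analogy only. The Account class and its fields are hypothetical, and Python does not actually enforce privacy for the underscored field the way Lune does for data declared without getter or setter.

```python
class Account:
    def __init__(self) -> None:
        self._owner = "nobody"   # like: getter data owner
        self._balance = 0        # like: getter setter data balance
        self._pin = 1234         # like: data pin (no getter, no setter)

    @property
    def owner(self) -> str:      # readable from outside
        return self._owner

    @property
    def balance(self) -> int:    # readable from outside
        return self._balance

    @balance.setter
    def balance(self, value: int) -> None:   # writable from outside
        self._balance = value


a = Account()
a.balance = 10                   # like: myObj.data <- x
```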

getter and setter are similar to the public/private mechanisms of other languages. The initializer expressions expression1, expression2, ... are also optional. A type is written like this:

*type-name(generics)[array-size]

A leading * makes the data a pointer, and repeating it gives pointers to pointers, and so on. Generics were introduced above, in the Map example. Arrays, if present, may be multidimensional, like this:

data myArray:UWord[10][20][30]

A literal is declared like this:

literal regular-expression
  statement
  ...

Here is an excerpt of an example:

struct Bool:Object
  data value:bit

  literal false
    value <- 0

  literal true
    value <- 1
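One way to read a literal clause is as a regular expression paired with the code that initializes the new object. The Python sketch below assumes that the whole token must match and that the first matching clause wins. Neither rule is stated in the text. literal_parser is a hypothetical helper, and the hexadecimal and decimal UWord patterns are invented for illustration; only the Bool literals come from the example.

```python
import re
from typing import Callable, List, Tuple

Clause = Tuple[str, Callable[..., int]]   # (regular expression, initializer)


def literal_parser(clauses: List[Clause]) -> Callable[[str], int]:
    """Build a parser that tries each literal clause in order."""
    compiled = [(re.compile(pattern), init) for pattern, init in clauses]

    def parse(text: str) -> int:
        for rx, init in compiled:
            m = rx.fullmatch(text)      # the whole token must match
            if m:
                return init(m)
        raise ValueError(f"no literal matches {text!r}")

    return parse


# Bool: the two literals from the example set value to 0 or 1
parse_bool = literal_parser([(r"false", lambda m: 0), (r"true", lambda m: 1)])

# Hypothetical UWord literals, to show patterns that capture parts of the text
parse_uword = literal_parser([
    (r"0x([0-9A-Fa-f]+)", lambda m: int(m.group(1), 16)),
    (r"[0-9]+", lambda m: int(m.group(0))),
])
```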

Operators come in several flavours:

prefixop symbol:return-type
  statement
  ...

suffixop symbol:return-type
  statement
  ...

linfixop symbol(argument:argument-type):return-type
  statement
  ...

rinfixop symbol(argument:argument-type):return-type
  statement
  ...

There are prefix operators, like ¬myVar; suffix operators, like myVar--; and infix operators (left- and right-associative), like myVar1 < myVar2. symbol is any Unicode mathematical symbol, or a combination of such symbols written without whitespace, with a few exceptions. Operators may be overloaded. To resolve ambiguities, the compiler should prefer the longest match first, then the order of imports, then the order of implementation. The order of implementation also gives the operator precedence, from highest to lowest.
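The longest-match rule and the "order of implementation is precedence" rule can be sketched together in Python: a tokenizer that tries longer symbols first, and a precedence-climbing parser for infix operators. This is a sketch under stated assumptions. The operator table is hypothetical, linfixop is taken to mean left-associative and rinfixop right-associative, and only integers and infix operators are handled.

```python
from typing import List, Tuple, Union

# Hypothetical operator table in order of implementation: earlier = tighter.
# "l" stands for linfixop (left-assoc.), "r" for rinfixop (right-assoc.).
OPS: List[Tuple[str, str]] = [("↑", "r"), ("×", "l"), ("+", "l"), ("<", "l")]
PREC = {sym: len(OPS) - i for i, (sym, _) in enumerate(OPS)}  # larger binds tighter
ASSOC = dict(OPS)


def tokenize(src: str, symbols: List[str]) -> List[str]:
    """Split src into integers and operator symbols, longest symbol first."""
    by_length = sorted(symbols, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
        elif src[i].isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            for sym in by_length:
                if src.startswith(sym, i):
                    tokens.append(sym)
                    i += len(sym)
                    break
            else:
                raise SyntaxError(f"unknown symbol at position {i}")
    return tokens


Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def parse(tokens: List[str]) -> Tree:
    """Precedence climbing over the infix operators in OPS."""
    pos = 0

    def expr(min_prec: int) -> Tree:
        nonlocal pos
        left: Tree = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and PREC[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left-associative: the right operand must bind strictly tighter.
            right = expr(PREC[op] + 1 if ASSOC[op] == "l" else PREC[op])
            left = (op, left, right)
        return left

    return expr(0)
```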

Array accesses are simply blocks with two special names:

block at(index:index-type):return-type
  statement
  ...

block at(index:index-type) put(value:value-type):return-type
  statement
  ...
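In Python terms, the two special blocks resemble __getitem__ and __setitem__. The sketch below routes subscript syntax to methods named after the Lune blocks. It assumes that data[index] reads through at() and that assignment to it writes through at()put(). The Vector class and the at_put name are hypothetical.

```python
class Vector:
    """Hypothetical container whose subscripts go through at / at_put."""

    def __init__(self, size: int) -> None:
        self._items = [0.0] * size

    def at(self, index: int) -> float:                   # block at(index:...)
        return self._items[index]

    def at_put(self, index: int, value: float) -> None:  # block at(index:...) put(value:...)
        self._items[index] = value

    def __getitem__(self, index: int) -> float:          # v[i] reads through at
        return self.at(index)

    def __setitem__(self, index: int, value: float) -> None:  # v[i] = x writes through at_put
        self.at_put(index, value)


v = Vector(3)
v[1] = 2.5
```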

We have already seen examples of blocks. They are named pieces of code, declared like this:

static block name1(argument1:argument1-type) name2+(argument2:argument2-type) ...:return-type, ...
  statement
  ...

Note the similarity to Objective-C's multi-part method names. Blocks may be overloaded, may or may not be static, and may return more than one value. The + after name2 means that one or more occurrences of name2(...) may appear in a call to this block, like this:

block myBlock(argX:Real) and+(argY:Real):Void
  ...

myObj.myBlock(x) and(y1) and(y2) and(y3)

The name of this block is myBlock()and+(). It cannot coexist in a class with myBlock()and() or with myBlock+()and(). At most one + may appear in a block declaration, and it may be attached to any part of the name, not just the last. To access the parameters in this example, we may use:

args.count
args[unsigned-word]
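Matching a call against a name with one repeatable part can be sketched as follows. The function returns how many times the + part occurred, which is what args.count would report for the repeated arguments. This Python model is illustrative, and match_call is a hypothetical name. Names are given as lists of parts, with the repeatable part written with a trailing +, as in ["myBlock", "and+"].

```python
from typing import List, Optional


def match_call(decl: List[str], call: List[str]) -> Optional[int]:
    """Return the number of repetitions of the '+' part (0 if decl has none),
    or None if the call's name parts do not match the declaration."""
    plus = [i for i, part in enumerate(decl) if part.endswith("+")]
    if len(plus) > 1:
        raise ValueError("at most one + per block declaration")
    if not plus:
        return 0 if decl == call else None
    k = plus[0]
    prefix, repeated, suffix = decl[:k], decl[k][:-1], decl[k + 1:]
    if len(call) < len(prefix) + 1 + len(suffix):
        return None                       # the + part needs at least one occurrence
    if call[:len(prefix)] != prefix or call[len(call) - len(suffix):] != suffix:
        return None
    middle = call[len(prefix):len(call) - len(suffix)]
    return len(middle) if all(p == repeated for p in middle) else None
```

For myObj.myBlock(x) and(y1) and(y2) and(y3), match_call(["myBlock", "and+"], ["myBlock", "and", "and", "and"]) gives 3.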

An initer is code that runs when the class is loaded, to initialize static data or to perform other one-time work. Initers are usually the last blocks implemented in a class.

initer
  statement
  ...

Statements

The simplest statement is

nop

which means no operation. Next come data declarations, of which there are two kinds inside blocks. The first is like the data declaration we saw above, but without the getter and setter modifiers; it may have a static modifier. The second kind of data declaration statement looks like this:

|argument:type, ...|

This kind of statement is used like this:

myNumber.repeat
  |myWord:UWord|
  print(myWord)
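The repeat call above behaves much like a higher-order function that calls a Block once per count. In this Python sketch the Block becomes a callable that receives the current value (the role of |myWord:UWord|). Appending to a list stands in for print(myWord) so that the result can be checked. repeat is modelled here as a free function, whereas in Lune it is a block of UWord.

```python
from typing import Callable, List


def repeat(n: int, block: Callable[[int], None]) -> None:
    """Run block n times, passing 0, 1, ..., n-1 as its single parameter."""
    for i in range(n):
        block(i)


printed: List[int] = []
repeat(3, lambda my_word: printed.append(my_word))  # stands in for print(myWord)
```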

repeat is a block declared in class UWord. It expects a Block as an argument and passes that Block a single parameter, which the |...| statement receives. In this example, repeat executes the Block myNumber times, and on each execution myWord holds the current value, from 0 (zero) up to myNumber-1. Data declarations must appear before all other statements in a block: first the |...| statement, if any, then ordinary data statements. Next come assignments (we have already seen a few):

data1 data2 ... <- expression

And the associated returns:

return(expression)
return(expression1) and+(expression2)

For example, the code:

block myFuncX(x:Real) andY(y:Real):Real, Real
  return(x / y) and(x \ y)

quo rem <- myFuncX(10.0) andY(5.0)

places the quotient of dividing 10.0 by 5.0 in quo and the remainder of that division in rem. If a block returns no values, a return may come all by itself:

return
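The two uses of <- seen so far, w h <- s copying one value into both targets and quo rem <- myFuncX(...) distributing two returned values, can be modelled with Python tuples. This sketch rests on our own reading of the rule, which the text does not state explicitly: a single value is copied to every target, and a multiple return must supply exactly one value per target. math.fmod merely stands in for Lune's \ operator on Real. assign and my_func_x are hypothetical names.

```python
import math
from typing import Tuple


def my_func_x(x: float, y: float) -> Tuple[float, float]:
    """Counterpart of return(x / y) and(x \\ y)."""
    return x / y, math.fmod(x, y)


def assign(n_targets: int, value):
    """Values for 'data1 data2 ... <- expression' with n_targets targets."""
    if isinstance(value, tuple):          # several returned values: distribute
        if len(value) != n_targets:
            raise ValueError("number of values does not match number of targets")
        return value
    return (value,) * n_targets           # single value: copy to every target


quo, rem = assign(2, my_func_x(10.0, 5.0))   # quo rem <- myFuncX(10.0) andY(5.0)
w, h = assign(2, 3.0)                        # w h <- s, with s = 3.0
```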

A statement may also be an expression (see below). Next come the labels and the gotos:

  statement1
  ...
@label
  statement2
  ...

Labels are identifiers preceded by the character @. A label is indented at the same level as the (sub-)block in which it appears, that is, one level to the left of the statements around it.

goto(label)

The switch statement of C is also present:

case(expression)
  if(expression1a) or+(expression1b) do
    statement1
    ...
  ...
  ifnone
    statement2
    ...

If expression matches expression1a or expression1b, statement1 is executed. If not, the test is repeated with the next if. If nothing matches, statement2 is executed. To fall through to the next case, append

continue

after the case's last statement. To exit from a case, or other control structure, write

break
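The control flow of case, including fall-through with continue, can be modelled in Python as follows. Each if-branch is represented as a set of values to match, a body, and a flag saying whether it ends with continue. This sketch makes two assumptions: "matches" means equality, and a fall-through from the last branch runs the ifnone part. run_case is a hypothetical name.

```python
from typing import Callable, List, Optional, Set, Tuple

Branch = Tuple[Set[object], Callable[[], None], bool]  # (values, body, ends with continue)


def run_case(value: object, branches: List[Branch],
             ifnone: Optional[Callable[[], None]] = None) -> None:
    """Run the first matching branch, following any fall-throughs."""
    start = next((i for i, (values, _, _) in enumerate(branches) if value in values), None)
    if start is None:                     # no match: run ifnone
        if ifnone is not None:
            ifnone()
        return
    for _, body, falls_through in branches[start:]:
        body()
        if not falls_through:             # no continue: leave the case
            return
    if ifnone is not None:                # the last branch fell through
        ifnone()
```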

Now, error handling. To throw an error, say:

error(error-type.error-name)

To catch an error, say:

iferror(error-type.error-name) or+(error-type) do
  statement1
  ...
...
purge
  statement2
  ...
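Catching one error versus a whole family of errors is similar to catching an exception subclass versus its base class in Python. The sketch below mirrors the FileError list from earlier as an exception hierarchy. It is an analogy only and does not model purge, whose behaviour the text does not describe. classify is a hypothetical name.

```python
class FileError(Exception):
    """The FileError family."""

class ReadAfterEOF(FileError): pass
class DiskFull(FileError): pass
class BadHandle(FileError): pass
class FileAlreadyOpen(FileError): pass


def classify(exc: Exception) -> str:
    try:
        raise exc                         # like error(FileError.ReadAfterEOF)
    except ReadAfterEOF:                  # like iferror(FileError.ReadAfterEOF)
        return "single error"
    except FileError:                     # like iferror(FileError): the whole family
        return "family"
```

Errors outside the family, such as a ValueError, are not caught and propagate to the caller.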

These are usually the last statements in a block. Note that iferror can catch a single error (error-type.error-name) or a whole family of errors (error-type). Finally, there is the assembler statement:

statement1
...
asm(cpu)
  asm-statement1
  ...
@asm-label
  asm-statement2
  ...
statement2
...

Notice how the @asm-label is indented at the same level as the asm() statement.

Expressions

Any expression may be enclosed in parentheses. An expression may be data, optionally preceded by an object (the same applies to all the forms that follow):

object.data

It may be a constant:

object.equ
enum-name.equ

Or a block call:

object.block1(expression1) block2(expression2) ...

If one of the block's arguments is a Block, the call is written like this:

object.block1
  statement
  ...  
block2(expression2) ...

That is, the statements of the Block argument are indented below the part of the name that takes them. For example:

myBool.ifTrue
  statement-if-true
  ...  
ifFalse
  statement-if-false
  ...

It may be a literal:

object.literal

It may be a struct or a union field:

struct.data
union.data

It may be the contents pointed to by a variable (we may have pointers to pointers to pointers...):

*data

It may be a meta access, giving the size, name, address, or type of the data:

data@size
data@name
data@addr
data@type

Or an array access:

data[unsigned-word]

It may involve an operator:

prefix-op expression
expression suffix-op
expression infix-op expression

It may be a cast:

(expression, type)

It may be a primitive call:

pmtv(operation)
pmtv(operation) arg+(expression)

It may be the ternary operator of other languages:

boolean ? expression-if-true : expression-if-false

It may be:

self
super

Or it may be the following structure used to initialize arrays, dictionaries or maps, or other complex structures:

[expression1a | expression1b | ..., expression2a | expression2b | ..., ...]

Finally, it may be the array:

args

used when a block is declared to take a variable number of arguments (with +). We may say:

args.count
args[unsigned-word]

Preprocessor Directives

There is only one directive, for conditional compilation:

#if condition
statement1
...
#elsif condition
statement2
...
#else
statement3
...
#fi

condition may be a compound expression built from the logical operators ¬, ∧, and ∨, parentheses, and symbols defined when the compiler is invoked.
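The directives can be sketched in Python as a condition evaluator plus a line filter. This is an illustrative model, not the Lune compiler. It makes three assumptions the text does not settle: ¬ binds tighter than ∧, which binds tighter than ∨; a symbol is true exactly when it is in the set passed to the compiler; and #if blocks may nest. The functions tokenize, evaluate, and preprocess are hypothetical names.

```python
import re
from typing import List, Set, Tuple

TOKEN = re.compile(r"\s*(¬|∧|∨|\(|\)|[^\W\d]\w*)")


def tokenize(cond: str) -> List[str]:
    cond = cond.strip()
    tokens, pos = [], 0
    while pos < len(cond):
        m = TOKEN.match(cond, pos)
        if m is None:
            raise SyntaxError(f"bad condition near {cond[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(cond: str, defined: Set[str]) -> bool:
    """Evaluate a condition; a symbol is true if it was defined."""
    tokens, pos = tokenize(cond), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of condition")
        pos += 1
        return tokens[pos - 1]

    def disjunction() -> bool:            # a ∨ b (lowest precedence)
        value = conjunction()
        while peek() == "∨":
            take()
            rhs = conjunction()           # always parse; never short-circuit
            value = value or rhs
        return value

    def conjunction() -> bool:            # a ∧ b
        value = negation()
        while peek() == "∧":
            take()
            rhs = negation()
            value = value and rhs
        return value

    def negation() -> bool:               # ¬a, (a), or a symbol
        tok = take()
        if tok == "¬":
            return not negation()
        if tok == "(":
            value = disjunction()
            if take() != ")":
                raise SyntaxError("missing )")
            return value
        if tok in ("∧", "∨", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok in defined

    result = disjunction()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens in condition")
    return result


def preprocess(lines: List[str], defined: Set[str]) -> List[str]:
    """Keep only the lines selected by #if / #elsif / #else / #fi."""
    out: List[str] = []
    stack: List[Tuple[bool, bool]] = []   # (enclosing block active, branch already taken)
    active = True
    for line in lines:
        s = line.strip()
        if s.startswith("#if "):
            cond = active and evaluate(s[len("#if "):], defined)
            stack.append((active, cond))
            active = cond
        elif s.startswith("#elsif "):
            parent, taken = stack[-1]
            cond = parent and not taken and evaluate(s[len("#elsif "):], defined)
            stack[-1] = (parent, taken or cond)
            active = cond
        elif s == "#else":
            parent, taken = stack[-1]
            active = parent and not taken
            stack[-1] = (parent, True)
        elif s == "#fi":
            active, _ = stack.pop()
        elif active:
            out.append(line)
    return out
```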