Cation Language Design

We design Cation as a pure categorical programming language for practical tasks.

In real-world computing, unlike mathematics, everything is bounded: resources, computational time; there are no infinities, no arbitrary-precision rational or irrational numbers (but just their partial estimates). Most of existing programming languages ignored this simple fact, resulting in hundreds of pitfalls and footguns for the developers. Cation tries to avoid this, and ensures that everything is bound and measurable at compile-time. It does this using instruments of category theory, and as a result each Cation program can pass a termination analysis and formally prove maximum amount of resources it takes to execute.

Language features

Cation is made for practical tasks: we attempt to make it easy-to-read and write, avoiding as much of boilerplate code as possible. There are just five built-in keywords; monadic programming is effortless: one does not need to declare set of possible error types ahead. Cation has indentation-based syntax, which makes it much more visually readable and sets it aside from curly-braced languages like C/C++, Rust, Scala... and makes it similar to Idris, Haskell, Python.

It has the following distinguishing features, which taken together set it aside from other existing languages:

Termination analysis and eager evaluation, implying compile-time-defined bounds for data types
Generalized algebraic data types and dependent types
Support of metaprogramming with @-attributes operating like macros
Parallel computation by default ("multi-way computing")
Doesn't require a runtime, doesn't use a garbage collector
Data types, functions and variable instances have no difference: everything is an expression
Each expression corresponds to a natural transformation
Program compilation is an evaluation of expressions
Designed for the creation of domain-specific languages

Mutability and racing conditions

There are no variables, just values.

Exception handling

Memory addressing

Program structure

On the high level, the Cation program consists of (optionally annotated) statements, which can be either specifiers or expressions.

Specifiers are evaluated by the compiler to sets of expressions, serving the primary purpose of reducing boilerplate code and simplifying human readability and writability of the language.

Specifiers

Specifiers consist of a declaration, optionally followed by a definition, which may contain type and body information. Definition bodies, when present, are made of a sequence of statements.

Declaration starts with one of keywords, either build-in (data, fx, infx, let, alias) or added as language extensions by domain-specific languages on top of Cation (like Contractum). The declaration keyword is followed by a declaration name, which can be used later as a part of expressions throughout Cation code.

The overall structure of a specifier is the following: <KEYWORD> <NAME> <TYPE>? <BODY>?. Type, when present, is separated from the name with a semicolon operator; body, when present, is separated from the previous parts either with := operator, when a body is given on the same line, or a newline character, with all body statements indented.

Some examples of specifiers are:

data type specifier:
```
data Coord2D: x U64, y U64
```
where data Coord2D is a data type declaration, separated with colon from the type information, which is x U64, y U64. Data type specifiers contain no body.
value specifier:
```
val value: U8 := 5
```
where val value is a value declaration, separated with colon from the type information, which is U8. Next, the body contains the value assigned to the named value instance. Value specifications may have data type omitted in situations when it can be inferred; for instance, we can also write the specifier above as val value := 5#U8, where #U8 is a type annotation.

function specifier (infix variant):

infx `+`: a Coord2D, b Coord2D -> Coord2D
  val x := a.x + b.x
  val y := a.y + b.y
  x, y

which can be also written as a one-liner

infx `+`: a Coord2D, b Coord2D -> Coord2D := a.x + b.x, a.y + b.y

Expressions

Everything which is not a specifier is an expression. Expressions can be seen as mathematical statements, or a natural transformations.

In Cation, data types, values and functions are all first-class citizens, such that all of them may participate expressions.

Expressions are made of named entities, composed together according to the compatibility rules using functions. Expressions are separated either with a line feed character (U+000A), or with a semicolon ; if put on one line one after the other.

Language syntax

Fundamental operators

Built-in operators are called fundamental operators. They are used for basic expression composability and constitute a backbone of the expression syntax.

There are three sets of fundamental operators: categorical limits, projection and injection operators, iterating and composition operators. These operators are used to structure execution flow: branching, looping, working with collections etc.

Categorical limits

There are two fundamental operators defining categorical limits: product operator , (comma) and sum operator | (pipe). These operators are used to compose most of the code; for instance, arguments of a data type definition is just an expression composing other data types as a categorical product; for instance, data Coord2D: x U32, y U32 makes 2D coordinate as a product of two 32-bit integers. Likewise, a co-product (enum) data types can be defined as data Maybe: none | some(X).

Projection operators

Projection operators (space) and . (dot) allow to call a function providing it with arguments. The use of these two forms depends on the function type and a number of its arguments. There are two types of functions, which affect the operator: prefix functions (declared using fx keyword) and infix functions (declared with infx keyword). For prefix operators one calls a function (i.e. does projection) by putting function argument after the function name, separating them with a space: someFn argument. In fact, any prefix function can be seen as a function with just a single argument, made of anonymous data type, composing all arguments together. Thus, a function of two arguments can be used in the following manner:

fx twoArgsFn: x U32, y U32 -> U32

val coord := x U32, y U32
twoArgsFn coord

As one may see, a data type can be seen as a function constructing an instance of that type, which emphasises that in Cation there is no difference between data types and functions.

The call of the prefix function can be reversed using a dot projection operator .: coord.twoArgsFn.

Infix functions are put after their first argument, which allows more natural way of using mathematical operators and alike. For instance, above we have declared infx `+` operator which takes two arguments and adds them together: a + b. One may also reverse the order of the arguments passed to the infix operator by putting the call expression into parentheses: (+ a, b) or even (+) a, b.

Injection operators

Injection operators >|, |?, |: represent a way how to de-compose a categorical co-product into one of its source components and pattern-match against them. In other words, injection operators are the way to branch the code execution; they are the way how if-else statements and pattern matching is handled in the language.

There are several ways of doing branching constructions using injection operators. First, one can use value >| in combination with pattern => code expressions to match the value against a set of patterns and handle a default variant with _ => code:

data WorldDirection: north | south | west | east

val sample: WorldDirection

sample >|                   – match sample {
  north => doSomething1     –   north => {} 
  south => doSomething2     –   south => {}
      _ => doSomething3     –   _ => {}
                            – }

Next, one can use a boolean expression followed with |? to express if semantics, and use |: for handling else condition. Let's assume we have an expression test which results in a boolean value, and two expressions, ifTrue and ifFalse which we'd like to execute basing on the test value. To do so we simply write test |? ifTrue |: ifElse.

sample =?= north             – if sample == north
|? doSomething1
|: doSomething2             – else

sample =?= north             – if sample == north
|? doSomething1
|: sample =? south          – else if sample == south
|? doSomething2             
|: doSomething3             – else

Composition operators

Operators ( ), [ ] and { } are called composition operators; they are used to create types via categorical composition of existing types, i.e. act as natural transformations defining new functors (as type constructors) from existing ones.

Round parenthesis can be used for grouping expressions, such that the boundaries of composition operation are explicitly provided. When used with no values, they also can be used to represent initial object (unit) (,), as well as terminal object (impossible type) (|).

Square brackets are used for creating ordered non-unique collection types. Collection type is a type composed dynamically (via production operation) of some elementary data types – which in other languages is called a list or an array. Finally, curly braces are used for creating ordered unique collection types – sets and maps.

In the type definition each of the collection types is required to have a specification of the minimum and maximum allowed number of elements – a mechanism by which compiler can perform a termination analysis and enforce boundaries and compile-time. This is done with power operator ^ and range operators ..= and ..<. Power operator denotes that the data type of collection element is potentiated, and the range operator specifies limits of how many times the self-product of the element type may happen. For instance, [ U8 ^ ..<256 ] is an array which may contain from zero up to 255 elements of unsigned 8-bit integer type.

Iterating operators

Iterating operators are used not just for data type specifications, but also for enabling code repetitions (cycles and loops), iterators and generators. For this purpose iterating operators <| and |> are used. The semantic meaning of the expression becomes take each element of a collection and bring it as a value to the other expression. The collection for <| should be put right-side and expression processing its values left-side, while for |> the order is reversed. Thus, 0..<100#U8 |> print will iterate over range of unsigned 8-bit integers from zero to 100 and print each of them. When the iterating operator is combined with a composition operator, this means put the resulting values from the expression into a new collection: for instance [0..<100#U8 |> pow 2] would collect all squared numbers from 0 to 99 into a new list.

When working with loops using iterating operators one may need to access the current value from the iterator in explicit form, as well as to use a results from previous rounds of the iterator. This can be done with _ and $ operators: _ is the context-defined value operator holding the current input for the iteration, while $ is the result operator giving access to the output values of the previous iterations. For instance, to access the previous result one needs to write $-1, and to access the value result of the very first iteration, $0 ($1 will give result of the second iteration etc.). Result operator $ can be also used to assign an output of the iteration or a function by writing $ <- value.

Tags and annotations

Operator # adds context tags to the values; for instance it is used to add integer tag to co-product type variants (like in data Bool: false#0 | true#1) or to enforce some literal to be of a specific data type (like in ..100#U8, where the range is enforced to be over U8 type).

Operator @ is used to annotate Cation statements. Annotations are a way of metaprogramming: they are similar to procedural macros in Rust or annotations in Java.

Data types

Data types are made with GADT and category theory:

primitive data types all are countable and finite sets;
compositional data types are products (using , operator) or coproducts (| operator) of primitive types (sets).

Data type names are capitalized; their definition starts with a data keyword, followed by the type name and compositional semantic after a colon, which can be seen as a constructor definition or as a product/coproduct functor over categories creating new given data type as a new category. For instance, data Coord : x U32, y U32 defines category $Coord$ (2D coordinate) as a self-product of categories $U32$ (all 32-bit natural numbers) $Coord = U32 \times U32$ having two corresponding projective morphisms $x: Coord \mapsto U32$ and $y: Coord \mapsto U32$

Cation comes with a very small set of built-in data types; the most of the types used in everyday programming are defined in the standard library.

Built-in types

There are four main classes of built-in types:

unit (,) and never type (|), which we have covered above;
integer numbers: signed, unsigned and non-zero unsigned
(type classes int, unsigned, signed and nonzero);
floating-point numbers (type class float);
Unicode character Char type;
range types.

Integers and floats offer many times with different bit length; float types also offer different memory encodings. Unicode character has variable bit length.

Supported bit length for integer types are:

Bits	Bytes	Unsigned	Signed	Non-zero	C equivalents	Rust equivalents *
8 bits	1 byte	`U8`	`I8`	`N8`	`(unsigned)` `char`	`u8`, `i8`, `NonZeroU8`
16 bits	2 bytes	`U16`	`I16`	`N16`	`(unsigned)` `short`	`u16`, `i16`, `NonZeroU16`
24 bits	3 bytes	`U24`	`I24`	`N24`	n/a	n/a
32 bits	4 bytes	`U32`	`I32`	`N32`	`(unsigned)` `long`	`u32`, `i32`, `NonZeroU32`
40 bits	5 bytes	`U40`	`I40`	`N40`	n/a	n/a
48 bits	6 bytes	`U48`	`I48`	`N48`	n/a	n/a
56 bits	7 bytes	`U56`	`I56`	`N56`	n/a	n/a
64 bits	8 bytes	`U64`	`I64`	`N64`	`(unsigned)` `long long`	`u64`, `i64`, `NonZeroU64`
80 bits	10 bytes	`U80`	`I80`	`N80`	n/a	n/a
96 bits	12 bytes	`U96`	`I96`	`N96`	n/a	n/a
112 bits	14 bytes	`U112`	`I112`	`N112`	n/a	n/a
128 bits	16 bytes	`U128`	`I128`	`N128`	n/a	`u128`, `i128`, `NonZeroU128`
256 bits	32 bytes	`U256`	`I256`	`N256`	n/a	n/a
512 bits	64 bytes	`U512`	`I512`	`N512`	n/a	n/a
1024 bits	128 bytes	`U1024`	`I1024`	`N1024`	n/a	n/a

All integer types in Cation are co-product types, made with all their allowed values. This makes it possible to use injection operators to match them against patterns and ranges.

Supported bit length and encodings for floating-point types are:

Type name	Bytes	Encoding	Underlynig Rust type
`F16B`	2	bfloat16	`bfloat::bf16`
`F16`	2	IEEE Half	`apfloat::ieee::Half`
`F32`	4	IEEE Single	`apfloat::ieee::Single`
`F64`	8	IEEE Double	`apfloat::ieee::Double`
`F80`	10	IEEE X87 Extended	`apfloat::ieee::X87DoubleExtended`
`F128`	16	IEEE Quad	`apfloat::ieee::Quad`
`F256`	32	IEEE Oct	`apfloat::ieee::Oct`

Range types simplify creation of collection types, described below, as well as are an efficient tool in building cycles and iterators. Cation comes with the following set of range types, each of which can be instantiated using a shorthand range operator.

Type name	Range operator
`RangeAll`	`..`
`RangeTo`	`..<N`
`RangeToIncl`	`..=N`
`RangeFrom`	`M..`
`RangeFromTo`	`M..<N`
`RangeFromToIncl`	`M..=N`

Data type composition

New data types in Cation are constructed via composition of built-in types using categorical product and co-product operators , and | described above. Here are the requirements for the syntax of these operators:

Name	Syntax form	Max no of fields or variants
Product types (structure, tuple)	`• , •` or `(• , •)`	256
Sum types (union, enum)	`•` \| `•` or `(•` \| `•)`	256

Iterable collections

One may use composition operators to build arrays and dynamic collections, which may be iterated over using iterating operations. Cation provides built-in support for the following collections:

Name	Syntax form	Max no of elements / fields / variants
Fixed array	`[•^N]`	`N`, up to 2^16
Dynamic list	`[• ^D..<U]`	From `D` to `U`, up to 2^64
Dynamic set	`{• ^D..<U}`	From `D` to `U`, up to 2^64
Dynamic map	`{• -> ^D..U •}`	From `D` to `U`, up to 2^64

Range constriction ^D..=U used in type expression specifying minimum and maximum size of a dynamic collection is called confinement bounds. It can be seen as an upper and lower indexes on the possible number of elements, i.e. type definition [Byte ^ 1..20] can be read as $\bigotimes^{20}_1 byte$ and means product type with dynamic number of fields, from 1 to 20 max, where each field is a byte – or, in more common terms, a byte array of dynamic size which can't have less than one byte – and can't grow larger than 20 bytes.

For simplifying syntax strict encoding provides comprehensions and defaults for specifying the confinement bounds:

Comprehension	Expands to	Comment
`[•]`	`[• ^ 0..=0xFFFF]`	Default number of elements in confined collections is from zero to 2^16
`[•+]` or `[•^1..]`	`[• ^ 1..=0xFFFF]`	Collection which must contain at least one item
`[•^N]`	`[• ^ N..=0xFFFF]`	Collection with minimum of `N` items
`[•^..=N]`	`[• ^ 0..=N]`	Collection having maximum of `N` items

Standard library types

Standard library offers many types composed of the built-ins, including (but not limiting to):

Restricted-bit-length integers, like U1, U2, U3, U4, U5, U6, U7;
ASCII char subsets AsciiChar, AsciiPrintable and others;
Strings: Unicode strings String, ASCII string AsciiString;
Bytes, fixed and dynamic byte arrays: Byte, Bytes16, Bytes32, Blob256, Blob64kB, Blob16MB Blob4GB etc;
Commonly-used monads Maybe, Either, Result, Ternary.

Lambda expressions

Lambda expressions allow a combination of statements, capturing locally-defined values, to be treated as a function. They simplify passing functions as arguments to other functions, saving from boilerplate code.

Lambda expressions have several forms. The canonical one is a specifier form, which starts with a lambda keyword, followed by a value name, colon, argument and return type definition and body:

val local: U8 = random
lambda sq: x U8 -> U32
    pow 2 + local

As any other specifier it can be put into a single line:

val local: U8 = random
lambda sq: x U8 -> U32 := pow 2 + local

The second form is an operator form using .\ or Greek λ; it allows anonymous lambda definitions which may span a single or multiple lines. The specific syntax of this form depends on whether the lambda is the last expression in the line:

-- for the last expression in the line
.\pow 2
-- or even shorter
.\^2

If a lambda expression has to be followed by other expressions, one need to put the expression into a parenthesis:

.\(pow 2), _

Lambda expressions may have own explicit arguments, separated from the function body using double-column. The return type may be omitted, and in this case if is inferred by the compiler from the body of the expression:

.\x U8 -> U16: x pow 2

-- With a type omission:
.\x U8: x pow 2

-- If the expression is not at the end of a line
.\(x U8: x pow 2), _

Finally, there is the multiline forms of lambda expressions:

lambda lambdaFn
    statement1
    statement2

In can be used the same way as single-line expressions, with the only difference – there is no colon between the lambda declaration and the body:

.\ x U8
   x pow 2

.\ x U8 -> U16
   x pow 2

-- Here we can use multiline in the middle of other expression
someFn arg1, .\(x U8 -> U16
   x pow 2
), argLast

The type of the lambda expression is .\ input -> output, and this type may be used in type annotations:

val squared: .\U8 -> U16 := .\pow 2

Generics

Cation supports parametric polymorphism via type classes, generics and confinement constraints.

Type classes

Type classes are the same as classes in Haskell or traits in Rust: they provide a simple way to introduce generics into programming, without the need for complex inheritance rules.

Type classes are declared with class keyword followed by a lowercase name; here is an example how ord and eq classes are defined in the standard library:

class ord
  infx `>?`: Self, Self -> Bool
  
  @final
  infx `<?`: a Self, b Self -> Bool
    ~ (a >? b)

class eq: ord
  infx `=?=`: Self, Self -> Bool

  @final
  infx `=/=`: a Self, b Self -> Bool
    ~ a =? b
  
  @final
  infx `>?=`: a Self, b Self -> Bool
    a =?= b |? true |: a >? b
  
  @final
  infx `<?=`: a Self, b Self -> Bool
    a =?= b |? true |: a <? b

With declarations, all types which has infix >? operator taking two arguments of the same type and returning bool will become ord and acquire the opposite infix operator <?.

Generic arguments

Functions and data types can be equipped with generic arguments between colon separating function name from its input arguments, and => character:

fx len: T any, N unsigned => collection [T ^ $N..$N] -> N

Each generic argument consists of generic parameter name – usually capitalized – and a constraint expressed as a type class name to which the type must belong. Multiple generic parameters should be separated by a comma.

Confinement constraints

When collections are used inside functions sometimes it is important to make sure that the concrete collection types provided to the function in different input or output arguments match their dimensions. This is achieved with a special form of generics named confinement constrains:

fx append: N unsigned => 
  list [String ^ ..$N], item String -> [String ^ ..$N+1]

Computational parallelism

Cation compiler will be designed to reason about the code logic at compile-time, which is possible due to constraints and bounds on all resources used by the language. During compilation, it deduces the graph of all relations between values used in the program, and uses that graph to automatically parallelize all computations which do not access the same values at the same time. Thus, most of the parallelism will come for free, with no effort from the programmer.

When an explicit parallelism is needed, Cation offers channel-based parallelism, made with rho-calculus. Thus, instead of a need to differentiate async and non-async functions one uses non-blocking channels to send data by value. Since there is no difference between data, types, functions and everything is just a natural transformations, channels can send also functions between threads. This significantly simplifies programming patterns and reduces the number of possible pitfalls,

Language comparison

The main differences between Cation and other modern programming languages with strong functional properties are given in the table:

Language	Absent in the language (but present in Cation)	Features used in Cation
Haskell	Termination analysis, parallel computing, GADT, "infinite types", absence of runtime	Monads
Idris v2	Parallel computing, absence of runtime	GADT, termination analysis, monads, eager evaluation
Scala v2	Termination analysis, absence of runtime, types are not first-class citizens	GADT, eager evaluation, monads
Rust	Termination analysis, parallel computing, GADT, types are not first-class citizens	No runtime