Cation Language Reference

Lexical structure

The lexical structure of Cation describes which sequences of characters form valid tokens of the language. These valid tokens form the lowest-level building blocks of the language and are used to describe the rest of the language in subsequent chapters. A token is an identifier, keyword, punctuation, literal, or operator.

In most cases, tokens are generated from the characters of a Cation source file by considering the longest possible substring from the input text, within the constraints of the grammar that are specified below. This behavior is referred to as longest match or maximal munch.
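The longest-match rule can be illustrated with a small sketch. It is written in Python rather than Cation, and the operator list is only an illustrative subset of the reserved tokens; the function name munch_operators is hypothetical and not part of any Cation tooling.

```python
# Illustrative subset of operator tokens (an assumption, not the full grammar).
OPERATORS = ["|>", "|?", "|:", "<|", "<-", "->", "=>", ">|", "|", "<", ">", "-", "="]


def munch_operators(text):
    """Split a run of operator characters into tokens using longest match."""
    # Try longer candidates first so that "|>" wins over "|" followed by ">".
    candidates = sorted(OPERATORS, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        for op in candidates:
            if text.startswith(op, pos):
                tokens.append(op)
                pos += len(op)
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


print(munch_operators("|>"))    # a single token, not "|" followed by ">"
print(munch_operators("| >"))   # whitespace forces two separate tokens
```

Sorting the candidates by length is the simplest way to get maximal munch for a fixed token set; a real lexer would more likely use a state machine.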

Whitespace and comments

Whitespace has three uses:

to separate tokens in the source file;
to separate statements with indentation;
to distinguish between prefix and infix operators (see Operators).

The following characters are considered whitespace: space (U+0020), line feed (U+000A), carriage return (U+000D), horizontal tab (U+0009), vertical tab (U+000B), form feed (U+000C) and null (U+0000).

Comments are treated as whitespace by the compiler. Single-line comments begin with -- and continue until a line feed (U+000A) or carriage return (U+000D). Multiline comments begin with {- and end with -}. Multiline comments can be nested, but the comment markers must be balanced.
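The balancing requirement for nested comments amounts to keeping a depth counter. The following Python sketch shows the idea under simplifying assumptions: string literals are not handled, and the function name strip_comments is hypothetical.

```python
def strip_comments(source):
    """Replace each comment with a single space, as if it were whitespace.

    Handles "--" line comments and nestable "{-" ... "-}" block comments.
    String literals are not handled in this sketch.
    """
    out = []
    depth = 0  # current nesting level of block comments
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "{-":
            depth += 1
            i += 2
        elif pair == "-}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                out.append(" ")  # the whole comment acts as whitespace
        elif depth > 0:
            i += 1  # inside a block comment: skip the character
        elif pair == "--":
            # Line comment: skip up to, but not including, the line terminator.
            while i < len(source) and source[i] not in "\n\r":
                i += 1
            out.append(" ")
        else:
            out.append(source[i])
            i += 1
    if depth != 0:
        raise SyntaxError("unbalanced block comment markers")
    return "".join(out)


print(strip_comments("a {- outer {- inner -} still outer -} b").split())
```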

Identifiers

Identifiers begin with an uppercase or lowercase letter A through Z, an underscore (_), a noncombining alphanumeric Unicode character in the Basic Multilingual Plane, or a character outside the Basic Multilingual Plane that isn’t in a Private Use Area. After the first character, digits and combining Unicode characters are also allowed.

To use ASCII punctuation characters in an identifier, or to use a reserved word as an identifier, put a backtick (`) before and after it. For example, data isn’t a valid identifier, but `data` is valid. The backticks aren’t considered part of the identifier; `x` and x have the same meaning.

Keywords

The following keywords are reserved and can’t be used as identifiers unless they’re escaped with backticks, as described above in Identifiers. Keywords other than let and lambda can be used without backtick escaping as field or variant names in data types, and as parameter names in a function declaration or function call.

data: defines a new data type
class: defines a new data type class
fx: defines a new function (prefix function)
infx: defines a new infix function
lambda: defines a lambda function
alias: defines a new alias for a function
let: defines a new value

The following tokens are reserved as built-in operators and can’t be used in custom operators: (, ), {, }, [, ], ., ,, :, ;, #, =>, ->, <-, <|, |>, >|, |?, |:, .\, `.

Literals

A literal is the source code representation of a value of a type, such as a number or string.

The following are examples of literals:

42               -- Integer literal
42#U8            -- Tagged integer literal
3.14159          -- Floating-point literal
'A'              -- Character literal
"Hello, world!"  -- String literal

A literal without a tag doesn’t have a type on its own. Instead, Cation’s type inference attempts to infer a type for the literal. For example, in the declaration let x: U8 := 42, Cation uses the explicit type information (: U8) to infer that the type of the integer literal 42 is U8. If there isn’t suitable type information available, Cation infers that the literal’s type is one of the default built-in Cation types listed in the table below.

Literal	Default type
Integer	U64
Float	F64
Character	Char
String	String

When specifying a type for a literal value, whether through a type annotation or a tag, the type must be one that can be instantiated from that literal value.

For example, in the declaration let str := "Hello, world", the default inferred type of the string literal "Hello, world" is String. This can be changed with a type annotation or a tag: in both let str: AsciiString := "Hello, world" and let str := "Hello, world"#AsciiString, the literal is inferred as AsciiString instead of String.

Integer literals

Integer literals represent integer values of some precision. By default, integer literals are expressed in decimal; you can specify an alternate base using a prefix:

binary literals begin with 0b,
octal literals begin with 0o,
and hexadecimal literals begin with 0x.

Decimal literals contain the digits 0 through 9. Binary literals contain 0 and 1, octal literals contain 0 through 7, and hexadecimal literals contain 0 through 9 as well as A through F in upper- or lowercase.

Negative integer literals are expressed by prepending a minus sign (-) to an integer literal, as in -42.

Underscores (_) are allowed between digits for readability, but they’re ignored and therefore don’t affect the value of the literal. Integer literals can begin with leading zeros (0), but they’re likewise ignored and don’t affect the base or value of the literal.
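The rules for prefixes, underscores and leading zeros can be summarised in a short Python sketch. It is an illustration of the description above, not the Cation lexer; the function name parse_integer_literal is hypothetical, and tagged literals such as 42#U8 are out of scope.

```python
# Base prefixes as listed above; anything else is read as decimal.
PREFIXES = {"0b": 2, "0o": 8, "0x": 16}


def parse_integer_literal(text):
    """Return the numeric value of an untagged integer literal."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    base = PREFIXES.get(body[:2], 10)
    digits = body[2:] if base != 10 else body
    # Underscores are allowed only between digits and carry no value.
    if not digits or digits.startswith("_") or digits.endswith("_"):
        raise ValueError(f"malformed integer literal: {text!r}")
    value = int(digits.replace("_", ""), base)  # leading zeros are ignored
    return -value if negative else value


for literal in ["1_000_000", "0b1010", "0o17", "0xFF", "-42", "0042"]:
    print(literal, "=", parse_integer_literal(literal))
```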

Unless otherwise specified, the default inferred type of an integer literal is the Cation built-in type U64. Cation has built-in types for various sizes of signed, unsigned and non-zero integers, as described in Integers.

Floating-point literals

Floating-point literals represent floating-point values of some precision.

By default, floating-point literals are expressed in decimal (with no prefix), but they can also be expressed in hexadecimal (with a 0x prefix).

Decimal floating-point literals consist of a sequence of decimal digits followed by either a decimal fraction, a decimal exponent, or both. The decimal fraction consists of a decimal point (.) followed by a sequence of decimal digits. The exponent consists of an upper- or lowercase e prefix followed by a sequence of decimal digits that indicates what power of 10 the value preceding the e is multiplied by. For example, 1.25e2 represents 1.25 x 10², which evaluates to 125.0. Similarly, 1.25e-2 represents 1.25 x 10⁻², which evaluates to 0.0125.

Hexadecimal floating-point literals consist of a 0x prefix, followed by an optional hexadecimal fraction, followed by a hexadecimal exponent. The hexadecimal fraction consists of a decimal point followed by a sequence of hexadecimal digits. The exponent consists of an upper- or lowercase p prefix followed by a sequence of decimal digits that indicates what power of 2 the value preceding the p is multiplied by. For example, 0xFp2 represents 15 x 2², which evaluates to 60. Similarly, 0xFp-2 represents 15 x 2⁻², which evaluates to 3.75.
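The worked values above can be checked with Python, whose float constructor and float.fromhex method accept the same decimal-exponent and hexadecimal p-exponent notations. This only verifies the arithmetic; it says nothing about how Cation itself parses these literals.

```python
# Decimal exponents scale the value by a power of 10.
decimal_examples = {
    "1.25e2": float("1.25e2"),
    "1.25e-2": float("1.25e-2"),
}

# Hexadecimal exponents (the "p" part) scale the value by a power of 2.
hex_examples = {
    "0xFp2": float.fromhex("0xFp2"),
    "0xFp-2": float.fromhex("0xFp-2"),
}

for text, value in {**decimal_examples, **hex_examples}.items():
    print(f"{text} = {value}")
```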

Negative floating-point literals are expressed by prepending a minus sign (-) to a floating-point literal, as in -42.5.

Underscores (_) are allowed between digits for readability, but they’re ignored and therefore don’t affect the value of the literal. Floating-point literals can begin with leading zeros (0), but they’re likewise ignored and don’t affect the base or value of the literal.

Unless otherwise specified, the default inferred type of a floating-point literal is the Cation built-in F64, which represents a 64-bit floating-point number.

Character literal

String literals

Binary data literals

Date-and-time literals

Custom literals

Syntactic constructs

Specifiers

Projections

Function calls are made using the projection operator, which is written as a space ( ). A prefix function fn is called as fn args, where args is a named or unnamed data type matching the arguments in the function declaration. The arguments can be wrapped in (..) or left bare, but if multiple arguments are present they must be separated from each other with a comma , (the product operator, which composes arguments into a data type). Thus, a function always takes a single argument, which can be a named or unnamed data type.

Prefix functions may also be called using infix notation: either (args).fn, or arg0.fn arg1, .., argLast.

An infix function infn is called as arg0 infn arg1, .., argLast. The call can be converted to prefix form by writing (infn arg0, .., argLast) or (infn) arg0, .., argLast.

Injections

Injections are branching operators, which allow execution to branch based on a value of a co-product type. They are either pattern-matching (an equivalent of Rust match and Haskell case) or based on a boolean test.

Pattern matching is done with the >| infix operator, which takes a value of a co-product type on the left side, and a set of natural transformations matching each of the injections on the right:

value >|
  pattern1 => code
  pattern2 => code
  _ => code -- default match

Branching on boolean conditions uses the .. |? .. |: .. operator, a ternary operator corresponding to the if .. then .. else construction. It can also be chained as condition1 |? statement1 |: condition2 |? statement2 |: statementElse, corresponding to if condition1 then statement1 elseIf condition2 then statement2 else statementElse.
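For readers more familiar with other languages, the chained form reads like a chained conditional expression. A rough Python analogue is sketched below; the function describe is purely illustrative.

```python
def describe(n):
    # Analogue of: condition1 |? statement1 |: condition2 |? statement2 |: statementElse
    return "negative" if n < 0 else "zero" if n == 0 else "positive"


print(describe(-5), describe(0), describe(7))
```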

Annotations

Operators

Built-in

Categorical limits

Operator , is the categorical product operator. Operator | is the categorical sum operator.

Mappings (morphisms and functors)

Operators -> and <- define functors: they project an object, or each of the objects in a category (a value, collection, iterator or data type), to a different object or category.

For instance, when used in a function definition, -> projects all possible values of the function input data type (which may be a composite anonymous type) to the set of return values defined by the return data type:

fx some: input Type -> output Type

Operator -> is also used inside collections of key-value maps, separating keys and their corresponding values.

Natural transformation mapping

Operator => is used to introduce generic arguments as a natural transformation.

Iteration operators

Operators |> and <| are used to build iterators: they map each item of an iterable collection into a new value:

x <- 0..<100 |> mulTry x, 2

which will result in a new iterator over doubled values in the range from 0 to 99.

One may skip the initial x <- assignment; in this case the input value of the current iteration is passed as the first argument of the next expression:

0..<100 |> mulTry 2

One may also use the context operator _:

0..<100 |> _ mulTry 2

or reverse the order of the expressions:

mulTry 2 <| 0..<100
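As a point of comparison, the explicit and implicit forms correspond roughly to a generator expression and a partially applied map in Python. In this sketch mul_try is a hypothetical stand-in for mulTry that ignores overflow handling, which is an assumption about what that function does.

```python
from functools import partial


def mul_try(x, factor):
    """Hypothetical stand-in for mulTry; overflow handling is omitted."""
    return x * factor


# x <- 0..<100 |> mulTry x, 2   (the item is bound to x explicitly)
explicit = (mul_try(x, 2) for x in range(0, 100))

# 0..<100 |> mulTry 2           (the item becomes the first argument)
implicit = map(partial(mul_try, factor=2), range(0, 100))
```

Both names refer to lazy iterators: no multiplication happens until the values are consumed, for example with list(explicit).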

Lambda operator

The lambda operator .\, with the alias λ (Greek letter lambda), is a shorthand for creating lambda expressions. It has four forms:

single-line, end of line (trailing lambda):
```
.\args -> ret: expr
```
single-line, alongside other expressions:
```
.\(args -> ret: expr), _
```
multi-line, end of line (trailing lambda block):
```
.\args -> ret
  expr1
  expr2
  -- ...
```

multi-line, alongside other expressions:

.\(args -> ret
  expr1
  expr2
  -- ...
), _

All forms may skip the return type -> ret part; in this case the return type is inferred by the compiler:

.\args: expr

.\(args: expr), _

.\args
  expr1
  expr2
  -- ...

.\(args
  expr1
  expr2
  -- ...
), _

If the lambda expression has no inputs, the lambda operator is simply followed by the expression itself, with no colon:

.\expr

.\(expr), _

.\
  expr1
  expr2
  -- ...

.\(
  expr1
  expr2
  -- ...
), _

Standard library

Monadic operators

Operators !, !!, ? and ?? are used with monad types (like optionals/maybes or result types). The ! and ? operators mark the part of an expression that should be tested against the unwrapped monad value; if unwrapping fails, the expression defaults to the one placed after the !! or ?? operator at the end of the same line. For instance, if value is of Maybe type, we can write value ?? 0 or value !! noValue. The first defaults the expression to zero. The second converts the return type of the function to a result type, creates an enum Error with a variant noValue, and returns that variant if the maybe monad in value doesn't contain an actual value.
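The two fallback behaviours can be modelled in Python as follows. This is only an analogy: None stands in for an empty Maybe, and raising an exception stands in for returning the error variant of a result type. The names unwrap_or, unwrap_or_fail and NoValue are hypothetical.

```python
class NoValue(Exception):
    """Stand-in for the noValue variant of the generated Error enum."""


def unwrap_or(value, default):
    """Analogue of `value ?? default`: fall back to a default value."""
    return default if value is None else value


def unwrap_or_fail(value, error):
    """Analogue of `value !! error`: leave the function with an error."""
    if value is None:
        raise error
    return value


print(unwrap_or(None, 0))   # falls back to the default
print(unwrap_or(7, 0))      # the wrapped value is used
```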

Arithmetic operators

Unlike many other languages, Cation requires overflow and underflow conditions in arithmetic, as well as division by zero, to be handled explicitly. This is required for termination analysis and helps to avoid undefined behaviour. Thus, each arithmetic operator has multiple forms that handle overflows and underflows differently. This approach is called checked arithmetic.

To construct an arithmetic operator, a symbol representing the mathematical operation (such as +, -, *, /, %, ^) is postfixed with a symbol representing the way overflow, underflow or division by zero is handled. These symbols are:

? for converting the result of the operation into a Maybe monad;
! for converting the result of the operation into a Result::error monad variant with details on the specific condition that occurred;
% for wrapping the value in case of overflow (modulo arithmetic) and returning an ArRes monad;
^ for saturating the value at the maximum possible value in case of overflow;
no postfix for extending the result type to the next bit dimension so that the result always fits.

Additionally, Cation allows native arithmetic operations when the operation itself and the resulting type guarantee that overflow and other exceptional conditions are impossible. These operations are:

unsigned, signed and non-zero integer addition with + operator, when the bit dimension of the resulting integer type is equal to or exceeds the sum of the bit dimensions of the inputs;
signed integer subtraction with - operator, when the bit dimension of the resulting integer type is equal to or exceeds the sum of the bit dimensions of the inputs;
unsigned, signed and non-zero integer multiplication with * operator, when the bit dimension of the resulting integer type is equal to or exceeds the product of the bit dimensions of the inputs;
division with / operator of non-zero unsigned integers;
modulo division with % operator of non-zero unsigned integers;
unsigned, signed and non-zero potentiation with ^ operator when the bit dimensions of the resulting integer type exceeds
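The addition and multiplication conditions in the list above can be sanity-checked for unsigned integers by testing the worst case, where both operands hold their maximum value. The Python sketch below does this for a few bit dimensions; the helper names are hypothetical, and the subtraction, division and potentiation rules are not covered.

```python
def max_unsigned(bits):
    """Largest value of an unsigned integer with the given bit dimension."""
    return (1 << bits) - 1


def fits_unsigned(value, bits):
    """True if value is representable as an unsigned integer of that size."""
    return 0 <= value <= max_unsigned(bits)


def native_add_is_safe(m, n):
    # Worst case for addition, checked against the sum of the dimensions.
    return fits_unsigned(max_unsigned(m) + max_unsigned(n), m + n)


def native_mul_is_safe(m, n):
    # Worst case for multiplication, checked against the product of the dimensions.
    return fits_unsigned(max_unsigned(m) * max_unsigned(n), m * n)


sizes = [8, 16, 24, 32, 64]
print(all(native_add_is_safe(m, n) for m in sizes for n in sizes))
print(all(native_mul_is_safe(m, n) for m in sizes for n in sizes))
```

For example, the largest U8 sum is 255 + 255 = 510, which fits comfortably in a 16-bit result.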

Thus, the resulting table of the arithmetic operations is the following:

Operation	Native	Maybe	Result	Wrapping	Saturating	Extending
Addition	`+`	`+?`	`+!`	`+%`	`+^`	`+`
Subtraction	`-`	`-?`	`-!`	`-%`	`-^`	`-`
Multiplication	`*`	`*?`	`*!`	`*%`	`*^`	`*`
Division	`/`	`/?`	`/!`	n/a	n/a	n/a
Modulo division	`%`	`%?`	`%!`	n/a	n/a	n/a
Potentiation	`^`	`^?`	`^!`	`^%`	`^^`	`^`
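To make the columns of the table concrete, the following Python sketch models the five checked forms of addition for 8-bit unsigned operands. It is a behavioural model only: None stands in for an empty Maybe, an exception stands in for the Result error variant, and the (value, overflowed) pair is an assumed shape for ArRes, whose actual definition is not given here.

```python
U8_MAX = 255


def add_maybe(a, b):
    """Models +? : None stands in for an empty Maybe on overflow."""
    total = a + b
    return total if total <= U8_MAX else None


def add_result(a, b):
    """Models +! : an exception stands in for the Result error variant."""
    total = a + b
    if total > U8_MAX:
        raise OverflowError(f"{a} + {b} does not fit in U8")
    return total


def add_wrapping(a, b):
    """Models +% : modulo arithmetic; the flag is an assumed ArRes shape."""
    total = a + b
    return total % (U8_MAX + 1), total > U8_MAX


def add_saturating(a, b):
    """Models +^ : clamp to the maximum representable value."""
    return min(a + b, U8_MAX)


def add_extending(a, b):
    """Models + : the result gets a wider type (U16 here), so it always fits."""
    return a + b


print(add_maybe(200, 100), add_wrapping(200, 100), add_saturating(200, 100), add_extending(200, 100))
```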

Bitwise operators

Boolean logic operators

Ternary logic operators

String operators

Types

Built-in

Initial and terminal

Integers

Integers come in signed, unsigned and non-zero unsigned classes, each of which contains types with different bit lengths.

Supported bit lengths for integer types are:

Bits	Bytes	Unsigned	Signed	Non-zero	C equivalents	Rust equivalents
8 bits	1 byte	`U8`	`I8`	`N8`	`(unsigned)` `char`	`u8`, `i8`, `NonZeroU8`
16 bits	2 bytes	`U16`	`I16`	`N16`	`(unsigned)` `short`	`u16`, `i16`, `NonZeroU16`
24 bits	3 bytes	`U24`	`I24`	`N24`	n/a	n/a
32 bits	4 bytes	`U32`	`I32`	`N32`	`(unsigned)` `long`	`u32`, `i32`, `NonZeroU32`
40 bits	5 bytes	`U40`	`I40`	`N40`	n/a	n/a
48 bits	6 bytes	`U48`	`I48`	`N48`	n/a	n/a
56 bits	7 bytes	`U56`	`I56`	`N56`	n/a	n/a
64 bits	8 bytes	`U64`	`I64`	`N64`	`(unsigned)` `long long`	`u64`, `i64`, `NonZeroU64`
80 bits	10 bytes	`U80`	`I80`	`N80`	n/a	n/a
96 bits	12 bytes	`U96`	`I96`	`N96`	n/a	n/a
112 bits	14 bytes	`U112`	`I112`	`N112`	n/a	n/a
128 bits	16 bytes	`U128`	`I128`	`N128`	n/a	`u128`, `i128`, `NonZeroU128`
256 bits	32 bytes	`U256`	`I256`	`N256`	n/a	n/a
512 bits	64 bytes	`U512`	`I512`	`N512`	n/a	n/a
1024 bits	128 bytes	`U1024`	`I1024`	`N1024`	n/a	n/a

All integer types in Cation are co-product types, made with all their allowed values. This makes it possible to use injection operators to match them against patterns and ranges.

Floats

Supported bit lengths and encodings for floating-point types are:

Type name	Bytes	Encoding	Underlying Rust type
`F16B`	2	bfloat16	`bfloat::bf16`
`F16`	2	IEEE Half	`apfloat::ieee::Half`
`F32`	4	IEEE Single	`apfloat::ieee::Single`
`F64`	8	IEEE Double	`apfloat::ieee::Double`
`F80`	10	IEEE X87 Extended	`apfloat::ieee::X87DoubleExtended`
`F128`	16	IEEE Quad	`apfloat::ieee::Quad`
`F256`	32	IEEE Oct	`apfloat::ieee::Oct`

Character

Cation has a single built-in Unicode character type, Char. It is the only built-in type with a variable bit length, due to the Unicode standard: its length varies from 8 to 32 bits in 8-bit steps.
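A length of 8 to 32 bits in 8-bit steps matches the UTF-8 encoding of Unicode scalar values. Assuming that Char is stored as UTF-8 (an assumption; the encoding is not stated here), the size of a few characters can be computed in Python like this:

```python
def utf8_bits(ch):
    """Number of bits a single character occupies when encoded as UTF-8."""
    return 8 * len(ch.encode("utf-8"))


# U+0041, U+00E9, U+20AC and U+1F600 need one to four bytes respectively.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X} occupies {utf8_bits(ch)} bits")
```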

Ranges

Range types simplify the creation of collection types and are an efficient tool for loops and iterators. Cation comes with the following set of range types, each of which can be instantiated using a shorthand range expression.

Type name	Range operator
`RangeAll`	`..`
`RangeTo`	`..<N`
`RangeToIncl`	`..=N`
`RangeFrom`	`M..`
`RangeFromTo`	`M..<N`
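For orientation, most of these range expressions have close counterparts in Python's range and itertools.count. The mapping below is a sketch that assumes ranges without a lower bound start at 0, which may not hold for signed types; RangeAll is left out because its bounds depend on the element type.

```python
from itertools import count, islice


def range_to(n):
    """Analogue of ..<N (assumes the range starts at 0)."""
    return range(0, n)


def range_to_incl(n):
    """Analogue of ..=N (assumes the range starts at 0)."""
    return range(0, n + 1)


def range_from(m):
    """Analogue of M.. : an unbounded iterator starting at M."""
    return count(m)


def range_from_to(m, n):
    """Analogue of M..<N : M inclusive, N exclusive."""
    return range(m, n)


print(list(range_to(5)))
print(list(range_to_incl(5)))
print(list(islice(range_from(2), 3)))  # take only the first three items
print(list(range_from_to(2, 5)))
```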

Standard library

Small integers

String types

Monads

Statements

Expressions

Expressions are separated either by a line feed character (U+000A) or, when several are written on one line, by a semicolon ;.

Lambda expressions

Lambda expressions have two forms: operator and specifier.

Collection comprehension

The grouping operator (..) constructs an anonymous data type.

Operators [..], {..} and {..->..} construct collection types.

Range expressions

Operators ..= and ..< help to create iterators or collections over ranges. They may be combined with step size information in the form

0<=2x<=100

where any other constant can be given instead of 2.

Context value

Operator _ denotes the context default value, such as the one kept on the stack from a previous function call or a decomposition.

Result value

Operator $ denotes the result of the current expression or function. It can be used to assign the output value of a function, as in $ <- value, or to access results from earlier iterations: $-1 accesses the result of the previous iteration, and $3 accesses the result of the third iteration from the beginning.

Patterns

Specifiers

Function

Data type

Data class

Value specifier

Lambda specifier

A lambda specifier starts with the lambda keyword, followed by a value name, a colon, the argument and return type definitions, and the body:

let local: U8 := random
lambda sq: x U8 -> U32
    pow 2 + local

Like any other specifier, it can be written on a single line:

let local: U8 := random
lambda sq: x U8 -> U32 := pow 2 + local

Generics

Annotations

Attributes

Statements starting with @ are used to add attributes to other Cation statements. Attributes are a form of metaprogramming: they are similar to procedural macros in Rust or annotations in Java.

@id

@alias

@final

@override

@private

Tags

Co-product variant tags

Type tags
