Magnolisp
This is Magnolisp, a small, experimental language and implementation. It is experimental in its implementation technique, which is to replace the phase level 0 (runtime) language of Racket with something non-Racket (here: Magnolisp), and translate it into another language (here: C++) for execution.
Magnolisp is inspired by Racket and the likewise experimental programming language Magnolia. Its algebraic language resembles Racket’s, and Racket also provides the module and macro systems. The language is restricted in ways similar to Magnolia, with the restrictions designed to support static reasoning about code, and to allow for fairly direct mapping to most mainstream languages.
Magnolisp is intended to explore and demonstrate techniques for source-to-source compilation on top of Racket, not to support writing of useful applications.
1 Magnolisp the Language
#lang magnolisp | package: magnolisp |
The Magnolisp language relies on Racket for its module and macro systems. All of Racket may be used for macro programming. The racket/base language is provided by default for phase level 1 (compile time).
A small subset of racket/base definitions is also available at phase level 0 by default, as these may be used in runtime code, and evaluated as magnolisp in the Racket VM. However, only a small subset of Racket can be handled by the Magnolisp compiler, and either the Magnolisp Racket language or the Magnolisp compiler will report errors as appropriate for uncompilable language.
When a magnolisp module is evaluated as Racket, any module top-level runtime expressions will also get evaluated; this feature is intended to facilitate testing during development. The Magnolisp compiler, on the other hand, discards top-level expressions, and also any top-level definitions that are not actually part of the program being compiled.
2 Modules and Macros
The Racket provide and require forms may be used in Magnolisp as normal, also at phase level 0. However, as far as C++ compilation is concerned, these are only used to connect together Magnolisp definitions internally to the compiled program/library. C++ imports and exports are specified separately using the foreign and export annotations.
For defining macros and macro-expansion time computation, the relevant Racket facilities (e.g., define-syntax, define-syntax-rule, begin-for-syntax, etc.) may be used as normal.
3 Syntactic Forms
(require magnolisp/surface) | package: magnolisp |
The surface syntax of Magnolisp consists of a collection of Magnolisp-specific forms, as well as a selection of supported Racket forms. The magnolisp/surface module defines the Magnolisp-specific ones, and the magnolisp language exports these at phase level 0.
3.1 Defining Forms
In Magnolisp, it is possible to define functions, types, and variables; of these, variable definitions are not allowed at the top level.
As Magnolisp has almost no standard library, it is ultimately necessary to define primitive types and functions (flagged as foreign) in order to be able to compile programs that do anything useful. For this reason there is also a primitives convenience form for defining multiple foreign types and/or functions at once.
syntax
(define #:type id maybe-annos)
(define id maybe-annos expr) (define (id arg ...) maybe-annos expr ...) (define (id arg ...) maybe-annos #:function Racket-expr)
It is acceptable to define a type for a local (lexical) scope, if the type is not referenced elsewhere.
The second form defines a variable with the name id, and the (initial) value given by expr. A type annotation may be included to specify the Magnolisp type of the variable.
Any (module) top-level variable definition will not get translated into C++.
The third form defines a function. The (optional) body of a function is a sequence of expressions, which (if present) must produce a single value.
Unlike in Racket, no tail-call elimination may be assumed even when a recursive function application appears in tail position.
Functions may be defined locally within another function. A local function may reference free variables from a surrounding lexical scope, as long as they are not used as L-values (i.e., targets of assignment), and as long as the variables are not top-level.
Where the function is declared as foreign, the Magnolisp compiler will ignore any body expr ... sequence. When a function without a body is invoked as Racket, the result is #<void>. When a foreign function has a body, it is typically there to “simulate” the behavior of the C++ implementation in the Racket VM. For purposes of simulation it can be useful to make use of the full Racket runtime language; to implement a function body in Racket syntax instead of Magnolisp syntax, enclose the body expression within a begin-racket form.
A function with the export flag in its annotations indicates that the function is part of the public API of a program that includes the containing module. When a function is used merely as a dependency (i.e., its containing module was not specified as being a part of the program), any export flag is ignored.
When a function includes a type annotation, the type expression must specify a function type (see Type Expressions).
(define (identity x) x) (define (five) #:: (export [type (-> int)]) 5) (define (inc x) #:: (foreign [type (-> int int)]) (add1 x)) (define (seven) #:: (foreign [type (-> int)]) 1 2 3 4 5 6 7) (define (nine) #:: (foreign [type (-> int)]) (define x (seven)) (define (compute) (inc (inc x))) (compute))
Here, identity must have a single, concrete type, possible to determine from the context of use. It is not a generic function, and hence it may not be used in multiple different type contexts within a single program.
The second define form may also be used to define a function (even a top-level one), provided that the “variable initializer” expr is a lambda expression. Alternatively, if the annotations indicate that the definition is for a foreign function (whose type is explicitly given as a function type), then id is also taken to bind a function. Any “variables” that are applied in a program are also taken to refer to functions. For foreign function definitions of this form, the expr would typically be a function (value) for “simulating” the foreign behavior.
> (let () (define two1 #:: ([type (-> int)]) (#%plain-lambda () 2)) (define two2 (#%plain-lambda () 2)) (define mul #:: (foreign [type (-> int int int)]) *) (define (four) (mul (two1) (two2))) (mul (four) (four))) 16
The fourth define form (with the #:function keyword) may likewise be used to define a function, one whose Racket implementation is given as an Racket-expression (for which Racket syntax is assumed). The arity of the function value that the expression yields must match the number of declared arguments. When this form is used, it is not necessary to give the type of the function, provided that the type can be inferred from context.
The first form states that id is a type, probably providing annotations for it (declare’ing types as foreign is presently compulsory).
The second form states that id is a function, possibly providing annotations for it. The arguments arg ... only serve to specify the arity of the function.
The key difference between define and declare is that the former binds the identifier, and thus at the same time necessarily specifies any Racket implementation, while the latter does not. That is: Racket’s define can be used to give a Racket implementation for something; Magnolisp’s define can be used to give both a Racket implementation (possibly a “non-implementation”) as well as C++ translation information for something; whereas declare only gives the C++ translation information for something.
syntax
(function (id arg ...) maybe-annos maybe-body)
maybe-body =
| expr
NOTE: This form is deprecated; use define, instead.
syntax
(var id maybe-annos expr)
NOTE: This form is deprecated; use define, instead.
syntax
(primitives definition ...)
definition = [#:type id] | [#:function (id arg ...) :: type-expr]
3.2 Annotations
maybe-annos | = | ||||
| | #:: (anno-expr ...) | ||||
anno-expr | = | export-anno-expr | |||
| | foreign-anno-expr | ||||
| | type-anno-expr | ||||
| | ... | ||||
export-anno-expr | = | export | |||
| | (export C++-id) | ||||
foreign-anno-expr | = | foreign | |||
| | (foreign C++-id) | ||||
type-anno-expr | = | (type type-expr) |
where:
C++-id A valid C++ identifier. When not provided, a default C++ name is automatically derived from the Magnolisp name.
The set of annotations that may be used in Magnolisp is open ended, to allow for additional tools support. Only the most central Magnolisp-compiler-recognized annotations are included in the above grammar.
It is not always necessary to explicitly specify a type for a typed Magnolisp definition, as the Magnolisp compiler does whole-program type inference (in Hindley-Milner style). When evaluating as Racket, type annotations are not used at all.
For convenience, the magnolisp language installs a reader extension that supports annotation related shorthands: #ap(anno-expr ...) expr is short for (annotate (anno-expr ...) expr); and ^type-expr is short for (type type-expr). For example, #ap(^int) expr reads as (annotate ((type int)) expr).
syntax
(type type-expr)
syntax
(literal fmt-elem ...)
fmt-elem = str | fmt-expr fmt-expr = (fmt-insn fmt-obj) fmt-insn = display | write | | cxx-str fmt-obj = datum | type-id | str
str A string literal.
display Use Racket’s display to format the object.
write Use Racket’s write to format the object.
Use Racket’s print to format the object.
cxx-str Format the object as a “regular” C++ string literal, after coercing it into a string. Only non-control ASCII characters are allowed in the string.
datum Substituted with the datum of the literal.
type-id Substituted with the C++ name of the literal’s type.
3.3 Type Expressions
type-expr | = | type-id | ||
| | (-> type-expr ... type-expr) | |||
| | (<> type-expr type-expr ...) | |||
| | (exists type-var-id ... type-expr) | |||
| | (for-all type-var-id ... type-expr) | |||
| | (auto) |
Type expressions are parsed according to the above grammar, where type-id must be an identifier that names a type. The only predefined types in Magnolisp are Void and Bool, and any others must be defined using define #:type or some other type-binding form.
A type-var-id is an identifier that gets bound as a type variable for the context of its type expression.
syntax
(-> type-expr ... type-expr)
syntax
(<> type-expr type-expr ...)
(define #:type stack #:: (foreign)) (define (stack-id x) #:: ([type (-> (<> stack int) (auto))]) x) ; the C++ type of this x use will be stack<int>
Only foreign functions may be universally typed.
(define #:type Box #:: (foreign)) (define (box v) #:: (foreign [type (∀ E (-> E (<> Box E)))])) (define (unbox box) #:: (foreign [type (∀ E (-> (<> Box E) E))]))
syntax
(auto)
3.4 Value Expressions
Like Racket (and unlike C++), the Magnolisp language makes no distinction between statements and expressions. Some expressions yield no useful value, however; such expressions conceptually produce a result of type Void (such result values do exist at Racket run time, but not at C++ run time). Some expressions yield no values (or multiple values), but are merely used as a syntactic device, and only allowed to appear in certain contexts.
Magnolisp borrows a number of constructs from Racket (or Scheme). For example, there is a conditional form (if test-expr then-expr else-expr), as well as the derived forms (when test-expr then-expr ...+) and (unless test-expr then-expr ...+). The test-expr conditional expression must always be of type Bool, and whether it holds depends on the “truthiness” of its value, as interpreted in C++ or Racket (as applicable). The branches of an if must generally be of the same type, except where the result of the if form is discarded. The when and unless can generally only appear in such result-discarding contexts, as they have an implicit “else” branch of type Void.
A (begin body ...) form, in Magnolisp, signifies a sequence of expressions, itself constituting an expression. Similarly to Racket, to allow definitions to appear within an expression sequence, (let () body ...) should be used instead.
The (let ([id expr] ...) body ...+), (let* ([id expr] ...) body ...+), and (letrec ([id expr] ...) body ...+) forms are also available in Magnolisp, but the named variant of let is not supported.
The (set! id expr) form is likewise available in Magnolisp, supporting assignment to variables. The left-hand side expression id must be a reference to a bound variable. (The id may naturally instead be a transformer binding to an assignment transformer, in which case the form is macro transformed as normal.)
In Magnolisp, (void expr ...) is an expression with no useful result (the result is of the unit type Void). Any arguments to void are evaluated as usual, but they are not used. The (values) form signifies “nothing,” and has no result; hence it is an error for (values) to appear in a position where the context expects a result. In result expecting contexts, the former may only appear in a 1-value context, and the latter in a 0-value context (there are few in Magnolisp).
The define forms may appear in a Racket internal-definition context (and not Racket expression context). The same is true of define-values forms that conform to the restricted syntax supported by the Magnolisp compiler.
syntax
(cast type-expr expr)
> (cast int 5) 5
While generally only declarations require annotations, cast demonstrates a specific case where it is useful to associate annotations with expressions.
syntax
(annotate (anno-expr ...) expr)
syntax
(if-target name then-expr else-expr)
Note that there is no specific support for execution-target-conditional macro expansion in Magnolisp (such conditionality is possible, but Magnolisp itself has no supporting mechanisms for it). Instead, to generate different code for different targets, one may use if-target to macro generate code for all targets at once (currently only C++ and Racket). The choice of which alternative code fragment to evaluate will be made after Magnolisp programs’ macros have been expanded, but still at compile-time, either during source-to-source or bytecode compilation (depending on the execution target).
> (if-target cxx (seven) (five)) 5
syntax
(if-cxx then-expr else-expr)
See also: begin-racket, let-racket.
3.5 Racket Forms
To include Racket code in a phase level 0 context that is significant to Magnolisp, you may wrap the code in a form that indicates that the code is only intended for parsing as Racket. Code so wrapped must be grammatically correct Racket, but not necessarily Magnolisp. The wrapping forms begin-racket and let-racket merely switch syntaxes, and have no effect on the namespace used for evaluating the enclosed sub-forms; the surrounding namespace is still in effect. Nesting of the wrapping forms is allowed.
syntax
(begin-racket Racket-form ...)
> (require (prefix-in r. (only-in racket/base define)))
> (begin-racket (r.define six 6) (r.define (one-more x) (let dummy () (+ x 1))))
> (define (eight) #:: (foreign [type (-> int)]) (one-more (one-more six)))
> (eight) 8
syntax
(let-racket Racket-expr ...)
> (define (three) #:: (foreign [type (-> int)]) (let-racket (define-values (x y) (let () (values 1 2))) (set! x (let dummy () (one-more y))) x))
> (three) 3
4 Runtime Library
(require magnolisp/prelude) | package: magnolisp |
The Magnolisp language includes a small number of predefined names (that are not syntax). Most notably, the compiler expects expressions of type Bool and Void in certain contexts, and it also recognizes their identifiers. While the C++ translation semantics of these types are not built into the compiler, such semantics are predefined by the language. More specifically, the magnolisp language uses the magnolisp/prelude module as its “runtime library,” one that specifies the C++ runtime names to which these known types correspond. The magnolisp/prelude module only contains such compile-time information, and no Racket bindings are exported from it.
type
Bool : any/c
type
Void : any/c
5 Core Magnolisp
(require magnolisp/core) | package: magnolisp |
There are a small number of Magnolisp-specific names that are treated specially by the Magnolisp compiler. These are bound in the magnolisp/core module, and exported for phase level 0 by the magnolisp language.
5.1 Magnolisp Built-Ins
A boolean expression is simply an expression of type Bool, which is one of the two predefined types in Magnolisp. The other one is Void, which is Magnolisp’s unit type (whose values carry no information).
type
Bool : any/c
type
Void : any/c
5.2 Magnolisp Core Syntax
Magnolisp core syntax is encoded primarily in terms of Racket’s core forms. Magnolisp core forms that have no Racket counterpart, however, are encoded in terms of the #%magnolisp variable, which is treated specially by the Magnolisp compiler. The #%magnolisp binding is exported from the magnolisp/core module.
It is possible to define multiple different surface syntaxes for Magnolisp, and these can be defined as libraries similar to the magnolisp/surface syntax definition used by the magnolisp language. All Magnolisp language variants must, however, refer to the same core bindings (i.e., as exported from racket/base and magnolisp/core), as no other bindings are treated specially by the Magnolisp compiler.
binding
#%magnolisp : any/c
5.3 Fully Expanded Programs
As far as the Magnolisp compiler is concerned, a Magnolisp program is fully expanded if it conforms to the following grammar. Any syntactic ambiguities are resolved in favor of the first matching grammar rule.
A non-terminal ntrkt is as documented for non-terminal nt in Fully Expanded Programs of the Racket Reference.
A form formign denotes language that is ignored by the Magnolisp compiler, but which may be useful when evaluating as Racket.
A form formpname = pval means the form form whose syntax object has the property named pname set to the value pval. A form formpname ≠ pval means the form form whose syntax object has the property named pname set to some value that is not pval. A form (sub-form ...)pname =/≠ pval may alternatively be written as (pname =/≠ pval sub-form ...).
Anything of the form idid-expr is actually a non-terminal like id-expr, but for the specific identifier id.
module-begin-form | = | (#%module-begin mgl-modlv-form ...) | |||||||||
mgl-modlv-form | = | (#%provide raw-provide-specrkt ...)ign | |||||||||
| | (#%require raw-require-specrkt ...)ign | ||||||||||
| | submodule-formrkt,ign | ||||||||||
| | (begin mgl-modlv-form ...) | ||||||||||
| | (begin-for-syntax module-level-formrkt ...)ign | ||||||||||
| | module-level-def | ||||||||||
| | (define-syntaxes (trans-id ...) Racket-expr)ign | ||||||||||
| | Racket-exprign | ||||||||||
| | in-racket-formign | ||||||||||
module-level-def | = |
| |||||||||
| | (define-values (id) mgl-expr) | ||||||||||
| |
| ||||||||||
Racket-expr | = | exprrkt | |||||||||
in-racket-form | = | Racket-form'for-target ≠ 'cxx | |||||||||
mgl-expr | = | in-racket-formign | |||||||||
| | (begin mgl-expr ...+) | ||||||||||
| | (begin0 mgl-expr mgl-expr ...) | ||||||||||
| | (#%expression mgl-expr) | ||||||||||
| | (#%plain-lambda (id ...) mgl-expr ...+) | ||||||||||
| | if-target-expr | ||||||||||
| |
| ||||||||||
| | (if mgl-expr mgl-expr mgl-expr) | ||||||||||
| | (#%plain-app voidid-expr mgl-expr ...) | ||||||||||
| | (#%plain-app valuesid-expr mgl-expr) | ||||||||||
| | (#%plain-app valuesid-expr) | ||||||||||
| | (#%plain-app id-expr mgl-expr ...) | ||||||||||
| |
| ||||||||||
| | (let-values (bind-in-let ...) mgl-expr ...+) | ||||||||||
| | (letrec-values (bind-in-let ...) mgl-expr ...+) | ||||||||||
| |
| ||||||||||
| | (set! id mgl-expr) | ||||||||||
| | (quote datum) | ||||||||||
| | (#%top . id) | ||||||||||
| | id | ||||||||||
bind-in-let | = |
| |||||||||
| | [() mgl-expr] | ||||||||||
| | [(id) mgl-expr] | ||||||||||
if-target-expr | = |
| |||||||||
| |
| ||||||||||
id-expr | = | id | |||||||||
| | (#%top . id) | ||||||||||
| | (#%expression id-expr) | ||||||||||
mgl-anno-expr | = | (if Racket-exprign anno-expr Racket-exprign) | |||||||||
anno-expr | = |
| |||||||||
| |
| ||||||||||
anno-val-expr | = | (quote datum) | |||||||||
| | (quote-syntax datum) | ||||||||||
type-expr | = | id | |||||||||
| | fn-type-expr | ||||||||||
fn-type-expr | = |
|
where:
id An identifier.
trans-id An identifier with a transformer binding.
datum A piece of literal data. A (quote datum) form is a literal in Magnolisp, and its type must be possible to infer from context. Boolean literals are an exception, as their Magnolisp type is recognized as Bool.
Racket-form Any Racket core form.
in-racket-form Any Racket form that has the syntax property 'for-target set to some value that is not 'cxx, meaning that the form is not intended for compilation to C++. These are ignored by the Magnolisp compiler where possible, and it is an error if they persist in contexts where they ultimately cannot be ignored. (The begin-racket and begin-for-racket forms are implemented through this mechanism.)
if-target-expr Indicates a choice between two expressions that is conditional on the compilation target language. Where the syntax property 'if-target is set to the value 'cxx, the Magnolisp compiler will only compile the first expression. If it is set to some other value (indicating another target language), only the second expression will be compiled. (The if-target form is implemented through this mechanism.)
submodule-form A Racket submodule definition. Submodules are not actually supported by the magnolisp language, but the Magnolisp compiler does allow them to appear, and merely ignores them.
anno-expr An annotation expression, containing an identifier id naming the kind of annotation, and an expression specifying the “value” of the annotation. In the generic case, any symbol can be used to name an annotation kind, and any quoted or quote-syntaxed datum can give the value. Only annotations of kind 'type are parsed in a specific way.
(Warning: For some of the idid-expr non-terminals, the current parser actually assumes a direct id.)
6 Evaluation
Programs written in Magnolisp can be evaluated in the usual Racket way, provided that the #lang signature specifies the language as magnolisp. Any module top-level phase level 0 expressions are evaluated, and the results are printed (as for Racket’s #%module-begin).
It is also possible to launch a Magnolisp REPL, by issuing the command:
racket -I magnolisp
7 Magnolisp-Based Languages
(require magnolisp/modbeg) | package: magnolisp |
The macros of the language must target Magnolisp’s core syntax.
It follows that the language must refer to Magnolisp’s built-in types, since certain core forms expect said types.
The language must export Magnolisp’s #%module-begin macro, or its own variant thereof, one that prepares all the information that the Magnolisp compiler expects.
Where it is not the magnolisp/prelude module that specifies C++ mappings for the language’s built-ins and primitives, the name of said module (or modules) must be communicated to the compiler via the language’s own #%module-begin macro.
For an example of a language targeting Magnolisp core language, see ErdaC++.
syntax
(module-begin form ...)
procedure
(make-module-begin stx [ #:prelude-path prelude-stx]) → syntax? stx : syntax? prelude-stx : syntax? = #''(magnolisp/prelude)
8 Compiler API
(require magnolisp/compiler-api) | package: magnolisp |
The Magnolisp implementation includes a compiler targeting C++. The magnolisp/compiler-api library provides an API for invoking the compiler.
procedure
(compile-modules module-path-v ... [ #:relative-to rel-to-path-v]) → compilation-state? module-path-v : module-path? rel-to-path-v : (or/c path-string? (-> any) false/c) = #f
procedure
(compile-files path-s ...) → compilation-state?
path-s : path-string?
The optional argument rel-to-path-v is as for resolve-module-path. It is only relevant for relative module paths, and indicates to which path such paths should be considered relative.
Any path-s is mapped to a `(file ,path-s) module path, coercing path-s to a string if necessary.
procedure
(compilation-state? v) → boolean?
v : any/c
procedure
(generate-files st backends [ #:outdir outdir #:basename basename #:out out #:banner banner?]) → void? st : compilation-state? backends : (listof (cons/c symbol? any/c)) outdir : path-string? = (current-directory) basename : string? = "output" out : (or/c #f output-port?) = (current-output-port) banner? : boolean? = #t
9 mglc
The compiler can also be invoked via the mglc command-line tool, specifying the program to compile. The tool gets installed by invoking raco setup. (Alternatively you may just run it as ./mglc on Unix platforms.)
To compile a program with mglc, list the source files of the program as arguments; the program will consist of all the functions in the listed files that are annotated with the export flag, as well as any code on which they rely. A number of compiler options affecting compilation behavior may be passed, see mglc --help for a list.
An example invocation would be:
mglc --stdout --banner --cxx my-program.rkt
which instructs the compiler to print out C++ code into standard output, with banners, for the program "my-program.rkt".
10 Example Code
For sample Magnolisp programs, see the "test-*.rkt" files in the "tests" directory of the Magnolisp implementation codebase.
Note that some of the example programs are written in Magnolisp language variants other than magnolisp (which is the only one documented here), but the differences are typically minor and superficial.
11 Source Code
12 Installation
Racket version 6.3 or higher is required to run the software; a known-compatible version is 6.5, but versions 6.3–6.6 are all expected to work. Version 6.2.1 will not work due to Racket 6.3 having a different macro model, and also due to differences in its organization of unstable libraries.
13 License
Except where otherwise noted, the following license applies:
Copyright © 2012–2016 University of Bergen and the authors.
Authors: Tero Hasu
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.