Structure and concepts

Julian Oppermann edited this page May 21, 2021 · 56 revisions

Notation

We use an EBNF-like notation with the following operators: alternative |, zero or one repetitions ?, zero or more repetitions *, one or more repetitions +, and unordered list @. Parentheses ( ) group grammar symbols. Tokens are enclosed in single quotes. By convention, lexer rule names are ALL_CAPS, and parser rule names are in CamelCase, beginning with a capital letter.

Overall structure

The file extension for CoreDSL 2 files is .core_desc.

The following grammar defines the structure of a CoreDSL 2 file.

The top-level entity is the core description.

CoreDescription ::= Import* (InstructionSet | CoreDefinition)*

A CoreDSL 2 file may import core descriptions from other files.

Import          ::= 'import' STRING ';'

A core description contains an arbitrary number of instruction sets and/or core definitions. Both entities are structured into sections, describing architectural state, internal and external functions, and instructions. Additional sections may be added in future versions of the language.

InstructionSet  ::= 'InstructionSet' ID ('extends' ID)? '{' Sections '}'
CoreDefinition  ::= 'CoreDef' ID ('provides' ID (',' ID)*)? '{' Sections '}'
Sections        ::= @(ArchState? Functions? Instructions?)
ArchState       ::= 'architectural_state' '{' ArchStateItem* '}'
Functions       ::= 'functions' '{' Function* '}'
Instructions    ::= 'instructions' Attribute* '{' Instruction* '}'

CoreDef and InstructionSet

The CoreDef construct models a processor core conforming to the given architectural description. The InstructionSet construct allows the separation of instruction set definition and core definition. Hence, by referencing it in the optional provides clause, the same instruction set definition can be reused in multiple core definitions. Instruction sets may be organised hierarchically: An instruction set may extend a previously defined super instruction set, inheriting its architectural state, functions and instructions.
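As a rough sketch of how these constructs fit together, the following file defines a base instruction set, an extension of it, and two cores that reuse them. The names BaseISA, MulExt, SmallCore and BigCore are hypothetical, and the sections are left empty for brevity (the grammar above allows both empty and omitted sections).

```
// Base instruction set with all three sections present (in any order).
InstructionSet BaseISA {
    architectural_state { }
    functions { }
    instructions { }
}

// MulExt inherits the architectural state, functions and instructions of BaseISA.
InstructionSet MulExt extends BaseISA {
    instructions { }
}

// Two core definitions reuse the instruction sets via the 'provides' clause.
CoreDef SmallCore provides BaseISA {
}

CoreDef BigCore provides MulExt {
}
```

How the effective ISA of each core is composed from the referenced instruction sets is determined by the elaboration rules, which this sketch does not try to illustrate.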

Note: The remainder of this page is a syntax-focussed overview of the available language constructs. In addition, a CoreDSL 2 specification is subject to the scoping rules, which determine the visibility of identifiers, as well as the elaboration rules, which introduce a recipe to compose the effective ISA of a core definition and statically evaluate its parameters.

Architectural state

The architectural_state section may contain a subset of declarations and assignments that carry a special meaning for the modelled ISA. In all forms, optional attributes can impose further constraints.

Implementation parameters

Simple variable declarations yield implementation parameters. They may be initialised at the declaration site, or assigned in the same or another architectural_state section.

ArchStateItem ::= TypeSpecifier? ID ('=' ConstantExpression)? Attribute* ';'

Examples

int XLEN;
int REG_LEN = 32;
XLEN = 32;

Registers

A variable declaration with the register keyword defines a single architectural register. If an array-like dimension specification is present, a register file is declared. The dimension must be a constant expression composed exclusively of literals and implementation parameters.

ArchStateItem ::= 'register' TypeSpecifier ID ('[' ConstantExpression ']')? Attribute* ';'

Examples

register unsigned int PC [[is_pc]]; // program counter
register unsigned int X[REG_LEN];   // general-purpose register file

Address spaces

An extern array declaration represents an address space of the given type and size.

ArchStateItem ::= 'extern' TypeSpecifier ID ('[' ConstantExpression ']')? Attribute* ';'

Examples

extern char         MEM[1 << XLEN];
extern unsigned int CSR[4096];

Aliases

A declaration with an ampersand token between the type specifier and the identifier introduces an alias. The initialisation is mandatory, and the initial value must reference another register or a single element of a register file or address space.

ArchStateItem ::= TypeSpecifier '&' ID '=' ConstantExpression Attribute* ';'

Examples

unsigned int &ZERO = X[0];
unsigned int &mvendorid = CSR[0xF11];
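To show the four kinds of architectural state items in context, here is a minimal sketch that places the declarations from the examples above inside architectural_state sections. The enclosing names BaseISA and MyCore are hypothetical. The parameter XLEN is declared in the instruction set and assigned in the core definition, as one possible instance of assigning a parameter in another architectural_state section.

```
InstructionSet BaseISA {
    architectural_state {
        int XLEN;                             // implementation parameter, assigned elsewhere
        int REG_LEN = 32;                     // implementation parameter, initialised at the declaration site

        register unsigned int PC [[is_pc]];   // single register
        register unsigned int X[REG_LEN];     // register file, sized by a parameter

        extern char MEM[1 << XLEN];           // address space

        unsigned int &ZERO = X[0];            // alias for one element of the register file
    }
}

CoreDef MyCore provides BaseISA {
    architectural_state {
        XLEN = 32;                            // assignment in another architectural_state section
    }
}
```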

Functions

The functions section contains function declarations and definitions following the usual C syntax.

Function ::= 'extern'? TypeSpecifier ID '(' ParameterList ')' ';'
          |  TypeSpecifier ID '(' ParameterList ')' Attribute* CompoundStatement
 

The extern keyword marks a declaration as a black box. Its invocation and behaviour are implementation-specific.
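A minimal sketch of a functions section with one declaration and one definition might look as follows. The function names raise_trap and is_zero are hypothetical, and the use of void, bool and a C-style return statement assumes that the type and statement syntax defined elsewhere in the specification follows C here.

```
functions {
    // Black-box declaration: only the signature is part of the description.
    extern void raise_trap(unsigned int cause);

    // Definition with a body, written like a C function.
    bool is_zero(unsigned int value) {
        return value == 0;
    }
}
```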

Instructions

The instructions section contains an arbitrary number of instruction definitions in the following format.

Instruction   ::= ID Attribute* '{'
                    'encoding' ':' EncodingSpec ';'
                    ('args_disass' ':' STRING ';')?
                    'behavior' ':' Statement
                  '}'

After the instruction name, optional attributes may be present. The instruction body is organised into tagged components.

Encoding

The encoding component specifies the layout of the instruction word as a concatenation of fields.

EncodingSpec  ::= EncodingField ('::' EncodingField)*
EncodingField ::= (ID '[' IntegerConstant ':' IntegerConstant ']') | IntegerConstant

We distinguish named fields and patterns. Named fields comprise an identifier and a bit range, and can be thought of as parameters to the instruction. Patterns are integer constants intended to be matched in an instruction decode stage (or similar). Named fields can occur multiple times in the encoding (with different bit ranges), to denote that the value is encoded using non-consecutive bits in the instruction word.

Example

SW { encoding: offset[11:5] :: src[4:0] :: base[4:0] :: 3'b010 :: offset[4:0] :: 7'b0100011; ...

We recommend using Verilog-style integer literals in the encoding, as C-style bit literals do not capture leading zeros (e.g., 0b010 == 2'b10, not 3'b010 as one might expect).

Disassembler format

TODO: Eyck

Behaviour

The statement following the behavior tag expresses the instruction's (arbitrarily complex) semantics, written in the C-inspired language defined in the remainder of this specification document.
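Putting the components together, a complete instruction definition could be sketched as below. It is modelled on the RISC-V AND instruction and assumes the register file X from the architectural state examples. The optional args_disass component is omitted, and the if statement assumes C-like statement syntax.

```
instructions {
    AND {
        // Patterns and named fields add up to a 32-bit instruction word: 7 + 5 + 5 + 3 + 5 + 7.
        encoding: 7'b0000000 :: rs2[4:0] :: rs1[4:0] :: 3'b111 :: rd[4:0] :: 7'b0110011;
        // The named fields rs1, rs2 and rd act as parameters of the behavior.
        behavior: if (rd != 0) X[rd] = X[rs1] & X[rs2];
    }
}
```

A bitwise operation is used here on purpose: both operands and the destination have the same width, so the sketch does not depend on the language's rules for arithmetic result types.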

TODO: Should probably switch all text to American English.
