This project implements a compiler front-end and interpreter for the RPAL (Right-Program, Abstract Language) language, developed in Java. It processes RPAL source code through several stages: lexical analysis, parsing into an Abstract Syntax Tree (AST), AST standardization into a Standardized Tree (ST), and finally, execution using a Control Stack Execution (CSE) machine.
The compiler is built manually without the use of parser/lexer generator tools like YACC or Lex.
- Lexical Analysis: Converts RPAL source code into a stream of tokens.
- Recursive-Descent Parsing: Builds an Abstract Syntax Tree (AST) from the token stream based on the RPAL grammar.
- AST Printing: Option to print the generated AST for debugging and verification (
-ast
flag). - AST Standardization: Transforms the AST into a more canonical Standardized Tree (ST) by applying specific rewrite rules to simplify constructs like
let
,where
,fcn_form
, etc., into more primitive forms usinglambda
,gamma
, andtau
nodes. - Control Stack Execution (CSE) Machine: Evaluates the Standardized Tree to produce the program's output. It supports lexical scoping, closures, recursion, and various built-in functions and operators.
The RPAL compiler processes input in the following sequence:
- Input: An RPAL program provided as a text file (e.g.,
example.txt
). myrpal.java
(Main Program):- Reads the input file.
- Orchestrates the flow between different compiler components.
- Lexical Analysis (
Lexer
package):- The source code string is passed to the
Lexer
. Lexer.tokenize()
converts the string into aList<Token>
.
- The source code string is passed to the
- Parsing (
Parser
package):- The
List<Token>
is passed to theParser
. Parser.parse()
constructs anASTNode
representing the root of the Abstract Syntax Tree.
- The
- AST Output (Optional):
- If the
-ast
command-line flag is provided,ASTNode.print()
is called to display the tree structure, and the program exits.
- If the
- Standardization (
Standardizer
package):- The root
ASTNode
of the AST is passed to theStandardizer
. Standardizer.standardize()
modifies the AST in-place, transforming it into a Standardized Tree (ST).
- The root
- CSE Machine Setup (
CSE_Machine
package):- The root
ASTNode
of the ST is passed to theCSEMachineFactory
. CSEMachineFactory.getMachine()
traverses the ST and generates the initial control structures (a list ofSymbol
objects), stack, and environment for the CSE machine. It returns an initializedCSEMachine
instance.
- The root
- Execution (
CSE_Machine
package):CSEMachine.execute()
is called on the initialized machine.- The CSE machine processes its control structures and stack according to a defined set of rules, evaluating the program.
- Output:
- The
CSEMachine
returns the final result of the program as aString
. myrpal.java
prints this output to the console.
- The
This project is divided into several key modules, each handling a specific phase of the compilation and execution process.
- File Path:
src/myrpal.java
- Responsibility:
- Serves as the entry point for the RPAL compiler and interpreter.
- Handles command-line argument parsing (input file path and
-ast
flag). - Manages file input/output.
- Coordinates the overall workflow by instantiating and invoking methods from the Lexer, Parser, Standardizer, and CSE Machine components in sequence.
- Key Method Structure:
public class myrpal { public static void main(String[] args) { // 1. Parse CLI arguments (filename, -ast flag) // 2. Read program string from file // 3. Lexer lexer = new Lexer(); List<Token> tokens = lexer.tokenize(program); // 4. Parser parser = new Parser(tokens); ASTNode ast = parser.parse(); // 5. If -ast: ast.print(0); return; // 6. Standardizer standardizer = new Standardizer(); standardizer.standardize(ast); // 7. CSEMachineFactory factory = new CSEMachineFactory(); CSEMachine machine = factory.getMachine(ast); // 8. String output = machine.execute(false); // 9. Print output } }
-
Directory: [
src/Lexer/
](file:///RPAL-Project\src\Lexer) -
Responsibility: Performs lexical analysis, converting the raw RPAL source code string into a sequence of tokens.
-
Lexer.java
- File Path:
src/Lexer/Lexer.java
- Core Task: Implements the tokenization logic using regular expressions.
- Key Method Structure:
public class Lexer { // private static final Map<String, String> regexMap; // Defines token patterns public List<Token> tokenize(String input); // Main tokenization method }
- It iterates through predefined regex patterns (for keywords, identifiers, integers, strings, operators, punctuation, comments, spaces) to identify and extract tokens. Comments and spaces are typically skipped. An
END_OF_TOKENS
token is added to mark the end of the input.
- File Path:
-
Token.java
- File Path:
src/Lexer/Token.java
- Core Task: A simple data class representing a single lexical token.
- Structure:
public class Token { private TokenType type; private String value; public Token(TokenType type, String value); public TokenType getType(); public String getValue(); public String toString(); // For debugging }
- File Path:
-
TokenType.java
- File Path:
src/Lexer/TokenType.java
- Core Task: An enumeration defining all possible types of tokens that the lexer can produce (e.g.,
KEYWORD
,IDENTIFIER
,INTEGER
,STRING
,OPERATOR
,PUNCTUATION
,END_OF_TOKENS
).
- File Path:
-
-
Directory: [
src/Parser/
](file:///RPAL-Project\src\Parser) -
Responsibility: Syntactic analysis, transforming the list of tokens from the Lexer into an Abstract Syntax Tree (AST).
-
Parser.java
- File Path:
src/Parser/Parser.java
- Core Task: Implements a recursive-descent parser based on the RPAL grammar specified in RPAL_Grammar.md.
- Key Method Structure:
public class Parser { // private final List<Token> tokens; // private int currentTokenIndex; public Parser(List<Token> tokens); public ASTNode parse(); // Entry point for parsing // Helper methods for token stream management private Token peekToken(); private Token consumeToken(); private Token match(TokenType expectedType); private Token match(TokenType expectedType, String expectedValue); // Recursive methods for each non-terminal in the grammar private ASTNode parseE(); // Parses Expressions private ASTNode parseEw(); // Parses Expressions with 'where' private ASTNode parseD(); // Parses Definitions // ... and so on for parseDa, parseDr, parseDb, parseVl, parseVb, // parseT, parseTa, parseTc, parseB, parseBt, parseBs, parseBp, // parseA, parseAt, parseAf, parseAp, parseR, parseRn }
- It handles operator precedence and associativity by structuring the grammar rules and parsing methods appropriately (e.g., by eliminating left-recursion and using iterative parsing for left-associative operators).
- File Path:
-
ASTNode.java
- File Path:
src/Parser/ASTNode.java
- Core Task: Represents a node in the Abstract Syntax Tree. A single class is used for all node types, distinguished by a
NodeType
enum. - Structure:
public class ASTNode { private NodeType type; private String value; // Stores identifier name, literal value, operator symbol, etc. private List<ASTNode> children; public ASTNode(NodeType type, String value); public ASTNode(NodeType type, String value, List<ASTNode> children); public void addChild(ASTNode child); public NodeType getType(); public String getValue(); public List<ASTNode> getChildren(); public void setType(NodeType type); public void setValue(String value); public void print(int indent); // For pretty-printing the AST }
- File Path:
-
NodeType.java
- File Path:
src/Parser/NodeType.java
- Core Task: An enumeration defining all types of nodes that can appear in the AST (e.g.,
LET
,LAMBDA
,IDENTIFIER
,INTEGER
,OP_PLUS
,GAMMA
,TAU
). These correspond to the=> 'node_type'
annotations in the grammar.
- File Path:
-
ParseException.java
(Defined as an inner class inParser.java
)- Core Task: A custom runtime exception used to signal errors encountered during parsing (e.g., mismatched tokens, unexpected end of input).
-
-
Directory: [
src/Standardizer/
](file:///RPAL-Project\src\Standardizer) -
Responsibility: Transforms the AST into a Standardized Tree (ST). This process simplifies complex RPAL constructs into more fundamental ones, making it easier for the CSE machine to evaluate. The standardization is done in-place by modifying the original AST nodes.
Standardizer.java
- File Path:
src/Standardizer/Standardizer.java
- Core Task: Applies a set of predefined rewrite rules to the AST nodes.
- Key Method Structure:
public class Standardizer { public void standardize(ASTNode node); // Main method, traverses AST (post-order) // Private helper methods for specific transformation rules: private void handleLet(ASTNode node); // let X = E in P => gamma(lambda X.P, E) private void handleWhere(ASTNode node); // P where X = E => gamma(lambda X.P, E) private void handleFunctionForm(ASTNode node); // fcn_form P V1..Vn E => P = lambda V1 . ... . lambda Vn . E private void handleLambda(ASTNode node); // lambda V1..Vn . E => lambda V1 . ... . lambda Vn . E private void handleWithin(ASTNode node); // D1 within D2 (where D1 is X1=E1, D2 is X2=E2) => X2 = gamma(lambda X1.E2, E1) private void handleAt(ASTNode node); // @ E1 N E2 => gamma(gamma(N,E1), E2) private void handleAnd(ASTNode node); // D1 and D2 ... (where Di is Xi=Ei) => (X1,X2,...) = tau(E1,E2,...) private void handleRec(ASTNode node); // rec X = E => X = Y*(lambda X.E) }
- File Path:
-
Directory: [
src/CSE_Machine/
](file:///RPAL-Project\src\CSE_Machine) -
Responsibility: Implements the Control Stack Execution (CSE) machine, which evaluates the Standardized Tree.
-
CSEMachine.java
- File Path:
src/CSE_Machine/CSEMachine.java
- Core Task: Executes the program based on the CSE model using a control list, a stack, and an environment list.
- Key Method Structure:
public class CSEMachine { // private List<Symbol> control; // private List<Symbol> stack; // private List<E> environment; // List of environment markers public CSEMachine(List<Symbol> control, List<Symbol> stack, List<E> environment); public String execute(boolean test); // Main execution loop implementing CSE rules // Private helpers for operations and rule applications private Symbol applyUnaryOperation(Uop rator, Symbol rand); private Symbol applyBinaryOperation(Bop rator, Symbol a, Symbol b); private E getEnv(int index); // Retrieves an environment by its index // private void printControlAndStack(); // For debugging }
- The
execute
method iteratively processesSymbol
objects from thecontrol
list, applying rules based on the symbol type (e.g.,Id
,Lambda
,Gamma
,Bop
,Beta
). It manages variable lookups and bindings throughE
(environment) objects.
- File Path:
-
CSEMachineFactory.java
- File Path:
src/CSE_Machine/CSEMachineFactory.java
- Core Task: Converts the Standardized Tree (ST, an
ASTNode
) into the initial data structures (control, stack, environment) required by theCSEMachine
. - Key Method Structure:
public class CSEMachineFactory { // private int lambdaIndex; // For unique lambda identifiers // private int deltaIndex; // For unique delta (control structure) identifiers // private final E e0; // Initial base environment public CSEMachine getMachine(ASTNode root); // Main factory method private Delta buildDelta(ASTNode node); // Builds a Delta (list of symbols) from an AST subtree private List<Symbol> preOrder(ASTNode node); // Traverses AST to create symbols for a Delta private Lambda createLambda(ASTNode node); // Creates Lambda symbols private Symbol createSymbol(ASTNode node); // Maps ASTNode to specific Symbol subclass }
- It performs a pre-order traversal of the ST. Special nodes like
lambda
and->
(conditional) lead to the creation of nestedDelta
control structures.
- File Path:
-
Symbol.java
- File Path:
src/CSE_Machine/Symbol.java
- Core Task: Defines an abstract base class
Symbol
and a hierarchy of concrete subclasses. These objects represent the items that can appear on the CSE machine's control list and stack. - Key Subclasses (Illustrative):
abstract class Symbol { protected String data; /* ... */ } // Operands (data values) class Rand extends Symbol { /* ... */ } class Int extends Rand { /* ... */ } class Str extends Rand { /* ... */ } class Bool extends Rand { /* ... */ } class Tup extends Rand { private List<Symbol> symbols; /* ... */ } // Represents tuples, including 'nil' (empty tuple) class Dummy extends Rand { /* ... */ } class Func extends Rand { /* For built-in functions like Print, Order */ } // Operators class Rator extends Symbol { /* ... */ } class Bop extends Rator { /* For binary operators like +, -, eq */ } class Uop extends Rator { /* For unary operators like neg, not */ } // Control Structures & Special Symbols class Gamma extends Symbol { /* Represents function application */ } class Lambda extends Symbol { /* int index, int envIndex, List<Id> identifiers, Delta delta; ... */ } class Delta extends Symbol { /* int index, List<Symbol> symbols; // A sequence of control symbols */ } class Beta extends Symbol { /* For conditional branching */ } class Tau extends Symbol { /* int n; // For tuple creation */ } class Ystar extends Symbol { /* For recursion (Y-combinator) */ } class Eta extends Symbol { /* Helper for Ystar application */ } // Identifiers & Environment class Id extends Symbol { /* ... */ } class E extends Symbol { /* int index, E parent, boolean isRemoved, Map<Id, Symbol> values; ... */ } class Err extends Symbol { /* Represents an error state */ }
- Each symbol class encapsulates data and behavior relevant to its role in the CSE machine execution. For example,
Lambda
stores its parameters and body (as aDelta
), andE
(environment marker) stores variable bindings.
- File Path:
-
RPAL-Project
├─ .idea/
├─ bin/ # Compiled .class files (output of Makefile)
│ ├─ CSE_Machine/
│ ├─ Lexer/
│ ├─ Parser/
│ └─ Standardizer/
├─ example/
│ └─ example.txt # Sample RPAL input file
├─ Makefile # Makefile for common build/run tasks
├─ out/ # Output directory for IDE (e.g., IntelliJ)
├─ README.md # This file
├─ RPAL Project.iml # IntelliJ module file
├─ RPAL-Project.iml # IntelliJ project file
├─ RPAL_Grammar.md # RPAL Phrase Structure Grammar
├─ RPAL_Lexicon.md # RPAL Lexical Rules
├─ RPAL_Project.md # Original Project Description
└─ src/ # Source code
├─ CSE_Machine/
│ ├─ CSEMachine.java
│ ├─ CSEMachineFactory.java
│ └─ Symbol.java
├─ input.txt # Another input file, possibly for testing
├─ Lexer/
│ ├─ Lexer.java
│ ├─ Token.java
│ └─ TokenType.java
├─ myrpal.java # Main class
├─ Parser/
│ ├─ ASTNode.java
│ ├─ NodeType.java
│ └─ Parser.java # Includes inner ParseException class
├─ Standardizer/
│ └─ Standardizer.java
└─ tests/ # Test files
├─ ASTandStandardizerTester.java
├─ CSEMachineTest.java
├─ ParserTester.java
└─ StandardizerTests.java
The following make
commands are available for working with the project (assuming you are in the RPAL-Project
root directory):
make
ormake compile
- Compiles all Java files from the
src
directory (except test files) into thebin
directory.
- Compiles all Java files from the
make run
- Compiles to the
bin
directory and runs the program with the example input located atexample/example.txt
.
- Compiles to the
make run-ast
- Compiles to the
bin
directory, runs the program, and prints the AST of the example input inexample/example.txt
.
- Compiles to the
make run-file FILE=path/to/your/file.txt
- Compiles to the
bin
directory and runs an RPAL program from a custom input file.
- Compiles to the
make run-ast-file FILE=path/to/your/file.txt
- Compiles to the
bin
directory and prints the AST of the RPAL program from a custom input file.
- Compiles to the
make clean
- Removes all compiled class files from the
bin
directory.
- Removes all compiled class files from the
To compile and run manually, navigate to the src
directory (or ensure your classpath is set up correctly).
-
Compile:
- Compile the main program and its dependencies. A robust way to compile all necessary files from the
RPAL-Project/src
directory into a../bin
directory (relative tosrc
) would be:# From RPAL-Project/src directory javac -d ../bin */*.java *.java # or to compile all .java files found recursively into ../bin # (adjust find command for your OS if not bash-like) # find . -name "*.java" -print0 | xargs -0 javac -d ../bin
- For Windows (from
RPAL-Project\src
directory):if not exist ..\bin mkdir ..\bin javac -d ..\bin CSE_Machine\*.java Lexer\*.java Parser\*.java Standardizer\*.java myrpal.java
- Compile the main program and its dependencies. A robust way to compile all necessary files from the
-
Run:
-
After compiling, from the
RPAL-Project/bin
directory (orRPAL-Project
ifbin
is in classpath):# Assuming you are in RPAL-Project/bin java myrpal ../example/example.txt java myrpal ../example/example.txt -ast # Or, from RPAL-Project directory, if bin is in classpath or classes are in src: # (Adjust classpath as needed if using the bin directory) # java -cp bin myrpal example/example.txt # java -cp bin myrpal example/example.txt -ast
-
The commands mentioned in the original
README.md
(run from the same location asmyrpal.java
afterjavac myrpal.java
) imply a simpler setup, perhaps where all.class
files are in the same directory asmyrpal.java
. The Makefile handles a more structuredbin
directory. The provided source showsmyrpal.java
atsrc/myrpal.java
, so commands would typically be run relative to thesrc
or abin
output directory. -
General Java Commands (as per project spec, run from
src
or withbin
in classpath):java myrpal <filepath>
- Example:
java myrpal ../example/example.txt
(if running fromsrc
andexample.txt
is inRPAL-Project/example
)
- Example:
java myrpal <filepath> -ast
- Example:
java myrpal ../example/example.txt -ast
- Example:
-
(As specified in RPAL_Project.md
)
File: example.rpal
(or any .txt
file)
let Sum(A) = Psum (A,Order A )
where rec Psum (T,N) = N eq 0 -> 0
| Psum(T,N-1)+T N
in Print ( Sum (1,2,3,4,5) )
Prints the Abstract Syntax Tree. The exact formatting involves indentation with dots (.
) for tree levels.
Example structure:
let
.function_form
..P
..V1
..Vn
..E
.gamma
..lambda
...X
...P (original 'in' part)
..E (original 'let' assignment part)
(The example from RPAL_Project.md
shows a deeply nested structure based on the example input program.)
Prints the result of the program's execution. For the example input program above, the output is:
Output of the above program is:
15
- Grammar: RPAL_Grammar.md
- Lexicon: RPAL_Lexicon.md