Skip to content

ShakW101/RPAL-Project-JAVA

Repository files navigation

RPAL Compiler in Java

This project implements a compiler front-end and interpreter for the RPAL (Right-Program, Abstract Language) language, developed in Java. It processes RPAL source code through several stages: lexical analysis, parsing into an Abstract Syntax Tree (AST), AST standardization into a Standardized Tree (ST), and finally, execution using a Control Stack Execution (CSE) machine.

The compiler is built manually without the use of parser/lexer generator tools like YACC or Lex.

Features

  • Lexical Analysis: Converts RPAL source code into a stream of tokens.
  • Recursive-Descent Parsing: Builds an Abstract Syntax Tree (AST) from the token stream based on the RPAL grammar.
  • AST Printing: Option to print the generated AST for debugging and verification (-ast flag).
  • AST Standardization: Transforms the AST into a more canonical Standardized Tree (ST) by applying specific rewrite rules to simplify constructs like let, where, fcn_form, etc., into more primitive forms using lambda, gamma, and tau nodes.
  • Control Stack Execution (CSE) Machine: Evaluates the Standardized Tree to produce the program's output. It supports lexical scoping, closures, recursion, and various built-in functions and operators.

Project Workflow

The RPAL compiler processes input in the following sequence:

  1. Input: An RPAL program provided as a text file (e.g., example.txt).
  2. myrpal.java (Main Program):
    • Reads the input file.
    • Orchestrates the flow between different compiler components.
  3. Lexical Analysis (Lexer package):
    • The source code string is passed to the Lexer.
    • Lexer.tokenize() converts the string into a List<Token>.
  4. Parsing (Parser package):
    • The List<Token> is passed to the Parser.
    • Parser.parse() constructs an ASTNode representing the root of the Abstract Syntax Tree.
  5. AST Output (Optional):
    • If the -ast command-line flag is provided, ASTNode.print() is called to display the tree structure, and the program exits.
  6. Standardization (Standardizer package):
    • The root ASTNode of the AST is passed to the Standardizer.
    • Standardizer.standardize() modifies the AST in-place, transforming it into a Standardized Tree (ST).
  7. CSE Machine Setup (CSE_Machine package):
    • The root ASTNode of the ST is passed to the CSEMachineFactory.
    • CSEMachineFactory.getMachine() traverses the ST and generates the initial control structures (a list of Symbol objects), stack, and environment for the CSE machine. It returns an initialized CSEMachine instance.
  8. Execution (CSE_Machine package):
    • CSEMachine.execute() is called on the initialized machine.
    • The CSE machine processes its control structures and stack according to a defined set of rules, evaluating the program.
  9. Output:
    • The CSEMachine returns the final result of the program as a String.
    • myrpal.java prints this output to the console.

Module Analysis / Component Breakdown

This project is divided into several key modules, each handling a specific phase of the compilation and execution process.

1. Main Program (myrpal.java)

  • File Path: src/myrpal.java
  • Responsibility:
    • Serves as the entry point for the RPAL compiler and interpreter.
    • Handles command-line argument parsing (input file path and -ast flag).
    • Manages file input/output.
    • Coordinates the overall workflow by instantiating and invoking methods from the Lexer, Parser, Standardizer, and CSE Machine components in sequence.
  • Key Method Structure:
    public class myrpal {
        public static void main(String[] args) {
            // 1. Parse CLI arguments (filename, -ast flag)
            // 2. Read program string from file
            // 3. Lexer lexer = new Lexer(); List<Token> tokens = lexer.tokenize(program);
            // 4. Parser parser = new Parser(tokens); ASTNode ast = parser.parse();
            // 5. If -ast: ast.print(0); return;
            // 6. Standardizer standardizer = new Standardizer(); standardizer.standardize(ast);
            // 7. CSEMachineFactory factory = new CSEMachineFactory(); CSEMachine machine = factory.getMachine(ast);
            // 8. String output = machine.execute(false);
            // 9. Print output
        }
    }

2. Lexer Package

  • Directory: [src/Lexer/](file:///RPAL-Project\src\Lexer)

  • Responsibility: Performs lexical analysis, converting the raw RPAL source code string into a sequence of tokens.

    • Lexer.java

      • File Path: src/Lexer/Lexer.java
      • Core Task: Implements the tokenization logic using regular expressions.
      • Key Method Structure:
        public class Lexer {
            // private static final Map<String, String> regexMap; // Defines token patterns
            public List<Token> tokenize(String input); // Main tokenization method
        }
      • It iterates through predefined regex patterns (for keywords, identifiers, integers, strings, operators, punctuation, comments, spaces) to identify and extract tokens. Comments and spaces are typically skipped. An END_OF_TOKENS token is added to mark the end of the input.
    • Token.java

      • File Path: src/Lexer/Token.java
      • Core Task: A simple data class representing a single lexical token.
      • Structure:
        public class Token {
            private TokenType type;
            private String value;
        
            public Token(TokenType type, String value);
            public TokenType getType();
            public String getValue();
            public String toString(); // For debugging
        }
    • TokenType.java

      • File Path: src/Lexer/TokenType.java
      • Core Task: An enumeration defining all possible types of tokens that the lexer can produce (e.g., KEYWORD, IDENTIFIER, INTEGER, STRING, OPERATOR, PUNCTUATION, END_OF_TOKENS).

3. Parser Package

  • Directory: [src/Parser/](file:///RPAL-Project\src\Parser)

  • Responsibility: Syntactic analysis, transforming the list of tokens from the Lexer into an Abstract Syntax Tree (AST).

    • Parser.java

      • File Path: src/Parser/Parser.java
      • Core Task: Implements a recursive-descent parser based on the RPAL grammar specified in RPAL_Grammar.md.
      • Key Method Structure:
        public class Parser {
            // private final List<Token> tokens;
            // private int currentTokenIndex;
        
            public Parser(List<Token> tokens);
            public ASTNode parse(); // Entry point for parsing
        
            // Helper methods for token stream management
            private Token peekToken();
            private Token consumeToken();
            private Token match(TokenType expectedType);
            private Token match(TokenType expectedType, String expectedValue);
        
            // Recursive methods for each non-terminal in the grammar
            private ASTNode parseE();      // Parses Expressions
            private ASTNode parseEw();     // Parses Expressions with 'where'
            private ASTNode parseD();      // Parses Definitions
            // ... and so on for parseDa, parseDr, parseDb, parseVl, parseVb,
            // parseT, parseTa, parseTc, parseB, parseBt, parseBs, parseBp,
            // parseA, parseAt, parseAf, parseAp, parseR, parseRn
        }
      • It handles operator precedence and associativity by structuring the grammar rules and parsing methods appropriately (e.g., by eliminating left-recursion and using iterative parsing for left-associative operators).
    • ASTNode.java

      • File Path: src/Parser/ASTNode.java
      • Core Task: Represents a node in the Abstract Syntax Tree. A single class is used for all node types, distinguished by a NodeType enum.
      • Structure:
        public class ASTNode {
            private NodeType type;
            private String value; // Stores identifier name, literal value, operator symbol, etc.
            private List<ASTNode> children;
        
            public ASTNode(NodeType type, String value);
            public ASTNode(NodeType type, String value, List<ASTNode> children);
            public void addChild(ASTNode child);
            public NodeType getType();
            public String getValue();
            public List<ASTNode> getChildren();
            public void setType(NodeType type);
            public void setValue(String value);
            public void print(int indent); // For pretty-printing the AST
        }
    • NodeType.java

      • File Path: src/Parser/NodeType.java
      • Core Task: An enumeration defining all types of nodes that can appear in the AST (e.g., LET, LAMBDA, IDENTIFIER, INTEGER, OP_PLUS, GAMMA, TAU). These correspond to the => 'node_type' annotations in the grammar.
    • ParseException.java (Defined as an inner class in Parser.java)

      • Core Task: A custom runtime exception used to signal errors encountered during parsing (e.g., mismatched tokens, unexpected end of input).

4. Standardizer Package

  • Directory: [src/Standardizer/](file:///RPAL-Project\src\Standardizer)

  • Responsibility: Transforms the AST into a Standardized Tree (ST). This process simplifies complex RPAL constructs into more fundamental ones, making it easier for the CSE machine to evaluate. The standardization is done in-place by modifying the original AST nodes.

    • Standardizer.java
      • File Path: src/Standardizer/Standardizer.java
      • Core Task: Applies a set of predefined rewrite rules to the AST nodes.
      • Key Method Structure:
        public class Standardizer {
            public void standardize(ASTNode node); // Main method, traverses AST (post-order)
        
            // Private helper methods for specific transformation rules:
            private void handleLet(ASTNode node);       // let X = E in P  =>  gamma(lambda X.P, E)
            private void handleWhere(ASTNode node);     // P where X = E   =>  gamma(lambda X.P, E)
            private void handleFunctionForm(ASTNode node); // fcn_form P V1..Vn E => P = lambda V1 . ... . lambda Vn . E
            private void handleLambda(ASTNode node);    // lambda V1..Vn . E => lambda V1 . ... . lambda Vn . E
            private void handleWithin(ASTNode node);    // D1 within D2 (where D1 is X1=E1, D2 is X2=E2) => X2 = gamma(lambda X1.E2, E1)
            private void handleAt(ASTNode node);        // @ E1 N E2 => gamma(gamma(N,E1), E2)
            private void handleAnd(ASTNode node);       // D1 and D2 ... (where Di is Xi=Ei) => (X1,X2,...) = tau(E1,E2,...)
            private void handleRec(ASTNode node);       // rec X = E => X = Y*(lambda X.E)
        }

5. CSE Machine Package

  • Directory: [src/CSE_Machine/](file:///RPAL-Project\src\CSE_Machine)

  • Responsibility: Implements the Control Stack Execution (CSE) machine, which evaluates the Standardized Tree.

    • CSEMachine.java

      • File Path: src/CSE_Machine/CSEMachine.java
      • Core Task: Executes the program based on the CSE model using a control list, a stack, and an environment list.
      • Key Method Structure:
        public class CSEMachine {
            // private List<Symbol> control;
            // private List<Symbol> stack;
            // private List<E> environment; // List of environment markers
        
            public CSEMachine(List<Symbol> control, List<Symbol> stack, List<E> environment);
            public String execute(boolean test); // Main execution loop implementing CSE rules
        
            // Private helpers for operations and rule applications
            private Symbol applyUnaryOperation(Uop rator, Symbol rand);
            private Symbol applyBinaryOperation(Bop rator, Symbol a, Symbol b);
            private E getEnv(int index); // Retrieves an environment by its index
            // private void printControlAndStack(); // For debugging
        }
      • The execute method iteratively processes Symbol objects from the control list, applying rules based on the symbol type (e.g., Id, Lambda, Gamma, Bop, Beta). It manages variable lookups and bindings through E (environment) objects.
    • CSEMachineFactory.java

      • File Path: src/CSE_Machine/CSEMachineFactory.java
      • Core Task: Converts the Standardized Tree (ST, an ASTNode) into the initial data structures (control, stack, environment) required by the CSEMachine.
      • Key Method Structure:
        public class CSEMachineFactory {
            // private int lambdaIndex; // For unique lambda identifiers
            // private int deltaIndex;  // For unique delta (control structure) identifiers
            // private final E e0;      // Initial base environment
        
            public CSEMachine getMachine(ASTNode root); // Main factory method
        
            private Delta buildDelta(ASTNode node); // Builds a Delta (list of symbols) from an AST subtree
            private List<Symbol> preOrder(ASTNode node); // Traverses AST to create symbols for a Delta
            private Lambda createLambda(ASTNode node); // Creates Lambda symbols
            private Symbol createSymbol(ASTNode node); // Maps ASTNode to specific Symbol subclass
        }
      • It performs a pre-order traversal of the ST. Special nodes like lambda and -> (conditional) lead to the creation of nested Delta control structures.
    • Symbol.java

      • File Path: src/CSE_Machine/Symbol.java
      • Core Task: Defines an abstract base class Symbol and a hierarchy of concrete subclasses. These objects represent the items that can appear on the CSE machine's control list and stack.
      • Key Subclasses (Illustrative):
        abstract class Symbol { protected String data; /* ... */ }
        
        // Operands (data values)
        class Rand extends Symbol { /* ... */ }
        class Int extends Rand { /* ... */ }
        class Str extends Rand { /* ... */ }
        class Bool extends Rand { /* ... */ }
        class Tup extends Rand { private List<Symbol> symbols; /* ... */ } // Represents tuples, including 'nil' (empty tuple)
        class Dummy extends Rand { /* ... */ }
        class Func extends Rand { /* For built-in functions like Print, Order */ }
        
        // Operators
        class Rator extends Symbol { /* ... */ }
        class Bop extends Rator { /* For binary operators like +, -, eq */ }
        class Uop extends Rator { /* For unary operators like neg, not */ }
        
        // Control Structures & Special Symbols
        class Gamma extends Symbol { /* Represents function application */ }
        class Lambda extends Symbol { /* int index, int envIndex, List<Id> identifiers, Delta delta; ... */ }
        class Delta extends Symbol { /* int index, List<Symbol> symbols; // A sequence of control symbols */ }
        class Beta extends Symbol { /* For conditional branching */ }
        class Tau extends Symbol { /* int n; // For tuple creation */ }
        class Ystar extends Symbol { /* For recursion (Y-combinator) */ }
        class Eta extends Symbol { /* Helper for Ystar application */ }
        
        // Identifiers & Environment
        class Id extends Symbol { /* ... */ }
        class E extends Symbol { /* int index, E parent, boolean isRemoved, Map<Id, Symbol> values; ... */ }
        
        class Err extends Symbol { /* Represents an error state */ }
      • Each symbol class encapsulates data and behavior relevant to its role in the CSE machine execution. For example, Lambda stores its parameters and body (as a Delta), and E (environment marker) stores variable bindings.

Project Directory Structure

RPAL-Project
├─ .idea/
├─ bin/                     # Compiled .class files (output of Makefile)
│  ├─ CSE_Machine/
│  ├─ Lexer/
│  ├─ Parser/
│  └─ Standardizer/
├─ example/
│  └─ example.txt           # Sample RPAL input file
├─ Makefile                 # Makefile for common build/run tasks
├─ out/                     # Output directory for IDE (e.g., IntelliJ)
├─ README.md                # This file
├─ RPAL Project.iml         # IntelliJ module file
├─ RPAL-Project.iml         # IntelliJ project file
├─ RPAL_Grammar.md          # RPAL Phrase Structure Grammar
├─ RPAL_Lexicon.md          # RPAL Lexical Rules
├─ RPAL_Project.md          # Original Project Description
└─ src/                     # Source code
   ├─ CSE_Machine/
   │  ├─ CSEMachine.java
   │  ├─ CSEMachineFactory.java
   │  └─ Symbol.java
   ├─ input.txt              # Another input file, possibly for testing
   ├─ Lexer/
   │  ├─ Lexer.java
   │  ├─ Token.java
   │  └─ TokenType.java
   ├─ myrpal.java            # Main class
   ├─ Parser/
   │  ├─ ASTNode.java
   │  ├─ NodeType.java
   │  └─ Parser.java         # Includes inner ParseException class
   ├─ Standardizer/
   │  └─ Standardizer.java
   └─ tests/                 # Test files
      ├─ ASTandStandardizerTester.java
      ├─ CSEMachineTest.java
      ├─ ParserTester.java
      └─ StandardizerTests.java

Building and Running

Makefile Usage

The following make commands are available for working with the project (assuming you are in the RPAL-Project root directory):

  • make or make compile
    • Compiles all Java files from the src directory (except test files) into the bin directory.
  • make run
    • Compiles to the bin directory and runs the program with the example input located at example/example.txt.
  • make run-ast
    • Compiles to the bin directory, runs the program, and prints the AST of the example input in example/example.txt.
  • make run-file FILE=path/to/your/file.txt
    • Compiles to the bin directory and runs an RPAL program from a custom input file.
  • make run-ast-file FILE=path/to/your/file.txt
    • Compiles to the bin directory and prints the AST of the RPAL program from a custom input file.
  • make clean
    • Removes all compiled class files from the bin directory.

Java Command Usage

To compile and run manually, navigate to the src directory (or ensure your classpath is set up correctly).

  1. Compile:

    • Compile the main program and its dependencies. A robust way to compile all necessary files from the RPAL-Project/src directory into a ../bin directory (relative to src) would be:
      # From RPAL-Project/src directory
      javac -d ../bin */*.java *.java 
      # or to compile all .java files found recursively into ../bin
      # (adjust find command for your OS if not bash-like)
      # find . -name "*.java" -print0 | xargs -0 javac -d ../bin
    • For Windows (from RPAL-Project\src directory):
      if not exist ..\bin mkdir ..\bin
      javac -d ..\bin CSE_Machine\*.java Lexer\*.java Parser\*.java Standardizer\*.java myrpal.java
  2. Run:

    • After compiling, from the RPAL-Project/bin directory (or RPAL-Project if bin is in classpath):

      # Assuming you are in RPAL-Project/bin
      java myrpal ../example/example.txt
      java myrpal ../example/example.txt -ast
      
      # Or, from RPAL-Project directory, if bin is in classpath or classes are in src:
      # (Adjust classpath as needed if using the bin directory)
      # java -cp bin myrpal example/example.txt
      # java -cp bin myrpal example/example.txt -ast
    • The commands mentioned in the original README.md (run from the same location as myrpal.java after javac myrpal.java) imply a simpler setup, perhaps where all .class files are in the same directory as myrpal.java. The Makefile handles a more structured bin directory. The provided source shows myrpal.java at src/myrpal.java, so commands would typically be run relative to the src or a bin output directory.

    • General Java Commands (as per project spec, run from src or with bin in classpath):

      • java myrpal <filepath>
        • Example: java myrpal ../example/example.txt (if running from src and example.txt is in RPAL-Project/example)
      • java myrpal <filepath> -ast
        • Example: java myrpal ../example/example.txt -ast

Input and Output Format

(As specified in RPAL_Project.md)

Example Input Program

File: example.rpal (or any .txt file)

let Sum(A) = Psum (A,Order A )
where rec Psum (T,N) = N eq 0 -> 0
| Psum(T,N-1)+T N
in Print ( Sum (1,2,3,4,5) )

Output Format with -ast switch

Prints the Abstract Syntax Tree. The exact formatting involves indentation with dots (.) for tree levels. Example structure:

let
.function_form
..P
..V1
..Vn
..E
.gamma
..lambda
...X
...P (original 'in' part)
..E (original 'let' assignment part)

(The example from RPAL_Project.md shows a deeply nested structure based on the example input program.)

Output Format without -ast switch

Prints the result of the program's execution. For the example input program above, the output is:

Output of the above program is:
15

RPAL Language References

  • Grammar: RPAL_Grammar.md
  • Lexicon: RPAL_Lexicon.md

About

RPAL project of Programming Languages module in the University of Moratuwa.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •