- 
                Notifications
    You must be signed in to change notification settings 
- Fork 18
Three address instruction list IR
JcRawInstList represents a raw list representation of three-address Java bytecode instructions. "Raw" means that
this representation does not depict information about
types and control flow of the program. This representation is one-to-one
matching of JVM bytecode instructions to the list of three-address instructions.
The base class of this representation is JcRawInstList. It produces a list-like
collection of instructions, so you can modify this list, iterate over instructions or access them by
index.
JcRawInst is the base interface for the raw instructions. All instructions are identified
by the object, they are not comparable using equals.
Here is the list of JcRawInst implementations:
- 
JcAssignInst— Assignment instruction. Left-hand side of the instruction can only beJcRawValue, right-hand side can be any expression (JcRawExpr).
- 
JcRawEnterMonitorInst,JcRawExitMonitorInst— Monitor instructions. Correspond directly to their existing analogs. Themonitorproperty can only beJcRawSimpleValue.
- 
JcRawCallInst— Call instruction. Represents a method that does not save its return variable to any local variable. Method calls that return a value are represented viaJcRawAssignInst.
- 
JcRawLabelInst— Label instruction. Used to mark some program points in the code. Mainly required to be used in the branching instructions. A label is identified by name, all the references to a label are represented viaJcRawLabelRefclass.
- 
JcRawReturnInst— Return instruction. ThereturnValueproperty isnullwhen a method does not return anything.
- 
JcRawThrowInst— Throw instruction.
- 
JcRawCatchInst— Catch instruction. Represents an entry for atry...catchblock in the code. Does not map directly to a bytecode instruction but represents aTryCatchBlockof a method. Stores a value that corresponds to a caughtthrowableexception and a range of the instructions that it catches fromstartInclusive until endExclusive.
- 
JcRawGotoInst— Jump instruction.
- 
JcRawIfInst— Conditional jump instruction. The condition of the instruction should necessarily beJcRawConditionExprbecause not all the conditional expressions that we use in higher-level programming languages can be easily expressed in JVM bytecode.
- 
JcRawSwitchInst— Switch instruction. A combined representation ofLookupSwitchandTableSwitchbytecode instructions.
JcRawExpr is a base interface for all the expression types and value types that can be
expressed in JVM bytecode. JcRawExpr stores its type as a TypeName object which is just a Java type name
represented as a string (that's why it is "raw").
Here is the list of JcRawExpr implementations:
- 
JcRawBinaryExpr— Binary expression that allows to implement all the arithmetic expressions (e.g.JcRawAdd,JcRawMul, etc.), conditional expressions (JcRawEq,JcRawGt, etc.), logical expressions (JcRawAnd,JcRawOr,JcRawXor).- 
JcRawConditionExpr— Conditional expressions. Can be used as a condition inJcRawIfInst.
 
- 
- 
JcRawLengthExpr— Array length expression.
- 
JcRawNegExpr— Negation expression.
- 
JcRawCastExpr— Cast expression. Can be used to cast both reference types and primitive types.
- 
JcRawNewExpr— New expression. Creates a single object.
- 
JcRawNewArrayExpr— New array expression. Creates a (multi)array of a given type.
- 
JcRawInstanceOfExpr—Instanceofcheck.
- 
JcRawCallExpr— Method call expression.- 
JcRawDynamicCallExpr— Aninvokedynamicinstruction representation. Preserves all the info.
- JcRawVirtualCallExpr
- JcRawInterfaceCallExpr
- JcRawStaticCallExpr
- JcRawSpecialCallExpr
 
- 
- 
JcRawValue— Representation of a single value.- 
JcRawSimpleValue— Representation of a simple value that does not have any sub-values.- JcRawThis
- JcRawArgument
- JcRawLocal
- JcRawConstant
 
- 
JcRawComplexValue— Complex value that has subvalues.- 
JcRawFieldRef— Field reference. Can be used both as a field read access (e.g.a = x.y) and field store access (e.g.x.y = a).
- 
JcRawArrayAccess— Array element reference. Can be used both as an array read access (e.g.a = x[y]) and array store access (e.g.x[y] = a)
 
- 
 
- 
To get a three-address instruction list representation of a method you need to call JcMethod::instructionList.
Instruction list building requires JcClasspath because some stages use subtyping information.
RawInstListBuilder is used to build a JcRawInstList from bytecode representation (i.e. from MethodNode).
To build an instruction list representation, MethodNode should contain frame information (i.e. the FrameNode
instances) and should not contain JSR (jump subroutine) instructions. Thus, each time a ClassNode is
created, we invoke ClassNode.computeFrames() extension function that computes frame information
for each method. The computeFrames function uses ASM functionality for computing frames: it converts
the ClassNode back to bytecode using ClassWriter (which actually performs frame computing during
the conversion) and reads that bytecode again using ClassReader. This is not the most effective way of
doing things, but the easiest one: manually computing frames is quite hard.
Another thing is inlining JSR instructions. We do this by calling MethodNode.jsrInlined extension that
returns a new MethodNode instance. It uses ASM JSRInlinerAdapter utility to create a new method node with
inlined JRS instructions.
RawInstListBuilder converts JVM bytecode instruction list into a three-address instruction list. Most of
the conversion process is simple: bytecode instructions map to three-address expressions and instructions.
The most complex part is frame merging. JVM frame describes the state of the virtual machine at each instruction, i.e.
the declared local variables and stack state. When an instruction has multiple predecessors, we need to merge several
incoming frames into one. Sometimes JVM adds a special frame node before an instruction to describe how the
frame should look like after merging.
Four situations are possible during the frame merging process.
(1) There is only one incoming frame, and it is fully defined (i.e. we have already collected all the frame information) when converting the previous stages. In that case, everything is quite simple, and we can just copy the frame info. However, we can also refine type information for local variables. Consider following bytecode:
NEW java/lang/ArrayList
ASTORE 0
...
FRAME FULL [ java/lang/List ]
...
NEW java/lang/LinkedList
ASTORE 0
When converting the first two instructions, we created a local variable %0 for cell 0 with java.lang.ArrayList type.
However, when converting frame information, we see that JVM considers the type of cell 0 to be java.lang.List
and then uses cell 0 to store other implementations of Java List. We can refine the type of %0 to be
java.lang.List and then replace all the occurrences of %0 with the new version. That is performed by
using localTypeRefinement property of RawInstListBuilder and ExprMapper utility class.
(2) There is only one incoming frame, and it has not been defined yet. It is a rare case, but it can happen in a situation like this:
GOTO L2
L1
FRAME FULL [ java/lang/List ]
...
GOTO L3
L2
...
GOTO L1
L3
RETURN
In this situation, we use frame info to create a new local variable with the defined type and remember to
add an assignment that will initialize a new variable with a correct value in the predecessor. It is performed in the
end by using laterAssignments and laterStackAssignments maps as well as the buildRequiredAssignments function.
The process is similar for both stack variables and local variables.
(3) There are several predecessor frames, and all of them are defined (e.g. when merging after if...else block). In
that case, we create a new local variable with the defined type in the current frame and add necessary assignments
into the predecessor blocks.
(4) There are several predecessor frames, and not all of them are defined (e.g. when merging in the loop header).In that case, we create a new local variable with the defined type in the current frame, add necessary assignments into the predecessor blocks, and remember to add the required assignments to undefined predecessor blocks in the end.
RawInstListBuilder also simplifies the resulting instruction list. It is necessary because the list constructing
process naturally introduces a lot of redundancy.
The main simplification stages are:
- Deleting the repeated assignments inside a basic block.
- Deleting declarations of unused variables.
- Deleting declarations of mutually dependent unused variables (e.g. a = bandb = a).
- Simple unit propagation.
- Type normalization using JcClasspath.
There is a visitor API for traversing and modifying JcRawInstList. Visitors have a standard
interface — they can be invoked using an accept method on instructions and expressions:
val a = jcRawInst.accept(MyInstVisitor())
val b = jcRawExpr.accept(MyExprVisitor())
We also provide "functional"-like extensions for applying visitors to JcRawInstList:
- filter(visitor: JcRawInstVisitor<Boolean>): JcRawInstList
- filterNot(visitor: JcRawInstVisitor<Boolean>): JcRawInstList
- map(visitor: JcRawInstVisitor<JcRawInst>): JcRawInstList
- mapNotNull(visitor: JcRawInstVisitor<JcRawInst?>): JcRawInstList
- flatMap(visitor: JcRawInstVisitor<Collection<JcRawInst>>): JcRawInstList
- apply(visitor: JcRawInstVisitor<Unit>): JcRawInstList
- applyAndGet(visitor: T, getter: (T) -> R): R
- collect(visitor: JcRawInstVisitor<T>): Collection<T>
jcdb-core has a number of utility visitors for working with the instruction list:
- 
ExprMapper(val mapping: Map<JcRawExpr, JcRawExpr>)— Traverses an instruction list and replaces all expression occurrences frommappingto the corresponding property.
- 
FullExprSetCollector()— Collects all the expressions that occur in a given object (instruction list, single instruction or an expression).
- 
InstructionFilter(val predicate: (JcRawInst) -> Boolean)— Filters the instructions by a given predicate.
JcRawInstList can be converted back to the ASM MethodNode using MethodNodeBuilder.build()
method. The conversion process is pretty straightforward and does not require any additional
comments.