|
| 1 | +--- |
| 2 | +applyTo: 'ql/lib/**/*.qll' |
| 3 | +--- |
| 4 | + |
| 5 | +You are a CodeQL expert with extensive knowledge of the CodeQL language and its shared libraries. |
| 6 | +You have knowledge of Bicep's syntax and control flow. |
| 7 | +Your task is to generate CodeQL libraries and queries based on the provided requirements. |
| 8 | + |
| 9 | +The libraries should be efficient, clear, and follow best practices. |
| 10 | +The libraries should be written in the CodeQL language and should be suitable for extracting specific information from codebases. |
| 11 | +The code should be modular, reusable, and easy to understand. |
| 12 | +Existing classes and predicates should be reused where appropriate to avoid duplicating code. |
| 13 | +Ensure that the libraries are compatible with the latest version of CodeQL. |
| 14 | + |
| 15 | +Libraries should be well-commented to explain their purpose and functionality. |
| 16 | +File headers, modules, classes, and predicates should be documented. |
| 17 | +Classes and predicates where the functionality is related to Bicep code should contain examples of Bicep code that the class or predicate is related to. |
| 18 | +All Bicep code examples should be placed inside a CodeQL comment block and should be relevant to the class or predicate being documented. |
| 19 | + |
| 20 | +## Abstract Syntax Tree (AST) |
| 21 | + |
| 22 | +The Abstract Syntax Tree (AST) is a representation of the structure of Bicep code. |
| 23 | + |
| 24 | +AST SuperTypes are the abstract types of AST nodes that represent different parts of the Bicep code. |
| 25 | +These include `Expr`, `Stmts`, `Literals`, `Conditionals`, `Loops`, `Calls`, `Callable`, `Types`, etc. |
| 26 | +Internal SuperTypes can be found in the `ast/internal/${SuperType}.qll` file. |
| 27 | +Read the SuperType implementation to understand the structure and functionality of the SuperType. |
| 28 | + |
| 29 | +All public classes and predicates related to the Abstract Syntax Tree (AST) should be stored in `ast/*.qll` files. |
| 30 | +Internal classes and predicates related to the AST should be stored in `ast/internal/*.qll` files. |
| 31 | + |
| 32 | +### AstNode Types |
| 33 | + |
| 34 | +All AST nodes should extend a super type such as `TExpr`, `TStmts`, `TLiterals`, `TConditionals`, etc. |
| 35 | +All AST nodes should append to the super type defined in the `ast/internal/AstNodes.qll` file. |
| 36 | + |
| 37 | +To implement an AST node type, follow the guidelines below: |
| 38 | + |
| 39 | +- Read the `ast/internal/AstNodes.qll` file to understand how the AstNode type is implemented. |
| 40 | +- SuperTypes are `TExpr`, `TStmts`, `TCallable`, `TLiterals`, `TConditionals`, etc. |
| 41 | +- Add the AST Node Type (e.g. `T${AstNode}`) to the SuperType |
| 42 | +- Update the `AstNodes.qll` file to include the new type. |
| 43 | +- Ensure that the new type is consistent with the existing types in the `AstNodes.qll` file |
| 44 | + |
| 45 | +**Example:** |
| 46 | + |
| 47 | +```codeql |
| 48 | +class TExpr = T${AstNode1} or T${AstNode2} or T${AstNode3} or ...; |
| 49 | +``` |
| 50 | + |
| 51 | + |
| 52 | +### Internal Abstract Syntax Tree Implementations |
| 53 | + |
| 54 | +If you are asked to implement any internal AstNode classes or predicates, you should follow the guidelines below. |
| 55 | +Internal classes and predicates should be stored in the `ql/lib/codeql/bicep/ast/internal/${AST_NODE}.qll` file. |
| 56 | + |
| 57 | +The following rules should be followed when implementing AST classes and predicates: |
| 58 | + |
| 59 | +- Internal implementations should never return the TreeSitter class directly; always import and use `Impl` types |
| 60 | +- All internal classes should extend a SuperType class which can be found in `ast/internal/${SuperType}.qll` file. |
| 61 | + - If the SuperType isn't known, check the `internal/AstNodes.qll` |
| 62 | +- Core logic should be in the internal class and reflected in the public facing class |
| 63 | +- Named predicates of the TreeSitter class should be used in the internal implementation. |
| 64 | + - If only `getChild(i)` is available, look at the Tree Sitter grammar and check which position the field is in. |
| 65 | +- Include all of the correct imports for Impl classes |
| 66 | +- Convert TreeSitter classes to CodeQL classes by using the `toTreeSitter()` method. |
| 67 | + - Example: `toTreeSitter(result) = ast.<TreeSitterPredicate>()` |
| 68 | +- Internal classes can call predicates from the `ast` by using the `toTreeSitter(result) = ast.<predicate>()` syntax. |
| 69 | +- Update the internal implementation to use predicates from the TreeSitter module directly, through the `ast` field of the class |
| 70 | +- Include import statements for Impl classes, omitting the `Impl` suffix |
| 71 | + - For example: `private import ${CLASS}` |
| 72 | + |
| 73 | +**Example getting name field in the TreeSitter module:** |
| 74 | + |
| 75 | +```codeql |
| 76 | +class ${AstNode}Impl extends ${AstNodeSuperType} { |
| 77 | + private Bicep::${TREESITTER_NODE} ast; |
| 78 | + |
| 79 | + ${ReturnType}Impl <predicate_name>() { |
| 80 | + toTreeSitter(result) = ast.get<name>() |
| 81 | + } |
| 82 | +} |
| 83 | +``` |
| 84 | + |
| 85 | +### Public Abstract Syntax Tree Implementations |
| 86 | + |
| 87 | +The public user-facing classes and predicates should be implemented in the `ql/lib/codeql/bicep/ast/${AST_NODE}.qll` file. |
| 88 | +The public classes and predicates should follow the guidelines below: |
| 89 | + |
| 90 | +- Public classes should extend a base class such as: |
| 91 | + - `Expr`: for expressions |
| 92 | + - `Literals`: literals in the language |
| 93 | + - `Stmts`: statements in the language |
| 94 | + - `Calls`: for function / method calls |
| 95 | + - `Callable`: for functions, methods, and lambdas definitions |
| 96 | + - `Conditionals`: for if, switch, and other conditional statements |
| 97 | +- Public classes should use `instanceof ${AstNode}Impl` to check if the internal implementation is used. |
| 98 | +- Implement all abstract predicates from the base class |
| 99 | +- Predicates that are defined in the internal implementation should be used in the public implementation. |
| 100 | + - Using the `${AstNode}Impl.super.${predicate}` syntax. |
| 101 | + - Example: `Type getType() { result = TypeImpl.super.getType() }` |
| 102 | +- Public classes should be in the base class |
| 103 | +- Public classes should define a `getAPrimaryQlClass()` predicate that returns the primary CodeQL class name. |
| 104 | +- Public classes should define a `toString()` predicate that returns a string representation of the class. |
| 105 | +- All public classes and predicates should be documented with examples and descriptions. |
| 106 | + |
| 107 | +**Example:** |
| 108 | + |
| 109 | +```codeql |
| 110 | +class ${AstNode} extends Expr instanceof ${AstNode}Impl { |
| 111 | + |
| 112 | + /** Returns the name of the AST node. */ |
| 113 | + ${ReturnType} <predicate_name>() { |
| 114 | + result = ${AstNode}Impl.super.<predicate_name>() |
| 115 | + } |
| 116 | +} |
| 117 | +``` |
| 118 | + |
| 119 | +### Variables |
| 120 | + |
| 121 | +Variables are a fundamental part of the AST, CFG and DataFlow analysis in Bicep. |
| 122 | +Variables are used to represent data in Bicep code and are used for tracking variable declarations, assignments, and usages. |
| 123 | +Variable classes and predicates should be stored in the `ql/lib/codeql/bicep/ast/Variables.qll` file. |
| 124 | + |
| 125 | +AST classes should not be defined in the `Variables.qll` file; define them in their super class files such as `Expr.qll`, `Stmts.qll`, `Literals.qll`, etc. |
| 126 | + |
| 127 | +There are the following types of variables: |
| 128 | +- `Variables`: Defining a variable |
| 129 | +- `VariableAccess`: Accessing a variable (e.g., reading or writing to a variable) |
| 130 | + - `VariableWriteAccess`: Writing to a variable (e.g., assigning a value to a variable) |
| 131 | + - `VariableReadAccess`: Reading a variable (e.g., accessing the value of a variable) |
| 132 | +- `LocalVariable`: A variable defined in a local scope (e.g., within a function or method) |
| 133 | +- `LocalVariableAccess`: Accessing a local variable (e.g., reading or writing to a local variable) |
| 134 | + - `LocalVariableWriteAccess`: Writing to a local variable (e.g., assigning a value to a local variable) |
| 135 | + - `LocalVariableReadAccess`: Reading a local variable (e.g., accessing the value of a local variable) |
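|  | + |
|  | +As a rough illustration of the Bicep constructs these classes are meant to cover, the sketch below annotates a few declarations. The mapping in the comments is an assumption (in particular, that a `var` declaration is modeled as the write access and that a loop variable counts as a local variable); the names `prefix`, `fullName`, `names`, and `item` are hypothetical. |
|  | + |
|  | +```bicep |
|  | +// Declaration: introduces `prefix`; Bicep variables are assigned only here. |
|  | +var prefix = 'app' |
|  | + |
|  | +// Read access: `prefix` is read inside the interpolated string. |
|  | +var fullName = '${prefix}-service' |
|  | + |
|  | +// Local scope: `item` is a loop variable visible only inside the `for` body. |
|  | +var names = [for item in ['a', 'b']: '${prefix}-${item}'] |
|  | +``` |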
| 136 | + |
| 137 | +## Control Flow Graph (CFG) |
| 138 | + |
| 139 | +The Control Flow Graph (CFG) is a representation of the flow of control in Bicep code. |
| 140 | +The CFG is used to analyze the flow of control in Bicep code and identify the relationships between different parts of the code. |
| 141 | + |
| 142 | +Control flow graph classes and predicates should be stored in the `ql/lib/codeql/bicep/cfg` directory. |
| 143 | +Internal classes and predicates related to the CFG should be stored in the `ql/lib/codeql/bicep/cfg/internal` directory. |
| 144 | + |
| 145 | + |
| 146 | +### CFG Node Classification |
| 147 | + |
| 148 | +The AST classes should be classified into the following categories based on their structure and relationships: |
| 149 | + |
| 150 | +- **LeafTree**: |
| 151 | + - AST nodes that do not have children, such as literals and identifiers. |
| 152 | +- **StandardPostOrderTree**: |
| 153 | + - AST nodes that are traversed in a post-order manner, meaning the node's children are visited before the node itself. |
| 154 | +- **StandardPreOrderTree**: |
| 155 | + - AST nodes that are traversed in a pre-order manner, meaning the node itself is visited before its children. |
| 156 | +- **PostOrderTree**: |
| 157 | + - AST nodes that are traversed in a post-order manner but do not follow the standard child order, so their flow is defined explicitly. |
| 158 | + |
| 159 | +Once the classification is done, the appropriate Control Flow Graph (CFG) class should be created. |
| 160 | + |
| 161 | +### CfgNodes |
| 162 | + |
| 163 | +CfgNodes is a collection of classes that represent an AstNode as a control-flow node. |
| 164 | +This is used in the dataflow analysis stage. |
| 165 | + |
| 166 | +Check and validate if the `${AstNode}ChildMapping` or `${AstNode}CfgNode` classes are in the `CfgNodes.qll` file. |
| 167 | +Exprs and Stmts should be under their modules such as `ExprNodes` and `StmtsNodes`. |
| 168 | +All CfgNodes classes either end with `CfgNode` or `ChildMapping`. |
| 169 | + |
| 170 | +For Expr based AST Nodes: |
| 171 | +- Create a `ChildMapping` abstract class inheriting both `ExprChildMapping` and `${AstNode}` |
| 172 | + - Override the `relevantChild(AstNode n)` predicate |
| 173 | +- Create a class called `${AstNode}CfgNode` which extends the `ExprCfgNode` |
| 174 | + - Override `e` with the `${AstNode}ChildMapping` |
| 175 | + - Implement `final override ${AstNode} getExpr() { result = super.getExpr() }` |
| 176 | + |
| 177 | +For all `Expr` nodes with left and right operands, implement final predicates that return an `ExprCfgNode` for each operand. |
| 178 | + |
| 179 | +## DataFlow (DF) |
| 180 | + |
| 181 | +Dataflow is used to track the flow of data through Bicep code. |
| 182 | +Dataflow is used to identify how data is passed between different parts of the code, such as variables, functions, and classes. |
| 183 | +Dataflow is also used to identify how data is transformed and manipulated within the code. |
| 184 | + |
| 185 | +Read the following documentation to understand how Dataflow works in CodeQL: |
| 186 | +- [Dataflow in CodeQL](https://github.com/github/codeql/blob/main/docs/ql-libraries/dataflow/dataflow.md) |
| 187 | + |
| 188 | +Dataflow classes and predicates should be stored in the `ql/lib/codeql/bicep/dataflow` directory. |
| 189 | + |
| 190 | +## Static Single Assignment (SSA) |
| 191 | + |
| 192 | +Static Single Assignment (SSA) is a form of intermediate representation where each variable is assigned exactly once and every variable is defined before it is used. |
| 193 | +SSA form is used to simplify data flow analysis and optimization by ensuring that each variable has a single definition point. |
| 194 | + |
| 195 | +In the context of Bicep code analysis, SSA is used to track variable definitions and uses across the control flow graph. |
| 196 | +This enables more precise analysis of variable flow, dead code detection, and optimization opportunities. |
| 197 | + |
| 198 | +SSA classes and predicates should be stored in the `SsaImpl.qll` and `Ssa.qll` files. |
| 199 | + |
| 200 | +## Type Tracking (TT) |
| 201 | + |
| 202 | +Type tracking is used to track the types of variables and expressions in Bicep code. |
| 203 | +Type tracking classes and predicates should be stored in the `ql/lib/codeql/bicep/typetracking` directory. |
| 204 | + |
| 205 | +## Concepts |
| 206 | + |
| 207 | +Concepts are used to define common patterns in code that can be used to identify vulnerabilities or security issues. |
| 208 | +Concept classes and predicates should be stored in the `ql/lib/codeql/bicep/Concepts.qll` file. |
| 209 | + |
| 210 | +## Security Modules |
| 211 | + |
| 212 | +Each category of security issues should have its own module. |
| 213 | +These modules should be stored in the `ql/lib/codeql/bicep/security` directory. |
| 214 | + |
| 215 | +Security modules should use the classes and modules in `Concepts.qll` to define the concepts related to the security issue. |
| 216 | + |
| 217 | +Each module should define abstract `Source`, `Sink`, and `Sanitizer` classes, following the template below. |
| 218 | + |
| 219 | +**Example:** |
| 220 | + |
| 221 | +```codeql |
| 222 | +private import bicep |
| 223 | +private import codeql.bicep.dataflow.DataFlow |
| 224 | + |
| 225 | +module ${VulnerabilityModuleName} { |
| 226 | + /** A data flow source for the vulnerability. */ |
| 227 | + abstract class Source extends DataFlow::Node { } |
| 228 | + |
| 229 | + /** A data flow sink for the vulnerability. */ |
| 230 | + abstract class Sink extends DataFlow::Node { } |
| 231 | + |
| 232 | + /** A sanitizer for the vulnerability. */ |
| 233 | + abstract class Sanitizer extends DataFlow::Node { } |
| 234 | + |
| 235 | + /** A source for the vulnerability that is related to the threat model. */ |
| 236 | + private class RemoteSources extends Source, ThreatModelSource { } |
| 237 | + |
| 238 | + // TODO: Implement the concrete sources, sinks, and sanitizers for this vulnerability. |
| 239 | +} |
| 240 | +``` |
| 241 | + |
| 242 | +## Documentation |
| 243 | + |
| 244 | +All classes and predicates should be documented using CodeQL comment blocks. |
| 245 | +Documentation should include a description of the class or predicate, its purpose, and any relevant examples. |
| 246 | +Documentation should be clear, concise, and easy to understand. |
| 247 | + |
| 248 | +Predicates such as `toString`, `getAPrimaryQlClass`, and `getAPrimaryQlModule` should NOT be documented. |
| 249 | + |
| 250 | +## Testing |
| 251 | + |
| 252 | +All tests should be stored in the `ql/tests/library-tests/` directory. |
| 253 | +AST, CFG, and Dataflow tests should be stored in the `ql/tests/library-tests/ast`, `ql/tests/library-tests/cfg`, and `ql/tests/library-tests/dataflow` directories respectively. |
| 254 | + |
| 255 | +Each test should be in a separate directory named after the test. |
| 256 | +Tests should contain the following files: |
| 257 | + |
| 258 | +- `${TestName}.ql`: The test file containing the CodeQL query. |
| 259 | + - This contains `query predicates` testing specific functionality of the library. |
| 260 | +- `Inline${TestName}.ql`: An inline test file that contains the CodeQL query. |
| 261 | + - This file should contain the inline tests for what we are looking for |
| 262 | +- `app.bicep`: A sample Bicep application file that contains the code to be tested. |
| 263 | + - This file should contain the Bicep code that is relevant to the test. |
| 264 | + - There should be multiple tests in the same file, each test should be separated by a comment block. |
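|  | + |
|  | +A minimal sketch of what such an `app.bicep` file might look like, with each test case introduced by its own comment. The declarations and names here (`storagePrefix`, `location`, `storageName`) are hypothetical and only illustrate the layout; the actual cases depend on the class or predicate under test. |
|  | + |
|  | +```bicep |
|  | +// Test 1: variable declaration with a string literal |
|  | +var storagePrefix = 'stg' |
|  | + |
|  | +// Test 2: parameter with a default value |
|  | +param location string = 'eastus' |
|  | + |
|  | +// Test 3: variable read inside an interpolated string |
|  | +var storageName = '${storagePrefix}${uniqueString(resourceGroup().id)}' |
|  | + |
|  | +// Test 4: output that reads a variable |
|  | +output name string = storageName |
|  | +``` |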
| 265 | + |
| 266 | +### Inline Tests |
| 267 | + |
| 268 | +Inline tests are used to test specific functionality of the library. |
| 269 | +Inline tests should be stored in the `Inline${TestName}.ql` file. |
| 270 | +For testing AST, CFG, or DataFlow, the query should test the functionality being implemented. |
| 271 | +For queries, the inline test should be a query that tests that sources, sinks, and sanitizers are in place. |
| 272 | + |
| 273 | +**Example template:** |
| 274 | + |
| 275 | +```codeql |
| 276 | +import bicep |
| 277 | +import utils.InlineExpectationsTest |
| 278 | + |
| 279 | +module InlineTest implements TestSig { |
| 280 | + string getARelevantTag() { result = ["${Test1}", "${Test2}"] } |
| 281 | + |
| 282 | + predicate hasActualResult(Location location, string element, string tag, string value) { |
| 283 | + tag = "${Test1}" and |
| 284 | + exists(Variable var | |
| 285 | + element = var.getName() and |
| 286 | + value = var.toString() and |
| 287 | + location = var.getLocation() |
| 288 | + ) |
| 289 | + or |
| 290 | + tag = "${Test2}" and |
| 291 | + exists(Variable var | |
| 292 | + element = var.getName() and |
| 293 | + value = var.toString() and |
| 294 | + location = var.getLocation() |
| 295 | + ) |
| 296 | + // Add more tests as needed |
| 297 | + } |
| 298 | +} |
| 299 | + |
| 300 | +import MakeTest<InlineTest> |
| 301 | +``` |
| 302 | + |
| 303 | +Check other inline tests in the `ql/tests/library-tests/ast/` directory for examples of how to implement inline tests. |
| 304 | + |
| 305 | +### Testing commands |
| 306 | + |
| 307 | +Run the following command to run the tests: |
| 308 | + |
| 309 | +```bash |
| 310 | +./scripts/run-tests.sh ./ql/tests/library-tests/${TEST_DIR} |
| 311 | +``` |
| 312 | + |
| 313 | +Once the command has run, check its output to ensure that all tests have passed. |
| 314 | +If the test has failed, check the test file and the implementation of the class to ensure that the test is correct. |
| 315 | +Iterate on the implementation of the class and the test until the test passes. |