Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

options skip_function_bodies can not work with ReadBinaryIr #2534

Open
primstar-cool opened this issue Jan 25, 2025 · 18 comments
Open

options skip_function_bodies can not work with ReadBinaryIr #2534

primstar-cool opened this issue Jan 25, 2025 · 18 comments

Comments

@primstar-cool
Copy link

primstar-cool commented Jan 25, 2025

wabt::ReadBinaryOptions options;
    options.skip_function_bodies = true;

wabt::Result errorCode = wabt::ReadBinaryIr("", wasmData, fileSize, options, &errors,
&module);

will got an assert
at function [Result BinaryReaderIR::EndFunctionBody(Index index)]
of file
[binary-reader-ir.cc]

the reason is: PushLabel at BinaryReaderIR::BeginFunctionBody, but not consume it when 'skip_function_bodies' turns on.

I think it should add a Step names SkipExpr or trans options on BeginFunctionBody.

BTW, i got another problem, when parse a big wasm in Android, it will crash randomly when parse function code, but same code work well in MacOS. when i skip_function_bodies, it works well.
i will provide detail later.

@primstar-cool
Copy link
Author

the bug of run in Android, I'v got the error "Can't populate more pages for size class 224",
seems allocate mem error for too many string

@primstar-cool
Copy link
Author

primstar-cool commented Jan 25, 2025

i think the reason is allocating too mush times. a wasm of a unity demo game, will new 14,767,201 times.

here is my solution:

extends BinaryReaderIR, replace [std::make_unique] refer [AppendExpr], to [spec_make_unique]
then override new operate in [spec_make_unique]
alloc all unique_ptr in a big buffer;

before ~Module
clear expr in module.funcs.exprs
call expr.clear will call delete, so we should use extract_back() and release, until exprs empty

then it will run well in Android.

i hopes there is an official version with override memery alloc

@primstar-cool
Copy link
Author

with using mem pool(release big buffer at end, and each opCode save some bytes and time for avoiding manage free memory),
but it cost more than 1.6g mem, it still runs poorly on low-performance phones.

I will make a branch for saving memory.

@primstar-cool
Copy link
Author

i'v make a pr, solve the error of ReadBinaryIr when options.skip_function_bodies=true
and make a virtual op code, can output original byte code when skip_function_bodies=true.

#2535

@SoniEx2
Copy link
Collaborator

SoniEx2 commented Jan 26, 2025

do you need the IR? why not use binary reader directly?

@primstar-cool
Copy link
Author

primstar-cool commented Jan 27, 2025

A typical wasm size of an Unity game, is about 50-70M
with 15,000,000+ Func exprs.
in my case, i will load wasm and modify it in phone(android / ios).
parse an 57M wasm will cost more than 1.5G mem. App will crash in Android on mem alloc ("Can't populate more pages for size class 224").

first step, i override operate new, for func expr will not delete at decode wasm, so they can be allocate solid in a big buffer.
at the end, i will call destructor of each exprs and free the big buffer manually before Module destructor.

but, memory is still insufficient on low-end iOS devices.
so i impl the options skip_function_bodies. it can hold raw bytecode.
and then, we can extract it one by one.

@SoniEx2
Copy link
Collaborator

SoniEx2 commented Jan 27, 2025

yeah, that happens when using the IR. why not use ReadBinary instead of ReadBinaryIr? that one gives you a stream of instructions you can modify on the fly.

@sbc100
Copy link
Member

sbc100 commented Jan 27, 2025

A typical wasm size of an Unity game, is about 50-70M
with 15,000,000+ Func exprs.
in my case, i will load wasm and modify it in phone(android / ios).

Can you explain what you are trying to do here? For example, it sounds like you are trying to run a unity game on an phone using the wabt interpreter? Is that right?

@primstar-cool
Copy link
Author

load local wasm , and mod or inject sth in byte code.
when we inject a new import function, each func-call index will be changed, so make a diff-file is a little big. so decode wasm, then merge patch in phone is effective.

libbinaryen can process the task in low mem, but libbinaryen is a bit slowly. libwabt decode 250% faster than libbinaryen. (libbinaryen 5 second vs libwabt 2s)

what i did:
use a struct to save raw byte code when skip_function_bodies turns on.
if we need patch functions, extract the struct, modify, and pack the struct one by one;
whatever extracted or not, the new struct can output byte code correctly.

@SoniEx2
Copy link
Collaborator

SoniEx2 commented Jan 27, 2025

that's what we figured you were doing, and we would still recommend using ReadBinary instead. it's a streaming parser so you only need enough RAM for roughly two copies of the wasm (the original and the modded), plus you still get decoded bytecode instead of raw bytecode.

@primstar-cool
Copy link
Author

primstar-cool commented Jan 27, 2025

does it means that, i impl a new BinaryReaderDelegate?
but if i save the decoded Expr, it will cost same mem as BinaryReaderIr (57M wasm file cost 1.6G mem).
the code in BinaryReaderIr just save it into a vector, I can't do it better. if i defined a small struct to save decoded data, it seems a too complex proj.
i need modify the wasm after whole wasm decoded, then find dist function, decoded function body, modify , and re-pack it to binary code to save memory.

so the point is, i can't decode function bodies at first step, but needs byte code of funcs for second-pass decode.
skip_function_bodies is an read options, when turn it on, BinaryReaderIr will assert on reading, and module will not output again.
i think, if we turn the option skip_function_bodies on, it will at least output the same file.

here is my demo Pseudocode

void addImportFunc() {

  module.AppendField(std::make_unique<wabt::ImportModuleField>(
      std::move(newImport), wabt::Location()));
  // update elem id...
  // update export id...
  // update func binding id...
  // update call Expr as follows:
  class CallIdModifier : public wabt::ExprVisitor::DelegateNop {
  public:
    int importStart;
    wabt::Module *module;

    wabt::Result OnCallExpr(wabt::CallExpr *call_expr) override {
      // return wabt::Result::Ok;
      if (call_expr->var.is_index() && call_expr->var.index() >= importStart) {
        call_expr->var.set_index(call_expr->var.index() + 1);
      }
      return wabt::Result::Ok;
    }

    wabt::Result
    OnOpcodeRawExpr(wabt::OpcodeRawExpr *opcode_raw_expr) override {
        wabt::ReadBinaryOptions read_options;
        wabt::ExtractOpcodeRawExpr(*opcode_raw_expr, *module, read_options);
        return wabt::Result::Ok;
    }
  };

  for (auto func : module.funcs) {
    ev.VisitExprList(func->exprs);
   // repackaging opcode here...
  }
}

in pc , we turned skip_function_bodies off, and in phone we skip_function_bodies skip_function_bodies on,
use ExtractOpcodeRawExpr when visit or by manual, and PackOpcodeRawExpr after modify.
extracted or unextracted Expr will output correct byte code as follows

...
case ExprType::OpCodeRaw: {
  auto* opcode_raw_expr = cast<OpcodeRawExpr>(expr);
  if (opcode_raw_expr->is_extracted) {
    WriteExprList(func, opcode_raw_expr->extracted_exprs);
  } else {
    auto& opcode_buffer = opcode_raw_expr->opcode_buffer;
    // Opcode::End 0x0b
    assert(opcode_buffer[opcode_buffer.size() - 1] == 0x0b);
    stream_->WriteData(opcode_buffer.data(), opcode_buffer.size() - 1);
  }
}

@SoniEx2
Copy link
Collaborator

SoniEx2 commented Jan 27, 2025

the webassembly binary format is mostly designed for single-pass processing, this is why imports come before anything else.

so the first thing that would be decoded would be the number of imports. you'd then immediately start emitting a new webassembly binary, with the extra imports added, and then go on to process the elem id, export id, etc. no need to save the decoded forms.

@primstar-cool
Copy link
Author

primstar-cool commented Jan 27, 2025

i think the read option skip_function_bodies is designed for quick analysis other blocks, but i think it should retain the ability to output again. I just enhanced the compatibility of this option, even though it won't be used in most cases.
the new commit provided two static functions for extracting and packing raw data.

when we add a new import function, the func index must insert at the import block, this means that, every call cmd which id >= num_func_imports should add an offset (+1).
the data in [function table] which func index >= num_func_imports, should add an offset also. same as exports_segment;
update elem_segment and exports_segment is just 10 lines in total, it's not important.
visit all func exprs is not difficult too, I spent half a day debugging on the PC and got it working.
but in mobile device, it always crash for low memory.

steaming decode and encode can save memory, but it seems a big project, does writeModule accept an incomplete module? when On****Expr, encode and put it in an output stream, and release the Expr?

so the first commit is just save original bytes when option skip_function_bodies turns on.
then i can modify it outside.
in some case i just need to patch 1~2 funcs, so i need not decode all function bodies.
the new feature is wrapped by a new macro WABT_KEEP_OPCODE_WHEN_SKIP_FUNC_BODY.
extract and pack need use private func ReadInstructions and WriteExprList. so in new commit, i made 2 static funcs for user.

Modifying WASM on low-memory devices may be a rare requirement, but fortunately, the amount of modification needed is not significant.

@SoniEx2
Copy link
Collaborator

SoniEx2 commented Jan 27, 2025

oh wait, we don't really have a streaming module writer. hmm...

(is it needed? how hard would it be to write out a module as it's being read? maybe this is something that could be improved.)

@primstar-cool
Copy link
Author

primstar-cool commented Jan 28, 2025

in my case, modify wasm in stream is a bit hard for I must add all import function before i scaned the function section, and i need to save lots of context info for modifying.

I wonder what dose the option skip_function_bodies usually used for ?
Maybe using virtual ExprType is an urgly way to implement it, but i think retaining the original function_bodies is necessary
virtual opcode OpCodeRaw can implment [loop], [visit], [write], [log], [valid] with only a little code. virtual ExprTypeis compatible to [join] [insert] to other exprsList. and when the expr extracted, it is similar to a BlockExpr without a Label.
when we do a coverage testing, we just insert a callExpr at the beginning of this function, I don't even need to unpack this struct, it will be connected with other instructions and outputted as-is.

I also think it's a bit ugly for add an virtual ExprType, maybe a spec struct is a better choice. but when I noticed that there is a lot of code that switches on (eptr->type()), creating a dedicated struct might lead to a large number of modifications. so I chose the current solution.

btw, i handle wabt_expr_make_unique with a solid mem pool, it significantly improve the decoding performance of expr.
in my test, decode time reduce from 1.51s to 0.88s, and reduce some mem usage(avoid create headers of memory block headers nor memory alignment, no memory fragmentation). i think libwabt can make a official version of expr mem alloc.

@SoniEx2
Copy link
Collaborator

SoniEx2 commented Jan 28, 2025

we believe skip_function_bodies was never designed for use with ReadBinaryIr. it should probably have rejected it. (in fact, it seems to only be used by objdump.)

@sbc100
Copy link
Member

sbc100 commented Jan 28, 2025

Yes, skip_function_bodies was designed for by used by objdump se you can do things like objdump -h to see the all the file headers in O(1) time without decoding the instruction stream.

@primstar-cool
Copy link
Author

primstar-cool commented Jan 28, 2025

I didn't know what this option was for at first, so i wrapped the feature in a macro, although a simple memcpy will cost only little time in DelegateNop. it can work with ReadBinaryIr when turned the macro on. when this macro is disabled, it has no side effects.
if someone else need modify wasm in low-mem device, the feature is useful for that (maybe some people also have this requirement (low-memory environments), after finding it unworkable, they turned to libbinaryen, although it is larger and slower. So it may not be a spurious need. In fact, we use libbinaryen at the beginning, but it parses 2.5 times slower than wabt, so we switched to this library. If there will be an official low-peak-memory mode in the future, we would be very willing to switch to official mode (eg. streaming decode and encode).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants