Add Image Transformer Library #679


Closed
jonpsy wants to merge 42 commits

Conversation

jonpsy
Contributor

@jonpsy jonpsy commented Sep 9, 2021

@lu-wang-g @wangtz

TODO

  • Add ImageTransformerOptions.proto and fill in BUILD.
  • Implement PostProcess logic using a class derived from PostProcessor.
  • Ensure dynamic image support exists.
  • Add the .h and .cc files for ImageTransformer.
  • Use the Postprocessor class to delegate the task.
  • Add a test and confirm it works.
  • Write a demo app.

@google-cla google-cla bot added the cla: yes label Sep 9, 2021
// `base_options` to specifying the TFLite model and using
// `base_options.compute_settings.tflite_settings.cpu_settings.num_threads`,
// to configure the number of threads.
optional int32 num_threads = 7 [default = -1];
Member

num_threads is configured through compute_settings, so it can be removed too.
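
For illustration, a minimal sketch of what configuring threads through compute_settings could look like on the caller side (ImageTransformerOptions is the proto being added in this PR; the nested setters simply follow the base_options.compute_settings.tflite_settings.cpu_settings.num_threads path quoted above, so treat the exact names as an assumption):

  ImageTransformerOptions options;
  options.mutable_base_options()->mutable_model_file()->set_file_name("model.tflite");
  // Threads live on the shared compute settings rather than on a task-level field.
  options.mutable_base_options()
      ->mutable_compute_settings()
      ->mutable_tflite_settings()
      ->mutable_cpu_settings()
      ->set_num_threads(4);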

@jonpsy jonpsy force-pushed the image_transformer branch from f7310c0 to fe089a3 Compare October 7, 2021 11:19
Comment on lines 98 to 99
if (options.base_options().compute_settings().tflite_settings().cpu_settings().num_threads() == 0 ||
options.base_options().compute_settings().tflite_settings().cpu_settings().num_threads() < -1) {
Contributor Author

Is this the correct way to get num_threads?

Member

Yes, that's correct. But you don't need to check num_threads specifically. It has been verified here when the base task is created.

Comment on lines 170 to 172
// TODO: Will the output be float and should be converted or directly available?
// The example had float and it had to be converted. Anyway, we're guaranteed to have uint8 as output.
has_uint8_outputs_ = (output_tensor->type == kTfLiteUInt8);
Contributor Author

So here the model will most likely output kTfLiteFloat32, right? Then we convert it to uint8_t explicitly in Preprocess? Or should every ImageTransformer model always have kTfLiteUInt8 as its output tensor type?

Member

I think both situations will happen. If the output is float, it will most likely be in the same range as the input image, meaning if the input is normalized to [0, 1], you need to denormalize the output from [0, 1] back to [0, 255]. We'll need to implement both. See the style transfer model here: https://www.tensorflow.org/lite/examples/style_transfer/overview#performance_benchmarks
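
A rough sketch of the float path described above, assuming the output is a dense float tensor with values in [0, 1] (output_data and num_values stand for the tensor's float buffer and element count; this is illustrative, not the final Postprocess code):

  // Denormalize float pixel values from [0, 1] back to [0, 255] and clamp.
  std::vector<uint8_t> pixels(num_values);
  for (size_t i = 0; i < num_values; ++i) {
    const float v = output_data[i] * 255.0f;
    pixels[i] = static_cast<uint8_t>(std::max(0.0f, std::min(v, 255.0f)));
  }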

@jonpsy
Contributor Author

jonpsy commented Oct 27, 2021

Hey @lu-wang-g, we decided upon the

  tflite::support::StatusOr<std::unique_ptr<::tflite::task::vision::FrameBuffer>> Postprocess();

API for Postprocess, but it looks like the current BaseVisionTaskAPI is still using the old API. Any idea when it will be changed to the new one? Depending on that, I can make it work with the old API just fine.

Tagging @xunkai55 @wangtz for their opinion as well. Cheers.

Comment on lines +155 to +117
StatusOr<FrameBuffer> ImageTransformer::Postprocess(
const std::vector<const TfLiteTensor*>& /*output_tensors*/,
const FrameBuffer& /*frame_buffer*/, const BoundingBox& /*roi*/) {
Contributor Author

@jonpsy jonpsy Oct 27, 2021

Very well, I've come up with a compromise. For now I'm using the old API but not using any of its arguments. When the new API lands, one can just remove the arguments and it will work like a charm.

Also @lu-wang-g, it looks like we can't use std::unique_ptr<FrameBuffer> as the return type, so I'm returning a plain FrameBuffer.

Member

For my comment, "StatusOr<unique_ptr> ImagePostprocessor::Postprocess();" in the doc, it was referring to creating an ImagePostprocessor class, similar to ImagePreprocessor. Then ImageTransformer::Postprocess can call ImagePostprocessor::Postprocess(), just like ImageEmbedder::Postprocess.

Agreed that we should return StatusOr<FrameBuffer> from ImageTransformer::Postprocess and ImagePostprocessor::Postprocess.

You may have noticed that EmbeddingPostprocessor::Postprocess takes the output embedding object instead of creating and returning a new embedding object. This is due to some legacy reasons, and we'll change it to the following API later:

StatusOr<Embeddings> EmbeddingPostprocessor::Postprocess()
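
To make the proposed split concrete, here is a rough sketch of how an ImagePostprocessor mirroring ImagePreprocessor could look (the class name comes from the discussion above; the exact Create() parameters were still being decided, so treat the signatures as assumptions):

  class ImagePostprocessor : public Postprocessor {
   public:
    static tflite::support::StatusOr<std::unique_ptr<ImagePostprocessor>> Create(
        core::TfLiteEngine* engine, const std::initializer_list<int> output_indices);

    // Turns the model's output tensor back into a FrameBuffer, denormalizing if needed.
    tflite::support::StatusOr<vision::FrameBuffer> Postprocess();
  };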

Contributor Author

Oh, I see! Sorry, I missed it before. So we should delegate the postprocessing task to an ImagePostprocessor class, right? Will make that change in a flash.

Contributor Author

@jonpsy jonpsy Oct 29, 2021

@lu-wang-g For passing options, how about we pass NormalizationOptions to the postprocessor? Sounds good?

Contributor Author

As discussed, that's acceptable.


// results will be filled.
//
// An example of such model can be found at:
// https://tfhub.dev/bohemian-visual-recognition-alliance/lite-model/models/mushroom-identification_v1/1
Member

You can use this model as an example for testing.

@jonpsy jonpsy force-pushed the image_transformer branch from 8a1274d to 26a9783 Compare October 29, 2021 18:38
@jonpsy
Contributor Author

jonpsy commented Nov 1, 2021

Thoughts on testing @lu-wang-g ?

Comment on lines 41 to 43
Create(core::TfLiteEngine* engine,
const std::initializer_list<int> output_indices,
std::unique_ptr<vision::NormalizationOptions> options);
Member

On second thought, this API may be improved along the following two paths:

  1. (Most common use case) If the output tensor uses the same processing metadata as the input tensor, instead of passing in normalization params we can pass in the input_indices. Then ImagePostprocessor can read whatever metadata is associated with the input tensor.
  2. If the output tensor uses different processing metadata from the input tensor, it should be populated in the metadata, and ImagePostprocessor should read it just like ImagePreprocessor does.

What do you think?

Contributor Author

@jonpsy jonpsy Nov 2, 2021

Okay, but won't this break the existing design pattern? To implement this, I think we can extract the input metadata via the image_tensor_specs utility by getting metadata_extractor() from our engine. Something like this could work:

  // Gather metadata
  auto* output_metadata =
      engine_->metadata_extractor()->GetOutputTensorMetadata(output_index);
  auto* input_metadata =
      engine_->metadata_extractor()->GetInputTensorMetadata(input_index);

  // Use input metadata for normalization as fallback.
  auto* processing_metadata = output_metadata != nullptr ? output_metadata : input_metadata; 

  absl::optional<vision::NormalizationOptions> normalization_options;
  ASSIGN_OR_RETURN(normalization_options,
                     GetNormalizationOptionsIfAny(*processing_metadata));

Thoughts on handling NULL cases?

  1. InputMetaData shouldn't ever be NULL for this task I guess, and NormalizationOptions shouldn't be NULL either.
  2. What if OutputMetadata exists but doesn't have NormalizationOptions?

Member

  1. InputMetaData shouldn't ever be NULL for this task I guess, and NormalizationOptions shouldn't be NULL either.

Those have been verified here. You can share a large amount of code with image_preprocessor (for both initialization and processing), in that they both read metadata from a specified tensor (either input or output) and process the FrameBuffer according to that metadata.

  2. What if OutputMetadata exists but doesn't have NormalizationOptions?
    If the tensor is uint8, it's OK. But if the tensor is float, raise an error. See here.
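
A sketch of the check described in point 2 (illustrative only; the actual helper and error wording in the library may differ):

  // A float output tensor with no NormalizationOptions in the metadata cannot be
  // mapped back to pixel values, so fail initialization. uint8 outputs pass through.
  if (output_tensor->type == kTfLiteFloat32 && !normalization_options.has_value()) {
    return absl::InvalidArgumentError(
        "Float output tensor requires NormalizationOptions metadata to be "
        "converted back into an image.");
  }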

Contributor Author

@jonpsy jonpsy Nov 10, 2021

Those have been verified here.

In BuildInputTensorSpecs, when InputMetaData is nullptr it just skips over and fills normalization_options with an empty absl::optional. But then you're accessing it inside ImagePreprocessor::Preprocess here. Is this undefined behaviour?

P.S. Never mind, figured it out! It checks for the empty absl::optional at a later stage there. Nice! Can you have a look at the code again and see if it fits your description?

@lu-wang-g
Member

You can have an end-to-end test using ImageTransformer to validate the implementation. Try the style transfer model here: https://www.tensorflow.org/lite/examples/style_transfer/overview#performance_benchmarks

@jonpsy
Contributor Author

jonpsy commented Nov 10, 2021

You can have an end-to-end test using ImageTransformer to validate the implementation. Try the style transfer model here: https://www.tensorflow.org/lite/examples/style_transfer/overview#performance_benchmarks

@lu-wang-g Correct me if I'm wrong, but this requires two input images, i.e. two input tensors, while our entire library is currently based on a single input image. So what's the solution here?

Comment on lines +89 to +135
ASSIGN_OR_RETURN(normalization_options,
GetNormalizationOptionsIfAny(*processing_metadata));
Contributor Author

@jonpsy jonpsy Nov 10, 2021

Since GetNormalizationOptionsIfAny is wrapped inside an anonymous namespace in image_tensor_specs.cc, we might unfortunately need to copy-paste the code here.

Member

Please share as much code as possible between ImagePostprocessor::Init and ImageTensorSpecs::BuildInputImageTensorSpecs. You can put a TODO and implement it in a follow-up PR.

Contributor Author

Just so we're clear, do you mean copying the code from BuildInputImageTensorSpecs when you say "share"?

Member

"Share code" means ImagePostprocessor::Init and ImageTensorSpecs::BuildInputImageTensorSpecs use the same piece of code to do processing or validation.

public:
static tflite::support::StatusOr<std::unique_ptr<ImagePostprocessor>>
Create(core::TfLiteEngine* engine,
const std::initializer_list<int> output_indices,
Member

Use int instead of std::initializer_list for output_index, since it only supports one output tensor. Same for input_index.

Member

Document what input_index is above Create().

Contributor Author

Can't, because the Postprocessor class stores std::vector<int> output_indices_ as a state variable. I've continued the same pattern for input_indices. We can just access output_indices_.at(0). Sounds good?

const std::initializer_list<int> output_indices,
const std::initializer_list<int> input_indices);

// Processes the provided vision::FrameBuffer and populate tensor values.
Member

Update the comment to reflect the real functionality of Postprocess().

// is currently detected by checking if all output tensors data type is uint8.
bool has_uint8_outputs_;

absl::Status Init(std::unique_ptr<vision::NormalizationOptions> options);
Member

Is "absl::Status Init(std::unique_ptrvision::NormalizationOptions options);" implemented anywhere?

Contributor Author

@jonpsy jonpsy Nov 11, 2021

No. It should be Init(const std::vector<int>& input_indices) instead. Updated it, thanks.


@jonpsy
Contributor Author

jonpsy commented Nov 17, 2021

@lu-wang-g Looks like there were some changes in the API? I've updated the PR accordingly. I also filled in the model metadata using this script.

@lu-wang-g
Member

Thanks Sai for updating the code! Yes, some internal APIs have changed, which could affect your code here. I'll take a look at the changes on Friday (catching up on a deadline today, and a team off-site event tomorrow; sorry about the delay!).

@jonpsy
Contributor Author

jonpsy commented Nov 18, 2021

I'm getting a worse PSNR score than the one in the colab notebook. Either my PSNR implementation is wrong or I haven't prepared the images correctly; I'll fix this.

Another concern: ImageTransformer fails in CreateFromOptions when only InputMetadata is provided. Not sure why; I'll investigate further.

@jonpsy
Contributor Author

jonpsy commented Nov 23, 2021

@lu-wang-g Hi, in this notebook, why does this happen?

def preprocess_image(image_path):
.. 
..
  hr_size = (tf.convert_to_tensor(hr_image.shape[:-1]) // 4) * 4
  hr_image = tf.image.crop_to_bounding_box(hr_image, 0, 0, hr_size[0], hr_size[1])
  hr_image = tf.cast(hr_image, tf.float32)

It is converting my (50 x 50) image into a (48 x 48) image, while the inference engine is pretty strict about expecting (50 x 50) as input.

@lu-wang-g
Member


This is the TF model, which is different from the TFLite model. I think here it tries to align the width/height to multiples of 4 (e.g. (50 // 4) * 4 = 48, hence the (50 x 50) image becoming (48 x 48)). You can ignore this in the TFLite implementation.

@jonpsy
Contributor Author

jonpsy commented Nov 24, 2021

@lu-wang-g Reading the same file using tf.io.decode_jpg and DecodeImage gives different values.

[image]

[image]

They're off by 1-2, but since I'm comparing PSNR these differences affect the result a lot. What do you think is the cause?

@lu-wang-g
Member


That's a known issue. It is due to the different JPEG decoding algorithms used by tf.io.decode_jpg and DecodeImage. For testing purposes in the Task library, you can use DecodeImage to parse your image.

@jonpsy
Contributor Author

jonpsy commented Nov 25, 2021

Okay, but in our test I wanted to compare against a known PSNR value with a known set of images, which is why I loaded the same image in colab (as in the figure above) and in our test. But since they read different values, the PSNR comes out very different.

So, how do I perform the PSNR test? Or should I perform a PSNR test at all?
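
For reference, a generic PSNR computation between two equally sized uint8 buffers looks like this (a standard formula for 8-bit images, not the PR's actual test helper):

  #include <cmath>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // PSNR in dB for 8-bit images: 10 * log10(255^2 / MSE).
  // Note: identical buffers give MSE == 0, i.e. infinite PSNR.
  double ComputePsnr(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    double mse = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
      const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
      mse += d * d;
    }
    mse /= static_cast<double>(a.size());
    return 10.0 * std::log10(255.0 * 255.0 / mse);
  }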

// Use a bi-cubically downsampled image as input to the model and compare
// the model output with the original image. Expected PSNR taken from:
// https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/image_enhancing.ipynb.
TEST_F(SuperResolutionTest, PSNRTest) {
Contributor Author

@lu-wang-g, continuing the above comment, I was talking about this test.

Member

What is your PSNR value? Is your output image clearer than the input image, i.e. is it actually enhanced?

Contributor Author

@jonpsy jonpsy Nov 29, 2021

It wasn't particularly bad: around 25.7081, while the expected value is 29.799528. I haven't inspected the output image yet. How do I convert a FrameBuffer to JPEG?

@jonpsy
Contributor Author

jonpsy commented Dec 2, 2021

After calling EncodeToPng(), the output PNG looks something like this.

[image: husky_error]

Possible reasons could be:

  • Corrupted model: to rule this out, I used the ipynb notebook and fed the image to the same model, so this is not the case.
  • Incorrect postprocessing: to rule this out, I used ofstream to dump the output of the FrameBuffer result_or.value() to a .txt file and read it in the ipynb. Then I computed the RMSE between the image_transformer output and the Python inference with the same TFLite model; the loss was around ~2.222, which is not bad. So this is not the case either.

This suggests the error happens while writing the PNG. Have a look at the notebook I created.

Model output from Python inference
[image: husky_python]

Model output from reading the FrameBuffer output via the txt file
[image: husky_cpp]

Looks almost the same, right? That means our framework works properly, but saving to PNG is causing this weird behaviour. I'm not sure why, though; maybe because the input image dims are not divisible by 4? Let me know what you think @lu-wang-g
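
For what it's worth, the dump described above could look roughly like this (a fragment assuming an interleaved RGB FrameBuffer and the surrounding test code; the plane/stride accessors are my reading of the FrameBuffer API, so treat the details as an assumption):

  // Write the first plane of the output FrameBuffer as one integer per line so the
  // values can be re-read and visualized from the notebook.
  std::ofstream out("transformed_pixels.txt");
  const FrameBuffer& fb = result_or.value();
  const uint8_t* data = fb.plane(0).buffer;
  const int num_bytes = fb.plane(0).stride.row_stride_bytes * fb.dimension().height;
  for (int i = 0; i < num_bytes; ++i) {
    out << static_cast<int>(data[i]) << "\n";
  }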

@lu-wang-g
Member

It is probably due to misuse of EncodeToPng, i.e. the width and height are reversed. If you already have a working output image (as you pasted in the thread), just use that one in the test. Good to confirm that the pipeline works.

@jonpsy
Contributor Author

jonpsy commented Dec 3, 2021

It is probably due to misuse of EncodeToPng, i.e. the width and height are reversed.

But the height and width have the same value (200) here, so I'm not sure what the reason is. Let's discuss this in #718, since this PR isn't to blame.

If you already have a working output image (as you pasted in the thread), just use that one in the test. Good to confirm that the pipeline works.

Yeah, I'll use the image I got from dumping and displaying in the ipynb. That should seal the deal for this PR.

@jonpsy
Contributor Author

jonpsy commented Dec 3, 2021

@lu-wang-g I'm closing this PR and dividing it into two as discussed.

@jonpsy jonpsy closed this Dec 3, 2021
@jonpsy jonpsy mentioned this pull request Dec 3, 2021