HDDS-3155. Improved ozone client flush implementation to make it faster. #716

captainzmc · 2020-03-24T14:10:01Z

What changes were proposed in this pull request?

When we run MR Job (with 1000 maps) based on OzoneFileSystem. After the map and reduce has finished 100%, the appmaster pauses More than 40 minutes .
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
It turns out that the appmaster writes all the task events to the log one by one, calling flush once for each one. This operation is very time consuming in ozone.

HDFS currently has two flush ports, flush () and hflush ().
flush() : flush the data from client buffer to the client package (dfs.write.packet.size default 64k). If the package is not full, it will not be sent to the datanode.
hflush(): each invocation sends the data in the buffer to the datanode.

Now, ozone's flush is more similar to HDFS's hflush. This PR adds an implementation of flush similar to HDFS‘s flush. Using ozone.client.stream.buffer.flush.delay to control whether to enable(not enabled by default). If we enabled it, when we call the flush() method, we will determine whether the data in the current buffer is greater than ozone.client.stream.buffer.size. If greater than, we will send it to the datanode. Otherwise, we will not send it.

The flush performance has been significantly improved through testing. The job is no longer blocked, It will take 1 second to exit after MR finished.
20/03/25 11:04:04 INFO mapreduce.Job: map 100% reduce 100%
20/03/25 11:04:05 INFO mapreduce.Job: Job job_1585104739905_0002 completed successfully

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3155

How was this patch tested?

Run yarn on the ozone, perform the testdfsio job below, start a thousand maps. And see the exit time after map and reduce 100%.
hadoop jar /path/of/hadoop-mapreduce-client-jobclient-2.8.5-tests.jar TestDFSIO -write -nrFiles 1000 -fileSize 1KB -resFile /tmp/dfsio-write.out

Add the following configuration in ozone-site.xml and repeat the above command to see the execution.
<property>
<name>ozone.client.stream.buffer.flush.delay</name>
<value>true</value>
</property>

captainzmc · 2020-03-31T13:02:45Z

@bshashikant
Yes, I'll turn off the PR first. I will submit another PR and refresh the buffer with a timed task.

bshashikant · 2020-04-13T12:13:33Z

I checked the behaviour in HDFS and it seems like approach here makes ozone flush() similar to what HDFS flush is currently doing.

We can also have an implementation as @xiaoyuyao mentioned like a time based flush on lines similar to S3AFlush()?

@xiaoyuyao , what do you think?

xiaoyuyao · 2020-04-15T23:28:33Z

agree, a time based flush could work better for object store like workload in ozone/s3. We might also implement size based and time based flush as different options to fit different requirements.

bshashikant

The changes look good. Can you please add a unit test too?

bshashikant · 2020-04-16T13:32:35Z

hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java

@@ -149,6 +149,17 @@
  public static final TimeDuration OZONE_CLIENT_RETRY_INTERVAL_DEFAULT =
      TimeDuration.valueOf(0, TimeUnit.MILLISECONDS);

+  /**
+   * If this value is true, when the client calls the flush() method,
+   * we will checks whether the data in the buffer is greater than


let's change "we will checks" to "it checks"

bshashikant · 2020-04-20T08:20:53Z

Thanks @captainzmc for updating the patch. The patch looks good. Can you remove the variables related to the metric count in the unit test as these are not used/referred anywhere else?

captainzmc · 2020-04-22T03:10:49Z

hi @bshashikant Thanks for your review, now I had delete useless metrics in new ut testFlushChunkDelay.

support for data delay flush in buffers

d2a1f57

captainzmc closed this Mar 24, 2020

fix checkstyle

0c32bec

captainzmc reopened this Mar 24, 2020

fix checkstyle

e02e02b

captainzmc closed this Mar 31, 2020

bshashikant reopened this Apr 13, 2020

bshashikant reviewed Apr 16, 2020

View reviewed changes

captainzmc added 2 commits April 18, 2020 12:23

fix review issue and add ut

2ffe86a

fix ut

58233ac

captainzmc closed this Apr 18, 2020

captainzmc reopened this Apr 18, 2020

captainzmc closed this Apr 18, 2020

captainzmc reopened this Apr 18, 2020

remove useless metrics in new ut testFlushChunkDelay

26e27ea

captainzmc force-pushed the HDDS-3155 branch from 22baf79 to 26e27ea Compare April 22, 2020 01:50

bshashikant approved these changes Apr 22, 2020

View reviewed changes

bshashikant merged commit 2098516 into apache:master Apr 22, 2020

captainzmc mentioned this pull request May 22, 2020

HDDS-3477. Disable partial chunk write during flush() call in ozone client by default. #957

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HDDS-3155. Improved ozone client flush implementation to make it faster. #716

HDDS-3155. Improved ozone client flush implementation to make it faster. #716

captainzmc commented Mar 24, 2020 •

edited

Loading

captainzmc commented Mar 31, 2020

bshashikant commented Apr 13, 2020

xiaoyuyao commented Apr 15, 2020

bshashikant left a comment

bshashikant Apr 16, 2020

bshashikant commented Apr 20, 2020

captainzmc commented Apr 22, 2020

HDDS-3155. Improved ozone client flush implementation to make it faster. #716

HDDS-3155. Improved ozone client flush implementation to make it faster. #716

Conversation

captainzmc commented Mar 24, 2020 • edited Loading

What changes were proposed in this pull request?

What is the link to the Apache JIRA

How was this patch tested?

captainzmc commented Mar 31, 2020

bshashikant commented Apr 13, 2020

xiaoyuyao commented Apr 15, 2020

bshashikant left a comment

Choose a reason for hiding this comment

bshashikant Apr 16, 2020

Choose a reason for hiding this comment

bshashikant commented Apr 20, 2020

captainzmc commented Apr 22, 2020

captainzmc commented Mar 24, 2020 •

edited

Loading