HDDS-3155. Improved ozone client flush implementation to make it faster. #716
Conversation
@bshashikant
I checked the behaviour in HDFS, and it seems the approach here makes Ozone's flush() similar to what HDFS flush() is currently doing. We could also have an implementation like the time-based flush @xiaoyuyao mentioned, along lines similar to S3AFlush(). @xiaoyuyao, what do you think?
Agree, a time-based flush could work better for object-store-like workloads in Ozone/S3. We might also implement size-based and time-based flush as different options to fit different requirements.
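For context, a rough sketch of what a combined size-based and time-based trigger could look like. None of these names exist in the patch; they only illustrate the idea being discussed here.

// Hypothetical illustration of size-based plus time-based flush triggers.
// getBufferedDataSize(), streamBufferSize, flushIntervalMs and handleFlush()
// are placeholders, not identifiers from this change.
private long lastFlushTimeMs = System.currentTimeMillis();

public void flush() throws IOException {
  long now = System.currentTimeMillis();
  boolean sizeTrigger = getBufferedDataSize() >= streamBufferSize;
  boolean timeTrigger = (now - lastFlushTimeMs) >= flushIntervalMs;
  if (sizeTrigger || timeTrigger) {
    // Either enough data has accumulated or enough time has passed,
    // so push the buffered data to the datanode.
    handleFlush();
    lastFlushTimeMs = now;
  }
  // Otherwise skip the flush and let data accumulate in the buffer.
}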
The changes look good. Can you please add a unit test too?
@@ -149,6 +149,17 @@
public static final TimeDuration OZONE_CLIENT_RETRY_INTERVAL_DEFAULT =
    TimeDuration.valueOf(0, TimeUnit.MILLISECONDS);

/**
 * If this value is true, when the client calls the flush() method,
 * we will checks whether the data in the buffer is greater than
let's change "we will checks" to "it checks"
Thanks @captainzmc for updating the patch. The patch looks good. Can you remove the variables related to the metric count in the unit test, as these are not used or referred to anywhere else?
Hi @bshashikant, thanks for your review. I have now deleted the unused metrics in the new unit test testFlushChunkDelay.
What changes were proposed in this pull request?
When we run an MR job (with 1000 maps) on OzoneFileSystem, the ApplicationMaster pauses for more than 40 minutes after map and reduce reach 100%.
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
It turns out that the ApplicationMaster writes all the task events to the log one by one, calling flush() once for each event. This operation is very time-consuming in Ozone.
HDFS currently has two flush interfaces: flush() and hflush().
flush(): flushes the data from the client buffer into the client packet (dfs.write.packet.size, 64 KB by default). If the packet is not full, it is not sent to the datanode.
hflush(): every invocation sends the data currently in the buffer to the datanode.
Currently, Ozone's flush() is more similar to HDFS's hflush(). This PR adds a flush() implementation similar to HDFS's flush(), controlled by ozone.client.stream.buffer.flush.delay (not enabled by default). When it is enabled, calling flush() checks whether the data in the current buffer is larger than ozone.client.stream.buffer.size: if it is, the data is sent to the datanode; otherwise it is not.
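For illustration, a minimal sketch of the flush decision described above, assuming hypothetical names (streamBufferFlushDelay, getBufferedDataSize(), streamBufferSize, handleFlush()) rather than the exact identifiers in the patch:

// Sketch only: the names below stand in for the client output stream's
// internal state; the real patch may structure this differently.
public void flush() throws IOException {
  if (streamBufferFlushDelay && getBufferedDataSize() < streamBufferSize) {
    // ozone.client.stream.buffer.flush.delay is enabled and the buffered
    // data has not yet reached ozone.client.stream.buffer.size, so skip
    // the round trip to the datanode and let data keep accumulating.
    return;
  }
  // Otherwise behave as before: push the buffered data to the datanode.
  handleFlush();
}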
Testing shows that flush performance is significantly improved. The job is no longer blocked; it exits about 1 second after MR finishes.
20/03/25 11:04:04 INFO mapreduce.Job: map 100% reduce 100%
20/03/25 11:04:05 INFO mapreduce.Job: Job job_1585104739905_0002 completed successfully
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-3155
How was this patch tested?
Run YARN on Ozone, run the TestDFSIO job below with 1000 maps, and check the exit time after map and reduce reach 100%.
hadoop jar /path/of/hadoop-mapreduce-client-jobclient-2.8.5-tests.jar TestDFSIO -write -nrFiles 1000 -fileSize 1KB -resFile /tmp/dfsio-write.out
Then add the following configuration to ozone-site.xml and repeat the above command to compare the execution time.
<property>
<name>ozone.client.stream.buffer.flush.delay</name>
<value>true</value>
</property>