Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-3155. Improved ozone client flush implementation to make it faster. #716

Merged
merged 6 commits into from
Apr 22, 2020

Conversation

captainzmc
Copy link
Member

@captainzmc captainzmc commented Mar 24, 2020

What changes were proposed in this pull request?

When we run MR Job (with 1000 maps) based on OzoneFileSystem. After the map and reduce has finished 100%, the appmaster pauses More than 40 minutes .
20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100%
20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully
It turns out that the appmaster writes all the task events to the log one by one, calling flush once for each one. This operation is very time consuming in ozone.

HDFS currently has two flush ports, flush () and hflush ().
flush() : flush the data from client buffer to the client package (dfs.write.packet.size default 64k). If the package is not full, it will not be sent to the datanode.
hflush(): each invocation sends the data in the buffer to the datanode.

Now, ozone's flush is more similar to HDFS's hflush. This PR adds an implementation of flush similar to HDFS‘s flush. Using ozone.client.stream.buffer.flush.delay to control whether to enable(not enabled by default). If we enabled it, when we call the flush() method, we will determine whether the data in the current buffer is greater than ozone.client.stream.buffer.size. If greater than, we will send it to the datanode. Otherwise, we will not send it.

The flush performance has been significantly improved through testing. The job is no longer blocked, It will take 1 second to exit after MR finished.
20/03/25 11:04:04 INFO mapreduce.Job: map 100% reduce 100%
20/03/25 11:04:05 INFO mapreduce.Job: Job job_1585104739905_0002 completed successfully

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3155

How was this patch tested?

Run yarn on the ozone, perform the testdfsio job below, start a thousand maps. And see the exit time after map and reduce 100%.
hadoop jar /path/of/hadoop-mapreduce-client-jobclient-2.8.5-tests.jar TestDFSIO -write -nrFiles 1000 -fileSize 1KB -resFile /tmp/dfsio-write.out

Add the following configuration in ozone-site.xml and repeat the above command to see the execution.
<property>
<name>ozone.client.stream.buffer.flush.delay</name>
<value>true</value>
</property>

@captainzmc captainzmc closed this Mar 24, 2020
@captainzmc captainzmc reopened this Mar 24, 2020
@captainzmc
Copy link
Member Author

@bshashikant
Yes, I'll turn off the PR first. I will submit another PR and refresh the buffer with a timed task.

@captainzmc captainzmc closed this Mar 31, 2020
@bshashikant bshashikant reopened this Apr 13, 2020
@bshashikant
Copy link
Contributor

I checked the behaviour in HDFS and it seems like approach here makes ozone flush() similar to what HDFS flush is currently doing.

We can also have an implementation as @xiaoyuyao mentioned like a time based flush on lines similar to S3AFlush()?

@xiaoyuyao , what do you think?

@xiaoyuyao
Copy link
Contributor

agree, a time based flush could work better for object store like workload in ozone/s3. We might also implement size based and time based flush as different options to fit different requirements.

Copy link
Contributor

@bshashikant bshashikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good. Can you please add a unit test too?

@@ -149,6 +149,17 @@
public static final TimeDuration OZONE_CLIENT_RETRY_INTERVAL_DEFAULT =
TimeDuration.valueOf(0, TimeUnit.MILLISECONDS);

/**
* If this value is true, when the client calls the flush() method,
* we will checks whether the data in the buffer is greater than
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's change "we will checks" to "it checks"

@captainzmc captainzmc closed this Apr 18, 2020
@captainzmc captainzmc reopened this Apr 18, 2020
@captainzmc captainzmc closed this Apr 18, 2020
@captainzmc captainzmc reopened this Apr 18, 2020
@bshashikant
Copy link
Contributor

Thanks @captainzmc for updating the patch. The patch looks good. Can you remove the variables related to the metric count in the unit test as these are not used/referred anywhere else?

@captainzmc
Copy link
Member Author

hi @bshashikant Thanks for your review, now I had delete useless metrics in new ut testFlushChunkDelay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants