
Commit cb58477

hx235 authored and facebook-github-bot committed
New stat rocksdb.{cf|db}-write-stall-stats exposed in a structural way (facebook#11300)
Summary:

**Context/Summary:** Users are interested in figuring out what has caused a write stall.

- Refactor write stall related stats from property `kCFStats` into its own db property `rocksdb.cf-write-stall-stats` as a map or string. For now, this only contains counts of the different combinations of (CF-scope `WriteStallCause`) + (`WriteStallCondition`).
- Add new `WriteStallCause::kWriteBufferManagerLimit` to reflect write stalls caused by the write buffer manager.
- Add new `rocksdb.db-write-stall-stats`. For now, this only contains `WriteStallCause::kWriteBufferManagerLimit` + `WriteStallCondition::kStopped`.
- Expose functions in the new class `WriteStallStatsMapKeys` for examining the above two properties when returned as a map.
- Misc: rename/comment some write stall `InternalStats` for clarity.

Pull Request resolved: facebook#11300

Test Plan:
- New UT
- Stress test: `python3 tools/db_crashtest.py blackbox --simple --get_property_one_in=1`
- Perf test: both pre- and post-change converge very slowly at similar rates, but post-change has a higher average ops/sec than pre-change even though they are run at the same time.
```
./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3
```

pre-change:

```
fillseq [AVG 15 runs] : 1176 (± 732) ops/sec; 0.6 (± 0.4) MB/sec
fillseq : 1052.671 micros/op 949 ops/sec 105.267 seconds 100000 operations; 0.5 MB/s
fillseq [AVG 16 runs] : 1162 (± 685) ops/sec; 0.6 (± 0.4) MB/sec
fillseq : 1387.330 micros/op 720 ops/sec 138.733 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 17 runs] : 1136 (± 646) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1232.011 micros/op 811 ops/sec 123.201 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 18 runs] : 1118 (± 610) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1282.567 micros/op 779 ops/sec 128.257 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 19 runs] : 1100 (± 578) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1914.336 micros/op 522 ops/sec 191.434 seconds 100000 operations; 0.3 MB/s
fillseq [AVG 20 runs] : 1071 (± 551) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1227.510 micros/op 814 ops/sec 122.751 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 21 runs] : 1059 (± 525) ops/sec; 0.5 (± 0.3) MB/sec
```

post-change:

```
fillseq [AVG 15 runs] : 1226 (± 732) ops/sec; 0.6 (± 0.4) MB/sec
fillseq : 1323.825 micros/op 755 ops/sec 132.383 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 16 runs] : 1196 (± 687) ops/sec; 0.6 (± 0.4) MB/sec
fillseq : 1223.905 micros/op 817 ops/sec 122.391 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 17 runs] : 1174 (± 647) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1168.996 micros/op 855 ops/sec 116.900 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 18 runs] : 1156 (± 611) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1348.729 micros/op 741 ops/sec 134.873 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 19 runs] : 1134 (± 579) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1196.887 micros/op 835 ops/sec 119.689 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 20 runs] : 1119 (± 550) ops/sec; 0.6 (± 0.3) MB/sec
fillseq : 1193.697 micros/op 837 ops/sec 119.370 seconds 100000 operations; 0.4 MB/s
fillseq [AVG 21 runs] : 1106 (± 524) ops/sec; 0.6 (± 0.3) MB/sec
```

Reviewed By: ajkr

Differential Revision: D44159541

Pulled By: hx235

fbshipit-source-id: 8d29efb70001fdc52d34535eeb3364fc3e71e40b
1 parent 204fcff commit cb58477

17 files changed: +667 −97 lines

CMakeLists.txt (+1)

```diff
@@ -748,6 +748,7 @@ set(SOURCES
         db/write_batch.cc
         db/write_batch_base.cc
         db/write_controller.cc
+        db/write_stall_stats.cc
         db/write_thread.cc
         env/composite_env.cc
         env/env.cc
```

HISTORY.md (+1)

```diff
@@ -15,6 +15,7 @@
 * Added a new PerfContext counter `internal_merge_point_lookup_count` which tracks the number of Merge operands applied while serving point lookup queries.
 * Add new statistics rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss|hit}
 * Add support for SecondaryCache with HyperClockCache (`HyperClockCacheOptions` inherits `secondary_cache` option from `ShardedCacheOptions`)
+* Add new db properties `rocksdb.cf-write-stall-stats`, `rocksdb.db-write-stall-stats` and APIs to examine them in a structured way. In particular, users of `GetMapProperty()` with property `kCFWriteStallStats`/`kDBWriteStallStats` can now use the functions in `WriteStallStatsMapKeys` to find stats in the map.

 ### Public API Changes
 * Changed various functions and features in `Cache` that are mostly relevant to custom implementations or wrappers. Especially, asynchronous lookup functionality is moved from `Lookup()` to a new `StartAsyncLookup()` function.
```

TARGETS (+2)

```diff
@@ -106,6 +106,7 @@ cpp_library_wrapper(name="rocksdb_lib", srcs=[
         "db/write_batch.cc",
         "db/write_batch_base.cc",
         "db/write_controller.cc",
+        "db/write_stall_stats.cc",
         "db/write_thread.cc",
         "env/composite_env.cc",
         "env/env.cc",
@@ -451,6 +452,7 @@ cpp_library_wrapper(name="rocksdb_whole_archive_lib", srcs=[
         "db/write_batch.cc",
         "db/write_batch_base.cc",
         "db/write_controller.cc",
+        "db/write_stall_stats.cc",
         "db/write_thread.cc",
         "env/composite_env.cc",
         "env/env.cc",
```

db/column_family.cc (+8 −7)

```diff
@@ -869,7 +869,7 @@ int GetL0ThresholdSpeedupCompaction(int level0_file_num_compaction_trigger,
 }
 }  // anonymous namespace

-std::pair<WriteStallCondition, ColumnFamilyData::WriteStallCause>
+std::pair<WriteStallCondition, WriteStallCause>
 ColumnFamilyData::GetWriteStallConditionAndCause(
     int num_unflushed_memtables, int num_l0_files,
     uint64_t num_compaction_needed_bytes,
@@ -942,7 +942,8 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
       internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_STOPS, 1);
       if (compaction_picker_->IsLevel0CompactionInProgress()) {
         internal_stats_->AddCFStats(
-            InternalStats::LOCKED_L0_FILE_COUNT_LIMIT_STOPS, 1);
+            InternalStats::L0_FILE_COUNT_LIMIT_STOPS_WITH_ONGOING_COMPACTION,
+            1);
       }
       ROCKS_LOG_WARN(ioptions_.logger,
                      "[%s] Stopping writes because we have %d level-0 files",
@@ -963,7 +964,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
       SetupDelay(write_controller, compaction_needed_bytes,
                  prev_compaction_needed_bytes_, was_stopped,
                  mutable_cf_options.disable_auto_compactions);
-      internal_stats_->AddCFStats(InternalStats::MEMTABLE_LIMIT_SLOWDOWNS, 1);
+      internal_stats_->AddCFStats(InternalStats::MEMTABLE_LIMIT_DELAYS, 1);
       ROCKS_LOG_WARN(
           ioptions_.logger,
           "[%s] Stalling writes because we have %d immutable memtables "
@@ -981,11 +982,11 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
       SetupDelay(write_controller, compaction_needed_bytes,
                  prev_compaction_needed_bytes_, was_stopped || near_stop,
                  mutable_cf_options.disable_auto_compactions);
-      internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_SLOWDOWNS,
-                                  1);
+      internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_DELAYS, 1);
       if (compaction_picker_->IsLevel0CompactionInProgress()) {
         internal_stats_->AddCFStats(
-            InternalStats::LOCKED_L0_FILE_COUNT_LIMIT_SLOWDOWNS, 1);
+            InternalStats::L0_FILE_COUNT_LIMIT_DELAYS_WITH_ONGOING_COMPACTION,
+            1);
       }
       ROCKS_LOG_WARN(ioptions_.logger,
                      "[%s] Stalling writes because we have %d level-0 files "
@@ -1011,7 +1012,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
                  prev_compaction_needed_bytes_, was_stopped || near_stop,
                  mutable_cf_options.disable_auto_compactions);
       internal_stats_->AddCFStats(
-          InternalStats::PENDING_COMPACTION_BYTES_LIMIT_SLOWDOWNS, 1);
+          InternalStats::PENDING_COMPACTION_BYTES_LIMIT_DELAYS, 1);
       ROCKS_LOG_WARN(
           ioptions_.logger,
           "[%s] Stalling writes because of estimated pending compaction "
```

db/column_family.h (−6)

```diff
@@ -462,12 +462,6 @@ class ColumnFamilyData {
   bool queued_for_flush() { return queued_for_flush_; }
   bool queued_for_compaction() { return queued_for_compaction_; }

-  enum class WriteStallCause {
-    kNone,
-    kMemtableLimit,
-    kL0FileCountLimit,
-    kPendingCompactionBytes,
-  };
   static std::pair<WriteStallCondition, WriteStallCause>
   GetWriteStallConditionAndCause(
       int num_unflushed_memtables, int num_l0_files,
```
db/db_impl/db_impl_write.cc (+3)

```diff
@@ -1211,6 +1211,9 @@ Status DBImpl::PreprocessWrite(const WriteOptions& write_options,
   // exceeded at this point so no new write (including current one) will go
   // through until memory usage is decreased.
   if (UNLIKELY(status.ok() && write_buffer_manager_->ShouldStall())) {
+    default_cf_internal_stats_->AddDBStats(
+        InternalStats::kIntStatsWriteBufferManagerLimitStopsCounts, 1,
+        true /* concurrent */);
     if (write_options.no_slowdown) {
       status = Status::Incomplete("Write stall");
     } else {
```

db/db_properties_test.cc (+175)

```diff
@@ -13,6 +13,7 @@
 #include <string>

 #include "db/db_test_util.h"
+#include "db/write_stall_stats.h"
 #include "options/cf_options.h"
 #include "port/stack_trace.h"
 #include "rocksdb/listener.h"
@@ -2106,6 +2107,180 @@ TEST_F(DBPropertiesTest, GetMapPropertyBlockCacheEntryStats) {
   ASSERT_EQ(3 * kNumCacheEntryRoles + 4, values.size());
 }

+TEST_F(DBPropertiesTest, WriteStallStatsSanityCheck) {
+  for (uint32_t i = 0; i < static_cast<uint32_t>(WriteStallCause::kNone); ++i) {
+    std::string str = kWriteStallCauseToHyphenString[i];
+    ASSERT_TRUE(!str.empty())
+        << "Please ensure mapping from `WriteStallCause` to "
+           "`kWriteStallCauseToHyphenString` is complete";
+    WriteStallCause cause = static_cast<WriteStallCause>(i);
+    if (cause == WriteStallCause::kCFScopeWriteStallCauseEnumMax ||
+        cause == WriteStallCause::kDBScopeWriteStallCauseEnumMax) {
+      ASSERT_EQ(str, kInvalidWriteStallCauseHyphenString)
+          << "Please ensure order in `kWriteStallCauseToHyphenString` is "
+             "consistent with `WriteStallCause`";
+    }
+  }
+
+  for (uint32_t i = 0; i < static_cast<uint32_t>(WriteStallCondition::kNormal);
+       ++i) {
+    std::string str = kWriteStallConditionToHyphenString[i];
+    ASSERT_TRUE(!str.empty())
+        << "Please ensure mapping from `WriteStallCondition` to "
+           "`kWriteStallConditionToHyphenString` is complete";
+  }
+
+  for (uint32_t i = 0; i < static_cast<uint32_t>(WriteStallCause::kNone); ++i) {
+    for (uint32_t j = 0;
+         j < static_cast<uint32_t>(WriteStallCondition::kNormal); ++j) {
+      WriteStallCause cause = static_cast<WriteStallCause>(i);
+      WriteStallCondition condition = static_cast<WriteStallCondition>(j);
+
+      if (isCFScopeWriteStallCause(cause)) {
+        ASSERT_TRUE(InternalCFStat(cause, condition) !=
+                    InternalStats::INTERNAL_CF_STATS_ENUM_MAX)
+            << "Please ensure the combination of WriteStallCause(" +
+                   std::to_string(static_cast<uint32_t>(cause)) +
+                   ") + WriteStallCondition(" +
+                   std::to_string(static_cast<uint32_t>(condition)) +
+                   ") is correctly mapped to a valid `InternalStats` or bypass "
+                   "its check in this test";
+      } else if (isDBScopeWriteStallCause(cause)) {
+        InternalStats::InternalDBStatsType internal_db_stat =
+            InternalDBStat(cause, condition);
+        if (internal_db_stat == InternalStats::kIntStatsNumMax) {
+          ASSERT_TRUE(cause == WriteStallCause::kWriteBufferManagerLimit &&
+                      condition == WriteStallCondition::kDelayed)
+              << "Please ensure the combination of WriteStallCause(" +
+                     std::to_string(static_cast<uint32_t>(cause)) +
+                     ") + WriteStallCondition(" +
+                     std::to_string(static_cast<uint32_t>(condition)) +
+                     ") is correctly mapped to a valid `InternalStats` or "
+                     "bypass its check in this test";
+        }
+      } else if (cause != WriteStallCause::kCFScopeWriteStallCauseEnumMax &&
+                 cause != WriteStallCause::kDBScopeWriteStallCauseEnumMax) {
+        ASSERT_TRUE(false) << "Please ensure the WriteStallCause(" +
+                                  std::to_string(static_cast<uint32_t>(cause)) +
+                                  ") is either CF-scope or DB-scope write "
+                                  "stall cause in enum `WriteStallCause`";
+      }
+    }
+  }
+}
+TEST_F(DBPropertiesTest, GetMapPropertyWriteStallStats) {
+  Options options = CurrentOptions();
+  CreateAndReopenWithCF({"heavy_write_cf"}, options);
+
+  for (auto test_cause : {WriteStallCause::kWriteBufferManagerLimit,
+                          WriteStallCause::kMemtableLimit}) {
+    if (test_cause == WriteStallCause::kWriteBufferManagerLimit) {
+      options.write_buffer_manager.reset(
+          new WriteBufferManager(100000, nullptr, true));
+    } else if (test_cause == WriteStallCause::kMemtableLimit) {
+      options.max_write_buffer_number = 2;
+      options.disable_auto_compactions = true;
+    }
+    ReopenWithColumnFamilies({"default", "heavy_write_cf"}, options);
+
+    // Assert initial write stall stats are all 0
+    std::map<std::string, std::string> db_values;
+    ASSERT_TRUE(dbfull()->GetMapProperty(DB::Properties::kDBWriteStallStats,
+                                         &db_values));
+    ASSERT_EQ(std::stoi(db_values[WriteStallStatsMapKeys::CauseConditionCount(
+                  WriteStallCause::kWriteBufferManagerLimit,
+                  WriteStallCondition::kStopped)]),
+              0);
+
+    for (int cf = 0; cf <= 1; ++cf) {
+      std::map<std::string, std::string> cf_values;
+      ASSERT_TRUE(dbfull()->GetMapProperty(
+          handles_[cf], DB::Properties::kCFWriteStallStats, &cf_values));
+      ASSERT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalStops()]), 0);
+      ASSERT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalDelays()]), 0);
+    }
+
+    // Pause flush thread to help coerce write stall
+    std::unique_ptr<test::SleepingBackgroundTask> sleeping_task(
+        new test::SleepingBackgroundTask());
+    env_->SetBackgroundThreads(1, Env::HIGH);
+    env_->Schedule(&test::SleepingBackgroundTask::DoSleepTask,
+                   sleeping_task.get(), Env::Priority::HIGH);
+    sleeping_task->WaitUntilSleeping();
+
+    // Coerce write stall
+    if (test_cause == WriteStallCause::kWriteBufferManagerLimit) {
+      ASSERT_OK(dbfull()->Put(
+          WriteOptions(), handles_[1], Key(1),
+          DummyString(options.write_buffer_manager->buffer_size())));
+
+      WriteOptions wo;
+      wo.no_slowdown = true;
+      Status s = dbfull()->Put(
+          wo, handles_[1], Key(2),
+          DummyString(options.write_buffer_manager->buffer_size()));
+      ASSERT_TRUE(s.IsIncomplete());
+      ASSERT_TRUE(s.ToString().find("Write stall") != std::string::npos);
+    } else if (test_cause == WriteStallCause::kMemtableLimit) {
+      FlushOptions fo;
+      fo.allow_write_stall = true;
+      fo.wait = false;
+
+      ASSERT_OK(
+          dbfull()->Put(WriteOptions(), handles_[1], Key(1), DummyString(1)));
+      ASSERT_OK(dbfull()->Flush(fo, handles_[1]));
+
+      ASSERT_OK(
+          dbfull()->Put(WriteOptions(), handles_[1], Key(2), DummyString(1)));
+      ASSERT_OK(dbfull()->Flush(fo, handles_[1]));
+    }
+
+    if (test_cause == WriteStallCause::kWriteBufferManagerLimit) {
+      db_values.clear();
+      EXPECT_TRUE(dbfull()->GetMapProperty(DB::Properties::kDBWriteStallStats,
+                                           &db_values));
+      EXPECT_EQ(std::stoi(db_values[WriteStallStatsMapKeys::CauseConditionCount(
+                    WriteStallCause::kWriteBufferManagerLimit,
+                    WriteStallCondition::kStopped)]),
+                1);
+      // `WriteStallCause::kWriteBufferManagerLimit` should not result in any
+      // CF-scope write stall stats changes
+      for (int cf = 0; cf <= 1; ++cf) {
+        std::map<std::string, std::string> cf_values;
+        EXPECT_TRUE(dbfull()->GetMapProperty(
+            handles_[cf], DB::Properties::kCFWriteStallStats, &cf_values));
+        EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalStops()]),
+                  0);
+        EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalDelays()]),
+                  0);
+      }
+    } else if (test_cause == WriteStallCause::kMemtableLimit) {
+      for (int cf = 0; cf <= 1; ++cf) {
+        std::map<std::string, std::string> cf_values;
+        EXPECT_TRUE(dbfull()->GetMapProperty(
+            handles_[cf], DB::Properties::kCFWriteStallStats, &cf_values));
+        EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalStops()]),
+                  cf == 1 ? 1 : 0);
+        EXPECT_EQ(
+            std::stoi(cf_values[WriteStallStatsMapKeys::CauseConditionCount(
+                WriteStallCause::kMemtableLimit,
+                WriteStallCondition::kStopped)]),
+            cf == 1 ? 1 : 0);
+        EXPECT_EQ(std::stoi(cf_values[WriteStallStatsMapKeys::TotalDelays()]),
+                  0);
+        EXPECT_EQ(
+            std::stoi(cf_values[WriteStallStatsMapKeys::CauseConditionCount(
+                WriteStallCause::kMemtableLimit,
+                WriteStallCondition::kDelayed)]),
+            0);
+      }
+    }
+
+    sleeping_task->WakeUp();
+    sleeping_task->WaitUntilDone();
+  }
+}
+
 namespace {
 std::string PopMetaIndexKey(InternalIterator* meta_iter) {
   Status s = meta_iter->status();
```