
Change-Id: I747e1c4eef4f147977da40eafceb9fec5d869a1c
Signed-off-by: Venky Shankar <[email protected]>
This patch introduces rotational buffers (rot-buffs) aimed at the
classic multiple-producer, multiple-consumer problem. A fixed set
of buffer lists is allocated during initialization, where each
list consists of a list of buffers. Each buffer is an iovec
pointing to a memory region of a fixed allocation size. Multiple
producers write data to these buffers. A buffer list starts with
a single buffer (iovec) and allocates more when required (although
this could be preallocated in multiples of k).

rot-buffs allow multiple producers to write data in parallel at
the small extra cost of taking locks, and are therefore best
suited for large writes. Producers write to a buffer in parallel
by "reserving" write space for a selected number of bytes and
getting back a pointer to the start of the reserved area. The
write size is chosen by the producer before it starts the write
(and is usually known in advance). Therefore, the write itself
need not be serialized -- only the space reservation needs to be
done safely.
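
The reservation step can be sketched roughly as below -- a minimal
illustration assuming a pthread mutex per buffer list; the structure
and function names are illustrative, not the actual rot-buffs symbols:

    #include <pthread.h>
    #include <stddef.h>

    struct buf_list {
            pthread_mutex_t lock;
            pthread_cond_t  drained;  /* used by the consumer drain (below) */
            char           *mem;      /* region backing the current iovec   */
            size_t          size;     /* total capacity                     */
            size_t          used;     /* bytes reserved so far              */
            int             pending;  /* reserved-but-incomplete writes     */
    };

    /* Reserve 'len' bytes; only this step is serialized. */
    static char *
    rbuf_reserve (struct buf_list *bl, size_t len)
    {
            char *ptr = NULL;

            pthread_mutex_lock (&bl->lock);
            if (bl->used + len <= bl->size) {
                    ptr = bl->mem + bl->used;
                    bl->used += len;
                    bl->pending++;
            }
            /* a fuller version would allocate another iovec when full */
            pthread_mutex_unlock (&bl->lock);

            return ptr;  /* producer copies into ptr with no lock held */
    }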

The other part is when a consumer kicks in to consume what has
been produced. At this point, a buffer list switch is performed.
The "current" buffer list pointer is safely pointed to the next
available buffer list, and new writes are directed to the just
switched buffer list (the old buffer list is now considered out
of rotation). Note that the old buffer list may still have
producers in progress (pending writes), so the consumer has to
wait until the writers are drained. Currently this is the slow
path for producers (write completion) and needs to be improved.
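
Extending the sketch above, the switch-and-drain could look like
the following; the rbuf structure and the condition-variable
handshake are assumptions for illustration, not the patch's code:

    struct rbuf {
            pthread_mutex_t  lock;     /* guards the "current" index    */
            struct buf_list *lists;    /* fixed set allocated at init   */
            int              nlists;
            int              current;  /* index of the in-rotation list */
    };

    /* writer-side completion (the slow path mentioned above) */
    static void
    rbuf_write_complete (struct buf_list *bl)
    {
            pthread_mutex_lock (&bl->lock);
            if (--bl->pending == 0)
                    pthread_cond_signal (&bl->drained);
            pthread_mutex_unlock (&bl->lock);
    }

    /* consumer-side switch: redirect new writes, drain the old list */
    static struct buf_list *
    rbuf_switch (struct rbuf *rb)
    {
            struct buf_list *old;

            pthread_mutex_lock (&rb->lock);
            old = &rb->lists[rb->current];
            rb->current = (rb->current + 1) % rb->nlists;
            pthread_mutex_unlock (&rb->lock);

            pthread_mutex_lock (&old->lock);
            while (old->pending > 0)    /* wait for in-flight writers */
                    pthread_cond_wait (&old->drained, &old->lock);
            pthread_mutex_unlock (&old->lock);

            return old;  /* out of rotation; safe to consume */
    }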

Currently, there is special handling for the case where the number
of consumers matches (or exceeds) the number of producers, which
could result in writer starvation. In this scenario, when a
consumer requests a buffer list for consumption, a check is
performed for writer starvation and consumption is denied until
at least one other buffer list is available to producers for
writes, i.e., until one (or more) consumer(s) has completed,
thereby putting a buffer list back in rotation.
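
The guard itself reduces to a simple check; the "in_rotation"
counter below is illustrative bookkeeping, not a field from the
patch:

    static int
    rbuf_consume_allowed (struct rbuf *rb)
    {
            int ok;

            pthread_mutex_lock (&rb->lock);
            /* number of buffer lists producers can currently write to */
            ok = (rb->in_rotation > 1);
            pthread_mutex_unlock (&rb->lock);

            /* deny consumption if only one writable list would remain */
            return ok;
    }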

[
   NOTE:
   I've not performance tested this producer-consumer model
   yet, but I am using it in changelog for event notification.
   The list of buffers (iovecs) is passed directly to the RPC
   layer.

   Before performance tests are done, the slow path needs
   to be improved, and some other safety checks need to be
   added, such as handling write sizes that exceed the iovec
   allocation size.
]

Change-Id: I88d235522b05ab82509aba861374a2312bff57f2
Signed-off-by: Venky Shankar <[email protected]>
This patch imports the timer-wheel [1] algorithm from the Linux
kernel (~/kernel/time/timer.c) with some modifications.

Timer-wheel is an efficient way to track millions of timers for
expiry. It is a variant of the simple but RAM-heavy approach of
keeping a list (timer bucket) for every future second.
Timer-wheel instead categorizes future expiry times into a
logarithmic array of arrays. This is done by splitting the 32-bit
"timeout" value into fixed slices of bits, so that each category
has a fixed-size array to which buckets are assigned.

A classic split is 8+6+6+6+6 (used in this patch), which results
in 256+64+64+64+64 == 512 buckets. Therefore, the entire 32-bit
range of future timeouts is mapped into 512 buckets.

[
   NOTE:
     There are other possible splits, such as "8+8+8+8", but
     this patch sticks to the widely used and tested default.
]

Therefore, the first category "holds" timers whose expiry range
is between 1..256, the next category holds 257..16384, the third
category 16385..1048576, and so on. When timers are added, unless
they fall in the first category, timers with different timeouts
can end up in the same bucket. This means that the timers are
"partially sorted" -- sorted on their highest bits.
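
The index calculation implied by this layout can be sketched as
below, mirroring the kernel's __internal_add_timer() (constants per
the 8+6+6+6+6 split; levels 3 and 4 elided for brevity):

    #define TVR_BITS 8
    #define TVR_SIZE (1 << TVR_BITS)   /* 256 buckets, level 0    */
    #define TVN_BITS 6
    #define TVN_SIZE (1 << TVN_BITS)   /* 64 buckets, levels 1..4 */

    /* flat bucket index for a timer expiring at 'expires' (ticks) */
    static unsigned int
    bucket_for (unsigned long expires, unsigned long now)
    {
            unsigned long delta = expires - now;

            if (delta < TVR_SIZE)                           /* 1..256 */
                    return expires & (TVR_SIZE - 1);
            if (delta < (1UL << (TVR_BITS + TVN_BITS)))     /* ..16384 */
                    return TVR_SIZE +
                           ((expires >> TVR_BITS) & (TVN_SIZE - 1));
            if (delta < (1UL << (TVR_BITS + 2 * TVN_BITS))) /* ..1048576 */
                    return TVR_SIZE + TVN_SIZE +
                           ((expires >> (TVR_BITS + TVN_BITS)) &
                            (TVN_SIZE - 1));
            /* levels 3 and 4 follow the same pattern */
            return 0;
    }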

The expiry code walks the first array of buckets and expires any
pending timers (1..256). Next, at time value 257, the timers in
the first bucket of the second array are "cascaded" onto the
first category and placed into their respective buckets according
to their timeout values. Cascading "brings down" a timer's
timeout to the correct bucket of its respective category.
Therefore, timers are sorted by the highest bits of their timeout
value first and then by the lower bits as well.
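
Cascading itself is compact. The sketch below follows the kernel's
cascade(), using kernel-style list primitives (list_replace_init(),
list_for_each_entry_safe()), so treat the names as indicative
rather than exact:

    static void
    cascade (struct tvec_base *base, struct tvec *tv, int index)
    {
            struct timer_list *timer, *tmp;
            struct list_head tv_list;

            /* detach the whole bucket at this level... */
            list_replace_init (tv->vec + index, &tv_list);

            /* ...and re-add every timer: internal_add_timer()
             * recomputes the level and bucket from the lower
             * timeout bits */
            list_for_each_entry_safe (timer, tmp, &tv_list, entry)
                    internal_add_timer (base, timer);
    }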

[1] https://lwn.net/Articles/152436/

Change-Id: I1219abf69290961ae9a3d483e11c107c5f49c4e3
Signed-off-by: Venky Shankar <[email protected]>
This patch introduces RPC-based communication between the changelog
translator and libgfchangelog. It replaces the old, rather pathetic
stream-based interaction that existed earlier (due to time
constraints :-/).

Changelog, upon initialization, starts an RPC server (rpcsvc)
allowing clients to invoke a probe API as a boot-up mechanism to
request event notifications. During probe, clients can choose an
event filter specifying the type(s) of events they are interested
in. As of now there is no way to change the event notification
set once the probe RPC call is made, but that would be easy to
implement.
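
As a rough illustration of the probe handshake (all names below
are hypothetical, not taken from the patch):

    #include <stdio.h>

    enum changelog_ev_type {
            CHANGELOG_EV_CREATE = 1 << 0,
            CHANGELOG_EV_WRITE  = 1 << 1,
            CHANGELOG_EV_RENAME = 1 << 2,
    };

    struct changelog_probe_req {
            unsigned int filter;     /* OR-ed event types of interest    */
            char         sock[256];  /* client endpoint for connect-back */
    };

    static void
    fill_probe (struct changelog_probe_req *req, const char *sockpath)
    {
            /* interested in writes and renames only; with this patch
             * the filter cannot be changed once the probe is made */
            req->filter = CHANGELOG_EV_WRITE | CHANGELOG_EV_RENAME;
            snprintf (req->sock, sizeof (req->sock), "%s", sockpath);
    }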

The actual event notifications are done on a separate RPC session.
The client (libgfchangelog) itself starts an RPC server to which
the changelog translator "connects back" during probe.
Notifications are dispatched by a bunch of threads on the server
(translator) side, and the client optionally orders them if
ordered notifications are required. FOPs fill in their respective
event details in a buffer (rot-buffs, to be particular) and a
bunch of threads (consumers) swap the buffers out of rotation and
dispatch them via RPC. To avoid writer starvation, the number of
dispatcher threads is one less than the number of buffer lists in
rot-buffs.
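
That sizing rule is trivial but worth spelling out (illustrative
name):

    /* One dispatcher less than the rot-buffs buffer lists: even
     * with every dispatcher mid-consumption, producers are always
     * left with at least one list in rotation. */
    static int
    dispatcher_count (int nr_buffer_lists)
    {
            return nr_buffer_lists - 1;
    }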

libgfchangelog becomes purely callback based -- upon event
notification from the server (and after re-ordering, if required),
it invokes a callback routine specified by the consumer.
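
From the consumer's point of view this looks roughly like the
following; the registration symbol is hypothetical, not the actual
libgfchangelog API:

    struct changelog_event {
            unsigned int type;  /* one of changelog_ev_type */
            /* ... event payload ... */
    };

    /* invoked per event, after optional re-ordering by the library */
    typedef void (*changelog_cbk_t) (struct changelog_event *ev,
                                     void *data);

    static void
    process_event (struct changelog_event *ev, void *data)
    {
            /* consumer logic goes here */
    }

    /* hypothetical registration during consumer init:
     *     gf_changelog_register_callback (process_event, NULL);
     */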

A major part of the patch is also aimed at providing backward
compatibility for geo-replication, which was one of the main
consumers of the stream-based API. Also, this patch does not
"turn on" event notifications for all FOPs, just the bunch that
is currently required. Another pain point is that the server does
not filter events before dispatching them to the clients. That
load is taken up by the client itself (although it's done at the
library layer rather than making it hard on the callback
implementor). This needs improvement, and care needs to be taken
not to load the server up with expensive filtering mechanisms.
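
The library-side filtering amounts to a drop before dispatch,
along these (again hypothetical) lines:

    /* applied in the library, before the consumer's callback runs */
    static void
    deliver (struct changelog_event *ev, changelog_cbk_t cbk,
             unsigned int filter, void *data)
    {
            if (!(ev->type & filter))
                    return;  /* server sent it; the library drops it */
            cbk (ev, data);
    }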

Change-Id: Ibf60a432b68f2dfa60c6f9add2bcfd37a9c41395
Signed-off-by: Venky Shankar <[email protected]>
Bitrot stub implements the object versioning required for
identifying signature freshness. More details about versioning
are explained as part of the "bitrot feature doc" patch.

Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9
Signed-off-by: Venky Shankar <[email protected]>
* Implement the skeleton of bit-rot xlator

Change-Id: If33218bdc694f5f09cb7b8097c4fdb74d7a23b2d

Original-Author: Raghavendra Bhat  <[email protected]>
Signed-off-by:   Venky Shankar     <[email protected]>
Signed-off-by:   Gaurav Kumar Garg <[email protected]>
Signed-off-by:   Anand nekkunti    <[email protected]>
gaurav36 pushed a commit that referenced this pull request Apr 13, 2015
Coverity CID 1288822 (#1 of 2)

strncpy() executed with a limit equal to the target array size
potentially leaves the target string not null-terminated.

In this case the strncpy() is not needed at all, due to the
snprintf() with the same target buffer which follows immediately.
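
To illustrate the defect class and the safe pattern (a generic
example, not the patched code):

    char dst[256];

    /* defect: if strlen (src) >= sizeof (dst), no terminating
     * NUL is written */
    strncpy (dst, src, sizeof (dst));

    /* safe: truncates if necessary, but always NUL-terminates */
    snprintf (dst, sizeof (dst), "%s", src);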

This patch also removes the now-unneeded scratch_dir argument to
gf_changelog_init_history(), which is semantically correct, since
scratch_dir has previously been filled into jnl->jnl_working_dir
by the caller, and that is now used to fill hist_scratch_dir.

Change-Id: Ib1ed3a1058e80e34191758921b49c29030d6c9db
BUG: 789278
Signed-off-by: Michael Adam <[email protected]>
Reviewed-on: http://review.gluster.org/10058
Reviewed-by: Kotresh HR <[email protected]>
Tested-by: Gluster Build System <[email protected]>
Reviewed-by: Vijay Bellur <[email protected]>