
Change-Id: I747e1c4eef4f147977da40eafceb9fec5d869a1c
Signed-off-by: Venky Shankar <[email protected]>
This patch introduces rotational buffers (rot-buffs) aimed at the
classic multiple-producer, multiple-consumer problem. A fixed set
of buffer lists is allocated during initialization, where each
list consists of a list of buffers. Each buffer is an iovec
pointing to a memory region of a fixed allocation size. Multiple
producers write data to these buffers. A buffer list starts with
a single buffer (iovec) and allocates more when required (although
this could be preallocated in multiples of k).

rot-buffs allow multiple producers to write data in parallel at
the small extra cost of taking locks, and are therefore best
suited for large writes. Producers write to a buffer in parallel
by "reserving" write space for a selected number of bytes and
getting back a pointer to the start of the reserved area. The
write size is chosen by the producer before it starts the write
(and is usually known in advance). Therefore, the write itself
need not be serialized -- only the space reservation needs to be
done safely.
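
The reservation step can be sketched roughly as below -- a minimal
illustration assuming a pthread mutex per buffer list; the structure
and function names are illustrative, not the actual rot-buffs symbols:

    #include <pthread.h>
    #include <stddef.h>

    struct buf_list {
            pthread_mutex_t lock;
            pthread_cond_t  drained;  /* used by the consumer drain (below) */
            char           *mem;      /* region backing the current iovec   */
            size_t          size;     /* total capacity                     */
            size_t          used;     /* bytes reserved so far              */
            int             pending;  /* reserved-but-incomplete writes     */
    };

    /* Reserve 'len' bytes; only this step is serialized. */
    static char *
    rbuf_reserve (struct buf_list *bl, size_t len)
    {
            char *ptr = NULL;

            pthread_mutex_lock (&bl->lock);
            if (bl->used + len <= bl->size) {
                    ptr = bl->mem + bl->used;
                    bl->used += len;
                    bl->pending++;
            }
            /* a fuller version would allocate another iovec when full */
            pthread_mutex_unlock (&bl->lock);

            return ptr;  /* producer copies into ptr with no lock held */
    }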

The other part is when a consumer kicks in to consume what has
been produced. At this point, a buffer list switch is performed.
The "current" buffer list pointer is safely pointed to the next
available buffer list, and new writes are directed to the just
switched buffer list (the old buffer list is now considered out
of rotation). Note that the old buffer list may still have
producers in progress (pending writes), so the consumer has to
wait until the writers are drained. Currently this is the slow
path for producers (write completion) and needs to be improved.
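
Extending the sketch above, the switch-and-drain could look like
the following; the rbuf structure and the condition-variable
handshake are assumptions for illustration, not the patch's code:

    struct rbuf {
            pthread_mutex_t  lock;     /* guards the "current" index    */
            struct buf_list *lists;    /* fixed set allocated at init   */
            int              nlists;
            int              current;  /* index of the in-rotation list */
    };

    /* writer-side completion (the slow path mentioned above) */
    static void
    rbuf_write_complete (struct buf_list *bl)
    {
            pthread_mutex_lock (&bl->lock);
            if (--bl->pending == 0)
                    pthread_cond_signal (&bl->drained);
            pthread_mutex_unlock (&bl->lock);
    }

    /* consumer-side switch: redirect new writes, drain the old list */
    static struct buf_list *
    rbuf_switch (struct rbuf *rb)
    {
            struct buf_list *old;

            pthread_mutex_lock (&rb->lock);
            old = &rb->lists[rb->current];
            rb->current = (rb->current + 1) % rb->nlists;
            pthread_mutex_unlock (&rb->lock);

            pthread_mutex_lock (&old->lock);
            while (old->pending > 0)    /* wait for in-flight writers */
                    pthread_cond_wait (&old->drained, &old->lock);
            pthread_mutex_unlock (&old->lock);

            return old;  /* out of rotation; safe to consume */
    }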

Currently, there is special handling for the case where the number
of consumers matches (or exceeds) the number of producers, which
could result in writer starvation. In this scenario, when a
consumer requests a buffer list for consumption, a check is
performed for writer starvation and consumption is denied until
at least one other buffer list is available to producers for
writes, i.e., until one (or more) consumer(s) has completed,
thereby putting a buffer list back in rotation.
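
The guard itself reduces to a simple check; the "in_rotation"
counter below is illustrative bookkeeping, not a field from the
patch:

    static int
    rbuf_consume_allowed (struct rbuf *rb)
    {
            int ok;

            pthread_mutex_lock (&rb->lock);
            /* number of buffer lists producers can currently write to */
            ok = (rb->in_rotation > 1);
            pthread_mutex_unlock (&rb->lock);

            /* deny consumption if only one writable list would remain */
            return ok;
    }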

[
   NOTE:
   I've not performance tested this producer-consumer model
   yet, but I am using it in changelog for event notification.
   The list of buffers (iovecs) is passed directly to the RPC
   layer.

   Before performance tests are done, the slow path needs
   to be improved, and some other safety checks need to be
   added, such as handling write sizes that exceed the iovec
   allocation size.
]

Change-Id: I88d235522b05ab82509aba861374a2312bff57f2
Signed-off-by: Venky Shankar <[email protected]>
This patch imports the timer-wheel [1] algorithm from the Linux
kernel (~/kernel/time/timer.c) with some modifications.

Timer-wheel is an efficient way to track millions of timers for
expiry. It is a variant of the simple but RAM-heavy approach of
keeping a list (timer bucket) for every future second.
Timer-wheel instead categorizes future expiry times into a
logarithmic array of arrays. This is done by splitting the 32-bit
"timeout" value into fixed slices of bits, so that each category
has a fixed-size array to which buckets are assigned.

A classic split is 8+6+6+6+6 (used in this patch), which results
in 256+64+64+64+64 == 512 buckets. Therefore, the entire 32-bit
range of future timeouts is mapped into 512 buckets.

[
   NOTE:
     There are other possible splits, such as "8+8+8+8", but
     this patch sticks to the widely used and tested default.
]

Therefore, the first category "holds" timers whose expiry range
is between 1..256, the next category holds 257..16384, the third
category 16385..1048576, and so on. When timers are added, unless
they fall in the first category, timers with different timeouts
can end up in the same bucket. This means that the timers are
"partially sorted" -- sorted on their highest bits.
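
The index calculation implied by this layout can be sketched as
below, mirroring the kernel's __internal_add_timer() (constants per
the 8+6+6+6+6 split; levels 3 and 4 elided for brevity):

    #define TVR_BITS 8
    #define TVR_SIZE (1 << TVR_BITS)   /* 256 buckets, level 0    */
    #define TVN_BITS 6
    #define TVN_SIZE (1 << TVN_BITS)   /* 64 buckets, levels 1..4 */

    /* flat bucket index for a timer expiring at 'expires' (ticks) */
    static unsigned int
    bucket_for (unsigned long expires, unsigned long now)
    {
            unsigned long delta = expires - now;

            if (delta < TVR_SIZE)                           /* 1..256 */
                    return expires & (TVR_SIZE - 1);
            if (delta < (1UL << (TVR_BITS + TVN_BITS)))     /* ..16384 */
                    return TVR_SIZE +
                           ((expires >> TVR_BITS) & (TVN_SIZE - 1));
            if (delta < (1UL << (TVR_BITS + 2 * TVN_BITS))) /* ..1048576 */
                    return TVR_SIZE + TVN_SIZE +
                           ((expires >> (TVR_BITS + TVN_BITS)) &
                            (TVN_SIZE - 1));
            /* levels 3 and 4 follow the same pattern */
            return 0;
    }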

The expiry code walks the first array of buckets and expires any
pending timers (1..256). Next, at time value 257, the timers in
the first bucket of the second array are "cascaded" onto the
first category and placed into their respective buckets according
to their timeout values. Cascading "brings down" a timer's
timeout to the correct bucket of its respective category.
Therefore, timers are sorted by the highest bits of their timeout
value first and then by the lower bits as well.
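
Cascading itself is compact. The sketch below follows the kernel's
cascade(), using kernel-style list primitives (list_replace_init(),
list_for_each_entry_safe()), so treat the names as indicative
rather than exact:

    static void
    cascade (struct tvec_base *base, struct tvec *tv, int index)
    {
            struct timer_list *timer, *tmp;
            struct list_head tv_list;

            /* detach the whole bucket at this level... */
            list_replace_init (tv->vec + index, &tv_list);

            /* ...and re-add every timer: internal_add_timer()
             * recomputes the level and bucket from the lower
             * timeout bits */
            list_for_each_entry_safe (timer, tmp, &tv_list, entry)
                    internal_add_timer (base, timer);
    }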

[1] https://lwn.net/Articles/152436/

Change-Id: I1219abf69290961ae9a3d483e11c107c5f49c4e3
Signed-off-by: Venky Shankar <[email protected]>
This patch introduces RPC-based communication between the changelog
translator and libgfchangelog. It replaces the old, rather pathetic
stream-based interaction that existed earlier (due to time
constraints :-/).

Changelog, upon initialization, starts an RPC server (rpcsvc)
allowing clients to invoke a probe API as a boot-up mechanism to
request event notifications. During probe, clients can choose an
event filter specifying the type(s) of events they are interested
in. As of now there is no way to change the event notification
set once the probe RPC call is made, but that would be easy to
implement.
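
As a rough illustration of the probe handshake (all names below
are hypothetical, not taken from the patch):

    #include <stdio.h>

    enum changelog_ev_type {
            CHANGELOG_EV_CREATE = 1 << 0,
            CHANGELOG_EV_WRITE  = 1 << 1,
            CHANGELOG_EV_RENAME = 1 << 2,
    };

    struct changelog_probe_req {
            unsigned int filter;     /* OR-ed event types of interest    */
            char         sock[256];  /* client endpoint for connect-back */
    };

    static void
    fill_probe (struct changelog_probe_req *req, const char *sockpath)
    {
            /* interested in writes and renames only; with this patch
             * the filter cannot be changed once the probe is made */
            req->filter = CHANGELOG_EV_WRITE | CHANGELOG_EV_RENAME;
            snprintf (req->sock, sizeof (req->sock), "%s", sockpath);
    }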

The actual event notifications are done on a separate RPC session.
The client (libgfchangelog) itself starts an RPC server to which
the changelog translator "connects back" during probe.
Notifications are dispatched by a bunch of threads on the server
(translator) side, and the client optionally orders them if
ordered notifications are required. FOPs fill in their respective
event details in a buffer (rot-buffs, to be particular) and a
bunch of threads (consumers) swap the buffers out of rotation and
dispatch them via RPC. To avoid writer starvation, the number of
dispatcher threads is one less than the number of buffer lists in
rot-buffs.
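
That sizing rule is trivial but worth spelling out (illustrative
name):

    /* One dispatcher less than the rot-buffs buffer lists: even
     * with every dispatcher mid-consumption, producers are always
     * left with at least one list in rotation. */
    static int
    dispatcher_count (int nr_buffer_lists)
    {
            return nr_buffer_lists - 1;
    }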

libgfchangelog becomes purely callback based -- upon event
notification from the server (and after re-ordering, if required),
it invokes a callback routine specified by the consumer.
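
From the consumer's point of view this looks roughly like the
following; the registration symbol is hypothetical, not the actual
libgfchangelog API:

    struct changelog_event {
            unsigned int type;  /* one of changelog_ev_type */
            /* ... event payload ... */
    };

    /* invoked per event, after optional re-ordering by the library */
    typedef void (*changelog_cbk_t) (struct changelog_event *ev,
                                     void *data);

    static void
    process_event (struct changelog_event *ev, void *data)
    {
            /* consumer logic goes here */
    }

    /* hypothetical registration during consumer init:
     *     gf_changelog_register_callback (process_event, NULL);
     */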

A major part of the patch is also aimed at providing backward
compatibility for geo-replication, which was one of the main
consumers of the stream-based API. Also, this patch does not
"turn on" event notifications for all FOPs, just the bunch that
is currently required. Another pain point is that the server does
not filter events before dispatching them to the clients. That
load is taken up by the client itself (although it's done at the
library layer rather than making it hard on the callback
implementor). This needs improvement, and care needs to be taken
not to load the server up with expensive filtering mechanisms.
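
The library-side filtering amounts to a drop before dispatch,
along these (again hypothetical) lines:

    /* applied in the library, before the consumer's callback runs */
    static void
    deliver (struct changelog_event *ev, changelog_cbk_t cbk,
             unsigned int filter, void *data)
    {
            if (!(ev->type & filter))
                    return;  /* server sent it; the library drops it */
            cbk (ev, data);
    }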

Change-Id: Ibf60a432b68f2dfa60c6f9add2bcfd37a9c41395
Signed-off-by: Venky Shankar <[email protected]>
Bitrot stub implements the object versioning required for
identifying signature freshness. More details about versioning
are explained as part of the "bitrot feature doc" patch.

Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9
Signed-off-by: Venky Shankar <[email protected]>
* Implement the skeleton of bit-rot xlator

Change-Id: If33218bdc694f5f09cb7b8097c4fdb74d7a23b2d

Original-Author: Raghavendra Bhat  <[email protected]>
Signed-off-by:   Venky Shankar     <[email protected]>
Signed-off-by:   Gaurav Kumar Garg <[email protected]>
Signed-off-by:   Anand nekkunti    <[email protected]>
gaurav36 pushed a commit that referenced this pull request Apr 13, 2015
Coverity CID 1288822 (#1 of 2)

strncpy() executed with a limit equal to the target array size
potentially leaves the target string not null-terminated.

In this case the strncpy() is not needed at all, due to the
snprintf() with the same target buffer which follows immediately.
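
To illustrate the defect class and the safe pattern (a generic
example, not the patched code):

    char dst[256];

    /* defect: if strlen (src) >= sizeof (dst), no terminating
     * NUL is written */
    strncpy (dst, src, sizeof (dst));

    /* safe: truncates if necessary, but always NUL-terminates */
    snprintf (dst, sizeof (dst), "%s", src);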

This patch also removes the now-unneeded scratch_dir argument to
gf_changelog_init_history(), which is semantically correct, since
scratch_dir has previously been filled into jnl->jnl_working_dir
by the caller, and that is now used to fill hist_scratch_dir.

Change-Id: Ib1ed3a1058e80e34191758921b49c29030d6c9db
BUG: 789278
Signed-off-by: Michael Adam <[email protected]>
Reviewed-on: http://review.gluster.org/10058
Reviewed-by: Kotresh HR <[email protected]>
Tested-by: Gluster Build System <[email protected]>
Reviewed-by: Vijay Bellur <[email protected]>