I<watcher> is the number of event watchers created/destroyed. Since
different event models differ vastly in performance, each event loop was
given a number of watchers chosen so that the overall runtime is acceptable
and similar between the tested event loops (and so that they do not crash):
Glib would probably take thousands of years if asked to process the same
number of watchers as EV in this benchmark.
I<bytes> is the number of bytes (as measured by the resident set size,
RSS) consumed by each watcher. This method of measuring captures both C
and Perl-based overheads.
I<create> is the time, in microseconds (millionths of seconds), that it
takes to create a single watcher. The callback is a closure shared between
all watchers, to avoid adding memory overhead. That means closure creation
and memory usage are not included in the figures.
I<invoke> is the time, in microseconds, used to invoke a simple
callback. The callback simply counts down a Perl variable and, after it has
been invoked I<watcher> times, calls C<< ->send >> on a condvar once to
signal the end of this phase.
I<destroy> is the time, in microseconds, that it takes to destroy a single
watcher.
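The per-watcher figures can be thought of as the elapsed time of each phase divided by the number of watchers. The following is only a minimal sketch of that arithmetic, not the benchmark's actual code: C<per_item_us> is a hypothetical helper, plain hashes stand in for real watchers, and the C<$done> flag stands in for the condvar's C<< ->send >>.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run $code once and return the elapsed wall-clock
# time divided by $count, in microseconds.
sub per_item_us {
    my ($count, $code) = @_;
    my $start = time;
    $code->();
    return (time - $start) * 1e6 / $count;
}

my $count     = 10_000;   # assumption; the benchmark uses 2000 to 400000
my $remaining = $count;
my $done      = 0;        # stand-in for sending the condvar

# One closure shared by all stand-in watchers, so closure creation is
# not part of the measured cost.
my $cb = sub { $done = 1 unless --$remaining };

my @watchers;
my $create  = per_item_us($count, sub { @watchers = map { +{ cb => $cb } } 1 .. $count });
my $invoke  = per_item_us($count, sub { $_->{cb}->() for @watchers });
my $destroy = per_item_us($count, sub { @watchers = () });

printf "create %.2f invoke %.2f destroy %.2f (microseconds per item)\n",
    $create, $invoke, $destroy;
```

Because the stand-in watchers involve no event loop, the numbers this prints say nothing about any of the event models in the table; only the structure of the measurement carries over.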
=head3 Results
 name        watchers bytes create invoke destroy comment
 EV/EV         400000   244   0.56   0.46    0.31 EV native interface
 EV/Any        100000   244   2.50   0.46    0.29 EV + AnyEvent watchers
 CoroEV/Any    100000   244   2.49   0.44    0.29 coroutines + Coro::Signal
 Perl/Any      100000   513   4.92   0.87    1.12 pure perl implementation
 Event/Event    16000   516  31.88  31.30    0.85 Event native interface
 Event/Any      16000   590  35.75  31.42    1.08 Event + AnyEvent watchers
 Glib/Any       16000  1357  98.22  12.41   54.00 quadratic behaviour
 Tk/Any          2000  1860  26.97  67.98   14.00 SEGV with >> 2000 watchers
 POE/Event       2000  6644 108.64 736.02   14.73 via POE::Loop::Event
 POE/Select      2000  6343  94.13 809.12  565.96 via POE::Loop::Select
=head3 Discussion
The benchmark does I<not> measure scalability of the event loop very
well. For example, a select-based event loop (such as the pure perl one)
can never compete with an event loop that uses epoll when the number of
file descriptors grows high. In this benchmark, all events become ready at
the same time, so select/poll-based implementations get an unnatural speed
boost.
Also, note that the number of watchers usually has a nonlinear effect on
overall speed, that is, creating twice as many watchers doesn't take twice
the time - usually it takes longer. This puts event loops tested with a
higher number of watchers at a disadvantage.
To put the range of results into perspective, consider that on the
benchmark machine, handling an event takes roughly 1600 CPU cycles with
EV, 3100 CPU cycles with AnyEvent's pure perl loop and almost 3000000 CPU
cycles with POE.
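These cycle counts appear to be consistent with the I<invoke> column multiplied by a clock rate of about 3.5 GHz. The text does not state the clock rate of the benchmark machine, so that value is an assumption inferred from the EV figures (1600 cycles / 0.46 microseconds). A small sketch of the conversion, with C<cycles> as a hypothetical helper:

```perl
use strict;
use warnings;

# ASSUMPTION: clock rate inferred from 1600 cycles / 0.46 us for EV;
# the benchmark machine's real clock rate is not given in the text.
my $ghz = 3.5;

# Invoke times in microseconds, copied from the results table.
my %invoke_us = (
    'EV/EV'      => 0.46,
    'Perl/Any'   => 0.87,
    'POE/Select' => 809.12,
);

# One microsecond at N GHz is N * 1000 cycles.
sub cycles {
    my ($us) = @_;
    return $us * $ghz * 1000;
}

printf "%-10s %10.0f cycles\n", $_, cycles($invoke_us{$_})
    for sort { $invoke_us{$a} <=> $invoke_us{$b} } keys %invoke_us;
```

Under that assumption the three results come out at about 1610, 3045 and 2.83 million cycles, which is in line with the rounded figures quoted above.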
C<EV> is the sole leader regarding speed and memory use: it is both the
fastest and the most memory-efficient. Even when going through AnyEvent, it
uses far less memory than any other event loop and is still faster than
Event natively.
The pure perl implementation hits a few sweet spots (both the
constant timeout and the use of a single fd hit optimisations in the perl
interpreter and the backend itself). Nevertheless this shows that it
adds very little overhead in itself. Like any select-based backend its
performance becomes really bad with lots of file descriptors (and few of
them active), of course, but this was not the subject of this benchmark.
The C<Event> module has a relatively high setup and callback invocation
cost, but overall scores in third place.
C<Glib>'s memory usage is quite a bit higher, but it features a
faster callback invocation and overall ends up in the same class as
C<Event>. However, Glib scales extremely badly: doubling the number of
watchers increases the processing time by more than a factor of four,
making it completely unusable with larger numbers of watchers
(note that only a single file descriptor was used in the benchmark, so
inefficiencies of C<poll> do not account for this).
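To see why a factor of four per doubling means at least quadratic growth, assume the processing time grows like C<n ** k> for C<n> watchers. The exponent C<k> can then be estimated from how the time changes when the watcher count is multiplied by some factor. This is a simplified model for illustration, and C<scaling_exponent> is a hypothetical helper:

```perl
use strict;
use warnings;

# If time grows like n ** k, multiplying the number of watchers by
# $n_factor multiplies the time by $n_factor ** k, therefore
# k = log(time factor) / log(watcher factor).
sub scaling_exponent {
    my ($n_factor, $time_factor) = @_;
    return log($time_factor) / log($n_factor);
}

printf "twice the watchers, twice the time:      k = %.2f\n", scaling_exponent(2, 2);
printf "twice the watchers, four times the time: k = %.2f\n", scaling_exponent(2, 4);
```

A time factor of more than four for a doubling therefore corresponds to C<k> greater than 2, which matches the "quadratic behaviour" comment in the results table.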
The C<Tk> adaptor works relatively well. The fact that it crashes with
more than 2000 watchers is a big setback, however, as correctness takes
precedence over speed. Nevertheless, its performance is surprising, as the
file descriptor is dup()ed for each watcher. This shows that the dup()
employed by some adaptors is not a big performance issue (it does incur a
hidden memory cost inside the kernel which is not reflected in the figures
above).
C<POE>, regardless of the underlying event loop (its pure perl
select-based backend or the Event module; the POE-EV backend couldn't
be tested because it wasn't working), shows abysmal performance and
memory usage with AnyEvent: watchers use almost 30 times as much memory
as EV watchers, and 10 times as much memory as Event (the high memory
requirements are caused by requiring a session for each watcher). Watcher
invocation is almost 900 times slower than with AnyEvent's pure perl
implementation.
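The ratios quoted here can be checked against the results table. The sketch below copies the relevant rows and divides them; which rows the text compares is an assumption (C<POE/Event> against C<EV/Any> and C<Event/Any> for memory, C<POE/Select> against C<Perl/Any> for invocation), and C<ratio> is a hypothetical helper.

```perl
use strict;
use warnings;

# [bytes per watcher, invoke time in microseconds], copied from the table.
my %row = (
    'EV/Any'     => [ 244,  0.46 ],
    'Perl/Any'   => [ 513,  0.87 ],
    'Event/Any'  => [ 590,  31.42 ],
    'POE/Event'  => [ 6644, 736.02 ],
    'POE/Select' => [ 6343, 809.12 ],
);

# How many times larger is column $col for $x than for $y?
sub ratio {
    my ($x, $y, $col) = @_;
    return $row{$x}[$col] / $row{$y}[$col];
}

printf "memory, POE/Event vs EV/Any:     %.1fx\n", ratio('POE/Event',  'EV/Any',    0);
printf "memory, POE/Event vs Event/Any:  %.1fx\n", ratio('POE/Event',  'Event/Any', 0);
printf "invoke, POE/Select vs Perl/Any:  %.0fx\n", ratio('POE/Select', 'Perl/Any',  1);
```

With these rows the memory ratios come out at roughly 27 and 11, and the invocation ratio at roughly 930, so the rounded figures in the text are of the right order.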
The design of the POE adaptor class in AnyEvent cannot really account
for the performance issues, though, as session creation overhead is