In a recent post on his blog, Matthew Horsfall explained a few of the optimizations he recently contributed to Perl. On this site, we reported on the fantastic work that Dave Mitchell is doing. The two of them inspired me to try my hand at this optimization game by improving the Perl optimizer at a recent hackathon at work.
But first, let's briefly discuss how the Perl compiler works and what it produces.
The output of the Perl compiler is two-fold. The first is an in-memory data structure that resembles a traditional compiler's Abstract Syntax Tree. This is called the OP tree since it consists of struct OPs that generally represent some OPeration to be performed. The second is a linked list that connects (some of) these OPs in the order they are to be executed[1].
So for the code my $a = $b + 3, we have this tree:

sassign-+-add-+-padsv[$b]
        |     `-const[3]
        `-padsv[$a]
where sassign indicates scalar assignment, and padsv is a lexical variable access.
It helps to think of perl as a stack machine, so the linked list for execution would be
this sequence (with annotations on what they actually do):
padsv[$b]  (push $b on stack)
const[3]   (push 3 on stack)
add        (pop 3 and $b off of stack, push their sum on stack)
padsv[$a]  (push $a on stack)
sassign    (pop $a and the sum off of stack, set $a)
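To make the stack-machine view concrete, here is a toy model in Python. This is an illustration, not perl's real internals (where the stack holds SV pointers); in particular, I split padsv into a "push the value" and a "push the assignment target" variant for simplicity:

```python
# Toy model of perl's value stack executing: my $a = $b + 3
def run(ops, env):
    stack = []
    for op, arg in ops:
        if op == "const":            # push a constant onto the stack
            stack.append(arg)
        elif op == "padsv_value":    # push a lexical's current value
            stack.append(env[arg])
        elif op == "padsv_target":   # push the assignment target's name
            stack.append(arg)
        elif op == "add":            # pop two operands, push their sum
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
        elif op == "sassign":        # pop target and value, assign
            name, value = stack.pop(), stack.pop()
            env[name] = value
    return env

env = run([("padsv_value", "$b"),    # push $b
           ("const", 3),             # push 3
           ("add", None),            # pop both, push $b + 3
           ("padsv_target", "$a"),   # push $a
           ("sassign", None)],       # pop $a and the sum, set $a
          {"$b": 4})
# env["$a"] is now 7
```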
Things get just a tiny bit more complicated if you consider operations that take a variable number of items as parameters, such as taking a slice of an array. Consider @x[1,2]:

aslice-+-pushmark
       |-list-+-pushmark
       |      |-const[1]
       |      `-const[2]
       `-padav[@x]
The array slice needs two pieces of information: a list of indexes to slice from the array, and the array itself. In Perl, the way to pass an arbitrary number of items to an operation is to remember the stack position (which is what pushmark does[2]) and then simply push the items onto the stack. The recipient simply looks at the most recently remembered stack position and pops its parameters off the stack until it reaches this position. aslice is such an operation that receives a variable number of parameters, so the first thing to do is to execute a pushmark. The execution order of operations here is:
pushmark (outer)
pushmark (inner)
const[1]
const[2]
list
padav
aslice
This is where it gets a bit weird. One would expect to simply execute the two consts and the padav now. That would correspond to this tree:

aslice-+-pushmark
       |-const[1]
       |-const[2]
       `-padav[@x]

and this execution order:

pushmark
const[1]
const[2]
padav
aslice
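That simpler execution order can be modeled with two Python lists. This is a toy sketch, not perl's actual implementation, and the names pushmark and pop_to_mark for the consumer side are mine:

```python
stack, marks = [], []

def pushmark():
    marks.append(len(stack))      # remember the current stack depth

def pop_to_mark():
    mark = marks.pop()            # most recently remembered position
    items = stack[mark:]          # everything pushed since the mark...
    del stack[mark:]              # ...comes off the stack
    return items

# Executing the slice @x[1,2] against a toy array:
x = [10, 20, 30, 40]
pushmark()                        # aslice: mark where the indexes start
stack.append(1)                   # const[1]
stack.append(2)                   # const[2]
indexes = pop_to_mark()           # aslice collects its parameters
result = [x[i] for i in indexes]
# result == [20, 30]; the stack is empty again
```

Because marks is itself a stack, nested operations that each need their own mark (like the outer and inner pushmark above) simply coexist.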
The list operation has a very simple purpose. It looks up the most recently remembered stack position. Then it checks which context it was invoked in (context in the normal Perl sense of context). If it was invoked in list context, then it will simply do nothing and leave the items on the stack. If it was invoked in scalar context, it will only leave the last, topmost item on the stack.
Let that sink in for a moment.
Indeed, in list context, list simply undoes the effect of one preceding pushmark. The two operations of the inner pushmark and the list cancel each other out entirely. This is one of the places where the OP tree's dual nature as an AST and as a data structure intended for the actual execution of the program shows.
And here is where my optimization comes into play: In Perl 5.20, the optimizer is just a tiny bit smarter now because it can detect list/pushmark pairs that will be a no-op. It modifies the execution chain to not execute them while leaving the AST intact for decompilers and similar tooling. Because the internet has informed me that optimizations without benchmarks to "prove" their effectiveness are meaningless, let's indulge in some micro-benchmarking while investing some effort into making the result somewhat meaningful. What we'll benchmark is simply array slicing: @y = @x[3,7].
use 5.14.2;
use warnings;
use Dumbbench;
use Dumbbench::CPUFrequencyPinner;

my $opt   = '/path/to/optimized/perl';
my $unopt = '/path/to/unoptimized/perl';

my $code = <<'HERE';
my @x = (1..8);
my @y;
@y = @x[3,7] for 1..1e6;
HERE

my $db = Dumbbench->new(
  target_rel_precision => 0.005,
  initial_runs         => 20,
);

$db->add_instances(
  Dumbbench::Instance::Cmd->new(
    command => [$unopt, '-e', $code],
    name    => "before",
  ),
  Dumbbench::Instance::Cmd->new(
    command => [$opt, '-e', $code],
    name    => "after",
  ),
);

SCOPE: {
  # Pin CPU frequency down to reduce variability
  my $p = Dumbbench::CPUFrequencyPinner->new;
  $SIG{INT} = sub { undef $p; exit; };
  $p->set_max_frequencies($p->min_frequencies->[0]);
  $db->run;
}

$db->report;
$db->box_plot->show();
This type of graph is generally referred to as a box plot; it shows the measurement results with additional information on the variability of the repeated measurements. "after" refers to a perl built from commit 7d3c8a6837b55fff, whereas "before" is a perl built from the commit prior to that. This particular example benchmark shows a statistically significant speed-up of 8% in slicing arrays. You can see similar speed-ups in hash slices, certain map blocks, and a few other list-type operations. For a day's work on a very mature code base, that's not too shabby at all!
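For illustration, the pruning can be sketched as a peephole pass over a flat list of op names. This is a toy Python model, not the actual C patch, and it assumes every list op runs in list context:

```python
def prune(exec_chain):
    """Drop pushmark/list pairs that cancel out (list context assumed)."""
    out, open_marks = [], []
    for op in exec_chain:
        if op == "pushmark":
            open_marks.append(len(out))   # where this pushmark will sit
        elif op == "list" and open_marks:
            del out[open_marks.pop()]     # drop the matching pushmark...
            continue                      # ...and skip the list op itself
        out.append(op)
    return out

before = ["pushmark", "pushmark", "const[1]", "const[2]",
          "list", "padav", "aslice"]
print(prune(before))
# ['pushmark', 'const[1]', 'const[2]', 'padav', 'aslice']
```

Note that the real patch deletes nothing: it only relinks the op_next pointers so execution skips the pair, which is exactly why the OP tree stays intact for decompilers.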
[1] Technically speaking, there are many such linked lists since a logical operation will create a branch. So it's really a tree. But many stretches of code will be contained in linear sections, so please excuse me for the slight oversimplification.
[2] The name pushmark stems from the fact that the way to remember multiple stack positions in nested operations is done with another stack, the MARK stack. So pushmark does very much what it says on the tin.