This is a story about a significant new optimization to the Perl interpreter. It is a story about battling code complexity. And it is a story about wanting to have our cake and eat it too.
A recent Booking.com hackathon provided us the opportunity to investigate speeding up integer allocation in the Perl interpreter. If successful, this could optimize nearly every program we run. We discovered that a naive implementation could work, but would make the code a lot more difficult to maintain. Our path lead us to attempt to leverage the C preprocessor to improve code clarity while opening doors to real gains in program execution speed.
First, some background
As described in perlguts and PerlGuts Illustrated, representations of variables in Perl are usually composed of two parts: a head struct and (optionally) a body struct. The head contains the essential “internal book-keeping” common to all variables (regardless of type) including a pointer to the (optional) body struct. Below is an image of the layout of fields in a head struct as depicted in PerlGuts Illustrated:
The body struct can be quite different, depending upon the type of the
variable. The most simple type of variable is the SvNULL
, which
represents undef
and does not need a body struct at all.
A string (called PV
for “pointer value”) has a body type
of XPV
:
But the body struct of a PV
is different to a body struct of a PVNV
.
A PVNV
can hold a floating point number and a string representation of
that same value:
One benefit of this type of design is that all references to the value point to the memory location of the head. As such, Perl is free to change which memory is being used to represent a value by changing the body struct, without having to update any pointer except the pointer contained within the head.
Changing types
Naturally, Perl has an internal function to convert between types of
variable. This function is sv_upgrade
(“scalar value
upgrade”). In essence, whenever we have a variable of some type in
Perl (for example, a simple integer) and we need to access it as a
different type (for example, a string), the sv_upgrade
will convert
the type of the variable (for example, into a type that contains both
integer and string representations of a value). This change may involve
replacing the body struct with a larger struct.
To see how sv_upgrade
is implemented, we can look at the
Perl_sv_upgrade
function in sv.c
. We see that this function encapsulates a lot
of complexity. There are many comments, which describe subtle corner
cases. Since it can take a scalar value of essentially any type and
convert it to something capable of representing any other type, perhaps
it’s unsurprising that there’s a lot of code here.
Without going line-by-line, an overview of how this function is
implemented may be useful. First, there is a switch
based on the current type of
variable, to determine what it needs for the new type. Shortly
thereafter, there is a second switch
based on the new type. Inside of
the second switch
block, there are numerous if
blocks for doing
different things depending on the old type. Finally, after the new body
struct has been set up, and the head struct contains all the correct
flags, the memory used by the old body is freed.
Still with me? Good.
A naive optimization
The sv_upgrade
function is called from a number of places. Not only is
it called in places such as printing integers as strings, but it’s
also called when assigning an integer value to a previously cleared
variable.
A previously cleared variable is always an undef
with no body struct.
The reason sv_upgrade
is called is to allow the correct setup for the
body to occur. This was a very sensible design decision, as it
centralizes a lot of “internal book-keeping” behavior,
rather than duplicating it. The cost of this centralization is
performance: some generic (and in this case superfluous) code is
executed. For example, when creating a new integer, sv_upgrade
will
run an unnecessary check for a conversion of a larger type to a smaller
type.
Integer assignment to a cleared variable occurs so frequently that one
would imagine it might be worth duplicating the required complexity to
achieve the performance improvement. Thus we decided to evaluate that
trade-off for integer allocation. After a review of the nearly 300 lines
of sv_upgrade
internals, we saw that we could remove the call entirely
if we could “hoist out” just the essential two lines of
code! However, there is a very good reason this had not been done
before. Let's look at the two lines.
The first line (since we know the new type) is easy:
SvFLAGS(sv) |= new_type;
But the other line is quite complex:
SvANY(sv) = (XPVIV*)((char*)&(sv->sv_u.svu_iv) - STRUCT_OFFSET(XPVIV, xiv_iv));
If your head is swimming after looking at that last line of code, don’t worry: you’re in good company. Granted, this is described in Illustrated perlguts:
Since 5.10 for a raw IV (without PV) the IVX slot is in the HEAD, there is no xpviv struct ("body") allocated. The SvIVX macro abuses SvANY pointer arithmethic to point to a compile-time calculated negative offset from HEAD-1 to sv_u.svu_iv, so that PVIV and IV can use the same SvIVX macro.
But, even once I thought I maybe understood what that complex line of code supposedly did, I made Steffen patiently sit for more than 15 minutes as I convinced myself with paper and pencil that it actually did what he had described. After that, the drawing from Illustrated Perl Guts made more sense to me:
What’s more, I finally understood that this complexity exists in
order to avoid executing an if
statement that would otherwise be
called every time the value is retrieved!
Thus, the previous design trade-offs seemed like the right ones. Yes, we could make Perl faster, but at the cost of leaking a lot of complexity into another part of the code. That complexity would make any additional development much more difficult in any future work.
Having our cake and eating it too
We wanted to encapsulate that very complex bit of code, but without any added runtime performance cost. Since this is C, we looked at using the preprocessor to push the complexity behind a macro, in much the same way that with other languages we might move some hairy lines of code behind a well named function or method:
#define SET_SVANY_FOR_BODYLESS_IV(sv) \ SvANY(sv) = (XPVIV*)((char*)&(sv->sv_u.svu_iv) - STRUCT_OFFSET(XPVIV, xiv_iv))
And, of course, the advantage of using a macro (rather than a function) is that the cost is paid entirely at compile time, and thus there is zero additional runtime performance lost.
By introducing the macro, the readability of the code would be improved in several places.
So, how did this change our situation? With the macro, the two lines we wanted to “hoist out” looked a lot less complex. All we would need would be a patch which replaces this one computationally heavy function call:
sv_upgrade(dstr, SVt_IV);
with just these two lines:
SET_SVANY_FOR_BODYLESS_IV(dstr); SvFLAGS(dstr) |= SVt_IV;
By factoring the complexity differently, we opened the doors to a different set of performance/complexity cost trade-offs. We would leak only a small amount of complexity relative to the gain in performance. But is it worth it? With these changes, the cost would be low, but what would the actual benefit be?
Measuring the gain
The micro-benchmark we used is very heavy on exercising one particular code path, however it is a common code path.
$dumbbench-i50--pin-frequency--\./perl-Ilib-e\'for my $x (1..1000){my @a = (1..2000);}'
Here are the results of the micro-benchmark before the optimization:
Rounded run time per iteration: 2.4311e-01 +/- 1.4e-04
And here are the results after the optimization:
Rounded run time per iteration: 1.99354e-01 +/- 5.5e-05
That’s 18% faster.
Results
Overall, this hackathon project was a success.
With these measurements, we demonstrated that the optimization was
worthwhile. The complexity cost is pretty close to zero: in some places
the Perl core is slightly more complex, but the internals of
sv_upgrade
are certainly more clear. Additionally, we found there
were other similar potential optimizations that could leverage the same
technique. In the end, five related patches were pushed to Perl:
- Refactor bodyless-IV/NV hacks into define
- Speed up assigning an IV to a previously cleared SV
- Speed up newSViv()
- Repeat newSViv optimization for newSVuv
- Attempt to bring newSViv/uv optimization to newRV
When Perl 5.22 ships, in part because of this work, many real-world programs will be faster.