Every single time I see an error message that looks like
Attempt to free unreferenced scalar: SV 0xDEADBEEF
my heart sinks. I know that I am in for an extended debugging
session. Debugging memory management problems is always a big hassle,
in Perl or any other language. valgrind and similar tools can be a
godsend, but even those struggle with certain classes of problems.
Understanding the Problem
The Perl diagnostics list has the following useful bit to say about the warning:
Perl went to decrement the reference count of a scalar to see if it would go to 0, and discovered that it had already gone to 0 earlier, and should have been freed, and in fact, probably was freed. This could indicate that SvREFCNT_dec() was called too many times, or that SvREFCNT_inc() was called too few times, or that the SV was mortalized when it shouldn't have been, or that memory has been corrupted.
Let's pull that apart as it references a number of implementation details of Perl:
The current implementation of Perl 5 uses a memory management scheme
based on reference counting. Every basic value (a scalar, technically
a pointer to an SV struct) has a slot that tracks the number of times
it is referenced by anything else. If you create a new reference to an SV,
you increment this so-called refcount. After giving up your reference
to the SV, you need to decrement the counter. As soon as the refcount reaches
zero, Perl frees the memory associated with that SV and in turn
decrements the refcounts on all other SVs that the soon-to-be-ex-SV
holds a reference to. SvREFCNT_inc() and SvREFCNT_dec() are the
Perl (C) API macros that do just that.
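To make the bookkeeping concrete, here is a minimal sketch of the idea in plain C. It is a toy model, not Perl's implementation: ToySV, toy_new, toy_inc, toy_dec and toy_live are names made up for this illustration, and a real SV carries far more state than a counter and one pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an SV: a refcount plus at most one value it references.
 * All names here are hypothetical; none of this is the Perl API. */
typedef struct ToySV {
    unsigned refcnt;        /* how many owners currently hold this value */
    struct ToySV *referent; /* the value this one references, or NULL */
} ToySV;

static int live_count = 0;  /* number of ToySVs currently allocated */

/* Create a value owned by the caller (refcount 1). If it references
 * another value, it takes its own reference to that value. */
ToySV *toy_new(ToySV *referent) {
    ToySV *sv = malloc(sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    sv->referent = referent;
    if (referent) referent->refcnt++;
    live_count++;
    return sv;
}

/* Similar in spirit to SvREFCNT_inc(): register one more owner. */
ToySV *toy_inc(ToySV *sv) {
    sv->refcnt++;
    return sv;
}

/* Similar in spirit to SvREFCNT_dec(): drop one owner. When the count
 * reaches zero, free the value and release whatever it referenced. */
void toy_dec(ToySV *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        ToySV *referent = sv->referent;
        free(sv);
        live_count--;
        if (referent) toy_dec(referent);
    }
}

/* Number of values that have not been freed yet. */
int toy_live(void) {
    return live_count;
}
```

If you create an inner value, wrap it in an outer one, and then give up your own reference to the inner value, both stay alive until the last reference to the outer value is dropped. At that point the free cascades from the outer value to the inner one.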
If you call SvREFCNT_inc() one time too many or call SvREFCNT_dec()
one time too few, then the SV and everything it references will leak:
they are not destroyed until the global destruction phase of the perl VM.
If you do the opposite (too many SvREFCNT_dec() or too few
SvREFCNT_inc() calls), then the refcount on an SV drops to zero
prematurely and it is freed even though it is still referenced by
data structures. Alas, those are left blissfully unaware of their
pending doom by invalid memory access.
The aforementioned warning is generated by Perl when it detects that the refcount on an SV is being decremented and that refcount was already zero. The refcount being zero to begin with, of course, means that it's actually no longer a valid SV in use by Perl! Since the memory segment where the SV used to live is no longer used to store the original SV (see below for a twist), the memory might have been reused for storing a different SV. If so, then you may not see the warning about the scalar that had the originally bad memory management. Perl knows nothing about your intentions and happily decrements the refcount on the new resident of your favorite slot of memory. Eventually that just means the refcount of the new SV will drop to 0 prematurely as well. Rinse-repeat until you manage to corrupt memory and warn about it before Perl gets a chance to reuse the memory. Fun times.
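The check behind the warning can be sketched in a few lines of C. This is only a toy under my own assumptions: CheckedSV, checked_dec and warning_count are invented names, and the storage is owned by the caller so that looking at a "freed" value stays well-defined. Perl's real code path is more involved.

```c
#include <stdio.h>

/* Toy value with nothing but a refcount. A refcount of zero stands for
 * "already freed". Hypothetical names, not the Perl API. */
typedef struct {
    unsigned refcnt;
} CheckedSV;

static int warning_total = 0; /* how many times the check has fired */

/* Drop one reference. If the count is already zero, complain instead
 * of decrementing. Returns 1 if a warning was issued, 0 otherwise. */
int checked_dec(CheckedSV *sv) {
    if (sv->refcnt == 0) {
        warning_total++;
        fprintf(stderr, "Attempt to free unreferenced scalar: SV %p\n",
                (void *)sv);
        return 1;
    }
    sv->refcnt--;
    return 0;
}

/* Number of warnings issued so far. */
int warning_count(void) {
    return warning_total;
}
```

The first decrement on a value with refcount 1 is silent and takes it to zero. Only a second, surplus decrement on the same storage triggers the message. If the storage had been handed to a new value in between, the check would have nothing to complain about.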
Astute C programmers will now bring up the memory debugging tool of
the day (my favorite is and remains valgrind/memcheck) that ought
to deal with this problem quite effectively by pinpointing invalid
access to freed memory. I wish it were that easy! The scheme is
broken twofold: For one, the above action at a distance with memory
reuse already means that perfectly valid code can end up performing
the invalid access. But more importantly, perl's internal memory
management makes this all the more likely to happen. Perl uses
slab allocation to avoid going to the OS for each and every SV
it creates since many OS malloc implementations are deficient.
The parts of the SV structs that hold the refcounts are allocated
in such a slab (called arena in the Perl sources) that is typically
as large as one page of memory and holds however many items of the
same size that fit. Perl uses a list of unused elements in the slab to
efficiently "allocate" and "deallocate" SVs. Because the slab as a
whole stays allocated as far as malloc is concerned, a tool like
valgrind does not see an access to a recycled SV slot as an access
to freed memory. This is great for performance and for avoiding
memory fragmentation. But for debugging the above memory problems,
it exacerbates the action at a distance because slots in the SV
slabs are reused frequently.
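The effect of the free list is easy to reproduce in miniature. The following C sketch assumes a single fixed-size arena with a last-in, first-out free list. Slot, slot_new and slot_dec are hypothetical names, and Perl's real arenas differ in many details.

```c
#include <assert.h>
#include <stddef.h>

/* Toy arena slot: a refcount plus a link used only while the slot is free. */
typedef struct Slot {
    unsigned refcnt;
    struct Slot *next_free;
} Slot;

#define ARENA_SLOTS 8
static Slot arena[ARENA_SLOTS]; /* one slab, allocated once and never freed */
static Slot *free_list = NULL;  /* head of the list of unused slots */
static int arena_ready = 0;

/* Thread all slots onto the free list; arena[0] ends up at the head. */
static void arena_init(void) {
    for (int i = ARENA_SLOTS - 1; i >= 0; i--) {
        arena[i].refcnt = 0;
        arena[i].next_free = free_list;
        free_list = &arena[i];
    }
    arena_ready = 1;
}

/* "Allocate" a value by popping the head of the free list. */
Slot *slot_new(void) {
    if (!arena_ready) arena_init();
    Slot *s = free_list;
    if (s == NULL) return NULL; /* toy model: no second arena */
    free_list = s->next_free;
    s->refcnt = 1;
    s->next_free = NULL;
    return s;
}

/* Drop one reference; at zero, push the slot back onto the free list. */
void slot_dec(Slot *s) {
    assert(s->refcnt > 0);
    if (--s->refcnt == 0) {
        s->next_free = free_list;
        free_list = s;
    }
}
```

Release a value, create a new one, and the new value lands in the very slot the old one just vacated. A stale pointer to the old value now points at a perfectly healthy stranger, and a surplus slot_dec through it quietly takes the stranger's refcount to zero without tripping any check.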
If you fall victim to the eponymous warning, a search of the internet will find quite a number of cases of exasperated fellow sufferers asking for help from the experts. Alas, there is no one true debugging recipe that will lead to resolution in all cases and the few specific hints that do exist generally require building a special copy of perl for debugging.
Rigging Your Perl
There are a number of configure-time options, with varying degrees of coverage in the Perl documentation, that will help you build a copy of perl that avoids some of the action-at-a-distance problems outlined above. The basic (*nix) recipe for building your own Perl is as follows (assuming you're in a checkout of the perl git repository or have an unpacked release tarball):
$ sh Configure -des -Dusedevel
$ make
$ make test
The -d -e -s options basically mean "don't ask me any questions and use sane defaults for everything!".
The -Dusedevel option just means that if you're building from a git clone, Configure shouldn't
whine about building a development version of perl. It's basically the "yes, I really mean it"
option to prevent people from deploying unreleased versions of Perl in production.
To build and test your perl more quickly on a multi-core machine, you can sprinkle some -j magic:
$ sh Configure -des -Dusedevel
$ TEST_JOBS=5 make -j5 test
That will compile and test with five parallel jobs. If you want to install the new perl into a specific
location, then add -Dprefix=/home/you/mydebugperl to the Configure invocation and run make install.
Moving towards a more debugging-enabled perl, for starters, you want to include
debugging symbols in your output and possibly disable the C compiler's optimizations,
so add -Doptimize="-g3 -O0" to the Configure invocation. This will come in handy
for locating problems in the actual perl sources when staring at valgrind output. The
3 after -g makes gcc record macro definitions in the debugging information, so that
a debugger can expand macros. Next up is building a perl with its own debugging
facilities enabled: Add -DDEBUGGING.
Putting it all together so far, we get:
$ sh Configure -des -Dprefix=/home/you/mydebugperl -Dusedevel \
    -Doptimize="-g3 -O0" -DDEBUGGING
All you've achieved so far, of course, is obtaining a copy of perl that is massively slower than your production perl (probably at least an order of magnitude) and won't really help you debug your refcount problem just yet! Having a perl like that handy is your typical stepping stone for perl-core debugging and so far, all of these steps are well-documented elsewhere.
The perlhacktips document explains a number of more intricate options
for memory debugging. The PERL_DESTRUCT_LEVEL section is of particular
interest. It turns out that perl by default doesn't bother cleaning up its memory slabs
when it's done. It generally lets the OS do it (I believe because that has less overhead).
Setting the environment variable PERL_DESTRUCT_LEVEL during program execution makes
perl more pedantic. That's important to make such issues visible to tools like valgrind
in the first place:
$ PERL_DESTRUCT_LEVEL=2 perl your_buggy_program.pl
If you had the opposite problem of Attempt to free..., that is,
SVs with too high refcounts, then
you would start getting notifications about leaking scalars with this setup.
The next steps of getting to the bottom of that involve the
-DDEBUG_LEAKING_SCALARS, -DDEBUG_LEAKING_SCALARS_FORK_DUMP,
and -DDEBUG_LEAKING_SCALARS_ABORT Configure options. These are
mostly documented in perlhacktips.
To make it easier to pick up on the suspected refcount issues,
we can piggy-back on an option intended for the Purify tool: -Accflags=-DPURIFY
(read: add "-DPURIFY" to the C compiler options). With this C define,
perl will avoid using slabs for allocating SVs, which ought to improve
your chances of picking up on weird behavior. On top of that, we can ask Perl
to overwrite derelict memory areas with a known pattern (0xEF) so that errors
are not obscured by reuse of memory previously used for SVs. Akin to the Purify option,
this is a C compiler define, so we now get: -Accflags="-DPURIFY -DPERL_POISON".
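The idea behind poisoning fits in a few lines of C. This is a toy sketch rather than what PERL_POISON actually compiles to: PoisonSV, poison_release and looks_poisoned are invented for this illustration, and the storage stays owned by the caller so that inspecting it after release is well-defined.

```c
#include <string.h>

#define POISON_BYTE 0xEF /* the pattern mentioned above */

/* Toy value: a refcount and some payload. Hypothetical, not a real SV. */
typedef struct {
    unsigned refcnt;
    long payload;
} PoisonSV;

/* "Release" a value by overwriting every byte of it with the pattern. */
void poison_release(PoisonSV *sv) {
    memset(sv, POISON_BYTE, sizeof *sv);
}

/* Return 1 if every byte of the value carries the poison pattern. */
int looks_poisoned(const PoisonSV *sv) {
    const unsigned char *bytes = (const unsigned char *)sv;
    for (size_t i = 0; i < sizeof *sv; i++) {
        if (bytes[i] != POISON_BYTE) return 0;
    }
    return 1;
}
```

A stale pointer into poisoned memory no longer sees a plausible refcount or payload. It sees a value made up of 0xEF bytes, which tends to blow up early and recognizably instead of limping along on leftover data.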
Just to put all the tools together, this is how I built my perl in the end (the -Dcc and -Dld settings allow me to use ccache with gcc for faster repeated compilation):
$ sh Configure -Doptimize="-g3 -Wall -Wextra -O2" -DDEBUGGING -Dusedevel \
    -Dprefix=/home/you/mydebugperl -Dcc=ccache\ gcc\ -g3 -Dld=gcc \
    -Uusethreads -de -DPERL_TRACK_MEMPOOL -DDEBUG_LEAKING_SCALARS_FORK_DUMP \
    -DDEBUG_LEAKING_SCALARS -Accflags="-DPURIFY -DPERL_POISON" \
    -DDEBUG_LEAKING_SCALARS_ABORT
My Unreferenced Scalar
A perl equipped with all of the above debugging tools has made my debugging work much easier on many occasions. Usually, a subset of the facilities outlined above is enough. Alas, in this case, only stepping through my code carefully and dumping the addresses of many SVs to locate the one that was being prematurely freed allowed me to track down the one stray statement that spoiled my day so thoroughly:
SvREFCNT_dec(*fetched_sv);
The code was using a hash access that creates the element it tries to fetch
if it doesn't exist (hv_common's HV_FETCH_LVALUE|HV_FETCH_JUST_SV mode)
but falsely assumed an implicit refcount increase as part of that operation.
Needless to say, I killed it with fire and then proceeded to perform an
intricate victory dance.
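The shape of the mistake can be shown with a toy container in C. This is a sketch of the ownership rule only, under my own simplifications: ToyHash holds a single element, and Elem, toy_fetch_lvalue and elem_dec are made-up names that merely mimic a fetch-or-create call handing back a borrowed pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy element with a refcount and a value. Hypothetical, not a real SV. */
typedef struct {
    unsigned refcnt;
    int value;
} Elem;

/* Toy one-element "hash". It owns exactly one reference to its element. */
typedef struct {
    Elem storage; /* backing storage for the element */
    Elem *elem;   /* NULL until the element has been created */
} ToyHash;

/* Fetch the element, creating it on first access. The returned pointer
 * is borrowed: the refcount is NOT incremented on behalf of the caller. */
Elem *toy_fetch_lvalue(ToyHash *h) {
    if (h->elem == NULL) {
        h->storage.refcnt = 1; /* this single reference belongs to the hash */
        h->storage.value = 0;
        h->elem = &h->storage;
    }
    return h->elem;
}

/* Drop one reference to an element. */
void elem_dec(Elem *e) {
    assert(e->refcnt > 0);
    e->refcnt--;
}
```

Writing through the fetched pointer is fine. Calling elem_dec on it is the bug: the only reference belongs to the hash, so the decrement leaves the hash pointing at an element with refcount zero, and whoever releases the hash later performs the decrement that finally gets noticed.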
The actual flaw had only been uncovered by running many, many millions of fuzz tests against our Perl/XS implementation of the Sereal deserialization library to harden it against attacks. But that is a topic for another day.
Updated: Added additional suggestions (-g3) from Reini Urban. Thanks!