linux, gcc, glibc, and the ppc405
ppc405 Erratum 77
The IBM ppc405cr
has a bug in the stwcx instruction
which can cause semaphores, mutexes, and the like implemented in terms
of stwcx to fail. See "erratum 77" in
the IBM document 405CR_C_errata_1_2
for a description and workarounds suggested by IBM.
The bug can be very hard to trigger. We suspect that the lack of a workaround
for this erratum in the libstdc++ library caused a failure about once every
12 hours, on average, in one complex ppc405 C++ program.
Bug tracking
I've submitted the following entries in gnats to bring this problem
to the attention of the gcc and glibc maintainers:
Latest Patches
You need all three of these to get a working linux/glibc/gcc system:
History
The following are some historical details about the above patches.
Linux kernel workaround
In Sept 2001, a fix for erratum 77 was
committed to the ppc development linux kernel.
(This fix has not made it into the mainline 2.4 kernel --
see for example
arch/ppc/kernel/bitops.c --
but most ppc405 users realize that and act accordingly.)
glibc workaround
Mark Hatle (fray at mvista.com) then
commented on the linuxppc-embedded list:
Just as an FYI, we also [use SYNC before STWCX] in glibc to be safe. We have never been
able to pin down a problem in userspace due to this bug, but we thought
it would be better safe than sorry until we can get definitive proof
that the bug will not happen in userspace.
The following two files in glibc should be patched:
linuxthreads/sysdeps/powerpc/pt-machine.h
sysdeps/powerpc/atomicity.h
The patch Mark is referring to is part of Hard Hat Linux 2.0.
If and when this makes it into the mainline,
you should see sync or dcbt instructions before the stwcx in
linuxthreads/sysdeps/powerpc/pt-machine.h in cvs and
sysdeps/powerpc/atomicity.h in cvs.
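As a rough illustration only (this is not the Hard Hat Linux or glibc patch), a sync placed before the stwcx in an atomic-add loop might look like the sketch below. The function name atomic_add_return_err77 is made up for this example. The non-PowerPC branch exists only so the sketch compiles on other machines; it assumes a compiler that provides GCC's __sync_add_and_fetch builtin.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from glibc: atomically add val to *mem
 * and return the new value. */
static inline int atomic_add_return_err77(volatile int *mem, int val)
{
#if defined(__powerpc__)
    int tmp;
    __asm__ __volatile__(
        "1: lwarx  %0,0,%2\n"   /* load word and set reservation       */
        "   add    %0,%0,%3\n"  /* compute the new value               */
        "   sync\n"             /* erratum 77: sync before the stwcx.  */
        "   stwcx. %0,0,%2\n"   /* store only if reservation is intact */
        "   bne-   1b\n"        /* reservation lost, so retry          */
        : "=&r"(tmp), "+m"(*mem)
        : "r"(mem), "r"(val)
        : "cr0", "memory");
    return tmp;
#else
    /* Assumption: fallback for non-PowerPC hosts, via a GCC builtin. */
    return __sync_add_and_fetch(mem, val);
#endif
}
```

The only difference from an unpatched loop is the extra sync line, so the workaround costs one instruction per attempt and changes nothing else about the sequence.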
libstdc++ workaround
I posted a rough patch to the libstdc++ mailing list.
If and when this makes it into the mainline, you should
see a sync or dcbt instruction in
libstdc++-v3/config/cpu/powerpc/atomicity.h in cvs.
My current patch is a bit more
polished -- it includes documentation, uses multilibs properly,
and does absolutely nothing unless enabled at configure time with
--enable-ppc405cpu.
(Here's the same patch rediffed to gcc-20020715).
libgcj workaround
libgcj needs the same workaround, but nobody has put a patch together yet.
Regression test
I'm still looking for a good regression test for this.
My current thinking is to raise HZ in our kernel to 1024
to increase the chance that an STWCX instruction is interrupted
midthought, then run something like
atomicity_test.c.
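A test along these lines might take the general shape sketched below. It is not atomicity_test.c itself: the names run_atomicity_test, worker, and MAX_THREADS are invented for the example, and GCC's __sync_add_and_fetch builtin stands in for whichever stwcx-based primitive is actually under test. Several threads increment one shared counter, and any lost update shows up as a nonzero return value.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16          /* hypothetical upper bound for the sketch */

static volatile int shared_counter;

/* Each worker bumps the shared counter iters times. */
static void *worker(void *p)
{
    int iters = *(int *)p;
    for (int i = 0; i < iters; i++)
        __sync_add_and_fetch(&shared_counter, 1);  /* stand-in primitive */
    return NULL;
}

/* Returns (expected - actual); 0 means no increments were lost. */
static int run_atomicity_test(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_counter = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &iters) != 0)
            break;                      /* count only threads that started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started * iters - shared_counter;
}
```

A driver would call run_atomicity_test repeatedly (for example with 4 threads and a large iteration count) and report any nonzero result. Since the failure is rare, a clean run proves little; only a long soak on real ppc405 hardware would say much.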
Thanks to Jan Olderdissen, Mark Hatle, MontaVista, and everyone who's
helped squish this bug.
Last update: 23 July 2002.