linux, gcc, glibc, and the ppc405

ppc405 Erratum 77

The IBM ppc405cr has a bug in the stwcx instruction which can cause semaphores, mutexes, and the like implemented in terms of stwcx to fail. See "erratum 77" in the IBM document 405CR_C_errata_1_2 for a description and workarounds suggested by IBM.

The bug can be very hard to trigger. We suspect that the lack of a workaround for this erratum in the libstdc++ library caused a failure about once every 12 hours, on average, in one complex ppc405 C++ program.

Bug tracking

I've submitted the following entries in gnats to bring this problem to the attention of the gcc and glibc maintainers:

Latest Patches

You need all three of these to get a working linux/glibc/gcc system:

History

The following are some historical details about the above patches.

Linux kernel workaround

In Sept 2001, a fix for erratum 77 was committed to the ppc development linux kernel. (This fix has not made it into the mainline 2.4 kernel -- see for example arch/ppc/kernel/bitops.c -- but most ppc405 users realize that and act accordingly.)

glibc workaround

Mark Hatle (fray at mvista.com) then commented on the linuxppc-embedded list:
"Just as an FYI, we also [use SYNC before STWCX] in glibc to be safe. We have never been
able to pin down a problem in userspace due to this bug, but we thought
it would be better safe then sorry until we can get definative proof
that the bug will not happen in userspace."

The following two files in glibc should be patched:
linuxthreads/sysdeps/powerpc/pt-machine.h
sysdeps/powerpc/atomicity.h
The patch Mark is referring to is part of Hard Hat Linux 2.0. If and when this makes it into the mainline, you should see sync or dcbt instructions before the stwcx in linuxthreads/sysdeps/powerpc/pt-machine.h in cvs and sysdeps/powerpc/atomicity.h in cvs.
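To make the shape of the workaround concrete, here is a minimal sketch of an atomic add with a sync placed immediately before the stwcx. It is not the actual glibc or Hard Hat Linux patch. The function name exchange_and_add and the macro PPC405_ERR77_WORKAROUND are made up for illustration, and the non-PowerPC branch (a gcc builtin) is there only so the sketch compiles on other machines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch for this sketch; the real patches may use another name. */
#ifdef PPC405_ERR77_WORKAROUND
#define ERR77_SYNC "    sync\n\t"
#else
#define ERR77_SYNC ""
#endif

/* Atomically add val to *mem and return the value *mem held before. */
static inline int32_t exchange_and_add(volatile int32_t *mem, int32_t val)
{
    /* lwarx/stwcx. operate on a word-aligned address */
    assert(((uintptr_t)mem & 3) == 0);
#if defined(__powerpc__)
    int32_t old, tmp;
    __asm__ __volatile__(
        "1:  lwarx   %0,0,%3\n\t"   /* load *mem and take a reservation */
        "    add     %1,%0,%4\n\t"  /* compute the new value */
        ERR77_SYNC                  /* erratum 77: sync right before stwcx. */
        "    stwcx.  %1,0,%3\n\t"   /* store only if reservation still held */
        "    bne-    1b"            /* reservation lost: try again */
        : "=&r"(old), "=&r"(tmp), "+m"(*mem)
        : "r"(mem), "r"(val)
        : "cr0");
    return old;
#else
    /* Assumption: off PowerPC, fall back to a compiler builtin. */
    return __sync_fetch_and_add(mem, val);
#endif
}
```

Built without -DPPC405_ERR77_WORKAROUND, the macro expands to an empty string and the loop is the usual lwarx/stwcx. sequence, so the extra instruction costs nothing on CPUs that do not need it.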

libstdc++ workaround

I posted a rough patch to the libstdc++ mailing list. If and when this makes it into the mainline, you should see a sync or dcbt instruction in libstdc++-v3/config/cpu/powerpc/atomicity.h in cvs.

My current patch is a bit more polished -- it includes documentation, uses multilibs properly, and does absolutely nothing unless enabled at configure time with --enable-ppc405cpu. (Here's the same patch rediffed to gcc-20020715).

libgcj workaround

libgcj needs the same workaround, but nobody has put a patch together yet.

Regression test

I'm still looking for a good regression test for this. My current thinking is to raise HZ in our kernel to 1024 to increase the chance that an STWCX instruction is interrupted midthought, then run something like atomicity_test.c.
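For what it's worth, here is a rough sketch of the kind of stress test I have in mind. It is not atomicity_test.c. Several threads hammer one shared counter with atomic increments, and the final count is compared with the expected total. The names run_atomicity_test, worker, and MAX_THREADS are invented for this sketch, and the gcc builtin __sync_fetch_and_add stands in for whichever library routine is actually under test.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* arbitrary limit for this sketch */

/* Shared counter, only ever updated with an atomic add. */
static volatile long counter;

struct worker_arg { long iterations; };

static void *worker(void *p)
{
    const struct worker_arg *arg = p;
    long i;

    for (i = 0; i < arg->iterations; i++)
        __sync_fetch_and_add(&counter, 1);  /* stand-in for the routine under test */
    return NULL;
}

/* Run nthreads workers; return the final count.
 * With working atomics this equals nthreads * iterations. */
long run_atomicity_test(int nthreads, long iterations)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg arg = { iterations };
    int i;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;
    for (i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &arg);
        assert(rc == 0);
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

A caller would compare run_atomicity_test(4, 1000000) against 4000000 and report any mismatch. A passing run proves little, since the erratum is so hard to trigger. That is why raising HZ to force more interrupts looks like a necessary part of the test.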

Thanks to Jan Olderdissen, Mark Hatle, Montavista, and everyone who's helped squish this bug.


Last update: 23 July 2002.