Oh, and as a sign that 2.6.x really _is_ approaching, people have started sending me spelling fixes. Kernel coders are apparently all atrocious spellers, and for some reason the spelling police always comes out of the woodwork when stable releases get closer.
To support the spelling police, a few others and I have put together the following tools, tips, and data files for anyone who wants to help fix spelling errors in the Linux kernel source.
One possible way this could be addressed is for the spelling police to offer a patch update service. Here's how it might work:
The patches that Dave Jones was concerned about were the "2.4 kernel commit archives". We might want to try updating these proactively. Once we have an archive of patches that need updating, we could probably automate the update process.
Both American and British English spellings are fine by this definition, as is jargon. In fact, if you can find a word in any dictionary at all, it's probably OK. (For example, check dictionary.cambridge.org first and onelook.com as a backup.)
All changes must be carefully reviewed by hand at some point, possibly by more than one person. Jared Smith wrote:
I have tried to automatically spell-check long, complex texts for years, with numerous algorithms; all of them fail for one reason or another, and I find that the only proper way to do it is the tedious work by hand.

See also the discussion of the "loose -> lose" changes on linux-kernel:

Even a single lost pun because of overenthusiastic spellchecking is not worth the cleanup. I would prefer to see typos than lose a single intentional 'misspelling'. It would be best if you posted all changes somewhere so that they could be verified manually.
To avoid submitting a spellfix that breaks real code, consider following these simple rules:
To generate a stopword file containing all the nonwords from the noncomment part of the kernel source, run:
find linux -name '*.[ch]' | xargs perl lspell2.pl /dev/null | grep -v ':' | sort -f | uniq > stop1.txt
To generate a stopword file containing all nonwords from the comment part of the kernel source, spellcheck the entire kernel tree using the stopword file generated above:
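find linux -name '*.[ch]' | xargs perl lspell.pl stop1.txt | grep -v ':' | sort -f | uniq > stop2.txt

Then edit 'stop2.txt' and remove the lines that are obviously spelling errors, so that those words will still be flagged; what remains (jargon, abbreviations, and the like) becomes stopwords.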
Finally, generate a master stopword file by combining stop1.txt and stop2.txt:
sort -u stop[12].txt > stop.txt
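The stopword approach boils down to a simple filter: split text into words and report any word that is neither in a dictionary nor in stop.txt. The following is a minimal Python sketch of that idea, not a reimplementation of lspell.pl. The function names load_words and candidate_typos are hypothetical, and the word pattern (ASCII letters with an optional apostrophe part) is an assumption.

```python
import re
from pathlib import Path

# A "word" is a run of ASCII letters, optionally followed by apostrophe
# parts ("requests's"). Underscores and digits split words, so an
# identifier such as max_len yields "max" and "len".
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def load_words(path):
    """Read a one-word-per-line file into a set of lowercased words."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def candidate_typos(text, known):
    """Return words in `text` that are not in `known`, each reported once.

    `known` must contain lowercased words. The lookup is case-insensitive,
    but each word is reported in the form in which it first appears.
    """
    seen = set()
    result = []
    for word in WORD_RE.findall(text):
        key = word.lower()
        if key in known or key in seen:
            continue
        seen.add(key)
        result.append(word)
    return result
```

To check a tree, you might build known = load_words(dictionary) | load_words("stop.txt"), where the dictionary is any one-word-per-line list (for example /usr/share/dict/words, if your system has one). Then call candidate_typos on the text of each .c and .h file and print the file name followed by the words it returns.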
Here's where to get the scripts and data files mentioned above:
You can use the output of this program and a little elbow grease to create a corrections file for the next program. Sample output:

linux-2.5.63-bk5.old/include/asm-s390x/atomic.h: 1 enviroment
linux-2.5.63-bk5.old/include/asm-s390x/rwsem.h: 1 consequtive
linux-2.5.63-bk5.old/include/asm-s390x/dasd.h: 3 featueres Perfomance requests's
linux-2.5.63-bk5.old/include/asm-s390x/pgtable.h: 3 lenght regiontable specifiation
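Because the report lists misspellings per file, it can help to invert it so that each flagged word appears once, together with the files that contain it. You then decide on the correct spelling once per word. This is also a convenient starting point if you plan one patch per kind of error. The sketch below assumes every report line has the form "path: count word ...", as in the sample. parse_report is a hypothetical helper and is not part of the scripts above.

```python
from collections import defaultdict


def parse_report(lines):
    """Invert 'path: count word ...' report lines into {word: [paths]}.

    Lines without ': ' or without a leading numeric count are skipped.
    The count itself is not used; only the words after it are collected.
    """
    files_by_word = defaultdict(set)
    for line in lines:
        path, sep, rest = line.strip().partition(": ")
        if not sep:
            continue
        fields = rest.split()
        if not fields or not fields[0].isdigit():
            continue
        for word in fields[1:]:
            files_by_word[word].add(path)
    # Sorted file lists give stable, reviewable output.
    return {word: sorted(paths) for word, paths in files_by_word.items()}
```

For example, printing sorted(report, key=str.lower) gives one line per misspelling. You can then annotate each line with its correct spelling by hand.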
typo.sh is a shell script by Francois Gouget that corrects a built-in list of common spelling mistakes. He wrote it for the Wine source tree, but it works OK on the kernel source tree as well. His post to lkml says the script is at fgouget.free.fr/typos, and kernel patches based on it are at fgouget.free.fr/tmp/linux-spelling/.
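The following is a separate Python sketch of the core idea behind a list-based typo fixer, not a translation of typo.sh. It replaces known typos, but only when they appear as whole words. The "wrong right" corrections-file format and the function names are assumptions. Matching whole words matters because Python's \b treats the underscore as a word character. As a result, a typo embedded in an identifier such as max_lenght is left alone, which avoids breaking code that refers to that identifier elsewhere.

```python
import re


def parse_corrections(lines):
    """Parse hypothetical 'wrong right' lines into a dict.

    Blank lines and lines starting with '#' are ignored. A line with other
    than exactly two fields raises ValueError during unpacking.
    """
    fixes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        wrong, right = line.split()
        fixes[wrong] = right
    return fixes


def apply_corrections(text, fixes):
    """Replace each whole-word occurrence of a key in `fixes` by its value."""
    if not fixes:
        return text
    # Longest alternatives first, so a longer typo wins over a shorter prefix.
    alternatives = "|".join(re.escape(w) for w in sorted(fixes, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # A function replacement avoids any backslash interpretation in `right`.
    return pattern.sub(lambda m: fixes[m.group(0)], text)
```

Whole-word matching does not make this safe to run unattended. A list entry like "loose lose" would still rewrite correct uses of "loose", so every resulting change still needs review by hand.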
Stephane LOEUILLET also had a go at an automated typo fixer. Here's his script and his list of typos. See his first and his second posts to lkml on the subject.
Patches should be made against a tree as close to Linus's BitKeeper (BK) tree as possible. One way to get a good reference tree is to download the latest released 2.5 kernel from www.kernel.org and then apply the "gzipped full patch" from www.kernel.org/pub/linux/kernel/v2.5/testing/cset.
The patch itself is a very good place to review the proposed changes, since it shows a few lines of context around each one. If you don't like a proposed change, you can edit the patch to remove the hunk containing it.
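Removing a hunk by hand in an editor is usually easiest. When many hunks contain the same unwanted change, though, a small filter can do it. This Python sketch assumes a patch for a single file: one header (the --- and +++ lines) followed by hunks that each start with "@@". drop_hunks and its reject callback are hypothetical names. A spelling fix normally replaces text within a line without changing line counts, so the remaining hunk headers should still apply. Check the result with patch --dry-run before relying on it.

```python
def drop_hunks(patch, reject):
    """Return `patch` without the hunks for which reject(hunk_text) is true.

    Assumes a single-file unified diff: header lines come first, and every
    hunk starts with a line beginning with "@@". Returns "" if no hunk
    survives.
    """
    header, hunks = [], []
    for line in patch.splitlines(keepends=True):
        if line.startswith("@@"):
            hunks.append([line])        # start a new hunk
        elif hunks:
            hunks[-1].append(line)      # body line of the current hunk
        else:
            header.append(line)         # ---/+++ lines before the first hunk
    kept = ["".join(h) for h in hunks if not reject("".join(h))]
    if not kept:
        return ""                       # nothing left to apply
    return "".join(header) + "".join(kept)
```

For example, drop_hunks(text, lambda h: "+/* lose" in h) removes every hunk that introduces the word "lose" at the start of a comment, and leaves the other hunks untouched.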
There is some debate about whether to submit a single patch for each kind of spelling error (e.g. "loose -> lose"), or a single patch for each area of the kernel source. Both approaches are probably good, but in any case, patches should be small and carefully reviewed by hand before submission.
Please have a look at earlier spelling patches accepted into Linus's tree. Linus seems to be applying patches that fix a single kind of spelling error (e.g. [PATCH] Spelling fixes: accommodate). Look at Linus's testing tree changeset page and search for "spelling".
International Ispell supports simple plug-in filters that let it spell-check just the portions of a document that are of interest, such as the comments in a C program. ispell-c-comments.c is a filter that should be compatible with International Ispell; I haven't tried it myself yet.
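The idea behind such a filter is to blank out everything that is not comment text. The spellchecker then sees only comments, while line and column positions stay the same. The following Python sketch illustrates that idea. It is not ispell-c-comments.c and does not implement International Ispell's filter interface, and the function name comments_only is hypothetical. It handles /* */ and // comments and skips string and character literals. It ignores corner cases such as backslash-newline continuation inside // comments.

```python
def comments_only(src):
    """Return `src` with all non-comment text replaced by spaces.

    Newlines are preserved and the output has the same length as the
    input, so positions in the output match positions in the source.
    Comment delimiters themselves are blanked too.
    """
    out = []
    state = "code"  # "code", "block", "line", or the quote char of a literal
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if state == "code":
            if c == "/" and nxt in ("*", "/"):
                state = "block" if nxt == "*" else "line"
                out.append("  ")          # blank the comment opener
                i += 2
                continue
            if c in ('"', "'"):
                state = c                 # entering a string or char literal
            out.append("\n" if c == "\n" else " ")
        elif state == "block":
            if c == "*" and nxt == "/":
                state = "code"
                out.append("  ")          # blank the comment closer
                i += 2
                continue
            out.append(c)                 # keep comment text, newlines included
        elif state == "line":
            if c == "\n":
                state = "code"
            out.append(c)
        else:                             # inside a literal; state is its quote
            if c == "\\" and nxt:
                out.append(" \n" if nxt == "\n" else "  ")  # skip escaped char
                i += 2
                continue
            if c == state:
                state = "code"
            out.append("\n" if c == "\n" else " ")
        i += 1
    return "".join(out)
```

Used as a pipeline stage, for example sys.stdout.write(comments_only(sys.stdin.read())), this passes only comment words on to whatever spellchecker follows it.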