dkftpbench
Need to stress out an ftp server, or measure how many users
it can support? dkftpbench can do it.
Want to write your own highly efficient networking software,
but annoyed by having to support very different code for
Linux, FreeBSD, and Solaris? libPoller can help.
Sources
Support
If you have a question about dkftpbench, join the
ftpbench mailing list
and post it there.
License
dkftpbench is released under the
GPL (GNU General Public License) v2.
Introduction
dkftpbench is an FTP benchmark program inspired by
SPECweb99.
The result of the benchmark is a number-of-simultaneous-users rating;
after running the benchmark properly, you have a good idea how many
simultaneous dialup clients a server can support.
The target bandwidth per client is set at 28.8 kilobits/second to model
dialup users; this is important for servers on the real Internet, which
often serve thousands of clients on only 10 MBits/sec of bandwidth.
The final result of the benchmark is "the number of simultaneous
28.8 kilobits/second dialup users". To estimate this number,
the benchmark starts up a new simulated user as soon as the last one has
finished connecting. It stops increasing the number of users when one
fails to connect, fails to maintain the desired bandwidth, or the limit
specified by the -n option is reached. It runs the simulated
users until the amount of time specified by the -t option has elapsed
since the last simulated user birth or death; the final score is the
number of users still alive at the end.
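The ramp-up rule can be sketched roughly as follows. This is an illustration only, not code from bench.cc: the names ramp_up, user_ok_fn, and fake_can_sustain are hypothetical, and the sketch leaves out the -t timer and the individual connect and bandwidth checks.

```cpp
#include <assert.h>

/*----------------------------------------------------------------------
 Hypothetical callback: returns true if the server can carry nusers
 simulated users (all connected, all meeting the target bandwidth).
----------------------------------------------------------------------*/
typedef bool (*user_ok_fn)(int nusers);

/*----------------------------------------------------------------------
 Add one user at a time until adding another would fail or the limit
 (the -n option) is reached.  Returns the number of users running.
----------------------------------------------------------------------*/
int ramp_up(int limit, user_ok_fn can_sustain) {
    assert(limit >= 0 && can_sustain != 0);
    int users = 0;
    while (users < limit && can_sustain(users + 1)) {
        users++;
    }
    return users;
}

/* Stand-in for a real server: pretends capacity runs out at 37 users. */
bool fake_can_sustain(int nusers) {
    return nusers <= 37;
}
```

With the stand-in above, ramp_up(500, fake_can_sustain) stops at 37 users, while ramp_up(10, fake_can_sustain) stops at the limit of 10.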
To help people tune up their systems in preparation for running the benchmark,
the utility dklimits is provided.
Results
Comparisons of various ftp daemons, and
between poll and F_SETSIG, using this
tool are available.
Results of a microbenchmark comparing poll() and /dev/poll
are available.
Competition
I planned to hold a Linux FTP Server Performance Bakeoff
in March 2000 using this benchmark, but it looks like it might be more like March 2002 :-(
FTP Client Library
As part of this project, a multiplexing FTP client
library has been developed. This will let programs do
nonblocking FTP client stuff more or less
conveniently. Potentially useful for FTP clients that want to fetch
lots of files at once. The library consists of all the source files in
the ftp benchmark except for bench.cc and robouser.{cc,h}.
Of particular interest is Poller,
an OO wrapper around the various readiness
notification methods supported by various versions of Unix
(vanilla Unix poll() and select(), FreeBSD's kqueue(), Linux and Solaris /dev/poll,
O_ASYNC for Linux, and O_ONESIGFD for Linux).
Regardless of whether the underlying notification mechanism is
edge-triggered or level-triggered, Poller presents a level-triggered
interface to the program.
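To see what "level-triggered" means in practice, here is a small sketch using plain poll(). It does not use the Poller class itself; is_readable is a hypothetical helper written only for this illustration.

```cpp
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*----------------------------------------------------------------------
 Returns true if fd has data waiting to be read right now.
 Never blocks (the poll timeout is zero).
----------------------------------------------------------------------*/
bool is_readable(int fd) {
    assert(fd >= 0);
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int n = poll(&p, 1, 0);
    return n == 1 && (p.revents & POLLIN) != 0;
}
```

If a pipe has one unread byte in it, is_readable() keeps returning true on every call until that byte is read. That is the level-triggered behavior; an edge-triggered mechanism would report the arrival only once.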
The benchmark uses Poller, and you can even pick one of the six readiness
notification schemes at runtime using the -s option.
You can use Poller in your programs, too. 'make install' installs
libPoller.a and the needed headers.
If you want to use nonblocking connects, read Stevens 'Unix Network
Programming vol 1', p. 410 and see the unit test testRejection() in
Poller_test.cc for how to do it.
Warning: the 2.4 Linux kernel's SIGIO support doesn't extend to pipes.
If you're using the 2.4 Linux kernel, you may need to apply
Jeremy Elson's patch
to your kernel if you want to use Poller_sigio.
Status
The following features are implemented:
- Compiles and runs on Linux, FreeBSD, and Solaris (or did, last I tried)
- fetches many files in parallel
- waits for each connect to finish (and then a bit) before starting next one;
slows down to < 1 connect/second when it reaches 75% of desired number of users.
This spreads out user activity more evenly.
- checks bandwidth continuously during each file fetch, stops adding users if any fetch is too slow
- throttles each fetch to use only the specified bandwidth
- searches for the maximum number of supported users
- Displays verbose error message when any user fails
- Aborts if it detects the client system running out of resources
- Aborts if connecting to the server takes > 5 seconds
- Aborts if it takes longer than 5 seconds to get first packet of a file
- Uses new Poller class for scalability; you can specify which Poller to use on the commandline
- Supports slow datarates (before, it only handled rates above 80kbits/sec on some systems)
- Lets you set how picky it is about datarates (before, its 'must be faster than'
threshold was fixed at 3/4 the target bandwidth)
- Supports alternative readiness notification methods like O_ASYNC and O_ONESIGFD
- Provided both as a standalone executable, and as a Corba object.
(Thanks to http://corbaconf.kiev.ua/
for the Corba autoconf macros.
See omni_scripts.tar.gz
for how I compiled OmniOrb 3.)
- Switches to BINARY mode after login. (The client API lets you choose;
edit robouser.cc to skip the START_TYPE state if you want to use ASCII.)
The following features are not yet implemented:
- verify that each ftp command (other than GET) doesn't take too long
- verify that GETs retrieve right number of bytes
- verify that GETs retrieve right bytes
- fetch random files
- upload test files to server under test
- support non-passive mode
- optimization
Issues:
- Doesn't support Solaris /dev/poll yet. (It would be easy but I've been lame.)
Example
After unpacking the sources, configure them for your system with the command
./configure
This will generate Makefile from Makefile.in.
To make sure the sources arrived intact and work properly on your system, type
make check
It will build all unit tests, and fail if any unit test fails.
You must be connected to the Internet, as this will try to download a file from ftp.uu.net.
To build the system tuning tool dklimits, type
make dklimits
Run it on both the client and the server machine; make sure
that the number of files it can open is about
three times the desired number of users, and
that the number of ports it can bind is higher than the desired number
of users. You should not be running X Windows or any other programs
on the client and server machines when running the benchmark.
To build the benchmark, type
make
This produces the executable 'dkftpbench', the tuning program 'dklimits', and
a bunch of unit tests (executables with names ending in _test) that you can
ignore for now.
Here's a simple use of dkftpbench:
./dkftpbench -n1 -hftp.uu.net -t15 -v
This tells dkftpbench to simulate one user fetching the
default file from ftp.uu.net repeatedly, and stop after fifteen seconds.
The program produces this output:
Option values:
-hftp.uu.net host name of ftp server
-P21 port number of ftp server
-n1 number of users
-t15 length of run (in seconds)
-b3600 desired bandwidth (in bytes per second)
-uanonymous user name
-probouser@ user password
-fusenet/rec.juggling/juggling.FAQ.Z file to fetch
-m1500 bytes per 'packet'
-v1 verbosity
1 users
User0: fetching 22708 bytes took 6.530000 seconds, 3477 bytes per second
User0: fetching 22708 bytes took 6.530000 seconds, 3477 bytes per second
Test over. 1 users left standing.
Distributed Load Generation
As of version 0.42, dkftpbench includes an experimental distributed version.
To use it, follow these steps:
- Install a C++ corba library, preferably OmniOrb 3.
(See omni_scripts.tar.gz for an example of how to build it.)
- 'make CorbaPlatoon_impl corbaftpbench'
- Start a Corba name service somewhere.
- Start a copy of CorbaPlatoon_impl on each load generation machine,
being sure to configure Corba on each load machine to know about your name service.
- Start a single copy of corbaftpbench (it's nearly the same as dkftpbench).
This should start your own little distributed denial-of-service-attack
against the ftp server of your choice. Please don't bombard a public FTP
server -- run your own for the purpose!
On a Linux system, you may need to pay attention to the per-process limit on
open filehandles (ulimit -n) as well as the system limit on open filehandles
(/proc/sys/fs/file-max) and the available port range (/proc/sys/net/ipv4/ip_local_port_range).
The program dklimits.c can help you check your
system's limits. Generally, ftp daemons require at least two network sockets and
one disk file descriptor per user. Use dklimits to verify you have enough
sockets and descriptors for your expected number of users. Same goes for
systems that will be used as ftp load generators, except that since they
don't usually store data on disk, they don't need the disk file descriptor.
On Linux, I run the following commands before starting the server or client:
ulimit -n 4096
echo 1024 32767 > /proc/sys/net/ipv4/ip_local_port_range
echo 4096 > /proc/sys/fs/file-max
and then check using dklimits to make sure these settings took effect.
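The rule of thumb above (two sockets per user, plus one disk file descriptor on the server) can be checked against the per-process limit with a few lines of code. This is a rough sketch and not part of dklimits; descriptors_needed and fd_limit_sufficient are hypothetical names, and the sketch ignores the handful of descriptors the process already has open.

```cpp
#include <assert.h>
#include <sys/resource.h>

/*----------------------------------------------------------------------
 Estimate how many descriptors a process needs for the given number
 of users: three per user on an ftp server, two on a load generator.
----------------------------------------------------------------------*/
long descriptors_needed(long users, bool is_server) {
    assert(users >= 0);
    return users * (is_server ? 3 : 2);
}

/*----------------------------------------------------------------------
 Returns true if the current soft limit on open files (what
 'ulimit -n' reports) covers the estimate for this many users.
----------------------------------------------------------------------*/
bool fd_limit_sufficient(long users, bool is_server) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return false;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return true;
    }
    return rl.rlim_cur >= (rlim_t) descriptors_needed(users, is_server);
}
```

For a 500-user run, this estimates 1500 descriptors on the server and 1000 on the load generator, so 'ulimit -n 4096' leaves comfortable headroom on both.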
Reporting Guidelines
I invite people to run this on their FTP servers and report the results by email.
If you want to do this, use the following command
to generate test data files:
make data
This will generate x10k.dat, x100k.dat, and x1000k.dat.
Run the commands
time dkftpbench -h200.201.202.203 -utestuser -ptestpass -n500 -t600 -fx10k.dat
and
time dkftpbench -h200.201.202.203 -utestuser -ptestpass -n500 -t600 -fx1000k.dat
on a different machine from the ftp server (substituting your server's IP address,
username, and password), and send in the following data:
- Server hardware (CPU type, speed, L2 cache, RAM, network card)
- Server OS (name, version, kernel version, output of dklimits immediately before starting server, other tuning parameters)
- Client hardware (CPU type, speed, L2 cache, RAM, network card)
- Client OS (name, version, kernel version, output of dklimits immediately before starting dkftpbench, other tuning parameters)
- Number of users left and elapsed time for x10k.dat
- Number of users left and elapsed time for x1000k.dat
I will collect and post the results.
Eventually, a more sophisticated workload and set of reporting guidelines will be provided.
How to read the sources
- Read this whole Web page, including the coding standards.
- Read the pages this links to, including the documentation included
with the sources (listed above).
- If you don't know sockets, read one of the tutorials linked to below.
- If poll() or "non-blocking I/O" is still mysterious to you, read
Unix Network Programming
or
Advanced Programming in the UNIX Environment
until it makes sense :-)
- To start reading the sources, arrange the modules in order from lowest level
(i.e. doesn't use any of the other modules) to highest level
(i.e. uses but isn't used by any of the other modules).
For this project, the order is nbbio, fdmap, ftp_client_proto, ftp_client_pipe,
robouser, bench.
- Pick the first module in the list, e.g. nbbio.
Review its .h file (nbbio.h).
Note any confusing parts, and email the mailing list with any questions.
They may respond by improving the comments in the file, by
simply answering your questions, or even by fixing a bug you find.
- After the .h file makes sense to you, review the same module's .cc file
(e.g. nbbio.cc) and do the same thing.
- When you understand that module's .h and .cc, move on to the
next module in the list that uses modules you've already reviewed.
When you come across references to modules you've already reviewed,
you'll have a good understanding of them, and they won't stump you.
Throttling
ftp_client_pipe_t keeps track of the number of bytes read via
the network (in either data or control channels). When this exceeds
a threshold, no more reads are executed for the appropriate amount of time.
ftp_client_pipe_t sleeps for enough clock ticks to hide the granularity of
the clock. In particular, ftp_client_pipe_t sleeps as soon as
Tw = (bytes_sent / desired_bandwidth - elapsed_time) is greater than eight
clock ticks.
For example, if eclock_hertz() is 100, the desired bandwidth
is 28000 bits/sec, it's been one clock tick since it last woke up,
and it has received 1500 bytes since it last woke up, then
desired_bandwidth = 28000 / eclock_hertz() = 280 bits/tick, and
Tw = 1500 * 8 / 280 - 1 = 41.9 (rounded), so it would wait about 42 clock
ticks before accepting any more reads.
On the other hand, if it takes 60 clock ticks to receive 1500 bytes,
it won't sleep at all.
(Compare with SPECweb99's "Rated Receive" logic, which only sleeps
at the end of each file fetch.)
But see
Rick Jones' post on comp.benchmarks
for a report of some trouble with this kind of throttling technique.
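The sleep rule described in this section can be written out as a short calculation. This is a sketch under the assumptions stated in the text, not the code in ftp_client_pipe_t; wait_ticks and should_sleep are hypothetical names, and the real implementation may round differently.

```cpp
#include <assert.h>

/*----------------------------------------------------------------------
 Compute Tw in clock ticks: the time the bytes should have taken at
 the desired bandwidth, minus the time that has actually elapsed.
 A negative result means the transfer is already slower than desired.
----------------------------------------------------------------------*/
double wait_ticks(long bytes, double bits_per_sec, long clock_hz, long elapsed_ticks) {
    assert(bits_per_sec > 0 && clock_hz > 0);
    double bits_per_tick = bits_per_sec / clock_hz;
    return (bytes * 8.0) / bits_per_tick - elapsed_ticks;
}

/*----------------------------------------------------------------------
 Sleep only when Tw exceeds eight clock ticks, so that the clock's
 granularity doesn't distort the achieved rate.
----------------------------------------------------------------------*/
bool should_sleep(double tw) {
    return tw > 8.0;
}
```

Calling wait_ticks(1500, 28000, 100, 1) gives a little under 42 ticks, so should_sleep() returns true. With 60 elapsed ticks instead of 1, the result is negative and no sleep happens.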
To Thread or not to Thread
I've chosen an event-driven approach to the problem. ("Event driven" is
also known as "non-threaded", "polling I/O", "non-blocking I/O", or
"multiplexed I/O".)
Many programmers today are familiar only with the threaded model of
writing servers, where the server creates a new thread or process
for each client. This lets you write code in a stream-of-consciousness
way, but has several drawbacks: it can be very hard to debug, and
it can have high overhead.
John Ousterhout's
talk on "Why Threads are a Bad Idea (for most purposes)" explains
some of the reasons programmers familiar with threads should also
learn about the alternatives to threads:
The talk compares the threads style of programming to an alternative
approach, events, that use only a single thread of control. Although each
approach has its weaknesses, events result in simpler, more manageable
code than threads, with efficiency that is generally as good as or better
than threads. Most of the applications for which threading is currently
recommended (including nearly all user-interface applications) would be
better off with an event-based implementation.
In an event-driven server, a single thread handles many clients at the same
time. This is done by dividing up the work into small pieces, and explicitly
handling a single stream of all the pieces of work from all the
clients; each client gets a moment of attention just when it needs it.
I've chosen this approach because it will use much less
memory to support tens of thousands of clients than would a thread-per-client
approach. I may still introduce threads at some point to allow the
program to make use of multiple CPUs, but I will do so sparingly.
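As a minimal sketch of the event-driven idea, the function below runs one pass of a single-threaded loop over several clients using plain poll(). It is not taken from the dkftpbench sources: client_t, MAX_CLIENTS, and service_ready_clients are hypothetical names, and a real client would run a protocol state machine where this sketch just counts bytes.

```cpp
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum { MAX_CLIENTS = 64 };

/* Per-client state; a real program would keep protocol state here. */
struct client_t {
    int fd;
    long bytes_seen;
};

/*----------------------------------------------------------------------
 One pass of the event loop: wait up to timeout_ms for any client to
 become readable, then give each ready client one read's worth of
 attention.  Returns the number of clients serviced, 0 on timeout,
 or -1 on error.
----------------------------------------------------------------------*/
int service_ready_clients(client_t *clients, int n, int timeout_ms) {
    assert(clients != 0 && n > 0 && n <= MAX_CLIENTS);
    struct pollfd pfds[MAX_CLIENTS];
    for (int i = 0; i < n; i++) {
        pfds[i].fd = clients[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    int nready = poll(pfds, (nfds_t) n, timeout_ms);
    if (nready <= 0) {
        return nready;
    }
    int serviced = 0;
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            char buf[256];
            ssize_t got = read(clients[i].fd, buf, sizeof(buf));
            if (got > 0) {
                clients[i].bytes_seen += got;
                serviced++;
            }
        }
    }
    return serviced;
}
```

A program would call this in a loop. No client ever blocks the others, because a read is attempted only on descriptors that poll() has reported as ready.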
Support for alternatives to poll()
dkftpbench supports both poll() and
alternative readiness notification methods.
This was done by adding a Poller class which
abstracts the poll() system call; concrete subclasses of this
have been written for poll(), select(), F_SETSIG, kqueue(), and /dev/poll.
dkftpbench has been tested with most of these (not kqueue or /dev/poll yet),
and you can choose which one to use from the dkftpbench commandline.
Notes
Other Benchmarks
Interesting Server Programs
- webfs is a very
simple, single-threaded, multiplexing HTTP server. The event
processing is very simple and clear; it's a good server to look at
to understand multiplexing. It uses select(), but the idea is the same for
poll().
- mathopd is another server
that uses poll() or select(), and has a very clear main loop.
- Search freshmeat.net for ftpd --
there are a lot of FTP server programs out there.
For instance, Betaftpd is a single-threaded FTP server.
- Linuxmafia.com's list of Linux FTP daemons
- Rana Bhattacharyya's FTP Server - multithreaded; resumable; lots of features; now part of Apache Avalon project - 17KLOC. Took about 5MB of RAM per connection in my tests.
Other FTP libraries
- Thomas Pfau's ftplib
-- existing library that implements the client side of the FTP protocol.
Doesn't let you multiplex lots of connections, though.
- IBM's FTP beans -- open source Java ftp protocol and UI beans
Standards
Resources for learning about network programming
Coding Standards
Since this code is licensed under the GPL, you're free to do as you
like with it. If you want to contribute to the project, though,
please follow these guidelines:
- In general, documentation is written first, then the module self-test,
then implementation. Documentation should be extremely brief,
consist mostly of interface comments embedded in the .h files,
and describe things just well enough that you could implement or use the code
with it as a guide. The self-test is written first so it can be
used to help initial debugging of the module, and later as a regression test.
- Anyone who adds code should first review the existing code
to look for places where the .h files are confusing or incomplete,
and give feedback to Dan so he can fix this. This will help
everyone understand the code, and will ensure the documentation is up to
snuff for future contributors.
- Comments at the beginning of modules or functions are
called 'interface comments'. Interface comments:
- document the interface, rationale, and intent of the module or function
- avoid talking about implementation details that don't affect the interface
- start with a /*------------- line
- end with a ------------*/ line
- enclose text whose left margin is indented to line up with the *
- don't use stars at the left margin of the text.
- are repeated verbatim in both the .h and .cc files.
- explain what the module or function is for well enough that you don't
have to look at the innards to use it or understand what it's for.
- are kept up to date; when the interface of a function changes,
the interface comment should change to reflect it.
- Each module has a simple self-test program at the bottom, surrounded with
#ifdef modulename_MAIN ... #endif.
The self-test should, if possible, provide a simple unsupervised
go/no-go indication.
- 'make test' will compile and run all module self-tests.
- Tab stops of 4 and an indent width of 4 are used throughout. Try to
follow the style of the existing code (e.g. spaces after
keywords and curly braces on the same line as keywords).
- A minimal subset of C++ is used; essentially, it's
C with classes.
- No C++ - style I/O (no cout, cin, etc.). All I/O is native Unix I/O
(e.g. read, write) or C - style I/O (printf, etc.).
(That doesn't mean cin, cout, etc. are bad; they're just not to be used in this project.)
- No inheritance without good reason; check with Dan before using any. (Inheriting from Poller::Client
is ok.)
- No templates in the main code. It's ok in unit tests, though...
- No STL. (That doesn't mean STL
is evil; it's just not appropriate for this project.)
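To make the conventions above concrete, here is how a tiny module might look when it follows them. The module name tickmath and the function ticks_from_ms are invented for this example and are not part of the dkftpbench sources; in a real module the interface comment would be repeated verbatim in the .h file.

```cpp
#include <assert.h>
#include <stdio.h>

/*----------------------------------------------------------------------
 Convert a duration in milliseconds to clock ticks, rounding up.
 hz is the clock frequency in ticks per second and must be positive;
 ms must not be negative.
----------------------------------------------------------------------*/
long ticks_from_ms(long ms, long hz) {
    assert(ms >= 0 && hz > 0);
    return (ms * hz + 999) / 1000;
}

#ifdef tickmath_MAIN
/* Self-test: prints a go/no-go result and exits nonzero on failure. */
int main() {
    int ok = ticks_from_ms(0, 100) == 0
          && ticks_from_ms(10, 100) == 1
          && ticks_from_ms(15, 100) == 2
          && ticks_from_ms(1000, 100) == 100;
    printf("tickmath self-test: %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
#endif
```

Compiling with -Dtickmath_MAIN builds the self-test. Without that define, the file provides only the function. Note the C-style I/O, the lack of templates and STL, and the brace placement.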
Last Change: 18 Mar 2002; links last fixed 21 Dec 2005
Most files Copyright 1999-2002 Dan Kegel
nbbio.{cc,h} are Copyright 1999 Disappearing, Inc.
See AUTHORS in the tarball for more details