dkcorbabench

Description
Results
Notes
Downloading/Compiling
Pseudocode
Links
Tcl implementation

Description

dkcorbabench is a small suite of benchmarks for comparing the dynamic footprint of various CORBA implementations.

The 'footprint' benchmark measures how long it takes to send an echo request to each of N servers in parallel and receive all N replies, where the client and all servers run as separate processes on the local system.

'footprint' takes two arguments:
max_n (the maximum number of servers to try to run), and
max_t (the maximum allowed latency in milliseconds).
It then loops, creating a new server and measuring latency to all servers, until either max_n or max_t is exceeded. After each measurement, it prints the latency and displays the first two lines of /proc/meminfo (if running on Linux).

To just measure performance, choose a low value for max_t.
To measure a combination of memory consumption and performance, choose a high value for max_t, or run the benchmark on a machine with much too little memory relative to its cpu speed.
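The memory report mentioned above (the first two lines of /proc/meminfo) could be produced by something like the following. This is a minimal sketch, not the code from the tarball; the helper name head_lines is hypothetical, and on a non-Linux system it simply returns nothing.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not taken from dkcorbabench): return up to
// max_lines lines from the start of a text file.
// Returns an empty vector if the file cannot be opened, e.g. when
// /proc/meminfo does not exist because we are not on Linux.
static std::vector<std::string> head_lines(const std::string &path, int max_lines)
{
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());
    std::string line;
    while (static_cast<int>(lines.size()) < max_lines && std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Calling head_lines("/proc/meminfo", 2) after each latency measurement and printing the result would give the kind of per-step memory snapshot described above.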

Two versions of 'footprint' are included:

footprint0: implemented using Unix pipes
footprint1: implemented using CORBA
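The source of footprint0 is not reproduced on this page, so the following is only a sketch of what a pipe-based echo measurement might look like, assuming one pair of pipes per forked child. All names here (PipeServer, spawn_pipe_server, ping_n_pipe_servers, stop_pipe_servers) are hypothetical and are not taken from the tarball.

```cpp
#include <chrono>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// One echo child: the parent writes to to_child and reads from from_child.
struct PipeServer {
    pid_t pid;
    int to_child;
    int from_child;
};

// Fork a child that echoes every byte it reads back to the parent.
// 'existing' lists servers already running, so the new child can close
// the descriptors it inherited for them. Returns false on failure.
static bool spawn_pipe_server(const std::vector<PipeServer> &existing, PipeServer &out)
{
    int down[2], up[2];   // down: parent -> child, up: child -> parent
    if (pipe(down) != 0) return false;
    if (pipe(up) != 0) { close(down[0]); close(down[1]); return false; }

    pid_t pid = fork();
    if (pid < 0) {
        close(down[0]); close(down[1]); close(up[0]); close(up[1]);
        return false;
    }
    if (pid == 0) {
        // Child: drop everything except our own read and write ends.
        for (const PipeServer &e : existing) { close(e.to_child); close(e.from_child); }
        close(down[1]); close(up[0]);
        char c;
        while (read(down[0], &c, 1) == 1)
            if (write(up[1], &c, 1) != 1) break;
        _exit(0);
    }
    // Parent: keep the write end of 'down' and the read end of 'up'.
    close(down[0]); close(up[1]);
    out.pid = pid; out.to_child = down[1]; out.from_child = up[0];
    return true;
}

// Send one byte to every server, then collect every echo.
// Returns elapsed milliseconds, or -1 on an I/O error.
static int ping_n_pipe_servers(const std::vector<PipeServer> &servers)
{
    auto start = std::chrono::steady_clock::now();
    const char ping = '!';
    for (const PipeServer &s : servers)
        if (write(s.to_child, &ping, 1) != 1) return -1;
    for (const PipeServer &s : servers) {
        char reply;
        if (read(s.from_child, &reply, 1) != 1 || reply != ping) return -1;
    }
    auto elapsed = std::chrono::steady_clock::now() - start;
    return static_cast<int>(
        std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count());
}

// Close the pipes (each child sees EOF and exits) and reap the children.
static void stop_pipe_servers(std::vector<PipeServer> &servers)
{
    for (PipeServer &s : servers) {
        close(s.to_child);
        close(s.from_child);
        waitpid(s.pid, nullptr, 0);
    }
    servers.clear();
}
```

A driver would spawn one server per iteration, call ping_n_pipe_servers on the growing vector, and stop once the latency exceeds max_t or max_n servers are running. The sketch uses C++11 features, which postdate the gcc 2.96 compiler listed in the results, so it is an illustration rather than period-accurate code.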

Results

With max_n set to 1000, and max_t set to 100ms, results are as follows:

CPU	RAM	Swap	CPU speed	Compiler	Kernel	unix pipes result	OmniOrb3.0.4 result	Tao1.2 result
pentium 2	128 MB	ide hard drive	450 MHz	gcc2.96-98	Red Hat 7.2	800 servers	117 servers	n/a
dual pentium 3	416 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	2650 servers	194 servers	272 servers
dual pentium 3	48 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	350 servers	55 servers	64 servers
dual pentium 3	16 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	61 servers	9 servers	5 servers
dual pentium 3	14 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	54 servers	7 servers	0 servers
dual pentium 3	12 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	41 servers	6 servers	0 servers
dual pentium 3	11 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	33 servers	5 servers	0 servers
dual pentium 3	10 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	26 servers	1 server	0 servers
dual pentium 3	9 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	16 servers	1 server	0 servers
dual pentium 3	8 MB	ide hard drive	650 MHz	gcc2.96-98	Red Hat 7.2 SMP	10 servers	0 servers	0 servers

Notes

The results above were obtained without linking -lomniDynamic3 into the server. With that library linked in, 25% fewer servers could be spawned on one system with 64 MB RAM. Thus it looks like merely linking in the dynamic Any support adds significant overhead per process.

On the 'footprint1' benchmark, TAO1.2 seems to beat OmniOrb3 at 48 MB of RAM and above (64 vs. 55 servers at 48 MB), but OmniOrb3 wins at 16 MB and below (9 vs. 5 servers at 16 MB).

Downloading/Compiling

The source code is GPL'd, and is in the file dkcorbabench-0.4.tar.gz. The change log is in the file ChangeLog.

dkcorbabench uses autoconf to adapt to the platform you're running on. The autoconf macros used to adapt to the platform's ORB are adapted from those at corbaconf.kiev.ua.

Only omniorb3 and TAO 1.2 have been tested so far, although I intend to support more ORBs later.

Omniorb3

I compiled and installed omniorb3 from sources using the scripts in omni_scripts.tar.gz.

To build dkcorbabench for the x86 workstation, I used the commands

./configure --with-omni=/opt/omniorb3/native/
make

To cross-compile dkcorbabench for the ppc405, I used the commands

CC=${MY_GCC3_CROSS_TOOL}gcc CXX=${MY_GCC3_CROSS_TOOL}g++ CFLAGS=$MY_TARGET_CFLAGS CXXFLAGS=$MY_TARGET_CFLAGS ./configure --with-omni=/opt/omniorb3/405_linux_2.0_glibc2.1 --host=powerpc-linux
make

where MY_GCC3_CROSS_TOOL is the location and prefix of the cross-development tools, and
MY_TARGET_CFLAGS are the compiler flags needed to build for the ppc405.

TAO 1.2

I compiled TAO 1.2 from sources for the x86 using the script mintao.sh. I had to turn off the minimum TAO switch. I hope to turn it back on by compiling the server with minimum TAO and the client with maximum TAO, possibly running them on separate machines.

To build dkcorbabench with Tao for the x86 workstation, I set the usual ACE_ROOT and TAO_ROOT environment variables, then used the commands

./configure
make

Pseudocode

For footprint1, the measurement is performed by the following code:

/**
 Ping n children, return how many milliseconds it takes.
*/
static int ping_n_servers(int n, Echo_var *servers)
{
    int start = time_in_ms();

    CORBA::Request_var *req = new CORBA::Request_var[n];
    const char *arg = "Hello!";

    /* Send N pings without waiting for any reply */
    for (int i=0; i<n; i++) {
        req[i] = servers[i]->_request("echoString");
        req[i]->add_in_arg() <<= CORBA::string_dup(arg);
        req[i]->set_return_type(CORBA::_tc_string);
        req[i]->send_deferred();
    }

    /* Wait for N replies */
    for (int i=0; i<n; i++) {
        req[i]->get_response();
        const char* ret;
        req[i]->return_value() >>= ret;
    }

    int elapsed = time_in_ms() - start;

    /* Free the request array so repeated calls don't leak */
    delete[] req;

    return elapsed;
}

void footprint1(int max_n, int max_t)
{
    Echo_var servers[max_n];
    for (int i=0; i<max_n; i++) {
        int latency;
        spawn new echo server.
        servers[i] = reference to new echo server.
        // Invoke method now, so TCP connect time isn't included in measurement
        servers[i]->echoString("Hello!");
        latency = ping_n_servers(i+1, servers);
        if (latency > max_t) abort();
        printf("Latency to %d children is %d ms\n", i+1, latency);
    }
}
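
The pseudocode above calls time_in_ms() without showing it. One possible implementation is sketched below; it is an assumption, not the helper from the tarball. It uses the C++11 std::chrono monotonic clock, which postdates the gcc 2.96 compiler listed in the results, so the original presumably used something else (gettimeofday would be a plausible period-appropriate choice).

```cpp
#include <chrono>

// Hypothetical implementation of the time_in_ms() helper named in the
// pseudocode: milliseconds elapsed since the first call, measured on a
// monotonic clock so that wall-clock adjustments cannot produce a
// negative latency.
static int time_in_ms()
{
    typedef std::chrono::steady_clock clock;
    // The origin is fixed on first use, which keeps the result small
    // enough to fit in an int for any realistic benchmark run.
    static const clock::time_point origin = clock::now();
    return static_cast<int>(
        std::chrono::duration_cast<std::chrono::milliseconds>(
            clock::now() - origin).count());
}
```

Only differences between two calls are meaningful, which is all ping_n_servers needs.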

Tcl implementation

Frank wrote:

Hi Dan,

 I see that you are evaluating ORBs in low memory situations. I can't
help but chime in here with an unorthodox possibility.

 Just for curiosity, I have implemented your footprint client and
server example in Tcl using the Combat ORB. Client and Server are
attached; in addition, you will need

  - Tcl 8.3.x from http://tcl.sourceforge.net/
  - [incr Tcl] 3.x from http://incrtcl.sourceforge.net/
  - Combat/Tcl from http://www.fpx.de/Combat/#Download-Tcl

 Your footprint benchmark runs unchanged.

 Performance is, as expected, worse than that of C++ clients and servers.
On my 266 MHz machine, the 100ms latency threshold is enough for 2
children only. Combat is probably not the ORB you want to use for high
throughput.

 However, it has a small footprint, so I'm confident that it will keep
its "performance" even in low memory conditions. So maybe you might want
to try running it on your test system.

 Another plus is that as you add more features, scripted clients and
servers grow only a little. On the Combat home page, you can find an
Account example; its statically-linked Win32 server is 460k, the
(graphical) client 860k.

 This is not supposed to be a sales pitch, but I hope I made you
curious.

 Have fun,
        Frank