Mindcraft Redux
On Mindcraft's April 1999 Benchmark
"First they ignore you, then they laugh at you, then they fight you, then you win." --
Executive summary:
The Mindcraft benchmark proved to be a wake-up call, and the Linux
community responded effectively.
Several problems which caused Apache to run slowly on
Linux were found and resolved.
As of mid-1999, Linux/Apache performance was identical to NT/IIS
performance under light load, and respectable under heavy load.
As of May 2000, performance on Mindcraft-like benchmarks had improved further.
As of February 2001, performance is dramatically better with 2.4 compared to 2.2.
In January 2002, a new scheduler was added to the kernel.
This should help Linux scale better to many processors and many
running threads.
(The new scheduler will probably not be included by
default in Linux distributions until 2003, but it is already available
as a patch to the 2.4.17 kernel for the few people who might need it.)
By early 2003, the NGPT and NPTL projects have succeeded in bringing
high-performance POSIX-compliant threading to Linux.
The first commercial distribution to include NPTL will likely be Red Hat 8.1
in April 2003.
Graphs showing remarkable progress in performance of 8-CPU SMP Linux
systems are online.
In May 2003: here we go again!
I've split the summary of recent benchmark results (including the
Mindcraft benchmarks) off into a separate page,
at the request of readers who liked the data but weren't interested
in the history of the Linux kernel hacking community's response to the
benchmarks.
Updated 27 Jan 2000, 2 March 2000, 31 March 2000, 19 May 2000, 4 June 2000, 17 Sept 2000, 18 Sept 2000, 3 Oct 2000, and 4 Oct 2000.
In March 1999, Microsoft commissioned Mindcraft to carry out a benchmark
showing that NT was 2 to 3 times faster than Linux.
This provoked responses from several members of the Linux community.
(Evidently Microsoft takes this very seriously.)
The responses generally claimed that Mindcraft had not configured Linux properly,
and gave specific examples.
Both Mindcraft and the Linux community agree that good tuning information for Linux is
hard to find.
Why the Outcry?
The Linux community responded to Mindcraft's announcements with hostility, at least
partly because of Mindcraft's attitude.
Mindcraft stated
"We posted notices on various Linux and Apache newsgroups and received no relevant responses."
and concluded that the Linux 2.2 kernel wasn't as well supported as NT.
Mindcraft did not seem to take the time to become familiar with all the appropriate
forums for discussion, and apparently did not respond to requests for further information
(see section III of one of the responses).
Others have noted that the fixes described below, in particular, all came about in the course of
normal support activities on Usenet, the linux-kernel mailing list, and the Apache bug
tracking database.
I believe the cases illustrated below indicate that free 2.2.x kernel support is better
than Mindcraft concluded.
Also, Mindcraft's report
neglected to mention that Microsoft sponsored the Mindcraft benchmarks,
that the tests were carried out at a Microsoft lab,
and that Mindcraft had access to the highest level of NT support imaginable.
Finally, Mindcraft did not try to purchase a support contract for Linux
from either of the vendors offering commercial support at the time of Mindcraft's tests.
Mindcraft later reran its tests at an independent testing lab to address concerns that their testing
was biased, but they have not yet addressed concerns that their conclusions
about Linux kernel support are biased.
Truth or FUD?
Mindcraft probably did tune NT well and Linux poorly -- but rather than
assume this fully accounts for Linux's poor showing, let's look for other things that
could have contributed.
I'm going to focus on the web tests, since that's what I'm familiar with.
Although Apache was designed for flexibility and correctness rather than raw performance,
it has done quite well in benchmarks in the past.
A January 1999 Ziff-Davis benchmark showed that "Linux with Apache beats NT 4.0 with IIS,
hands down".
(Also, faster numbers have been posted, but only with special caching software.)
Mindcraft's results, on the other hand, show that Apache's performance falls off dramatically when there are more than 160 clients.
Is this a contradiction?
Not really.
A web server comparison done by Jef Poskanzer, the author of the
high-performance server 'thttpd', showed that Apache 1.3.0 (among
other servers) has trouble above 125 connections on Solaris 2.6.
The number of clients served by Apache in the Ziff-Davis benchmark
above was 40 or less, below the knee found by Poskanzer.
By contrast, in the Mindcraft tests (and in the IIS tests), the server
was asked to handle over 150 clients, above the point where Poskanzer saw the dropoff.
Also, the January Ziff-Davis benchmarks used much less RAM, not enough to hold
both the server code and the 60 megabyte document set
used by both Mindcraft and Ziff-Davis, whereas Mindcraft used 960 megabytes of RAM.
So it's not surprising that the Jan '99 Ziff-Davis and April '99 Mindcraft tests of Apache
got different results.
Does it matter?
These benchmarks are done on static pages, using very little of Apache's
dynamic page generation power.
One analysis points out that the performance levels reached in the test correspond
to sites that receive over a million hits per day on static pages.
It's not clear that the results of such a test have much relevance to
typical big sites, which tend to use a lot of dynamically generated pages.
Another objection to these benchmarks is that they don't accurately reflect the
real world of many slow connections.
A realistic benchmark for a heavily-
trafficked Web server would involve 500 or 1000 clients all restricted to
28.8 or 56 Kbps, not 100 or so clients connected via 100baseT.
A benchmark that aims to deal with both of these concerns is the new
SPECWeb99 benchmark.
When it becomes available, it looks like it will set the standard
for how web servers should be benchmarked.
Nevertheless, it seems
that until more realistic benchmarks (like SPECWeb99) become available,
benchmarks like the one Mindcraft ran are an understandable if dirty compromise.
Why does Apache's performance fall off above 160 active connections in the tests above?
It appears the steep falloff may have been due to a TCP stack problem
reported by Ariel Faigon
and later by Karthik Prabhakar.
Ariel Faigon reported on 3 May 1999 (updates added):
"A couple of items you may find interesting.
For a long time the web performance team at SGI has noted
that among the three web servers we have been benchmarking
(Apache, Netscape enterprise, and Netscape fasttrack),
Apache is (by far) the slowest.
In fact an SGI
employee (Mike Abbott) has done some optimizations which
made Apache run 3 (!) times faster on SPECWeb 96 on IRIX.
It is our intention to make these patches public soon.
[They are now online.]
When we tried to test our Apache patches on IRIX the expected
3x speedup was easy to achieve.
However when we ported our
changes to Linux (2.2.5), we were surprised to find that
we don't even get into the scalability game.
A 200ms delay
in connection establishment in the TCP/IP stack in Linux 2.x
was preventing Apache from responding to anything more than 5
connections per second.
We have been in touch with David Miller
on this and sent him a patch by Feng Zhou which eliminates
this bottleneck.
This patch ... has made it into the
[2.2.7] kernel [after some modifications by David Miller].
So now we are back into optimizing Apache. ..."
[The patch affected the TCP output code;
e.g. it notes that Nagle shouldn't be used for the final FIN packet.]
Ariel Faigon added (6 May 1999):
As for our changes to Apache: they are much more significant
and make Apache run 3 times faster on SPECWeb 96.
We talked to the author and made sure we are releasing them
to the Apache group when we're ready.
We just don't want to
be too hasty in this.
We want to make it right, clean and
accepted by the Apache guys.
The 'patch' here is pretty big. ...
It includes:
A page cache
Performance tunables adjusted to the max
Changed some critical algorithms with faster ones
(this is long, I hope to have more details when we release).
Eliminated system calls where they weren't needed
(so Apache is less dependent on the underlying OS)"
Karthik Prabhakar reports on a problem with Apache 1.3.4 and 1.3.6 on Linux kernel 2.2.5:
"I've seen load fall off well below 160 clients (for eg., 3 specweb clients with 30
processes each). I can't explain it yet, especially the
fact that the performance stays low even after the test
concludes. This behavior seems limited to apache."
He has reported this as a bug to the Apache group; see his bug report and its followups. He later added:
"The mystery continues. I got round to trying out 1.3.6 again this evening,
this time on 2.2.7. I did _not_ see the performance drop off. Just to verify,
I rechecked on the stock 2.2.5 kernel, and the drop off is there.
So _something_ has been fixed between 2.2.5 and 2.2.7 that has made this problem
go away. I'll keep plugging away as I get spare time to see if I can get the
problem to occur. ...
Compiling 1.3.6 in a reasonable way, along
with a few minor tweaks in linux 2.2.7 gives me about 2-4 times the
peak performance of the default 1.3.4 on 2.2.5.
I simply compiled [Apache] with pretty much all modules
disabled....
I'm using the
highperformance-conf.dist config file from the distribution."
This sounds rather like the behavior Mindcraft reported
("After the restart, Apache performance climbed back to within 30% of its peak from a low of
about 6% of the peak performance").
(Note: According to the Linux Scalability Project, a "task exclusive" wake-one
patch is now integrated into the 2.3 kernel; however,
as of 2.4.0-test10, it still wakes up processes in the same order they were put to sleep,
which is not optimal from a caching point of view.
The reverse order would be better.
See the Nov 2000 measurements by Andrew Morton (andrewm@uow.edu.au).)
Phillip Ezolt, 5 May 1999, in linux-kernel:
"When running a SPECWeb96 strobe run on Alpha/linux, I found that when the
CPU is pegged, 18% of the time is spent in the scheduler."
(See also comments by Russinovich.)
This post started a long discussion thread.
Looks like the scheduler (and possibly Apache) are in for some changes.
Rik van Riel, 6 May 1999, in linuxperf:
... The main bug with the web benchmark remains. The
way Linux and Apache 'cooperate', there's a lot of trouble
with the 'thundering herd' problem.
That is, when a signal comes in, all processes are woken up
and the scheduler has to select one from the dozens of new
runnable processes....
The real solution is to go from wake-all semantics to
a wake-one style so we won't have the enormous runqueues
the guy at DEC [Phillip Ezolt] experienced.
The good news is that it's a simple patch that can
probably be fixed within a few days...
Tony Gale, 6 May 1999, in linuxperf:
Apache uses file locking to serialise access to the accept call. This
can be very expensive on some systems. I haven't found the time to
run the numbers on Linux yet for the 10 or so different server
models that can be employed to see which is the most efficient. Check
Chapter 27 for details.
Andrea Arcangeli, May 12th, 1999, in linux-kernel:
I released a new andrea-patch against 2.2.8. This new one has my new
wake-one on accept(2) straightforward code (but to get the improvement you
must make sure that your apache tasks are sleeping in accept(2), a strace
-p `pidof apache` should tell you that).
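Andrea's suggested check can be spelled out as a small shell sketch (the process name is illustrative, and attaching to another user's process requires root):

```shell
# Attach to the first Apache child and see which syscall it is blocked in.
# With unserialized accept, an idle worker sits in accept(2); with lock-file
# serialization it sits in flock()/fcntl() on the lock file instead.
pid=$(pidof apache | awk '{print $1}')
strace -p "$pid"
```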
The patch is linked to from his post.
David Miller's response to the above:
... on every
new TCP connection, there will be 2 spurious and unnecessary wakeups,
and these originate in the write_space socket callback because as we
free up the SYN frames we wakeup listening socket sleepers.
I've been working today on solving this very issue.
Ingo Molnar, May 13th, 1999, in linux-kernel:
note that pre-2.3.1 already has a wake-one implementation for accept() ...
and more coming up.
Phillip Ezolt (ezolt@perf.), May 14th, 1999, in linux-kernel:
I've been doing some more SPECWeb96 tests, and with Andrea's
patch to 2.2.8 (2.2.8_andrea1.bz)
**On identical hardware, I get web-performance nearly identical to Tru64!** ...
With 2.2.8_andrea1, time spent in schedule has decreased to roughly 4ms, as shown by Iprobe data,
and the number of SPECWeb96 MaxOps per second has jumped as well.
**Please, put the wakeone patch into the 2.2.X kernel if it isn't already. **
Larry Sendlosky tried this patch, and reported:
Your 2.2.8 patch really helps apache performance on a single cpu system,
but there is really no performance improvement on a 2 cpu SMP system.
The latest version of the wake-one patch is available online.
Dimitris Michailidis, 14 May 1999, in linux-kernel: several improvements on the 2.2.8 scheduler.
Andrea Arcangeli (andrea@suse.de), 15 May 1999, in linux-kernel: several improvements on the 2.2.9 buffers and scheduler.
Andrea Arcangeli (andrea@suse.de), 21 May 1999, in linux-kernel: update of same.
Might have some SMP bottleneck fixes, too.
Juergen Schmidt, May 19th, 1999, in linux-kernel,
asked what could make Apache do poorly under SMP.
Andi Kleen replied:
One culprit is most likely that the data copy for TCP sending runs
completely serialized. This can be fixed by replacing the
skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);
in tcp.c:tcp_do_sendmsg with
unlock_kernel();
skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);
lock_kernel();
The patch does not violate any locking requirements in the kernel...
[To fix your connection refused errors,] try:
echo 32768 > /proc/sys/fs/file-max
echo 65536 > /proc/sys/fs/inode-max
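As a sketch, those limits can be inspected before being raised (the values are the ones from Andi's post; writing to /proc requires root, and inode-max disappeared in later kernels):

```shell
# show the current limits, then raise them as suggested above
cat /proc/sys/fs/file-max
cat /proc/sys/fs/inode-max
echo 32768 > /proc/sys/fs/file-max
echo 65536 > /proc/sys/fs/inode-max
```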
Overall it should be clear that the current Linux kernel doesn't scale to
multiple CPUs for system load (user load is fine). I blame the Linux vendors for
advertising it, although it is not true.
Work to fix all these problems is underway [2.3 will be fixed first,
then the changes will be backported to 2.2].
[Note: Andi's TCP unlocking fix appears to be in later kernels.]
Andrea Arcangeli posted a note
describing his own version of this fix
(2.3.3_andrea2.bz2) as less cluttered:
If you look at my patch (the second one, in the first one I missed the
reacquire_kernel_lock done before returning from schedule, woops :) then
you'll see my approach to address the unlock-during-uaccess. My patch doesn't
change tcp/ip ext2 etc... but it touches only uaccess.h and usercopy.c. I
don't like to put unlock_kernel all over the place.
Juergen Schmidt, 26 May 1999, on linux-kernel and new-httpd,
( Linux/Apache and SMP - my fault ), retracted his earlier problem report:
I reported "disastrous" performance for Linux and Apache on an SMP system.
To doublecheck, I've downloaded a clean kernel source (2.2.8 and 2.2.9)
and had to realize, that those do *not* show the reported penalty when
running on SMP systems.
My error was to use the installed kernel sources (which I patched from
2.2.5 to 2.2.8 - after seeing the first very bad results). But those
sources already had been modified before the machine came to me. Should
have thrown them away in the first place :-( ...
Please take my excuses for this confusion.
Others have seen improvements (20% or so) with Andrea's SMP fix, but
only when serving largish files (100 kilobytes).
Juergen has now rerun his tests.
Unfortunately, he neglected to compile Apache with
SINGLE_LISTEN_UNSERIALIZED_ACCEPT, which significantly hurt Apache performance.
If Juergen missed that, it means it's too hard to figure out.
To make it easier to get good performance in the future,
we need the wake-one patch added to a stable kernel (say, 2.2.10),
and we need Apache's configuration script
to notice that the system is being compiled for 2.2.10 or later,
and automatically select SINGLE_LISTEN_UNSERIALIZED_ACCEPT.
Mike Whitaker (mike@altrion.org), 22 May 1999, in linuxperf,
described an interesting performance problem:
Our typical webserver is a dual PII450 with 1G, and split httpd's, typically
300 static to serve the pages and proxy to 80-100 dynamic to serve the
mod_perl adverts. Unneeded modules are disabled and hostname lookups turned
off, as any sensible person would.
There's typically between one and three mod_perl hits/page on top of the
usual dozen or so inline images...
The kernel (2.2.7) has MAX_TASKS upped to 4090, and the
unlock_kernel/lock_kernel around csum_and_copy_from_user() in tcp_do_sendmsg
that Andi Kleen suggested.
Performance is .. interesting. Load on the machine fluctuates between 10 and
120, while the user CPU goes from 15% (80% idle) to 180% (0% idle, machine
*crawling*), about once every minute and a half. vmstat shows the number of
processes in a run state to range from 0 (when load is low) to 30-40, and
the static servers manage a mighty 60-70 peak hits/sec. Without the dynamic
httpd's everything *flies*...
After being advised to try a kernel with wake-one support, he reported:
We're up with 2.3.3 plus Andi Kleen's tcp_do_sendmsg patch plus Apache
sleeping in accept() on one production server, and comparing it against a
2.2.7 plus tcp_do_sendmsg patch plus Apache sleeping in flock(). Identical
systems (dual PII450, 1G, two disk controllers).
As far as I can *tell*, the wake-one patch is definitely doing its stuff:
the 2.2.7 machine still has cycles of load into three figures, and the 2.3.3
machine hasn't actually managed a load of 1 yet.
UNFORTUNATELY, observation suggests that the 2.3.3 machine/Apache
combination is dropping/ignoring about one connection in ten, maybe more.
(Network error: connection reset by peer.)
More progress from the bleeding edge:
(Reminder: the config here is split static/mod_perl httpd's, with a pretty
CPU-intensive mod_perl script serving ads as an SSI as the probable
bottleneck)
Linux kernel 2.2.9 plus the wake-one patch seems to do the trick: it
can handle hits at a speed which suggests it's pushing the ad server close
to its observed maximum. (As I said in a previous note, avoid 2.2.8 like the
plague: it trashes HDs - see threads on linux-kernel for details.)
However...
When it *does* get overstressed, BOY does it get overstressed.
Once the idle CPU drops to zero (i.e. it's spending most of its time
processing advert requests), everything goes unpleasantly pear-shaped, with a
load of 400+, and the number of httpd's on both types of server *well* above
MaxClients (in fact, suspiciously close to MaxClients + MinSpareServers).
Spikes in demand can cause this, and once you get into this state, getting
out again under the load of progressively more backlogged requests is not
easy: in fact from experience the only way is to take the machine out of
the (hopefully short TTL) DNS round-robin while it dies down.
The potentially counterintuitive step at this point is to *REDUCE*
MaxClients, and hope that the tcp Listen queue will handle a load surge.
Experience suggests this does in fact work.
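In Apache 1.3 terms, that advice might look something like the following httpd.conf fragment (the numbers are illustrative, not taken from the original posting):

```apache
# Cap the worker count below the observed thrash point and let the
# kernel's listen queue absorb short spikes in demand instead.
MaxClients      150
MinSpareServers 5
MaxSpareServers 20
ListenBacklog   511
```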
(Aside: this is a perfect case for using
a load-balancing DNS.)
Eric Hicks, 26 May 1999, in linux-kernel:
... I'm having some big problems in which
it appears that a single PII 400Mhz or a single AMD 400 will outrun a dual
PII 450 at http requests from Apache. ...
Data for HTTP Server Tests: 100 1MByte mpeg files stored on local disks.
AMD 400 MHz K6, 128MB, Linux 2.0.36; handles 1000 simultaneous clients @ 57.6Kbits/sec.
PII 400 MHz, 512MB, Linux 2.0.36; handles 1000 simultaneous clients @ 57.6Kbits/sec.
Dual PII 450 MHz, 512MB, Linux 2.0.36 and 2.2.8; handles far fewer than
300 simultaneous clients @ 57.6Kbits/sec [and even then, clients were seeing 5 second connection delays, and 'reset by peer' and 'connection time out' errors].
I advised him to try the wake-one patches above;
he said he'd try it and report back.
One correspondent noted that the Mindcraft benchmark's use of four Fast Ethernet
cards and a quad SMP system exposes a bottleneck in Linux's interrupt handling:
the kernel spent a lot of time in synchronize_bh().
(A single Gigabit Ethernet card would stress this bottleneck much less.)
He has a proposed fix, although he hasn't tried it with multiple Ethernets yet.
See also comments on reducing interrupts under heavy load, including a discussion of
the Mindcraft benchmark and SMP scalability.
Softnet is coming! Kernel 2.3.43 adds the new softnet
networking changes.
Softnet changes the interface to the networking cards,
so every single driver needs updating, but in return network performance should scale
much better on large SMP systems.
(For more info, see the notes about how to convert old drivers.)
The Feb '00 thread
(especially its later messages)
has lots of interesting tidbits about what interrupt (and other) bottlenecks remain, and how they are
being addressed in the 2.3 kernel.
Ingo Molnar's writeup
describes interrupt-handling improvements in the IA32 code in great detail.
These will be moving into the core kernel in 2.5, it seems.
One correspondent wrote on 29 June 1999:
After upgrading to the 2.2 series we have from time to time
experienced severe slow-downs on the TCP performance...
The performance goes back to normal when I take down the interface and
reinsert the eepro100 module into the kernel.
After I've done that,
the performance is fine for a couple of days or maybe weeks.
Another wrote on 29 June 1999:
I've got 3 machines running 2.2.10 [with multiple] 3COM 3C905/905b PCI [cards]...
After approximately 2 days of uptime, I will start to see ping times on
the local lan jump to 7-20 seconds.
As others have noticed, there is no packet
loss -- just some damn high latency.
It seems to be dependent upon the network load -- lighter loads lead to
longer periods between problems.
The problem ALSO is gradual -- it'll
start at 4 second pings, then 7 second pings about 20 minutes later,
then 30 minutes later it's up to 12-20 seconds.
Less repeatable.
David Stahl wrote on 13 July 1999:
What DID fix the problem was a private reply from someone
else (sorry about the credit, but I'm not in the mood to sieve 10k emails
right now), to try the alpha version of the latest 3c59x.c driver from
Donald Becker.
3c59x.c:v0.99L 5/28/99 is the version that fixed it.
On 23 Sep 1999, Alexey posted a patch
that clears up a similar mysterious slowdown.
2.2.13 and Red Hat 6.1 already have this patch applied.
On three Red Hat 6.0 systems I know of with Masq support compiled in,
connected to cable modems, this patch fixed a bug which caused very high pings after even
short bursts of heavy TCP transfers to distant hosts.
Someone reported about October 21st on linux-kernel that although Alexey's patch greatly
improved the problem, it is not totally gone.
Another user is also seeing occasional long delays with 2.2.13.
In another report, the replies say it's likely caused
by a particular Tulip driver.
Petru Paler, July 10 1999, in linux-kernel,
reported that any kind of TCP connection between Linux
(2.2.10) and an NT Server 4 (Service Pack 5) machine slows down to a crawl.
The problem was much milder (6kbytes/sec) with 2.0.37.
He included a
log of a slow connection made with tcpdump, which
helped Andi Kleen see the problem.
Solved: false alarm!
It wasn't Linux' fault at all.
NT needed to be told to not use full duplex mode on the ethernet card.
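On the Linux side, a duplex mismatch like this can be spotted with mii-tool from net-tools (the interface name is illustrative; on current systems ethtool reports the same information):

```shell
# Show what speed/duplex the NIC actually negotiated.
# One end forced to full duplex while the other autonegotiates
# half duplex shows up as terrible bulk TCP throughput.
mii-tool -v eth0
```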
Phil Ezolt, 22 Jan 2000, in linux-kernel:
When I run SPECWeb96 tests here, I see both a large number of running
process and a huge number of context switches. ...
Here's a sample of the vmstat data:
[vmstat output omitted; it showed 24 running processes and ~7000 context switches per second.]
Notice: 24 running processes and ~7000 context switches.
That is a lot of overhead. Every second, 7000*24 goodnesses are calculated.
Not the (20*3) that a desktop system sees. This is a scalability issue.
A better scheduler means better scalability.
Don't tell me benchmark data is useless. Unless you can give me data
using a real system and where its faults are, benchmark data is all we have.
SPECWeb96 pushes Linux until it bleeds. I'm telling you where it
bleeds. You can fix it or bury your head in the sand. It might not
be what your system is seeing today, but it will be in the future.
Would you rather fix it now or wait until someone else has thrown down
the performance gauntlet?
Here's a juicy tidbit.
During my runs, I see 98% contention on the
[2.2.14] kernel lock, and it is accessed a LOT.
I don't know how 2.3.40
compares, because I don't have big memory support for it.
Hopefully,
Andrea will be kind enough to give me a patch, and then I can see if things
have improved.
[Phil's data is for the web server undergoing the SPECWeb96 test,
which is an ES40 4 CPU alpha EV6 running Redhat 6.0 w/kernel v2.2.14
and Apache-v1.3.9 w/SGI patches; the interfaces
receiving the load are two ACENic gigabit ethernet cards.]
Manfred Spraul, April 21, 2000, in linux-kernel:
kumon@flab.fujitsu.co.jp noticed that select() caused a high contention
for the kernel lock, so here is a patch that removes lock_kernel() from
poll(), [tested] with 2.3.99-pre5.
There was some discussion about whether this was wise at this late date,
but Linus and David Miller were enthusiastic.
Looks like one more bottleneck
bites the dust.
On 26 April 2000, kumon@flab.fujitsu.co.jp posted benchmark results with and without the
lock_kernel() in poll().
The followups included
a kernel patch to improve checksum performance and a patch to Apache
to force it to align its buffers to 32-word boundaries.
The latter
patch, by Dean Gaudet, earned praise from Linus, who relayed rumors that
this can speed up SPECWeb results by 3%.
This was an interesting thread.
See also the followup thread, in which Kumon presents
some benchmark results and another patch.
kumon@flab.fujitsu.co.jp, 19 May 2000, in linux-kernel,
reports a 3% reduction in total CPU time compared to 2.3.99-pre8
on i686 by optimizing the cache behavior of csum_partial_copy_generic.
The workload:
The benchmark we used has almost same setting as the MINDCRAFT ones,
but the apache setting is [changed] slightly not to use symlink checking.
We used maximum of 24 independent clients and number of apache
processes is 16.
A four-way XEON processor system is used, and the
performance is twice or more that of a single CPU.
Note that in the earlier 2.2.6 tests,
a 4 CPU system only achieved a 1.5x speedup over a single CPU;
here Kumon reports a >2x speedup.
This appears to be about the same speedup NT 4.0sp3 achieved with 4 CPUs
at that number of clients (24).
It's encouraging to hear that things may have improved in the 11 months since the
2.2.6 tests.
When I asked him about this, Kumon said:
Major improvement is between pre3 and pre5, poll optimization.
pre4 (I forget exact version), kernel-lock prevents performance
improvement.
If you can retrieve l-k mails around Apr 20-25, the following mails
will help you understand the background.
On 4 Sept 2000, kumon posted a reminder,
noting that his change still hadn't made it into the kernel.
On 22 May 2000, Manfred Spraul posted a patch
which optimized kmalloc(), getname(), and select() a bit, speeding up apache
by about 1.5% on 2.3.99-pre8.
On 30 May 2000, Alexander Viro posted a patch
that got rid of a big lock in close_flip() and _fput(), and asked for testing.
I measured viro's ac6-D patch with WebBench on 4cpu Xeon system.
I applied to 2.4.0-test1 not ac6.
The patch reduced 50% of stext_lock time and 4% of the total OS time.
Some part of kmalloc/kfree overhead comes from do_select, and it is
easily eliminated using a small array on the stack.
kumon then posted a patch
that avoids kmalloc/kfree in select() and poll()
when the number of fd's involved is under 64.
On 20 July 2000, Robert Cohen (robert@coorong.anu.edu.au) posted a message
listing netatalk (appletalk file sharing) benchmarks comparing 2.0, 2.2, and several versions of 2.4.0-pre.
The elevator code in 2.4 seems to help (some versions of 2.4 can handle 5 benchmark clients instead of 2)
The more recent test4 and test5pre2 don't fare quite so well.
They handle 2 clients on a 128 Meg server fine, so they're doing better
than 2.2 but they choke and go seek bound with 4 clients.
So something has definitely taken a turn for the worse since test1-ac22.
The *only* 2.4 kernel versions that could handle 5 clients
were 2.4.0-test1-ac22-riel and 2.4.0-test1-ac22-class 5+; everything before
and after (up to 2.4.0-test5pre4) can only handle 2.
On 26 Sept 2000, Robert Cohen posted a followup
which included a simple program to demonstrate the problem, which appears to be in the elevator code.
A developer responded that he and Andrea had a patch almost ready for 2.4.0-test9-pre5 that fixes
this problem.
On 4 Oct 2000, Robert Cohen posted an update
with benchmark results for many kernels, showing that the problem still exists in
2.4.0-test9.
On 18 Sept 2000, Jamal (hadi@cyberus.ca) posted a note
describing proposed changes to the 2.4 kernel's networking code;
the changes add hardware flow control and several other refinements.
Robert Olson and I decided after the OLS that we were going to try to
hit the 100Mbps(148.8Kpps) routing peak by year end. I am afraid the
bar has been raised. Robert is already hitting with 2.4.0-test7 ~148Kpps
with a ASUS CUBX motherboard carrying PIII 700 MHZ coppermine with
about 65% CPU utilization.
With a single PII based Dell machine I was able to get a consistent value of ...
So the new goal is to go to about 500K-> (maybe not by year end, but
surely by that next random Linux hacker conference)
A sample modified tulip driver (hacked by Alexey for 2.2 and mod'ed by Robert
and myself over a period of time) is supplied as an example on how to use the
feedback values.
I believe we could have done better with the mindcraft tests with these
changes in 2.2 (and HW FC turned on).
BTW, I am informed that Linux people were _not_ allowed to change the
hardware for those tests, so I don't think they could have used these
changes if they were available back then.
On 30 March 2000, Takashi Richard Horikawa posted results
listing SPECWeb96 numbers for both the 2.2.14 and 2.3.41 kernels.
Performance between a 2.2.14 client and a 2.2.14 server was poor
because so few ports were being used that ports were not yet done
with TIME_WAIT by the time their port numbers were needed again for new
connections.
The moral of the story may be
to tune the clients and servers to use as large a port range as possible,
by writing to /proc/sys/net/ipv4/ip_local_port_range,
to avoid bumping into this situation when trying to simulate large
numbers of clients with a small number of client machines.
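A sketch of how to watch for this effect during a run (the awk pattern assumes the /proc/net/tcp layout, in which state 06 is TIME_WAIT; the port-range values are illustrative, and the write requires root):

```shell
# count local sockets currently stuck in TIME_WAIT (state 06)
awk 'NR > 1 && $4 == "06"' /proc/net/tcp | wc -l
# widen the client port range so the benchmark does not run out of ports
echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range
```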
On 2 April 2000, Mr. Horikawa confirmed
that increasing the local port range in this way solved the problem.
Become familiar with the relevant mailing lists,
as well as the Linux newsgroups on Usenet (try searching
in forums matching '*linux*').
Post your proposed configuration and see whether people agree with it.
Also, be open about your testing:
post intermediate results, and
see if anyone has suggestions for improvements.
You should probably
expect to spend a week or so mulling over ideas with these mailing lists
during the course of your tests.
If possible, use a modern benchmark like SPECWeb99 rather than
the simple ones used by Mindcraft.
It might be interesting to insert a bandwidth limiter or delay
into the path between the server and the clients to more realistically
model the situation on the Internet.
Benchmark both single and multiple CPUs,
and single and multiple Ethernet interfaces, if possible.
Be aware that the networking performance of version 2.2.x of the Linux kernel
does not scale well as you add more CPUs and Ethernet cards.
This applies mostly to static pages;
noncached dynamic pages usually take a fair bit of CPU
time, and should scale very well when you add CPUs.
If possible, use a cache to save commonly generated dynamic pages;
this will bring the dynamic page speeds closer to the static page speeds.
When testing dynamic content:
Don't use the old model of running a
separate process per request; nobody running a big web site
uses that interface anymore, as it's too slow.
Always use a modern
dynamic content generation interface (e.g. mod_perl for Apache).
Configuring Linux
Tuning problems probably resulted in a less than 20% performance decrease
in Mindcraft's test, so as of 3 October 1999, most people will be happy with
a stock 2.2.13 kernel or whatever comes with Red Hat 6.1.
The 2.4 kernel, when it's available, will help with SMP performance.
Here are some notes if you want to see what people going for the utmost
were trying in June:
As of June 1, Linux kernel 2.2.9 plus Andrea's patches
have been mentioned as performing well on a dual-processor task (see above).
(2.2.9_andrea3 seems to include both a wake-one scheduler fix as well
as an SMP unlock_kernel fix.)
(andrea3 only works on x86, I hear, so people with
Alphas or PPCs will have to apply some other wake-one and tcp copy kernel_unlock patch.)
Jan Gruber writes: "the 2.2.9_andrea3-patch doesn't
compile with SMP Support disabled.
Andrea told me to use ..."
On 7 June, Andrea Arcangeli asked:
If you are going to do bench I would like if you would bench also the
patch below.
On 11 Oct 1999, more patches were posted, waiting to go into 2.2.13 or so.
These include several fixes that
might help performance of SMP systems and systems undergoing heavy I/O.
You might consider trying these if you run into bottlenecks.
The truly daring may wish to try using the kernel-mode http server,
as a front-end for Apache.
It accelerates static web page fetches.
It's at version 0.1, so use caution.
linux-kernel is currently (8 June 1999) discussing benchmarking Apache.
One poster suggests using the kernel-mode http server or something like it,
and points out that NT is doing the same kind of thing.
Configuring Apache
The usual optimizations should be applied (all unused modules should
be left out when compiling, host name lookup should be disabled,
and symbolic link checking turned off).
Apache should be compiled to block in accept, e.g.
env CFLAGS='-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT' ./configure
The top_fuel patch may be worth applying.
PC Week used top_fuel in their recent benchmarks.
(See also interesting comments by Dean Gaudet in the discussion.)
Supposedly, applying top_fuel.patch and using mod_mmap_static on a set of
documents can reduce the number of syscalls per request from 18 to 9.
For static file benchmarks, try compiling mod_mmap_static into Apache
and configuring Apache to memory-map the static documents, e.g.
by creating a config file like this:
find /www/htdocs -type f -print | sed -e 's/.*/mmapfile &/' > mmap.conf
and including mmap.conf in your Apache config file.
Several people have mentioned that using Squid as a front-end to
Apache would greatly accelerate static web page fetches.
A few Usenet posts showing people experiencing slowness with Apache or Linux:
"...when we run WebBench to test the requests/sec and total
throughput, Microsoft IIS 4 is 3 times faster for both Linux and Mac OS X."
"Why are you surprised?
I thought it was common knowledge that Apache
I haven't tested IIS, but I did compare Apache against a number
of other servers last year and found a bunch that were three or four
times faster."
Ways to profile the kernel:
- tools to measure SMP spinlock contention, with results comparing 2.2 to 2.3.
- a profiling patch for 2.1.x.
- Christoph Lameter's perfstat patch, at Captech's site
-- see also their 25 Oct 99 post on linuxperf.
Ways to profile user programs:
The old favorite: compile with -pg, and analyze gmon.out with gprof.
Another tool supports 2.3.22 and 2.2.13, and
includes a list of other related tools.
See also the writeup by David Mentré.
