Contributed by merdely on from the shootdown-at-the-tlb-corral dept.
The Translation Lookaside Buffer (TLB) is a cache in the CPU that maps virtual page addresses to physical page addresses. It saves the CPU from having to go all the way out to the page table on every memory access.
A TLB shootdown occurs when the mapping for a page changes on one processor (for example, when access to a page in shared memory is restricted) and the other processors using that address space must be interrupted so they flush the stale entries from their TLBs.
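As a rough illustration of the two paragraphs above, here is a toy, direct-mapped software model of a TLB. It is only a sketch: every name in it (`toy_tlb`, `toy_tlb_lookup`, and so on) is hypothetical, the 64-entry size is an assumption, and real TLBs are hardware structures whose organization varies by CPU.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_TLB_ENTRIES 64u	/* assumed size; real TLBs vary by CPU */
#define TOY_PAGE_SHIFT  12u	/* 4 KB pages */

struct toy_tlb_entry {
	bool      valid;
	uintptr_t vpn;		/* virtual page number */
	uintptr_t pfn;		/* physical frame number */
};

struct toy_tlb {
	struct toy_tlb_entry e[TOY_TLB_ENTRIES];
};

/* Return true and fill *pfn on a hit; a miss means walking the page table. */
static bool
toy_tlb_lookup(const struct toy_tlb *t, uintptr_t va, uintptr_t *pfn)
{
	uintptr_t vpn = va >> TOY_PAGE_SHIFT;
	const struct toy_tlb_entry *ent = &t->e[vpn % TOY_TLB_ENTRIES];

	if (ent->valid && ent->vpn == vpn) {
		*pfn = ent->pfn;
		return true;
	}
	return false;
}

/* Cache a translation after a page table walk. */
static void
toy_tlb_insert(struct toy_tlb *t, uintptr_t va, uintptr_t pfn)
{
	uintptr_t vpn = va >> TOY_PAGE_SHIFT;
	struct toy_tlb_entry *ent = &t->e[vpn % TOY_TLB_ENTRIES];

	ent->valid = true;
	ent->vpn = vpn;
	ent->pfn = pfn;
}

/* What a shootdown asks each other CPU to do for a single page. */
static void
toy_tlb_invalidate_page(struct toy_tlb *t, uintptr_t va)
{
	uintptr_t vpn = va >> TOY_PAGE_SHIFT;
	struct toy_tlb_entry *ent = &t->e[vpn % TOY_TLB_ENTRIES];

	if (ent->valid && ent->vpn == vpn)
		ent->valid = false;
}

/* The blunt alternative: drop every cached translation. */
static void
toy_tlb_flush_all(struct toy_tlb *t)
{
	memset(t, 0, sizeof(*t));
}
```

In this model each CPU would own one `struct toy_tlb`. A shootdown amounts to calling `toy_tlb_invalidate_page()` (or `toy_tlb_flush_all()`) on every CPU's copy except the one that changed the mapping.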
During the hackathon, art@ committed a simplified, faster version of the shootdown code, which gives a 15% reduction in system time during a kernel compile on Art's dual-core laptop.
The commit message is below.
CVSROOT: /cvs
Module name: src
Changes by: art a t cvs openbsd org 2007/05/25 09:55:27

Modified files:
    sys/arch/i386/i386: apicvec.s ipifuncs.c lapic.c lock_machdep.c machdep.c pmap.c vm_machdep.c
    sys/arch/i386/include: atomic.h i82489var.h intr.h pmap.h

Log message:
Replace the overdesigned and overcomplicated tlb shootdown code with very simple and dumb fast tlb IPI handlers that have in the order of the same amount of instructions as the old code had function calls.

All TLB shootdowns are reorganized so that we always shoot the[m], without looking at PG_U and when we're shooting a range (primarily in pmap_remove), we shoot the range when there are 32 or less pages in it, otherwise we just nuke the whole TLB (this might need tweaking if someone is interested in micro-optimization). The IPIs are not handled through the normal interrupt vectoring code, they are not blockable and they only shoot one page or a range of pages or the whole tlb.

This gives a 15% reduction in system time on my dual-core laptop during a kernel compile and an 18% reduction in real time on a quad machine doing bulk ports build.

Tested by many, in snaps for a week, no slowdowns reported (although not everyone is seeing such huge wins).

The massive speed improvements we've seen in different parts of OpenBSD through the hackathon will certainly make OpenBSD 4.2 an interesting upgrade.
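The range policy described in the log message (shoot a single page, shoot a range of up to 32 pages, otherwise flush everything) can be sketched roughly as below. This is not the committed code: `choose_shootdown`, `enum shoot_kind`, and the `SKETCH_*` constants are hypothetical names, and the sketch assumes page-aligned addresses with a half-open range `[sva, eva)`.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u	/* i386 base page size */
#define SKETCH_RANGE_MAX   32u		/* threshold named in the log message */

/* The three kinds of shootdown the log message says the IPIs perform. */
enum shoot_kind { SHOOT_PAGE, SHOOT_RANGE, SHOOT_ALL };

/*
 * Hypothetical helper: pick a shootdown kind for the page-aligned,
 * half-open virtual address range [sva, eva).
 */
static enum shoot_kind
choose_shootdown(uintptr_t sva, uintptr_t eva)
{
	size_t npages = (eva - sva) / SKETCH_PAGE_SIZE;

	if (npages <= 1)
		return SHOOT_PAGE;	/* invalidate one entry */
	if (npages <= SKETCH_RANGE_MAX)
		return SHOOT_RANGE;	/* invalidate page by page */
	return SHOOT_ALL;		/* cheaper to nuke the whole TLB */
}
```

The trade-off behind the threshold is that invalidating pages one at a time keeps unrelated translations warm, but past some range size the per-page work costs more than refilling a flushed TLB. The log message itself notes that 32 might need tweaking.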
By Anonymous Coward (156.34.75.11) on
By Noryungi (noryungi) on
Well, yes, I do believe it's mostly SMP, since the 15% improvement has been measured on a dual-core laptop...
By scot bontrager (216.62.11.163) scot@indievisible.org on
> Well, yes, I do believe it's mostly SMP, since the 15% improvement has been measured on a dual-core laptop...
With this change, make build (wall-clock time) went from 1:12:00 to 1:02:00 on my 2x amd64 (1.6GHz) system. I was happy back when the lockmgr/simplelock changes started and build time came down from 1:19:00. When they can shave another 3 minutes off and I can do a make build in less than an hour, I'll be very happy.
Between this change and the other work done at the hackathon, CPU usage is hovering right at 0.1% on this system; before, it was 3-4%. If only my CPUs (Opteron 242s) supported PowerNow!, I would throttle them back so I could save on my electricity bill!
Good going all!
FFS2 was giving me fits a few weeks back, but it seems much better now. I've only been using it for /usr/obj, but the last few builds have been solid. I was hoping FFS2 would be faster than FFS, but I don't see any measurable improvement there (I know there isn't supposed to be, either). Once the last few userland bits get finished I'll switch /usr/src over as well.
By Anonymous Coward (63.237.125.20) on
Have you ever timed make build on this system in single processor mode? Just wondering how much performance the SMP adds.
By scot bontrager (216.62.11.163) on
> Have you ever timed make build on this system in single processor mode? Just wondering how much performance the SMP adds.
2547.021u 672.352s 59:13.29 90.6% 0+0k 88260+245879io 159172pf+0w
59 minutes using a non-SMP kernel! It took 3 minutes LONGER using the SMP! That seems odd. "make build" is a mostly linear process, so I can understand why SMP doesn't gain much, but why does it cost so much more? (make -j 2 deadlocks in a hurry so I've not even tried that in years).
I'll newfs /usr/obj and rerun this test just to make sure.
By scot bontrager (216.62.11.163) on
> > Have you ever timed make build on this system in single processor mode? Just wondering how much performance the SMP adds.
>
> 2547.021u 672.352s 59:13.29 90.6% 0+0k 88260+245879io 159172pf+0w
>
> 59 minutes using a non-SMP kernel! It took 3 minutes LONGER using the SMP! That seems odd. "make build" is a mostly linear process, so I can understand why SMP doesn't gain much, but why does it cost so much more? (make -j 2 deadlocks in a hurry so I've not even tried that in years).
>
> I'll newfs /usr/obj and rerun this test just to make sure.
Try 2, with a clean /usr/obj and after rm /tmp/ac.cache:
2537.256u 679.384s 59:01.10 90.8% 0+0k 78968+242154io 148936pf+0w
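For anyone reading the two timing lines: assuming they are csh-style `time` output (user seconds, system seconds, elapsed time, then CPU percentage), the percentage field is just (user + sys) / elapsed. A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Convert a minutes:seconds elapsed field such as 59:13.29 into seconds. */
static double
mmss_to_seconds(double minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Share of elapsed time spent on the CPU: (user + sys) / elapsed, in percent. */
static double
cpu_percent(double user_s, double sys_s, double elapsed_s)
{
	return 100.0 * (user_s + sys_s) / elapsed_s;
}
```

Plugging in the first run, `cpu_percent(2547.021, 672.352, mmss_to_seconds(59, 13.29))` comes out at roughly 90.6, matching the reported field; the second run works out to roughly 90.8.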
By tedu (69.12.168.115) on
By sthen (85.158.44.149) on
They should do; have you seen this?
By sthen (85.158.44.149) on
> They should do; have you seen
...ah, sorry... I meant this but it seems Opteron [128]4[02] don't have a lower p-state listed in the AMD documentation.
By Bret Lambert (tbert) on
I'm not a hardware guy, but the TLB exists in UP systems as well. There may not be as much of a performance enhancement if you're not shuttling processes between CPUs, but you'll still get some benefit from the faster code.
By Anonymous Coward (70.67.139.183) on
By art (213.0.113.90) on
Actually, the code was written way before the hackathon. I had the first prototype out for testing several months ago; it just didn't work correctly at first (because of other bugs, I might add).