ChangeSet@1.1626, 2004-05-11 07:54:30-07:00, geert@linux-m68k.org
[PATCH] M68k superfluous whitespace
M68k: Remove superfluous whitespace that hurts my eyes with `let c_space_errors=1' in vim. This includes correcting trailing whitespace and spaces in front of tabs. `diff -urNbB' shows no difference before/after.

ChangeSet@1.1625, 2004-05-10 22:12:24-04:00, jgarzik@redhat.com
[libata] Maintainer annotations
In MAINTAINERS and in individual low-level drivers.

ChangeSet@1.1624, 2004-05-10 21:51:40-04:00, jgarzik@redhat.com
[libata] preparation for writeback caching support
* bug fix: make sure 'nsect' member of struct ata_queued_cmd is initialized each time a cmd is re-used. Only affects PIO data xfers, which nobody uses.
* slightly change the way a device's flags are printed out. currently the only flag is 'lba48', but soon 'wcache' will appear also.
* add WB-cache-related constants and macros to linux/ata.h

ChangeSet@1.1623, 2004-05-10 16:54:27-07:00, eger@havoc.gtf.org
[PATCH] radeon: fix overlapping copyarea
This fixes a corruption problem with overlapping copyarea()'s in the radeon driver.

ChangeSet@1.1622, 2004-05-10 16:54:17-07:00, paulus@samba.org
[PATCH] ppc64: extra barrier in I/O operations
At the moment, on PPC64, the instruction we use for wmb() doesn't order cacheable stores vs. non-cacheable stores. (It does order cacheable vs. cacheable and non-cacheable vs. non-cacheable.) This causes problems in the sort of driver code that writes stuff into memory, does a wmb(), then a writel to the device to start a DMA operation to read the stuff it has just written to memory.

This patch solves the problem by adding a sync instruction before the store in the write* and out* macros. The sync is a full barrier that orders all loads and stores, cacheable or not. The patch also moves the eieio instruction that we had after the store to before the load in the read* and in* macros.
With the sync before the store, we don't need an eieio as well in a sequence of stores, but we still need an eieio between a store and a load. I think it is better to do this than to turn wmb() into a full memory barrier (a sync instruction) because the full barrier is slow and isn't needed with the sync in the write*/out* macros. This way, write*/out* are fully ordered with respect to preceding loads and stores, which is what driver writers expect, and we avoid penalizing users of wmb() who are only doing cacheable stores.

ChangeSet@1.1620, 2004-05-10 16:30:02-07:00, willy@debian.org
[PATCH] PA-RISC updates for 2.6.6
- Split PA7300LC from PA7100LC (Matthew Wilcox)
- Handle 32-bit firmware and 64-bit kernel at runtime (Ryan Bradetich)
- Fix building in a separate tree (Matthew Wilcox)
- Update defconfigs (Randolph Chung)
- Make WCHAN work (Randolph Chung)
- Initial support for SMP in 2.6 (Grant Grundler)
- Use 8-byte PTEs on 32-bit kernels (James Bottomley)
- Implement L2/L3 hybrid page tables for 64 bit kernels (James Bottomley)
- Support 8TB of physical and virtual address space (James Bottomley)
- Macro'ise the tlb miss handlers (James Bottomley)
- Check the ptrace flags correctly in the syscall return path (Randolph Chung)
- Eliminate many magic numbers (James Bottomley)
- Work around linker bug in vmlinux.lds.S (James Bottomley)
- Many cache flushing fixes (James Bottomley)
- first baby step for PA8800 support (Grant Grundler)
- Self-aligning spinlocks (Randolph Chung)

ChangeSet@1.1619, 2004-05-10 16:25:28-07:00, geert@linux-m68k.org
[PATCH] M68k missing
M68k: needs include for __attribute_const__ (from Richard Zidlicky)

ChangeSet@1.1618, 2004-05-10 16:25:18-07:00, geert@linux-m68k.org
[PATCH] Sun3x dummycon
Sun3x: Like most other platforms, Sun3x needs conswitchp set if CONFIG_DUMMY_CONSOLE is defined (from Sam Creasey)

ChangeSet@1.1617, 2004-05-10 16:24:30-07:00, torvalds@ppc970.osdl.org
Merge bk://gkernel.bkbits.net/libata-2.6 into
ppc970.osdl.org:/home/torvalds/v2.6/linux

ChangeSet@1.1615, 2004-05-10 16:13:18-07:00, viro@parcelfarce.linux.theplanet.co.uk
[PATCH] ntfs cleanup
ntfs_fill_super() and ntfs_read_inode_mount() cleaned up. Removed the kludges around the first iget() on NTFS. Instead of playing with (re)setting ->s_op we have the MFT_FILE inode set up by explicit new_inode()/ set ->i_ino/insert_inode_hash()/call ntfs_read_inode_mount() directly. That kills the need for a second super_operations and allows returning an error from ntfs_read_inode_mount() without resorting to ugly "poisoning" tricks.

ChangeSet@1.1614, 2004-05-10 16:10:46-07:00, torvalds@ppc970.osdl.org
Merge bk://linux-scsi.bkbits.net/scsi-for-linus-2.6 into ppc970.osdl.org:/home/torvalds/v2.6/linux

ChangeSet@1.1371.762.51, 2004-05-10 17:29:29-05:00, jejb@mulgrave.(none)
qla2100 fabric fixes
From: "Andrew Vasquez"
Ok, well there aren't too many folks using a QLA2100 in a fabric topology; if there were, they wouldn't have gotten very far in the driver load sequence. I've been able to scrape up a QLA2100, a 1Gig switch, and a JBOD. Upon loading the 8.00.00b12k driver, the firmware successfully logs into the switch and the driver receives a LOOP_UP event, but the kernel panics due to a NULL pointer dereference while trying to perform an RFT_ID -- the attached patch against current scsi-misc-2.6 fixes that problem.

ChangeSet@1.1371.762.50, 2004-05-10 16:40:22-05:00, James.Bottomley@steeleye.com
[PATCH] fix LLD module refcounting in sr.c
The patch to close all the open/close/hotplug races in sr left the module refcounting broken so that the ULD housing the CD device now can't be removed until the device itself is removed. This patch (structurally identical to the one for sd.c to perform the same function) fixes the module refcounting.

ChangeSet@1.1608.6.149, 2004-05-10 14:25:52-07:00, akpm@osdl.org
[PATCH] get_thread_area macro fixes
From: Adam Lackorzynski
one of the macros for get_thread_area extracts the wrong bit.
The "32bit" field is in bit 22, not 23 (as can be seen in desc.h).
[ Fix ia64/x86-64 too, while we're at it. Linus ]

ChangeSet@1.1608.6.148, 2004-05-10 14:25:41-07:00, akpm@osdl.org
[PATCH] Add SMT setup for domain scheduler on x86-64
From: Andi Kleen
Set up SMT for the domain scheduler on x86-64. This way the scheduling works better on HyperThreading aware systems; in particular it will use both physical CPUs before sharing two virtual CPUs on the same package. This improves performance considerably in some cases. Based on the i386 code and a previous patch from Suresh B. Siddha.

ChangeSet@1.1608.6.147, 2004-05-10 14:25:30-07:00, akpm@osdl.org
[PATCH] x86-64: convert sibling map to masks
From: Andi Kleen
From: Suresh B. Siddha
Convert sibling map on x86-64 to cpumasks. This is needed for the SMT patches.

ChangeSet@1.1608.6.146, 2004-05-10 14:14:39-07:00, torvalds@ppc970.osdl.org
Remove intermezzo, per instructions from Peter Braam.

ChangeSet@1.1608.6.145, 2004-05-10 14:12:21-07:00, akpm@osdl.org
[PATCH] Fix __down Tainting Kernel with CONFIG_MODVERSIONS=y
From: Rusty Russell
PowerPC64 ABI has ".funcname" (the actual function) and "funcname" (the function descriptor) and we strip off the dots in "dedotify" called from module_frob_arch_sections(). We need to also de-dotify the corresponding names in the __version section. Actually has nothing to do with __down, it's just that we only print the first symbol whose version is missing.

ChangeSet@1.1608.6.144, 2004-05-10 14:12:10-07:00, akpm@osdl.org
[PATCH] PPC termio fix
From: Paul Mackerras
It turns out that we are not handling the TABDLY bits of the termios c_oflag field correctly on PPC, PPC64 and Alpha. These three architectures have a value for XTABS that is different from the TAB3 value. POSIX specifies that setting the TABDLY field to TAB3 should result in tabs being expanded to spaces. In n_tty.c:opost() we check for O_TABDLY(tty) == XTABS, which is fine on most architectures because they have XTABS == TAB3.
I think the right thing to do is just to change the definition of XTABS to be the same as TAB3 on these architectures. The patch below does this for PPC and PPC64 (and I suggest the Alpha maintainer should do the same).

At the moment, applications using either the XTABS or TAB3 values won't get the expected behaviour. With this patch, apps that use TAB3 will get the expected behaviour. Apps that use XTABS will need to be recompiled (but note that the POSIX-specified name to use is TAB3 not XTABS).

ChangeSet@1.1608.6.143, 2004-05-10 14:12:00-07:00, akpm@osdl.org
[PATCH] remove intermezzo
Peter Braam said:

  I would just like to say that I have no difficulties with intermezzo being rm -rf'd. There are probably only a handful of users. In the past 4 years nobody has supported InterMezzo sufficiently for it to become successful. I have been fortunate to get really good support for the Lustre project. So I have focussed on that. Lustre 1.X has become really solid. The disconnected operation, caching and mirroring functionality of InterMezzo will become available in Lustre as a new feature in version 2. So I see no point in keeping InterMezzo if it is a nuisance.

The patch removes the references to intermezzo. Please do a `bk rm' of fs/intermezzo.

ChangeSet@1.1608.6.142, 2004-05-10 14:11:49-07:00, akpm@osdl.org
[PATCH] make tags for selinux
From: Olaf Hering
make tags skips security/selinux/include because of

  find . -name include -prune

This patch just adds it later. No idea if it can be done better.
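
To illustrate the PPC termio entry above (ChangeSet 1.1608.6.144), here is a minimal userspace sketch, not the kernel code. The EX_ constants and their octal values are assumptions picked only so that the pre-patch XTABS differs from TAB3, as the entry says it does on PPC; tabs_expanded() is a hypothetical model of the comparison the entry attributes to n_tty.c:opost().

```c
#include <stdbool.h>

/* Illustrative constants only: assumed values, chosen so that the
 * "old" XTABS differs from TAB3, as the changelog says for PPC. */
#define EX_TABDLY     00006000u  /* mask for the tab-delay field */
#define EX_TAB3       00006000u  /* POSIX: expand tabs to spaces */
#define EX_XTABS_OLD  01000000u  /* hypothetical pre-patch XTABS */
#define EX_XTABS_NEW  EX_TAB3    /* post-patch: XTABS == TAB3 */

/* Hypothetical model of the opost() test: tabs are expanded only if
 * the TABDLY field of c_oflag equals the XTABS value in use. */
static bool tabs_expanded(unsigned int c_oflag, unsigned int xtabs)
{
    return (c_oflag & EX_TABDLY) == xtabs;
}
```

Under these assumed values, an application that sets TAB3 is only recognised once XTABS is redefined to equal TAB3, and an application that sets the old XTABS bit is never recognised, which matches the behaviour the entry describes.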

ChangeSet@1.1608.6.141, 2004-05-10 14:11:38-07:00, akpm@osdl.org
[PATCH] fix some typos in sound docs
From: Christoph Hellwig
(partially from the debian kernel tree)

ChangeSet@1.1608.6.140, 2004-05-10 14:11:28-07:00, akpm@osdl.org
[PATCH] telephony/ixj.h: remove kernel 2.2 #ifdef's
From: Adrian Bunk
The patch below removes two #ifdef's for kernel 2.2 from linux-2.6.2-mm1/drivers/telephony/ixj.h

ChangeSet@1.1608.6.139, 2004-05-10 14:11:17-07:00, akpm@osdl.org
[PATCH] remove kernel 2.2 code from drivers/net/hamradio/dmascc.c
From: Adrian Bunk
The patch below removes some #ifdef'd kernel 2.2 code from drivers/net/hamradio/dmascc.c.

ChangeSet@1.1608.6.138, 2004-05-10 14:11:07-07:00, akpm@osdl.org
[PATCH] Crystal cs4235 mixer fix
From: Joseph Parmelee
Fixes improper setup of the mixer on Crystal soundcards with the CS4235 chip.

ChangeSet@1.1608.6.137, 2004-05-10 14:10:56-07:00, akpm@osdl.org
[PATCH] export con_set_default_unimap()
fbcon needs this symbol.

ChangeSet@1.1608.6.136, 2004-05-10 14:10:46-07:00, akpm@osdl.org
[PATCH] Make usermodehelper_init() use core_initcall()
We may as well make usermodehelper_init() core_initcall as well, to make sure its services are available to all the other initcall levels.

ChangeSet@1.1608.6.135, 2004-05-10 14:10:35-07:00, akpm@osdl.org
[PATCH] use core_initcall for binfmt initialisation
We need to register the binfmts earlier, so normal initcalls can successfully run call_usermodehelper() to execute things.

ChangeSet@1.1608.6.134, 2004-05-10 14:10:24-07:00, akpm@osdl.org
[PATCH] minor RCU optimization
From: Stephen Hemminger
Minor tweak to rcu, use __list_splice instead of list_splice because the list has already been checked for empty.

ChangeSet@1.1608.6.133, 2004-05-10 14:10:14-07:00, akpm@osdl.org
[PATCH] remove MOD_INC_USE_COUNT usage in arch/um/drivers/harddog_kern.c
From: Christoph Hellwig
->open already has a reference so use __module_get.
The file has no maintainer noted in it, all credits are from the driver it's copied from.

ChangeSet@1.1608.6.132, 2004-05-10 14:10:03-07:00, akpm@osdl.org
[PATCH] fix MOD_INC_USE_COUNT usage in mtd
From: Christoph Hellwig
mtd drivers need to get another reference if ->probe succeeds (strange design if you ask me, but what the heck..), and while most drivers have been switched to __module_get already, two are still missing.

ChangeSet@1.1608.6.131, 2004-05-10 14:09:53-07:00, akpm@osdl.org
[PATCH] drivers/video/* MOD_INC_USE_COUNT fixes
From: Christoph Hellwig
A bunch of framebuffer drivers use MOD_INC_USE_COUNT to prevent themselves from unloading completely - but we have a much easier way to do so, that is simply removing the module_exit/cleanup_module handler.

ChangeSet@1.1608.6.130, 2004-05-10 14:09:42-07:00, akpm@osdl.org
[PATCH] fix MOD_{INC,DEC}_USE_COUNT gunk in arch/um/drivers/net_kern.c
From: Christoph Hellwig
Well, UML is pretty out of date in mainline, but I'd like to squash the last users of said beasts sooner rather than later.

ChangeSet@1.1608.6.129, 2004-05-10 14:09:31-07:00, akpm@osdl.org
[PATCH] kill MOD_{INC,DEC}_USE_COUNT gunk in arch/cris/arch-v10/drivers/pcf8563.c
From: Christoph Hellwig
Driver already sets fops->owner so the open/close methods are entirely superfluous.

ChangeSet@1.1608.6.128, 2004-05-10 14:09:21-07:00, akpm@osdl.org
[PATCH] kill useless MOD_{INC,DEC}_USE_COUNT in sound/oss/msnd.c
From: Christoph Hellwig
Callers are exported register/unregister handlers so the module is locked in core by users of said exports.

ChangeSet@1.1608.6.127, 2004-05-10 14:09:10-07:00, akpm@osdl.org
[PATCH] cpqarray update for 2.6
From:
This patch fixes 2 minor issues that break our Array Configuration utility. my_io was changed to a pointer so the & had to be removed when using it with copy_to_user(). Sometime in 2.5 SG_MAX got changed to 31. Maybe to copy cciss? Now I'm changing it back to 32 so our app can work.
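
The first cpqarray issue above (dropping the & once my_io became a pointer) can be sketched in userspace as follows. This is not the driver's code: struct example_io, fake_copy_to_user() and return_result() are hypothetical names, and memcpy() stands in for the kernel's copy_to_user().

```c
#include <string.h>

/* Hypothetical stand-in for the driver's ioctl payload. */
struct example_io {
    int cmd;
    int status;
};

/* Userspace stand-in for copy_to_user(): copies n bytes and returns 0,
 * i.e. the number of bytes NOT copied, mirroring the kernel convention. */
static unsigned long fake_copy_to_user(void *dst, const void *src,
                                       unsigned long n)
{
    memcpy(dst, src, n);
    return 0;
}

/* Once my_io is a pointer it is passed directly. Passing &my_io would
 * copy the bytes of the pointer variable itself, not the structure it
 * points to. */
static int return_result(struct example_io *user_buf,
                         const struct example_io *my_io)
{
    if (fake_copy_to_user(user_buf, my_io, sizeof(*my_io)))
        return -1;
    return 0;
}
```

The same reasoning applies to the size argument: sizeof(*my_io) is the size of the structure, whereas sizeof(my_io) would be the size of a pointer.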

ChangeSet@1.1608.6.126, 2004-05-10 14:09:00-07:00, akpm@osdl.org
[PATCH] Add sysctl to define a hugetlb-capable group
From: "Chen, Kenneth W", "Seth, Rohit"
This patch addresses the longstanding problem wherein Oracle needs CAP_IPC_LOCK to allocate SHM_HUGETLB shm memory, but people don't want to run Oracle as root, and capabilities are busted.

Various ideas with rlimits didn't work out, mainly because these objects live beyond the lifetime of the user processes which establish them.

What we do is to create root-writeable /proc/sys/vm/hugetlb_shm_group which specifies a single group ID. Users who belong to that group may allocate hugepages for SHM_HUGETLB shm segments.

So the sysadmin will create a new group, say `hugepageusers', will add the oracle user to that group and will write that group's ID into /proc/sys/vm/hugetlb_shm_group.

ChangeSet@1.1608.6.125, 2004-05-10 14:08:49-07:00, akpm@osdl.org
[PATCH] hugepage: fix add_to_page_cache() error handling
From: David Gibson
add_to_page_cache() locks the given page if and only if it succeeds. The hugepage code (every arch), however, does an unlock_page() after add_to_page_cache() before checking the return code, which could trip the BUG() in unlock_page() if add_to_page_cache() failed.

In practice we've never hit this bug, because the only ways add_to_page_cache() can fail are when we fail to allocate a radix tree node (very rare), or when there is already a page at that offset in the radix tree, which never happens during prefault, obviously. We should probably fix it anyway, though.

The analogous bug in some of the patches floating about for demand-allocation of hugepages is more of a problem, because multiple processes can race to instantiate a particular page in the radix tree - that's been hit at least once (which is how I found this).

ChangeSet@1.1608.6.124, 2004-05-10 14:08:38-07:00, akpm@osdl.org
[PATCH] fix wrong var used in hotplug/shpchp_ctrl.c.
From: "Luiz Fernando N.
Capitulino"
Zhenmin's checker tool detected this:

  9. /drivers/pci/hotplug/shpchp_ctrl.c, Line 1575:
  err("%s: Failed to disable slot, error code(%d)\n", __FUNCTION__, rc);
  Maybe change to:
  err("%s: Failed to disable slot, error code(%d)\n", __FUNCTION__, retval);

I think it is right because at line 1564 the slot is turned off, and this line (1575) checks the status to see if we got an error; if so, the error number is shown. This number is in 'retval', not in 'rc' ('rc' does have the return of configure_new_device()).

ChangeSet@1.1608.6.123, 2004-05-10 14:08:27-07:00, akpm@osdl.org
[PATCH] Lindent arch/i386/kernel/cpuid.c
From: Hanna Linder
Per Greg's request this is a patch of having run Lindent on cpuid.c. The tabs were not the right number of spaces before. I have verified it still compiles and boots with this "change".

ChangeSet@1.1608.6.122, 2004-05-10 14:08:17-07:00, akpm@osdl.org
[PATCH] pcmcia/tcic.c warning fix.
From: "Luiz Fernando N. Capitulino"
drivers/pcmcia/tcic.c:63: warning: `version' defined but not used

ChangeSet@1.1608.6.121, 2004-05-10 14:08:06-07:00, akpm@osdl.org
[PATCH] as-iosched barrier fix
From: Jens Axboe
AS does not correctly account requests inserted with INSERT_FRONT or INSERT_BACK, barriers for example. In other elevators, requeued requests also go through the insert path, but AS has its own requeue handler which means the code has never been tested.

Also, make inserting a barrier with INSERT_SORT imply INSERT_BACK, which is the logical behaviour. Previously such insertions weren't rigorously defined.

ChangeSet@1.1608.6.120, 2004-05-10 14:07:55-07:00, akpm@osdl.org
[PATCH] Fix race on tty close
From: Benjamin Herrenschmidt
ldisc close can race with the flush_to_ldisc workqueue. This patch fixes it by killing the workqueue first.

ChangeSet@1.1608.6.119, 2004-05-10 14:07:45-07:00, akpm@osdl.org
[PATCH] SElinux interface for reporting size of printk buffer
From: Olaf Dabrunz
Add the necessary hooks so that a SELinux-enabled kernel will allow the new "report the size of the printk buffer" query to work.

ChangeSet@1.1608.6.118, 2004-05-10 14:07:34-07:00, akpm@osdl.org
[PATCH] blk: cache queue_congestion_on/off_threshold values
From: "Chen, Kenneth W"
It's kind of redundant that queue_congestion_on/off_threshold gets calculated on every I/O and they produce the same number over and over again unless q->nr_requests gets changed (which is probably a very rare event). We can cache those values in the request_queue structure.

ChangeSet@1.1608.6.117, 2004-05-10 14:07:23-07:00, akpm@osdl.org
[PATCH] swsusp documentation updates
From: Pavel Machek

ChangeSet@1.1608.6.116, 2004-05-10 14:07:13-07:00, akpm@osdl.org
[PATCH] simplify mqueue_inode_info->messages allocation
From: Chris Wright
Currently, if a user creates an mqueue and passes an mq_attr, the info->messages will be created twice (and the extra one is properly freed). This patch simply delays the allocation so that it only ever happens once. The relevant mq_attr data is passed to lower levels via the dentry->d_fsdata fs private data. This also helps isolate the areas we'd need to touch to do rlimits on mqueues.

ChangeSet@1.1608.6.115, 2004-05-10 14:07:02-07:00, akpm@osdl.org
[PATCH] bfs filesystem read past the end of dir
From: Jakub Jermar
I found out that BFS filesystem will eventually try to read and interpret garbage past the end of directory in bfs_add_entry(). If the garbage (interpreted as i-node number) is not set to zero (does it have to be?) bfs_add_entry() will consider it a regular directory entry. This causes weird things like this:

  # touch a
  # rm a
  # ls
  # touch b
  # ls
  a

My patch detects an attempt to read past the end of directory and explicitly clears the garbage that represents i-node number.
Thus the correct behaviour is achieved.

(was unable to contact Tigran)

ChangeSet@1.1608.6.114, 2004-05-10 14:06:52-07:00, akpm@osdl.org
[PATCH] update Documentation/md.txt
From: (Dick Streefland)
The following patch documents the currently undocumented raid= kernel parameter.

ChangeSet@1.1608.6.113, 2004-05-10 14:06:42-07:00, akpm@osdl.org
[PATCH] es7000 subarch update for generic arch
From: "Protasevich, Natalie"
This is ES7000 sub architecture update. It makes ES7000 a part of the generic architecture, so the single compiled kernel will be able to choose a correct set of parameters, routines ("genapic"), and a boot path. It uses criteria provided by the subarch for platform identification. In case of ES7000, it is a unique product/vendor string in the ACPI/MP OEM table, and server control registers. The patch is confined to only es7000 subarch and generic subarch. It was tested on ES7000 as well as generic Intel 8x Xeon system. Andi Kleen has reviewed the changes.

ChangeSet@1.1608.6.112, 2004-05-10 14:05:51-07:00, akpm@osdl.org
[PATCH] CLOCK_TICK_RATE: use CLOCK_TICK_RATE
From: Thorsten Kranzkowski
use CLOCK_TICK_RATE where 1193180 was used in general timing calculations. (optional)

ChangeSet@1.1608.6.111, 2004-05-10 14:05:40-07:00, akpm@osdl.org
[PATCH] CLOCK_TICK_RATE: use PIT_TICK_RATE in *spkr.c
From: Thorsten Kranzkowski

ChangeSet@1.1608.6.110, 2004-05-10 14:05:29-07:00, akpm@osdl.org
[PATCH] CLOCK_TICK_RATE: introduce asm-*/8253pit.h, #define PIT_TICK_RATE constant.
From: Thorsten Kranzkowski
The calculation of the counter values in drivers/input/misc/pcspkr.c is incorrectly based on CLOCK_TICK_RATE. This goes unnoticed in i386 because there the system clock is driven by the same Programmable Interval Timer chip as the speaker. But this doesn't hold true on other archs, e.g. Alpha.

To solve this problem I made these patches:

1/3: introduce asm-*/8253pit.h, #define PIT_TICK_RATE constant. It seems this is not always the same value.
2/3: use PIT_TICK_RATE in *spkr.c
3/3: use CLOCK_TICK_RATE where 1193180 was used in general timing calculations. (optional)

There are still some places where the magic number is used instead of the #define (vt_ioctl.c, gameport.c) but I left them as-is. I got some responses from arch maintainers to specifically not touch their respective architectures so changing these places would mean breakage for them.

Tested on Alpha and i386, ack'ed by Ralf Baechle for MIPS.

This patch: introduce asm-*/8253pit.h, #define PIT_TICK_RATE constant.

ChangeSet@1.1608.6.109, 2004-05-10 14:05:18-07:00, akpm@osdl.org
[PATCH] readahead: keep file->f_ra sane
When two threads are simultaneously pread()ing from the same fd (which is a legitimate thing to do), the readahead code thinks that a huge amount of seeking is happening and shrinks the window, damaging performance a lot.

I don't see a sane way to avoid this within the readahead code, so take a private copy of the readahead state and restore it prior to returning from the read.

ChangeSet@1.1608.6.108, 2004-05-10 14:05:07-07:00, akpm@osdl.org
[PATCH] jiffies-to-clockt fix
From: john stultz
This patch polishes up Tim Schmielau's (tim@physik3.uni-rostock.de) fix for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issue observed was /proc output not matching up to wall time, due to accumulated error caused by HZ not being exactly 1000 on i386 systems. The solution is to correct that error by using the more accurate TICK_NSEC in our calculation.

Additionally, this patch corrects 3 warnings in the TCP layer uncovered by this change.
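
The accumulated error described in the jiffies-to-clockt entry can be worked through with a small sketch. It is not the kernel implementation: the EX_ names are hypothetical, and the values (a 1000 Hz tick, 100 clock_t ticks per second, a timer input clock of 1193180 Hz as mentioned in the CLOCK_TICK_RATE entries) are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed configuration for this sketch (i386-like values). */
#define EX_HZ            1000u          /* nominal kernel tick rate */
#define EX_USER_HZ       100u           /* clock_t ticks per second */
#define EX_PIT_RATE      1193180u       /* timer input clock in Hz */
#define EX_NSEC_PER_SEC  1000000000ull

/* The timer can only be programmed with an integer divisor. */
#define EX_LATCH  ((EX_PIT_RATE + EX_HZ / 2) / EX_HZ)

/* Actual length of one tick in nanoseconds, given that divisor. */
#define EX_TICK_NSEC  ((uint64_t)EX_LATCH * EX_NSEC_PER_SEC / EX_PIT_RATE)

/* Naive conversion: assumes the tick rate is exactly EX_HZ. */
static uint64_t naive_jiffies_to_clock_t(uint64_t jiffies)
{
    return jiffies / (EX_HZ / EX_USER_HZ);
}

/* Conversion via the real tick length, in the spirit of the fix. */
static uint64_t tick_nsec_jiffies_to_clock_t(uint64_t jiffies)
{
    return jiffies * EX_TICK_NSEC / (EX_NSEC_PER_SEC / EX_USER_HZ);
}
```

With these assumed values one tick lasts 999849 ns rather than 1000000 ns, so after one million jiffies the naive conversion reports 100000 clock ticks while the tick-length conversion reports 99984, a drift of 16 clock ticks that keeps growing with uptime.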

ChangeSet@1.1608.6.107, 2004-05-10 14:04:56-07:00, akpm@osdl.org
[PATCH] cyclades cleanups
From: Marcelo Tosatti
- cleanups for cyclades Kconfig entry (Adrian Bunk/me)
- janitors project: remove dead function (Don Koch)

From: aris@cathedrallabs.org (Aristeu Sergio Rozanski Filho)
Use the standard min/max macros

ChangeSet@1.1608.6.106, 2004-05-10 14:04:46-07:00, akpm@osdl.org
[PATCH] fix ramdisk size assembler warning
From: Jorn Engel

  AS arch/i386/boot/setup.o
  /usr/src/linux-2.6.5/arch/i386/boot/setup.S: Assembler messages:
  /usr/src/linux-2.6.5/arch/i386/boot/setup.S:159: Warning: value 0x37ffffff truncated to 0x37ffffff

The warning is correct, the calculated value for ramdisk_max would be 0xb7ffffff instead of 0x37ffffff. Truncating 0xb7ffffff to 0x37ffffff is desired behaviour, so we should do it explicitly.

ChangeSet@1.1608.6.105, 2004-05-10 14:04:36-07:00, akpm@osdl.org
[PATCH] ppc64: use generic ipc syscall translation
From: David Gibson
Currently ppc64 has its own code to convert 32-bit ipc() syscalls to 64-bit, rather than using the common translation code from ipc/compat.c. This patch, tweaked slightly from an earlier version of Anton Blanchard's, fixes that, replacing the ppc64 code with calls to the common code.

I've run the LSB IPC tests, and as many of the LTP IPC tests as I could figure out how to run easily, and it seems to pass them all.

ChangeSet@1.1608.6.104, 2004-05-10 14:04:25-07:00, akpm@osdl.org
[PATCH] gcc-3.4.0 fixes for 2.6.6-rc3 x86_64 kernel
From: Mikael Pettersson
Here are some patches to fix compilation warnings from gcc-3.4.0 in the 2.6.6-rc3 x86_64 kernel.
- puts() type conflict in boot/compressed/misc.c: rename to putstr(), just like i386 did
- cast-as-lvalue in ia32_copy_siginfo_from_user(): use temporary
- code before declaration in io_apic.c: move decl up
- code before declaration in ioremap.c: move existing #ifndef up
- cast-as-lvalue (tons of them) from UP version of per_cpu(): merged asm-generic's version

ChangeSet@1.1608.6.103, 2004-05-10 14:04:14-07:00, akpm@osdl.org
[PATCH] fixup 68360 module refcounting
From: Christoph Hellwig

ChangeSet@1.1608.6.102, 2004-05-10 14:04:03-07:00, akpm@osdl.org
[PATCH] Warn when smp_call_function() is called with interrupts disabled
From: Keith Owens
Almost every architecture has a comment above smp_call_function()

  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.

I have not seen any problems with calling smp_call_function() from a bottom half handler, but calling it with interrupts disabled can definitely deadlock. This bug is hard to reproduce and even harder to debug.

  CPU A                               CPU B
  Disable interrupts
                                      smp_call_function()
                                      Take call_lock
                                      Send IPIs
                                      Wait for all cpus to acknowledge IPI
                                      CPU A has not responded, spin waiting
                                      for cpu A to respond, holding call_lock
  smp_call_function()
  Spin waiting for call_lock
  Deadlock                            Deadlock

Change all smp_call_function() to WARN_ON(irqs_disabled()). It should be BUG_ON() but some buggy code like SCSI sg will break with BUG_ON, so just warn for now. Change it to BUG_ON after the buggy code has been fixed.

ChangeSet@1.1608.6.101, 2004-05-10 14:03:52-07:00, akpm@osdl.org
[PATCH] worker_thread race fix
Fix a waitqueue-handling race in worker_thread().

ChangeSet@1.1608.6.100, 2004-05-10 14:03:41-07:00, akpm@osdl.org
[PATCH] pcmcia/i82365.c warning fix
From: "Luiz Fernando N.
Capitulino"
drivers/pcmcia/i82365.c: At top level:
drivers/pcmcia/i82365.c:71: warning: `version' defined but not used

ChangeSet@1.1608.6.99, 2004-05-10 14:03:31-07:00, akpm@osdl.org
[PATCH] throttle P4 thermal warnings
From: Zwane Mwaikambo
In really bad conditions this can keep printing for a while; throttle the output somewhat. Also change the "CPU%d" formatting to better match the other boot output.

ChangeSet@1.1608.6.98, 2004-05-10 14:03:20-07:00, akpm@osdl.org
[PATCH] fix deadlock in create_workqueue()
Fix bug identified by Srivatsa Vaddagiri:

There's a deadlock in __create_workqueue when CONFIG_HOTPLUG_CPU is set. This can happen when create_workqueue_thread fails to create a worker thread. In that case, we call destroy_workqueue with cpu hotplug lock held. destroy_workqueue however also attempts to take the same lock.

ChangeSet@1.1608.6.97, 2004-05-10 14:03:10-07:00, akpm@osdl.org
[PATCH] remove blk_queue_bounce() printks
From: Matt Domsch
Jens Axboe wrote:

  It should just be deleted. As you note, it is a debug message. I originally added it so we would have some clues as to dma capability for bug reports. There never was any, the check can go :)

ChangeSet@1.1608.6.96, 2004-05-10 14:02:59-07:00, akpm@osdl.org
[PATCH] Fix MTD suspend/resume
From: Russell King
This patch carries forward the following fix from MTD CVS, for a bug which causes a lot of noise after a suspend/resume cycle on ARM devices.

  revision 1.127
  date: 2003/07/02 20:29:38; author: acurtis; state: Exp; lines: +2 -1
  Added FL_STATUS to the FL_READY case in put_chip(). (Eliminate noise)

ChangeSet@1.1608.6.95, 2004-05-10 14:02:49-07:00, akpm@osdl.org
[PATCH] dentry and inode cache hash algorithm performance changes.
From: "Jose R. Santos"
It alleviates some issues seen with Linux when accessing millions of files on machines with large amounts of RAM (+32GB). Both algorithms are based on some studies that Dominique Heger was doing on hash table efficiencies in Linux.
The dentry hash table has been tested on small systems with one internal IDE hard disk as well as on large SMP systems with many Fibre Channel disks. Dominique claims that in all the testing done, they did not see one case where this hash function provided worse performance, and that in most tests they were seeing better performance.

The inode hash function was done by me, based on Dominique's original work, and has only been stress tested with SpecSFS. It provided a 3% improvement over the default algorithm in the SpecSFS results and speedups in the response time of almost all filesystem operations the benchmark stresses. With the better distribution it is also possible to reduce the number of inode buckets from 32 million to 16 million and still get slightly better results.

Anton was nice enough to provide some graphs that show the distribution before and after the patch at http://samba.org/~anton/linux/sfs/1/

For the dentry hash function, some of my other coworkers have put this hash function through various testing and have concluded that the hash function was equal to or better than the default hash function. These runs were done with a (hopefully to be Open Source soon) benchmark called FFSB, which can simulate various I/O patterns across many filesystems and variable file sizes.

The SpecSFS fileset is basically a lot of small files, varying with the size of the run. For a not so big SMP system the number of files is in the 20+ million range. Of those 20 million files only 10% are accessed randomly by the client. The purpose of this is that the benchmark tries to stress not only the NFS layer but the VM and filesystem layers as well. The filesets are also hundreds of gigabytes in size in order to promote disk head movement by guaranteeing cache misses in memory. In SFS, 27% of the workload is lookups, and __d_lookup has been showing high in my profiles.
For the inode hash the problem that I see is that when running a benchmark with this huge fileset we end up trying to free a lot of inode entries during the run while trying to put new entries in cache. We end up calling ifind_fast(), which calls find_inode_fast() while holding inode_lock. In order to avoid holding the inode_lock for long, we needed to avoid having long chains in that hash function.

When I took a look at the original hash function, I found it to be a bit too simple for any workload. My solution (which took advantage of Dominique's work) was to create a hash function that could generate completely different hashes depending on the hashval and the superblock, in order to have the hash scale as we added more filesystems to the machine.

Both of these problems can be somewhat tuned out by increasing the number of buckets of both d and i cache, but it got to a point where I had 256MB of inode and 128MB of dentry hash buckets on a not so large SMP. With the hash changes I have been able to reduce the number of buckets to 128MB for inode cache and to 32MB for dentry cache and still get better performance.

If it helps my case... I haven't been running this benchmark for long, so I haven't been able to find a way to cheat. I need to come up with generic solutions until I can find a cheat for the benchmark. :)

SDET results:

Steve Pratt seems to have an SDET setup already and he did me the favor of running SDET with a reduced dentry hash table size. I believe that his table suggests that less than 3% change is acceptable variability, but overall he got a 5% better number using the new hash algorithm.
A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs
B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02

Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes)

Results: Throughput
tolerance = 0.00 + 3.00% of A

                       A            B
    Threads      Ops/sec      Ops/sec    %diff         diff    tolerance
----------- ------------ ------------ -------- ------------ ------------
          1    4341.9300    4401.9500     1.38        60.02       130.26
          2    8242.2000    8165.1200    -0.94       -77.08       247.27
          4   15274.4900   15257.1000    -0.11       -17.39       458.23
          8   21326.9200   21320.7000    -0.03        -6.22       639.81
         16   23056.2100   24282.8000     5.32      1226.59       691.69 *
         32   23397.2500   24684.6100     5.50      1287.36       701.92 *
         64   23372.7600   23632.6500     1.11       259.89       701.18
        128   17009.3900   16651.9600    -2.10      -357.43       510.28
=========================================================================

ChangeSet@1.1608.6.94, 2004-05-10 14:02:38-07:00, akpm@osdl.org
[PATCH] cmpci OSS driver update
From: C.L. Tien
Current version from cmedia.

ChangeSet@1.1608.6.93, 2004-05-10 14:02:27-07:00, akpm@osdl.org
[PATCH] EDD: follow sysfs convention, MODULE_VERSION, remove dead SCSI symlink
From: Matt Domsch
Clean up the edd.c driver.
* use kobject_set_name() instead of snprintf() per GregKH's recommendation.
* Add MODULE_VERSION()
* s/driverfs/sysfs/ in Kconfig
* Remove report URL message, as there have been too many BIOSs reported, virtually none of which are EDD-capable. This may return if/when I develop a better reporting method and database to capture/store the data from users.
* Remove the unused code for creating a symlink to the scsi_device. This never worked right, and I'm going to show the relationship from a userspace tool which uses libsysfs instead.

ChangeSet@1.1608.6.92, 2004-05-10 14:02:17-07:00, akpm@osdl.org
[PATCH] blk_start_queue() should use kblockd
kblockd is the thread which runs unplug functions, not keventd.
ChangeSet@1.1608.6.91, 2004-05-10 14:02:06-07:00, akpm@osdl.org [PATCH] Only Print Taint Message Once From: Rusty Russell Only print the tainted message the first time. Its purpose is to warn users that we can't support them, not to fill their logs. ChangeSet@1.1608.6.90, 2004-05-10 14:01:55-07:00, akpm@osdl.org [PATCH] Un-inline spinlocks on ppc64 From: Paul Mackerras The patch below moves the ppc64 spinlocks and rwlocks out of line and into arch/ppc64/lib/locks.c, and implements _raw_spin_lock_flags for ppc64. Part of the motivation for moving the spinlocks and rwlocks out of line was that I needed to add code to the slow paths to yield the processor to the hypervisor on systems with shared processors. On these systems, a cpu as seen by the kernel is a virtual processor that is not necessarily running full-time on a real physical cpu. If we are spinning on a lock which is held by another virtual processor which is not running at the moment, we are just wasting time. In such a situation it is better to do a hypervisor call to ask it to give the rest of our time slice to the lock holder so that forward progress can be made. The one problem with out-of-line spinlock routines is that lock contention will show up in profiles in the spin_lock etc. routines rather than in the callers, as it does with inline spinlocks. I have added a CONFIG_SPINLINE config option for people that want to do profiling. In the longer term, Anton is talking about teaching the profiling code to attribute samples in the spin lock routines to the routine's caller. This patch reduces the kernel by about 80kB on my G5. With inline spinlocks selected, the kernel gets about 4kB bigger than without the patch, because _raw_spin_lock_flags is slightly bigger than _raw_spin_lock. This patch depends on the patch from Keith Owens to add _raw_spin_lock_flags. 
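The yield-in-the-slow-path idea from the ppc64 spinlock change above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the sketch_* names are hypothetical, C11 atomics stand in for the ppc64 assembly, and sched_yield() stands in for the hypervisor call that hands the rest of the time slice to the lock holder. It is not the code in arch/ppc64/lib/locks.c.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace lock type; not the kernel's spinlock_t. */
typedef struct {
	atomic_int locked;	/* 0 = free, 1 = held */
} sketch_spinlock_t;

/*
 * Assumption: sched_yield() is used as a stand-in for the hypervisor
 * call that gives the rest of our time slice to the lock holder.
 */
static void sketch_yield_to_holder(void)
{
	sched_yield();
}

/* Try once; returns 1 if the lock was taken, 0 otherwise. */
int sketch_spin_trylock(sketch_spinlock_t *lock)
{
	int expected = 0;

	assert(lock != NULL);
	return atomic_compare_exchange_strong_explicit(&lock->locked,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Out-of-line lock routine: the fast path is a single trylock. */
void sketch_spin_lock(sketch_spinlock_t *lock)
{
	while (!sketch_spin_trylock(lock)) {
		/*
		 * Slow path: instead of burning cycles while the holder
		 * may not be running, give the processor away until the
		 * lock looks free, then retry the atomic operation.
		 */
		while (atomic_load_explicit(&lock->locked,
					    memory_order_relaxed) != 0)
			sketch_yield_to_holder();
	}
}

void sketch_spin_unlock(sketch_spinlock_t *lock)
{
	assert(lock != NULL);
	atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

Because the lock routine is an ordinary out-of-line function, contention shows up in a profile under sketch_spin_lock() rather than under its callers, which is the profiling trade-off the changelog entry describes.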
ChangeSet@1.1608.6.89, 2004-05-10 14:01:44-07:00, akpm@osdl.org [PATCH] Allow architectures to reenable interrupts on contended spinlocks From: Keith Owens As requested by Linus, update all architectures to add the common infrastructure. Tested on ia64 and i386. Enable interrupts while waiting for a disabled spinlock, but only if interrupts were enabled before issuing spin_lock_irqsave(). This patch consists of three sections :- * An architecture independent change to call _raw_spin_lock_flags() instead of _raw_spin_lock() when the flags are available. * An ia64 specific change to implement _raw_spin_lock_flags() and to define _raw_spin_lock(lock) as _raw_spin_lock_flags(lock, 0) for the ASM_SUPPORTED case. * Patches for all other architectures and for ia64 with !ASM_SUPPORTED to map _raw_spin_lock_flags(lock, flags) to _raw_spin_lock(lock). Architecture maintainers can define _raw_spin_lock_flags() to do something useful if they want to enable interrupts while waiting for a disabled spinlock. ChangeSet@1.1608.6.88, 2004-05-10 14:01:33-07:00, akpm@osdl.org [PATCH] Kill some 'No description found...' warnings. (kernel-api.sgml) From: Alexey Dobriyan Fix various kernel-doc parameters. ChangeSet@1.1608.6.87, 2004-05-10 14:01:22-07:00, akpm@osdl.org [PATCH] Kill a warning while making pdfdocs. From: Alexey Dobriyan DOCPROC Documentation/DocBook/parportbook.sgml Warning(drivers/parport/share.c:188): No description found for parameter 'drv' (kernel-doc parameter name is incorrect.) ChangeSet@1.1608.6.86, 2004-05-10 14:01:13-07:00, akpm@osdl.org [PATCH] com90xx error message patch: check_region() gone From: Greg Aumann This patch updates two error messages to reflect changes in the code. 
ChangeSet@1.1608.6.85, 2004-05-10 14:01:03-07:00, akpm@osdl.org [PATCH] Improve laptop mode's block_dump output From: "Theodore Ts'o" This patch improves the output produced by "echo 1 > /proc/sys/vm/block_dump", in the following ways: 1) The messages are printed with KERN_DEBUG, so that even if sysklogd is running, if configured appropriately, it will not need to write to log files. 2) The inode which is dirtied by a process is now identified more precisely by inode number and filesystem ID, and by a dcache name if present. 3) In the generic filesystem sget function, the superblock id (s_id) is filled in with the filesystem type by default. Filesystems which are block-device based will override s_id, but this allows pseudo filesystems such as tmpfs, procfs, etc. to be identified in (2). ChangeSet@1.1608.6.84, 2004-05-10 14:00:52-07:00, akpm@osdl.org [PATCH] find_user locking and leak fix find_user() is being called from set/get_priority(), but it doesn't take the needed lock, and those callers were forgetting to drop the refcount which find_user() took. ChangeSet@1.1608.6.83, 2004-05-10 14:00:41-07:00, akpm@osdl.org [PATCH] mptfusion depends on scsi From: Olaf Hering ChangeSet@1.1608.6.82, 2004-05-10 14:00:30-07:00, akpm@osdl.org [PATCH] reiserfs: add device info to diagnostic messages From: Chris Mason From: Jeff Mahoney Add device info to the various reiserfs warnings and panics so you can tell which filesystem triggers the message. Loosely based on code from Oleg Drokin. ChangeSet@1.1608.6.81, 2004-05-10 14:00:19-07:00, akpm@osdl.org [PATCH] reiserfs: xattr permission fix From: Chris Mason From: jeffm@suse.com reiserfs permission bug fix for xattrs ChangeSet@1.1608.6.80, 2004-05-10 14:00:09-07:00, akpm@osdl.org [PATCH] reiserfs: quota support From: Chris Mason ReiserFS support for quotas.
Originally from Jan Kara ChangeSet@1.1608.6.79, 2004-05-10 13:59:58-07:00, akpm@osdl.org [PATCH] reiserfs: xattr locking fixes From: Chris Mason From: jeffm@suse.com reiserfs xattr locking fixes ChangeSet@1.1608.6.78, 2004-05-10 13:59:47-07:00, akpm@osdl.org [PATCH] reiserfs: selinux support From: Chris Mason From: jeffm@suse.com reiserfs support for selinux ChangeSet@1.1608.6.77, 2004-05-10 13:59:36-07:00, akpm@osdl.org [PATCH] reiserfs: support trusted xattrs From: Chris Mason From: jeffm@suse.com reiserfs support for trusted xattrs ChangeSet@1.1608.6.76, 2004-05-10 13:59:25-07:00, akpm@osdl.org [PATCH] reiserfs: ACL support From: Chris Mason From: jeffm@suse.com reiserfs acl support ChangeSet@1.1608.6.75, 2004-05-10 13:59:13-07:00, akpm@osdl.org [PATCH] reiserfs: xattr support From: Chris Mason From: jeffm@suse.com reiserfs support for xattrs ChangeSet@1.1608.6.74, 2004-05-10 13:59:01-07:00, akpm@osdl.org [PATCH] reiserfs: acl device node initialization From: Chris Mason From: jeffm@suse.com properly init device inodes in the acl code ChangeSet@1.1608.6.73, 2004-05-10 13:58:51-07:00, akpm@osdl.org [PATCH] Reiserfs commit default fix From: Bart Samwel This patch from Micha Feigin fixes some bugs in the earlier reiserfs commit default patch. The changelog: * If you remounted without any commit=NNN option, it would assume commit=0 and restore the defaults. This patch makes it leave the current state alone if you don't pass commit=NNN. * Added range check for cast from unsigned long to unsigned int. ChangeSet@1.1608.6.72, 2004-05-10 13:58:41-07:00, akpm@osdl.org [PATCH] partitioning cleanup: use DOS_EXTENDED_PARTITION From: FabF Use the pre-existing enum rather than magic numbers. ChangeSet@1.1608.6.71, 2004-05-10 13:58:30-07:00, akpm@osdl.org [PATCH] fix 3c59x.c to allow 3c905c 100bT-FD From: Burton Windle Fix the 3c905C 10/100 transceiver initialisation woes. 
ChangeSet@1.1608.6.70, 2004-05-10 13:58:20-07:00, akpm@osdl.org [PATCH] shrink_slab: improved handling of GFP_NOFS allocations Currently, shrink_slab() will decide that it needs to scan a certain number of dentries, will call shrink_dcache_memory() requesting that this be done, and shrink_dcache_memory() will simply bale out without doing anything because the caller did not have __GFP_FS. This has the potential to disrupt our lovely pagecache-vs-slab balancing act. So change things so that shrinker callouts can return -1, indicating that they baled out. This way, shrink_slab can remember that this slab was owed a certain number of scannings and these will be correctly performed next time a __GFP_FS caller comes by. ChangeSet@1.1608.6.69, 2004-05-10 13:58:09-07:00, akpm@osdl.org [PATCH] New version of early CPU detect From: Andi Kleen We still need some kind of early CPU detection, e.g. for the AMD768 workaround and for the slab allocator to size its slabs correctly for the cache line. Also some other code already had private early CPU routines. This patch takes a new approach compared to the previous patch which caused Andrew so much grief. It only fills in a few selected fields in boot_cpu_data (only the data needed to identify the CPU type and the cache alignment). In particular the feature masks are not filled in, and the other fields are also not touched to prevent unwanted side effects. Also convert the ppro workaround to use standard cpu data now. I'm not sure if slab still has the necessary support to use the cache line size early; previously Manfred showed some serious memory saving with this for kernels that are compiled for a bigger cache line size than the CPU (which is often the case on distribution kernels). This code could be reenabled now with this patch. ChangeSet@1.1608.6.68, 2004-05-10 13:57:58-07:00, akpm@osdl.org [PATCH] remove some unused variables in s2io From: Anton Blanchard Found a few warnings when compiling with NAPI off.
ChangeSet@1.1608.6.67, 2004-05-10 13:57:48-07:00, akpm@osdl.org [PATCH] Remove bootsect_helper on x86_64 and pc98 From: Coywolf Qi Hunt Since "Direct booting from floppy is no longer supported", this patch removes the bootsect_helper code from x86_64 and PC-9800. ChangeSet@1.1608.6.66, 2004-05-10 13:57:37-07:00, akpm@osdl.org [PATCH] Remove bootsect_helper and a comment fix From: Coywolf Qi Hunt Since "Direct booting from floppy is no longer supported", this patch removes the bootsect_helper code. It also includes a comment fix. The other two platforms, x86_64 and PC-9800, should be cleaned up too. ChangeSet@1.1608.6.65, 2004-05-10 13:57:26-07:00, akpm@osdl.org [PATCH] ppc32: ppc8xx build fixes From: "Prof. BJ" - m8xx_setup warning and mfmsr error fix - ppc8xx_pic include error fix - tqm8xxl.c typing (syntax) error fix - commproc.c include error and prototype warning fix (acked by Matt Porter) ChangeSet@1.1608.6.64, 2004-05-10 13:57:16-07:00, akpm@osdl.org [PATCH] es7000 subarch update From: "Protasevich, Natalie" The patch fixes a problem with ES7000 Server Management mechanism that uses platform register mip_port. It was not initialized, so the mechanism was not functional. The patch also fixes the APIC destination for hierarchical and flat cluster models used in ES7000. The destination ID's reflect policies for Cascade based systems which use logical delivery and lowest priority mechanism, and for xAPIC based models that use physical delivery and fixed APIC destinations. The patch also turns on NO_IOAPIC_CHECK (1) to avoid error messages and attempts to re-write the ID, because on ES7000 all ID's are hard coded in the BIOS and cannot be altered. ChangeSet@1.1608.6.63, 2004-05-10 13:57:05-07:00, akpm@osdl.org [PATCH] Consolidate sys32_nfsservctl From: Arnd Bergmann sys32_nfsservctl is the largest remaining syscall emulation handler that can be consolidated.
mips and ia64 currently don't use this at all, parisc has a simpler implementation than the one that is used by s390, sparc and ppc and that the new compat_sys_nfsservctl is based on. The user access checks in the code are inconsistent at least, which should be fixed here. Compile tested only due to lack of proper test setup. ChangeSet@1.1608.6.62, 2004-05-10 13:56:53-07:00, akpm@osdl.org [PATCH] Consolidate sys32_select From: Arnd Bergmann sys32_select has seven mostly but not exactly identical versions, so consolidate them as compat_sys_select. Based on the ppc64 implementation, which most closely resembles sys_select. One bug that was not caught by LTP has been fixed since the first version of this patch. Tested on x86_64, ia64 and s390. ChangeSet@1.1608.6.61, 2004-05-10 13:56:42-07:00, akpm@osdl.org [PATCH] Consolidate do_execve32 From: Arnd Bergmann The code for sys32_execve/do_execve32 in most of the seven versions was copied from fs/exec.c but not kept up-to-date. The new compat_do_execve() function is based on the mips code and has been resync'ed with do_execve(). IA64 changes are from Arun Sharma. Tested on x86_64, ia64 and s390 ChangeSet@1.1608.6.60, 2004-05-10 13:56:32-07:00, akpm@osdl.org [PATCH] Consolidate sys32_readv and sys32_writev From: Arnd Bergmann The seven implementations of this have gone out of sync and are mostly buggy. The new compat_sys_* version is based on the ppc64 implementation, which most closely resembles the code in sys_readv/sys_writev. Tested on x86_64, ia64 and s390. ChangeSet@1.1608.6.59, 2004-05-10 13:56:20-07:00, akpm@osdl.org [PATCH] AS: increase batch expiry intervals From: Nick Piggin Without disturbing the read/write ratio, increase the batch expiry intervals. This will have the effect of increasing latency a little, but with improved throughput.
ChangeSet@1.1608.6.58, 2004-05-10 13:56:10-07:00, akpm@osdl.org [PATCH] Laptop Mode doc update From: Richard Atterer reported that mutt does not play well with noatime (it uses access times to check whether new mail has arrived in a folder). This patch warns about this in the doc, and adds a setting to the control script to disable the noatime remount. ChangeSet@1.1608.6.57, 2004-05-10 13:55:59-07:00, akpm@osdl.org [PATCH] cyclades MAINTAINERS update From: Marcelo Tosatti ChangeSet@1.1608.6.56, 2004-05-10 13:55:49-07:00, akpm@osdl.org [PATCH] selinux: reopen descriptors closed on exec to /dev/null From: Stephen Smalley This patch changes the SELinux module to try to reset any descriptors it closes on exec (due to a lack of permission by the new domain to the inherited open file) to refer to the null device. This counters the problem of SELinux inducing program misbehavior, particularly due to having descriptors 0-2 closed when the new domain is not allowed access to the caller's tty. This is primarily to address the case where the caller is trusted with respect to the new domain, as the untrusted caller case is already handled via AT_SECURE and glibc secure mode. The code is partly based on the OpenWall LSM, which in turn drew from the OpenWall kernel patch. Note that the code does not guarantee that the descriptor is always re-opened to /dev/null; it merely makes a reasonable effort to do so, but can fail under various conditions. 
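The best-effort behaviour described in the SELinux entry above (re-point a revoked descriptor at the null device, and fall back to leaving it closed if that fails) can be illustrated with a small userspace analogue. This is a sketch under stated assumptions: sketch_replace_with_devnull() is a hypothetical helper built on open(2) and dup2(2), not the kernel code in the SELinux module.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make fd refer to /dev/null instead of whatever
 * it referred to before. Returns 0 on success. On failure the
 * descriptor is simply closed and -1 is returned, mirroring the
 * "reasonable effort, but can fail" wording of the changelog.
 */
int sketch_replace_with_devnull(int fd)
{
	int nullfd, ret;

	assert(fd >= 0);
	nullfd = open("/dev/null", O_RDWR);
	if (nullfd < 0) {
		close(fd);	/* fall back to a plain close */
		return -1;
	}
	if (nullfd == fd)	/* fd was already closed; open() reused it */
		return 0;
	ret = dup2(nullfd, fd);	/* closes the old file and re-points fd */
	close(nullfd);
	if (ret < 0) {
		close(fd);
		return -1;
	}
	return 0;
}
```

A program that inherits descriptors 0-2 treated this way can still read (getting end-of-file) and write (data discarded) without hitting EBADF, which is the misbehaviour the patch is meant to counter.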
ChangeSet@1.1608.6.55, 2004-05-10 13:55:38-07:00, akpm@osdl.org [PATCH] ext3 error handling fixes From: Andreas Dilger a) we don't call ext3_error() for an IO error in ext3_find_entry(), so we won't do the normal ext3 error handling (mark SB in error, remount-ro or panic if desired); b) in empty_dir() we don't continue checking for non-empty blocks after a content error (ext3_check_dir_entry() calls ext3_error() already); c) we had decided not to mark the SB in error for holes in directories to allow leeway in the indexed-directory implementation, but this change incorrectly also disabled marking the SB in error for real IO errors. ChangeSet@1.1608.6.54, 2004-05-10 13:55:27-07:00, akpm@osdl.org [PATCH] sched: in_sched_functions() cleanup From: Rusty Russell 1) Create an in_sched_functions() function in sched.c and make the archs use it. (Two archs have wchan #if 0'd out: left them alone). 2) Move __sched from linux/init.h to linux/sched.h and add comment. 3) Rename __scheduling_functions_start_here/end_here to __sched_text_start/end. Thanks to wli and Sam Ravnborg for clue donation. ChangeSet@1.1608.6.53, 2004-05-10 13:55:16-07:00, akpm@osdl.org [PATCH] Fix ext3 bogus ENOSPC With strange workloads which do a lot of quick truncation on small filesystems it is possible to get into a situation where there are free blocks on the disk, but they are not allocatable at this time due to their having been freed up in the current JBD transaction. Applications get unexpected ENOSPC errors. We can fix that with this patch, originally by Andreas Dilger, which forces a single commit+retry when an ENOSPC is encountered. ChangeSet@1.1608.6.52, 2004-05-10 13:55:06-07:00, akpm@osdl.org [PATCH] reduce NMI watchdog call frequency with local APIC. From: Mikael Pettersson The real problem is that SMP with nmi_watchdog=2 initialises the lapic NMI watchdog but doesn't check it and therefore doesn't reduce nmi_hz. This is an SMP bug.
The patch changes smpboot.c to do a check_nmi_watchdog() at the appropriate place, which fixes the high NMI frequency problem w/o changing anything else. I've verified that it solves the problem on my MP-capable UP box. ChangeSet@1.1608.6.51, 2004-05-10 13:54:55-07:00, akpm@osdl.org [PATCH] Fix nmi_watchdog=2 and P4 HT From: Philippe Elie With nmi_watchdog=2 and a P4 HT box the NMI is occurring only on logical processor 0, it's better to get it on both. With this patch, on x86 SMP and nmi_watchdog=2, NMI interrupts occur at 1000 Hz (if the cpu is loaded) not at the intended 1 Hz rate but that's a distinct problem. ChangeSet@1.1608.6.50, 2004-05-10 13:54:45-07:00, akpm@osdl.org [PATCH] Fixes in 32 bit ioctl emulation code From: Raghavan , me I am submitting a patch that fixes 2 race conditions in the 32 bit ioctl emulation code (fs/compat.c). Since the search is not locked, corruption can occur when an ioctl_trans structure is deleted. The following scenarios discuss the race conditions: 1) When the search is happening, if any ioctl_trans structure gets deleted, then rather than searching the hash table, the code will start searching the free list. while (t && t->cmd != cmd) - ChangeSet@1.1608.6.49, 2004-05-10 13:54:34-07:00, akpm@osdl.org [PATCH] mips: sgiwd93 2.6 fixes and crapectomy From: Ralf Baechle Get to work under 2.6 sorting out the giant mess this has been. Further cleanups would require a full crapectomy of wd33c93.c itself ... ChangeSet@1.1608.6.48, 2004-05-10 13:54:24-07:00, akpm@osdl.org [PATCH] mips: remove dz driver From: Ralf Baechle This driver has been obsoleted by drivers/serial/dz.c. ChangeSet@1.1608.6.47, 2004-05-10 13:54:13-07:00, akpm@osdl.org [PATCH] mips: 64-bit MIPS needs compat stuff From: Ralf Baechle ChangeSet@1.1608.6.46, 2004-05-10 13:54:02-07:00, akpm@osdl.org [PATCH] mips: add missing IP22 Zilog bit From: Ralf Baechle Add missing definition PORT_IP22ZILOG which is needed by the ip22zilog driver.
ChangeSet@1.1608.6.45, 2004-05-10 13:53:51-07:00, akpm@osdl.org [PATCH] mips: GBE Video Driver From: Ralf Baechle This patch adds the GBE video driver for the video system in SGI IP32 aka O2 and its i386-based equivalent the Visual Workstation. This driver obsoletes sgivwfb.c; but I'd prefer to play safe and remove it after some additional time, just in case. ChangeSet@1.1608.6.44, 2004-05-10 13:53:41-07:00, akpm@osdl.org [PATCH] mips: remove VIDEO_TYPE_SNI_RM From: Ralf Baechle The RM200's onboard video really is a plain old boring Cyrix PCI card. ChangeSet@1.1608.6.43, 2004-05-10 13:53:30-07:00, akpm@osdl.org [PATCH] mips: newport driver fixes From: Ralf Baechle Make the driver for Newport aka XL work in 2.6. ChangeSet@1.1608.6.42, 2004-05-10 13:53:19-07:00, akpm@osdl.org [PATCH] mips: Simplify expression From: Ralf Baechle CONFIG_MIPS is always defined, for 32-bit and 64-bit. ChangeSet@1.1608.6.41, 2004-05-10 13:53:07-07:00, akpm@osdl.org [PATCH] mips: fix 2.6 fb setup From: Ralf Baechle ChangeSet@1.1608.6.40, 2004-05-10 13:52:56-07:00, akpm@osdl.org [PATCH] MIPS update From: Ralf Baechle - Kconfig cleanups: - enable DMA_NONCOHERENT, DMA_COHERENT or DMA_IP27 via reverse dependencies - untangle VRC4171 / VRC4173 selection - R10000 support enables PREFETCH - SEAD needs IRQ_CPU - Update defconfig against latest Kconfig files. - Fix computation of return address if syscall number was out of range - Add power management hooks in signal code. - Don't try to handle signals when previous context was not in user mode. - Fix serial interface setup for VR41xx systems. - Build fixes after CLEAR_BITMAP changed name. - Removes bogus comment from - is dead. - Start collecting common definitions for PMON firmware in - Define ARCH_MIN_TASKALIGN to 8; we have 64-bit members even on 32-bit kernels if we're running on MIPS II or better.
ChangeSet@1.1608.6.39, 2004-05-10 13:52:43-07:00, akpm@osdl.org [PATCH] Fix deadlock in journalled quota From: Jan Kara Attached patch should fix reported deadlock in journalled quota code. quotactl() call was violating the locking rules and didn't start transaction when it should. From: Found a couple of symbols not exported that were needed by the ext3.ko module. ChangeSet@1.1371.762.49, 2004-05-10 15:38:00-05:00, markh@osdl.org [PATCH] aacraid reset handler fix This fixes a situation where the handler can exit too early. ChangeSet@1.1608.6.38, 2004-05-10 13:30:22-07:00, akpm@osdl.org [PATCH] migration_thread() race fix From: Srivatsa Vaddagiri Noticed that migration_thread can examine "kthread_should_stop()?" without setting its state to TASK_INTERRUPTIBLE first. This can cause kthread_stop on that thread to block forever ... P.S - I assumed that having the task state set to TASK_INTERRUPTIBLE while it is doing active_load_balance is fine. It seemed to be the case earlier also. ChangeSet@1.1608.6.37, 2004-05-10 13:30:12-07:00, akpm@osdl.org [PATCH] sched_getaffinity vs cpu hotplug race fix From: Srivatsa Vaddagiri Fix the race in sys_sched_getaffinity. Patch below takes cpu_hotplug lock before reading cpus_allowed mask of a task. ChangeSet@1.1608.6.36, 2004-05-10 13:30:01-07:00, akpm@osdl.org [PATCH] Move migrate_all_tasks to CPU_DEAD handling From: Srivatsa Vaddagiri migrate_all_tasks is currently run with rest of the machine stopped. It iterates thr' the complete task table, turning off cpu affinity of any task that it finds affine to the dying cpu. Depending on the task table size this can take considerable time. All this time machine is stopped, doing nothing. Stopping the machine for such extended periods can be avoided if we do task migration in CPU_DEAD notification and that's precisely what this patch does. The patch puts idle task to the _front_ of the dying CPU's runqueue at the highest priority possible.
This causes the idle thread to run _immediately_ after kstopmachine thread yields. Idle thread notices that its cpu is offline and dies quickly. Task migration can then be done at leisure in CPU_DEAD notification, when rest of the CPUs are running. Some advantages with this approach are: - More scalable. Predictable amount of time that machine is stopped. - No changes to hot path/core code. We are just exploiting scheduler rules which run the next high-priority task on the runqueue. Also since I put idle task to the _front_ of the runqueue, there are no races when an equally high priority task is woken up and added to the runqueue. It gets in at the back of the runqueue, _after_ idle task! - cpu_is_offline check that is presently required in try_to_wake_up, idle_balance and rebalance_tick can be removed, thus speeding them up a bit From: Srivatsa Vaddagiri Rusty mentioned that the unlikely hints against cpu_is_offline are redundant since the macro already has that hint. Patch below removes those redundant hints I added. ChangeSet@1.1608.6.35, 2004-05-10 13:29:51-07:00, akpm@osdl.org [PATCH] sched: Look at another CPU's domain From: Nick Piggin The SMT wake_idle code really wants to look at a non-local CPU's domain in order to check for idle siblings. So change the domain attachment code a little bit so we continue to hold a runqueue's lock while attaching a new domain. This means the locking rules have changed to: you may access your own domain without any lock, you must hold a remote runqueue's lock in order to view its domain. ChangeSet@1.1608.6.34, 2004-05-10 13:29:40-07:00, akpm@osdl.org [PATCH] sched: micro-optimisation for wake_up From: Nick Piggin This actually does produce better code, especially under the locked section. Turns a conditional + unconditional jump under the lock in the unlikely case into a cmov outside the lock.
ChangeSet@1.1608.6.33, 2004-05-10 13:29:30-07:00, akpm@osdl.org [PATCH] sched: reduce idle time From: Nick Piggin It makes NEWLY_IDLE balances cause find_busiest_group return the busiest available group even if there isn't an imbalance. Basically - try a bit harder to prevent schedule emptying the runqueue. It is quite aggressive, but that isn't so bad because we don't (by default) do NEWLY_IDLE balancing across NUMA nodes, and NEWLY_IDLE balancing is always restricted to cache_hot tasks. It picked up a little bit of idle time that dbt2-pgsql was seeing... ChangeSet@1.1608.6.32, 2004-05-10 13:29:19-07:00, akpm@osdl.org [PATCH] sched: balance-on-clone From: Ingo Molnar Implement balancing during clone(). It does the following things: - introduces SD_BALANCE_CLONE that can serve as a tool for an architecture to limit the search-idlest-CPU scope on clone(). E.g. the 512-CPU systems should rather not enable this. - uses the highest sd for the imbalance_pct, not this_rq (which didnt make sense). - unifies balance-on-exec and balance-on-clone via the find_idlest_cpu() function. Gets rid of sched_best_cpu() which was still a bit inconsistent IMO, it used 'min_load < load' as a condition for balancing - while a more correct approach would be to use half of the imbalance_pct, like passive balancing does. - the patch also reintroduces the possibility to do SD_BALANCE_EXEC on SMP systems, and activates it - to get testing. - NOTE: there's one thing in this patch that is slightly unclean: i introduced wake_up_forked_thread. I did this to make it easier to get rid of this patch later (wake_up_forked_process() has lots of dependencies in various architectures). If this capability remains in the kernel then i'll clean it up and introduce one function for wake_up_forked_process/thread. - NOTE2: i added the SD_BALANCE_CLONE flag to the NUMA CPU template too. Some NUMA architectures probably want to disable this. 
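The balance-on-clone entry above contrasts the old 'min_load < load' test with one that uses half of the domain's imbalance_pct. A minimal sketch of that comparison follows, assuming imbalance_pct is a percentage at or above 100 (for example 125 means "25% more load counts as an imbalance"). The function name and the exact form of the expression are illustrative assumptions, not a copy of kernel/sched.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: should a newly cloned or exec'ing task be
 * placed on the idlest CPU (load min_load) rather than stay on the
 * current CPU (load this_load)?
 *
 * Only half of the imbalance threshold is applied, on the reasoning
 * that a new context is a cheap opportunity to balance.
 */
bool sketch_should_move_to_idlest(unsigned long min_load,
				  unsigned long this_load,
				  unsigned int imbalance_pct)
{
	assert(imbalance_pct >= 100);
	return min_load * (100 + (imbalance_pct - 100) / 2)
		< this_load * 100;
}
```

With imbalance_pct = 125 the effective threshold is 112%: a current load of 113 against an idlest load of 100 triggers a move, while 112 does not. With imbalance_pct = 100 the test degenerates to the plain 'min_load < load' condition that the entry calls inconsistent.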
ChangeSet@1.1608.6.31, 2004-05-10 13:29:07-07:00, akpm@osdl.org [PATCH] sched: cpu load management cleanup From: Ingo Molnar This does the source/target cleanup. This is a no-functionality patch which also adds more comments to explain these functions. ChangeSet@1.1608.6.30, 2004-05-10 13:28:57-07:00, akpm@osdl.org [PATCH] sched: passive balancing damping From: Nick Piggin This patch starts to balance woken processes when half the relevant domain's imbalance_pct is reached. Previously balancing would start after a small, constant difference in waker/wakee runqueue loads was reached, which would cause too much process movement when there are lots of processes running. It also turns wake balancing into a domain flag while previously it was always on. Now sched domains can "soft partition" an SMP system without using processor affinities. ChangeSet@1.1608.6.29, 2004-05-10 13:28:46-07:00, akpm@osdl.org [PATCH] sched: cleanups From: Ingo Molnar This re-adds cleanups which were lost in splitups of an earlier patch. ChangeSet@1.1608.6.28, 2004-05-10 13:28:35-07:00, akpm@osdl.org [PATCH] sched: lock cpu_attach_domain for hotplug From: Nick Piggin The attached patch is required to work correctly with the CPU hotplug framework. John Hawkes reports successful booting with this. ChangeSet@1.1608.6.27, 2004-05-10 13:28:20-07:00, akpm@osdl.org [PATCH] sched: extend sync wakeups From: Ingo Molnar The attached patch extends sync wakeups to the process sys_exit() path too: the chldwait wakeup can be done sync, since we know that the process is going to exit (and thus deschedule). The most visible effect of this change is strace's behavior on SMP systems: it now stays on a single CPU, together with the traced child. (previously it would run in parallel to the child, bouncing around madly.) 
ChangeSet@1.1608.6.26, 2004-05-10 13:28:10-07:00, akpm@osdl.org [PATCH] sched: add enqueue_task_head() From: Ingo Molnar Helper function for later patches ChangeSet@1.1608.6.25, 2004-05-10 13:27:59-07:00, akpm@osdl.org [PATCH] sched: uninlinings From: Ingo Molnar Uninline things ChangeSet@1.1608.6.24, 2004-05-10 13:27:48-07:00, akpm@osdl.org [PATCH] sched: minor cleanups From: Nick Piggin Minor cleanups from Ingo's patch including task_hot (do it right in try_to_wake_up too). ChangeSet@1.1608.6.23, 2004-05-10 13:27:37-07:00, akpm@osdl.org [PATCH] sched: fix setup races From: Nick Piggin De-racify the sched domain setup code. This involves creating a dummy "init" domain during sched_init (which is called early). When topology information becomes available, the sched domains are then built and attached. The attach mechanism is asynchronous and uses the migration threads, which perform the switch with interrupts off. This is a quiescent state, so domains can still be lockless on the read side. It also allows us to change the domains at runtime without much more work. This is something SGI is interested in to elegantly do soft partitioning of their systems without having to use hard cpu affinities (which cause balancing problems of their own). The current setup code also has a race somewhere because it is unable to boot on a 384 CPU system. From: Anton Blanchard This is basically a mindless ppc64 merge of the x86 changes to sched domain init code. Actually if I produce a sibling_map[] then the x86 code and the ppc64 will be identical. Maybe we can merge it. ChangeSet@1.1608.6.22, 2004-05-10 13:27:24-07:00, akpm@osdl.org [PATCH] ARCH_HAS_SCHED_WAKE_BALANCE doesn't exist From: Anton Blanchard It seems someone has been making trivial changes without using grep. ChangeSet@1.1608.6.21, 2004-05-10 13:27:13-07:00, akpm@osdl.org [PATCH] ppc64: sched-domain support From: Anton Blanchard Below are the diffs between the current ppc64 sched init stuff and x86.
- Ignore the POWER5 specific stuff, I don't set up a sibling map yet. - What should I set cache_hot_time to? large cpumask typechecking requirements (perhaps useful on x86 as well): - cpu->cpumask = CPU_MASK_NONE -> cpus_clear(cpu->cpumask); - cpus_and(nodemask, node_to_cpumask(i), cpu_possible_map) doesn't work, need to use a temporary ChangeSet@1.1608.6.20, 2004-05-10 13:27:02-07:00, akpm@osdl.org [PATCH] sched: oops fix From: Nick Piggin After the for_each_domain change, the warn here won't trigger, instead it will oops in the if statement. Also, make sure we don't pass an empty cpumask to for_each_cpu. ChangeSet@1.1608.6.19, 2004-05-10 13:26:51-07:00, akpm@osdl.org [PATCH] sched: altix tuning From: Nick Piggin From: John Hawkes The following brings up performance on a 64-way Altix. This system being on the smaller end of the scale should also be applicable to other NUMA systems. ChangeSet@1.1608.6.18, 2004-05-10 13:26:40-07:00, akpm@osdl.org [PATCH] sched: fix imbalance calculations From: Nick Piggin Imbalance calculations were not right. This would cause unneeded migration. ChangeSet@1.1608.6.17, 2004-05-10 13:26:30-07:00, akpm@osdl.org [PATCH] sched: wakeup balancing fixes From: Nick Piggin Make affine wakes and "passive load balancing" more conservative. Aggressive affine wakeups were causing huge regressions in dbt3-pgsql on 8-way non NUMA systems at OSDL's STP. ChangeSet@1.1608.6.16, 2004-05-10 13:26:19-07:00, akpm@osdl.org [PATCH] Hotplug CPU sched_balance_exec Fix From: Rusty Russell From: Srivatsa Vaddagiri From: Andrew Morton From: Rusty Russell We want to get rid of lock_cpu_hotplug() in sched_migrate_task. Found that lockless migration of execing task is _extremely_ racy. The races I hit are described below, along with probable solutions. Task migration done elsewhere should be safe (?) since they either hold the lock (sys_sched_setaffinity) or are done entirely with preemption disabled (load_balance). sched_balance_exec does: a. disables preemption b.
finds new_cpu for current c. enables preemption d. calls sched_migrate_task to migrate current to new_cpu and sched_migrate_task does: e. task_rq_lock(p) f. migrate_task(p, dest_cpu ..) (if we have to wait for migration thread) g. task_rq_unlock() h. wake_up_process(rq->migration_thread) i. wait_for_completion() Several things can happen here: 1. new_cpu can go down after h and before migration thread has got around to handle the request ==> we need to add a cpu_is_offline check in __migrate_task 2. new_cpu can go down between c and d or before f. ===> Even though this case is automatically handled by the above change (migrate_task being called on a running task, current, will delegate migration to migration thread), would it be good practice to avoid calling migrate_task in the first place itself when dest_cpu is offline. This means adding another cpu_is_offline check after e in sched_migrate_task 3. The 'current' task can get preempted _immediately_ after g and when it comes back, task_cpu(p) can be dead. In which case, it is invalid to do wake_up on a non-existent migration thread. (rq->migration_thread can be NULL). ===> We should disable preemption thr' g and h 4. Before migration thread gets around to handle the request, its cpu goes dead. This will leave unhandled migration requests in the dead cpu. ===> We need to wakeup sleeping requestors (if any) in CPU_DEAD notification. I really wonder if we can get rid of these issues by avoiding balancing at exec time and instead have it balanced during load_balance ..Alternately if this is valuable and we want to retain it, I think we still need to consider a read/write sem, with sched_migrate_task doing down_read_trylock. This may eliminate the deadlock I hit between cpu_up and CPU_UP_PREPARE notification, which had forced me away from r/w sem. Anyway patch below addresses the above races. Its against 2.6.6-rc2-mm1 and has been tested on a 4way Intel Pentium SMP m/c. 
Rusty sez: Two other changes: 1) I grabbed a reference to the thread, rather than using preempt_disable(). It's the more obvious way I think. 2) Why the wait_to_die code? It might be needed if we move tasks after stop_machine, but for now I don't see the problem with the migration thread running on the wrong CPU for a bit: nothing is on this runqueue so active_load_balance is safe, and __migrate_task will be a noop (due to cpu_is_offline() check). If there is a problem, your fix is racy, because we could be preempted immediately afterwards. So I just stop the kthread then wake up any remaining... ChangeSet@1.1608.6.15, 2004-05-10 13:26:09-07:00, akpm@osdl.org [PATCH] sched: trivial fixes, cleanups From: Ingo Molnar The trivial fixes. - added recent trivial bits from Nick's and my patches. - hotplug CPU fix - early init cleanup ChangeSet@1.1608.6.14, 2004-05-10 13:25:57-07:00, akpm@osdl.org [PATCH] Reduce TLB flushing during process migration From: Martin Hicks Another optimization patch from Jack Steiner, intended to reduce TLB flushes during process migration. Most architectures should define tlb_migrate_prepare() to be flush_tlb_mm(), but on i386, it would be a wasted flush, because i386 disconnects previous cpus from the tlb flush automatically. ChangeSet@1.1608.6.13, 2004-05-10 13:25:45-07:00, akpm@osdl.org [PATCH] sched: add local load metrics From: Nick Piggin This patch removes the per runqueue array of NR_CPU arrays. Each time we want to check a remote CPU's load we check nr_running as well anyway, so introduce a cpu_load which is the load of the local runqueue and is kept updated in the timer tick. Put them in the same cacheline. This has additional benefits of having the cpu_load consistent across all CPUs and more up to date. It is sampled better too, being updated once per timer tick.
This shouldn't make much difference in scheduling behaviour, but all benchmarks are either as good or better on the 16-way NUMAQ: hackbench, reaim, volanomark are about the same, tbench and dbench are maybe a bit better. kernbench is about one percent better. John reckons it isn't a big deal, but it does save 4K per CPU or 2MB total on his big systems, so I figure it must be a bit kinder on the caches. I think it is just nicer in general anyway. ChangeSet@1.1608.6.12, 2004-05-10 13:25:34-07:00, akpm@osdl.org [PATCH] sched: SMT niceness handling From: Con Kolivas This patch provides full per-package priority support for SMT processors (aka pentium4 hyperthreading) when combined with CONFIG_SCHED_SMT. It maintains cpu percentage distribution within each physical cpu package by limiting the time a lower priority task can run on a sibling cpu concurrently with a higher priority task. It introduces a new flag into the scheduler domain unsigned int per_cpu_gain; /* CPU % gained by adding domain cpus */ This is empirically set to 15% for pentium4 at the moment and can be modified to support different values dynamically as newer processors come out with improved SMT performance. It should not matter how many siblings there are. How it works is it compares tasks running on sibling cpus and when a lower static priority task is running it will delay it till high_priority_timeslice * (100 - per_cpu_gain) / 100 <= low_prio_timeslice eg. a nice 19 task timeslice is 10ms and nice 0 timeslice is 102ms On vanilla the nice 0 task runs on one logical cpu while the nice 19 task runs unabated on the other logical cpu. With smtnice the nice 0 runs on one logical cpu for 102ms and the nice 19 sleeps till the nice 0 task has 12ms remaining and then will schedule. Real time tasks and kernel threads are not altered by this code, and kernel threads do not delay lower priority user tasks. with lots of thanks to Zwane Mwaikambo and Nick Piggin for help with the coding of this version. 
If this is merged, it is probably best to delay pushing this upstream in mainline till sched_domains gets tested for at least one major release. ChangeSet@1.1608.6.11, 2004-05-10 13:25:22-07:00, akpm@osdl.org [PATCH] sched_domains: use cpu_possible_map From: Nick Piggin This changes sched domains to contain all possible CPUs, and check for online as needed. It's in order to play nicely with CPU hotplug. ChangeSet@1.1608.6.10, 2004-05-10 13:25:11-07:00, akpm@osdl.org [PATCH] sched-group-power From: Nick Piggin The following patch implements a cpu_power member to struct sched_group. This allows special casing to be removed for SMT groups in the balancing code. It does not take CPU hotplug into account yet, but that shouldn't be too hard. I have tested it on the NUMAQ by pretending it has SMT. Works as expected. Active balances across nodes. ChangeSet@1.1608.6.9, 2004-05-10 13:25:00-07:00, akpm@osdl.org [PATCH] sched_balance_exec(): don't fiddle with the cpus_allowed mask From: Rusty Russell , Nick Piggin The current sched_balance_exec() sets the task's cpus_allowed mask temporarily to move it to a different CPU. This has several issues, including the fact that a task will see its affinity at a bogus value. So we change the migration_req_t to explicitly specify a destination CPU, rather than the migration thread deriving it from cpus_allowed. If the requested CPU is no longer valid (racing with another set_cpus_allowed, say), it can be ignored: if the task is not allowed on this CPU, there will be another migration request pending. This change allows sched_balance_exec() to tell the migration thread what to do without changing the cpus_allowed mask. So we rename __set_cpus_allowed() to move_task(), as the cpus_allowed mask is now set by the caller. And move_task_away(), which the migration thread uses to actually perform the move, is renamed __move_task(). I also ignore offline CPUs in sched_best_cpu(), so sched_migrate_task() doesn't need to check for offline CPUs. 
Ulterior motive: this approach also plays well with CPU Hotplug. Previously that patch might have seen a task with cpus_allowed only containing the dying CPU (temporarily due to sched_balance_exec) and forcibly reset it to all cpus, which might be wrong. The other approach is to hold the cpucontrol sem around sched_balance_exec(), which is too much of a bottleneck. ChangeSet@1.1608.6.8, 2004-05-10 13:24:49-07:00, akpm@osdl.org [PATCH] sched: handle inter-CPU jiffies skew From: Nick Piggin John Hawkes described this problem to me: There *is* a small problem in this area, though, that SuSE avoids. "jiffies" gets updated by cpu0. The other CPUs may, over time, get out of sync (and they're initialized on ia64 to start out being out of sync), so it's no guarantee that every CPU will wake up from its timer interrupt and see a "jiffies" value that is guaranteed to be last_jiffies+1. Sometimes the jiffies value may be unchanged since the last wakeup. Sometimes the jiffies value may have incremented by 2 (or more, especially if cpu0's interrupts are disabled for long stretches of time). So an algorithm that says, "I'll call load_balance() only when jiffies is *exactly* N" is going to fail on occasion, either by calling load_balance() too often or not often enough. *** I fixed this by adding a last_balance field to struct sched_domain, and working off that. ChangeSet@1.1608.6.7, 2004-05-10 13:24:38-07:00, akpm@osdl.org [PATCH] sched: implement domains for i386 HT From: Nick Piggin The following patch builds a scheduling description for the i386 architecture using cpu_sibling_map to set up SMT if CONFIG_SCHED_SMT is set. It could be made more fancy and collapse degenerate domains at runtime (i.e. 1 sibling per CPU, or 1 NUMA node in the computer). From: Zwane Mwaikambo This fixes an oops due to cpu_sibling_map being uninitialised when a system with no MP table (most UP boxen) boots a CONFIG_SMT kernel.
What also happens is that the cpu_group lists end up not being terminated properly, but this oops kills it first. Patch tested on UP w/o MP table, 2x P2 and UP Xeon w/ no siblings. From: "Martin J. Bligh", Nick Piggin Change arch_init_sched_domains to use cpu_online_map From: Anton Blanchard Fix build with NR_CPUS > BITS_PER_LONG ChangeSet@1.1608.6.6, 2004-05-10 13:24:26-07:00, akpm@osdl.org [PATCH] sched: cpu_sibling_map to cpu_mask From: Nick Piggin This is a (somewhat) trivial patch which converts cpu_sibling_map from an array of CPUs to an array of cpumasks. Needed for >2 siblings per package, but it actually can simplify code as it allows the cpu_sibling_map to be set up even when there is 1 sibling per package. Intel want this, I use it in the next patch to build scheduling domains for the P4 HT. From: Thomas Schlichter Build fix From: "Pallipadi, Venkatesh" Fix to handle more than 2 siblings per package. ChangeSet@1.1608.6.5, 2004-05-10 13:24:15-07:00, akpm@osdl.org [PATCH] scheduler domain balancing improvements From: Nick Piggin This patch gets the sched_domain scheduler working better WRT balancing. It's been tested on the NUMAQ. Among other things it changes the way SMT load calculation works so as not to do active load balances when it shouldn't. It still has a problem with SMT and NUMA: it will put a task on each sibling in a node before moving tasks to another node. It should probably start moving tasks after each *physical* CPU is filled. To fix, you need "how much CPU power in this domain?" At the moment we approximate # runqueues == CPU power, and hack around it at the CPU physical domain by counting all sibling runqueues as 1. It isn't hard to correctly work the CPU power out, but once CPU hotplug is in the equation it becomes much more hotplug events. If anyone is actually interested in getting this fixed, that is.
ChangeSet@1.1608.6.4, 2004-05-10 13:24:05-07:00, akpm@osdl.org [PATCH] sched_domain debugging From: Nick Piggin Anton was attempting to make a sched domain topology for his POWER5 and was having some trouble. This patch only includes code which is ifdefed out, but hopefully it will be of some use to implementors. ChangeSet@1.1608.6.3, 2004-05-10 13:23:54-07:00, akpm@osdl.org [PATCH] sched: scheduler domain support From: Nick Piggin This is the core sched domains patch. It can handle any number of levels in a scheduling hierarchy, and allows architectures to easily customize how the scheduler behaves. It also provides progressive balancing backoff needed by SGI on their large systems (although they have not yet tested it). It is built on top of (well, uses ideas from) my previous SMP/NUMA work, and gets results very similar to them when using the default scheduling description. Benchmarks ========== Martin was seeing I think 10-20% better system times in kernbench on the 32 way. I was seeing improvements in dbench, tbench, kernbench, reaim, hackbench on a 16-way NUMAQ. Hackbench in fact had a non-linear element which is all but eliminated. Large improvements in volanomark. Cross node task migration was decreased in all above benchmarks, sometimes by a factor of 100!! Cross CPU migration was also generally decreased. See this post: http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=a406c910b30cbac4&seekm=UAdQ.3hj.5%40gated-at.bofh.it#link2 Results on a hyperthreading P4 are equivalent to Ingo's shared runqueues patch (which is a big improvement). Some examples on the 16-way NUMAQ (this is slightly older sched domain code): http://www.kerneltrap.org/~npiggin/w26/hbench.png http://www.kerneltrap.org/~npiggin/w26/vmark.html From: Jes Sorensen Tiny patch to make -mm3 compile on a NUMA box with NR_CPUS > BITS_PER_LONG. From: "Martin J. Bligh" Fix a minor nit with the find_busiest_group code.
No functional change, but makes the code simpler and clearer. This patch does two things ... adds some more expansive comments, and removes this if clause: if (*imbalance < SCHED_LOAD_SCALE && max_load - this_load > SCHED_LOAD_SCALE) *imbalance = SCHED_LOAD_SCALE; If we remove the scaling factor, we're basically conditionally doing: if (*imbalance < 1) *imbalance = 1; Which is pointless, as the very next thing we do is to remove the scaling factor, rounding up to the nearest integer as we do: *imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT; Thus the if statement is redundant, and only makes the code harder to read ;-) From: Rick Lindsley In find_busiest_group(), after we exit the do/while, we select our imbalance. But max_load, avg_load, and this_load are all unsigned, so min(x,y) will make a bad choice if max_load < avg_load < this_load (that is, a choice between two negative [very large] numbers). Unfortunately, there is a bug when max_load never gets changed from zero (look in the loop and think what happens if the only load on the machine is being created by cpu groups of which we are a member). And you have a recipe for some really bogus values for imbalance. Even if you fix the max_load == 0 bug, there will still be times when avg_load - this_load will be negative (thus very large) and you'll make the decision to move stuff when you shouldn't have. This patch allows for this_load to set max_load, which if I understand the logic properly is correct. With this patch applied, the algorithm is *much* more conservative ... maybe *too* conservative but that's for another round of testing ... From: Ingo Molnar sched-find-busiest-fix ChangeSet@1.1608.6.2, 2004-05-10 13:23:42-07:00, akpm@osdl.org [PATCH] sched: improved resolution in find_busiest_node From: Nick Piggin From: Frank Cornelis In order to get the best possible resolution we need to use NR_CPUS instead of the constant value 10. load is an int, so no need to worry about overflows... 
ChangeSet@1.1608.6.1, 2004-05-10 13:23:31-07:00, akpm@osdl.org [PATCH] small scheduler cleanup From: Ingo Molnar From: Nick Piggin wrote: It removes the last place where we mess with run_list open coded. ChangeSet@1.1371.762.48, 2004-05-10 13:17:31-05:00, jejb@mulgrave.(none) Add SCSI IPR PCI Ids to pci_ids.h ChangeSet@1.1608.5.1, 2004-05-10 13:12:01-04:00, jgarzik@redhat.com Merge redhat.com:/spare/repo/netdev-2.6/8139too into redhat.com:/spare/repo/net-drivers-2.6 ChangeSet@1.1371.762.47, 2004-05-10 11:27:04-05:00, jejb@mulgrave.(none) Add IBM power RAID driver 2.0.6 From: Brian King ChangeSet@1.1608.3.2, 2004-05-10 10:55:37-05:00, shaggy@austin.ibm.com JFS: module unload was not removing /proc/fs/jfs/ ChangeSet@1.1371.762.46, 2004-05-10 09:52:40-05:00, noodles@earth.li [PATCH] Initio INI-9X00U/UW error handling in 2.6 Plumb old error handling into new eh infrastructure. ChangeSet@1.1371.762.45, 2004-05-10 09:49:16-05:00, jejb@mulgrave.(none) sym53c500_cs remove irq,ioport scsi attributes From: Bob Tracy ChangeSet@1.1371.762.44, 2004-05-10 09:39:21-05:00, hch@lst.de [PATCH] mca_53c9x needs CONFIG_MCA_LEGACY ChangeSet@1.1371.762.43, 2004-05-10 09:37:24-05:00, hch@lst.de [PATCH] missing pci_set_master in megaraid ChangeSet@1.1371.762.42, 2004-05-10 09:36:45-05:00, hch@lst.de [PATCH] imm/ppa style police fix remaining style problems after Al resurrected the drivers. ChangeSet@1.1371.762.41, 2004-05-10 09:35:31-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [15/15] qla2xxx: Update driver version Update version number to 8.00.00b12-k. drivers/scsi/qla2xxx/qla_version.h | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) ChangeSet@1.1371.762.40, 2004-05-10 09:34:47-05:00, jejb@mulgrave.(none) PATCH [14/15] qla2xxx: Resync with latest released firmware -- 3.02.28.
From: Andrew Vasquez drivers/scsi/qla2xxx/ql2300_fw.c |12380 +++++++++++++++++++-------------------- drivers/scsi/qla2xxx/ql2322_fw.c |11812 ++++++++++++++++++------------------- drivers/scsi/qla2xxx/ql6312_fw.c |10174 ++++++++++++++++---------------- drivers/scsi/qla2xxx/ql6322_fw.c |10352 ++++++++++++++++---------------- 4 files changed, 22368 insertions(+), 22350 deletions(-) ChangeSet@1.1371.762.39, 2004-05-10 09:31:56-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [13/15] qla2xxx: Misc. code scrubbing Misc. driver scrubbing: o Use kernel #define for PCI command register bit. o Fix rate-limiting check the queue-depth module parameter. o Clean-up comments. drivers/scsi/qla2xxx/qla_init.c | 2 +- drivers/scsi/qla2xxx/qla_mbx.c | 1 - drivers/scsi/qla2xxx/qla_os.c | 7 +++---- 3 files changed, 4 insertions(+), 6 deletions(-) ChangeSet@1.1371.762.38, 2004-05-10 09:30:37-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [12/15] qla2xxx: RIO/ZIO fixes RIO/ZIO fixes: o Reduce register access during RIO operation by checking for a 'dirtied' signature. o Fix problem where ZIO mode handling could result in a nasty recursive call-frame. drivers/scsi/qla2xxx/qla_os.c | 5 +---- 1 files changed, 1 insertion(+), 4 deletions(-) ChangeSet@1.1371.762.37, 2004-05-10 09:29:37-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [11/15] qla2xxx: /proc fixes /proc file updates: o Address 'unaligned access' message on ia64 platforms while displaying bit-field flags. o Iterate through the OS target array to display target ID bindings.
drivers/scsi/qla2xxx/qla_os.c | 30 ++++++++++++------------------ 1 files changed, 12 insertions(+), 18 deletions(-) ChangeSet@1.1371.762.36, 2004-05-10 09:28:20-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [10/15] qla2xxx: Use readX_relaxed Jeremy Higdon : For those to whom this is new (it was discussed on linux-kernel and linux-ia64 I believe), normal PCI register reads imply that PCI DMA writes that occurred prior to the PCI MMR (memory mapped register) read (on the PCI bus) will be reflected in system memory once the MMR read is complete. On our platforms, we can speed up the MMR read significantly if that ordering requirement is "relaxed". So I attempted to find the common register reads that don't have a need for this ordering so that I could make them use this faster read.
drivers/scsi/qla2xxx/qla_def.h | 3 +++ drivers/scsi/qla2xxx/qla_iocb.c | 6 +++--- drivers/scsi/qla2xxx/qla_isr.c | 2 +- 3 files changed, 7 insertions(+), 4 deletions(-) ChangeSet@1.1371.762.35, 2004-05-10 09:27:07-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [9/15] qla2xxx: Tape command handling fixes Fix several problems when handling commands issued to tape devices: 1) ensure commands are not prematurely returned to the mid-layer with a failed status during loop/fabric transitions. 2) tape commands tend to have rather 'long' timeout values, unfortunately, as these values increase into the 17 to 20 minute range (and larger), the cumulative skew of the RISC's own timer results in commands being held for seconds beyond their defined timeout values. Compensate for this in the driver's command timeout function.
drivers/scsi/qla2xxx/qla_mbx.c | 9 +++++++++ drivers/scsi/qla2xxx/qla_os.c | 2 ++ drivers/scsi/qla2xxx/qla_rscn.c | 2 ++ 3 files changed, 13 insertions(+) ChangeSet@1.1371.762.30, 2004-05-10 09:20:18-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [4/15] qla2xxx: PortID binding fixes Fix problem where port ID binding would not be honoured when a device was moved within the fabric. drivers/scsi/qla2xxx/qla_init.c | 33 ++++++++++++++++++++++++--------- 1 files changed, 24 insertions(+), 9 deletions(-) ChangeSet@1.1371.762.29, 2004-05-10 09:18:48-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [3/15] qla2xxx: 2100 request-q contraints Older, notably the ISP2100, chips have some contraints for the request queue depth and number of scatter-gather elements allowed for a given command. For this chip, reduce request queue size to 128 and maximum number of scatter-gather entries for a command to 32. drivers/scsi/qla2xxx/qla_def.h | 14 +++----------- drivers/scsi/qla2xxx/qla_init.c | 9 +++++---- drivers/scsi/qla2xxx/qla_iocb.c | 14 +++++++------- drivers/scsi/qla2xxx/qla_os.c | 14 +++++++++----- drivers/scsi/qla2xxx/qla_rscn.c | 2 +- 5 files changed, 25 insertions(+), 28 deletions(-) ChangeSet@1.1371.762.28, 2004-05-10 09:17:18-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [2/15] qla2xxx: Remove flash routines Remove flash support from embedded driver: o Remove unused option-rom variables from host structure. o Remove flash manipulation routines. drivers/scsi/qla2xxx/qla_def.h | 2 drivers/scsi/qla2xxx/qla_gbl.h | 8 drivers/scsi/qla2xxx/qla_init.c | 3 drivers/scsi/qla2xxx/qla_sup.c | 446 ---------------------------------------- 4 files changed, 459 deletions(-) ChangeSet@1.1371.762.27, 2004-05-10 09:15:55-05:00, andrew.vasquez@qlogic.com [PATCH] PATCH [1/15] qla2xxx: Firmware dump fixes ISP dump routine fixes: o Properly release hardware_lock in failure path. o Fix inability to complete ISP2100 dump, by properly reseting the RISC after register reads. 
drivers/scsi/qla2xxx/qla_dbg.c | 34 ++++++++++++---------------------- 1 files changed, 12 insertions(+), 22 deletions(-) ChangeSet@1.1371.762.26, 2004-05-10 09:15:12-05:00, jejb@mulgrave.(none) MPT Fusion driver 3.01.06 update From: Moore, Eric Dean ChangeSet@1.1371.762.25, 2004-05-10 09:10:59-05:00, brking@us.ibm.com [PATCH] Make SCSI timeout modifiable add a timeout field to struct scsi_device and expose it in in sysfs. This patch allows LLDs to override the default timeout used for scsi devices and exposes it in sysfs. The default timeout value used is too short for many RAID array devices, such as those created by the ipr driver. ChangeSet@1.1611, 2004-05-10 09:56:02+01:00, aia21@cantab.net NTFS: 2.1.9 release - Fix two bugs in the decompression engine in handling of corner cases. ChangeSet@1.1608.1.9, 2004-05-09 19:30:37-07:00, torvalds@ppc970.osdl.org Linux 2.6.6 TAG: v2.6.6