- 09 Oct, 2007 18 commits
-
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
This implements the configuration and building of the core transport switch implementation of the rpcrdma transport. Stubs are provided for the rpcrdma protocol handling, and the infiniband/iwarp verbs interface. These are provided in following patches. Signed-off-by:
Tom Talpey <talpey@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
This file implements the configuration target, protocol template and constants for the rpcrdma transport framing, for use by the xprtrdma rpc transport implementation. Signed-off-by:
Tom Talpey <talpey@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
To prepare for including non-sockets-based RPC transports, select RPC transports by an identifier (to be used in following patches). Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
To prepare for including non-sockets-based RPC transports, move the sockets-dependent definitions into their own file. Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
To prepare for including non-sockets-based RPC transports, change the overly suggestive name of the transport creation arguments struct. Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
To allow transport capabilities to be loaded dynamically, provide an API for registering and unregistering the transports with the RPC client. Eventually xprt_create_transport() will be changed to search the list of registered transports when initializing a fresh transport. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
Adds a flag word to the xdrbuf struct which indicates any bulk disposition of the data. This enables RPC transport providers to marshal it efficiently/appropriately, and may enable other optimizations. Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
The rpcbind (v3+) netid is provided by each RPC client transport. This fixes an omission in IPv6 rpcbind client support, and enables future extension. Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
\"Talpey, Thomas\ authored
Move the TCP/UDP rpcbind netid's from the rpcbind client to a global header. Signed-off-by:
Tom Talpey <tmt@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
/home/cel/linux/net/sunrpc/clnt.c: In function ‘rpc_bind_new_program’: /home/cel/linux/net/sunrpc/clnt.c:445: warning: comparison between signed and unsigned RPC version numbers are u32, not int. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
"Universal addresses" are a string representation of an IP address and port. They are described fully in RFC 3530, section 2.2. Add support for generating them in the RPC client's socket transport module. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Add support for the NFS client's need to export volume information with IP addresses formatted in hex instead of decimal. This isn't used yet, but subsequent patches (not in this series) will change the NFS client to use this functionality. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Fabio Olive Leite authored
I would like to discuss the idea that the current checks for attribute timeout using time_after are inadequate for 32bit architectures, since time_after works correctly only when the two timestamps being compared are within 2^31 jiffies of each other. The signed overflow caused by comparing values more than 2^31 jiffies apart will flip the result, causing incorrect assumptions of validity. 2^31 jiffies is a fairly large period of time (~25 days) when compared to the lifetime of most kernel data structures, but for long lived NFS mounts that can sit idle for months (think that for some reason autofs cannot be used), it is easy to compare inode attribute timestamps with very disparate or even bogus values (as in when jiffies have wrapped many times, where the comparison doesn't even make sense). Currently the code tests for attribute timeout by simply adding the desired amount of jiffies to the stored timestamp and comparing that with the current timestamp of obtained attribute data with time_after. This is incorrect, as it returns true for the desired timeout period and another full 2^31 range of jiffies. In testing with artificial jumps (several small jumps, not one big crank) of the jiffies I was able to reproduce a problem found in a server with very long lived NFS mounts, where attributes would not be refreshed even after touching files and directories in the server: Initial uptime: 03:42:01 up 6 min, 0 users, load average: 0.01, 0.12, 0.07 NFS volume is mounted and time is advanced: 03:38:09 up 25 days, 2 min, 0 users, load average: 1.22, 1.05, 1.08 # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Dec 17 03:38 /local/A/foo/bar -rw-r--r-- 1 root root 0 Nov 22 00:36 /nfs/A/foo/bar # touch /local/A/foo/bar # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Dec 17 03:47 /local/A/foo/bar -rw-r--r-- 1 root root 0 Nov 22 00:36 /nfs/A/foo/bar We can see the local mtime is updated, but the NFS mount still shows the old value. The patch below makes it work: Initial setup... 07:11:02 up 25 days, 1 min, 0 users, load average: 0.15, 0.03, 0.04 # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:11 /local/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:11 /nfs/A/foo/bar # touch /local/A/foo/bar # ls -l /local/A/foo/bar /nfs/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:14 /local/A/foo/bar -rw-r--r-- 1 root root 0 Jan 11 07:14 /nfs/A/foo/bar Signed-off-by:
Fabio Olive Leite <fleite@redhat.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
This helps prevent huge queues of background writes from building up whenever the server runs out of disk or quota space, or if someone changes the file access modes behind our backs. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it to flush out the entire inode without sending a commit. We therefore replace nfs_sync_mapping_range with a more appropriate helper. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
The only user of this field was NFS. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
The addition of nfs_page_mkwrite means that We should no longer need to create requests inside nfs_writepage() Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 08 Oct, 2007 1 commit
-
-
Peter Zijlstra authored
All the current page_mkwrite() implementations also set the page dirty. Which results in the set_page_dirty_balance() call to _not_ call balance, because the page is already found dirty. This allows us to dirty a _lot_ of pages without ever hitting balance_dirty_pages(). Not good (tm). Force a balance call if ->page_mkwrite() was successful. Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 Oct, 2007 1 commit
-
-
Linus Torvalds authored
It turns out that there are a few other five-second timers in the kernel, and if the timers get in sync, the load-average can get artificially inflated by events that just happen to coincide. So just offset the load average calculation it by a timer tick. Noticed by Anders Boström, for whom the coincidence started triggering on one of his machines with the JBD jiffies rounding code (JBD is one of the subsystems that also end up using a 5-second timer by default). Tested-by:
Anders Boström <anders@bostrom.dyndns.org> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 26 Sep, 2007 1 commit
-
-
Linus Torvalds authored
This reverts commit 184c44d2 . As noted by Dave Jones: "Linus, please revert the above cset. It doesn't seem to be necessary (it was added to fix a miscompile in 'make allnoconfig' which doesn't seem to be repeatable with it reverted) and actively breaks the ARM SA1100 framebuffer driver." Requested-by:
Dave Jones <davej@redhat.com> Cc: Russell King <rmk+lkml@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 20 Sep, 2007 1 commit
-
-
Davide Libenzi authored
This simplifies signalfd code, by avoiding it to remain attached to the sighand during its lifetime. In this way, the signalfd remain attached to the sighand only during poll(2) (and select and epoll) and read(2). This also allows to remove all the custom "tsk == current" checks in kernel/signal.c, since dequeue_signal() will only be called by "current". I think this is also what Ben was suggesting time ago. The external effect of this, is that a thread can extract only its own private signals and the group ones. I think this is an acceptable behaviour, in that those are the signals the thread would be able to fetch w/out signalfd. Signed-off-by:
Davide Libenzi <davidel@xmailserver.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 19 Sep, 2007 4 commits
-
-
Ingo Molnar authored
add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield() more agressive, by moving the yielding task to the last position in the rbtree. with sched_compat_yield=0: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2539 mingo 20 0 1576 252 204 R 50 0.0 0:02.03 loop_yield 2541 mingo 20 0 1576 244 196 R 50 0.0 0:02.05 loop with sched_compat_yield=1: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2584 mingo 20 0 1576 248 196 R 99 0.0 0:52.45 loop 2582 mingo 20 0 1576 256 204 R 0 0.0 0:00.00 loop_yield Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Peter Zijlstra <a.p.zijlstra@chello.nl>
-
Lee Schermerhorn authored
This patch proposes fixes to the reference counting of memory policy in the page allocation paths and in show_numa_map(). Extracted from my "Memory Policy Cleanups and Enhancements" series as stand-alone. Shared policy lookup [shmem] has always added a reference to the policy, but this was never unrefed after page allocation or after formatting the numa map data. Default system policy should not require additional ref counting, nor should the current task's task policy. However, show_numa_map() calls get_vma_policy() to examine what may be [likely is] another task's policy. The latter case needs protection against freeing of the policy. This patch adds a reference count to a mempolicy returned by get_vma_policy() when the policy is a vma policy or another task's mempolicy. Again, shared policy is already reference counted on lookup. A matching "unref" [__mpol_free()] is performed in alloc_page_vma() for shared and vma policies, and in show_numa_map() for shared and another task's mempolicy. We can call __mpol_free() directly, saving an admittedly inexpensive inline NULL test, because we know we have a non-NULL policy. Handling policy ref counts for hugepages is a bit trickier. huge_zonelist() returns a zone list that might come from a shared or vma 'BIND policy. In this case, we should hold the reference until after the huge page allocation in dequeue_hugepage(). The patch modifies huge_zonelist() to return a pointer to the mempolicy if it needs to be unref'd after allocation. Kernel Build [16cpu, 32GB, ia64] - average of 10 runs: w/o patch w/ refcount patch Avg Std Devn Avg Std Devn Real: 100.59 0.38 100.63 0.43 User: 1209.60 0.37 1209.91 0.31 System: 81.52 0.42 81.64 0.34 Signed-off-by:
Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by:
Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Acked-by:
Mel Gorman <mel@csn.ul.ie> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Pavel Emelyanov authored
It turned out, that the user namespace is released during the do_exit() in exit_task_namespaces(), but the struct user_struct is released only during the put_task_struct(), i.e. MUCH later. On debug kernels with poisoned slabs this will cause the oops in uid_hash_remove() because the head of the chain, which resides inside the struct user_namespace, will be already freed and poisoned. Since the uid hash itself is required only when someone can search it, i.e. when the namespace is alive, we can safely unhash all the user_struct-s from it during the namespace exiting. The subsequent free_uid() will complete the user_struct destruction. For example simple program #include <sched.h> char stack[2 * 1024 * 1024]; int f(void *foo) { return 0; } int main(void) { clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0); return 0; } run on kernel with CONFIG_USER_NS turned on will oops the kernel immediately. This was spotted during OpenVZ kernel testing. Signed-off-by:
Pavel Emelyanov <xemul@openvz.org> Signed-off-by:
Alexey Dobriyan <adobriyan@openvz.org> Acked-by:
"Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Pavel Emelyanov authored
Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses list_heads, thus occupying twice as much place as it could. Convert it to hlist_heads. Signed-off-by:
Pavel Emelyanov <xemul@openvz.org> Signed-off-by:
Alexey Dobriyan <adobriyan@openvz.org> Acked-by:
Serge Hallyn <serue@us.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Sep, 2007 1 commit
-
-
Matthew Wilcox authored
When CONFIG_ISA is disabled, the isa_driver support will not be compiled in. Define stubs so that we don't get link-time errors. Signed-off-by:
Matthew Wilcox <matthew@wil.cx> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 16 Sep, 2007 1 commit
-
-
Herbert Xu authored
This patch adds an optimised version of skb_cow that avoids the copy if the header can be modified even if the rest of the payload is cloned. This can be used in encapsulating paths where we only need to modify the header. As it is, this can be used in PPPOE and bridging. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 12 Sep, 2007 2 commits
-
-
Alexey Dobriyan authored
Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit 786d7e16 aka "Fix rmmod/read/write races in /proc entries" broke SBCL + SLIME combo. The old code in do_select() used DEFAULT_POLLMASK, if couldn't find ->poll handler. The new code makes ->poll always there and returns 0 by default, which is not correct. Return DEFAULT_POLLMASK instead. Steps to reproduce: install emacs, SBCL, SLIME emacs M-x slime in *inferior-lisp* buffer [watch it doing "Connecting to Swank on port X.."] Please, apply before 2.6.23. P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
The AdvanSys driver wants to align some pointers, and the ALIGN macro doesn't work for pointers. Rather than try to make it work, add a new PTR_ALIGN macro which is typesafe. Signed-off-by:
Matthew Wilcox <matthew@wil.cx> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 11 Sep, 2007 6 commits
-
-
Yoichi Yuasa authored
This patch has added #include <linux/spinlock.h> to include/linux/leds.h for rwlock_t. Signed-off-by:
Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Signed-off-by:
Richard Purdie <rpurdie@rpsys.net>
-
Sergei Shtylyov authored
Make the SATA drive detection code from eighty_ninty_three() into inline ide_dev_is_sata() helper fixing it along the way to be more strict while checking word 80 for the reserved values... Signed-off-by:
Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by:
Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
Jason Gaston authored
This patch adds the Intel Tolapai LPC and SMBus Controller DID's. Signed-off-by:
Jason Gaston <jason.d.gaston@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Randy Dunlap authored
Fix warnings when CONFIG_PCIEAER=n: drivers/pci/pcie/portdrv_pci.c:105: warning: statement with no effect drivers/pci/pcie/portdrv_pci.c:226: warning: statement with no effect drivers/scsi/arcmsr/arcmsr_hba.c:352: warning: statement with no effect Signed-off-by:
Randy Dunlap <randy.dunlap@oracle.com> Acked-by:
Linas Vepstas <linas@austin.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Neil Horman authored
So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Joseph Chan authored
Signed-off-by:
Joseph Chan <josephchan@via.com.tw> Signed-off-by:
Jeff Garzik <jeff@garzik.org>
-
- 05 Sep, 2007 1 commit
-
-
Samuel Thibault authored
Some braille keyboards have 10 dots, so extend the Input braille keys definitions. Signed-off-by:
Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by:
Dmitry Torokhov <dtor@mail.ru>
-
- 01 Sep, 2007 1 commit
-
-
Trond Myklebust authored
Ryusuke Konishi says: The recent truncate_complete_page() clears the dirty flag from a page before calling a_ops->invalidatepage(), ^^^^^^ static void truncate_complete_page(struct address_space *mapping, struct page *page) { ... cancel_dirty_page(page, PAGE_CACHE_SIZE); <--- Inserted here at kernel 2.6.20 if (PagePrivate(page)) do_invalidatepage(page, 0); ---> will call a_ops->invalidatepage() ... } and this is disturbing nfs_wb_page_priority() from calling nfs_writepage_locked() that is expected to handle the pending request (=nfs_page) associated with the page. int nfs_wb_page_priority(struct inode *inode, struct page *page, int how) { ... if (clear_page_dirty_for_io(page)) { ret = nfs_writepage_locked(page, &wbc); if (ret < 0) goto out; } ... } Since truncate_complete_page() will get rid of the page after a_ops->invalidatepage() returns, the request (=nfs_page) associated with the page becomes a garbage in nfs_inode->nfs_page_tree. ------------------------ Fix this by ensuring that nfs_wb_page_priority() recognises that it may also need to clear out non-dirty pages that have an nfs_page associated with them. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 31 Aug, 2007 2 commits
-
-
David Gibson authored
For hugepage mappings, the file offset, like the address and size, needs to be aligned to the size of a hugepage. In commit 68589bc3, the check for this was moved into prepare_hugepage_range() along with the address and size checks. But since BenH's rework of the get_unmapped_area() paths leading up to commit 4b1d8929 , prepare_hugepage_range() is only called for MAP_FIXED mappings, not for other mappings. This means we're no longer ever checking for an aligned offset - I've confirmed that mmap() will (apparently) succeed with a misaligned offset on both powerpc and i386 at least. This patch restores the check, removing it from prepare_hugepage_range() and putting it back into hugetlbfs_file_mmap(). I'm putting it there, rather than in the get_unmapped_area() path so it only needs to go in one place, than separately in the half-dozen or so arch-specific implementations of hugetlb_get_unmapped_area(). Signed-off-by:
David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Shane Huang authored
We find that SB700 and SB800 use the same SMBus device ID as SB600, which is 0x4385, instead of the already submitted 0x4395. Besides removing the wrong SB700 device ID, add SB800 support to kernel, by renaming the PCI_DEVICE_ID_ATI_IXP600_SMBUS into PCI_DEVICE_ID_ATI_SBX00_SMBUS. Signed-off-by:
Shane Huang <shane.huang@amd.com> Signed-off-by:
Jean Delvare <khali@linux-fr.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-