- 30 Mar, 2010 1 commit
-
-
Tejun Heo authored
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by:
Tejun Heo <tj@kernel.org> Guess-its-ok-by:
Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-
- 22 Mar, 2010 1 commit
-
-
Jeff Layton authored
The reply parsing code attempts to decode the GETATTR response even if the DELEGRETURN portion of the compound returned an error. The GETATTR response won't actually exist if that's the case and we're asking the parser to read past the end of the response. This bug is fairly benign. The parser catches this without reading past the end of the response and decode_getfattr returns -EIO. Earlier kernels however had decode_op_hdr using the READ_BUF macro, and this bug would make this printk pop any time the client got an error from a delegreturn: kernel: decode_op_hdr: reply buffer overflowed in line XXXX More recent kernels seem to have replaced this printk with a dprintk. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 19 Mar, 2010 1 commit
-
-
Trond Myklebust authored
We should not attempt to free the page if __GFP_FS is not set. Otherwise we can deadlock as per http://bugzilla.kernel.org/show_bug.cgi?id=15578 Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
- 15 Mar, 2010 1 commit
-
-
NeilBrown authored
bdi_unregister is called by nfs_put_super which is only called by generic_shutdown_super if ->s_root is not NULL. So if we error out in a circumstance where we called nfs_bdi_register (i.e. server != NULL) but have not set s_root, then we need to call bdi_unregister explicitly in nfs_get_sb and various other *_get_sb() functions. Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 11 Mar, 2010 1 commit
-
-
Trond Myklebust authored
J.R. Okajima reports the following deadlock: INFO: task kswapd0:305 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kswapd0 D 0000000000000001 0 305 2 0x00000000 ffff88001f21d4f0 0000000000000046 ffff88001fdea680 ffff88001f21c000 ffff88001f21dfd8 ffff88001f21c000 ffff88001f21dfd8 ffff88001f21dfd8 ffff88001fdea040 0000000000014c00 0000000000000001 ffff88001fdea040 Call Trace: [<ffffffff8146155d>] io_schedule+0x4d/0x70 [<ffffffff810d2be5>] sync_page+0x65/0xa0 [<ffffffff81461b12>] __wait_on_bit_lock+0x52/0xb0 [<ffffffff810d2b80>] ? sync_page+0x0/0xa0 [<ffffffff810d2b64>] __lock_page+0x64/0x70 [<ffffffff81070ce0>] ? wake_bit_function+0x0/0x40 [<ffffffff810df1d4>] truncate_inode_pages_range+0x344/0x4a0 [<ffffffff810df340>] truncate_inode_pages+0x10/0x20 [<ffffffff8112cbfe>] generic_delete_inode+0x15e/0x190 [<ffffffff8112cc8d>] generic_drop_inode+0x5d/0x80 [<ffffffff8112bb88>] iput+0x78/0x80 [<ffffffff811bc908>] nfs_dentry_iput+0x38/0x50 [<ffffffff811285f4>] dentry_iput+0x84/0x110 [<ffffffff811286ae>] d_kill+0x2e/0x60 [<ffffffff8112912a>] dput+0x7a/0x170 [<ffffffff8111e925>] path_put+0x15/0x40 [<ffffffff811c3a44>] __put_nfs_open_context+0xa4/0xb0 [<ffffffff811cb5d0>] ? nfs_free_request+0x0/0x50 [<ffffffff811c3b0b>] put_nfs_open_context+0xb/0x10 [<ffffffff811cb5f9>] nfs_free_request+0x29/0x50 [<ffffffff81234b7e>] kref_put+0x8e/0xe0 [<ffffffff811cb594>] nfs_release_request+0x14/0x20 [<ffffffff811cf769>] nfs_find_and_lock_request+0x89/0xa0 [<ffffffff811d1180>] nfs_wb_page+0x80/0x110 [<ffffffff811c0770>] nfs_release_page+0x70/0x90 [<ffffffff810d18ee>] try_to_release_page+0x5e/0x80 [<ffffffff810e1178>] shrink_page_list+0x638/0x860 [<ffffffff810e19de>] shrink_zone+0x63e/0xc40 We can fix this by making the call to put_nfs_open_context() happen when we actually remove the write request from the inode (which is done by the nfsiod thread in this case). Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
- 10 Mar, 2010 1 commit
-
-
Trond Myklebust authored
If the NFS_INO_REVAL_FORCED flag is set, that means that we don't yet have an up to date attribute cache. Even if we hold a delegation, we must put a GETATTR on the wire. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
- 08 Mar, 2010 2 commits
-
-
Steve Dickson authored
To avoid hangs in the svc_unregister(), on version 4 mounts (and unmounts), when rpcbind is not running, make the nfs4 callback program an 'hidden' service by setting the 'vs_hidden' flag in the nfs4_callback_version structure. Signed-off-by:
Steve Dickson <steved@redhat.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Dan Carpenter authored
I'll admit that it's unlikely for the first allocation to fail and the second one to succeed. I won't be offended if you ignore this patch. Signed-off-by:
Dan Carpenter <error27@gmail.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 05 Mar, 2010 12 commits
-
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Remove the redundant call to filemap_write_and_wait(). Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Now that we have correct COMMIT semantics in writeback_single_inode, we can reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a call to filemap_write_and_wait(), which doesn't need to hold the inode->i_mutex. With that done, we can eliminate nfs_write_mapping() altogether. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
In all cases we should be able to just remove the request and call cancel_dirty_page(). Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Since nfs_scan_list() doesn't wait for locked pages, we have a race in which it is possible to end up with an inode that needs to send a COMMIT, but which does not have the I_DIRTY_DATASYNC flag set. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by:
Peter Zijlstra <peterz@infradead.org> Acked-by:
Wu Fengguang <fengguang.wu@intel.com>
-
Trond Myklebust authored
If the caller is doing a non-blocking flush, and there are still writebacks pending on the wire, we can usually defer the COMMIT call until those writes are done. Also ensure that we honour the wbc->nonblocking flag. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
In order to know when we should do opportunistic commits of the unstable writes, when the VM is doing a background flush, we add a field to count the number of unstable writes. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
The sole purpose of nfs_write_inode is to commit unstable writes, so move it into fs/nfs/write.c, and make nfs_commit_inode static. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Christoph Hellwig authored
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Similar to the fsync issue fixed a while ago in commit 2daea67e we need to write for data to actually hit the disk before writing out the metadata to guarantee data integrity for filesystems that modify the inode in the data I/O completion path. Currently XFS and NFS handle this manually, and AFS has a write_inode method that does nothing but waiting for data, while others are possibly missing out on this. Fortunately this change has a lot less impact than the fsync change as none of the write_inode methods starts data writeout of any form by itself. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 03 Mar, 2010 2 commits
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 02 Mar, 2010 7 commits
-
-
Andy Adamson authored
Signed-off-by:
Andy Adamson <andros@netapp.com> Signed-off-by:
Benny Halevy <bhalevy@panasas.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
sunrpc_cache_update() will always call detail->update() from inside the detail->hash_lock, so it cannot allocate memory. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
Trond Myklebust authored
Ensure that we change the EXCHANGE_ID verifier (i.e. clp->cl_boot_time) when we want to reset all state. This is mainly needed when the server tells us that it is revoking our open or lock stateids. Handle revoking of recallable state by expiring the delegations. Handle callback path issues by expiring the delegations and then resetting the session. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Alexandros Batsakis authored
renewd sends RENEW requests to the NFS server in order to renew state. As the request is asynchronous, renewd should take a reference to the nfs_client to prevent concurrent umounts from freeing the client Signed-off-by:
Alexandros Batsakis <batsakis@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Alexandros Batsakis authored
renewd sends SEQUENCE requests to the NFS server in order to renew state. As the request is asynchronous, renewd should take a reference to the nfs_client to prevent concurrent umounts from freeing the session/client Signed-off-by:
Alexandros Batsakis <batsakis@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Alexandros Batsakis authored
If the renewd send queue gets backlogged (e.g., if the server goes down), we will keep filling the queue with periodic RENEW/SEQUENCE requests. This patch schedules a new renewd request if and only if the previous one returns (either success or failure) Signed-off-by:
Alexandros Batsakis <batsakis@netapp.com> [Trond.Myklebust@netapp.com: moved nfs4_schedule_state_renewal() into separate nfs4_renew_release() and nfs41_sequence_release() callbacks to ensure correct behaviour on call setup failure] Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Alexandros Batsakis authored
renewd should be synchronously killed before we destroy the session in nfs4_clear_minor_version Signed-off-by:
Alexandros Batsakis <batsakis@netapp.com> [Trond.Myklebust@netapp.com: clean up to remove 'unused function warning when !CONFIG_NFS_V4] Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 27 Feb, 2010 1 commit
-
-
Christian Kujau authored
There's currently an open Ubuntu bug[0], with the intent to compile NFS_FSCACHE (and possibly AFS_FSCACHE, 9P_FSCACHE) into the standard Ubuntu kernel. However, since *_FSCACHE still depends on EXPERIMENTAL, this won't happen. As Arjan van de Ven pointed out[1], the EXPERIMENTAL flag doesn't mean that much any more, I propose the following patch to fs/nfs/Kconfig. I'd do the same for fs/9p/Kconfig and fs/afs/Kconfig, but as I did not test 9p or AFS, I feel it would not be appropriate for me to remove the flag. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/440522/comments/5 [1] http://lkml.org/lkml/2010/1/23/145 Signed-off-by:
Christian Kujau <lists@nerdbynature.de> Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Feb, 2010 1 commit
-
-
Tejun Heo authored
Add __percpu sparse annotations to fs. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
-
- 16 Feb, 2010 1 commit
-
-
Chuck Lever authored
The cached read and write paths initialize fattr->time_start in their setup procedures. The value of fattr->time_start is propagated to read_cache_jiffies by nfs_update_inode(). Subsequent calls to nfs_attribute_timeout() will then use a good time stamp when computing the attribute cache timeout, and squelch unneeded GETATTR calls. Since the direct I/O paths erroneously leave the inode's fattr->time_start field set to zero, read_cache_jiffies for that inode is set to zero after any direct read or write operation. This triggers an otw GETATTR or ACCESS call to update the file's attribute and access caches properly, even when the NFS READ or WRITE replies have usable post-op attributes. Make sure the direct read and write setup code performs the same fattr initialization as the cached I/O paths to prevent unnecessary GETATTR calls. This was likely introduced by commit 0e574af1 in 2.6.15, which appears to add new nfs_fattr_init() call sites in the cached read and write paths, but not in the equivalent places in fs/nfs/direct.c. A subsequent commit in the same series, 33801147 , introduces the fattr->time_start field. Interestingly, the direct write reschedule path already has a call to nfs_fattr_init() in the right place. Reported-by:
Quentin Barnes <qbarnes@yahoo-inc.com> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Cc: stable@kernel.org Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Feb, 2010 8 commits
-
-
Chuck Lever authored
For NFSv2 and v3: O_DIRECT writes are always synchronous, and aren't cached, so nothing should be flushed when closing an NFS O_DIRECT file descriptor. Thus there are no write errors to report on close(2). In addition, there's no cached data to verify on the next open(2), so we don't need clean GETATTR results at close time to compare with. Thus, there's no need for the nfs_revalidate_inode() call when closing an NFS O_DIRECT file. This reduces the number of synchronous on-the-wire requests for a simple open-write-close of an NFS O_DIRECT file by roughly 20%. For NFSv4: Call nfs4_do_close() with wait set to zero when closing an NFS O_DIRECT file. The CLOSE will go on the wire, but the application won't wait for it to complete. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
The bytes counted by the performance counters for NFS writes should reflect write and sync errors. If the write(2) system call reports an error, the bytes should not be counted. And, if the write is short, the actual number of bytes that was written should be counted, not the number of bytes that was requested. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Bytes read via the splice API should be accounted for in the NFS performance statistics. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Currently, the NFS I/O counters count the number of bytes requested by applications, rather than the number of bytes actually read by the system calls. The number of bytes requested for reads is actually not that useful, because the value is usually a buffer size for reads. That is, that requested number is usually a maximum, and frequently doesn't reflect the actual number of bytes read. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Chuck Lever authored
Nit: The VFSOPEN and VFSFLUSH counters are function call counters. Count every call to these routines. Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Andy Adamson authored
Signed-off-by:
Andy Adamson <andros@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Andy Adamson authored
Signed-off-by:
Andy Adamson <andros@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Andy Adamson authored
Return NFS4_OK if target high slotid equals enforced high slotid. Fix nfs_client reference leak. Signed-off-by:
Andy Adamson <andros@netapp.com> Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-