Commits · ece0b4233b6b915d1f63add2bd9f2733aec6317a · Bricked / flo

22 Nov, 2010 7 commits

NFS: Don't ignore errors from nfs_do_filldir() · ece0b423

Trond Myklebust authored 14 years ago


We should ignore the errors from the filldir callback, and just interpret
them as meaning we should exit, however we should definitely pass back
ENOMEM errors.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ece0b423

NFS: Fix the error handling in "uncached_readdir()" · 85f8607e

Trond Myklebust authored 14 years ago


Currently, uncached_readdir() is broken because if fails to handle
the results from nfs_readdir_xdr_to_array() correctly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

85f8607e

NFS: Fix a page leak in uncached_readdir() · 7a8e1dc3
Trond Myklebust authored 14 years ago
```
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
```
7a8e1dc3

NFS: Fix a page leak in nfs_do_filldir() · e7c58e97

Trond Myklebust authored 14 years ago


nfs_do_filldir() must always free desc->page when it is done, otherwise
we end up leaking the page.

Also remove unused variable 'dentry'.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e7c58e97

NFS: Assume eof if the server returns no readdir records · 5c346854

Trond Myklebust authored 14 years ago


Some servers are known to be buggy w.r.t. this. Deal with them...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

5c346854

NFS: Buffer overflow in ->decode_dirent() should not be fatal · 463a376e

Trond Myklebust authored 14 years ago

Overflowing the buffer in the readdir ->decode_dirent() should not lead to
a fatal error, but rather to an attempt to reread the record in question.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

463a376e

Pure nfs client performance using odirect. · b47d19de

Arun Bharadwaj authored 14 years ago


When an application opens a file with O_DIRECT flag, if the size of
the data that is written is equal to wsize, the client sends a
WRITE RPC with stable flag set to UNSTABLE followed by a single
COMMIT RPC rather than sending a single WRITE RPC with the stable
flag set to FILE_SYNC. This a bug.

Patch to fix this.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b47d19de

20 Nov, 2010 2 commits

ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard · e681c047

Lukas Czerner authored 14 years ago


Filesystem independent ioctl was rejected as not common enough to be in
core vfs ioctl. Since we still need to access to this functionality this
commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch
ext4_trim_fs().

It takes fstrim_range structure as an argument. fstrim_range is definec in
the include/linux/fs.h and its definition is as follows.

struct fstrim_range {
	__u64 start;
	__u64 len;
	__u64 minlen;
}

start	- first Byte to trim
len	- number of Bytes to trim from start
minlen	- minimum extent length to trim, free extents shorter than this
  number of Bytes will be ignored. This will be rounded up to fs
  block size.

After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e681c047

fs: Do not dispatch FITRIM through separate super_operation · 93bb41f4

Lukas Czerner authored 14 years ago


There was concern that FITRIM ioctl is not common enough to be included
in core vfs ioctl, as Christoph Hellwig pointed out there's no real point
in dispatching this out to a separate vector instead of just through
->ioctl.

So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs
from super_operation structure.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

93bb41f4

19 Nov, 2010 1 commit

ext4: ext4_fill_super shouldn't return 0 on corruption · 5a9ae68a

Darrick J. Wong authored 14 years ago

At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path
out of that function returns ret.  However, the generic_check_addressable
clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g.
a group checksum error) returns 0 even though the mount should fail.  This
causes vfs_kern_mount in turn to think that the mount succeeded, leading to an
oops.

A simple fix is to avoid using ret for the generic_check_addressable check,
which was last changed in commit 30ca22c7

.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5a9ae68a

18 Nov, 2010 4 commits

ceph: fix readdir EOVERFLOW on 32-bit archs · 3105c19c

Sage Weil authored 14 years ago


One of the readdir filldir_t callers was passing the raw ceph 64-bit ino
instead of the hashed 32-bit one, producing an EOVERFLOW in the filler
callback.  Fix this by calling the ceph_vino_to_ino() helper to do the
conversion.
Reported-by: Jan Smets <jan.smets@alcatel-lucent.com>
Tested-by: Jan Smets <jan.smets@alcatel-lucent.com>
Signed-off-by: Sage Weil <sage@newdream.net>

3105c19c

jbd2: fix /proc/fs/jbd2/<dev> when using an external journal · 0587aa3d

yangsheng authored 14 years ago


In jbd2_journal_init_dev(), we need make sure the journal structure is
fully initialzied before calling jbd2_stats_proc_init().
Reviewed-by: Andreas Dilger <andreas.dilger@oracle.com>
Signed-off-by: yangsheng <sheng.yang@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0587aa3d

ext4: missing unlock in ext4_clear_request_list() · f4c8cc65

Dan Carpenter authored 14 years ago


If the the li_request_list was empty then it returned with the lock
held.  Instead of adding a "goto unlock" I just removed that special
case and let it go past the empty list_for_each_safe().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f4c8cc65

ext4: fix setting random pages PageUptodate · 08da1193

Markus Trippelsdorf authored 14 years ago


ext4_end_bio calls put_page and kmem_cache_free before calling
SetPageUpdate(). This can result in setting the PageUptodate bit on
random pages and causes the following BUG:

 BUG: Bad page state in process rm  pfn:52e54
 page:ffffea0001222260 count:0 mapcount:0 mapping:          (null) index:0x0
 arch kernel: page flags: 0x4000000000000008(uptodate)

Fix the problem by moving put_io_page() after the SetPageUpdate() call.

Thanks to Hugh Dickins for analyzing this problem.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

08da1193

17 Nov, 2010 2 commits

BKL: remove references to lock_kernel from comments · 460781b5

Arnd Bergmann authored 14 years ago


Lock_kernel is gone from the code, so the comments should be updated,
too.  nfsd now uses lock_flocks instead of lock_kernel to protect
against posix file locks.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

460781b5

BKL: remove extraneous #include <smp_lock.h> · 451a3c24

Arnd Bergmann authored 14 years ago


The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

451a3c24

16 Nov, 2010 6 commits

nfs: Ignore kmemleak false positive in nfs_readdir_make_qstr · 04e4bd1c

Catalin Marinas authored 14 years ago


Strings allocated via kmemdup() in nfs_readdir_make_qstr() are
referenced from the nfs_cache_array which is stored in a page cache
page. Kmemleak does not scan such pages and it reports several false
positives. This patch annotates the string->name pointer so that
kmemleak does not consider it a real leak.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

04e4bd1c

NFS: readdir shouldn't read beyond the reply returned by the server · ac396128
Trond Myklebust authored 14 years ago
```
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
```
ac396128

NFS: Fix a couple of regressions in readdir. · 8cd51a0c

Trond Myklebust authored 14 years ago


Fix up the issue that array->eof_index needs to be able to be set
even if array->size == 0.

Ensure that we catch all important memory allocation error conditions
and/or kmap() failures.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8cd51a0c

Revert "NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR" · 23ebbd9a

Trond Myklebust authored 14 years ago

This reverts commit 80e60639

.

This change requires further fixes to ensure that the open doesn't
succeed if the lookup later results in a regular file being created.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

23ebbd9a

Regression: fix mounting NFS when NFSv3 support is not compiled · 1e657bd5

Paulius Zaleckas authored 14 years ago


Trying to mount NFS (root partition in my case) fails if CONFIG_NFS_V3
is not selected. nfs_validate_mount_data() returns EPROTONOSUPPORT,
because of this check:

#ifndef CONFIG_NFS_V3
	if (args->version == 3)
		goto out_v3_not_compiled;
#endif /* !CONFIG_NFS_V3 */

and args->version was always initialized to 3.

It was working in 2.6.36
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1e657bd5

NLM: Fix a regression in lockd · 8e35f8e7

Trond Myklebust authored 14 years ago

Nick Bowler reports:
There are no unusual messages on the client... but I just logged into
the server and I see lots of messages of the following form:

  nfsd: request from insecure port (192.168.8.199:35766)!
  nfsd: request from insecure port (192.168.8.199:35766)!
  nfsd: request from insecure port (192.168.8.199:35766)!
  nfsd: request from insecure port (192.168.8.199:35766)!
  nfsd: request from insecure port (192.168.8.199:35766)!

Bisected to commit 92476850

 (SUNRPC:
Properly initialize sock_xprt.srcaddr in all cases)

Apparently, removing the 'transport->srcaddr.ss_family = family' from
xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly
initialising the srcaddr family to AF_UNSPEC.
Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8e35f8e7

15 Nov, 2010 1 commit

GFS2: Fix inode deallocation race · 044b9414

Steven Whitehouse authored 14 years ago


This area of the code has always been a bit delicate due to the
subtleties of lock ordering. The problem is that for "normal"
alloc/dealloc, we always grab the inode locks first and the rgrp lock
later.

In order to ensure no races in looking up the unlinked, but still
allocated inodes, we need to hold the rgrp lock when we do the lookup,
which means that we can't take the inode glock.

The solution is to borrow the technique already used by NFS to solve
what is essentially the same problem (given an inode number, look up
the inode carefully, checking that it really is in the expected
state).

We cannot do that directly from the allocation code (lock ordering
again) so we give the job to the pre-existing delete workqueue and
carry on with the allocation as normal.

If we find there is no space, we do a journal flush (required anyway
if space from a deallocation is to be released) which should block
against the pending deallocations, so we should always get the space
back.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

044b9414

13 Nov, 2010 1 commit

ocfs2: Change some lock status member in ocfs2_lock_res to char. · 1c66b360

Tao Ma authored 14 years ago

Commit 83fd9c7f

 changes l_level, l_requested and l_blocking of
ocfs2_lock_res from int to unsigned char. But actually it is
initially as -1(ocfs2_lock_res_init_common) which
correspoding to 255 for unsigned char. So the whole dlm lock
mechanism doesn't work now which means a disaster to ocfs2.

Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

1c66b360

12 Nov, 2010 2 commits

hugetlbfs: lessen the impact of a deprecation warning · 52ca0e84

Dave Jones authored 14 years ago


WARN_ONCE is a bit strong for a deprecation warning, given that it spews a
huge backtrace.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

52ca0e84

ceph: fix frag offset for non-leftmost frags · 7b88dadc

Sage Weil authored 14 years ago


We start at offset 2 for the leftmost frag, and 0 for subsequent frags.
When we reach the end (rightmost), we go back to 2.  This fixes readdir on
fragmented (large) directories.
Signed-off-by: Sage Weil <sage@newdream.net>

7b88dadc

11 Nov, 2010 1 commit

ceph: fix dangling pointer · a1629c3b

Sage Weil authored 14 years ago


Clear fi->last_name when it's freed.  The only caller is rewinddir() (or
equivalent lseek).
Signed-off-by: Sage Weil <sage@newdream.net>

a1629c3b

10 Nov, 2010 13 commits

xfs: remove incorrect assert in xfs_vm_writepage · ece413f5

Christoph Hellwig authored 14 years ago

In commit 20cb52eb

, titled
"xfs: simplify xfs_vm_writepage" I added an assert that any !mapped and
uptodate buffers are not dirty.  That asserts turns out to trigger a lot
when running fsx on filesystems with small block sizes.  The reason for
that is that the assert is simply incorrect.  !mapped and uptodate
just mean this buffer covers a hole, and whenever we do a set_page_dirty
we mark all blocks in the page dirty, no matter if they have data or
not.  So remove the assert, and update the comment above the condition
to match reality.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

ece413f5

locks: remove dead lease error-handling code · 8896b93f

J. Bruce Fields authored 14 years ago

A minor oversight from f7347ce4

,
"fasync: re-organize fasync entry insertion to allow it under a
spinlock": this cleanup-on-error was only needed to handle -ENOMEM.  Now
that we're preallocating it's unneeded.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8896b93f

locks: fix leak on merging leases · 3df057ac

J. Bruce Fields authored 14 years ago


We must also free the passed-in lease in the case it wasn't used because
an existing lease was upgrade/downgraded or already existed.

Note the nfsd caller doesn't care because it's fl_change callback
returns an error in those cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3df057ac

xfs: use hlist_add_fake · c6f6cd06

Christoph Hellwig authored 14 years ago


XFS does not need it's inodes to actuall be hashed in the VFS inode
cache, but we require the inode to be marked hashed for the
writeback code to work.

Insted of using insert_inode_hash, which requires a second
inode_lock roundtrip after the partial merge of the inode
scalability patches in 2.6.37-rc simply use the new hlist_add_fake
helper to mark it hashed without requiring a lock or touching a
global cache line.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

c6f6cd06

xfs: fix a few compiler warnings with CONFIG_XFS_QUOTA=n · 5d2bf8a5

Christoph Hellwig authored 14 years ago


Andi Kleen reported that gcc-4.5 gives lots of warnings for him
inside the XFS code.  It turned out most of them are due to the
quota stubs beeing macros, and gcc now complaining about macros
evaluating to 0 that are not assigned to variables.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

5d2bf8a5

xfs: tell lockdep about parent iolock usage in filestreams · 785ce418

Christoph Hellwig authored 14 years ago


The filestreams code may take the iolock on the parent inode while
holding it on a child.  This is the only place in XFS where we take
both the child and parent iolock, so just telling lockdep about it
is enough.  The lock flag required for that was already added as
part of the ilock lockdep annotations and unused so far.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

785ce418

xfs: move delayed write buffer trace · bfe27419

Dave Chinner authored 14 years ago


The delayed write buffer split trace currently issues a trace for
every buffer it scans. These buffers are not necessarily queued for
delayed write. Indeed, when buffers are pinned, there can be
thousands of traces of buffers that aren't actually queued for
delayed write and the ones that are are lost in the noise. Move the
trace point to record only buffers that are split out for IO to be
issued on.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

bfe27419

xfs: fix per-ag reference counting in inode reclaim tree walking · f83282a8

Dave Chinner authored 14 years ago


The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

f83282a8

xfs: xfs_ioctl: fix information leak to userland · 6762b938

Kulikov Vasiliy authored 14 years ago

al_hreq is copied from userland. If al_hreq.buflen is not properly aligned
then xfs_attr_list will ignore the last bytes of kbuf. These bytes are
unitialized. It leads to leaking of contents of kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

6762b938

xfs: remove experimental tag from the delaylog option · 5d0af85c

Christoph Hellwig authored 14 years ago


We promised to do this for 2.6.37, and the code looks stable enough to
keep that promise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

5d0af85c

ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2) · f85acd81

Sergey Senozhatsky authored 14 years ago

Commit 4221a991

 "Add RCU check for
find_task_by_vpid()" introduced rcu_lockdep_assert to find_task_by_pid_ns=

Assertion failed in sys_ioprio_get. The patch is fixing assertion
failure in ioprio_set as well.

 kernel/pid.c:419 invoked rcu_dereference_check() without protection!

 stack backtrace:
 Pid: 4254, comm: iotop Not tainted
 Call Trace:
 [<ffffffff810656f2>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff81053c67>] find_task_by_pid_ns+0x4f/0x68
 [<ffffffff81053c9d>] find_task_by_vpid+0x1d/0x1f
 [<ffffffff811104e2>] sys_ioprio_get+0x50/0x2da
 [<ffffffff81002182>] system_call_fastpath+0x16/0x1b

V2: rcu critical section expanded according to comment by Paul E. McKenney
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

f85acd81

ioprio: fix RCU locking around task dereference · 1447399b

Daniel J Blueman authored 14 years ago


With 2.6.37-rc1, I observe sys_ioprio_set not taking the RCU lock [1]
across access to the task credentials.

Inspecting the code in fs/ioprio.c, the tasklist_lock is held for read
across the __task_cred call, which is presumably sufficient to prevent
the task credentials becoming stale.

===================================================

[ INFO: suspicious rcu_dereference_check() usage. ]

---------------------------------------------------

kernel/pid.c:419 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1

1 lock held by start-stop-daem/2246:

 #0:  (tasklist_lock){.?.?..}, at: [<ffffffff811a2dfa>]
sys_ioprio_set+0x8a/0x400

stack backtrace:

Pid: 2246, comm: start-stop-daem Not tainted 2.6.37-rc1-330cd+ #2

Call Trace:

 [<ffffffff8109f5f4>] lockdep_rcu_dereference+0xa4/0xc0

 [<ffffffff81085651>] find_task_by_pid_ns+0x81/0x90

 [<ffffffff8108567d>] find_task_by_vpid+0x1d/0x20

 [<ffffffff811a3160>] sys_ioprio_set+0x3f0/0x400

 [<ffffffff816efa79>] ? trace_hardirqs_on_thunk+0x3a/0x3f

 [<ffffffff81003482>] system_call_fastpath+0x16/0x1b

Take the RCU lock for read across acquiring the pointer to the task
credentials and dereferencing it.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>

Fixed up by Jens to fix missing rcu_read_unlock() on mismatches.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

1447399b

bio: take care not overflow page count when mapping/copying user data · cb4644ca

Jens Axboe authored 14 years ago


If the iovec is being set up in a way that causes uaddr + PAGE_SIZE
to overflow, we could end up attempting to map a huge number of
pages. Check for this invalid input type.
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

cb4644ca