Commits · 6e87ed0fc93ffbe2aec296e6912b1dcb19034d6c · Bricked / flo

21 Nov, 2011 1 commit

GFS2: move toward a generic multi-block allocator · 6e87ed0f

Bob Peterson authored 13 years ago


This patch is a revision of the one I previously posted.
I tried to integrate all the suggestions Steve gave.
The purpose of the patch is to change function gfs2_alloc_block
(allocate either a dinode block or an extent of data blocks)
to a more generic gfs2_alloc_blocks function that can
allocate both a dinode _and_ an extent of data blocks in the
same call. This will ultimately help us create a multi-block
reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that
takes a pointer to the number of data blocks to allocate, plus whether
or not to allocate a dinode. In theory, it could be called to allocate
(1) a single dinode block, (2) a group of one or more data blocks, or
(3) a dinode plus several data blocks.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
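
As a rough illustration of the interface described above, the following C sketch shows a toy allocator that takes a pointer to the requested number of data blocks plus a dinode flag. All names here (`toy_rgrp`, `toy_alloc_blocks`, the one-byte-per-block state array) are hypothetical and far simpler than the real GFS2 bitmap code; the sketch only demonstrates the three call shapes listed in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block states; this is not the GFS2 on-disk encoding. */
enum toy_blk_state { TOY_FREE = 0, TOY_DATA = 1, TOY_DINODE = 2 };

/* Toy resource group: one state byte per block. */
struct toy_rgrp {
    uint8_t *state;
    size_t nblocks;
};

/*
 * Allocate an optional dinode block followed by up to *ndata contiguous
 * data blocks, searching from "goal" and wrapping around once.
 * On success, returns 0, stores the first allocated block in *first and
 * updates *ndata to the number of data blocks actually obtained (which
 * may be fewer than requested). Returns -1 if nothing could be allocated.
 */
int toy_alloc_blocks(struct toy_rgrp *rg, size_t goal, size_t *first,
                     unsigned int *ndata, bool dinode)
{
    size_t i, blk = 0;
    unsigned int got = 0;

    if (rg->nblocks == 0 || (!dinode && *ndata == 0))
        return -1;

    /* Find the first free block at or after the goal. */
    for (i = 0; i < rg->nblocks; i++) {
        blk = (goal % rg->nblocks + i) % rg->nblocks;
        if (rg->state[blk] == TOY_FREE)
            break;
    }
    if (i == rg->nblocks)
        return -1;

    *first = blk;
    if (dinode)
        rg->state[blk++] = TOY_DINODE;

    /* Extend the extent while the following blocks are free. */
    while (got < *ndata && blk < rg->nblocks &&
           rg->state[blk] == TOY_FREE) {
        rg->state[blk++] = TOY_DATA;
        got++;
    }
    *ndata = got;
    return 0;
}
```

Passing `dinode = true` with `*ndata = 0` gives case (1), `dinode = false` with a non-zero count gives case (2), and both together give case (3). The caller is expected to check the updated `*ndata`, since the extent may come back shorter than requested.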

15 Nov, 2011 1 commit

GFS2: combine gfs2_alloc_block and gfs2_alloc_di · 3c5d785a

Bob Peterson authored 13 years ago


GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
the same things, with a few exceptions. This patch combines
the two functions into a slightly more generic gfs2_alloc_block.
Having one centralized block allocation function will reduce
code redundancy and make it easier to implement multi-block
reservations to reduce file fragmentation in the future.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

08 Nov, 2011 1 commit

GFS2: More automated code analysis fixes · 87654896

Steven Whitehouse authored 13 years ago


A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

21 Oct, 2011 5 commits

GFS2: Cache the most recently used resource group in the inode · 54335b1f

Steven Whitehouse authored 13 years ago


This means that after the initial allocation for any inode, the
last used resource group is cached in the inode for future use.
This drastically reduces the number of lookups of resource
groups in the common case, and thus the contention on that
data structure.

The allocation algorithm is the same as previously, except that we
always check to see if the goal block is within the cached rgrp
first before going to the rbtree to look one up.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
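
The lookup order described above can be sketched as follows. This is a simplified model, not the GFS2 code: the names are hypothetical, and a binary search over a sorted array stands in for the rbtree lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource group covering blocks [start, start + len). */
struct toy_rgrp {
    uint64_t start;
    uint64_t len;
};

/* Hypothetical in-core inode holding the last used resource group. */
struct toy_inode {
    const struct toy_rgrp *cached_rgrp;
};

/* Counts how often the slow (shared structure) lookup is taken. */
unsigned long toy_slow_lookups;

static bool toy_rgrp_contains(const struct toy_rgrp *rg, uint64_t blk)
{
    return rg && blk >= rg->start && blk - rg->start < rg->len;
}

/* Slow path: binary search standing in for the rbtree lookup. */
static const struct toy_rgrp *toy_blk2rgrp(const struct toy_rgrp *tbl,
                                           size_t n, uint64_t blk)
{
    size_t lo = 0, hi = n;

    toy_slow_lookups++;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (blk < tbl[mid].start)
            hi = mid;
        else if (blk - tbl[mid].start >= tbl[mid].len)
            lo = mid + 1;
        else
            return &tbl[mid];
    }
    return NULL;
}

/* Check the cached rgrp first; fall back to the shared lookup on a miss. */
const struct toy_rgrp *toy_find_rgrp(struct toy_inode *ip,
                                     const struct toy_rgrp *tbl,
                                     size_t n, uint64_t goal)
{
    const struct toy_rgrp *rg;

    if (toy_rgrp_contains(ip->cached_rgrp, goal))
        return ip->cached_rgrp;

    rg = toy_blk2rgrp(tbl, n, goal);
    if (rg)
        ip->cached_rgrp = rg;
    return rg;
}
```

As long as successive goal blocks stay within the same resource group, only the first call touches the shared structure.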

GFS2: Make resource groups "append only" during life of fs · 8339ee54

Steven Whitehouse authored 13 years ago


Since we have ruled out supporting online filesystem shrink,
it is possible to make the resource group list append only
during the life of a super block. This gives several benefits:

Firstly, we only need to read new rindex elements as they are added
rather than needing to reread the whole rindex file each time one
element is added.

Secondly, the rindex glock can be held for much shorter periods of
time, and is completely removed from the fast path for allocations.
The lock is taken in shared mode only when updating the resource
groups when the first allocation occurs, and after a grow has
taken place.

Thirdly, this results in a reduction in code size, and everything
gets a lot simpler to understand in this area.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
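
The first benefit can be illustrated with a small sketch: because the list only ever grows, an update needs to read just the entries beyond what is already held in memory. The structure and function names are hypothetical, and a plain array stands in for the rindex file.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RGRPS 64

/* Hypothetical in-core, append-only list of resource group addresses. */
struct toy_rgrp_list {
    uint64_t addr[TOY_MAX_RGRPS];
    size_t count;
};

/*
 * Read only the rindex entries that were appended since the last update.
 * "rindex" stands in for the on-disk rindex file, with rindex_count
 * entries. Returns the number of newly read entries.
 */
size_t toy_rindex_update(struct toy_rgrp_list *rl, const uint64_t *rindex,
                         size_t rindex_count)
{
    size_t added = 0;

    while (rl->count < rindex_count && rl->count < TOY_MAX_RGRPS) {
        rl->addr[rl->count] = rindex[rl->count];
        rl->count++;
        added++;
    }
    return added;
}
```

Calling the function again without a grow in between reads nothing, which is why the update can stay off the allocation fast path.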

GFS2: Clean up gfs2_create · 9a63edd1

Steven Whitehouse authored 13 years ago


If we pass through knowledge of whether the creation is intended to be
exclusive or not, then we can deal with that in gfs2_create_inode
and remove one set of locking. Also this removes the loop in
gfs2_create and simplifies the code a bit.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
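
A minimal sketch of the idea, assuming a toy directory held in memory: when the creation function itself knows whether the request is exclusive, the lookup and the exclusivity decision happen in one place and the caller no longer needs to retry. The names below are hypothetical and do not reflect the signature of gfs2_create_inode.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_DIR_MAX 16

/* Hypothetical directory: a flat list of entry names. */
struct toy_dir {
    const char *names[TOY_DIR_MAX];
    size_t count;
};

/*
 * Create "name" in "dir". If the entry already exists, an exclusive
 * request fails with -EEXIST, while a non-exclusive request returns
 * the existing entry. Returns the entry index or a negative errno.
 */
int toy_create_inode(struct toy_dir *dir, const char *name, bool excl)
{
    size_t i;

    for (i = 0; i < dir->count; i++) {
        if (strcmp(dir->names[i], name) == 0)
            return excl ? -EEXIST : (int)i;
    }
    if (dir->count == TOY_DIR_MAX)
        return -ENOSPC;

    dir->names[dir->count] = name;
    return (int)dir->count++;
}
```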

GFS2: Use ->dirty_inode() · ab9bbda0

Steven Whitehouse authored 13 years ago


The aim of this patch is to use the newly enhanced ->dirty_inode()
super block operation to deal with atime updates, rather than
piggy backing that code into ->write_inode() as is currently
done.

The net result is a simplification of the code in various places
and a reduction of the number of gfs2_dinode_out() calls since
this is now implied by ->dirty_inode().

Some of the mark_inode_dirty() calls have been moved under glocks
in order to take advantage of then being able to avoid locking in
->dirty_inode() when we already have suitable locks.

One consequence is that generic_write_end() now correctly deals
with file size updates, so that we do not need a separate check
for that afterwards. This also, indirectly, means that fdatasync
should work correctly on GFS2 - the current code always syncs the
metadata whether it needs to or not.

Has survived testing with postmark (with and without atime) and
also fsx.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix inode allocation error path · 40ac218f

Steven Whitehouse authored 13 years ago


If we have got far enough through the inode allocation code
path that an inode has already been allocated, then we must
call iput to dispose of it, if an error occurs during a
later part of the process. This will always be the final iput
since there will be no other references to the inode.

Unlike when the inode has been unlinked, its block state will
be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need
to skip the test in ->evict_inode() for this one case in order
to ensure that it will be deallocated correctly. This patch adds
a new flag in order to ensure that this will happen correctly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

25 Jul, 2011 1 commit

fs: take the ACL checks to common code · 4e34e719

Christoph Hellwig authored 13 years ago

Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss. This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21 Jul, 2011 1 commit

simplify gfs2_lookup() · 6c673ab3

Al Viro authored 13 years ago


d_splice_alias() will DTRT when given NULL or ERR_PTR
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20 Jul, 2011 3 commits

->permission() sanitizing: don't pass flags to ->permission() · 10556cb2

Al Viro authored 13 years ago


not used by the instances anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

->permission() sanitizing: don't pass flags to generic_permission() · 2830ba7f

Al Viro authored 13 years ago


redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill check_acl callback of generic_permission() · 178ea735

Al Viro authored 13 years ago


its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18 Jul, 2011 1 commit

security: new security_inode_init_security API adds function callback · 9d8f13ba

Mimi Zohar authored 13 years ago

This patch changes the security_inode_init_security API by adding a
filesystem specific callback to write security extended attributes.
This change is in preparation for supporting the initialization of
multiple LSM xattrs and the EVM xattr. Initially the callback function
walks an array of xattrs, writing each xattr separately, but could be
optimized to write multiple xattrs at once.

For existing security_inode_init_security() calls, which have not yet
been converted to use the new callback function, such as those in
reiserfs and ocfs2, this patch defines security_old_inode_init_security().
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>

13 May, 2011 3 commits

GFS2: Move all locking inside the inode creation function · f2741d98

Steven Whitehouse authored 13 years ago


Now that there are no longer any exceptions to the normal inode
creation code path, we can move the parts of the locking code
which were duplicated in mkdir/mknod/create/symlink into the
inode create function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up symlink creation · 160b4026

Steven Whitehouse authored 13 years ago


This moves the symlink specific parts of inode creation
into the function where we initialise the rest of the
dinode. As a result we have one less place where we need
to look up the inode's buffer.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up mkdir · e2d0a13b

Steven Whitehouse authored 13 years ago


This moves the initialisation of the directory into the inode
creation functions to avoid having to duplicate the lookup
of the inode's buffer.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

10 May, 2011 1 commit

GFS2: Rename ops_inode.c to inode.c · 2ab9cd1c

Steven Whitehouse authored 13 years ago


This is the final part of the ops_inode.c/inode.c reordering. We
are left with a single file called inode.c which now contains
all the inode operations, as expected.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

09 May, 2011 5 commits

GFS2: Move most of the remaining inode.c into ops_inode.c · 194c011f

Steven Whitehouse authored 13 years ago


This is in preparation to remove inode.c and rename ops_inode.c
to inode.c. Also most of the functions which were left in inode.c
relate to the creation and lookup of inodes. I'm intending to work
on consolidating some of that code, and it's easier when it's all in
one place.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove gfs2_dinode_print() function · 94fb763b

Steven Whitehouse authored 13 years ago

This function was intended for debugging purposes, but it is not very
useful. If we want to know what is on disk then all we need is a
block number and gfs2_edit can give us much better information about
what is there. Otherwise, if we are interested in what is stored in
the in-core inode, it doesn't help us out there either.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: When adding a new dir entry, inc link count if it is a subdir · 3d6ecb7d

Steven Whitehouse authored 13 years ago


This adds an increment of the link count when we add a new directory
entry, if that entry is itself a directory. This means that we no
longer need separate code to perform this operation.

Now that both adding and removing directory entries automatically
update the parent directory's link count if required, the code is
shorter and simpler than before.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
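
The behaviour described above can be modelled in a few lines. This sketch uses a hypothetical `toy_parent` structure and helper names; it only shows that the parent's link count moves together with the directory entry when the entry is itself a directory (each subdirectory's ".." counts as a link to the parent).

```c
#include <stdbool.h>

/* Hypothetical parent directory state. */
struct toy_parent {
    unsigned int nlink;    /* link count of the parent directory */
    unsigned int entries;  /* number of directory entries */
};

/* Add an entry; a subdirectory also raises the parent's link count. */
void toy_dir_add(struct toy_parent *dir, bool child_is_dir)
{
    dir->entries++;
    if (child_is_dir)
        dir->nlink++;
}

/* Remove an entry; a subdirectory also lowers the parent's link count. */
int toy_dir_del(struct toy_parent *dir, bool child_is_dir)
{
    if (dir->entries == 0)
        return -1;
    dir->entries--;
    if (child_is_dir && dir->nlink > 0)
        dir->nlink--;
    return 0;
}
```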

GFS2: Make gfs2_dir_del update link count when required · 855d23ce

Steven Whitehouse authored 13 years ago

When we remove an entry from a directory, we can save ourselves
some trouble if we know the type of the entry in question, since
if it is itself a directory, we can update the link count of the
parent at the same time as removing the directory entry.

In addition this patch also merges the rmdir and unlink code which
was almost identical anyway. This eliminates the calls to remove
the . and .. directory entries on each rmdir (not needed since the
directory will be deallocated, anyway) which was the only thing preventing
passing the dentry to gfs2_dir_del(). The passing of the dentry
rather than just the name allows us to figure out the type of the entry
which is being removed, and thus adjust the link count when required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Don't use gfs2_change_nlink in link syscall · 2baee03f

Steven Whitehouse authored 13 years ago


There are three users of gfs2_change_nlink which add to the link
count. Two of these are about to be removed in later patches, so
once that happens there will be no callers left, allowing removal
of that function, also in a later patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

05 May, 2011 1 commit

GFS2: Double check link count under glock · d192a8e5

Steven Whitehouse authored 13 years ago


To avoid any possible races relating to the link count, we need to
recheck it under the inode's glock in all cases where it matters.
Also, so that we never get any nasty surprises, this patch
ensures that once the link count has hit zero it can never be
elevated by rereading in data from disk.

The only place we cannot provide a proper solution is in rename
in the case where we are removing a target inode and we discover
that the target inode has been already unlinked on another node.
The race window is very small, and we return EAGAIN in this case
to indicate what has happened. The proper solution would be to move
the lookup parts of rename from the vfs into library calls which
the fs could call directly, but that is potentially a very big job
and this fix should cover most cases for now.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
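
The two rules in this message can be sketched as follows, assuming both helpers are called with the inode's lock already held. The names are hypothetical and the sketch leaves out everything else the real code does when refreshing an inode from disk.

```c
#include <errno.h>

/* Hypothetical in-core inode with only the field of interest. */
struct toy_inode {
    unsigned int nlink;
};

/*
 * Refresh the in-core link count from the on-disk value.
 * Once the in-core count has hit zero it is never raised again.
 */
void toy_set_nlink(struct toy_inode *ip, unsigned int disk_nlink)
{
    if (ip->nlink == 0)
        return;
    ip->nlink = disk_nlink;
}

/*
 * Rename case: if the target turns out to be already unlinked
 * (for example by another node), report -EAGAIN to the caller.
 */
int toy_rename_check_target(const struct toy_inode *target)
{
    return target->nlink == 0 ? -EAGAIN : 0;
}
```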

21 Jan, 2011 1 commit

GFS2: Post-VFS scale update for RCU path walk · 75d5cfbe

Steven Whitehouse authored 14 years ago


We can allow a few more cases to use RCU path walking than
originally allowed. It should be possible to also enable
RCU path walking when the glock is already cached. That's
a bit more complicated though, so left for a future patch.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@gmail.com>

17 Jan, 2011 2 commits

fallocate should be a file operation · 2fe17c10

Christoph Hellwig authored 14 years ago

Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes. On the
other hand always committing the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions. Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and removes the already unneeded S_ISDIR checks given that we only wire
up fallocate for regular files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make the feature checks in ->fallocate future proof · 64c23e86

Christoph Hellwig authored 14 years ago


Instead of various home grown checks that might need updates for new
flags just check for any bit outside the mask of the features supported
by the filesystem.  This makes the check future proof for any newly
added flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
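
The check described above amounts to a single mask test. In this sketch the flag names and the supported set are hypothetical stand-ins chosen for illustration; a real filesystem would use the kernel's own fallocate flag definitions.

```c
#include <errno.h>

/* Hypothetical flag values used only for this example. */
#define TOY_FALLOC_KEEP_SIZE  0x01
#define TOY_FALLOC_PUNCH_HOLE 0x02

/* The set of flags this toy filesystem supports. */
#define TOY_FALLOC_SUPPORTED  TOY_FALLOC_KEEP_SIZE

/*
 * Reject any bit outside the supported mask, so that flags added in
 * the future are refused without touching this check again.
 */
int toy_fallocate_check(int mode)
{
    if (mode & ~TOY_FALLOC_SUPPORTED)
        return -EOPNOTSUPP;
    return 0;
}
```

With this form, a filesystem that cannot punch holes refuses `TOY_FALLOC_PUNCH_HOLE` and any unknown flag in the same way, and supporting a new flag means adding it to the mask.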

13 Jan, 2011 2 commits

Gfs2: fail if we try to use hole punch · 9ecf639a

Josef Bacik authored 14 years ago


Gfs2 doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate.  This support can
be added later.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch gfs2, close races · 41ced6dc

Al Viro authored 14 years ago


Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

07 Jan, 2011 2 commits

fs: provide rcu-walk aware permission i_ops · b74c79e9

Nick Piggin authored 14 years ago


Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: dcache reduce branches in lookup path · fb045adb

Nick Piggin authored 14 years ago

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

30 Nov, 2010 2 commits

GFS2: Clean up duplicated setattr code · 2ae51ed7

Steven Whitehouse authored 14 years ago

While preparing the last patch I noticed that the gfs2_setattr_simple
code had been duplicated into two other places. This patch updates
those to call gfs2_setattr_simple rather than open coding it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

2ae51ed7

GFS2: Remove unreachable calls to vmtruncate · 9e55cd53

Steven Whitehouse authored 14 years ago

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

26 Oct, 2010 2 commits

new helper: ihold() · 7de9c6ee

Al Viro authored 14 years ago


Clones an existing reference to inode; caller must already hold one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
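
The contract stated above (the caller must already hold a reference) can be shown with a toy reference count. This is a single-threaded sketch with hypothetical names; it uses a plain integer and `assert` where kernel code would use an atomic counter and its own warning machinery.

```c
#include <assert.h>

/* Hypothetical inode with a plain, non-atomic reference count. */
struct toy_inode {
    int count;
};

/* Clone an existing reference; the caller must already hold one. */
void toy_ihold(struct toy_inode *ino)
{
    assert(ino->count > 0);
    ino->count++;
}

/* Drop a reference; returns 1 when the last reference went away. */
int toy_iput(struct toy_inode *ino)
{
    assert(ino->count > 0);
    return --ino->count == 0;
}
```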

fs: kill block_prepare_write · ebdec241

Christoph Hellwig authored 14 years ago


__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions.  Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

30 Sep, 2010 1 commit

GFS2 fatal: filesystem consistency error on rename · 46290341

Bob Peterson authored 14 years ago


This patch fixes a GFS2 problem whereby the first rename after a
mount can result in a file system consistency error being flagged
improperly and cause the file system to withdraw.  The problem is
that the rename code tries to run the rgrp list with function
gfs2_blk2rgrpd before the rgrp list is guaranteed to be read in
from disk.  The patch makes the rename function hold the rindex
glock (as the gfs2_unlink code does today) which reads in the rgrp
list if need be.  There were a total of three places in the rename
code that improperly referenced the rgrp list without the rindex
glock and this patch fixes all three.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

28 Sep, 2010 1 commit

GFS2: reserve more blocks for transactions · bf97b673

Benjamin Marzinski authored 14 years ago

Some of the functions in GFS2 were not reserving space in the transaction for
the resource group header and the resource groups bitblocks that get added
when you do allocation. GFS2 now makes sure to reserve space for the
resource group header and either all the bitblocks in the resource group, or
one for each block that it may allocate, whichever is smaller using the new
gfs2_rg_blocks() inline function.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
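
The reservation rule reads as simple arithmetic: one block for the resource group header, plus the smaller of the number of bitmap blocks and the number of blocks that may be allocated. The sketch below follows that wording only; the function name and parameters are hypothetical and it is not claimed to match the gfs2_rg_blocks() helper line for line.

```c
/*
 * Number of journal blocks to reserve for resource group metadata:
 * the header, plus either every bitmap block in the resource group or
 * one bitmap block per block that may be allocated, whichever is smaller.
 */
unsigned int toy_rg_blocks(unsigned int bitblocks, unsigned int requested)
{
    unsigned int bitmap = requested < bitblocks ? requested : bitblocks;

    return 1 + bitmap;
}
```

For a resource group with 10 bitmap blocks, a request for 3 blocks reserves 4 blocks, while a request for 50 blocks is capped at 11.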

20 Sep, 2010 3 commits

GFS2: Make . and .. qstrs constant · 8d123585

Steven Whitehouse authored 14 years ago


Rather than calculating the qstrs for . and .. each time
we need them, it's better to keep a constant version of
these and just refer to them when required.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>

GFS2: Fix whitespace in previous patch · fe08d5a8

Steven Whitehouse authored 14 years ago


Removes the offending space
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: fallocate support · 3921120e

Benjamin Marzinski authored 14 years ago

This patch adds support for fallocate to gfs2. Since gfs2 does not support
uninitialized data blocks, it must write out zeros to all the blocks. However,
since it does not need to lock any pages to read from, gfs2 can write out the
zero blocks much more efficiently. On a moderately full filesystem, fallocate
works around 5 times faster on average. The fallocate call also allows gfs2 to
add blocks to the file without changing the filesize, which will make it
possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
grow a completely full filesystem.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
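
The two behaviours mentioned above (writing zeros into newly allocated blocks, and optionally leaving the file size unchanged) can be modelled on an in-memory toy file. The structure, block size and function name are hypothetical; the sketch says nothing about how the real code builds its transactions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLKSZ 16u
#define TOY_NBLK  8u

/* Hypothetical file: fixed block array plus a per-block allocated flag. */
struct toy_file {
    uint8_t data[TOY_NBLK * TOY_BLKSZ];
    bool allocated[TOY_NBLK];
    uint64_t size;
};

/*
 * Preallocate [offset, offset + len). Newly allocated blocks are zeroed,
 * since uninitialized blocks are assumed to be unsupported. With
 * keep_size set, the file size is left untouched.
 */
int toy_fallocate(struct toy_file *f, uint64_t offset, uint64_t len,
                  bool keep_size)
{
    uint64_t end = offset + len;
    uint64_t blk;

    if (len == 0 || end < offset || end > (uint64_t)TOY_NBLK * TOY_BLKSZ)
        return -1;

    for (blk = offset / TOY_BLKSZ; blk <= (end - 1) / TOY_BLKSZ; blk++) {
        if (!f->allocated[blk]) {
            memset(&f->data[blk * TOY_BLKSZ], 0, TOY_BLKSZ);
            f->allocated[blk] = true;
        }
    }
    if (!keep_size && end > f->size)
        f->size = end;
    return 0;
}
```

The `keep_size` case is the one the message relies on for preallocating space ahead of the current end of a file.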
