- 07 Jan, 2011 6 commits
-
-
Nick Piggin authored
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Use a seqlock in the fs_struct to enable us to take an atomic copy of the complete cwd and root paths. Use this in the RCU lookup path to avoid a thread-shared spinlock in RCU lookup operations. Multi-threaded apps may now perform path lookups with scalability matching multi-process apps. Operations such as stat(2) become very scalable for multi-threaded workload. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
dcache_lock no longer protects anything. remove it. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
Nick Piggin authored
Change d_hash so it may be called from lock-free RCU lookups. See similar patch for d_compare for details. For in-tree filesystems, this is just a mechanical change. Signed-off-by:
Nick Piggin <npiggin@kernel.dk>
-
- 07 Dec, 2010 1 commit
-
-
Lino Sanfilippo authored
Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm() is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission checks, so a user can still disable permission checks by setting this flag in an open() call. This patch corrects this by unsetting the flag before fsnotify_perm is called. Signed-off-by:
Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by:
Eric Paris <eparis@redhat.com>
-
- 29 Oct, 2010 1 commit
-
-
Al Viro authored
nameidata_to_filp() drops nd->path or transfers it to opened file. In the former case it's a Bad Idea(tm) to do mnt_drop_write() on nd->path.mnt, since we might race with umount and vfsmount in question might be gone already. Fix: don't drop it, then... IOW, have nameidata_to_filp() grab nd->path in case it transfers it to file and do path_drop() in callers. After they are through with accessing nd->path... Reported-by:
Miklos Szeredi <miklos@szeredi.hu> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 26 Oct, 2010 2 commits
-
-
Al Viro authored
Clones an existing reference to inode; caller must already hold one. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
The caller that didn't need it is gone. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 18 Aug, 2010 4 commits
-
-
Nick Piggin authored
fs: brlock vfsmount_lock Use a brlock for the vfsmount lock. It must be taken for write whenever modifying the mount hash or associated fields, and may be taken for read when performing mount hash lookups. A new lock is added for the mnt-id allocator, so it doesn't need to take the heavy vfsmount write-lock. The number of atomics should remain the same for fastpath rlock cases, though code would be slightly slower due to per-cpu access. Scalability is not not be much improved in common cases yet, due to other locks (ie. dcache_lock) getting in the way. However path lookups crossing mountpoints should be one case where scalability is improved (currently requiring the global lock). The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node Altix system (high latency to remote nodes), a simple umount microbenchmark (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it took 6.8s, afterwards took 7.1s, about 5% slower. Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by:
Nick Piggin <npiggin@kernel.dk> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: remove extra lookup in __lookup_hash Optimize lookup for create operations, where no dentry should often be common-case. In cases where it is not, such as unlink, the added overhead is much smaller than the removed. Also, move comments about __d_lookup racyness to the __d_lookup call site. d_lookup is intuitive; __d_lookup is what needs commenting. So in that same vein, add kerneldoc comments to __d_lookup and clean up some of the comments: - We are interested in how the RCU lookup works here, particularly with renames. Make that explicit, and point to the document where it is explained in more detail. - RCU is pretty standard now, and macros make implementations pretty mindless. If we want to know about RCU barrier details, we look in RCU code. - Delete some boring legacy comments because we don't care much about how the code used to work, more about the interesting parts of how it works now. So comments about lazy LRU may be interesting, but would better be done in the LRU or refcount management code. Signed-off-by:
Nick Piggin <npiggin@kernel.dk> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: dentry allocation consolidation There are 2 duplicate copies of code in dentry allocation in path lookup. Consolidate them into a single function. Signed-off-by:
Nick Piggin <npiggin@kernel.dk> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Nick Piggin authored
fs: fix do_lookup false negative In do_lookup, if we initially find no dentry, we take the directory i_mutex and re-check the lookup. If we find a dentry there, then we revalidate it if needed. However if that revalidate asks for the dentry to be invalidated, we return -ENOENT from do_lookup. What should happen instead is an attempt to allocate and lookup a new dentry. This is probably not noticed because it is rare. It is only reached if a concurrent create races in first (in which case, the dentry probably won't be invalidated anyway), or if the racy __d_lookup has failed due to a false-negative (which is very rare). Fix this by removing code and have it use the normal reval path. Signed-off-by:
Nick Piggin <npiggin@kernel.dk> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 11 Aug, 2010 1 commit
-
-
Miklos Szeredi authored
Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 02 Aug, 2010 2 commits
-
-
Eric Paris authored
SELinux needs to pass the MAY_ACCESS flag so it can handle auditting correctly. Presently the masking of MAY_* flags is done in the VFS. In order to allow LSMs to decide what flags they care about and what flags they don't just pass them all and the each LSM mask off what they don't need. This patch should contain no functional changes to either the VFS or any LSM. Signed-off-by:
Eric Paris <eparis@redhat.com> Acked-by:
Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by:
James Morris <jmorris@namei.org>
-
Tetsuo Handa authored
When commit be6d3e56 "introduce new LSM hooks where vfsmount is available." was proposed, regarding security_path_truncate(), only "struct file *" argument (which AppArmor wanted to use) was removed. But length and time_attrs arguments are not used by TOMOYO nor AppArmor. Thus, let's remove these arguments. Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by:
Nick Piggin <npiggin@suse.de> Signed-off-by:
James Morris <jmorris@namei.org>
-
- 28 Jul, 2010 1 commit
-
-
Eric Paris authored
fsnotify was using char * when it passed around the d_name.name string internally but it is actually an unsigned char *. This patch switches fsnotify to use unsigned and should silence some pointer signess warnings which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify code as there are still issues with kstrdup and strlen which would pop out needless warnings. Signed-off-by:
Eric Paris <eparis@redhat.com>
-
- 28 May, 2010 1 commit
-
-
Neil Brown authored
Commit 1f36f774 broke FS_REVAL_DOT semantics. In particular, before this patch, the command ls -l in an NFS mounted directory would always check if the directory on the server had changed and if so would flush and refill the pagecache for the dir. After this patch, the same "ls -l" will repeatedly return stale date until the cached attributes for the directory time out. The following patch fixes this by ensuring the d_revalidate is called by do_last when "." is being looked-up. link_path_walk has already called d_revalidate, but in that case LOOKUP_OPEN is not set so nfs_lookup_verify_inode chooses not to do any validation. The following patch restores the original behaviour. Cc: stable@kernel.org Signed-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 21 May, 2010 1 commit
-
-
Huang Shijie authored
update the mnt of the path when it is not equal to the new one. Signed-off-by:
Huang Shijie <shijie8@gmail.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 15 May, 2010 1 commit
-
-
Al Viro authored
1) i_flags simply doesn't work for mount/unlink race prevention; we may have many links to file and rm on one of those obviously shouldn't prevent bind on top of another later on. To fix it right way we need to mark _dentry_ as unsuitable for mounting upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and i_mutex on the inode in question. Set it (with dont_mount(dentry)) in unlink/rmdir/etc., check (with cant_mount(dentry)) in places in namespace.c that used to check for S_DEAD. Setting S_DEAD is still needed in places where we used to set it (for directories getting killed), since we rely on it for readdir/rmdir race prevention. 2) rename()/mount() protection has another bogosity - we unhash the target before we'd checked that it's not a mountpoint. Fixed. 3) ancient bogosity in pivot_root() - we locked i_mutex on the right directory, but checked S_DEAD on the different (and wrong) one. Noticed and fixed. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 13 May, 2010 1 commit
-
-
Jan Kara authored
According to specification mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY) should return success but currently it returns ELOOP. This is a regression caused by path lookup cleanup patch series. Fix the code to ignore O_NOFOLLOW in case the provided path has trailing slashes. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by:
Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 26 Mar, 2010 1 commit
-
-
Al Viro authored
Lose want_dir argument, while we are at it - since now nd->flags & LOOKUP_DIRECTORY is equivalent to it. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 06 Mar, 2010 1 commit
-
-
Al Viro authored
We managed to lose O_DIRECTORY testing due to a stupid typo in commit 1f36f774 ("Switch !O_CREAT case to use of do_last()") Reported-by:
Walter Sheets <w41ter@gmail.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Mar, 2010 16 commits
-
-
Al Viro authored
... and now we have all intents crap well localized Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Now that nd->last stays around until ->put_link() is called, we can just postpone that ->put_link() in do_filp_open() a bit and don't bother with copying. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Don't bother with path_walk() (and its retry loop); link_path_walk() will do it. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
We set it to 1 iff we return NULL Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Note that in case of !O_CREAT we know that nd.root has already been given up Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Nothing else uses it anymore Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Handling of LAST_DOT/LAST_ROOT/LAST_DOTDOT/terminating slash can be pulled in as well Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
If we'd passed through 32 trailing symlinks already, there's no sense following the 33rd - we'll bail out anyway. Better bugger off earlier. It *does* change behaviour, after a fashion - if the 33rd happens to be a procfs-style symlink, original code *would* allow it. This one will not. Cry me a river if that hurts you. Please, do. And post a video of that, while you are at it. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Since do_last() doesn't mangle nd->last_name, we can safely postpone __putname() done in handling of trailing symlinks until after the call of do_last() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-