- 04 Nov, 2009 1 commit
-
-
Changli Gao authored
sendfile(2) was reworked on top of the splice infrastructure, but it still wrongly checks f_op.sendpage() instead of f_op.splice_write(). Currently f_op.splice_write() always exists whenever f_op.sendpage() does, but that assumption could silently break in the future. This patch also has a side effect: sendfile(2) can now work with any output file. Some security checks related to f_op are added too. Signed-off-by:
Changli Gao <xiaosuo@gmail.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
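As a rough userspace illustration of the side effect mentioned in this commit, the sketch below copies one regular file to another with sendfile(2). The function name copy_with_sendfile() is made up for this example, and it assumes a Linux kernel recent enough to accept a regular file as the output descriptor.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src_path to dst_path with sendfile(2).
 * Returns the number of bytes copied, or -1 on error.
 */
ssize_t copy_with_sendfile(const char *src_path, const char *dst_path)
{
    struct stat st;
    ssize_t total = -1;
    int in_fd, out_fd;

    in_fd = open(src_path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    /* The output is a plain regular file, not a socket. */
    out_fd = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    if (fstat(in_fd, &st) == 0) {
        off_t off = 0;

        total = 0;
        while (off < st.st_size) {
            /* sendfile() advances off by the number of bytes it moved. */
            ssize_t n = sendfile(out_fd, in_fd, &off, (size_t)(st.st_size - off));
            if (n < 0) {
                total = -1;
                break;
            }
            if (n == 0)
                break; /* source ended early, e.g. it was truncated */
            total += n;
        }
    }

    close(in_fd);
    close(out_fd);
    return total;
}
```

On kernels that still require a socket as the output, the sendfile() call would be expected to fail and the helper would return -1.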
-
- 14 Sep, 2009 1 commit
-
-
Jan Kara authored
Introduce a new function for generic inode syncing (vfs_fsync_range) and use it from the fsync() path. Also introduce a new helper for syncing after a sync write (generic_write_sync) that uses the generic function. Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns.
CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz>
-
- 11 Sep, 2009 1 commit
-
-
Miklos Szeredi authored
Splice should update the modification and access times on regular files just like read and write. Not updating mtime will confuse backup tools, etc. This patch only adds the time updates for regular files. For pipes and other special files that splice touches, the need for updating the times is less clear. Let's discuss and fix that separately. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 19 May, 2009 1 commit
-
-
Miklos Szeredi authored
Unfortunately, multiple kmap() calls within a single thread can deadlock, so writing out multiple buffers with writev() isn't possible. Change the implementation so that it does a separate write() for each buffer. This actually simplifies the code a lot, since the splice_from_pipe() helper can be used. This limitation is caused by HIGHMEM pages, and so only affects a subset of architectures and configurations. In the future it may be worth implementing default_file_splice_write() in a more efficient way on configs that allow it. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 14 May, 2009 1 commit
-
-
Andrew Morton authored
fs/splice.c: In function 'default_file_splice_read':
fs/splice.c:566: warning: 'error' may be used uninitialized in this function

which is sort-of true. The code will in fact return -ENOMEM instead of the kernel_readv() return value.
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 13 May, 2009 1 commit
-
-
Jens Axboe authored
We cannot reliably map more than one page at a time, or we risk deadlocking. Just allocate the pages from low mem instead. Reported-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 11 May, 2009 3 commits
-
-
Miklos Szeredi authored
If f_op->splice_write() is not implemented, fall back to a plain write. Use vfs_writev() to write from the pipe buffers. This will allow splice on all filesystems and file types. This includes "direct_io" files in fuse which bypass the page cache. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
If f_op->splice_read() is not implemented, fall back to a plain read. Use vfs_readv() to read into previously allocated pages. This will allow splice and functions using splice, such as the loop device, to work on all filesystems. This includes "direct_io" files in fuse which bypass the page cache. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
Allow splice(2) to work when both the input and the output are pipes.

Based on the implementation of the tee(2) syscall, but instead of duplicating the buffer references, move the buffers from the input pipe to the output pipe. Moving the whole buffer only succeeds if the full length of the buffer is spliced. Otherwise duplicate the buffer, just like tee(2), set the length of the output buffer and advance the offset on the input buffer.

Since splice is operating on two pipes, special care needs to be taken with locking to prevent an ABBA deadlock. Again this is done similarly to the tee(2) syscall: first prepare the input and output pipes so there's data to consume and space for that data, and then do the move operation while holding both locks.

If other processes are doing I/O on the same pipes in parallel with the splice, then by the time both inodes are locked there might be no buffers left to move, or no space to move them to. In this case retry the whole operation, including the preparation phase. This could lead to starvation, but I'm not sure if that's serious enough to worry about.
Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
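From userspace, the pipe-to-pipe case described in this commit might be exercised roughly as follows. This is only a sketch: the function name splice_pipe_to_pipe() is invented for the example, and the message is assumed to be small enough to fit in a pipe without blocking.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical demo: write msg into one pipe, splice it into a second
 * pipe, and read it back into out (capacity cap).
 * Returns the number of bytes read back, or -1 on error.
 * msg is assumed to be much smaller than the pipe capacity.
 */
ssize_t splice_pipe_to_pipe(const char *msg, char *out, size_t cap)
{
    int a[2], b[2];
    ssize_t got = -1;
    size_t len = strlen(msg);

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    if (write(a[1], msg, len) == (ssize_t)len) {
        /* Both descriptors are pipes, so both offsets must be NULL. */
        ssize_t moved = splice(a[0], NULL, b[1], NULL, len, SPLICE_F_MOVE);

        if (moved > 0) {
            size_t want = (size_t)moved < cap ? (size_t)moved : cap;
            got = read(b[0], out, want);
        }
    }

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return got;
}
```

On kernels without pipe-to-pipe support, the splice() call would be expected to fail and the helper would return -1.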
-
- 17 Apr, 2009 1 commit
-
-
Randy Dunlap authored
splice: fix kernel-doc warnings

Warning(fs/splice.c:617): bad line:
Warning(fs/splice.c:722): No description found for parameter 'sd'
Warning(fs/splice.c:722): Excess function parameter 'pipe' description in 'splice_from_pipe_begin'
Signed-off-by:
Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Apr, 2009 6 commits
-
-
Miklos Szeredi authored
There are lots of sequences like this, especially in splice code:

    if (pipe->inode)
        mutex_lock(&pipe->inode->i_mutex);
    /* do something */
    if (pipe->inode)
        mutex_unlock(&pipe->inode->i_mutex);

so introduce helpers which do the conditional locking and unlocking. Also replace the inode_double_lock() call with a pipe_double_lock() helper to avoid spreading the use of this functionality beyond the pipe code.

This patch is just a cleanup, and should cause no behavioral changes.
Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
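The conditional locking helpers this commit describes could look something like the userspace analogue below. It is a sketch only: the demo_ names and struct layouts are hypothetical, and a pthread mutex stands in for the kernel's i_mutex.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct demo_inode {
    pthread_mutex_t i_mutex;
};

struct demo_pipe {
    struct demo_inode *inode; /* may be NULL, as in the commit message */
};

/* Take the inode mutex only if the pipe has an inode attached. */
void demo_pipe_lock(struct demo_pipe *pipe)
{
    if (pipe->inode)
        pthread_mutex_lock(&pipe->inode->i_mutex);
}

/* Release the inode mutex only if the pipe has an inode attached. */
void demo_pipe_unlock(struct demo_pipe *pipe)
{
    if (pipe->inode)
        pthread_mutex_unlock(&pipe->inode->i_mutex);
}
```

Callers can then write demo_pipe_lock(pipe); /* do something */ demo_pipe_unlock(pipe); without repeating the NULL check at every site.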
-
Miklos Szeredi authored
Remove the now unused generic_file_splice_write_nolock() function. It's conceptually broken anyway, because splice may need to wait for pipe events so holding locks across the whole operation is wrong. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
Rearrange locking of i_mutex on destination and call to ocfs2_rw_lock() so locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
Rearrange locking of i_mutex on destination so it's only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
splice_from_pipe() is only called from two places:
 - generic_splice_sendpage()
 - splice_write_null()
Neither of these requires i_mutex to be taken on the destination inode. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
Split up __splice_from_pipe() into four helper functions:

    splice_from_pipe_begin()
    splice_from_pipe_next()
    splice_from_pipe_feed()
    splice_from_pipe_end()

splice_from_pipe_next() will wait (if necessary) for more buffers to be added to the pipe. splice_from_pipe_feed() will feed the buffers to the supplied actor and return when there's no more data available (or if all of the requested data has been copied).

This is necessary so that implementations can do locking around the non-waiting splice_from_pipe_feed().

This patch should not cause any change in behavior.
Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
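The begin/next/feed/end split can be pictured with a small userspace simulation. Everything below is hypothetical: the demo_ names, the struct layouts, and the actor signature are invented for illustration, buffers are modelled only by their lengths, and demo_next() never sleeps the way the kernel helper may.

```c
#include <stddef.h>

#define DEMO_MAX_BUFS 4

struct demo_pipe {
    size_t len[DEMO_MAX_BUFS]; /* bytes remaining in each buffer */
    int head;                  /* index of the first queued buffer */
    int nrbufs;                /* number of queued buffers */
};

struct demo_desc {
    size_t total_len;   /* bytes still requested */
    size_t num_spliced; /* bytes consumed so far */
};

/* Actor: consumes up to len bytes and returns how many it took. */
typedef size_t (*demo_actor)(size_t len);

size_t demo_take_all(size_t len) { return len; }

void demo_begin(struct demo_desc *sd, size_t want)
{
    sd->total_len = want;
    sd->num_spliced = 0;
}

/* Returns 1 if there is data to feed; the kernel helper may wait here. */
int demo_next(struct demo_pipe *pipe, struct demo_desc *sd)
{
    return pipe->nrbufs > 0 && sd->total_len > 0;
}

/* Non-waiting step: returns 1 if more data is still wanted, else 0. */
int demo_feed(struct demo_pipe *pipe, struct demo_desc *sd, demo_actor actor)
{
    while (pipe->nrbufs > 0 && sd->total_len > 0) {
        size_t *buf = &pipe->len[pipe->head];
        size_t want = *buf < sd->total_len ? *buf : sd->total_len;
        size_t took = actor(want);

        if (took == 0)
            return 0;
        if (took > want)
            took = want; /* guard against a misbehaving actor */
        *buf -= took;
        sd->total_len -= took;
        sd->num_spliced += took;
        if (*buf == 0) {
            pipe->head = (pipe->head + 1) % DEMO_MAX_BUFS;
            pipe->nrbufs--;
        }
    }
    return sd->total_len > 0;
}

size_t demo_end(struct demo_desc *sd) { return sd->num_spliced; }

size_t demo_splice_from_pipe(struct demo_pipe *pipe, size_t want, demo_actor actor)
{
    struct demo_desc sd;

    demo_begin(&sd, want);
    while (demo_next(pipe, &sd)) {
        /* A caller needing a lock would take it here ... */
        int more = demo_feed(pipe, &sd, actor);
        /* ... and drop it here, before the next potential wait. */
        if (!more)
            break;
    }
    return demo_end(&sd);
}
```

The point of the shape is that any lock is held only across the feed step, never across the step that may wait for more data.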
-
- 07 Apr, 2009 1 commit
-
-
Miklos Szeredi authored
There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write():
 - task A calls generic_file_splice_write()
 - this calls inode_double_lock(), which locks i_mutex on both pipe->inode and target inode
 - ordering depends on inode pointers, can happen that pipe->inode is locked first
 - __splice_from_pipe() needs more data, calls pipe_wait()
 - this releases lock on pipe->inode, goes to interruptible sleep
 - task B calls generic_file_splice_write(), similarly to the first
 - this locks pipe->inode, then tries to lock inode, but that is already held by task A
 - task A is interrupted, it tries to lock pipe->inode, but fails, as it is already held by task B
 - ABBA deadlock

Fix this by explicitly ordering locks: the outer lock must be on target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK, pipe inodes and target inodes form two nonoverlapping sets, generic_file_splice_write() and friends are not called with a target which is a pipe.
Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Acked-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Jens Axboe <jens.axboe@oracle.com> Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
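The ordering rule from this fix can be sketched in userspace as follows. The names are hypothetical, pthread mutexes stand in for i_mutex, and the wait is reduced to dropping and retaking the inner lock; the kernel would actually sleep at that point.

```c
#include <pthread.h>

/* Hypothetical stand-in for an inode with its i_mutex. */
struct demo_inode {
    pthread_mutex_t i_mutex;
};

/*
 * Always lock the target inode first (outer) and the pipe inode second
 * (inner). Only the inner lock is ever dropped and retaken while waiting,
 * so every task acquires the two locks in the same order.
 * Returns the number of simulated waits performed.
 */
int demo_splice_write(struct demo_inode *target,
                      struct demo_inode *pipe_inode, int waits)
{
    int i;

    pthread_mutex_lock(&target->i_mutex);     /* outer lock: target inode */
    pthread_mutex_lock(&pipe_inode->i_mutex); /* inner lock: pipe inode */

    for (i = 0; i < waits; i++) {
        /* Stand-in for pipe_wait(): release only the inner lock ... */
        pthread_mutex_unlock(&pipe_inode->i_mutex);
        /* ... the kernel would sleep here until the pipe has data ... */
        pthread_mutex_lock(&pipe_inode->i_mutex);
        /* ... and retake it while the outer lock is still held. */
    }

    pthread_mutex_unlock(&pipe_inode->i_mutex);
    pthread_mutex_unlock(&target->i_mutex);
    return i;
}
```

Because no task ever holds the pipe lock while trying to take the target lock, the A-then-B versus B-then-A interleaving from the list above cannot occur in this scheme.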
-
- 03 Apr, 2009 1 commit
-
-
David Howells authored
Recruit a page flag to aid in cache management. The following extra flag is defined:

 (1) PG_fscache (PG_private_2)

     The marked page is backed by a local cache and is pinning resources in the cache driver.

If PG_fscache is set, then things that checked for PG_private will now also check for that. This includes things like truncation and page invalidation. The function page_has_private() has been added to make the checks for both PG_private and PG_private_2 at the same time.
Signed-off-by:
David Howells <dhowells@redhat.com> Acked-by:
Steve Dickson <steved@redhat.com> Acked-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by:
Rik van Riel <riel@redhat.com> Acked-by:
Al Viro <viro@zeniv.linux.org.uk> Tested-by:
Daire Byrne <Daire.Byrne@framestore.com>
-
- 14 Jan, 2009 1 commit
-
-
Heiko Carstens authored
Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com>
-
- 08 Jan, 2009 1 commit
-
-
KAMEZAWA Hiroyuki authored
A big patch for changing memcg's LRU semantics.

Now,
 - page_cgroup is linked to mem_cgroup's own LRU (per zone).
 - LRU of page_cgroup is not synchronous with global LRU.
 - page and page_cgroup are one-to-one and statically allocated.
 - To find which LRU a page_cgroup is on, you have to check pc->mem_cgroup as
   - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);
 - SwapCache is handled.

And, when we handle the LRU list of page_cgroup, we do the following.

    pc = lookup_page_cgroup(page);
    lock_page_cgroup(pc); .....................(1)
    mz = page_cgroup_zoneinfo(pc);
    spin_lock(&mz->lru_lock);
    .....add to LRU
    spin_unlock(&mz->lru_lock);
    unlock_page_cgroup(pc);

But (1) is a spin_lock and we have to be afraid of deadlock with zone->lru_lock. So, trylock() is used at (1), now. Without (1), we can't trust that "mz" is correct.

This is an attempt to remove this dirty nesting of locks. This patch changes mz->lru_lock to be zone->lru_lock. Then, the above sequence will be written as

    spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
    mem_cgroup_add/remove/etc_lru() {
        pc = lookup_page_cgroup(page);
        mz = page_cgroup_zoneinfo(pc);
        if (PageCgroupUsed(pc)) {
            ....add to LRU
        }
    }
    spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

This is much simpler.

(*) We're safe even if we don't take lock_page_cgroup(pc). Because..
 1. When pc->mem_cgroup can be modified.
    - at charge.
    - at account_move().
 2. at charge the PCG_USED bit is not set before pc->mem_cgroup is fixed.
 3. at account_move() the page is isolated and not on LRU.

Pros.
 - easy for maintenance.
 - memcg can make use of laziness of pagevec.
 - we don't have to duplicate LRU/Active/Unevictable bit in page_cgroup.
 - LRU status of memcg will be synchronized with global LRU's one.
 - the number of locks is reduced.
 - account_move() is simplified very much.

Cons.
 - may increase cost of LRU rotation. (no impact if memcg is not configured.)

Signed-off-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 Oct, 2008 1 commit
-
-
Nick Piggin authored
Nothing uses prepare_write or commit_write. Remove them from the tree completely. [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting] Signed-off-by:
Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 Oct, 2008 1 commit
-
-
Linus Torvalds authored
This is debatable, but while we're debating it, let's disallow the combination of splice and an O_APPEND destination. It's not entirely clear what the semantics of O_APPEND should be, and POSIX apparently expects pwrite() to ignore O_APPEND, for example. So we could make up any semantics we want, including the old ones. But Miklos convinced me that we should at least give it some thought, and that accepting writes at arbitrary offsets is wrong at least for IS_APPEND() files (which always have O_APPEND set, even if the reverse isn't true: you can obviously have O_APPEND set on a regular file). So disallow O_APPEND entirely for now. I doubt anybody cares, and this way we have one less gray area to worry about. Reported-and-argued-for-by:
Miklos Szeredi <miklos@szeredi.hu> Acked-by:
Jens Axboe <ens.axboe@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
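The restriction introduced here can be probed from userspace along these lines. This is a sketch with a made-up helper name, and it assumes a kernel that rejects O_APPEND destinations, in which case the splice(2) man page documents EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe: try to splice one byte from a pipe into a file
 * opened with O_APPEND. Returns the errno reported by splice(), 0 if the
 * call unexpectedly succeeded, or -1 if the setup itself failed.
 */
int splice_into_append_errno(const char *path)
{
    int p[2];
    int err = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

    if (fd < 0)
        return -1;
    if (pipe(p) < 0) {
        close(fd);
        return -1;
    }

    if (write(p[1], "x", 1) == 1) {
        ssize_t n = splice(p[0], NULL, fd, NULL, 1, 0);

        err = (n < 0) ? errno : 0;
    }

    close(p[0]);
    close(p[1]);
    close(fd);
    return err;
}
```

Opening the same file without O_APPEND and passing an explicit offset is the usual way to splice to a chosen position instead.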
-
- 05 Aug, 2008 1 commit
-
-
Nick Piggin authored
Converting page lock to new locking bitops requires a change of page flag operation naming, so we might as well convert it to something nicer (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked). This also facilitates lockdeping of page lock. Signed-off-by:
Nick Piggin <npiggin@suse.de> Acked-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by:
Peter Zijlstra <peterz@infradead.org> Acked-by:
Andrew Morton <akpm@linux-foundation.org> Acked-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 Jul, 2008 1 commit
-
-
Miklos Szeredi authored
All calls to remove_suid() are made with a file pointer, because (similarly to file_update_time) it is called when the file is written. Clean up callers by passing in a file instead of a dentry. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz>
-
- 26 Jul, 2008 1 commit
-
-
Nick Piggin authored
Use get_user_pages_fast in splice. This reverts some mmap_sem batching there, however the biggest problem with mmap_sem tends to be hold times blocking out other threads rather than cacheline bouncing. Further: on architectures that implement get_user_pages_fast without locks, mmap_sem can be avoided completely anyway. Signed-off-by:
Nick Piggin <npiggin@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Jul, 2008 1 commit
-
-
Miklos Szeredi authored
If a page was invalidated during splicing from file to a pipe, then generic_file_splice_read() could return a short or zero count. This manifested itself in rare I/O errors seen on nfs exported fuse filesystems. This is because nfsd uses splice_direct_to_actor() to read files, and fuse uses invalidate_inode_pages2() to invalidate stale data on open. Fix by redoing the page find/create if it was found to be truncated (invalidated). Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 28 May, 2008 2 commits
-
-
Jens Axboe authored
splice currently assumes that try_to_release_page() always succeeds, but it can return failure. If it does, we cannot steal the page. Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
Tom Zanussi authored
Splice isn't always incrementing the ppos correctly, which broke relay splice. Signed-off-by:
Tom Zanussi <zanussi@comcast.net> Tested-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 08 May, 2008 1 commit
-
-
Jens Axboe authored
This reverts commit c3270e57.
-
- 07 May, 2008 1 commit
-
-
Miklos Szeredi authored
generic_file_splice_write() duplicates remove_suid() just because it doesn't hold i_mutex. But it grabs i_mutex inside splice_from_pipe() anyway, so this is rather pointless. Move locking to generic_file_splice_write() and call remove_suid() and __splice_from_pipe() instead. Signed-off-by:
Miklos Szeredi <mszeredi@suse.cz> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 29 Apr, 2008 1 commit
-
-
Tom Zanussi authored
Splice isn't always incrementing the ppos correctly, which broke relay splice. Signed-off-by:
Tom Zanussi <zanussi@comcast.net> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 10 Apr, 2008 1 commit
-
-
Jens Axboe authored
There's a quirky loop in generic_file_splice_read() that could go on indefinitely, if the file splice returns 0 permanently (and not just as a temporary condition). Get rid of the loop and pass back -EAGAIN correctly from __generic_file_splice_read(), so we handle that condition properly as well. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 03 Apr, 2008 1 commit
-
-
Hugh Dickins authored
The loop block driver is careful to mask __GFP_IO|__GFP_FS out of its mapping_gfp_mask, to avoid hangs under memory pressure. But nowadays it uses splice, usually going through __generic_file_splice_read. That must use mapping_gfp_mask instead of GFP_KERNEL to avoid those hangs. Signed-off-by:
Hugh Dickins <hugh@veritas.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Mar, 2008 1 commit
-
-
Jens Axboe authored
sys_tee() is currently a bit eager in returning -EAGAIN: it may do so even if there is no chance of any more data becoming available. So improve the logic and only return -EAGAIN if we have an attached writer to the input pipe. Reported by Johann Felix Soden <johfel@gmx.de> and Patrick McManus <mcmanus@ducksong.com>. Tested-by:
Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
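The behaviour described in this commit can be observed from userspace with a non-blocking tee(2) on an empty pipe. The helper below is a hypothetical sketch, and the results noted after it assume a kernel that includes this change.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe: call tee(2) with SPLICE_F_NONBLOCK on an empty
 * input pipe. If keep_writer is zero, the write end of the input pipe is
 * closed first, so no more data can ever arrive.
 * Returns errno if tee() failed, its return value (0 here) otherwise,
 * or -1 if the setup itself failed.
 */
int tee_on_empty_pipe(int keep_writer)
{
    int in[2], out[2];
    int result;
    ssize_t n;

    if (pipe(in) < 0)
        return -1;
    if (pipe(out) < 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }

    if (!keep_writer) {
        close(in[1]);
        in[1] = -1;
    }

    n = tee(in[0], out[1], 4096, SPLICE_F_NONBLOCK);
    result = (n < 0) ? errno : (int)n;

    close(in[0]);
    if (in[1] >= 0)
        close(in[1]);
    close(out[0]);
    close(out[1]);
    return result;
}
```

With a writer still attached, the expected result is EAGAIN, since data may still arrive. With no writer, the expected result is 0, which signals end of input.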
-
- 10 Feb, 2008 1 commit
-
-
Bastian Blank authored
Commit 8811930d ("splice: missing user pointer access verification") added the proper access_ok() calls to copy_from_user_mmap_sem() which ensures we can copy the struct iovecs from userspace to the kernel. But we also must check whether we can access the actual memory region pointed to by the struct iovec to fix the access checks properly. Signed-off-by:
Bastian Blank <waldi@debian.org> Acked-by:
Oliver Pinter <oliver.pntr@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 08 Feb, 2008 1 commit
-
-
Jens Axboe authored
vmsplice_to_user() must always check the user pointer and length with access_ok() before copying. Likewise, for the slow path of copy_from_user_mmap_sem() we need to check that we may read from the user region. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Cc: Wojciech Purczynski <cliph@research.coseinc.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Feb, 2008 1 commit
-
-
Jens Axboe authored
Andre Majorel <aym-xunil@teaser.fr> points out that if we only updated the atime when we transfer some data, we deviate from the standard of always updating the atime. So change splice to always call file_accessed() even if splice_direct_to_actor() didn't transfer any data. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 29 Jan, 2008 1 commit
-
-
Jens Axboe authored
A bug report on nfsd states that since it was switched to use splice instead of sendfile, the atime is no longer being updated on the input file. do_generic_mapping_read() does this when accessing the file; make splice do it for the direct splice handler. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com>
-
- 28 Jan, 2008 1 commit
-
-
Jens Axboe authored
Allow the caller to pass in a release function, since there might be other resources that need releasing as well. Needed for network receive. Signed-off-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 25 Jan, 2008 1 commit
-
-
James Morris authored
All instances of rw_verify_area() are followed by a call to security_file_permission(), so just call the latter from the former. Acked-by:
Eric Paris <eparis@redhat.com> Signed-off-by:
James Morris <jmorris@namei.org>
-