1. 12 Jul, 2013 1 commit
    • Colin Cross's avatar
      select: use freezable blocking call · 89e26f82
      Colin Cross authored
      
      Avoid waking up every thread sleeping in a select call during
      suspend and resume by calling a freezable blocking call.  Previous
      patches modified the freezer to avoid sending wakeups to threads
      that are blocked in freezable blocking calls.
      
      This call was selected to be converted to a freezable call because
      it doesn't hold any locks or release any resources when interrupted
      that might be needed by another freezing task or a kernel driver
      during suspend, and is a common site where idle userspace tasks are
      blocked.
      
      Change-Id: I0d7565ec0b6bc5d44cb55f958589c56e6bd16348
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarColin Cross <ccross@android.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      89e26f82
  2. 23 Mar, 2012 1 commit
    • Hans Verkuil's avatar
      poll: add poll_requested_events() and poll_does_not_wait() functions · 626cf236
      Hans Verkuil authored
      
      In some cases the poll() implementation in a driver has to do different
      things depending on the events the caller wants to poll for.  An example
      is when a driver needs to start a DMA engine if the caller polls for
      POLLIN, but doesn't want to do that if POLLIN is not requested but instead
      only POLLOUT or POLLPRI is requested.  This is something that can happen
      in the video4linux subsystem among others.
      
      Unfortunately, the current epoll/poll/select implementation doesn't
      provide that information reliably.  The poll_table_struct does have it: it
      has a key field with the event mask.  But once a poll() call matches one
      or more bits of that mask any following poll() calls are passed a NULL
      poll_table pointer.
      
      Also, the eventpoll implementation always left the key field at ~0 instead
      of using the requested events mask.
      
      This was changed in eventpoll.c so the key field now contains the actual
      events that should be polled for as set by the caller.
      
      The solution to the NULL poll_table pointer is to set the qproc field to
      NULL in poll_table once poll() matches the events, not the poll_table
      pointer itself.  That way drivers can obtain the mask through a new
      poll_requested_events inline.
      
      The poll_table_struct can still be NULL since some kernel code calls it
      internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
      that case poll_requested_events() returns ~0 (i.e.  all events).
      
      Very rarely drivers might want to know whether poll_wait will actually
      wait.  If another earlier file descriptor in the set already matched the
      events the caller wanted to wait for, then the kernel will return from the
      select() call without waiting.  This might be useful information in order
      to avoid doing expensive work.
      
      A new helper function poll_does_not_wait() is added that drivers can use
      to detect this situation.  This is now used in sock_poll_wait() in
      include/net/sock.h.  This was the only place in the kernel that needed
      this information.
      
      Drivers should no longer access any of the poll_table internals, but use
      the poll_requested_events() and poll_does_not_wait() access functions
      instead.  In order to enforce that the poll_table fields are now prepended
      with an underscore and a comment was added warning against using them
      directly.
      
      This required a change in unix_dgram_poll() in unix/af_unix.c which used
      the key field to get the requested events.  It's been replaced by a call
      to poll_requested_events().
      
      For qproc it was especially important to change its name since the
      behavior of that field changes with this patch since this function pointer
      can now be NULL when that wasn't possible in the past.
      
      Any driver accessing the qproc or key fields directly will now fail to compile.
      
      Some notes regarding the correctness of this patch: the driver's poll()
      function is called with a 'struct poll_table_struct *wait' argument.  This
      pointer may or may not be NULL, drivers can never rely on it being one or
      the other as that depends on whether or not an earlier file descriptor in
      the select()'s fdset matched the requested events.
      
      There are only three things a driver can do with the wait argument:
      
      1) obtain the key field:
      
      	events = wait ? wait->key : ~0;
      
         This will still work although it should be replaced with the new
         poll_requested_events() function (which does exactly the same).
         This will now even work better, since wait is no longer set to NULL
         unnecessarily.
      
      2) use the qproc callback. This could be deadly since qproc can now be
         NULL. Renaming qproc should prevent this from happening. There are no
         kernel drivers that actually access this callback directly, BTW.
      
      3) test whether wait == NULL to determine whether poll would return without
         waiting. This is no longer sufficient as the correct test is now
         wait == NULL || wait->_qproc == NULL.
      
         However, the worst that can happen here is a slight performance hit in
         the case where wait != NULL and wait->_qproc == NULL. In that case the
         driver will assume that poll_wait() will actually add the fd to the set
         of waiting file descriptors. Of course, poll_wait() will not do that
         since it tests for wait->_qproc. This will not break anything, though.
      
         There is only one place in the whole kernel where this happens
         (sock_poll_wait() in include/net/sock.h) and that code will be replaced
         by a call to poll_does_not_wait() in the next patch.
      
         Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
         actually waiting. The next file descriptor from the set might match the
         event mask and thus any possible waits will never happen.
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: default avatarJonathan Corbet <corbet@lwn.net>
      Reviewed-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      626cf236
  3. 29 Feb, 2012 1 commit
  4. 22 Feb, 2012 1 commit
    • Linus Torvalds's avatar
      sys_poll: fix incorrect type for 'timeout' parameter · faf30900
      Linus Torvalds authored
      
      The 'poll()' system call timeout parameter is supposed to be 'int', not
      'long'.
      
      Now, the reason this matters is that right now 32-bit compat mode is
      broken on at least x86-64, because the 32-bit code just calls
      'sys_poll()' directly on x86-64, and the 32-bit argument will have been
      zero-extended, turning a signed 'int' into a large unsigned 'long'
      value.
      
      We could just introduce a 'compat_sys_poll()' function for this, and
      that may eventually be what we have to do, but since the actual standard
      poll() semantics is *supposed* to be 'int', and since at least on x86-64
      glibc sign-extends the argument before invocing the system call (so
      nobody can actually use a 64-bit timeout value in user space _anyway_,
      even in 64-bit binaries), the simpler solution would seem to be to just
      fix the definition of the system call to match what it should have been
      from the very start.
      
      If it turns out that somebody somehow circumvents the user-level libc
      64-bit sign extension and actually uses a large unsigned 64-bit timeout
      despite that not being how poll() is supposed to work, we will need to
      do the compat_sys_poll() approach.
      Reported-by: default avatarThomas Meyer <thomas@m3y3r.de>
      Acked-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      faf30900
  5. 19 Feb, 2012 1 commit
    • David Howells's avatar
      Replace the fd_sets in struct fdtable with an array of unsigned longs · 1fd36adc
      David Howells authored
      
      Replace the fd_sets in struct fdtable with an array of unsigned longs and then
      use the standard non-atomic bit operations rather than the FD_* macros.
      
      This:
      
       (1) Removes the abuses of struct fd_set:
      
           (a) Since we don't want to allocate a full fd_set the vast majority of the
           	 time, we actually, in effect, just allocate a just-big-enough array of
           	 unsigned longs and cast it to an fd_set type - so why bother with the
           	 fd_set at all?
      
           (b) Some places outside of the core fdtable handling code (such as
           	 SELinux) want to look inside the array of unsigned longs hidden inside
           	 the fd_set struct for more efficient iteration over the entire set.
      
       (2) Eliminates the use of FD_*() macros in the kernel completely.
      
       (3) Permits the __FD_*() macros to be deleted entirely where not exposed to
           userspace.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
      
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      1fd36adc
  6. 21 Mar, 2011 1 commit
  7. 13 Jan, 2011 1 commit
  8. 28 Oct, 2010 2 commits
  9. 12 Mar, 2010 1 commit
  10. 06 Mar, 2010 1 commit
  11. 04 Oct, 2009 1 commit
  12. 23 Sep, 2009 1 commit
  13. 16 Aug, 2009 1 commit
  14. 17 Jun, 2009 1 commit
  15. 14 Jan, 2009 3 commits
  16. 13 Jan, 2009 1 commit
  17. 06 Jan, 2009 1 commit
    • Tejun Heo's avatar
      poll: allow f_op->poll to sleep · 5f820f64
      Tejun Heo authored
      
      f_op->poll is the only vfs operation which is not allowed to sleep.  It's
      because poll and select implementation used task state to synchronize
      against wake ups, which doesn't have to be the case anymore as wait/wake
      interface can now use custom wake up functions.  The non-sleep restriction
      can be a bit tricky because ->poll is not called from an atomic context
      and the result of accidentally sleeping in ->poll only shows up as
      temporary busy looping when the timing is right or rather wrong.
      
      This patch converts poll/select to use custom wake up function and use
      separate triggered variable to synchronize against wake up events.  The
      only added overhead is an extra function call during wake up and
      negligible.
      
      This patch removes the one non-sleep exception from vfs locking rules and
      is beneficial to userland filesystem implementations like FUSE, 9p or
      peculiar fs like spufs as it's very difficult for those to implement
      non-sleeping poll method.
      
      While at it, make the following cosmetic changes to make poll.h and
      select.c checkpatch friendly.
      
      * s/type * symbol/type *symbol/		   : three places in poll.h
      * remove blank line before EXPORT_SYMBOL() : two places in select.c
      
      Oleg: spotted missing barrier in poll_schedule_timeout()
      Davide: spotted missing write barrier in pollwake()
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Brad Boyer <flar@allandria.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f820f64
  18. 26 Oct, 2008 1 commit
  19. 07 Sep, 2008 2 commits
  20. 06 Sep, 2008 3 commits
  21. 22 Jun, 2008 1 commit
  22. 01 May, 2008 2 commits
  23. 30 Apr, 2008 2 commits
  24. 21 Apr, 2008 1 commit
  25. 06 Feb, 2008 1 commit
  26. 19 Oct, 2007 1 commit
  27. 17 Oct, 2007 3 commits
  28. 12 Sep, 2007 1 commit
    • Alexey Dobriyan's avatar
      Fix select on /proc files without ->poll · dd23aae4
      Alexey Dobriyan authored
      Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
      786d7e16
      
       aka "Fix rmmod/read/write races
      in /proc entries" broke SBCL + SLIME combo.
      
      The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
      ->poll handler.  The new code makes ->poll always there and returns 0 by
      default, which is not correct.  Return DEFAULT_POLLMASK instead.
      
      Steps to reproduce:
      
      	install emacs, SBCL, SLIME
      	emacs
      	M-x slime	in *inferior-lisp* buffer
      	[watch it doing "Connecting to Swank on port X.."]
      
      Please, apply before 2.6.23.
      
      P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd23aae4
  29. 09 May, 2007 1 commit
  30. 08 May, 2007 1 commit