Eric Dumazet authored
During tbench/oprofile sessions, I found that dst_release() was in third position.

CPU: Core 2, speed 2999.68 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
483726    9.0185  __copy_user_zeroing_intel
191466    3.5697  __copy_user_intel
185475    3.4580  dst_release
175114    3.2648  ip_queue_xmit
153447    2.8608  tcp_sendmsg
108775    2.0280  tcp_recvmsg
102659    1.9140  sysenter_past_esp
101450    1.8914  tcp_current_mss
95067     1.7724  __copy_from_user_ll
86531     1.6133  tcp_transmit_skb

Of course, all CPUs fight over the dst_entry associated with 127.0.0.1.

Instead of first checking the refcount value and then decrementing it, we use atomic_dec_return() to help the CPU make the right memory transaction (i.e. getting the cache line in exclusive mode).

dst_release() is now at the fifth position, and tbench is a little bit faster ;)

CPU: Core 2, speed 3000.1 MHz...
ef711cf1
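
The difference between the two release patterns described in the message can be sketched in user-space C. This is an illustration only, not the kernel patch: `struct fake_dst` and both helper functions are hypothetical names, and the kernel's atomic_dec_return() is approximated with C11 `atomic_fetch_sub()` minus one. The first helper reads the counter and then decrements it, so it touches the shared cache line twice; the second issues a single read-modify-write and checks the value it got back.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's dst_entry; only the refcount matters here. */
struct fake_dst {
    atomic_int refcnt;
};

/*
 * Check-then-decrement pattern: a plain load followed by a separate
 * atomic decrement. Returns 1 if the count looked sane before the drop.
 */
static int release_check_then_dec(struct fake_dst *dst)
{
    int ok = atomic_load(&dst->refcnt) >= 1;  /* first access: read only */
    atomic_fetch_sub(&dst->refcnt, 1);        /* second access: read-modify-write */
    return ok;
}

/*
 * Decrement-and-return pattern: one read-modify-write, then the sanity
 * check runs on the returned value. Returns 1 if the new count is >= 0.
 */
static int release_dec_return(struct fake_dst *dst)
{
    /* atomic_fetch_sub() returns the old value, so subtract 1 to get the new one. */
    int newcnt = atomic_fetch_sub(&dst->refcnt, 1) - 1;
    return newcnt >= 0;
}
```

Both helpers leave the counter in the same state; the second simply avoids the separate read before the write, which is the point the commit message makes about requesting the cache line in exclusive mode up front.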