Since activation functions are added to introduce nonlinearity, why can ReLU be used at all?

1. max(0, x) is linear over the range x > 0; can it still play the role of introducing nonlinearity?
2. Why isn't leaky ReLU, an improved version of ReLU, more common than ReLU?


Although ReLU is linear on the interval x > 0 and also linear on the part x ≤ 0, it is not linear as a whole, because its graph is not a single straight line.

The composition of multiple linear operations is still a linear operation, so without a nonlinear activation the whole network can only divide the space with a single hyperplane. ReLU, however, is nonlinear; its effect is similar to folding the space, and by stacking multiple (linear operation + ReLU) blocks the network can divide the space almost arbitrarily (see the sketch below).
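As a rough illustration of this point, here is a minimal NumPy sketch (with arbitrary random weights, not any particular network) showing that two stacked linear layers with no activation are still just one affine map, while inserting ReLU between them breaks that:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)   # layer 1 (hypothetical weights)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # layer 2

def relu(z):
    return np.maximum(z, 0.0)

def two_linear_layers(x):
    # linear -> linear: still an affine function of x
    return W2 @ (W1 @ x + b1) + b2

def linear_relu_linear(x):
    # linear -> ReLU -> linear: piecewise linear, "folds" the input space
    return W2 @ relu(W1 @ x + b1) + b2

x, y = rng.normal(size=2), rng.normal(size=2)

def affine_defect(f):
    # for any affine f, f(x + y) - f(x) - f(y) + f(0) is exactly zero
    return f(x + y) - f(x) - f(y) + f(np.zeros(2))

print(affine_defect(two_linear_layers))    # ~0: stacking adds no expressive power
print(affine_defect(linear_relu_linear))   # generally nonzero: ReLU adds nonlinearity
```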


There are many improved versions of ReLU: leaky ReLU, PReLU, ELU, CReLU. Each has its own effect and performance, but they are not as common as ReLU because (a sketch comparing ReLU and leaky ReLU follows this list):

  1. They may not bring any benefit; for example, if you do not run into dying ReLU, there is no need to use leaky ReLU, and it may not be better.
  2. They are more complex, and their performance is often no better than ReLU.
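For concreteness, a minimal sketch of ReLU versus leaky ReLU (the `alpha` value here is just a commonly used default, not anything prescribed above):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.01):
    # a small negative slope alpha keeps a nonzero gradient for z < 0,
    # which is what mitigates "dying ReLU"
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # [ 0.     0.     0.     0.5    2.   ]
print(leaky_relu(z))  # [-0.02  -0.005  0.     0.5    2.   ]
```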

Reference:

  1. How Do Neural Networks Work?

I agree with the answer above, and will add one more point.
An explanation of linear vs. nonlinear: in mathematics, a function whose graph is a straight line is a linear function; all remaining cases are nonlinear functions.
According to this definition, a broken (piecewise) line is not a linear function, and ReLU is a kind of broken line, so it is nonlinear (a quick numerical check follows).
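A quick numerical check of this claim (plain NumPy, not tied to any framework): a linear function must satisfy f(a·x) = a·f(x) for every scalar a, and ReLU fails this for negative a:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = 3.0
a = -1.0
# if ReLU were linear, these two values would be equal
print(relu(a * x))      # 0.0
print(a * relu(x))      # -3.0  -> the equality fails, so ReLU is nonlinear
```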


Linearity means that the highest power of every variable is 1; for example, 3x + 5y is linear, while 3x^2 + 5y is not.
