In the book, it says that typically you'd use either l2_penalty or l1_penalty, but not both in the same training session.
However, the code snippet that trains the model passes arguments for both penalties.
In addition, the question section even suggests trying out different values for both.
What is the effect on training if we pass values for both l1_penalty and l2_penalty?
The penalty is a term added to the loss in order to reduce the magnitude of the weights (the learned parameters). This helps prevent overfitting, because constraining the size of the weights makes it harder for the model to memorize specific things (it cannot pick whatever weights it feels like).
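As a concrete sketch (the function and variable names here are illustrative, not from the book's code), both penalties are just extra terms added on top of the data loss:

```python
import numpy as np

def penalized_loss(w, X, y, l1=0.0, l2=0.0):
    """Mean squared error plus optional L1 and L2 penalties on the weights."""
    mse = np.mean((X @ w - y) ** 2)
    return mse + l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])

base = penalized_loss(w, X, y)                  # no penalty
reg = penalized_loss(w, X, y, l1=0.1, l2=0.1)   # both penalties active
```

Larger weights mean a larger penalty, so during training the optimizer is pushed toward smaller weights. Passing values for both simply means both terms are added to the loss at once.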
The reason you’d normally only use one of them is that they more-or-less do the same thing so you don’t need both.
The L1 penalty is also known as “lasso” regression; the L2 penalty is also known as “ridge” regression. You may come across those terms in the machine learning literature.
There is also something called “elastic net”, which actually combines the L1 and L2 penalties. So it’s not unheard of to use both at the same time.
With neural networks, we usually just use L2, also known as “weight decay”.
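A quick sketch of why L2 and weight decay coincide for plain SGD (function names are illustrative; for adaptive optimizers like Adam the two are not equivalent, which is why AdamW exists):

```python
def sgd_step_l2(w, grad, lr, l2):
    # Adding l2 * w**2 to the loss contributes 2 * l2 * w to the gradient.
    return w - lr * (grad + 2 * l2 * w)

def sgd_step_weight_decay(w, grad, lr, wd):
    # Weight decay: shrink the weight directly by a fraction each step.
    return w - lr * grad - lr * wd * w

w, grad, lr = 0.8, 0.3, 0.1
a = sgd_step_l2(w, grad, lr, l2=0.05)
b = sgd_step_weight_decay(w, grad, lr, wd=0.1)  # wd = 2 * l2 gives the same step
```

Both updates subtract the same lr * wd * w shrinkage term, so for vanilla SGD the two formulations produce identical weights.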