Monday, June 2, 2014

Kaggle tips to avoid pitfalls in Machine Learning

"At Kaggle, we run machine learning projects internally and also crowdsource some projects through open competitions. We’ll cover the gritty details of the most fascinating competitions we’ve hosted to date, from optimizing early-stage drug discovery pipelines to algorithmically scoring student-written essays, and explore the methods that won these competitions.

After working on hundreds of machine learning projects, we’ve seen many common mistakes that can derail projects and endanger their success. These include:

- Data leakage
- Overfitting
- Poor data quality
- Solving the wrong problem
- Sampling errors
- and many more

In this talk, we will go through the machine learning gremlins in detail, and learn to identify their many disguises. After this talk, you will be prepared to identify the machine learning gremlins in your own work and prevent them from killing a successful project."


sources:
http://strataconf.com/strata2014/public/schedule/detail/32168
https://www.youtube.com/watch?v=tleeC-KlsKA
