Mining Massive Datasets

I've downloaded «Mining of Massive Datasets», a free book on data mining and machine learning. You can download it too at:


I'm really enjoying studying this book because it doesn't try to show you the simplest techniques, as Collective Intelligence does, for example, but concentrates on real techniques used by real companies. The book is constantly telling war stories from real projects, which is refreshing. This field can be very harsh and theoretical, and it's nice to be reminded why data mining is one of the hottest disciplines right now.

As part of my study of this book, I'm going to write a series of posts explaining (for you and, more importantly... for me!) some of its most interesting concepts.

As an example of how useful these techniques are, the authors write the following:

Cambridge Press does, however, retain copyright on the work, and we expect that you will acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.

Yes, stealing intellectual property from data mining experts can be a dangerous business 😉

