Participation deadline:
20 June 2015

Data Science Game - Part 2

Welcome to the 1st edition of the Data Science Game!

To ensure fairness in the contest it is strictly prohibited to ask for help outside of your team: no email, no access to your working platform, and so on… Failure to comply with these rules might lead to the immediate elimination of your team.
For the 2nd stage of the competition, you will have to classify YouTube videos into 15 categories. Each video on YouTube is associated with a category. Your task is to guess the relevant category of each video using available information such as its title, description, duration, date, and so on…

The training set consists of a CSV file of around 240,000 videos. Each video is described by 15 predictor variables of various types (details in the attached document), with the category as the response variable. You are asked to use this training set to design methods and algorithms that predict the relevant category.

Be advised that the category of a video is chosen by the uploader and might not be the most relevant one. This means that for some videos, even though your guess might be accurate, the uploader's choice might not be, and the prediction will then be evaluated as incorrect. Welcome to the real world.

The efficiency of your methods will be assessed using a test set: the score for each submission is defined as the rate of correct guesses. The number of submissions is not limited.
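The scoring rule above (rate of correct guesses) can be sketched as follows. This is only an illustration, not the organizers' scoring code: the function name `accuracy` and the sample category ids are hypothetical.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where the predicted category matches."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical category ids for five videos; four of the five guesses match.
predicted = [3, 7, 7, 1, 12]
actual = [3, 7, 2, 1, 12]
score = accuracy(predicted, actual)
print(f"{score:.1%}")
```

Because the reference labels are the uploaders' choices, a guess that is arguably more relevant than the uploader's label still counts as a miss under this rule.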

Live temporary rankings will be provided during the competition. Each team must make a minimum number of submissions, as described in the following table.

before 4 pm
after 4 pm and before 10 pm
before 8 am
after 8 am and before 2 pm
The temporary ranking will close on 21 June at 5 pm French time.

The final ranking will be based on a single submission on a final test set, available from 5 pm to 5:45 pm on Sunday. The results will be available from 6 pm.
For the competition each team will be allowed to use the computational resources of their choosing. Powerful laptops equipped with the software you are used to should be sufficient.

Nevertheless, in order to ensure fairness in the competition, each team can use its access to the Google Cloud Platform that was made available to them during the friendly phase of the competition.

Good luck!

You must submit a file in CSV format, using a semicolon as separator, with one header row and two columns:

  • Column 1: line Id
  • Column 2: category id
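A minimal sketch of writing a file in the format above, using Python's standard `csv` module. The header names `id` and `category` and the sample rows are assumptions made for illustration; the rules only state that there is one header row and two columns.

```python
import csv
import io


def write_submission(rows, out):
    """Write (line_id, category_id) pairs as semicolon-separated CSV."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    # Header names are assumed; the rules do not specify them.
    writer.writerow(["id", "category"])
    for line_id, category_id in rows:
        writer.writerow([line_id, category_id])


# Hypothetical predictions: (line id, category id).
buffer = io.StringIO()
write_submission([(1, 4), (2, 11), (3, 4)], buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `out`.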

File example:


The score will be the rate of correct answers.

Rank Team Score
1. MSU 75.770%
2. Sapienza 73.801%
3. Telecominers - 73.795%
4. Nedap Hinton Hyenas 73.780%
5. Designated Neurons 73.686%

1. Dmitry Ulyanov 112 contributions 21/06/15 14:25 Score 75.45016%
2. Telecominers - 21 contributions 21/06/15 16:02 Score 73.91034%
3. Sapienza 71 contributions 21/06/15 16:46 Score 73.72380%
4. Designated Neurons 31 contributions 21/06/15 16:28 Score 73.60549%
5. MADatascience BRO 93 contributions 21/06/15 16:48 Score 73.43104%
6. ctech IITD 46 contributions 21/06/15 13:41 Score 73.34295%
7. Full Metal 27 contributions 21/06/15 15:36 Score 73.30322%
8. Nedap Hinton Hyenas 70 contributions 21/06/15 16:48 Score 73.25313%
9. Poly Unicorns 81 contributions 21/06/15 13:40 Score 72.97160%
10. Breaking Data 25 contributions 21/06/15 15:45 Score 71.50000%
11. Cocotte Data 94 contributions 21/06/15 15:04 Score 66.98591%
12. GMM INSA 47 contributions 21/06/15 16:00 Score 66.83392%
13. TSE 48 contributions 21/06/15 16:54 Score 66.72079%
14. dsg eNStA 32 contributions 21/06/15 16:31 Score 60.36893%
15. Ayrbus-Fishmanati Ayrbus-Fishmanati 23 contributions 21/06/15 16:49 Score 59.68236%
16. A Giant Mind 28 contributions 21/06/15 16:20 Score 57.00863%
17. MInd overflow 26 contributions 21/06/15 15:13 Score 55.79439%
18. UniMAnalytics Benjamin Schäfer 14 contributions 21/06/15 12:05 Score 42.11135%
19. SID Team 8 contributions 21/06/15 13:33 Score 37.24750%
20. Insight UCD 49 contributions 21/06/15 16:16 Score 33.45453%
