
Is Kaggle Suitable to Teach Programming (with Python)?

I plan to include some programming lectures in my summer term course on Technology Management. Because so many good tutorials are available on the web, I will try to integrate existing ones into my course. I therefore tried two Python tutorials on the Kaggle platform to get a feeling for how they work and how they can be integrated into my course. The two tutorials I worked through are the Basic Python Tutorial and the Machine Learning Tutorial.

What is Kaggle?

Kaggle is an online community of data scientists and machine learners, owned by Google LLC. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. Kaggle got its start by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and short form AI education. On 8 March 2017, Google announced that they were acquiring Kaggle.

https://en.wikipedia.org/wiki/Kaggle

In the next two sections I describe my experience with the two tutorials, followed by a conclusion.

Basic Python Tutorial

It was easy to start with the tutorials at Kaggle. After registering at Kaggle I could start right away in the browser. For registration you can use your existing Google or Facebook account or create an account based on your email address. This fast and easy start gives students who are new to programming a hassle-free way to their first ‘Hello, World!’ program (which hopefully could be the start of a wonderful, never-ending journey of programming).

The tutorials are structured as alternating reading and exercise lessons. After each reading lesson, students can practice what they learned in an exercise lesson. Both kinds of lessons are based on Jupyter Notebooks (called ‘Kernels’ at Kaggle).

Overview of Kaggle’s Basic Python Tutorial

The reading lessons are Jupyter Notebooks in read-only mode. They rely heavily on small example code snippets that show and explain the presented concepts.

The exercise lessons are also based on Jupyter Notebooks. When you start an exercise by clicking on its title in the overview, a copy (fork) of the Jupyter Notebook will be created. Within this copy you are guided through your practice while you write and execute your own code directly in the browser.

At the beginning of each exercise lesson you are instructed to load a Python library which is used to test your code, provide hints, and show the solution if necessary. This gives students fast feedback on whether their code works as intended. If their code does not deliver the correct results, they can get instant access to hints. These mechanisms can protect students from getting completely stuck and frustrated. In my opinion programming should be fun. Especially during your first steps, when you learn your first programming language, you should have more moments of success than error messages and stack traces (which you can’t handle yet).

In the following picture you see a simple task from the first exercise lesson of the Basic Python Tutorial. The task introduces the format of the exercise lessons: you create a variable ‘color’ and assign it the value ‘blue’. In the screenshot the task was completed correctly, as shown by the ‘Correct’ message produced by ‘q0.check()’.

Exercise of basic Python tutorial from Kaggle with correct answer.

The following screenshot shows the same question as above, but with a wrong solution provided. The check tells you that your answer is ‘Incorrect’ and gives you some explanation. At the bottom of the screenshot you can see the hint and the solution of the task. The check, hint, and solution are provided by the aforementioned library which is loaded at the start of each exercise.

Exercise of Basic Python Tutorial from Kaggle with wrong answer, hint and solution.
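
The check, hint, and solution mechanism described above can be illustrated with a minimal sketch. This is not Kaggle's actual library: the class name `Question`, its constructor arguments, and the explicit `namespace` parameter are assumptions made for illustration only. In the tutorial itself, ‘q0.check()’ is called without arguments.

```python
class Question:
    """Hypothetical stand-in for one exercise question (not Kaggle's library)."""

    def __init__(self, variable, expected, hint_text):
        self.variable = variable      # name of the variable the student must create
        self.expected = expected      # value that variable should hold
        self.hint_text = hint_text    # text shown when the student asks for a hint

    def check(self, namespace):
        """Compare the student's variable with the expected value."""
        if self.variable not in namespace:
            return f"Incorrect: the variable '{self.variable}' is not defined."
        actual = namespace[self.variable]
        if actual != self.expected:
            return f"Incorrect: expected {self.expected!r}, but got {actual!r}."
        return "Correct"

    def hint(self):
        return f"Hint: {self.hint_text}"

    def solution(self):
        return f"Solution: {self.variable} = {self.expected!r}"


# The question from the screenshots: create 'color' and assign 'blue'.
q0 = Question("color", "blue", "Assign the string 'blue' to a variable named color.")

color = "red"                       # a wrong answer
print(q0.check({"color": color}))   # reports 'Incorrect' with an explanation
print(q0.hint())
print(q0.solution())

color = "blue"                      # the right answer
print(q0.check({"color": color}))   # reports 'Correct'
```

Passing the student's variables in explicitly keeps the sketch self-contained. A real notebook helper would more likely inspect the notebook's own variables, which is why the call in the tutorial needs no arguments.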

Machine Learning Tutorial

The Machine Learning Tutorial has a structure similar to that of the Basic Python Tutorial, including the check, hint, and solution functions. It also consists of reading and exercise lessons based on Jupyter Notebooks. The tutorial teaches participants how to build a first machine learning model. The use case is to predict house prices with a trained machine learning model, based on attributes like number of bedrooms, overall condition, or living area.

The reading and exercise lessons use different data sets. The reading lessons use a data set of house prices in Melbourne, and in the exercise lessons you use a data set from Iowa. I like the idea of using different data sets for explanation and practice because students have to transfer their knowledge to a new data set.

At the end of the tutorial you are shown how to submit the predictions of your trained model to a competition, where you can compare your prediction quality with that of other course attendees.

The libraries used in the Machine Learning Tutorial, such as Pandas and SciKit-Learn, are widely used free/open source libraries for Python.
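
To give an impression of the kind of workflow this section describes, here is a minimal sketch with Pandas and SciKit-Learn. It makes several assumptions: the data is made up rather than taken from the Melbourne or Iowa data sets, the column names (`bedrooms`, `living_area`, `condition`, `price`, `Id`, `PredictedPrice`) are hypothetical, and the decision tree is only one possible model choice, not necessarily the one used in the tutorial.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Made-up data: the price is a simple function of the attributes (illustration only).
rows = [
    {"bedrooms": b, "living_area": a, "condition": c,
     "price": 50_000 * b + 1_000 * a + 10_000 * c}
    for b in range(1, 6)
    for a in range(50, 250, 50)
    for c in range(1, 4)
]
homes = pd.DataFrame(rows)

# Separate the attributes (features) from the value to predict (target).
features = ["bedrooms", "living_area", "condition"]
X = homes[features]
y = homes["price"]

# Hold back part of the data to judge prediction quality on unseen houses.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Train the model on the training part only.
model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

# Measure the average prediction error on the held-back part.
val_predictions = model.predict(val_X)
mae = mean_absolute_error(val_y, val_predictions)
print(f"Validation mean absolute error: {mae:,.0f}")

# Predictions collected in a table, similar in spirit to a competition submission.
submission = pd.DataFrame({"Id": val_X.index, "PredictedPrice": val_predictions})
print(submission.head())
```

The mean absolute error on the held-back rows is one common way to compare the prediction quality of different models. The metric a given competition actually uses may differ.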

Conclusion

I worked through two tutorials (Basic Python Tutorial and Machine Learning Tutorial) to understand how these tutorials work and how they can be used to teach programming to students with no prior programming knowledge.

What I liked

  • A complete browser-based programming environment, which allows a fast and easy start without any IDE installation trouble for the students.
  • The possibility for students to check their own results and get hints or the solution to avoid getting completely stuck.
  • The tutorials use free software (e.g. Python, Jupyter Notebooks, SciKit Learn, and Pandas). Students are able to use the software on their own computers without the need to pay fees.
  • The tools which the tutorials use are not specific to Kaggle or academia; they are widely used in practice.
    • I think this could be a motivation for students to work with these tools if they know that the tools are also used to develop real world applications.
    • Thanks to the large communities around these tools, students can easily get additional information or help through other channels (e.g. Stack Overflow). In addition, the students can learn how to interact with an online community.
  • Within the competition in the Machine Learning Tutorial you can compare your solution to the solutions of other course attendees and look at their code (if they provide reading access).
    • I had a lot of fun working on my model in order to improve my score in the competition. I think these competitions can motivate students not only to work through the tutorial, but also to engage deeply with the topic.
    • Maybe it is a good idea to create a dedicated InClass Competition for my course.

What should be considered when using Kaggle to teach programming

  • Students should clearly understand what Kaggle is and what Kaggle is not.
    • A good article that gives some hints on how to use Kaggle as a beginner and explains the difference between Kaggle competitions and data science: The Beginner’s Guide to Kaggle
    • Students should also understand how Kaggle differs from application development.
  • Students without any prior knowledge may need some additional help to get started. I plan to have one of our student workers work through the tutorial to identify possible difficulties from a student’s (beginner’s) perspective.
  • Students should have a way to ask for help, such as a forum or Slack workspace, and time slots in my lecture.
  • The competition should just be a fun part for the students and should not affect grading.

Finally, I think that Kaggle can provide a good start for teaching programming to beginners if the Kaggle tutorials are embedded within a university course. Therefore, I will use the Kaggle tutorials in my course this summer term.
