Motivation
In my work as a data engineer, I work closely with data scientists. I have always been interested in the major part of their work: building models using machine learning. So I applied for a part-time master’s program in Artificial Intelligence that will start in August this year. Before the program starts, I want to prepare for it, so I decided to teach myself some courses. I spent about one month searching for and collecting resources about machine learning and drew up this learning path. I am now following this path and will try to keep it updated here.
Who should use this path
As I mentioned above, I am using this path myself, and it can be useful to readers with a background like mine: a software engineer without a machine learning or statistics background. It is also suitable for university students (undergraduate or graduate) learning about machine learning. The only requirement is some basic Python programming skills. For those who want to learn Python, the Towards Data Science blog post offers an excellent guide.
Time for this learning path
My plan is to finish this learning path in 8 to 9 months. If anything changes, I will try to update it here. There is no perfect plan. A bad plan is better than no plan.
Mathematics for Machine Learning
Mathematics is the scariest part of the pathway for me. After working for several years, I have forgotten most of the mathematics I learned in college. So the first question is whether I need to learn mathematics again. My answer is yes. The open-source book Mathematics for Machine Learning provides the necessary mathematical skills for machine learning. Just a glance at the first chapter of this book shows you how mathematics is used in machine learning. But my plan is not to dive into this book thoroughly. Instead, I will first learn Python’s well-known libraries NumPy, pandas, and matplotlib. If I run into a math problem, I will refer to the book or other resources on the internet.
The learning path for this part is to finish the first four chapters of the Python Data Science Handbook, spending one or two weeks on each chapter. One of the fascinating things about this book is that it is written using Jupyter notebooks, so you can conveniently use Google Colab to view and run them.
- IPython: Beyond Normal Python
- Introduction to NumPy
- Data Manipulation with Pandas
- Visualization with Matplotlib
Just reading the book is not enough. Practice is the only way to really master these libraries. I found some related quiz notebooks very helpful, and I recommend working through them while reading the book.
- rougier/numpy-100: 100 numpy exercises
- ajcr/100-pandas-puzzles: 100 data puzzles for pandas, ranging from short and simple to super tricky
- Top 50 matplotlib Visualizations
- DunderData/Quizzes: Python, Pandas, and Scikit-Learn Quizzes and Solutions
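To give a feel for this kind of practice, here is a minimal sketch in the spirit of the NumPy exercises. It is not taken from any of the repositories above; the function name `normalize_columns` and the sample data are made up for illustration.

```python
import numpy as np


def normalize_columns(data):
    """Rescale each column of a 2-D array to zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)  # one mean per column
    std = data.std(axis=0)    # one (population) standard deviation per column
    # Assumption for this sketch: leave constant columns at zero
    # instead of dividing by zero.
    std[std == 0] = 1.0
    # Broadcasting subtracts and divides column by column.
    return (data - mean) / std


# Hypothetical sample data: three rows, two features.
sample = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
normalized = normalize_columns(sample)
print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```

Writing small functions like this without explicit loops is a good way to get used to axes and broadcasting, which the NumPy chapter of the book covers.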
Microsoft’s Essential Math for Machine Learning: Python Edition on edx.org is one of the MOOCs I want to try after I finish learning these libraries.
Machine Learning
There are many MOOCs about machine learning on the internet. It is good to do some research on the courses you may want to take and select the most suitable one for you. I mainly researched two options in depth: Machine Learning by Andrew Ng on Coursera, and the Machine Learning series by Prof. Hsuan-Tien Lin (Mathematical Foundations, Algorithmic Foundations, and Techniques), also on Coursera. Andrew’s course is well known. It goes from very basic concepts to advanced ones and is a very comprehensive course for beginners, but it is taught in Octave/MATLAB, which may be a little outdated.
Prof. Hsuan-Tien Lin’s courses are more up to date and are organized in a when-why-how structure, which I think provides a more insightful view of machine learning. But these courses are taught in Chinese, which may be a problem for some people. The total lengths of these courses on Coursera are:
- Mathematical Foundations: 16 hours in 8 weeks
- Algorithmic Foundations: 13 hours in 8 weeks
- Techniques: 64 hours in 16 weeks
I decided to focus on Prof. Hsuan-Tien Lin’s courses while using Andrew’s course as a reference.
Some notes on these courses shared by others are also helpful, but they are also in Chinese.
Deep Learning
I am not going to include deep learning in this learning path. When I finish this pathway, I will create a separate learning path for deep learning.
Other resources
Here are some other resources I found that may help.
- donnemartin/data-science-ipython-notebooks
- yanshengjia/ml-road: Machine Learning Resources, Practice and Research
- immersive linear algebra: The world’s first linear algebra book with fully interactive figures.
Reference
- Beginners Learning Path for Machine Learning - Towards Data Science
- Absolute Beginner’s Guide to Machine Learning and Deep Learning
Update on 2020-10-02
I tried a few lessons of Essential Math for Machine Learning: Python Edition. It was too basic, so I quit. I finished the Mathematics for Machine Learning Specialization on Coursera. It contains three courses, covering linear algebra, multivariate calculus, and PCA. Honestly, the quality of the courses is not bad. But considering that it requires premium access at $49/mo, it is not good enough. First, it is not comprehensive. Second, there is no instructor to answer questions in the forum. For a more comprehensive understanding of all the related mathematics, I will try to quickly go through the open-source book Mathematics for Machine Learning.
I also finished the Mathematical Foundations and Algorithmic Foundations Coursera courses delivered by Prof. Hsuan-Tien Lin, but I have not had time to take the Techniques course. Based on the two courses I have finished, I have to say these are really good courses. They are well structured and comprehensive. While taking them, I really felt that I was learning step by step. I will definitely take the Techniques course when I have time.