A Quick Guide for Machine Learning

Purpose

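This document describes how to get into the field of Machine Learning as quickly as possible. I am still a beginner and still learning, so I ask the experts to bear with me: this guide leaves out a great deal.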

Notes

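This quick guide focuses mainly on Google TensorFlow, but TensorFlow is not the only Machine Learning framework. There is also Torch, heavily used and extended by the Facebook AI Research group, as well as Microsoft Research's CNTK and Caffe from the Berkeley Vision and Learning Center. Another important site you should browse is OpenAI, a major nonprofit AI research company backed by Elon Musk, among others. It hosts OpenAI Gym, an open platform for developing and comparing AI (reinforcement learning) algorithms, which includes many toolkits and very valuable documentation to learn from.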

Outline and Contents

Purpose

Notes

You need to know basic Python

Choosing a Machine Learning framework

Installing AI development tools

Jump straight into writing digit-recognition TensorFlow code

Machine learning theory

Miscellaneous

Mindset

Resource/Competition

Paper

Database

Image Processing

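You need to know basic Python

There are countless Python tutorials online, and any search will turn up plenty. For Machine Learning, the most common application is still image processing. You can refer to my articles on Python + OpenCV, which show how to set up a Python + OpenCV development environment in 10 minutes.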

[install, opencv] How to install Anaconda with OpenCV (view)
[main] Python FAQ Page (view), covering basic Python syntax, array operations, DSP, and image-processing examples, with figures and code.
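As a small taste of the array operations used in image processing, here is a minimal sketch in plain Python (no OpenCV or NumPy, so it runs anywhere) that converts an RGB image to grayscale with the ITU-R BT.601 luma weights 0.299, 0.587 and 0.114. The function name `rgb_to_gray` and the nested-list image layout are assumptions for illustration; with OpenCV you would normally call `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` instead (note that OpenCV stores channels as BGR).

```python
# Hypothetical helper: the image is assumed to be a list of rows,
# each row a list of (R, G, B) tuples with values 0-255.
def rgb_to_gray(image):
    """Convert an RGB image to grayscale using the BT.601 luma weights."""
    gray = []
    for row in image:
        # Weighted sum of the three channels, rounded to the nearest integer
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row])
    return gray


# 2x2 test image: red, green / blue, white
img = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(rgb_to_gray(img))  # → [[76, 150], [29, 255]]
```

Green contributes the most to perceived brightness, which is why a pure green pixel comes out much lighter than a pure blue one.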

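Choosing a Machine Learning framework

The most popular framework right now is Google's TensorFlow, as GitHub statistics (see the figure below) show. Nearly twice as many projects are built with TensorFlow as with the runner-up, Torch, which means that when you run into a problem, you may have a better chance of finding an answer.

Fig. Number of GitHub projects using each Machine Learning framework.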

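Installing AI development tools

First you need to know which tools you will use to build your AI application. The simplest approach is Google's TensorFlow together with the Anaconda Python development environment. With the tutorial below, you can set up the development environment in 20 minutes without having to think hard.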

[install, Anaconda] How to install TensorFlow using Anaconda (view)
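After installation, one quick way to confirm the environment is to check which packages Python can find. The sketch below uses only the standard library (`importlib.util.find_spec`); the list of module names is an assumption, so adjust it to whatever you installed (`cv2` is the import name of OpenCV). Once TensorFlow shows up as installed, `import tensorflow as tf; print(tf.__version__)` prints its version.

```python
import importlib.util

# Module names below are assumptions; edit them to match your installation.
DEFAULT_MODULES = ("tensorflow", "cv2", "numpy")


def available_frameworks(names=DEFAULT_MODULES):
    """Return a dict mapping each top-level module name to True if it can be imported."""
    # find_spec returns None when the module cannot be found on the import path
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_frameworks().items():
        print(f"{name:12s} {'installed' if ok else 'missing'}")
```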

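Jump straight into writing digit-recognition TensorFlow code

My habit is to see the code first, follow along, and watch it run; only then do I believe the thing actually works. By following my documents step by step, you can finish the first programs every TensorFlow beginner writes within two hours. Once you are done, you will have built up plenty of confidence to move on to the theory.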

[dense, digit] *** MNIST for ML Beginners -- handwritten digit recognition (view) -- accuracy 92.14%

[cnn, digit] A Guide to TF Layers: Building a Convolutional Neural Network (view) (ref) -- accuracy 97.3%
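The "dense" model behind the first tutorial is softmax regression: each input is flattened into a vector x, the model computes y = softmax(Wx + b), and W and b are trained by gradient descent on the cross-entropy loss. The sketch below implements that idea in plain Python on a tiny hypothetical 2-D dataset instead of MNIST, so it runs without TensorFlow; the function names, learning rate and number of epochs are assumptions, not values taken from the tutorial.

```python
import math

# Minimal softmax-regression sketch on a tiny hypothetical dataset (not MNIST);
# function names and hyperparameters are assumptions for illustration.


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, b, x):
    """Compute W x + b for one input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train_softmax_regression(xs, ys, num_classes, lr=0.5, epochs=200):
    """Fit y = softmax(W x + b) by full-batch gradient descent on cross-entropy."""
    dim, n = len(xs[0]), len(xs)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(xs, ys):
            p = softmax(scores(W, b, x))
            for k in range(num_classes):
                # d(cross-entropy)/d(score_k) = p_k - 1[k == y]
                err = p[k] - (1.0 if k == y else 0.0)
                gb[k] += err
                for j in range(dim):
                    gW[k][j] += err * x[j]
        # Average the gradient over the batch and take one descent step
        for k in range(num_classes):
            b[k] -= lr * gb[k] / n
            for j in range(dim):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict(W, b, x):
    """Return the index of the class with the highest score."""
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


xs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, -1.0), (-0.9, -1.1)]
ys = [0, 0, 1, 1, 2, 2]
W, b = train_softmax_regression(xs, ys, num_classes=3)
print([predict(W, b, x) for x in xs])  # → [0, 0, 1, 1, 2, 2]
```

For MNIST, each x would be a 784-value vector (28 × 28 pixels) and there would be 10 classes; the CNN tutorial replaces this single dense layer with convolution and pooling layers, which is what lifts the accuracy.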

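Machine learning theory

Once you have learned how to use TensorFlow to solve the handwritten-digit problem, you should start studying the theory of machine learning. Below are several good sites, starting with the basics.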

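[usenet] Usenet newsgroup comp.ai.neural-nets FAQ (ref), with definitions of many terms; very useful for looking up terms you don't understand
[ts, tutorial] Tensorflow 深度學習講義 (TensorFlow deep learning lecture notes) (ref), covering basic neural networks and how to code them

[ts, tutorial] Tensorflow 深度學習快速上手班 (TensorFlow deep learning quick-start class) (ref), covering basic parameter selection

[ts, tutorial] AI 從頭學 (Learning AI from scratch) (ref)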
[ts, tutorial] Google Deep Learning Course (ref)

[ts, tutorial] The TensorFlow tutorial (ref)

[openai] WILDML, Artificial Intelligence, Deep Learning, and NLP (ref)

OpenAI Gym (ref)

A series of Machine Learning tutorial videos by Mathematicalmonk, starting with (ML 1.1); excellent (ref)

[nn] Information Theory*** (ref)

[nn] Neural Networks and Deep Learning (ref)

Bay Area Deep Learning School Day 1 at CEMEX auditorium, Stanford (ref)
Bay Area Deep Learning School Day 2 at CEMEX auditorium, Stanford (ref)

Miscellaneous

Mindset

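[ml, feature] Use “learning” algorithms that aim to learn the features themselves, without hand-crafted assumptions about them
[ml, feature, classifier] Traditional approach: humans carefully select "good" features and use a simple classifier. Learning approach: the machine selects the features (or even uses the raw data) and uses a very complex classifier
[ml, description] Machine learning finds rules in data in order to make predictions
[ml, description] Starting from the data, compute a hypothesis g that approximates the true (unknown) function f
[ml, condition] Necessary conditions for machine learning (view)
[ml, hypothesis set] Adjust parameters to choose, from a set of hypotheses (h1, h2, ..., hn), the one that produces the result
[data mining] Find interesting patterns in data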
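The ideas above (an unknown target f, a hypothesis set h1, ..., hn, and a learned g chosen from the data) can be made concrete with a toy example. In the sketch below, f is a hypothetical threshold rule on a single number, the hypothesis set is 101 threshold classifiers h_t(x) = +1 if x >= t else -1, and learning picks as g the hypothesis with the lowest error on the training data. The target rule, the thresholds and the data are all illustrative assumptions.

```python
import random

# Toy illustration; the target rule f and the hypothesis set are hypothetical.


def target_f(x):
    """Hypothetical 'true' rule f; the learner only sees labeled samples of it."""
    return 1 if x >= 0.37 else -1


def make_hypothesis(t):
    """Return h_t(x) = +1 if x >= t else -1."""
    return lambda x: 1 if x >= t else -1


def in_sample_error(h, data):
    """Fraction of training examples (x, y) that h misclassifies."""
    return sum(h(x) != y for x, y in data) / len(data)


def learn(data, thresholds):
    """Pick g from the hypothesis set {h_t} with the lowest training error."""
    best_t = min(thresholds, key=lambda t: in_sample_error(make_hypothesis(t), data))
    return best_t, make_hypothesis(best_t)


random.seed(0)
xs = [random.random() for _ in range(200)]
data = [(x, target_f(x)) for x in xs]        # labeled data generated by f
thresholds = [i / 100 for i in range(101)]   # hypothesis set h_0.00, ..., h_1.00
t, g = learn(data, thresholds)
print(t, in_sample_error(g, data))
```

The chosen g agrees with f on every training point, but its threshold may land slightly below 0.37 when no sample falls in the gap; that difference between "fits the data" and "equals f" is exactly what the theory of generalization studies.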

Resource/Competition

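Big-data competition platform, https://www.kaggle.com/

Data scientists compete there by proposing potential solutions. The fields covered include computer science, computer vision, biology, medicine, and even glaciology. Each month the Kaggle forums get more than 4,000 new posts, and each day Kaggle competitions receive more than 3,500 submissions.

Introduction 1, https://technews.tw/2017/03/10/kaggle-joins-google-cloud/
Introduction 2, https://kknews.cc/zh-tw/tech/vg4zn4.html

KDD Cup (international Knowledge Discovery and Data Mining competition), http://www.kdd.org/kdd-cup

Introduction 1, http://www.baike.com/wiki/%E5%9B%BD%E9%99%85%E7%9F%A5%E8%AF%86%E5%8F%91%E7%8E%B0%E5%92%8C%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98%E7%AB%9E%E8%B5%9B

Taiwanese Association for Artificial Intelligence (TAAI) competitions, https://taai-test.herokuapp.com/award/winner
Facebook AI Research, https://research.fb.com/category/facebook-ai-research-fair/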

Paper

The 20 most-cited machine learning and deep learning papers (2014-2017) (ref)

Database

GoogLeNet (ref)

Paper (ref)

ImageNet (ref)

[har] Human Activity Recognition Database, WISDM (ref)

Activity: jogging, walking, ascending stairs, descending stairs, sitting and standing
36 users carrying a smartphone in their pocket, sampled at 20 Hz (20 values per second)
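Because the data is sampled at 20 Hz, a window of s seconds holds 20 × s readings, and activity classifiers commonly cut the stream into fixed-length windows before extracting features. The sketch below shows that segmentation in plain Python. The 10-second window, 50% overlap, majority-vote labelling and the per-reading (x, y, z) input format are assumptions for illustration, not part of the dataset description.

```python
from collections import Counter

SAMPLING_RATE_HZ = 20  # from the dataset description: 20 values per second


def sliding_windows(samples, labels, seconds=10, overlap=0.5):
    """Split a sampled signal into fixed-length, overlapping windows.

    Assumed input: samples is a list of readings (e.g. (x, y, z) tuples),
    labels holds the activity for each reading. Window size, overlap and
    majority labelling are illustrative choices, not part of WISDM.
    Returns a list of (window_samples, majority_label) pairs.
    """
    size = int(seconds * SAMPLING_RATE_HZ)      # e.g. 10 s * 20 Hz = 200 readings
    step = max(1, int(size * (1 - overlap)))    # hop between window starts
    windows = []
    for start in range(0, len(samples) - size + 1, step):
        chunk = samples[start:start + size]
        # Label the window with its most frequent activity
        label = Counter(labels[start:start + size]).most_common(1)[0][0]
        windows.append((chunk, label))
    return windows


# 30 seconds of fake readings: 16 s walking, then 14 s jogging
readings = [(0.0, 0.0, 9.8)] * 600
acts = ["walking"] * 320 + ["jogging"] * 280
ws = sliding_windows(readings, acts)
print(len(ws), [label for _, label in ws])
# → 5 ['walking', 'walking', 'walking', 'jogging', 'jogging']
```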

[32x32, 60k, 10 classes] The CIFAR-10 dataset (ref)

The dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
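In the Python version of CIFAR-10, each image is stored as one row of 3072 = 32 × 32 × 3 values: the first 1024 are the red channel, the next 1024 green and the last 1024 blue, each in row-major order. The sketch below turns one such row into a 32 × 32 grid of (R, G, B) tuples. It works on a plain list so it can be tested without downloading the data, and the function name is hypothetical; in practice the rows come from unpickling a batch file such as `data_batch_1` with `pickle.load(f, encoding='bytes')` and reading its `b'data'` and `b'labels'` entries.

```python
SIDE = 32
PLANE = SIDE * SIDE  # 1024 values per colour channel


def cifar_row_to_image(row):
    """Convert one 3072-value CIFAR-10 row into a 32x32 grid of (R, G, B) tuples.

    Hypothetical helper; assumes the channel-planar layout described above.
    """
    if len(row) != 3 * PLANE:
        raise ValueError(f"expected {3 * PLANE} values, got {len(row)}")
    red, green, blue = row[:PLANE], row[PLANE:2 * PLANE], row[2 * PLANE:]
    image = []
    for r in range(SIDE):
        # Each channel is row-major, so pixel (r, c) sits at index r * 32 + c
        image.append([(red[r * SIDE + c], green[r * SIDE + c], blue[r * SIDE + c])
                      for c in range(SIDE)])
    return image


# Synthetic row: red = 8 * column, green constant 100, blue 0
row = [c * 8 for _ in range(SIDE) for c in range(SIDE)] + [100] * PLANE + [0] * PLANE
img = cifar_row_to_image(row)
print(len(img), len(img[0]), img[0][0], img[5][31])  # → 32 32 (0, 100, 0) (248, 100, 0)
```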

[28x28, 70k, 250 writers] The MNIST database of handwritten digits (ref)

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
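The MNIST files use the simple IDX format: a header of big-endian 32-bit integers (a magic number, the item count, and for images the row and column sizes) followed by raw unsigned bytes. The sketch below parses image and label data from `bytes` objects using only the standard library. It is tested on tiny synthetic data; the real files (e.g. `train-images-idx3-ubyte.gz`) would first need decompressing, for example with `gzip.open(path, 'rb').read()`. The function names are hypothetical.

```python
import struct

# Hypothetical IDX readers for MNIST-style files (stdlib only).


def parse_idx_images(blob):
    """Parse IDX image data (magic number 2051) into a list of 2-D pixel lists."""
    # Header: magic, count, rows, cols as big-endian unsigned 32-bit ints
    magic, count, rows, cols = struct.unpack(">IIII", blob[:16])
    if magic != 2051:
        raise ValueError(f"not an IDX image file (magic={magic})")
    pixels = blob[16:16 + count * rows * cols]
    images = []
    for i in range(count):
        base = i * rows * cols
        # One unsigned byte (0-255) per pixel, stored row by row
        images.append([list(pixels[base + r * cols: base + (r + 1) * cols])
                       for r in range(rows)])
    return images


def parse_idx_labels(blob):
    """Parse IDX label data (magic number 2049): header, then one byte per label."""
    magic, count = struct.unpack(">II", blob[:8])
    if magic != 2049:
        raise ValueError(f"not an IDX label file (magic={magic})")
    return list(blob[8:8 + count])


# Tiny synthetic files: two 2x3 "images" and their labels
img_blob = struct.pack(">IIII", 2051, 2, 2, 3) + bytes(range(12))
lbl_blob = struct.pack(">II", 2049, 2) + bytes([7, 3])
print(parse_idx_images(img_blob), parse_idx_labels(lbl_blob))
# → [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] [7, 3]
```

For the real training file, the header reads 60,000 images of 28 × 28 pixels, matching the description above.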