Tuesday, April 9, 2019

[python, sheet] The simplest way to access a Google Sheet

How to access a Google spreadsheet
井民全, Jing, mqjing@gmail.com
This document shows how to access a Google spreadsheet from Python.


First, you need to set up an online service dedicated to accessing Google Drive and Google Sheets.
Our Python code goes through this Google API service
to access the user's spreadsheet documents.

1. Setup Google API

1.1. Create a Google Cloud project (go)

Step 1: Create a Project
[Select a Project] -> [New Project]


Step 2: Select [Drive] and [Sheets] API
[Google APIs] -> [Library]  or [Enable APIs and Services]


1.2. Get credentials for your services

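Not just anyone can knock on the door and use this service; only an app that holds a credential (the secret JSON) can use it.
When an authorized app accesses a user's documents, we want it to obtain the user's consent through OAuth.
So we set up an OAuth-type credential.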
Step 1: Setup OAuth Credential
[Create Credentials] -> On the left side: [Credentials]: [OAuth consent screen]
Project Name


Step 2: Create an OAuth client ID for generating access tokens.
Command:
(a) [Create credentials] -> [OAuth client ID]: Other, and set the name to climb-auto-gsheet
(b) Then download the secret JSON
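
Before running the script, it can help to confirm that the downloaded file has the shape the OAuth flow expects. The following is a minimal sketch, not part of the original script. It assumes the file was saved as client_secret.json (the name the script in section 3 loads) and that an "Other" client is stored under a top-level "installed" object. The helper name check_client_secret and the list of keys it checks are illustrative assumptions.

```python
import json

# Keys this sketch assumes an OAuth client secret file should contain.
REQUIRED_KEYS = ("client_id", "client_secret", "auth_uri", "token_uri")


def check_client_secret(path="client_secret.json"):
    """Return a list of problems found in the client secret file (empty if none)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Installed-app ("Other") clients are assumed to live under "installed";
    # web clients under "web".
    section = data.get("installed") or data.get("web")
    if section is None:
        return ["missing top-level 'installed' (or 'web') object"]
    return ["missing key: " + key for key in REQUIRED_KEYS if key not in section]
```

An empty list means the file looks usable; anything else names what appears to be missing.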


2. Create a sheet to work with
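
The script in section 3 needs the ID of this sheet, which is part of the sheet's URL. A minimal sketch of pulling it out, assuming the usual https://docs.google.com/spreadsheets/d/<ID>/edit URL layout; the helper name spreadsheet_id_from_url is hypothetical and the ID in the example is a made-up placeholder.

```python
import re

# The ID is the path segment that follows "/spreadsheets/d/".
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def spreadsheet_id_from_url(url):
    """Return the spreadsheet ID embedded in a Google Sheets URL, or None."""
    match = _SHEET_ID_RE.search(url)
    return match.group(1) if match else None


# Made-up placeholder URL, not a real spreadsheet.
print(spreadsheet_id_from_url(
    "https://docs.google.com/spreadsheets/d/abc123_DEF-456/edit#gid=0"))
# prints abc123_DEF-456
```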

3. Python Code

3.1. Code

gsheet_test.py
(The code was modified from ref[1])
from __future__ import print_function
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
import pandas as pd



SPREADSHEET_ID = '1ZsGOUlpYP************TU0' # Spreadsheet ID; you can get it from the sheet's URL.
RANGE_NAME = 'Sheet1' # Name of the worksheet to read


def get_google_sheet(spreadsheet_id, range_name):
   # Scope: our app will only read the user's Google spreadsheets
   scopes = 'https://www.googleapis.com/auth/spreadsheets.readonly'
   # Setup the Sheets API
   store = file.Storage('credentials.json')
   creds = store.get()
   if not creds or creds.invalid:
       flow = client.flow_from_clientsecrets('client_secret.json', scopes)
       creds = tools.run_flow(flow, store)
   service = build('sheets', 'v4', http=creds.authorize(Http()))

   # Call the Sheets API
   gsheet = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range=range_name).execute()
   return gsheet


def gsheet2df(gsheet):
   """ Converts Google sheet data to a Pandas DataFrame.
   Note: This script assumes that your data contains a header row on the first row!
   Also note that the Google API omits trailing empty cells from each row, so a row
   can be shorter than the header - in order for the code below to work, you'll need
   to make sure your sheet doesn't contain empty cells, or update the code to account
   for such instances.
   """
   rows = gsheet.get('values', [])
   if not rows:
       # Empty sheet: return an empty DataFrame so callers can still use df.shape.
       print('No data found.')
       return pd.DataFrame()
   header = rows[0]   # Assumes first line is header!
   values = rows[1:]  # Everything else is data.
   if not values:
       # Header only: keep the column names but return no rows.
       print('No data found.')
       return pd.DataFrame(columns=header)
   all_data = []
   for col_id, col_name in enumerate(header):
       column_data = []
       for row in values:
           column_data.append(row[col_id])
       ds = pd.Series(data=column_data, name=col_name)
       all_data.append(ds)
   df = pd.concat(all_data, axis=1)
   return df


gsheet = get_google_sheet(SPREADSHEET_ID, RANGE_NAME)
print('gsheet = ', gsheet)
df = gsheet2df(gsheet)
print('Dataframe size = ', df.shape)
print(df.head())
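
The conversion above indexes every row by the header's column positions, so it fails on any row that is shorter than the header. One possible way to handle that, shown as a sketch rather than as part of the original script, is to pad short rows before building the DataFrame. The helper name pad_rows and the sample dictionary are made up for illustration; the sample only imitates the shape of a values.get response.

```python
def pad_rows(gsheet):
    """Split a values.get response into (header, rows), padding short rows with None."""
    values = gsheet.get('values', [])
    if not values:
        return [], []
    header, data = values[0], values[1:]
    width = len(header)
    # Pad rows shorter than the header; drop any cells beyond the header width.
    rows = [(row + [None] * width)[:width] for row in data]
    return header, rows


# Hand-written stand-in for an API response (made-up data, not real output).
sample = {
    'range': 'Sheet1!A1:C3',
    'majorDimension': 'ROWS',
    'values': [['name', 'city', 'note'], ['Ann', 'Taipei'], ['Bob', 'Tainan', 'ok']],
}
header, rows = pad_rows(sample)
print(rows)
# prints [['Ann', 'Taipei', None], ['Bob', 'Tainan', 'ok']]
```

With pandas imported as in the script, pd.DataFrame(rows, columns=header) then builds the DataFrame in one step.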



3.2. Run

python gsheet_test.py


The user is asked whether to allow our Python program to access the Google Sheet.
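
The prompt should appear only on the first run: the script stores the granted token in credentials.json and reuses it as long as it remains valid. If you later change the scopes in the script, removing that file forces a new consent prompt on the next run. A minimal sketch of doing so; forget_consent is a hypothetical helper name, and the default path assumes the token file sits in the current working directory, as in the script.

```python
import os


def forget_consent(token_path="credentials.json"):
    """Delete the cached token file; return True if a file was removed."""
    try:
        os.remove(token_path)
        return True
    except FileNotFoundError:
        # Nothing cached yet, so the next run will prompt anyway.
        return False
```

Call forget_consent() once, then run python gsheet_test.py again to go through the consent screen with the new scopes.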


3.3. Result

4. Reference



5. Further Reading