Eastern Analytics - Blog

How to connect Azure Data Factory (ADF) to Google BigQuery

Written by Scott Pietroski | Oct 1, 2020 1:18:54 AM

Overview:

While working with Azure Data Factory (ADF) there will be times when you need to source data from or copy data to Google BigQuery. For the project at hand we needed to source data from Google BigQuery and copy it to an Azure SQL Database using Azure Data Factory (ADF). The following steps show you how to generate all of the Google BiqQuery OAuth 2.0 objects necessary to link BigQuery to ADF and so you can use an ADF Copy activity/function.

Prerequisites:

1. Google BiqQuery account that contains a project with tables you would like to extract

2. Azure Data Factory (ADF) that you will use to connect to Google BiqQuery

Google BiqQuery – OAuth 2.0 setup, clients, secrets and tokens

To obtain all of the OAuth 2.0 objects required by ADF during creation of a linked service you perform the following steps:

  1. Log into your Google BigQuery console. It can be found here: https://console.cloud.google.com/
  2. Select your project from the drop-down.
Select the project from which you would like to extract data

 

3. In the upper left hand corner select the menu item APIs & Services >> Credentials

Select APIs & Services >> Credentials

 

4. On the screen that follows next go to the top of the screen and select “Create Credentials” >> “OAuth 2.0 Client”.

Select +Create Credentials >> OAuth 2.0 Client

 

5. On the OAUTH2 client definition screen enter the following

A. Application Type: Web Application

B. Client Name: Give the client a name, whatever you want. This name is PERMANENT but will not be see outside this panel.

D. Authorized Redirect URI: Enter the following two URLs –

https://developers.google.com/oauthplayground

http://localhost:8080

 

Careful to make sure both URLs are in the redirect. This is required to get a refresh token for use by ADF.

 

6. Select “Create” to create your Google BigQuery OAuth 2.0 API client!

 

7. Record your Client ID and Client Secret. Remember to store it in a safe place. You will use it in the next couple of steps! Note: You can always access them again. They are stored with the client ID that you created and can be found by selecting the edit/pencil next to the client ID

New OAuth 2.0 Client ID we just created. You can access the details (secret, key, client ID) by selecting the pencil to the right

 

8. Now connect to the Google OAuth 2.0 Playground. The OAuth 2.0 Playground is a Google tool used to create properly formatted HTTP requests and to view the results. We are going to use this tool to obtain a refresh token from Google. The only way to obtain a refresh token is to call the API, and a refresh token is required to connect the Azure Data Factory (ADF) to Google BigQuery, so we are using this tool to generate the refresh token. The Google OAuth 2.0 Playground can be found here https://developers.google.com/oauthplayground

Screen shot of Google OAuth 2.0 Playground

 

9. Once you have reached the OAuth 2.0 playground – select the OAuth 2.0 Configuration button in the upper right corner of the screen, select “User your own OAuth Credentials” and enter in the client information obtained in step 5 above. After entering in the Oauth 2.0 credentials select close on the bottom:

 

10. Next – Expand the authorization list on the left in Step 1 and select (check) the “BigQuery Data Transfer API v1” >> “auth/BigQuery” option as shown below. Next – select the “Authorize APIs” button and enter in your BigQuery user name and password.

Select BigQuery Data Transfer API v1 (make sure it is checked as above), then choose Authorize APIs.

 

11. Next select “Exchange Authorization Code for Tokens”. Notice that the OAuth 2.0 Playground is making HTTP calls to the API in the right hand panel….

 

12. Scroll down in the right-hand Request / Response pane and look at the server response HTML. It contains a refresh token. We will use these to create the connection in Azure Data Factory (ADF). The token also appeared over on the left but is only visible for about 3 seconds. So to find it you will want to look at the response in the Request / Response window.

Refresh token provided by the Google BigQuery OAuth 2.0 Playground

 

13. Save the following information – you will need it while connecting Azure Data Factory (ADF) to Google BiqQuery.

A. Google BigQuery Project ID – Project to extract data from. Selected in Step 2.

B. Client ID – Technical ID of the client – generated in Step 7.

C. Client Secret – Client secret – generated in Step 7.

D. Refresh Token – Refresh token obtained from the HTTP response in Step 11.

 

Azure Data Factory (ADF) – Create Linked Service (connection) to Google BigQuery.



  1. Log into Azure Data Factory (ADF) and select the Manage button on the left hand side.

 

 

2. On the Manage screen left-hand menu under connections select ‘Linked Services’ then ‘New’.

 

 

3. In the New Linked Service box that appears, enter in the Search box “BigQuery” and then select the type when it appears in the search results below.

 

 

4. In the next screen, enter in the information obtained from Google BigQuery then test connection!

Highlighted fields should be set – entering in your own Google BiqQuery OAuth 2.0 credentials where needed.

 

 

Conclusion – Setting up a connection between Google BigQuery and Azure Data Factory (ADF)

At this point in time, OAuth 2.0 is the standard for connecting services via REST APIs. Most services have some variation of the steps required to connect BigQuery and ADF. The Google OAuth 2.0 Playground makes it easy to obtain refresh tokens you just need to know how/where to obtain each authorization component. Hopefully, this helps you thru the steps and made life a little easier. Have a great day!

Eastern Analytics, Inc. is an Analytics consulting company with a focus in Microsoft Azure, Power BI and cloud-based analytics platforms.

 

 

 

Written by Scott Pietroski

Partner / Eastern Analytics, Inc. Solution Architect - Design thru delivery. 25+ Years of Data Warehousing, BI, Analytics, ML Specialties: MS Azure, HANA, Data Engineering, and Machine Learning experience.
 
Scott can be reached at:  Scott.Pietroski@Eastern-Analytics.us