
Migrating Google Data to SharePoint Online Using PowerShell

Helping our customers migrate their business information into Office 365's SharePoint Online is one of Cardinal's core competencies, whether the source is on-premises SharePoint, another tenant (tenant-to-tenant migrations), file shares with metadata mapping, or any other third-party system. Our offices work constantly on these projects, across every size of content set, to help our customers move to the cloud. Many of these projects are under tight deadlines driven by business diversification, support cut-off dates, or plain cost-cutting measures.

To ensure speed and accuracy, we normally rely on an enterprise-level migration tool that integrates with SharePoint. This lets us move content consistently and on schedule, with complete metadata attached, even on short timelines.

That said, there are times when the source system exposes APIs (Application Program Interfaces) that vendors no longer want to support, or the available tools do not work as advertised. This happened recently during a large migration from Google Sites to SharePoint Online: only two vendor toolsets could be found that claimed to support the Google APIs, and neither would even connect to the Google Sites, let alone move the content. That left the team with no reports on the types or volume of documents stored in the File Cabinets and Attachments, relying instead on visual inspection of the content. With over 1,500 sites to walk through, manually tracking the sites and their content quickly became untenable.

Once the team realized that neither vendor's toolset would work against the Google Sites API, we decided to build custom Windows PowerShell scripts to query the Google Sites for their documents and metadata through the Google Data (GData) API. Right away we found that we had to determine whether each site was a Classic site from before November 2016 or a New site, since Google had moved the newer sites away from the GData API format. Luckily, most of the sites were built in Classic mode, so only one API format was needed.

Once we had an extract of Google Site URLs to work from and knew which API to target, we created a PowerShell script that would query a site for all of its File Cabinet and Attachment entries, collate the associated metadata (file name, updated date, author, CDN path), and write each file's metadata out to a CSV for the team to review. This let everyone see how much content each site held, and the CSV output later fed the next PowerShell script, which pulled each file down from the CDN and pushed it into a target SharePoint site's document library.

To get started, we broke connecting to the GData APIs down into its core requirements, then merged those pieces back together into a unified PowerShell script.

Create a Service Account:

The first step was to create a service account that would act as our broker for accessing all the data. Navigate to the Google API Dashboard (see References below), sign in with the Google admin credentials granted by the G Suite administrator, and create a project to hold the credentials. Once the project is created on the Dashboard, click the Credentials link on the left, open the Create Credentials dropdown, and select the OAuth Client ID choice.

On the creation page, select the Web Application choice, give it a descriptive name, and click on the Create button.

The screen will now reload showing us our credentials. Click on the name to load the details screen and copy the Client ID and Client Secret for the next step.

Authorize the Service Account with the right API:

Now, with an account in hand, we navigate to the Google Developers OAuth 2.0 Playground (see References below) and use the Client ID and Secret to generate an authorization token for making API queries. Click the gear icon in the upper right, check the Use your own OAuth credentials checkbox, and then update the OAuth 2.0 configuration with the Client ID and Secret.

Now, in the Step 1 Scopes box on the left side, enter the Classic Sites API scope we need to authorize (see References below), and then click on the Authorize APIs button.

An authorization screen will now appear; select the admin account you used above to create the credentials. The screen then refreshes to ask whether the requested scopes may be granted, so click the Allow button to finish.

The OAuth page will now show Step 2 - Exchange Authorization Code For Tokens. Check the Auto-refresh the token checkbox and then click the Exchange Authorization Code For Tokens button. This generates the Access Token we need to query Google's APIs. Finally, copy the Access Token from the textbox to use in our PowerShell scripts.

Query to evaluate the returned data:

With the Access Token captured, we move over to a web request tool to see how the data is returned from the Google API. I prefer Postman, but others such as SoapUI or RunScope.com will work as well. In the tool we just need to put together our Google Sites feed URL and the headers to pass, which include the Access Token we gathered from Google. After sending our GET request we can see that the returned body contains all the individual entries from the query (see the sketch after this list for a PowerShell equivalent). While Google Sites does not store much metadata about documents, we note the core values we'll use to pull each file down and then, after uploading to SharePoint, to set its metadata fields. Core values:

  • Content - the CDN path where the file actually resides
  • Updated - Modified Date
  • Title - the friendly display name of the file
  • Author - Modified By
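
For reference, a rough PowerShell equivalent of that Postman request is shown below. This is only a sketch: the feed URL format (domain and site name), the GData-Version header value, and the JSON property paths are assumptions based on Classic Sites API conventions, so adjust them to match what your own tool shows.

```powershell
# Hypothetical example: inspect the raw JSON returned by a Classic Sites content feed.
# Replace the token, domain, and site name placeholders with your own values.
$accessToken = "<Access Token copied from the OAuth 2.0 Playground>"
$feedUrl     = "https://sites.google.com/feeds/content/yourdomain.com/your-site-name?alt=json"

$headers = @{
    "Authorization" = "Bearer $accessToken"   # token generated above
    "GData-Version" = "1.4"                   # Classic Sites (GData) API version
}

# Invoke-RestMethod parses the JSON response into PowerShell objects for us.
$response = Invoke-RestMethod -Uri $feedUrl -Headers $headers -Method Get

# Each document appears as an entry node; surface the values we care about.
$response.feed.entry | ForEach-Object {
    [pscustomobject]@{
        Title   = $_.title.'$t'       # friendly file name
        Updated = $_.updated.'$t'     # modified date
        Author  = $_.author.name.'$t' # modified by
        Content = $_.content.src      # CDN path to the file
    }
}
```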

Query and export to CSV the API data:

Now that we know how the API formats its data, we can create a PowerShell script that queries a Google Site for its documents and logs them to a CSV. Using the code below, I'll call out the line numbers of the important processing; a condensed sketch of the script follows the walkthrough.

  1. Starting with lines 5 - 9, we take in the parameters needed to fire the query and output the values to a CSV. This also makes the script modular so it can be called from other PowerShell scripts or Scheduled Tasks.
  2. From lines 12 - 19 we set up our API query and the headers needed to authenticate.
  3. From lines 21 - 85 we wrap our core code in a Get-Results function so that it can be used recursively; Google allows FileCabinet pages to be nested within each other, hence the need for the function to call itself.
    • On line 23 we call PowerShell's Invoke-RestMethod with the values set up above, which gives us a secured query and returns the data in JSON format (JavaScript Object Notation) for object processing, stored in a variable called $jsOut.
    • Beginning on lines 25 - 77, we pass the $jsOut variable to the built-in ForEach method, which lets us loop through each entry node found in the JSON.
      • On lines 26 - 65, we set up all our row variables and finesse the data so it is suitable for pulling the documents down later. The core pieces are the $tmpTitle variable, the friendly name of the file, and the $tmpContSrc variable, the Google CDN path to download the file from.
      • On line 67, if the content source URL returned from Google is not empty, we build a CSV-formatted row from the variables and use the built-in Add-Content cmdlet to append the row to the output file passed in as a parameter.
      • If the content source URL variable is empty, then on line 72 we fall into a check for whether this is a FileCabinet link; if so, we recursively call our Get-Results function with the new 'parent' URL to run all the same logic against this child FileCabinet.
  4. The final piece, on line 87, starts everything off by calling the Get-Results function with the URL passed in as a parameter.
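
The full script is not reproduced here, but the following condensed sketch captures the same flow: parameters in, authenticated query, loop over the entries, CSV rows out, and recursion into nested FileCabinets. The parameter names, the feed URL format, the 'parent' query parameter, and the JSON property paths are assumptions for illustration, so its line numbers will not line up with the walkthrough above.

```powershell
param(
    [Parameter(Mandatory = $true)][string]$SiteFeedUrl,   # Classic Sites content feed URL for the site
    [Parameter(Mandatory = $true)][string]$AccessToken,   # token from the OAuth 2.0 Playground
    [Parameter(Mandatory = $true)][string]$OutputCsvPath  # CSV file to append document rows to
)

# Headers required by the GData API: the bearer token plus the GData version.
$headers = @{
    "Authorization" = "Bearer $AccessToken"
    "GData-Version" = "1.4"
}

# Write the CSV header row once, matching the order the migration script expects.
Add-Content -Path $OutputCsvPath -Value "Title,Updated,Author,ContentSrc"

function Get-Results {
    param([string]$QueryUrl)

    # Query the feed and let Invoke-RestMethod turn the JSON into objects.
    $jsOut = Invoke-RestMethod -Uri $QueryUrl -Headers $headers -Method Get

    foreach ($entry in $jsOut.feed.entry) {
        $tmpTitle   = ($entry.title.'$t' -replace ',', ' ')   # friendly name, commas stripped for CSV
        $tmpUpdated = $entry.updated.'$t'                     # modified date
        $tmpAuthor  = $entry.author.email.'$t'                # modified by
        $tmpContSrc = $entry.content.src                      # Google CDN path to the file

        if (-not [string]::IsNullOrEmpty($tmpContSrc)) {
            # A real document: append one CSV row.
            Add-Content -Path $OutputCsvPath -Value "$tmpTitle,$tmpUpdated,$tmpAuthor,$tmpContSrc"
        }
        elseif ($entry.category.label -eq 'filecabinet') {
            # FileCabinet pages can be nested, so query again for the children of this cabinet.
            $parentId = ($entry.id.'$t').Split('/')[-1]
            Get-Results -QueryUrl "$($SiteFeedUrl)?alt=json&parent=$parentId"
        }
    }
}

# Kick everything off with the feed URL passed in as a parameter.
Get-Results -QueryUrl "$($SiteFeedUrl)?alt=json"
```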

The image below shows the CSV file output from the script, with the header row of columns followed by rows of comma-separated values in the same order we'll use for migrating the documents later. The highlighted selections show the file name we'll upload, the last Modified Date, the user to set as the Modified By (if they still exist), and the Google CDN URL we'll need later to bring the file down for migration.

Migrate the documents to SharePoint Online:

With our Google Site documents pushed out into a CSV file, we can now use that file to feed a download / upload script that pulls the files down locally from Google and then pushes them to a targeted SharePoint document library. The script below shows just the core download / upload logic; it would normally be called by a master script that loops through all the rows of the CSV above with a ForEach, passing in the variables we want it to evaluate.
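
As an illustration, such a master loop might look like the sketch below. The script name Copy-GoogleFileToSPO.ps1, the parameter names, the CSV column names, and the target URL are placeholders that match the sketches in this post, not the production scripts.

```powershell
# Hypothetical driver: walk the CSV produced earlier and call the core
# download / upload script (sketched after the walkthrough below) once per row.
$googleToken = "<Access Token from the OAuth 2.0 Playground>"
$spoCred     = Get-Credential   # SharePoint Online account with rights to the target library

Import-Csv -Path "C:\Migration\GoogleSiteFiles.csv" | ForEach-Object {
    .\Copy-GoogleFileToSPO.ps1 `
        -FileTitle         $_.Title `
        -ModifiedDate      $_.Updated `
        -ModifiedBy        $_.Author `
        -GoogleCdnUrl      $_.ContentSrc `
        -GoogleAccessToken $googleToken `
        -TargetWebUrl      "https://yourtenant.sharepoint.com/sites/MigratedSite" `
        -LibraryName       "Documents" `
        -SubFolderPath     "" `
        -SpoCredential     $spoCred
}
```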

Again, using the code below, I'll call out the line numbers of the important processing; a condensed sketch of this script follows the walkthrough.

  1. Starting with lines 4 - 17, we take in the parameters needed to fire the script and do the necessary processing. While this seems like quite a few parameters, it makes the script modular so it can be called from other PowerShell scripts or Scheduled Tasks, as well as set at least the core file metadata.
  2. On line 20, we use the Import-Module command to ensure the SharePoint Online PowerShell cmdlets are loaded, in case the script is not being run from the SharePoint Online Management Shell.
  3. From lines 29 - 31 we instantiate a System.Net.WebClient object with headers carrying our Google Authorization Token and pull a copy of the file down to our temporary folder location. We need a local copy of the file before it can be uploaded to SharePoint Online.
  4. On lines 36 - 43 we set up our SharePoint objects and get a reference to the site and the library / subfolder where we want to upload the file.
  5. On lines 50 - 54 we instantiate an IO.FileStream object hooked to our locally downloaded file, then create a SharePoint FileCreationInformation object that references the FileStream.
  6. Line 55 checks whether a subfolder path was passed in.
    • If a subfolder path was passed in:
      • On lines 57 - 67 we test that the passed-in library / subfolder exists, get a reference to that location, and use that reference to upload the file.
    • If no folder was passed in, on line 70 we upload to the root of the library.
  7. On lines 72 - 73 we add our file upload to the SharePoint context and fire the ExecuteQuery() method to push the file up.
  8. Now that the file is uploaded, we have a reference we can use to update its metadata. On lines 76 - 86 we use a Try-Catch to test whether the user is still part of the company and, if so, build an Office 365 user id string to assign.
  9. Finally, on lines 89 - 96 we obtain a reference to the file's columns, set the values for the Title, Modified, and Editor fields, and fire an update on the reference with a final ExecuteQuery().
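
The original listing is not reproduced here either, so the sketch below condenses the core download / upload logic into one script. It assumes the SharePoint Online Client Components SDK (CSOM assemblies) is installed at its default path, uses the placeholder parameter names from the master-loop sketch above, and its line numbers will not match the walkthrough; treat it as an outline of the approach rather than the production script.

```powershell
param(
    [Parameter(Mandatory = $true)][string]$FileTitle,           # friendly file name from the CSV
    [Parameter(Mandatory = $true)][string]$ModifiedDate,        # original modified date
    [Parameter(Mandatory = $true)][string]$ModifiedBy,          # original author's email address
    [Parameter(Mandatory = $true)][string]$GoogleCdnUrl,        # Google CDN path to download from
    [Parameter(Mandatory = $true)][string]$GoogleAccessToken,   # token for the Google download
    [Parameter(Mandatory = $true)][string]$TargetWebUrl,        # SharePoint Online site URL
    [Parameter(Mandatory = $true)][string]$LibraryName,         # target document library
    [string]$SubFolderPath = "",                                 # optional subfolder in the library
    [Parameter(Mandatory = $true)][pscredential]$SpoCredential,  # SharePoint Online credentials
    [string]$TempFolder = "C:\Temp\GoogleMigration"              # local staging folder
)

# Load the CSOM assemblies (default SDK install path; adjust if yours differs).
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"

# 1. Download the file from the Google CDN using the Authorization header.
$webClient = New-Object System.Net.WebClient
$webClient.Headers.Add("Authorization", "Bearer $GoogleAccessToken")
$webClient.Headers.Add("GData-Version", "1.4")
if (-not (Test-Path $TempFolder)) { New-Item -ItemType Directory -Path $TempFolder | Out-Null }
$localPath = Join-Path $TempFolder $FileTitle
$webClient.DownloadFile($GoogleCdnUrl, $localPath)

# 2. Connect to the target SharePoint Online site.
$ctx = New-Object Microsoft.SharePoint.Client.ClientContext($TargetWebUrl)
$ctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($SpoCredential.UserName, $SpoCredential.Password)
$list = $ctx.Web.Lists.GetByTitle($LibraryName)
$ctx.Load($list.RootFolder)
$ctx.ExecuteQuery()

# 3. Pick the target folder: a subfolder if one was passed in, otherwise the library root.
if (-not [string]::IsNullOrEmpty($SubFolderPath)) {
    $targetFolder = $ctx.Web.GetFolderByServerRelativeUrl("$($list.RootFolder.ServerRelativeUrl)/$SubFolderPath")
} else {
    $targetFolder = $list.RootFolder
}

# 4. Stream the local copy into a FileCreationInformation object and upload it.
$fileStream = New-Object System.IO.FileStream($localPath, [System.IO.FileMode]::Open)
$fileInfo = New-Object Microsoft.SharePoint.Client.FileCreationInformation
$fileInfo.ContentStream = $fileStream
$fileInfo.Url = $FileTitle
$fileInfo.Overwrite = $true
$uploadFile = $targetFolder.Files.Add($fileInfo)
$ctx.Load($uploadFile)
$ctx.ExecuteQuery()
$fileStream.Close()

# 5. Set the metadata: Title, Modified, and (if the user still exists) Editor.
$listItem = $uploadFile.ListItemAllFields
$listItem["Title"]    = $FileTitle
$listItem["Modified"] = [datetime]$ModifiedDate
try {
    # Office 365 claims-format login for the original author.
    $user = $ctx.Web.EnsureUser("i:0#.f|membership|$ModifiedBy")
    $ctx.Load($user)
    $ctx.ExecuteQuery()
    $userValue = New-Object Microsoft.SharePoint.Client.FieldUserValue
    $userValue.LookupId = $user.Id
    $listItem["Editor"] = $userValue
} catch {
    Write-Warning "User $ModifiedBy no longer exists; leaving Editor as the migration account."
}
$listItem.Update()
$ctx.ExecuteQuery()
```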

While it may seem like a large amount of work, putting all of this together gave us a fully functional process for querying Google Sites for all of their documents and migrating them into SharePoint Online. The biggest hurdles were Google's authentication requirements and getting a listing of every Google Site's file locations so they could be migrated programmatically, including figuring out that recursion was needed. But that's the type of innovative approach Cardinal Solutions can help your business with. Just let us know how we can help your move to the cloud.


References


About The Author

Business Productivity Managing Consultant
Craig is a Managing Consultant for Cardinal Solutions' Charlotte office. He leads client initiatives on Microsoft platforms to help solve business problems by envisioning and architecting Office 365, SharePoint, .NET, and JavaScript/jQuery solutions. Craig has worked with Microsoft development and server technologies for the last 15 years, including integrating SharePoint since it was a beta product.