Google Sheets Import & Export

lukaskrivka/google-sheets

Import data from datasets or JSON files to Google Sheets. Programmatically process data in Sheets. Easier and faster than the official Google Sheets API and perfect for importing data from scraping.


Features

The Google Sheets Import & Export Data tool is an easier alternative to the official Google Sheets API.

It can be used to automate the import and export of data between different spreadsheets, from datasets produced by actors running on the Apify platform, or from raw JSON files.

  • From Apify storage - To upload data from datasets generated by other tools and stored on the Apify platform, simply provide the ID of the dataset.

  • In raw JSON format - Use this actor as a standalone API to import data to your spreadsheet.

Both options behave in the same way regarding the available input fields (e.g. mode, transformFunction, deduplication, etc.).

Tutorial

If you would like a step-by-step tutorial on how to use the Google Sheets Import & Export tool, read our blog post on how to import data from Google Sheets.

You can use this actor with any programming language (e.g. JavaScript, Python, PHP) by calling the Apify API.

If you want to fully understand the inner workings of this actor, visit the Google Sheets API reference page.

Cost of Usage

The Google Sheets Data Import & Export tool is free to use. However, if you are running this actor on the Apify platform, the resources you consume while using the actor (e.g. proxies and data storage) will be deducted from the platform credits in your Apify account.

Note that free accounts are entitled to 5 dollars of platform credits every month. So, as long as your resource consumption stays within this limit, this tool remains completely free. If you need to scale your activities on the platform, consider upgrading your plan.

Authentication and authorization

If you are using the Google Sheets Import & Export Data tool for the first time, you have to allow Apify to work with the spreadsheets in the same Google account where your target spreadsheet is located.

To do that, please check this article for a step-by-step guide on how to set up a Google integration and enable Apify to access your spreadsheets.

After you authorize for the first time, the authentication tokens will be stored in your key-value store (the tokensStore option, which is google-oauth-tokens by default) and you don't need to authorize again. So, you can fully automate your workflow after your first usage.

Note: To integrate multiple Google accounts with a single Apify account, each Google account will need its own tokensStore. So, remember to name each store properly so you can efficiently track which tokens belong to which account, as in the example below.
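
For illustration, a run that works under a second Google account might point tokensStore to a differently named store. This is just a sketch; the spreadsheet ID and store name below are placeholders:

{
    "mode": "read",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr",
    "tokensStore": "google-oauth-tokens-second-account"
}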

If you don't want to use the old tokens, you can always delete the tokensStore (default google-oauth-tokens) from the Storage -> Key Value Stores tab in the Apify Console and authorize again in the next run.

Public spreadsheet (no authorization)

If you only want to use this tool's read function and don't want to bother with authorization, you can publish your spreadsheet and make it public. This enables you to use this tool without the need to provide authentication tokens. To run this actor for a public spreadsheet, just check the Public spreadsheet (read-only) option, which is the equivalent of passing publicSpreadsheet: true to the input options.
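
A minimal input for reading a published spreadsheet, assuming the spreadsheet ID below is replaced with your own, could look like this:

{
    "mode": "read",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr",
    "publicSpreadsheet": true
}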

To limit the possibility of abuse of our Google API, the "read public spreadsheet" functionality is only available inside the Apify platform using the secret environment variable.

If you want to read public spreadsheets locally, you have to create your own project in Google console and pass an API_KEY environment variable to your actor process. Example: API_KEY=AIzaSyAPijSDFsdfSSf3kvGVsdfsdfsdsdnAVbcZb5Y apify run -p (replace with a real API key from Google console)

Running the code outside the Apify platform

You can run Google Sheets Import & Export Data locally on your machine. To do that, simply access the actor's source code on its GitHub page and clone it to your computer.

The official tool (running on the Apify platform) relies on the CLIENT_SECRET environment variable being set. This ensures that the official API integration is used. If you want to use this actor locally or copy the source code, you will need to create your own project in the Google Developer Console, create your own credentials, and pass them correctly to the googleCredentials input variable. You can find further instructions in the Apify Google Auth library.

However, to ensure the best possible performance, consider always running it on the Apify platform.

Input

Most Apify actors require a JSON input and this one is no exception. The input consists of one object with multiple options (a complete example follows the list):

  • options<object>
    • mode <string> Any of replace, append, modify, read, load backup. Function details are explained in the section below. Required
    • spreadsheetId <string> Id of your spreadsheet. It is the long hash in your spreadsheet URL. Required
    • publicSpreadsheet <boolean> If true, you don't need to authorize. You have to publish your spreadsheet and it works only in read mode. More in authorization.
    • datasetId <string> Id of the dataset where the data you want to import are located. This option or rawData is mandatory for replace and append modes and cannot be used in other modes.
    • rawData <array> Array of raw JSON data. Can be either in table format (array of arrays) or in the usual dataset format (array of objects). Objects can be nested, but arrays cannot. Raw data cannot exceed 9MB. This option or datasetId is mandatory for replace and append modes and cannot be used in other modes.
    • backupStore <string> Id of the store where the previous backup was saved. It is the id of the default key-value store of the run from which you want to load the backup. This option is mandatory for "load backup" mode and not usable in other modes.
    • limit <number> Defines how many items (rows) you want to import. Default: 250000.
    • offset <number> Defines how many items you want to skip from the beginning. Default: 0.
    • range <string> Defines which part of your spreadsheet will be impacted by the actor. It is specified in A1 notation. Usually, you want to just add a sheet name. Default: Name of the first sheet in the spreadsheet.
    • columnsOrder <array> Array of keys. First sorts the columns by provided keys. The rest is sorted alphabetically. Default: Alphabetical sorting
    • keepSheetColumnOrder <boolean> If true, keeps the order of columns as they are in the sheet. If there is no sheet data yet, this does nothing. Default: false
    • tokensStore <string> Defines in which key-value store authorization tokens are stored. This applies to both where they are initially stored and where they are loaded from on each subsequent run. Default: "google-oauth-tokens".
    • deduplicateByEquality <boolean> If true, only unique items (rows) are imported. Two items are considered duplicates if all of their fields are equal (deep equality). Only one of deduplicateByEquality, deduplicateByField and transformFunction can be specified! Default: false.
    • deduplicateByField <string> Similar to deduplicateByEquality, but uniqueness is checked using only the one specified field, which means the other fields may differ but the item will still not be imported. Only one of deduplicateByEquality, deduplicateByField and transformFunction can be specified! Default: null.
    • transformFunction <string> Custom function that can filter or modify the items in any way. Its requirements and behavior differ for each mode. Only one of deduplicateByEquality, deduplicateByField and transformFunction can be specified! Default: null.
    • createBackup <boolean> If true, then after obtaining the data from the spreadsheet and before any manipulation, the data are stored in the default key-value store under the key backup. They can be loaded in a future run using the load backup mode. Useful when you are not sure what you are doing and already have valuable data in the spreadsheet. Default: false.
    • googleCredentials <object> Only pass this when running outside the official actor (locally or elsewhere). This should contain installed-type credentials, passed as an object with these fields: { client_id, client_secret, redirect_uri }.
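
For example, a complete input that appends a dataset to a sheet might look like this (the spreadsheet and dataset IDs are placeholders):

{
    "mode": "append",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr",
    "datasetId": "<YOUR-DATASET-ID>",
    "range": "Sheet1"
}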

Mode Options

This actor can be run in multiple different modes. Each run requires exactly one mode to be selected. The chosen mode also affects how the remaining input options work (details are explained in the specific options).

  • replace: Replaces the old data in a sheet with the newly imported data.

  • append: Adds new data as additional rows below the old rows already present in the sheet. Keep in mind that the columns are recalculated so some of them may move to different cells if new columns are added in the middle.

  • modify: This mode does not import anything. It only loads the data from your sheets and applies any of the processing settings defined in the input options.

  • read: Loads the data from the spreadsheet. Optionally, it can also process the data and save it as an 'OUTPUT' JSON file to the default key-value store.

  • load backup: Loads the backed-up rows from a previous run (see the createBackup and backupStore options for details) and imports them to a sheet in replace mode (see the example below).
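
As a sketch, restoring a backup created with the createBackup option might use an input like this (the spreadsheet ID and store ID are placeholders):

{
    "mode": "load backup",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr",
    "backupStore": "<KEY-VALUE-STORE-ID-OF-THE-BACKUP-RUN>"
}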

Limits and actor logic

This actor has a maximum limit of 100 runs (imports) per 100 seconds. If this limit is exceeded, the actor run will fail and no data will be imported.

Important! - The maximum number of cells in the whole spreadsheet is 2 million! If an import would ever exceed this limit, the actor throws an error, finishes and does not import anything. In that case, split the data across more spreadsheets.

Important! - Regardless of the mode you choose, the tool recalculates the position of the data in the sheet. All cells are updated to reflect the specified changes and any excess rows and columns are trimmed off.

Google Sheets value parser

The actor parses values the same way Google Sheets does by default.

Therefore, depending on the configuration of your system, values such as "1.1" and "1,1" can be interpreted either as a number or as a string (text). For this reason, it is recommended that you always use valid JSON numbers (e.g. 1.1).

Loading from Spreadsheet

With read mode, you can use this actor to load data from your spreadsheet into another actor or program.

From an actor, Puppeteer Scraper, Cheerio Scraper or any Node.js program, you can use the Apify.call or Apify.callTask (if you want to preset the rest of the input) functions.

const sheetsInput = {
    mode: 'read',
    spreadsheetId: '1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr', // update to your ID
};
const myData = await Apify.call('lukaskrivka/google-sheets', sheetsInput);

When calling from other programming languages, from a browser or from Web Scraper, you have to use the regular run-sync API. Here is an example from a browser/Web Scraper using the native browser fetch call.

const runUrl = `https://api.apify.com/v2/acts/lukaskrivka~google-sheets/run-sync?token=<YOUR_API_TOKEN>`
const sheetsInput = {
    mode: 'read',
    spreadsheetId: '1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr', // update to your ID
};
const fetchOptions = {
    method: 'POST', // the run-sync endpoint expects a POST request
    body: JSON.stringify(sheetsInput),
    headers: { 'Content-Type': 'application/json' },
};
const myData = await fetch(runUrl, fetchOptions).then((response) => response.json());

myData has a row-object format, which means it is an array where each row is represented by an object of fields and values:

[
    { name: 'Alan', surname: 'Turing' },
    { name: 'Steve', surname: 'Jobs'},
    // ...
]

Webhooks

Webhooks are the way in which apps communicate with each other. Consequently, webhooks are particularly useful when you need to automatically push data from one app to another.

How to use webhooks with the Google Sheets actor?

Very often you need to update the contents of a spreadsheet after every run of your scraping/automation actors. Luckily, you can use webhooks to easily automate this workflow.

Note: The datasetId is automatically passed to the Google Sheets run. So, you don't need to manually set it up in the payload template.

The webhook from your scraping/automation run can call the Google Sheets actor either directly or as a task. If you call the actor directly, you have to fill in the payload template with the appropriate input and use this URL: https://api.apify.com/v2/acts/lukaskrivka~google-spreadsheet/runs?token=<YOUR_API_TOKEN>
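
For illustration, such a payload template might contain only the options that stay the same between runs - the datasetId is filled in automatically, as noted above (the spreadsheet ID is a placeholder):

{
    "mode": "append",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr"
}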

Usually it is more convenient to create a task with predefined input that will not change in every run - the only changing part is usually datasetId. You will not need to fill in the payload template, and your webhook URL will then look like: https://api.apify.com/v2/actor-tasks/<YOUR-TASK-ID>/runs?token=<YOUR_API_TOKEN>

Don't forget that you need to run this actor manually the first time so that you properly authorize and authenticate.

Deduplication options and transform function

If you need to import only unique items or add custom functionality to the actor, you can use one of the following options: deduplicateByField, deduplicateByEquality or transformFunction (a minimal example follows the list below).

  • deduplicateByEquality: Only unique items (rows) are kept in the data. If two items have all fields the same, they are considered duplicates and are removed from the data.

  • deduplicateByField: Similar to deduplicateByEquality, but the uniqueness of items is compared using only one field. So if one item has a certain value in this field, all other items with the same value are considered duplicates and are removed from the data.

    • append: Old and new data is put together and checked for duplicates. Only the first item is kept if duplicates are found.
    • replace: Works like append but cares only about new data.
    • modify: Works like replace but cares only about old data.
    • read: Works the same as modify.
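
As a sketch, deduplicating appended items by a single field might look like this (the IDs are placeholders and sku is just an example field):

{
    "mode": "append",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr",
    "datasetId": "<YOUR-DATASET-ID>",
    "deduplicateByField": "sku"
}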

Transform function

The transformFunction option enables you to add complex data-filtering logic and gives you flexibility in handling your data.

To use it, you should provide a stringified JavaScript function that receives the data as a parameter and returns the transformed data.

The data format is similar to the JSON format of the datasets. However, all the nested structures (objects and arrays) are flattened. This means that it is basically an array of objects (items) with flattened fields, which we call the row-object format (see the example below).

[{
    "sku": "234234234",
    "country": "US",
    "price": 20.99,
    "sizes/0": "S",
    "sizes/1": "M",
    "sizes/2": "L",
    "sizes/3": "XL"
},
{
    "sku": "123123123",
    "country": "UK",
    "price": 48.49,
    "sizes/0": "M",
    "sizes/2": "XL"
}]

The function should always return an array in the row-object format, which will first be converted to rows format and then imported to the sheet. The parameters differ based on the mode:

  • append: The function will receive an object with spreadsheetData and datasetData properties as a parameter. spreadsheetData is a row-object array of the data you already have in the spreadsheet. datasetData is a row-object array of the items from the dataset.

  • replace: The function will receive an object with a datasetData property as a parameter. It is a row-object array of the items from the dataset.

  • modify: The function will receive an object with a spreadsheetData property as a parameter. It is a row-object array of the data you already have in the spreadsheet.

  • read: Works the same as modify.

Use-case example: Let's consider a situation where we need to always retrieve the cheapest product for each country, using the append mode.

({ datasetData, spreadsheetData }) => {
    // First we put the data together into one array
    const allData = datasetData.concat(spreadsheetData);

    // We define an object that will hold a state about which item is the cheapest for each country
    const stateObject = {};

    // Now let's loop over the data and update the object
    allData.forEach((item) => {
        // If the item doesn't have price or country field, we will throw it away
        if (!item.price || !item.country) return;

        // If the state doesn't hold the country, we add the first item there to hold the current position of the cheapest item
        if (!stateObject[item.country]) {
            stateObject[item.country] = item;
        } else if (item.price < stateObject[item.country].price) {
            // If the state already holds the country, let's compare whether the new item is cheaper than the old one and, if so, replace it
            stateObject[item.country] = item;
        }
    });

    // Once we have gone through all the items, let's convert our state object back to the right array format
    const finalData = Object.values(stateObject);
    return finalData;
}
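
Because transformFunction is passed as a string in the JSON input, a simpler filter could be embedded directly, for example (the field name and threshold are just for illustration):

{
    "mode": "modify",
    "spreadsheetId": "1anU4EeWKxHEj2mAnB0tA2xGnkTdqXBSB76a7-FRLytr",
    "transformFunction": "({ spreadsheetData }) => spreadsheetData.filter((item) => item.price < 50)"
}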

Raw data import

If you want to send the data in raw JSON format, you need to pass the data to the rawData input parameter. You will also need to have an Apify account so we can properly store your Google authentication tokens (you can opt out anytime).

Important! - Raw data cannot exceed 9MB, which is the default limit for Apify actor inputs. If you want to upload more data, you can easily split it into more runs (they are fast and cheap).

Table format (array of arrays)

rawData should be an array of arrays where each of the arrays represents one row in the sheet. The first row should be a header row where the field names are defined. Every other row is a data row.

It is important to have a proper order in each array. If the field is null for some row, the array should contain an empty string in that index. Data rows can have a smaller length than the header row but if they are longer the extra data will be trimmed off.

Arrays cannot contain nested structures like objects or other arrays! You have to flatten them in a format where / is a delimiter. E.g. personal/hobbies/0.

"rawData": [
    ["name", "occupation", "email", "hobbies/0", "hobbies/1"],
    ["John Doe", "developer", "john@google.com", "sport", "movies with Leonardo"],
    ["Leonardo DiCaprio", "actor", "leonardo@google.com", "being rich", "climate change activism"]
]

Dataset format (array of objects)

rawData should be an array of objects where each object represents one row in the sheet. The keys of the objects will be transformed to a header row and the values will be inserted to the data rows. Objects don't need to have the same keys. If an object doesn't have a key that another object has, the row will have an empty cell in that field.

The objects can contain nested structures (objects and arrays), but in that case the actor will call the Apify API to flatten the data, which can take a little more time on large uploads, so prefer flattened data.

Nested:

"rawData": [
    {
        "name": "John Doe",
        "email": "john@google.com",
        "hobbies": ["sport", "movies with Leonardo", "dog walking"]
    },
    {
        "name": "Leonardo DiCaprio",
        "email": "leonardo@google.com",
        "hobbies": ["being rich", "climate change activism"]
    }
]

Flattened:

"rawData": [
    {
        "name": "John Doe",
        "email": "john@google.com",
        "hobbies/0": "sport",
        "hobbies/1": "movies with Leonardo",
        "hobbies/2": "dog walking"
    },
    {
        "name": "Leonardo DiCaprio",
        "email": "leonardo@google.com",
        "hobbies/0": "being rich",
        "hobbies/1": "climate change activism"
    }
]

Changelog

A detailed list of changes is in the CHANGELOG.md file.

Latest update - Version 2 (2020-10-08)
