Labelbox

Data Import

Labelbox has two ways of importing data:

  1. Row by Row: Good for real-time syncing and fine-grained control
  2. Bulk Import: Good for importing large amounts of data

Row by Row

To import data, you will need the following IDs:

  • Your User ID
  • Your Organization ID
  • A Project ID

Use this query to collect those IDs.

Run this query

query {
  user {
    id
    projects {
      id
      name
    }
  }
}
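
The response will look something like this (the IDs here are illustrative placeholders):

{
  "data": {
    "user": {
      "id": "<YOUR_USER_ID>",
      "projects": [
        {
          "id": "<A_PROJECT_ID>",
          "name": "Example Project"
        }
      ]
    }
  }
}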

You can create a new dataset with this query, then save the returned ID for the datarow query below.

Run this query

mutation {
  createDataset(
    data:{
      name: "<INSERT_NAME_HERE>",
      projects: {
        connect: [{id: "<INSERT_PROJECT_ID_HERE>"}]
      }
    }
  ) {
    id
  }
}
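
The response contains the new dataset ID (shown here as a placeholder):

{
  "data": {
    "createDataset": {
      "id": "<NEW_DATASET_ID>"
    }
  }
}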

A datarow represents a single piece of data that needs to be labeled. For example, if you have a CSV with 100 rows, you will have 100 datarows.

The Labelbox API is rate limited to 300 requests per minute. We recommend sending datarow import requests one after another rather than in parallel.
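
As an illustration, a minimal client-side throttle might look like the sketch below. It reuses the create_datarow helper from the Python example further down; the 0.2-second pause keeps a single worker at or under 300 requests per minute.

import time

def import_rows_sequentially(dataset_id, rows):
    datarow_ids = []
    for row in rows:
        # One request at a time; 60s / 300 requests = 0.2s per request
        datarow_ids.append(create_datarow(dataset_id, row['image_url'], row['external_id']))
        time.sleep(0.2)
    return datarow_ids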

Run this query

mutation {
  createDataRow(
    data: {
      rowData: "<DATA_THAT_NEEDS_TO_BE_LABELED>",
      dataset: {
        connect: {
          id: "<DATASET_ID_HERE>"
        }
      },
    }
  ) {
    id
  }
}

Python example

# Two things to run this script
# 1. run "pip install graphqlclient"
# 2. Fill in <YOUR-API-KEY-HERE> (https://app.labelbox.com/settings/apikey)

import json
from graphqlclient import GraphQLClient
client = GraphQLClient('https://api.labelbox.com/graphql')
client.inject_token('Bearer <YOUR-API-KEY-HERE>')

def create_dataset(name):
    res_str = client.execute("""
    mutation CreateDatasetFromAPI($name: String!) {
      createDataset(data:{
        name: $name
      }){
        id
      }
    }
    """, {'name': name})

    res = json.loads(res_str)
    return res['data']['createDataset']['id']


def create_datarow(dataset_id, image_url, external_id):
    res_str = client.execute("""
    mutation CreateDataRowFromAPI($image_url: String!, $external_id: String!, $dataset_id: ID!){
      createDataRow(
        data: {
          rowData: $image_url,
          externalId: $external_id,
          dataset: {
            connect: {
              id: $dataset_id
            }
          },
        }
      ) {
        id
      }
    }
    """, {
      'dataset_id': dataset_id,
      'image_url': image_url,
      'external_id': external_id
    })

    res = json.loads(res_str)
    return res['data']['createDataRow']['id']

images = [{
  "external_id": "ab65d5e99w13",
  "image_url": "https://storage.googleapis.com/labelbox-example-datasets/tesla/104836109-p100d-review-5.1910x1000.jpeg"
},  {
  "external_id": "ljk6s544a7f8",
  "image_url": "https://storage.googleapis.com/labelbox-example-datasets/tesla/2017-Tesla-Model-3-top-view.jpg"
}]

if __name__ == "__main__":
    dataset_id = create_dataset('Dataset Created Through API')
    print('Created Dataset: ', dataset_id)
    for image in images:
        datarow_id = create_datarow(dataset_id, image['image_url'], image['external_id'])
        print('Created Data: ' + datarow_id)
    print('See your new dataset here: https://app.labelbox.com/dataset/' + dataset_id)

Bulk Import

To run a bulk import through the API, you'll need to do three things (see the Python example at the end):

  1. Provide a Dataset ID
  2. Provide a URL to a JSON file
  3. Start an import job

1. Provide a Dataset ID

Either choose an existing dataset or create a new one. Below is an example of how to create a new dataset.

Run this query

mutation {
  createDataset(
    data:{
      name: "<INSERT_NAME_HERE>",
      projects: {
        connect: [{id: "<INSERT_PROJECT_ID_HERE>"}]
      }
    }
  ) {
    id
  }
}

2. Provide a URL to a JSON file

Next, you'll need to create and upload a JSON file with the same format as a JSON import through the UI (docs here). See the end-to-end example at the bottom of this page.
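
For reference, a minimal JSON file using the image fields from the end-to-end example below looks like this (the external ID and URL are placeholders):

[
  {
    "externalId": "<YOUR_EXTERNAL_ID>",
    "imageUrl": "<PUBLIC_URL_TO_YOUR_IMAGE>"
  }
]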

3. Start an import job

Run this query

mutation AppendRowsToDataset {
  appendRowsToDataset(
    data:{
      datasetId: "<DATASET_ID_HERE>",
      jsonFileUrl: "<JSON_FILE_URL_HERE>"
    }
  ){
    accepted
  }
}
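
If the import job starts successfully, the response should look like:

{
  "data": {
    "appendRowsToDataset": {
      "accepted": true
    }
  }
}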

End-to-end Python example

The script below will create a new dataset, upload a temporary JSON file, and start a bulk import. To get it running, make these two changes:

  1. Run pip install graphqlclient requests
  2. Fill in <API-KEY-HERE> (https://app.labelbox.com/settings/apikey)

# Two things to run this script
# 1. run "pip install graphqlclient requests"
# 2. Fill in <API-KEY-HERE> (https://app.labelbox.com/settings/apikey)

# This script will do the following
# - Create a new dataset "Example Bulk Import Dataset"
# - Upload an expiring json file to file.io
# - Start a bulk import
import json
import requests
from graphqlclient import GraphQLClient
client = GraphQLClient('https://api.labelbox.com/graphql')
client.inject_token('Bearer <API-KEY-HERE>')


def create_dataset(name):
    res_str = client.execute("""
    mutation CreateDatasetFromAPI($name: String!) {
      createDataset(data:{
        name: $name
      }){
        id
      }
    }
    """, {'name': name})

    res = json.loads(res_str)
    return res['data']['createDataset']['id']


def start_bulk_import(dataset_id, json_file_url):
    res_str = client.execute("""
      mutation AppendRowsToDataset($datasetId: ID!, $jsonFileUrl: String!) {
        appendRowsToDataset(
          data:{
            datasetId: $datasetId,
            jsonFileUrl: $jsonFileUrl
          }
        ){
          accepted
        }
      }
    """, {'datasetId': dataset_id, 'jsonFileUrl': json_file_url})

    res = json.loads(res_str)
    return res['data']['appendRowsToDataset']['accepted']


"""
file.io is a free file uploading service
files will expire after 1 day or after the first time
they are downloaded (can't download twice)
this function can easily be swapped out for an s3 upload function
"""
def upload_to_file(dict):
    data = str.encode(json.dumps(dict))
    files = {'file': data}
    r = requests.post("https://file.io/?expires=1d", files=files)
    file_info = json.loads(r.text)
    return file_info['link']


example_image = {
    "externalId": "ab65d5e99w13",
    "imageUrl": "https://storage.googleapis.com/labelbox-example-datasets/tesla/104836109-p100d-review-5.1910x1000.jpeg"
}

example_image_with_asset_info = {
    "externalId": "ljk6s544a7f8",
    "imageUrl": "https://storage.googleapis.com/labelbox-example-datasets/tesla/2017-Tesla-Model-3-top-view.jpg",
    "info": {
        "type": "TEXT",
        "value": "This text will provide extra information about this image."
    }
}

example_tile_layer = {
    "tileLayerUrl": "https://public-tiles.dronedeploy.com/1499994155_DANIELOPENPIPELINE_ortho_qfs/{z}/{x}/{y}.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wdWJsaWMtdGlsZXMuZHJvbmVkZXBsb3kuY29tLzE0OTk5OTQxNTVfREFOSUVMT1BFTlBJUEVMSU5FX29ydGhvX3Fmcy8qIiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoyMTQ1OTE0MTE4fX19XX0_&Signature=O~50rrGXdEC6Hi8jPJ3dbT~UtBd7Cw6iQPTxdJ8LU2IaoxeP22R3JpKPkLN3T3~Lcw3CyX7uft2Baj0MH93qUoCYyN~~jNX3OMkYV2jbrHDezf6zQRHAabXX-L2bL-JEGfFL6z3DWccOFeCH56CuhgC29k5CJx7I34P-LQJdnAUsA-KaqKH1IyYsHStRIfmMzdXNAWU58FTfqVljq9SbKXxfgdr2SZ~7VgLaZ8IhA0WnlKUo-JgqTd~jYa5mGCpR8351IMK0aMuY4Mld4SOXssQ-rOtlZtypvo8FDp474TlGIEGz5PHxGOPsqLPF19hEYTgoPqsUj8QEuiTfg-cmsg__&Key-Pair-Id=APKAJXGC45PGQXCMCXSA",
    "bounds": [
        [
            37.86857121694444,
            -122.32616227416666
        ],
        [
            37.87676075527778,
            -122.31316180916666
        ]
    ],
    "minZoom": 10,
    "maxZoom": 23,
    "epsg": "EPSG4326"
}

if __name__ == "__main__":
    json_info = [example_image, example_image_with_asset_info, example_tile_layer]
    url_to_json_file = upload_to_file(json_info)
    dataset_id = create_dataset('Example Bulk Import Dataset')
    accepted = start_bulk_import(dataset_id, url_to_json_file)
    print('Import accepted:', accepted)
    print('Go to https://app.labelbox.com/data and see your new dataset "Example Bulk Import Dataset"')
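
As the docstring in upload_to_file notes, file.io can be swapped out for any host that Labelbox can fetch the JSON file from. Below is a minimal S3-based sketch, assuming boto3 is installed (pip install boto3) and AWS credentials are configured; the bucket name and key are placeholders.

import json
import boto3

# Hypothetical drop-in replacement for upload_to_file, backed by S3.
def upload_to_s3(payload, bucket='<YOUR-BUCKET-HERE>', key='labelbox-import.json'):
    s3 = boto3.client('s3')
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode())
    # A presigned URL lets Labelbox fetch the file without making the bucket public
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=3600
    )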