Tuesday, September 13, 2011

Google App Engine: Easy Bulk Download and Upload Tutorial

After wasting a lot of time figuring out how to bulk download data from appspot.com and upload it to localhost, I'm writing down what I learned so you don't lose as much time as I did on something that should be trivial.

Google's documentation is here: http://code.google.com/appengine/docs/python/tools/uploadingdata.html. It is straightforward except for one part: which command-line options to pass to appcfg.py to download data from, and upload data to, appspot or localhost.

Step 1: Configure the app.yaml file

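In app.yaml, enable the remote_api builtin. This is the handler (served at /_ah/remote_api) that appcfg.py talks to:

builtins:
- remote_api: on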

Step 2: Create the bulk loader yaml file

appcfg.py create_bulkloader_config --filename=bulkloader.yaml 

Open the generated file and search for "TODO". The connector lines are left as TODOs; fill each one in with csv so the loader reads and writes CSV. Each connector line should then read:
connector: csv
Of course, you can pick another connector if CSV doesn't cut it for you.
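If the generated file has many kinds, editing every connector by hand gets tedious. Here is a minimal sketch that fills them all in at once. It assumes each placeholder line looks like "connector: # TODO: ..." (check your generated file first); the function name set_csv_connectors is my own, not part of the SDK.

```python
import re

# Matches a "connector:" key whose value is still a "# TODO" comment,
# keeping the original indentation. Assumes the generated file's layout.
_TODO_CONNECTOR = re.compile(r"^([ \t]*connector:)[ \t]*#[ \t]*TODO.*$",
                             re.MULTILINE)


def set_csv_connectors(config_text):
    """Return config_text with every TODO connector replaced by csv.

    Lines that already name a connector (e.g. "connector: csv") and
    other keys such as "connector_options:" are left untouched.
    """
    return _TODO_CONNECTOR.sub(r"\1 csv", config_text)
```

To use it, read bulkloader.yaml into a string, pass it through set_csv_connectors, and write the result back (keep a copy of the original in case the pattern doesn't match your file).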

If your models use ListProperty, you'll need to add a transform to the bulkloader.yaml file. At the top, in the python_preamble section, add something like

- import: my_app.db.list_property_transform

then add an import_transform line to every ListProperty entry you have:

    - property: some_list
      external_name: some_list
      # Type: String Stats: 95 properties of this type in this kind.
      import_transform: list_property_transform.ListPropertyTransform

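Also, create the module that the my_app.db.list_property_transform import refers to, i.e. my_app/db/list_property_transform.py (with __init__.py files in my_app and my_app/db so Python treats them as packages). Its contents look like this: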

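For the local dev environment:

    def ListPropertyTransform(x):
        # Non-empty cell: evaluate the stored list literal.
        # Empty cell (or a literal that evaluates to an empty list): None.
        return len(x) > 0 and eval(x) or None

For the prod environment:

    def ListPropertyTransform(x):
        # Non-empty cell: evaluate the stored list literal.
        # Empty cell (or a literal that evaluates to an empty list): [].
        return len(x) > 0 and eval(x) or []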
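One caveat: eval runs whatever is in the CSV cell. A safer variant is sketched below. It swaps eval for ast.literal_eval (Python 2.6+), which only accepts literals such as lists and strings, and assumes the list cells look like Python list reprs, e.g. "[u'a', u'b']". It keeps the None/[] behavior of the two versions above; the name ListPropertyTransformLocal and the helper _parse_list_literal are hypothetical.

```python
import ast


def _parse_list_literal(x):
    """Parse a cell like "[u'a', u'b']"; return None for an empty cell."""
    if not x:
        return None
    # literal_eval only accepts Python literals, so a cell containing code
    # such as "__import__('os')" raises ValueError instead of executing.
    return ast.literal_eval(x)


def ListPropertyTransformLocal(x):
    """Dev-server variant: empty cell or empty list becomes None."""
    return _parse_list_literal(x) or None


def ListPropertyTransform(x):
    """Production variant: empty cell or empty list becomes a fresh []."""
    return _parse_list_literal(x) or []
```

With both functions in my_app/db/list_property_transform.py, the production config can keep list_property_transform.ListPropertyTransform, and the config used for the dev server would reference list_property_transform.ListPropertyTransformLocal instead.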

Read more about this transform for lists here: http://www.pressthered.com/importing_a_list_with_bulkloader.

Now let's transfer some data!

Step 3: Download data from appspot.com

First, if you don't have the command-line script shortcuts set up yet, use the "Make symlinks..." menu item in the Google App Engine Launcher to create them.

In this example, my app has a User entity kind.

I ran the following command to download my production data (--config_file points at a bulk loader config like the bulkloader.yaml from Step 2):

appcfg.py download_data --config_file=bulk_loader_remote.yaml --url=http://your_app_id.appspot.com/_ah/remote_api --filename=dump_User.csv --kind=User

Now I upload it to the dev server running on localhost:8080:

appcfg.py upload_data --config_file=bulk_loader_local.yaml --kind=User --filename=dump_User.csv --url=http://localhost:8080/_ah/remote_api --num_threads=1
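If you have several kinds to copy, you can script the two commands above. This is a rough sketch, not anything from the SDK: the helper names, the APP_ID placeholder, and the dump_<kind>.csv naming are my own assumptions, and it only uses the flags from the commands above. It assumes appcfg.py is on your PATH (e.g. via the symlinks) and that the dev server is already running.

```python
import subprocess

APP_ID = "your_app_id"  # placeholder, as in the download command above
REMOTE_URL = "http://%s.appspot.com/_ah/remote_api" % APP_ID
LOCAL_URL = "http://localhost:8080/_ah/remote_api"


def download_cmd(kind, config_file="bulk_loader_remote.yaml"):
    """Arguments for pulling one kind from production into dump_<kind>.csv."""
    return ["appcfg.py", "download_data",
            "--config_file=" + config_file,
            "--url=" + REMOTE_URL,
            "--filename=dump_%s.csv" % kind,
            "--kind=" + kind]


def upload_cmd(kind, config_file="bulk_loader_local.yaml"):
    """Arguments for pushing dump_<kind>.csv into the local dev server."""
    return ["appcfg.py", "upload_data",
            "--config_file=" + config_file,
            "--kind=" + kind,
            "--filename=dump_%s.csv" % kind,
            "--url=" + LOCAL_URL,
            "--num_threads=1"]


def copy_kinds_to_local(kinds):
    """Download then upload each kind; raises on the first failed command.

    stdin is inherited, so you can still answer the login and email prompts.
    """
    for kind in kinds:
        subprocess.check_call(download_cmd(kind))
        subprocess.check_call(upload_cmd(kind))
```

Calling copy_kinds_to_local(["User"]) runs the same two commands shown above for the User kind.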

It will ask you for your email, but just leave it blank and it will happily continue.

That's it for a basic use case. We'll see if things get dicier as I continue with my project.

