Google docs spreadsheet as database

I have a spreadsheet containing a terminology list, translated in multiple languages:
https://spreadsheets.google.com/ccc?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl

Access via list-based feed

Documentation: http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#ListFeed

Example:
https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic
(note: public visibility requires the spreadsheet to be published)
(note: the key, I got when publishing the document)

Result:

<feed>

<title type=‘text’>Blad1</title>

<entry>

<title type=‘text’>christmas</title>

<content type=‘text’>nl: kerstmis, fr: noel</content>

</entry>

<title type=‘text’>easter</title>

<content type=‘text’>nl: pasen, fr: paques</content>

</entry>

</feed>

I have my data, but I’ll have to do some more regex-ing to parse the entries of the feed…

A specific row can be selected using a list feed structured query:
https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic?sq=en%3Dchristmas

Access via Google Visualization API Query Language

Documentation:
http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/
http://code.google.com/intl/nl/apis/visualization/documentation/querylanguage.html

Example:
http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE
Hmmm… smells like JSON
http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&tq=select%20C%20where%20A%20%3D%20′easter’
The column identifier is the column index (‘A’, ‘B’, …) and not the label. That’s a pity. Could be solved by having another table that makes the relationship between the language names and the column indices, but that’s not very clean.

Access via table feed

Documentation:
http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#TableFeed

Example:
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/tables
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/0
Nothing there…? This is a bit silly. Seems that you can access the data using this very simple URL-based API, but to
create a table, you must use the DATA API and perform a POST request containing the table definition and  with authenticated header.
So setting a ‘range’ in the GUI is not sufficient (as I first assumed).

Further study of the Spreadsheets Data API is needed.

If you are curious about how the Google Data APIs work at the basic level using XML and HTTP, you can read the Protocol Guide. This guide details the requests and responses that the Data API servers expect and return. To learn more about the structure of these requests and responses, read the Reference Guide. This guide defines the API’s feed types, HTTP request parameters, HTTP response codes, and XML elements.

To make working with the API easier, we have a number of client libraries that abstract the API into a language-specific object model. There are Developer’s Guides for Java, .NET, PHP, and Python as well as sample code.

So I guess I’m left with these alternatives:

  • building a small java application to create the table
  • start fiddling about with http console or curl-like tools to POST the request manually

Let’s not fiddle… And take a tour with the Java client library.

Here  are examples:



Client library download (installation is unzipping).

There’s also an Eclipse plugin provided, so I probably should not waste effort using NetBeans.

  1. Installation of the plugin
  2. TemplateDemo – creating a first project (targeting Spreadsheet rather than Documents)

Got this error: “java.lang.NoClassDefFoundError: com/google/common/collect/Maps”, because I also need the Google Collections (a.k.a Guava).
Installation is unzipping, and adding guava-r06.jar and guava-src-r06.zip to the project’s libraries.

Now the program runs and gives me a list of my spreadsheets:

Getting Spreadsheet entries…

Terminology liturgical days

spiegelserre

krantenwaaier

Lexfeed defects

Kindermissaal

klapluik

multigrade

Total Entries: 7

A few modifications of the example in “create a table” and some browsing through the client’s library javadoc.

// Create a new Spreadsheet service
SpreadsheetService myService = new SpreadsheetService(“My Application”);
myService.setUserCredentials(args[0],args[1]);

TableEntry tableEntry = new TableEntry();

FeedURLFactory factory = FeedURLFactory.getDefault();
URL tableFeedUrl = factory.getTableFeedUrl(“0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE”);

// Delete table if it already exists
TableFeed tableFeed = myService.getFeed(tableFeedUrl, TableFeed.class);
for (TableEntry table : tableFeed.getEntries()) {
String currTitle = table.getTitle().getPlainText();

if (currTitle.equals(“Terminology”)) {

table.delete();

break;

}

}

// Specify a basic table:
tableEntry.setTitle(new PlainTextConstruct(“Terminology”));
tableEntry.setWorksheet(new Worksheet(“Blad1″));
tableEntry.setHeader(new Header(1));

// Specify columns in the table, start row, number of rows.
Data tableData = new Data();
tableData.setNumberOfRows(200);
// Start row index cannot overlap with header row.
tableData.setStartIndex(2);
// This table has only one column.
tableData.addColumn(new Column(“A”, “en”));
tableData.addColumn(new Column(“B”, “fr”));
tableData.addColumn(new Column(“C”, “nl”));

tableEntry.setData(tableData);
myService.insert(tableFeedUrl, tableEntry);

To change the number of rows (if defined as ‘0’, there are no rows – obviously?), I have to remove the table if it already exists (see bold above).
When retrieving the table feed, I must supply a table number, which seems to increment when tables are deleted and added. So now I can retrieve the data from:
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1

But damn me! The records are still printed like “fr: kerstmis, nl: noel, en: christmas”…

A specific record can be retrieved using a structured query:
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1?sq=en%3Dchristmas

Cell based feed

First I need the worksheets feed

https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full

In the returned feed, this tag can be found:

<id>https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full/od6</id&gt;

The ‘od6’ is the ID of the worksheet, to be used in the cell based feed URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full

The return contains individual cells:

A1

dinsdag 21 september 2010 21:36

en

B1

dinsdag 21 september 2010 21:36

fr

C1

dinsdag 21 september 2010 21:36

nl

Or as xml:

<feed>
<entry>
<title type=‘text’>A1</title>
<content type=‘text’>en</content>
<gs:cell row=’1′ col=’1′ inputValue=‘en’>en</gs:cell>
</entry>
<entry>
<title type=‘text’>B1</title>
<content type=‘text’>fr</content>
<gs:cell row=’1′ col=’2′ inputValue=‘fr’>fr</gs:cell>
</entry>
<entry>
<title type=‘text’>C1</title>
<content type=‘text’>nl</content>
<gs:cell row=’1′ col=’3′ inputValue=‘nl’>nl</gs:cell>
</entry>
<entry>
<title type=‘text’>A2</title>
<content type=‘text’>christmas</content>
<gs:cell row=’2′ col=’1′ inputValue=‘christmas’>christmas</gs:cell>
</entry>
<entry>
<content type=‘text’>kerstmis</content>
<gs:cell row=’2′ col=’2′ inputValue=‘kerstmis’>kerstmis</gs:cell>
</entry>
<entry>
<title type=‘text’>C2</title>
<content type=‘text’>noel</content>
<gs:cell row=’2′ col=’3′ inputValue=‘noel’>noel</gs:cell>
</entry>
<entry>
<title type=‘text’>A3</title>
<content type=‘text’>easter</content>
<gs:cell row=’3′ col=’1′ inputValue=‘easter’>easter</gs:cell>
</entry>
<entry>
<title type=‘text’>B3</title>
<content type=‘text’>pasen</content>
<gs:cell row=’3′ col=’2′ inputValue=‘pasen’>pasen</gs:cell>
</entry>
<entry>
<title type=‘text’>C3</title>
<content type=‘text’>paques</content>
<gs:cell row=’3′ col=’3′ inputValue=‘paques’>paques</gs:cell>
</entry>
</feed>

This should be queriable by a clever XPath via YQL.

feed/entry[gs:col/@col = (<column where first row is $language>) and gs:row/@row = (row where first column is $text)]/content

/feed/entry[gs:col/@col = (/feed/entry[gs:cell/@row = ‘1’ and content = $language]/@col) and gs:row/@row = (/feed/entry[gs:cell/@col = ‘A’ and content=$text]/@row)]/content

But -alas!- there seems to be an issue. The XML table isn’t returning any data at all:

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full”

Even after putting https in front of the yql URL… (I suspect it’s because the google feed URL is on https)…

> The explanation is easy: this feed requires authentication…

>> Wait a minute… after reading this, I just did ‘Publish as webpage’ and selected ‘Atom’ and ‘Cells’ as options, and got this URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic

This URL is public, so it works in YQL !!

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic”

I can also get the data in CSV format (been reading this and this)

https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv

And perform this query on the CSV open table,

select * from csv where url=”https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv”

which on it’s turn can be fed into the XML table….
- or I should be able to add to the query ‘and columns = “en,fr,nl” and en=”christmas”’, the column names list being whatever is appropriate for the queried Spreadsheet.

Or wait,

  1. publish the spreadsheet as CSV
  2. turn the CSV into XML using YQL CSV open table
    REST URL
  3. query the XML using YQL XML table

The query:

select * from xml where itemPath = “/query/results/row/*[../*[name(.) = name(../../row[1]/*[. = ‘en‘])] = ‘easter‘ and name(.) = name(../../row[1]/*[. = ‘nl‘])]” and  url = “http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20csv%20where%20url%3D%22https%3A%2F%2Fspreadsheets.google.com%2Fpub%3Fkey%3D0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE%26hl%3Dnl%26single%3Dtrue%26gid%3D0%26output%3Dcsv%22&diagnostics=false”

This is equivalent to:

select nl from key where en = ‘easter

Google docs spreadsheet as database

Google docs spreadsheet as database

Access via list-based feed

Access via Google Visualization API Query Language

Access via table feed

Cell based feed

I have a spreadsheet containing a terminology list, translated in multiple languages:

https://spreadsheets.google.com/ccc?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl

Access via list-based feed

Documentation: http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#ListFeed

Example:

https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic(note: public visibility requires the spreadsheet to be published)

(note: the key, I got when publishing the document)

Result:

<feed>

<title type=‘text’>Blad1</title>

<entry>

<title type=‘text’>christmas</title>

<content type=‘text’>nl: kerstmis, fr: noel</content>

</entry>

<title type=‘text’>easter</title>

<content type=‘text’>nl: pasen, fr: paques</content>

</entry>

</feed>

I have my data, but I’ll have to do some more regex-ing to parse the entries of the feed…

A specific row can be selected using a list feed structured query:

https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic?sq=en%3Dchristmas

Access via Google Visualization API Query Language

Documentation:

http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/

http://code.google.com/intl/nl/apis/visualization/documentation/querylanguage.html

Example:

http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE

Hmmm… smells like JSON

http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&tq=select%20C%20where%20A%20%3D%20′easter’

The column identifier is the column index (‘A’, ‘B’, …) and not the label. That’s a pity. Could be solved by having another table that makes the relationship between the language names and the column indices, but that’s not very clean.

Access via table feed

Documentation:

http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#TableFeed

Example:

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/tables

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/0
Nothing there…? This is a bit silly. Seems that you can access the data using this very simple URL-based API, but to
create a table, you must use the DATA API and perform a POST request containing the table definition and  with authenticated header.

So setting a ‘range’ in the GUI is not sufficient (as I first assumed).

Further study of the Spreadsheets Data API is needed.

If you are curious about how the Google Data APIs work at the basic level using XML and HTTP, you can read the Protocol Guide. This guide details the requests and responses that the Data API servers expect and return. To learn more about the structure of these requests and responses, read the Reference Guide. This guide defines the API’s feed types, HTTP request parameters, HTTP response codes, and XML elements.

To make working with the API easier, we have a number of client libraries that abstract the API into a language-specific object model. There are Developer’s Guides for Java, .NET, PHP, and Python as well as sample code.

So I guess I’m left with these alternatives:

  1. building a small java application to create the table
  2. start fiddling about with http console or curl-like tools to POST the request manually

Let’s not fiddle… And take a tour with the Java client library.

Here  are examples:

  1. http://code.google.com/p/gdata-java-client/source/browse/trunk/java/sample/spreadsheet/worksheet/WorksheetDemo.java
  2. http://ga-api-java-samples.googlecode.com/svn/trunk/src/v1/SpreadsheetExporter.java

Client library download (installation is unzipping).

There’s also an Eclipse plugin provided, so I probably should not waste effort using NetBeans.

  1. Installation of the plugin
  2. TemplateDemo – creating a first project (targeting Spreadsheet rather than Documents)

Got this error: “java.lang.NoClassDefFoundError: com/google/common/collect/Maps”, because I also need the Google Collections (a.k.a Guava).

Installation is unzipping, and adding guava-r06.jar and guava-src-r06.zip to the project’s libraries.

Now the program runs and gives me a list of my spreadsheets:

Getting Spreadsheet entries…

Terminology liturgical days

spiegelserre

krantenwaaier

Lexfeed defects

Kindermissaal

klapluik

multigrade

Total Entries: 7

A few modifications of the example in “create a table” and some browsing through the client’s library javadoc.

// Create a new Spreadsheet service

SpreadsheetService myService = new SpreadsheetService(“My Application”);

myService.setUserCredentials(args[0],args[1]);

TableEntry tableEntry = new TableEntry();

FeedURLFactory factory = FeedURLFactory.getDefault();

URL tableFeedUrl = factory.getTableFeedUrl(“0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE”);

// Delete table if it already exists

TableFeed tableFeed = myService.getFeed(tableFeedUrl, TableFeed.class);

for (TableEntry table : tableFeed.getEntries()) {

String currTitle = table.getTitle().getPlainText();

if (currTitle.equals(“Terminology”)) {

table.delete();

break;

}

}

// Specify a basic table:

tableEntry.setTitle(new PlainTextConstruct(“Terminology”));

tableEntry.setWorksheet(new Worksheet(“Blad1″));

tableEntry.setHeader(new Header(1));

// Specify columns in the table, start row, number of rows.

Data tableData = new Data();

tableData.setNumberOfRows(200);

// Start row index cannot overlap with header row.

tableData.setStartIndex(2);

// This table has only one column.

tableData.addColumn(new Column(“A”, “en”));

tableData.addColumn(new Column(“B”, “fr”));

tableData.addColumn(new Column(“C”, “nl”));

tableEntry.setData(tableData);

myService.insert(tableFeedUrl, tableEntry);

To change the number of rows (if defined as ‘0’, there are no rows – obviously?), I have to remove the table if it already exists (see bold above).

When retrieving the table feed, I must supply a table number, which seems to increment when tables are deleted and added. So now I can retrieve the data from:

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1

But damn me! The records are still printed like “fr: kerstmis, nl: noel, en: christmas”…

A specific record can be retrieved using a structured query:

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1?sq=en%3Dchristmas

Cell based feed

First I need the worksheets feed

https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full

In the returned feed, this tag can be found:

<id>https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full/od6</id&gt;

The ‘od6’ is the ID of the worksheet, to be used in the cell based feed URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full

The return contains individual cells:

A1

dinsdag 21 september 2010 21:36

en

B1

dinsdag 21 september 2010 21:36

fr

C1

dinsdag 21 september 2010 21:36

nl

Or as xml:

<feed>

<entry>

<title type=‘text’>A1</title>

<content type=‘text’>en</content>

<gs:cell row=’1′ col=’1′ inputValue=‘en’>en</gs:cell>

</entry>

<entry>

<title type=‘text’>B1</title>

<content type=‘text’>fr</content>

<gs:cell row=’1′ col=’2′ inputValue=‘fr’>fr</gs:cell>

</entry>

<entry>

<title type=‘text’>C1</title>

<content type=‘text’>nl</content>

<gs:cell row=’1′ col=’3′ inputValue=‘nl’>nl</gs:cell>

</entry>

<entry>

<title type=‘text’>A2</title>

<content type=‘text’>christmas</content>

<gs:cell row=’2′ col=’1′ inputValue=‘christmas’>christmas</gs:cell>

</entry>

<entry>

<content type=‘text’>kerstmis</content>

<gs:cell row=’2′ col=’2′ inputValue=‘kerstmis’>kerstmis</gs:cell>

</entry>

<entry>

<title type=‘text’>C2</title>

<content type=‘text’>noel</content>

<gs:cell row=’2′ col=’3′ inputValue=‘noel’>noel</gs:cell>

</entry>

<entry>

<title type=‘text’>A3</title>

<content type=‘text’>easter</content>

<gs:cell row=’3′ col=’1′ inputValue=‘easter’>easter</gs:cell>

</entry>

<entry>

<title type=‘text’>B3</title>

<content type=‘text’>pasen</content>

<gs:cell row=’3′ col=’2′ inputValue=‘pasen’>pasen</gs:cell>

</entry>

<entry>

<title type=‘text’>C3</title>

<content type=‘text’>paques</content>

<gs:cell row=’3′ col=’3′ inputValue=‘paques’>paques</gs:cell>

</entry>

</feed>

This should be queriable by a clever XPath via YQL.

feed/entry[gs:col/@col = (<column where first row is $language>) and gs:row/@row = (row where first column is $text)]/content

/feed/entry[gs:col/@col = (/feed/entry[gs:cell/@row = ‘1’ and content = $language]/@col) and gs:row/@row = (/feed/entry[gs:cell/@col = ‘A’ and content=$text]/@row)]/content

But -alas!- there seems to be an issue. The XML table isn’t returning any data at all:

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full”

Even after putting https in front of the yql URL… (I suspect it’s because the google feed URL is on https)…

> The explanation is easy: this feed requires authentication…

>> Wait a minute… after reading this, I just did ‘Publish as webpage’ and selected ‘Atom’ and ‘Cells’ as options, and got this URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic

This URL is public, so it works in YQL !!

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic”

I can also get the data in CSV format (been reading this and this)

https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv

And perform this query on the CSV open table,

select * from csv where url=”https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv”

which on it’s turn can be fed into the XML table….

- or I should be able to add to the query ‘and columns = “en,fr,nl” and en=”christmas”’, the column names list being whatever is appropriate for the queried Spreadsheet.

Or wait,

  1. publish the spreadsheet as CSV
  2. turn the CSV into XML using YQL CSV open tableREST URL
  3. query the XML using YQL XML table

The query:

select * from xml where itemPath = “/query/results/row/*[../*[name(.) = name(../../row[1]/*[. = ‘en‘])] = ‘easter‘ and name(.) = name(../../row[1]/*[. = ‘nl‘])]” and  url = “http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20csv%20where%20url%3D%22https%3A%2F%2Fspreadsheets.google.com%2Fpub%3Fkey%3D0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE%26hl%3Dnl%26single%3Dtrue%26gid%3D0%26output%3Dcsv%22&diagnostics=false”

This is equivalent to:

select nl from key where en = ‘easter

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s