Composing URL’s in XSLT

Composing a complex URL can be very ugly in XSLT, using a very long formula of concatenations and encodings.
E.g the REST URL for this YQL query:

select * from html where url=”http://www.doorstroming.net/index.php/actua/49-qnationalisme-is-nationalismeq.html” and
xpath=’//*[contains(concat( ” “, @class, ” ” ), concat( ” “, “MsoNormal”, ” ” ))]’

Is ugly as sin:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.doorstroming.net%2Findex.php%2Factua%2F49-qnationalisme-is-nationalismeq.html%22%20and%20%20%20%20%20%20xpath%3D’%2F%2F*%5Bcontains(concat(%20%22%20%22%2C%20%40class%2C%20%22%20%22%20)%2C%20concat(%20%22%20%22%2C%20%22MsoNormal%22%2C%20%22%20%22%20))%5D’&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

Two possible improvements:

Dedicated URL composition template

Create a template that can be called like this:

<xsl:call-template name=”url”>
<xsl:with-param name=”decom”>
<url>
<base>http://query.yahooapis.com/v1/public/yql</base&gt;
<query>
<name>q</name>
<value><xsl:value-of select=”$query”/></value>
</query>
<query>
<name>diagnostics</name>
<value>false</value>
</query>
<query>
<name>env</name>
<value>store://datatables.org/alltableswithkeys</name>
</query>
</url>
</xsl:with-param>
</xsl:call-template>

I’m not elaborating this template here. It should be quite straightforward, but there may be a more generic and even more elegant solution:

String Replacement Template

Using the replace template that was created in the XML Translation project.

<xsl:call-template name=”replace”>
<xsl:with-param name=”string”>
<xsl:text>http://query.yahooapis.com/v1/public/yql?q=$query&diagnostics=true&env=$env</xsl:text&gt;
</xsl:with-param>
<xsl:with-param name=”parameters”>
<query><xsl:value-of select=”encode-for-uri($query)”/></query>
<env><xsl:value-of select=”encode-for-uri(‘store://datatables.org/alltableswithkeys’)”/></env>
</xsl:with-param>
</xsl:call-template>

This is the replace template:

<xsl:template name=”replace”>
<xsl:param name=”string”/>
<xsl:param name=”parameters”/>
<xsl:choose>
<xsl:when test=”count($parameters/*) &gt; 0″>
<xsl:variable name=”string2″>
<xsl:value-of select=”replace($string,concat(‘\$’,name($parameters/*[1])),normalize-space($parameters/*[1]))”/>
</xsl:variable>
<xsl:call-template name=”replace”>
<xsl:with-param name=”string” select=”$string2″/>
<xsl:with-param name=”parameters”>
<xsl:copy-of select=”$parameters/*[position() &gt; 1]”/>
</xsl:with-param>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select=”$string”/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

Vorstalarm

YQL Open Table Design

use “http://github.com/vicmortelmans/yql-tables/raw/master/weather/weather.frost.xml”  as weather.frost;select * from weather.frost where location = “Ringsaker”

(note: this location is somewhere in northern Scandinavia, where it’s more likely to freeze 🙂

Google Weather API

Google’s secret weather API:

http://www.google.com/ig/api?weather=antwerpen

YQL Open Table Execute

var url = “http://www.google.com/ig/api?weather=$location“;
url = url.replace(/\$location/,encode-uri(location));
xml = y.query(“select * from xml where url=’http://www.google.com/ig/api?weather=antwerpen‘”).results;
var seq = new String();
for each (var lowtemp in xml..forecast_conditions.low.@data) {
seq += (lowtemp > 32 ? ‘+’ : ‘-‘);
}
seq = seq.substr(0,3);
var output = new String();
switch (seq) {
case “+++”:
output = “Aanhoudende dooi in $location”;
break;
case “++-“:
output = “Overmorgen gaat het vriezen in $location”;
break;
case “+-+”:
case “+–“:
output = “Morgen gaat het vriezen in $location”;
break;
case “—“:
case “-+-“:
case “–+”:
output = “Aanhoudende vorst in $location”;
break;
case “-++”:
output = “Morgen dooit het in $location”;
break;
}
output = output.replace(/\$location/,location);

YQL XML open table returns no data when strange characters are in the XML

When running a query on the Google weather API for a city in France, very often the returned XML contains strange characters, like e.g. in “Avignon, Provence-Alpes-Côte d’Azur”. When querying for this xml through YQL XML open table, nothing is returned!

The Google API is not providing encoding information:

<?xml version=”1.0″?>

Based on an octal dump, I’d say it’s UTF8:

0000300   s   –   C 303 264 t   e       d   &   #   3   9   ;   A   z

Yahoo weather YQL tables

select * from weather.woeid where w in (select woeid from geo.places where text=”paris”)

Not very useful… only one day forecast, and e.g. for ‘Paris’ returning all cities with the name.

Creating a pipe works better:

http://pipes.yahoo.com/pipes/pipe.run?_id=4f9865c59104e4f105c331c5fbee4116&_render=rss&location=antwerpen

The pipe first gets the woeid for the location using the YQL woeid open table (see above), then loads the weather info from the Yahoo Weather RSS feed.

The result is formatted html, so extracting the low temp values requires text parsing.

World Weather Online API

Registration required, a simple API, but it won’t take ‘Antwerpen’, but only the english name of this flemish city… not so nice!

http://www.worldweatheronline.com/feed/weather.ashx?q=antwerp&format=xml&num_of_days=3&key=81e058dc89161948102509

So let’s use the YQL geo.places open table for getting the coordinates of our location, and feed this to the longitude/latitude World Weather Online API:

http://www.worldweatheronline.com/feed/weather.ashx?q=51.22,4.40&format=xml&num_of_days=3&key=81e058dc89161948102509

This operation chain is performed in this pipe.

This pipe, on it’s turn, can be used as source for YQL open table execute scripting, via the JSON open table query:

select * from json where url=”http://pipes.yahoo.com/pipes/pipe.run?_id=70a400f5404b6a454caf850de3f62d33&_render=json&location=paris” and itemPath=”//tempMinC”

The final weather.frost open table allows for an extra parameter: the target language.

use “http://github.com/vicmortelmans/yql-tables/raw/master/weather/weather.frost.xml”  as weather.frost;select * from weather.frost where location = “Antwerpen” and language=”nl”

Yahoo Pipe

The wrapper pipe is needed to turn the YQL query output into RSS format.

http://pipes.yahoo.com/pipes/pipe.info?_id=145e579bd145fb0b01f9473f69b7e4c1

RSS:

http://pipes.yahoo.com/pipes/pipe.run?_id=145e579bd145fb0b01f9473f69b7e4c1&_render=rss&language=en&location=Paris

encoded:

http%3A%2F%2Fpipes.yahoo.com%2Fpipes%2Fpipe.run%3F_id%3D145e579bd145fb0b01f9473f69b7e4c1%26_render%3Drss%26language%3Den%26location%3DParis

Todo

It would be nice if the language could be automatically detected, based on the provided city name! (but what about Brussels?).

Google Gadget

The original Frost Alert article refers to the Google Webpage where the Google gadget is published.

Managing the site is done in sites.google.com, but that’s OK.

Editing the gadget is done in the iGoogle Gadget Editor. This is the gadget code:

<?xml version=”1.0″ encoding=”UTF-8″?>
<Module>
<ModulePrefs title=”Vorstalarm!” height=”401″ width=”576″ author=”Vic Mortelmans” author_email=”vic.mortelmans@telenet.be”/>
<Content type=”html”><![CDATA[
<script type=”text/javascript”>
htmlTemplate = ‘<a href=”http://www.addtoany.com/subscribe?linkname=Vorstalarm!&amp;linkurl=http%3A%2F%2Fpipes.yahoo.com%2Fpipes%2Fpipe.run%3F_id%3D145e579bd145fb0b01f9473f69b7e4c1%26_render%3Drss%26language%3Dnl%26location%3D_LOCATION_&#8221; target=”_blank”><img src=”http://static.addtoany.com/buttons/subscribe_120_16.gif&#8221; width=”120″ height=”16″ border=”0″ alt=”Subscribe”/></a>’;
function displayGetRSS () {
var html = htmlTemplate.replace(/_LOCATION_/,document.getElementById(‘input’).value);
document.getElementById(‘content_div’).innerHTML = html;
}
</script>
<div>
<img src=”http://farm4.static.flickr.com/3520/4058180670_ed979a4099_o.jpg&#8221; width=”576″ height=”401″ alt=”breughel_hunters” />
<div style=”position: relative; background: white; top: -350px; left: 193px; width: 190px; height: 110px; padding: 10px” >
<div style=”text-align: center”>
<form>
<p>
<b>Woonplaats:</b>
<br/>
<input id=”input” name=”location” onkeyup=”displayGetRSS()” type=”text”/>
</p>
</form>
</div>
<div id=”content_div” style=”text-align: center”>
<img src=”http://static.addtoany.com/buttons/subscribe_120_16.gif&#8221; width=”120″ height=”16″ border=”0″ alt=”Subscribe”/>
</div>
</div>
</div>
]]>
</Content>
</Module>

Note that it seems the gagdet must be published (and re-inserted into the webpage) before changes take effect !?

Google docs spreadsheet as database

I have a spreadsheet containing a terminology list, translated in multiple languages:
https://spreadsheets.google.com/ccc?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl

Access via list-based feed

Documentation: http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#ListFeed

Example:
https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic
(note: public visibility requires the spreadsheet to be published)
(note: the key, I got when publishing the document)

Result:

<feed>

<title type=‘text’>Blad1</title>

<entry>

<title type=‘text’>christmas</title>

<content type=‘text’>nl: kerstmis, fr: noel</content>

</entry>

<title type=‘text’>easter</title>

<content type=‘text’>nl: pasen, fr: paques</content>

</entry>

</feed>

I have my data, but I’ll have to do some more regex-ing to parse the entries of the feed…

A specific row can be selected using a list feed structured query:
https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic?sq=en%3Dchristmas

Access via Google Visualization API Query Language

Documentation:
http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/
http://code.google.com/intl/nl/apis/visualization/documentation/querylanguage.html

Example:
http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE
Hmmm… smells like JSON
http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&tq=select%20C%20where%20A%20%3D%20’easter’
The column identifier is the column index (‘A’, ‘B’, …) and not the label. That’s a pity. Could be solved by having another table that makes the relationship between the language names and the column indices, but that’s not very clean.

Access via table feed

Documentation:
http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#TableFeed

Example:
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/tables
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/0
Nothing there…? This is a bit silly. Seems that you can access the data using this very simple URL-based API, but to
create a table, you must use the DATA API and perform a POST request containing the table definition and  with authenticated header.
So setting a ‘range’ in the GUI is not sufficient (as I first assumed).

Further study of the Spreadsheets Data API is needed.

If you are curious about how the Google Data APIs work at the basic level using XML and HTTP, you can read the Protocol Guide. This guide details the requests and responses that the Data API servers expect and return. To learn more about the structure of these requests and responses, read the Reference Guide. This guide defines the API’s feed types, HTTP request parameters, HTTP response codes, and XML elements.

To make working with the API easier, we have a number of client libraries that abstract the API into a language-specific object model. There are Developer’s Guides for Java, .NET, PHP, and Python as well as sample code.

So I guess I’m left with these alternatives:

  • building a small java application to create the table
  • start fiddling about with http console or curl-like tools to POST the request manually

Let’s not fiddle… And take a tour with the Java client library.

Here  are examples:



Client library download (installation is unzipping).

There’s also an Eclipse plugin provided, so I probably should not waste effort using NetBeans.

  1. Installation of the plugin
  2. TemplateDemo – creating a first project (targeting Spreadsheet rather than Documents)

Got this error: “java.lang.NoClassDefFoundError: com/google/common/collect/Maps”, because I also need the Google Collections (a.k.a Guava).
Installation is unzipping, and adding guava-r06.jar and guava-src-r06.zip to the project’s libraries.

Now the program runs and gives me a list of my spreadsheets:

Getting Spreadsheet entries…

Terminology liturgical days

spiegelserre

krantenwaaier

Lexfeed defects

Kindermissaal

klapluik

multigrade

Total Entries: 7

A few modifications of the example in “create a table” and some browsing through the client’s library javadoc.

// Create a new Spreadsheet service
SpreadsheetService myService = new SpreadsheetService(“My Application”);
myService.setUserCredentials(args[0],args[1]);

TableEntry tableEntry = new TableEntry();

FeedURLFactory factory = FeedURLFactory.getDefault();
URL tableFeedUrl = factory.getTableFeedUrl(“0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE”);

// Delete table if it already exists
TableFeed tableFeed = myService.getFeed(tableFeedUrl, TableFeed.class);
for (TableEntry table : tableFeed.getEntries()) {
String currTitle = table.getTitle().getPlainText();

if (currTitle.equals(“Terminology”)) {

table.delete();

break;

}

}

// Specify a basic table:
tableEntry.setTitle(new PlainTextConstruct(“Terminology”));
tableEntry.setWorksheet(new Worksheet(“Blad1”));
tableEntry.setHeader(new Header(1));

// Specify columns in the table, start row, number of rows.
Data tableData = new Data();
tableData.setNumberOfRows(200);
// Start row index cannot overlap with header row.
tableData.setStartIndex(2);
// This table has only one column.
tableData.addColumn(new Column(“A”, “en”));
tableData.addColumn(new Column(“B”, “fr”));
tableData.addColumn(new Column(“C”, “nl”));

tableEntry.setData(tableData);
myService.insert(tableFeedUrl, tableEntry);

To change the number of rows (if defined as ‘0’, there are no rows – obviously?), I have to remove the table if it already exists (see bold above).
When retrieving the table feed, I must supply a table number, which seems to increment when tables are deleted and added. So now I can retrieve the data from:
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1

But damn me! The records are still printed like “fr: kerstmis, nl: noel, en: christmas”…

A specific record can be retrieved using a structured query:
https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1?sq=en%3Dchristmas

Cell based feed

First I need the worksheets feed

https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full

In the returned feed, this tag can be found:

<id>https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full/od6</id&gt;

The ‘od6’ is the ID of the worksheet, to be used in the cell based feed URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full

The return contains individual cells:

A1

dinsdag 21 september 2010 21:36

en

B1

dinsdag 21 september 2010 21:36

fr

C1

dinsdag 21 september 2010 21:36

nl

Or as xml:

<feed>
<entry>
<title type=‘text’>A1</title>
<content type=‘text’>en</content>
<gs:cell row=‘1’ col=‘1’ inputValue=‘en’>en</gs:cell>
</entry>
<entry>
<title type=‘text’>B1</title>
<content type=‘text’>fr</content>
<gs:cell row=‘1’ col=‘2’ inputValue=‘fr’>fr</gs:cell>
</entry>
<entry>
<title type=‘text’>C1</title>
<content type=‘text’>nl</content>
<gs:cell row=‘1’ col=‘3’ inputValue=‘nl’>nl</gs:cell>
</entry>
<entry>
<title type=‘text’>A2</title>
<content type=‘text’>christmas</content>
<gs:cell row=‘2’ col=‘1’ inputValue=‘christmas’>christmas</gs:cell>
</entry>
<entry>
<content type=‘text’>kerstmis</content>
<gs:cell row=‘2’ col=‘2’ inputValue=‘kerstmis’>kerstmis</gs:cell>
</entry>
<entry>
<title type=‘text’>C2</title>
<content type=‘text’>noel</content>
<gs:cell row=‘2’ col=‘3’ inputValue=‘noel’>noel</gs:cell>
</entry>
<entry>
<title type=‘text’>A3</title>
<content type=‘text’>easter</content>
<gs:cell row=‘3’ col=‘1’ inputValue=‘easter’>easter</gs:cell>
</entry>
<entry>
<title type=‘text’>B3</title>
<content type=‘text’>pasen</content>
<gs:cell row=‘3’ col=‘2’ inputValue=‘pasen’>pasen</gs:cell>
</entry>
<entry>
<title type=‘text’>C3</title>
<content type=‘text’>paques</content>
<gs:cell row=‘3’ col=‘3’ inputValue=‘paques’>paques</gs:cell>
</entry>
</feed>

This should be queriable by a clever XPath via YQL.

feed/entry[gs:col/@col = (<column where first row is $language>) and gs:row/@row = (row where first column is $text)]/content

/feed/entry[gs:col/@col = (/feed/entry[gs:cell/@row = ‘1’ and content = $language]/@col) and gs:row/@row = (/feed/entry[gs:cell/@col = ‘A’ and content=$text]/@row)]/content

But -alas!- there seems to be an issue. The XML table isn’t returning any data at all:

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full”

Even after putting https in front of the yql URL… (I suspect it’s because the google feed URL is on https)…

> The explanation is easy: this feed requires authentication…

>> Wait a minute… after reading this, I just did ‘Publish as webpage’ and selected ‘Atom’ and ‘Cells’ as options, and got this URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic

This URL is public, so it works in YQL !!

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic”

I can also get the data in CSV format (been reading this and this)

https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv

And perform this query on the CSV open table,

select * from csv where url=”https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv”

which on it’s turn can be fed into the XML table….
– or I should be able to add to the query ‘and columns = “en,fr,nl” and en=”christmas”’, the column names list being whatever is appropriate for the queried Spreadsheet.

Or wait,

  1. publish the spreadsheet as CSV
  2. turn the CSV into XML using YQL CSV open table
    REST URL
  3. query the XML using YQL XML table

The query:

select * from xml where itemPath = “/query/results/row/*[../*[name(.) = name(../../row[1]/*[. = ‘en‘])] = ‘easter‘ and name(.) = name(../../row[1]/*[. = ‘nl‘])]” and  url = “http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20csv%20where%20url%3D%22https%3A%2F%2Fspreadsheets.google.com%2Fpub%3Fkey%3D0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE%26hl%3Dnl%26single%3Dtrue%26gid%3D0%26output%3Dcsv%22&diagnostics=false”

This is equivalent to:

select nl from key where en = ‘easter

Google docs spreadsheet as database

Google docs spreadsheet as database

Access via list-based feed

Access via Google Visualization API Query Language

Access via table feed

Cell based feed

I have a spreadsheet containing a terminology list, translated in multiple languages:

https://spreadsheets.google.com/ccc?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl

Access via list-based feed

Documentation: http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#ListFeed

Example:

https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic(note: public visibility requires the spreadsheet to be published)

(note: the key, I got when publishing the document)

Result:

<feed>

<title type=‘text’>Blad1</title>

<entry>

<title type=‘text’>christmas</title>

<content type=‘text’>nl: kerstmis, fr: noel</content>

</entry>

<title type=‘text’>easter</title>

<content type=‘text’>nl: pasen, fr: paques</content>

</entry>

</feed>

I have my data, but I’ll have to do some more regex-ing to parse the entries of the feed…

A specific row can be selected using a list feed structured query:

https://spreadsheets.google.com/feeds/list/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/1/public/basic?sq=en%3Dchristmas

Access via Google Visualization API Query Language

Documentation:

http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/

http://code.google.com/intl/nl/apis/visualization/documentation/querylanguage.html

Example:

http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE

Hmmm… smells like JSON

http://spreadsheets.google.com/a/google.com/tq?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&tq=select%20C%20where%20A%20%3D%20’easter’

The column identifier is the column index (‘A’, ‘B’, …) and not the label. That’s a pity. Could be solved by having another table that makes the relationship between the language names and the column indices, but that’s not very clean.

Access via table feed

Documentation:

http://code.google.com/intl/nl/apis/spreadsheets/data/3.0/reference.html#TableFeed

Example:

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/tables

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/0
Nothing there…? This is a bit silly. Seems that you can access the data using this very simple URL-based API, but to
create a table, you must use the DATA API and perform a POST request containing the table definition and  with authenticated header.

So setting a ‘range’ in the GUI is not sufficient (as I first assumed).

Further study of the Spreadsheets Data API is needed.

If you are curious about how the Google Data APIs work at the basic level using XML and HTTP, you can read the Protocol Guide. This guide details the requests and responses that the Data API servers expect and return. To learn more about the structure of these requests and responses, read the Reference Guide. This guide defines the API’s feed types, HTTP request parameters, HTTP response codes, and XML elements.

To make working with the API easier, we have a number of client libraries that abstract the API into a language-specific object model. There are Developer’s Guides for Java, .NET, PHP, and Python as well as sample code.

So I guess I’m left with these alternatives:

  1. building a small java application to create the table
  2. start fiddling about with http console or curl-like tools to POST the request manually

Let’s not fiddle… And take a tour with the Java client library.

Here  are examples:

  1. http://code.google.com/p/gdata-java-client/source/browse/trunk/java/sample/spreadsheet/worksheet/WorksheetDemo.java
  2. http://ga-api-java-samples.googlecode.com/svn/trunk/src/v1/SpreadsheetExporter.java

Client library download (installation is unzipping).

There’s also an Eclipse plugin provided, so I probably should not waste effort using NetBeans.

  1. Installation of the plugin
  2. TemplateDemo – creating a first project (targeting Spreadsheet rather than Documents)

Got this error: “java.lang.NoClassDefFoundError: com/google/common/collect/Maps”, because I also need the Google Collections (a.k.a Guava).

Installation is unzipping, and adding guava-r06.jar and guava-src-r06.zip to the project’s libraries.

Now the program runs and gives me a list of my spreadsheets:

Getting Spreadsheet entries…

Terminology liturgical days

spiegelserre

krantenwaaier

Lexfeed defects

Kindermissaal

klapluik

multigrade

Total Entries: 7

A few modifications of the example in “create a table” and some browsing through the client’s library javadoc.

// Create a new Spreadsheet service

SpreadsheetService myService = new SpreadsheetService(“My Application”);

myService.setUserCredentials(args[0],args[1]);

TableEntry tableEntry = new TableEntry();

FeedURLFactory factory = FeedURLFactory.getDefault();

URL tableFeedUrl = factory.getTableFeedUrl(“0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE”);

// Delete table if it already exists

TableFeed tableFeed = myService.getFeed(tableFeedUrl, TableFeed.class);

for (TableEntry table : tableFeed.getEntries()) {

String currTitle = table.getTitle().getPlainText();

if (currTitle.equals(“Terminology”)) {

table.delete();

break;

}

}

// Specify a basic table:

tableEntry.setTitle(new PlainTextConstruct(“Terminology”));

tableEntry.setWorksheet(new Worksheet(“Blad1”));

tableEntry.setHeader(new Header(1));

// Specify columns in the table, start row, number of rows.

Data tableData = new Data();

tableData.setNumberOfRows(200);

// Start row index cannot overlap with header row.

tableData.setStartIndex(2);

// This table has only one column.

tableData.addColumn(new Column(“A”, “en”));

tableData.addColumn(new Column(“B”, “fr”));

tableData.addColumn(new Column(“C”, “nl”));

tableEntry.setData(tableData);

myService.insert(tableFeedUrl, tableEntry);

To change the number of rows (if defined as ‘0’, there are no rows – obviously?), I have to remove the table if it already exists (see bold above).

When retrieving the table feed, I must supply a table number, which seems to increment when tables are deleted and added. So now I can retrieve the data from:

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1

But damn me! The records are still printed like “fr: kerstmis, nl: noel, en: christmas”…

A specific record can be retrieved using a structured query:

https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1?sq=en%3Dchristmas

Cell based feed

First I need the worksheets feed

https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full

In the returned feed, this tag can be found:

<id>https://spreadsheets.google.com/feeds/worksheets/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/private/full/od6</id&gt;

The ‘od6’ is the ID of the worksheet, to be used in the cell based feed URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full

The return contains individual cells:

A1

dinsdag 21 september 2010 21:36

en

B1

dinsdag 21 september 2010 21:36

fr

C1

dinsdag 21 september 2010 21:36

nl

Or as xml:

<feed>

<entry>

<title type=‘text’>A1</title>

<content type=‘text’>en</content>

<gs:cell row=‘1’ col=‘1’ inputValue=‘en’>en</gs:cell>

</entry>

<entry>

<title type=‘text’>B1</title>

<content type=‘text’>fr</content>

<gs:cell row=‘1’ col=‘2’ inputValue=‘fr’>fr</gs:cell>

</entry>

<entry>

<title type=‘text’>C1</title>

<content type=‘text’>nl</content>

<gs:cell row=‘1’ col=‘3’ inputValue=‘nl’>nl</gs:cell>

</entry>

<entry>

<title type=‘text’>A2</title>

<content type=‘text’>christmas</content>

<gs:cell row=‘2’ col=‘1’ inputValue=‘christmas’>christmas</gs:cell>

</entry>

<entry>

<content type=‘text’>kerstmis</content>

<gs:cell row=‘2’ col=‘2’ inputValue=‘kerstmis’>kerstmis</gs:cell>

</entry>

<entry>

<title type=‘text’>C2</title>

<content type=‘text’>noel</content>

<gs:cell row=‘2’ col=‘3’ inputValue=‘noel’>noel</gs:cell>

</entry>

<entry>

<title type=‘text’>A3</title>

<content type=‘text’>easter</content>

<gs:cell row=‘3’ col=‘1’ inputValue=‘easter’>easter</gs:cell>

</entry>

<entry>

<title type=‘text’>B3</title>

<content type=‘text’>pasen</content>

<gs:cell row=‘3’ col=‘2’ inputValue=‘pasen’>pasen</gs:cell>

</entry>

<entry>

<title type=‘text’>C3</title>

<content type=‘text’>paques</content>

<gs:cell row=‘3’ col=‘3’ inputValue=‘paques’>paques</gs:cell>

</entry>

</feed>

This should be queriable by a clever XPath via YQL.

feed/entry[gs:col/@col = (<column where first row is $language>) and gs:row/@row = (row where first column is $text)]/content

/feed/entry[gs:col/@col = (/feed/entry[gs:cell/@row = ‘1’ and content = $language]/@col) and gs:row/@row = (/feed/entry[gs:cell/@col = ‘A’ and content=$text]/@row)]/content

But -alas!- there seems to be an issue. The XML table isn’t returning any data at all:

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/private/full”

Even after putting https in front of the yql URL… (I suspect it’s because the google feed URL is on https)…

> The explanation is easy: this feed requires authentication…

>> Wait a minute… after reading this, I just did ‘Publish as webpage’ and selected ‘Atom’ and ‘Cells’ as options, and got this URL:

https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic

This URL is public, so it works in YQL !!

select * from xml where url=”https://spreadsheets.google.com/feeds/cells/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/od6/public/basic”

I can also get the data in CSV format (been reading this and this)

https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv

And perform this query on the CSV open table,

select * from csv where url=”https://spreadsheets.google.com/pub?key=0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE&hl=nl&single=true&gid=0&output=csv”

which on it’s turn can be fed into the XML table….

– or I should be able to add to the query ‘and columns = “en,fr,nl” and en=”christmas”’, the column names list being whatever is appropriate for the queried Spreadsheet.

Or wait,

  1. publish the spreadsheet as CSV
  2. turn the CSV into XML using YQL CSV open tableREST URL
  3. query the XML using YQL XML table

The query:

select * from xml where itemPath = “/query/results/row/*[../*[name(.) = name(../../row[1]/*[. = ‘en‘])] = ‘easter‘ and name(.) = name(../../row[1]/*[. = ‘nl‘])]” and  url = “http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20csv%20where%20url%3D%22https%3A%2F%2Fspreadsheets.google.com%2Fpub%3Fkey%3D0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE%26hl%3Dnl%26single%3Dtrue%26gid%3D0%26output%3Dcsv%22&diagnostics=false”

This is equivalent to:

select nl from key where en = ‘easter

Dapper clone

Using SelectorGadget and YQL community open table data.html.cssselect

It’s fairly simple to mimick (some of) the behaviour of Dapper using these two tools. I’m into this, because I bumped into a Dapper disadvantage (well, in most cases it’s an advantage, but not in my particular case): it reduces the captured html fragments to plain text, which is good for a simple RSS update, but bad if you want to retain formatting in captured text (e.g. italics may be important and relevant for reflecting the content of a text).

Procedure:

  1. Install the SelectorGadget in your browser (it’s a bookmarklet, so you can simply drag it into your bookmarks)
  2. Open the webpage you want to retrieve data from (or at least a very similar webpage)
  3. Start the SelectorGadget
  4. Play around until you’ve selected the right sections
  5. Copy the displayed CSS selector
  6. Open the YQL console
    You’ll see a sample query select * from data.html.cssselect where url=”http://www.doorstroming.net/index.php/actua/49-qnationalisme-is-nationalismeq.html&#8221; and css=”.MsoNormal”
  7. Replace the value for ‘css’ with the CSS selector generated by SelectorGadget
  8. Replace the value for ‘url’ with the URL of the page containing your data. This can be any page with a similar structure to the page you’ve used originally.
  9. Execute the query or copy the REST URL

The REST URL looks like this:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20data.html.cssselect%20where%20url%3D%22http%3A%2F%2Fwww.doorstroming.net%2Findex.php%2Factua%2F49-qnationalisme-is-nationalismeq.html%22%20and%20css%3D%22.MsoNormal%22&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
By replacing the bold sections with resp. the (url-encoded) URL of the website and the applicable CSS Selector (also url-encoded!), you can apply any query on any webpage.

Restrictions:

  • for more complex selections, SelectorGadget returns stuff like this
    :nth-child(3) , .article-meta
    It looks like the YQL table can’t cope with this (probably the :nth-child() is something coming from a newer CSS standard than supported by the YQL table)
  • Dapper can do lots more, like capturing multiple fields, grouping, rendering RSS and other output formats,…
  • Dapper has an intuitive GUI

Using html open table

To solve the first restriction, using the html open table is a good solution. SelectorGadget has a button for converting the CSS selector into an XPath expression. The query then looks like this:
select * from html where url=”http://www.doorstroming.net/index.php/actua/49-qnationalisme-is-nationalismeq.html” and xpath=’//*[contains(concat( ” “, @class, ” ” ), concat( ” “, “MsoNormal”, ” ” ))]’

Serna plugin development

Problem

I spent a couple of days figuring out some weird behaviour of Serna. When trying to develop my own plugin, it always rendered my content in editable text fields, rather than word-processor-like inline editable text. Looking at the FO-tree dump, it appeared to be rendered using the serna-extensions.

Strip-down

I decided to strip-down the resume project, up to the level at which my own plugin was built up, to see where the text fields would show up.

The trigger for getting the editable text fields, is removing this line from the source resume xml:

<!DOCTYPE resume PUBLIC "-//Sean Kelly//DTD Resume  1.5.1//EN" "http://xmlresume.sourceforge.net/dtd/resume.dtd" []>

A clue on the web?:

It’s not a bug  per se.  That final decision rests with the folks at Syntext.  Serna has the capability to support entities (which XML Schema does not have).  In order to do get that capability, Serna uses in-line DOCTYPE.


That said, to process those documents in the Toolkit one needs to do one of the following:

a) remove the DOCTYPE from the XML docs; you get XSD validation

b) remove xmlxs:xsi namespace and xsi:noNamespaceSchemaLocation attribute from the XML docs;  you get DTD validation

c) modify the DTDs in the Toolkit  to include xmlxs:xsi namespace and xsi:noNamespaceSchemaLocation attribute;  you get DTD validation


Changing the XML parser parameters will not do anything useful since it is dependent on what appears in the XML docs.  The easiest one to do ay this point would be to modify the DTDs in the Toolkit.  That would allow you to process the documents without having to modify each one them.

Further stripping down of the resume example shows how the DTD declares a FIXED xmlns-attribute to the top-level element. Setting or unsetting this attribute would cause Serna to behave differently regarding presentation of the element text in wysiwyg-view. After removing the DTD alltogether, the xmlns-attribute doesn’t seem to be needed at all.

Bare-bone Serna plug-in

This is the complete specification of a bare-bone Serna plug-in that can be used for displaying, editing and saving xml-files using a custom stylesheet.

Serna Document Template (sdt)

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE  t:document-template PUBLIC "-//Syntext//DTD Syntext Document Template  V2.0//EN" "doctemplate2.dtd">
<t:document-template  xmlns:t="http://www.syntext.com/Extensions/DocTemplate-2.0">
 <t:name>Resume</t:name>
 <t:category>XML  Resume</t:category>
 <t:xml-stylesheet>$SERNA_TEMPLATE_DIR/xsl/resume.xsl</t:xml-stylesheet>
 <t:xml-schema>$SERNA_TEMPLATE_DIR/xsd/resumex.xsd</t:xml-schema>
 <t:root-elements>resume</t:root-elements>
 <t:category-pixmap>resume_1_5_1_template</t:category-pixmap>
 <t:rootelem-guess-string>resume</t:rootelem-guess-string>
 <t:document-skeleton>
  <t:name>Resume</t:name>
 </t:document-skeleton>
</t:document-template>

XML Schema (resumex.xsd)

<?xml  version="1.0"?>
<xs:schema  xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element  name="misc">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="para"   maxOccurs="unbounded"/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
 <xs:element  name="para">
  <xs:complexType mixed="true" />
 </xs:element>
 <xs:element  name="resume">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="misc"   maxOccurs="unbounded"/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>

Stylesheet (resume.xsl)

<xsl:stylesheet  version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:fo="http://www.w3.org/1999/XSL/Format">

<xsl:include  href="http://www.syntext.com/xslbricks-1.0/fo/fo.xsl"/>

<xsl:strip-space  elements="*"/>

<xsl:template name="heading">
 <fo:block>
  <xsl:value-of  select="local-name(.)"/>
 </fo:block>
</xsl:template>

<xsl:template  match="para">
 <fo:block>
  <xsl:call-template name = "bold.inline"/>
 </fo:block>
</xsl:template>

<xsl:template  match="misc">
 <fo:block>
 <xsl:call-template name="heading"/>
 <xsl:apply-templates/>
 </fo:block>
</xsl:template>

<xsl:template  match="resume">
 <xsl:call-template name="div"/>
</xsl:template>

</xsl:stylesheet>

Example XML

<?xml version='1.0'  encoding='UTF-8'?>
<resume>
 <misc>
  <para>User  Documentation developer</para>
  <para>Strip-downer</para>
 </misc>
</resume>

Conclusion

I seem to have solved my problem, but I still don’t understand why Serna behaves the way it does. Anyway, lesson learned (once again): if a stylesheet is behaving strange, 9 to 10 a namespace is in your way!

[Update!] Conclusion 2

I must have been completely deceived…. now it seems more like that the element definition in the xsd controls the behaviour of Serna:
This definition renders text fields:

<xs:element name="hyperlink"  type="xs:string" minOccurs="0" maxOccurs="1"/>

And omitting the ‘type=”xs:string”’ renders wordprocessor-style inline text:

<xs:element  name="title" minOccurs="0" maxOccurs="1"/>