Web Clipping with a Mashup Script
Web clipping, sometimes called screen scraping, lets you treat the HTML from any URL as the result of a service that you can filter, combine, or otherwise transform in a mashup. A web clipping mashup uses the <directinvoke> statement to retrieve HTML, which the EMML Reference Runtime Engine converts to XHTML in the http://www.w3.org/1999/xhtml namespace.
Note: the EMML Reference Runtime Engine uses JTidy to convert HTML to XHTML. Because of this conversion, the result may not match the HTML source exactly.
You can then use this XHTML in the mashup script as a service response.
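Because the converted document is in the XHTML namespace, XPath expressions that select from it need a prefix bound to that namespace, and an unqualified name such as a will not match. The following Python sketch illustrates this outside of EMML, using only the standard library. The sample markup and the function names are assumptions made for illustration and are not part of the EMML runtime.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the XHTML that a web clipping call might return.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a class="l" href="http://www.ruby-lang.org/">Ruby</a>
    <a class="nav" href="/preferences">Preferences</a>
    <a class="l" href="http://www.rubyonrails.org/">Rails</a>
  </body>
</html>"""


def select_links(xhtml_text, css_class):
    """Return href values of namespace-qualified <a> elements with the given class."""
    root = ET.fromstring(xhtml_text)
    ns = {"xhtml": XHTML_NS}  # bind the prefix used in the path below
    path = f".//xhtml:a[@class='{css_class}']"
    return [a.get("href") for a in root.findall(path, ns)]


def select_links_unqualified(xhtml_text):
    """Same query without a namespace prefix; matches nothing in namespaced XHTML."""
    root = ET.fromstring(xhtml_text)
    return [a.get("href") for a in root.findall(".//a")]


if __name__ == "__main__":
    print(select_links(SAMPLE_XHTML, "l"))
    print(select_links_unqualified(SAMPLE_XHTML))
```

The qualified query returns the two links with class l, while the unqualified query returns an empty list. This is why a mashup script that clips XHTML declares a prefix for the XHTML namespace.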
Example
This example treats the results page of a Google query as a web clipping service and outputs the links whose class attribute is l:
<mashup name="Ruby"
    xmlns="http://www.openemml.org/2009-04-15/EMMLSchema"
    xsi:schemaLocation="http://www.openemml.org/2009-04-15/EMMLSchema
        ../schema/EMMLSpec.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:res="http://www.myCompany.com/googleQuery"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <operation name="queryGoogle">
    <!-- The result document starts as an empty res:queries element -->
    <output name="result" type="document">
      <res:queries/>
    </output>
    <!-- Retrieve the results page; the engine converts the HTML to XHTML -->
    <directinvoke outputvariable="$searchresult"
        endpoint="http://www.google.com/search?q=ruby"/>
    <!-- Select every XHTML anchor with class 'l' and append one itemlink per match -->
    <foreach variable="$query" items="$searchresult//xhtml:a[@class='l']">
      <appendresult outputvariable="$result">
        <res:itemlink>{$query/@href}</res:itemlink>
      </appendresult>
    </foreach>
  </operation>
</mashup>
The XML result from this mashup looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<queries xmlns="http://www.myCompany.com/googleQuery">
  <itemlink href="http://www.ruby-lang.org/"/>
  <itemlink href="http://en.wikipedia.org/wiki/Ruby_programming_language"/>
  <itemlink href="http://en.wikipedia.org/wiki/Ruby"/>
  <itemlink href="http://www.rubyonrails.org/"/>
  <itemlink href="http://www.rubycentral.com/"/>
  <itemlink href="http://www.rubycentral.com/book/"/>
  <itemlink href="http://www.w3.org/TR/ruby/"/>
  <itemlink href="http://www.youtube.com/watch?v=JMDcOViViNY"/>
  <itemlink href="http://www.zenspider.com/Languages/Ruby/QuickRef.html"/>
  <itemlink href="http://poignantguide.net/"/>
  ...
</queries>
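Judging from the sample output, the attribute node selected by {$query/@href} becomes an href attribute on each itemlink element rather than text content. As a rough analogy only, the Python sketch below builds a result document of the same shape from a list of links. The function name build_result and the input list are assumptions for illustration. This is not how the EMML engine is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the res prefix in the mashup script.
RES_NS = "http://www.myCompany.com/googleQuery"


def build_result(hrefs):
    """Build a queries element with one empty itemlink child per href."""
    queries = ET.Element(f"{{{RES_NS}}}queries")
    for href in hrefs:
        # Each link is carried as an attribute, so the element has no text.
        ET.SubElement(queries, f"{{{RES_NS}}}itemlink", {"href": href})
    return queries


if __name__ == "__main__":
    # Hypothetical input: links already selected from the clipped XHTML.
    links = ["http://www.ruby-lang.org/", "http://www.rubyonrails.org/"]
    root = build_result(links)
    print(ET.tostring(root, encoding="unicode"))
```

Carrying each link in an attribute keeps every itemlink empty, which matches the self-closing elements in the result shown above.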
Enterprise Mashup Markup Language (EMML) Documentation is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
