Tuesday, December 11, 2012

OpenERP steps for Pentaho Kettle/PDI

Pentaho Kettle / PDI is a powerful tool to transfer data between systems.

If you're running OpenERP in together with other applications, and need them to share data, this should be the first place to look at. Data is extracted using "input steps", transformed and then loaded using "output steps".

Kettle has specialized steps for OpenERP, that interact with the OpenERP server in the proper way - through the XML-RPC API. Unfortunately these steps are not yet available in the stable version.

You try them out using the latest development version. The development versions, however, are not well suited for usage in a production environment. Wouldn't it be nice if you could install these OpenERP steps in a Kettle stable version?

Well, I gave a try at this, and got them working with the 4.3.0-stable version. Here's the recipe:

Download the trunk version and copy the following files to the stable version:
  • from \data-integration\plugins\kettle-openerp-plugin\*, and 
  • from \data-integration\libext\webservices, the files:
    • ws-commons-util-1.0.2.jar 
    • xmlrpc-client-3.1.3.jar, and  
    • xmlrpc-common-3.1.3.jar
 Now (re)start Spoon and a "!BaseStep.Category.OpenERP!" will be available containing three new steps: Delete, Input and Output.

Note that for some reason the "Test" button in the "Database Connection" dialog doesn't work correctly: it reports an error when testing the OpenERP server connection, even though the connection is correct. I believe that this is a bug that has been fixed in later development branches.

A big advantage for using the OpenERP API instead of the direct access to the database tables is the requests go through the ORM, so you get improved data consistency and take advantage of the ORM's create() and write() logic, and model default values.

On the downside, these steps don't handle XML Ids, only database (integer) ids, so you need to set your own strategy to identify records, such as using "reference" or "code" fields. If your record identification relies on XML Ids,  then import_data() base tools, such as import_sqlpush, may still be a good option.