Tuesday, December 11, 2012

OpenERP steps for Pentaho Kettle/PDI

Pentaho Kettle / PDI is a powerful tool to transfer data between systems.

If you're running OpenERP together with other applications and need them to share data, this should be the first place to look. Data is extracted using "input steps", transformed, and then loaded using "output steps".

Kettle has specialized steps for OpenERP that interact with the OpenERP server in the proper way: through the XML-RPC API. Unfortunately, these steps are not yet available in the stable version.
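For readers unfamiliar with that API, the kind of calls these steps make can be reproduced in a few lines of Python. This is a minimal sketch, not part of the plugin; the server URL, database name and credentials are placeholders you'd replace with your own.

```python
# Minimal sketch of the XML-RPC API the OpenERP steps talk to.
# URL, database and credentials are placeholders, not real values.
try:
    from xmlrpc.client import ServerProxy   # Python 3
except ImportError:
    from xmlrpclib import ServerProxy       # Python 2, contemporary with OpenERP

def endpoint(base_url, service):
    """Build the URL of an OpenERP XML-RPC service ('common' or 'object')."""
    return '%s/xmlrpc/%s' % (base_url.rstrip('/'), service)

def read_partner_names(base_url, db, user, pwd):
    """Log in, then search and read partner names -- the same pair of
    calls the Input step issues against the ORM."""
    uid = ServerProxy(endpoint(base_url, 'common')).login(db, user, pwd)
    models = ServerProxy(endpoint(base_url, 'object'))
    ids = models.execute(db, uid, pwd, 'res.partner', 'search', [])
    return models.execute(db, uid, pwd, 'res.partner', 'read', ids, ['name'])

# Example call (requires a running OpenERP server):
# read_partner_names('http://localhost:8069', 'mydb', 'admin', 'admin')
```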

You can try them out using the latest development version. The development versions, however, are not well suited for use in a production environment. Wouldn't it be nice if you could install these OpenERP steps in a stable Kettle version?

Well, I gave this a try and got them working with the 4.3.0-stable version. Here's the recipe:

Download the trunk version and copy the following files to the stable version:
  • from \data-integration\plugins\kettle-openerp-plugin\*, and
  • from \data-integration\libext\webservices, the files:
    • ws-commons-util-1.0.2.jar
    • xmlrpc-client-3.1.3.jar, and
    • xmlrpc-common-3.1.3.jar

Now (re)start Spoon and a "!BaseStep.Category.OpenERP!" category will be available, containing three new steps: Delete, Input and Output.

Note that for some reason the "Test" button in the "Database Connection" dialog doesn't work correctly: it reports an error when testing the OpenERP server connection, even though the connection is correct. I believe that this is a bug that has been fixed in later development branches.

A big advantage of using the OpenERP API instead of direct access to the database tables is that requests go through the ORM: you get improved data consistency and take advantage of the ORM's create() and write() logic, as well as model default values.
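To illustrate the difference, here is a hypothetical upsert helper going through the API (the function name and signature are mine, not part of the Kettle steps):

```python
try:
    from xmlrpc.client import ServerProxy   # Python 3
except ImportError:
    from xmlrpclib import ServerProxy       # Python 2

def upsert_partner(base_url, db, uid, pwd, values, partner_id=None):
    """Create or update a partner through the ORM, so model defaults,
    required-field checks and constraints all run server-side -- none of
    which happens with a raw INSERT/UPDATE on the res_partner table."""
    models = ServerProxy('%s/xmlrpc/object' % base_url.rstrip('/'))
    if partner_id:
        models.execute(db, uid, pwd, 'res.partner', 'write',
                       [partner_id], values)
        return partner_id
    # Fields left out of `values` (e.g. 'active', 'lang') receive
    # their ORM default values on create()
    return models.execute(db, uid, pwd, 'res.partner', 'create', values)
```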

On the downside, these steps don't handle XML Ids, only database (integer) ids, so you need to define your own strategy to identify records, such as using "reference" or "code" fields. If your record identification relies on XML Ids, then import_data()-based tools, such as import_sqlpush, may still be a good option.
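As an example of such a strategy, the helper below (my own sketch, not part of the plugin) resolves a business key to the integer database id the steps require; the field name 'ref' is an assumption and would depend on your model:

```python
try:
    from xmlrpc.client import ServerProxy   # Python 3
except ImportError:
    from xmlrpclib import ServerProxy       # Python 2

def id_for_code(models, db, uid, pwd, model, code, field='ref'):
    """Resolve a business key (a 'reference'/'code' field) to the integer
    database id the OpenERP steps require. Returns False when no record
    matches, meaning the row should be created rather than updated."""
    ids = models.execute(db, uid, pwd, model, 'search', [(field, '=', code)])
    return ids[0] if ids else False

# Typical use before deciding between create and write:
# models = ServerProxy('http://localhost:8069/xmlrpc/object')  # placeholder URL
# dbid = id_for_code(models, 'mydb', uid, 'admin', 'res.partner', 'ACME-001')
```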


  1. I have had to alter your procedure considerably in order to achieve the same result with PDI 4.4.0-stable under Ubuntu 12.04 LTS.

    Like you, I got the latest snapshot from here: http://ci.pentaho.com/view/Data%20Integration/job/Kettle/

    From it I was able to obtain only the directory "kettle-openerp-plugin" and the jar "xmlrpc-common-3.1.3.jar". The latter was in a directory "lib", rather than "libext", as you indicate.

    In order to get the other two jars (ws-commons-util-1.0.2.jar and xmlrpc-client-3.1.3.jar) I had to use http://www.findjar.com/.

    With those in place I get as far as defining a connection to my server and specifying a model name. At the moment of exiting the "Model Name" field I am hit with:
    NoClassDefFoundError: com/debortoliwines/openerp/api/RowCollection

    It turns out that the corresponding jar is no longer in the directory "kettle-openerp-plugin". I found a jar "/lib/openerp-java-api-1.0.0.jar" and put that in my "libext" directory.

    With that done I am at last able to attempt to work with the plugin without it failing.

    Unfortunately, I still have not been able to do anything useful, due to a reported error in my data.

    Unexpected error
    Failed to commit batch:
    Unknown database identifier '0'
    Row :{}

    I have no idea what that could mean.

  2. I also experienced a failure when trying to test the OpenERP connection. I fixed it by copying (not moving) the jar "kettle-openerp-plugin-TRUNK-SNAPSHOT.jar" from the directory "plugins/kettle-openerp-plugin/" to the directory "libext". Apparently the class loader used for connection testing doesn't look in the classpath of the corresponding plugin.

  3. @MHJBramwell Thanks for sharing.

  4. Do you have any idea how I can get past this . . . ?

    Unknown database identifier '0'
    Row :{}

    It happens after launching the transformation.

  5. It seems that the plugin targets OpenERP V6.1 and earlier.

  6. You can implement the following fix to orm.py in V7 for the plugin to work:

    --- openerp/osv/orm.py.orig 2013-03-20 22:21:17.892445229 +1100
    +++ openerp/osv/orm.py 2013-03-20 22:05:12.474594534 +1100
    @@ -1499,7 +1499,9 @@ class BaseModel(object):
                     except ValueError:
                         # in case of overridden id column
                         dbid = record['.id']
    -                if not self.search(cr, uid, [('id', '=', dbid)], context=context):
    +                if dbid == 0:
    +                    dbid = False
    +                elif not self.search(cr, uid, [('id', '=', dbid)], context=context):

    I couldn't get the indentation to work in this comment. You can download the same diff file with better indentation from here
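In plain Python, the patch amounts to treating a '.id' of 0 as "no existing record, so create one" rather than an error. A standalone sketch of that logic (the function name is mine, and the membership check stands in for the ORM's self.search call):

```python
def resolve_dbid(dbid, existing_ids):
    """Standalone sketch of the patched orm.py behaviour: a database id
    of 0 is mapped to False ('create a new record') instead of raising
    the "Unknown database identifier '0'" error quoted above."""
    if dbid == 0:
        return False                      # the patch's new branch
    if dbid not in existing_ids:          # stands in for self.search(...)
        raise ValueError("Unknown database identifier '%s'" % dbid)
    return dbid
```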

  7. Thank you! I will look into it when I can. Lacking a way forward with Pentaho, I've instead made my own explicit Google Spreadsheet to OpenERP V7 data pump.


    Any comments greatly appreciated.

  8. Hi,

    Thanks for your work. I would like to add a few small comments:

    First, I read carefully the documentation here: http://wiki.pentaho.com/display/EAI/OpenERP+Object+Output
    It states: "If no key value fields are specified, all incoming records are treated as new."
    Unfortunately, if I don't use any key value fields, I get this error: http://pastebin.com/k29yzgkZ

    Second, there is no way to specify a company_id. So in a multi-company environment, how will it work?

  9. I made some small proposals for improvements:
    * http://jira.pentaho.com/browse/PDI-9816
    * http://jira.pentaho.com/browse/PDI-9814
    * http://jira.pentaho.com/browse/PDI-9813

    I hope this helps.

  10. Joaquin: you can find it in the latest development build, at http://ci.pentaho.com/view/Data%20Integration/job/Kettle