Wednesday 14 September 2016

Apache Sling Pipes - a tool for extract-transform-load operations

While working on huge amounts of enterprise content, I am sure you guys have faced scenarios where some part of the content with a particular property needs to be modified. Or a property or node was edited wrongly, and you are asked to find all those nodes and correct that property. Such problems become cumbersome when the amount of data is huge. The general approach is to write either a Groovy script or some Java code that walks the content and changes the necessary nodes. The Apache Sling project came up with Sling Pipes to avoid writing such one-off code again and again, and to provide a scalable way of dealing with content-related issues.

What is a pipe

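It's a tool that lets you load content tree nodes, perform some operation on them, and either retrieve an output or modify the nodes. The aim is to provide reusable blocks, called pipes, that can be configured for almost any operation on content. As the diagram below shows, each pipe has three parts:
  • an input resource (getInput)
  • output resources (getOutput)
  • its current output exposed as a binding (getOutputBinding), which subsequent pipes can use in their expressions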

         getOutputBinding
               ^
               |
 getInput  +---+---+   getOutput
           |       |
      +----> Pipe  +---->
           |       |
           +-------+
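To make the diagram more concrete, here is a toy model of the pipe idea in plain Python. It is not the Sling Pipes API, and real pipes run against the JCR inside AEM. Resource, children_pipe, filter_pipe, write_pipe and container are hypothetical names invented for this illustration. In the model, each pipe consumes an iterator of input resources and lazily yields output resources. A container chains pipes so that one pipe's output becomes the next pipe's input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Resource:
    """Hypothetical stand-in for a JCR node: a path, properties and children."""
    path: str
    props: Dict[str, object] = field(default_factory=dict)
    children: List["Resource"] = field(default_factory=list)


# In this toy model a pipe is a function from input resources to output resources.
Pipe = Callable[[Iterable[Resource]], Iterator[Resource]]


def children_pipe(inputs: Iterable[Resource]) -> Iterator[Resource]:
    """Roughly like a Sling Query children() step: output every child of every input."""
    for res in inputs:
        yield from res.children


def filter_pipe(prop: str, value: object) -> Pipe:
    """Roughly like a filter pipe: pass an input on only if props[prop] == value."""
    def run(inputs: Iterable[Resource]) -> Iterator[Resource]:
        for res in inputs:
            if res.props.get(prop) == value:
                yield res
    return run


def write_pipe(updates: Dict[str, object]) -> Pipe:
    """Roughly like a write pipe: set properties on each input, then output it."""
    def run(inputs: Iterable[Resource]) -> Iterator[Resource]:
        for res in inputs:
            res.props.update(updates)
            yield res
    return run


def container(*pipes: Pipe) -> Pipe:
    """Roughly like a container pipe: each pipe's output feeds the next pipe's input."""
    def run(inputs: Iterable[Resource]) -> Iterator[Resource]:
        stream: Iterator[Resource] = iter(inputs)
        for pipe in pipes:
            stream = pipe(stream)
        return stream
    return run


# Usage: fix every child of /content/site whose status was saved as "draft".
root = Resource("/content/site", children=[
    Resource("/content/site/a", {"status": "draft"}),
    Resource("/content/site/b", {"status": "live"}),
])
fix = container(children_pipe, filter_pipe("status", "draft"), write_pipe({"status": "live"}))
changed = [res.path for res in fix([root])]
print(changed)  # ['/content/site/a']
```

Every step in this model is a lazy generator, so nothing is read or written until the final list consumes the chain. That loosely mirrors how pipes stream resources one by one through iterators rather than loading everything up front.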
Install Sling Pipes in AEM
To begin with, you will need AEM 6.1 or later. Download the GitHub project from here and build it using:
mvn clean install -DskipTests
Once the build succeeds, upload the jar file org.apache.sling.pipes-0.0.9-SNAPSHOT.jar (from the project's target folder) via /system/console/bundles. If the bundle doesn't get resolved, download the org.apache.sling.query jar from LINK1 and LINK2 and upload it to AEM as well.
Make sure both bundles are in the Active state.
Basic Pipes
A pipe is basically a JCR node with several properties:
  • sling:resourceType = the pipe type, which must be one of the types registered with the plumber (see the tables below). slingPipes/plumber itself is the resource type of the plumber servlet endpoint, not of an individual pipe.
  • name = used as an id, and as the key under which the pipe's output is available in bindings.
  • path = defines the pipe's input. If it is not present, the previous pipe's output is used as this pipe's input.
  • expr = the expression the pipe executes; its meaning depends on the pipe type.
  • additionalBinding = a node you can add to set "global" bindings (property=value) for the pipe execution.
  • additionalScripts = scripts that can be used in expressions.
  • conf = optional child node holding additional configuration.
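Since a pipe is just a JCR node, one way to create it without CRXDE is the Sling POST servlet's JSON import (:operation=import). The sketch below only builds that request with Python's standard library and does not send it. Everything in it is an assumption for illustration:
  • the host http://localhost:4502 and the admin:admin credentials
  • an existing /etc/pipes parent
  • the node name findPages and the XPath query

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request

AEM = "http://localhost:4502"   # assumption: local AEM author instance
CREDENTIALS = b"admin:admin"    # assumption: default local credentials

# Hypothetical pipe node using the basic properties listed above.
pipe_node = {
    "jcr:primaryType": "nt:unstructured",
    "sling:resourceType": "slingPipes/xpath",
    "name": "findPages",
    "expr": "/jcr:root/content/mysite//element(*, cq:Page)",
}


def import_request(parent: str, node_name: str, node: dict) -> Request:
    """Build (without sending) a Sling POST servlet request that imports `node` as JSON."""
    form = {
        ":operation": "import",
        ":contentType": "json",
        ":name": node_name,
        ":content": json.dumps(node),
    }
    auth = base64.b64encode(CREDENTIALS).decode("ascii")
    return Request(
        AEM + parent,
        data=urlencode(form).encode("utf-8"),
        headers={"Authorization": "Basic " + auth},
        method="POST",
    )


req = import_request("/etc/pipes", "findPages", pipe_node)
print(req.get_method(), req.full_url)  # POST http://localhost:4502/etc/pipes
```

Passing req to urllib.request.urlopen would actually send it to a running instance.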
Reader Pipes:
Base Pipe: dummy pipe, output = input.
  • sling:resourceType = slingPipes/base
Sling Query Pipe: executes $(getInput()).children(expression).
  • sling:resourceType = slingPipes/slingQuery
  • expr = mandatory property, the Sling Query expression used to compute getOutput() from the children of getInput()
JSON Pipe: feeds bindings with remote JSON.
  • sling:resourceType = slingPipes/json
  • expr = mandatory property, the URL of the remote JSON to fetch
MultiPropertyPipe: iterates through the values of the input multi-value property and writes them to bindings.
  • sling:resourceType = slingPipes/multiProperty
  • path = path of a multi-value property
XPath Pipe: gets resources from an XPath query.
  • sling:resourceType = slingPipes/xpath
  • expr = XPath query expression
Authorizable Pipe: retrieves the authorizable resource corresponding to the id passed in the expression.
  • sling:resourceType = slingPipes/authorizable
  • expr = authorizable id
  • autoCreateGroup (boolean)
  • addMembers (stringified JSON array)
  • addToGroup (expression)
  • bindMembers (boolean)
Parent Pipe: outputs the parent resource of the input resource.
  • sling:resourceType = slingPipes/parent
Filter Pipe: outputs the input resource only if it matches its configuration.
  • sling:resourceType = slingPipes/filter
  • conf node
  • slingPipesFilter_not = 'true' inverts the expected result of the filter
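The reader list above can be turned into a quick sanity check before you save a reader pipe. The sketch below holds example reader configurations as Python dicts and flags any that lack the property the list gives for that type:
  • expr for the Sling Query, JSON, XPath and Authorizable pipes
  • path for the MultiPropertyPipe

The configurations, the REQUIRED map and missing_property are hypothetical helpers for illustration. Sling Pipes does not ship them, and the real bundle may validate configurations differently.

```python
from typing import Dict, List, Optional

# Hypothetical map: property each reader type relies on, taken from the reader list above.
REQUIRED: Dict[str, str] = {
    "slingPipes/slingQuery": "expr",
    "slingPipes/json": "expr",
    "slingPipes/xpath": "expr",
    "slingPipes/authorizable": "expr",
    "slingPipes/multiProperty": "path",
}

# Hypothetical reader configurations; paths, ids and queries are examples only.
readers: List[Dict[str, str]] = [
    {"sling:resourceType": "slingPipes/base", "path": "/content/mysite"},
    {"sling:resourceType": "slingPipes/xpath",
     "expr": "/jcr:root/content/mysite//*[@status='drfat']"},
    {"sling:resourceType": "slingPipes/authorizable", "expr": "content-authors"},
    {"sling:resourceType": "slingPipes/multiProperty"},  # path forgotten on purpose
]


def missing_property(conf: Dict[str, str]) -> Optional[str]:
    """Return the required property absent from conf, or None if nothing is missing."""
    needed = REQUIRED.get(conf["sling:resourceType"])
    if needed is not None and not conf.get(needed):
        return needed
    return None


problems = [(c["sling:resourceType"], missing_property(c))
            for c in readers if missing_property(c)]
print(problems)  # [('slingPipes/multiProperty', 'path')]
```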
Containers:
Container Pipe: used to assemble a sequence of pipes.
  • sling:resourceType = slingPipes/container
  • conf node, holding the child pipes, which are executed in the order they appear
Reference Pipe: executes the pipe referenced in the path property.
  • sling:resourceType = slingPipes/reference
  • path = path of the referenced pipe
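As an illustration of the container and reference pipes, the sketch below uses nested Python dicts in the shape of a JSON node export. It describes a container whose conf node holds an XPath pipe followed by a Parent pipe, plus a reference pipe that points to the container. All node names, paths and the query are hypothetical. execution_order is also a hypothetical helper: it lists the child pipe types in the order the container would run them. It relies on Python dicts keeping insertion order, much as ordered JCR child nodes do.

```python
from typing import Dict, List

# Hypothetical container: find old teaser components, then step up to their parents.
find_teaser_parents: Dict = {
    "jcr:primaryType": "nt:unstructured",
    "sling:resourceType": "slingPipes/container",
    "conf": {
        "teasers": {
            "sling:resourceType": "slingPipes/xpath",
            "expr": "/jcr:root/content/mysite//*"
                    "[@sling:resourceType='mysite/components/oldteaser']",
        },
        # No path property: its input is the output of "teasers".
        "parents": {
            "sling:resourceType": "slingPipes/parent",
        },
    },
}

# Hypothetical reference pipe reusing the container stored at an assumed path.
reuse = {
    "jcr:primaryType": "nt:unstructured",
    "sling:resourceType": "slingPipes/reference",
    "path": "/etc/pipes/findTeaserParents",
}


def execution_order(container: Dict) -> List[str]:
    """List the resource types of the container's child pipes, in conf order."""
    return [child["sling:resourceType"]
            for child in container["conf"].values()
            if isinstance(child, dict)]  # skip plain properties, if any


print(execution_order(find_teaser_parents))  # ['slingPipes/xpath', 'slingPipes/parent']
```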
Writers:
Write Pipe: writes nodes and properties to the input of the pipe.
  • sling:resourceType = slingPipes/write
  • conf node, whose properties and child nodes are written to each input resource
Move Pipe: moves the input to the target path.
  • sling:resourceType = slingPipes/mv
  • expr = target path
Remove Pipe: removes the input resource (node or property) and returns its parent.
  • sling:resourceType = slingPipes/rm
  • conf node
Path Pipe: gets or creates the path given in the expression.
  • sling:resourceType = slingPipes/path
  • nodeType = node type of the intermediate nodes to create
  • autosave = whether to save at each creation
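Putting readers and writers together gives the scenario from the introduction: find every node where a property was edited wrongly, and correct it. The sketch below builds such a container as a Python dict with two child pipes:
  • an XPath pipe that selects nodes whose status property holds the misspelt value drfat
  • a Write pipe whose conf node sets status to draft on each of them

The property name, values and query are hypothetical examples. The dict could be imported as JSON the same way as any other pipe node.

```python
import json

# Hypothetical container that corrects a misspelt property value under /content/mysite.
fix_status = {
    "jcr:primaryType": "nt:unstructured",
    "sling:resourceType": "slingPipes/container",
    "conf": {
        # Reader: every node under /content/mysite whose status is the typo "drfat".
        "broken": {
            "sling:resourceType": "slingPipes/xpath",
            "expr": "/jcr:root/content/mysite//*[@status='drfat']",
        },
        # Writer: no path, so its input is the output of "broken";
        # the properties of its conf node are written to each input node.
        "fix": {
            "sling:resourceType": "slingPipes/write",
            "conf": {
                "status": "draft",
            },
        },
    },
}

payload = json.dumps(fix_status, indent=2)
print(payload)
```

A container like this modifies content, so it has to be executed with a POST request. If your version supports a dry-run option, trying that first is a sensible precaution.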
Execute a Sling Pipe
Once the configuration is ready, there are broadly two ways to execute a pipe, both by requesting the pipe's path with the .json extension. If it is a read operation (a GET call), you can call it directly from the browser or with a cURL command. If it is a write operation (a POST call), you have to use cURL or another client that can send POST requests. Besides these basics, you can also pass a few parameters when executing a pipe. We will see more with examples.
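Here is a minimal sketch of the two ways of calling a pipe, written with Python's standard library instead of cURL. It only builds the requests. The pipe paths, host and credentials are assumptions.

The size and dryRun parameters come from the Sling Pipes execution documentation for later releases, so check that the bundle version you built supports them:
  • size limits how many resources are returned
  • dryRun=true reports the changes a modifying pipe would make, without making them

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

AEM = "http://localhost:4502"  # assumption: local AEM author instance
AUTH = {"Authorization": "Basic " + base64.b64encode(b"admin:admin").decode("ascii")}


def read_request(pipe_path: str, size: int = 10) -> Request:
    """GET <pipe path>.json: for pipes that only read content."""
    query = urlencode({"size": size})
    return Request(f"{AEM}{pipe_path}.json?{query}", headers=AUTH, method="GET")


def write_request(pipe_path: str, dry_run: bool = True) -> Request:
    """POST <pipe path>.json: needed for pipes that modify content."""
    form = urlencode({"dryRun": str(dry_run).lower()}).encode("utf-8")
    return Request(f"{AEM}{pipe_path}.json", data=form, headers=AUTH, method="POST")


# Hypothetical pipe paths; send with urllib.request.urlopen(...) against a real instance.
find = read_request("/etc/pipes/findPages", size=50)
fix = write_request("/etc/pipes/fixStatus", dry_run=True)
print(find.get_method(), find.full_url)  # GET http://localhost:4502/etc/pipes/findPages.json?size=50
print(fix.get_method(), fix.data)        # POST b'dryRun=true'
```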
Sample Configurations
There are a few sample configurations at the link that explain how the node structure should be set up to make it work. Besides that, you can also check the link to explore more. I have also tried a few things to test Sling Pipes: you can install the package attached at the link below in AEM 6.1 to practice with Sling Pipes.

Sling Pipes is still quite new in the AEM world, so you might not find much about it online. If you need to do something complex with Sling Pipes, start with the basics and then build up to the complex scenarios.

