Friday, June 12, 2015

Expression-oriented programming in Groovy: transpose() is zip()

The term "expression-oriented programming", as mentioned in these blog posts, resonated with me:

Groovy is by no means a purely functional language, but it does include a lot of the basics. Moreover, it includes a few other goodies that make it really nice for expression-oriented programming.

Here's a set of posts about some of these constructs.

One oddity of Groovy's support for functional programming is that it uses unusual names for some of the most common functions.

I believe this is because Groovy chose names from its object-oriented heritage (Smalltalk), rather than from the functional-programming canon.

For example:

  • map() is called collect() in Groovy
  • fold() or reduce() is called inject() in Groovy
  • filter() is findAll() in Groovy
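To make the correspondence concrete, here is a small sketch of the three functions applied to an arbitrary list of numbers (nums and the result names are purely illustrative):

```groovy
// An arbitrary input list, just for illustration
def nums = [1, 2, 3, 4, 5]

// map() is collect(): apply a closure to every element
def squares = nums.collect { it * it }
assert squares == [1, 4, 9, 16, 25]

// fold()/reduce() is inject(): accumulate from an initial value
def total = nums.inject(0) { acc, n -> acc + n }
assert total == 15

// filter() is findAll(): keep only the elements matching a predicate
def evens = nums.findAll { it % 2 == 0 }
assert evens == [2, 4]
```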

One of the most obscurely named of these methods is transpose(), and as a result it's easily overlooked.

A very useful function in functional programming languages such as Haskell and Scala is zip(), which is used to combine corresponding elements from more than one collection.

Suppose we have two lists of numbers, a and b, and we want the maximum of each pair of corresponding elements.

Groovy provides a lot of nice methods for working with a single list. But faced with two lists to be traversed together, many would revert to old Java-style code:

def a = [5, 10, 15, 20, 25]
def b = [20, 16, 12, 8, 4]

def r1 = []
for (i in 0..<a.size()) {
  r1[i] = Math.max(a[i], b[i])
}

assert r1 == [20, 16, 15, 20, 25]

We could try to use one of Groovy's iteration methods, eachWithIndex(), but the result is hardly better:

def r2 = []
a.eachWithIndex {v, i ->
  r2[i] = Math.max(v, b[i])
}

assert r2 == [20, 16, 15, 20, 25]

The answer is to use transpose():

def r3 = [a, b].transpose().collect {v, w -> Math.max(v, w)}

assert r3 == [20, 16, 15, 20, 25]

It's a little tricky until you get the hang of it: transpose() is called on a list of lists, and it returns a new list of lists. The i-th list in the result contains the i-th element of each of the original lists.
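Looking at the intermediate value makes this easier to see. The sketch below uses the same a and b as above; the names pairs and truncated are illustrative. It also shows a detail worth knowing: like zip(), transpose() stops at the end of the shortest list, silently dropping any extra elements.

```groovy
def a = [5, 10, 15, 20, 25]
def b = [20, 16, 12, 8, 4]

// Each inner list of the result holds the elements at one position
def pairs = [a, b].transpose()
assert pairs == [[5, 20], [10, 16], [15, 12], [20, 8], [25, 4]]

// Transposing again recovers the original lists (when they are the same length)
assert pairs.transpose() == [a, b]

// With lists of different lengths, the result is as long as the shortest one
def truncated = [[1, 2, 3], ['x', 'y']].transpose()
assert truncated == [[1, 'x'], [2, 'y']]
```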

Actually, for built-in methods like max() that Groovy defines on collections, we can use the spread operator:

def r4 = [a, b].transpose()*.max()

assert r4 == [20, 16, 15, 20, 25]
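The spread-dot form is shorthand for collecting the same method call over each row. A quick check, using the same a and b (pairs is an illustrative name), which also shows min() working the same way:

```groovy
def a = [5, 10, 15, 20, 25]
def b = [20, 16, 12, 8, 4]
def pairs = [a, b].transpose()

// pairs*.max() calls max() on each inner list, just like collect would
assert pairs*.max() == pairs.collect { it.max() }

// Any collection method works the same way, e.g. min()
assert pairs*.min() == [5, 10, 12, 8, 4]
```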

Suppose instead of taking the max of two items we wanted the sum:

def r5 = [a, b].transpose()*.sum()

assert r5 == [25, 26, 27, 28, 29]

transpose() also generalizes naturally beyond the case of just two lists:

def c = [3, 6, 9, 12, 15]

def r6 = [a, b, c].transpose()*.sum()

assert r6 == [28, 32, 36, 40, 44]
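The closure-destructuring style used for r3 scales the same way: with three input lists each row has three elements, so a three-parameter closure can name them individually. A sketch with an arbitrary formula (r is an illustrative name):

```groovy
def a = [5, 10, 15, 20, 25]
def b = [20, 16, 12, 8, 4]
def c = [3, 6, 9, 12, 15]

// Each row is a three-element list, which Groovy spreads into x, y and z
def r = [a, b, c].transpose().collect { x, y, z -> x * y + z }
assert r == [103, 166, 189, 172, 115]
```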

Of course, the lists do not have to contain elements of the same type.

Here's an example where one list contains strings and the other contains lengths, and we want to pad each string to the corresponding length:

def strings = ["one", "two", "three", "four", "five", "six"]
def lens = [1, 2, 3, 4, 5, 6]
def r7 = [strings, lens].transpose().collect {item, len -> item.padRight(len)}

assert r7 == ["one", "two", "three", "four", "five ", "six   "]
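As a further illustration (not one of the original examples), a list of keys and a list of values of a different type can be paired into a map by feeding the transposed rows to collectEntries(); keys, vals and lookup are hypothetical names:

```groovy
def keys = ["one", "two", "three"]
def vals = [1, 2, 3]

// Each row is a [key, value] pair; the closure turns it into a map entry
def lookup = [keys, vals].transpose().collectEntries { k, v -> [(k): v] }
assert lookup == [one: 1, two: 2, three: 3]
```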

Java 8 added lambdas and the Streams API, which permit many functional idioms. A zip() method was originally included in the Java 8 SDK previews, but unfortunately was removed before release. Never mind, we have it in Groovy!

So despite its unconventional name, keep transpose() in mind when working with lists. And if you have any interesting usages yourself, post them (or links) in the comments!

Monday, June 1, 2015

Gradle version selector incompatibility

We had some interesting issues with Gradle this week.

We build a number of internal projects with Gradle, and some of these projects have interdependencies. The dependency graph of our internal projects extends to several levels.

To illustrate, this diagram shows three projects, with "server" depending on "common", and "client" depending on "server".

For internal dependencies, we use Gradle's dynamic version selectors "latest.integration" and "latest.release".

The problem occurred because Gradle 2.3 changed the way these version selectors are written to the dependencies section of a published pom.xml.

Prior versions wrote the version selectors as-is, e.g. "latest.integration". I believe this convention originated with Ivy, but it is not a valid Maven version. So Gradle 2.3 switched to writing a valid Maven version such as "LATEST" or "RELEASE".

A great post explaining the options available with Maven, and some of their pros and cons, is this one on Stack Overflow.

Generally speaking, it's best to use specific version numbers in dependencies for released artifacts, for repeatable builds. But it's also desirable to have changing or dynamic dependencies for snapshot or integration builds, for continuous integration and testing.

The problem is that older versions of Gradle don't understand these "new" values ("LATEST" and "RELEASE").

Let's show this using the sample projects above. For the purpose of this illustration, we'll use a common init.gradle shared by all three projects, with these contents:

def homeDir = System.getProperty("user.home")
def repoUrl = "file:///$homeDir/tmp/repo"

allprojects {
  apply plugin: "java"
  apply plugin: "maven"

  group = "example"

  uploadArchives {
    repositories {
      mavenDeployer {
        repository(url: repoUrl)
      }
    }
  }

  repositories {
    maven {
      url repoUrl
    }
  }
}

We'll use this init.gradle for every build in these examples, using this alias:

alias mygradle="./gradlew -I../init.gradle"

For the "common" project, we'll have these files:

➜ common git:(master) ✗ tree
.
├── build.gradle
├── gradle
│   └── wrapper
│       ├── gradle-wrapper.jar
│       └── gradle-wrapper.properties
├── gradlew
├── gradlew.bat
└── src
    └── main
        └── java
            └── example
                └── Common.java

In build.gradle we have:

wrapper {
  gradleVersion = "2.2.1"
}

version = "01.00"

We can build "common" like this:

➜  common git:(master) ✗ mygradle uploadArchives
:compileJava
:processResources UP-TO-DATE
:classes
:jar
:uploadArchives
Uploading: example/common/01.00/common-01.00.jar to repository remote at file:////Users/jhurst/tmp/repo
Transferring 1K from remote
Uploaded 1K

BUILD SUCCESSFUL

The generated pom.xml is not very interesting:

<project 
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" 
  xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example</groupId>
  <artifactId>common</artifactId>
  <version>01.00</version>
</project>

Now we look at "server". The files are:

 
➜  server git:(master) ✗ tree 
. 
├── build.gradle 
├── gradle 
│   └── wrapper 
│       ├── gradle-wrapper.jar 
│       └── gradle-wrapper.properties 
├── gradlew 
├── gradlew.bat 
└── src 
    └── main 
        └── java 
            └── example 
                └── Server.java 

The server project uses Gradle 2.4, and declares a dependency on "common" in its build.gradle:

 
wrapper { 
  gradleVersion = "2.4" 
} 
 
dependencies { 
  compile "example:common:latest.integration" 
} 
 
version = "01.00" 

We build "server":

 
➜  server git:(master) ✗ mygradle uploadArchives 
:compileJava UP-TO-DATE 
:processResources UP-TO-DATE 
:classes UP-TO-DATE 
:jar UP-TO-DATE 
:uploadArchives 
 
BUILD SUCCESSFUL 

Now we have a dependency in the generated pom.xml:

 
<project 
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" 
  xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example</groupId>
  <artifactId>server</artifactId>
  <version>01.00</version>
  <dependencies>
    <dependency>
      <groupId>example</groupId>
      <artifactId>common</artifactId>
      <version>LATEST</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
</project>

This dependency is specified using the new, Maven-compatible version selector.

Now we go to "client":

 
➜  client git:(master) ✗ tree 
. 
├── build.gradle 
├── gradle 
│   └── wrapper 
│       ├── gradle-wrapper.jar 
│       └── gradle-wrapper.properties 
├── gradlew 
├── gradlew.bat 
└── src 
    └── main 
        └── java 
            └── example 
                └── Client.java 

The client project uses Gradle 2.2.1, and declares a dependency on "server" in its build.gradle:

 
wrapper { 
  gradleVersion = "2.2.1" 
} 
 
dependencies { 
  compile "example:server:latest.integration" 
} 
 
version = "01.00" 

We attempt to build "client":

 
➜  client git:(master) ✗ mygradle uploadArchives 
:compileJava 
 
FAILURE: Build failed with an exception. 
 
* What went wrong: 
Could not resolve all dependencies for configuration ':compile'. 
> Could not find example:common:LATEST. 
  Searched in the following locations: 
      file:/Users/jhurst/tmp/repo/example/common/LATEST/common-LATEST.pom 
      file:/Users/jhurst/tmp/repo/example/common/LATEST/common-LATEST.jar 
  Required by: 
      example:client:01.00 > example:server:01.00 
 
* Try: 
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. 
 
BUILD FAILED 

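The problem is that Gradle 2.2.1 does not recognize "LATEST" as a dynamic version selector. Instead it treats it as a literal version number and looks for files such as common-LATEST.pom, which of course do not exist.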

It's easily fixed: we simply upgrade "client" to Gradle 2.3 or later.
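Concretely, a minimal sketch of the change, assuming "client" keeps the wrapper-block convention used by the other projects: bump gradleVersion (2.4 here, to match "server"), then run ./gradlew wrapper once with the old wrapper to regenerate gradle/wrapper/gradle-wrapper.properties.

```groovy
// client/build.gradle: any version from 2.3 on understands LATEST/RELEASE;
// 2.4 is chosen here only to match "server"
wrapper {
  gradleVersion = "2.4"
}

dependencies {
  compile "example:server:latest.integration"
}

version = "01.00"
```

After regenerating the wrapper, mygradle uploadArchives should run under the new version and resolve example:common through the LATEST selector.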

You might think this is a really trivial problem. It is, once it's clear what is going on. We found it a bit confusing at first because we didn't know where the "LATEST" string in the error message was coming from: neither build.gradle mentions it, because it comes from the pom.xml that Gradle 2.4 published for "server".
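When a mystery version string like this turns up, it can help to inspect the published POMs of the intermediate modules directly. The following is a small diagnostic sketch, not something from the original investigation: a hypothetical helper, findMetaVersionDeps, that uses Groovy's XmlSlurper to list dependencies whose version is LATEST or RELEASE. It assumes Groovy 3 or later (where XmlSlurper lives in groovy.xml) and uses a trimmed copy of the server pom.xml shown above.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical helper: returns "group:artifact:version" for every dependency
// whose version is one of Maven's LATEST/RELEASE meta-versions
List<String> findMetaVersionDeps(String pomText) {
    def pom = new XmlSlurper().parseText(pomText)
    def deps = pom.dependencies.dependency
    def flagged = deps.findAll { it.version.text() in ['LATEST', 'RELEASE'] }
    return flagged.collect {
        "${it.groupId.text()}:${it.artifactId.text()}:${it.version.text()}".toString()
    }
}

// Trimmed copy of the pom.xml published for "server"
def serverPom = '''<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example</groupId>
  <artifactId>server</artifactId>
  <version>01.00</version>
  <dependencies>
    <dependency>
      <groupId>example</groupId>
      <artifactId>common</artifactId>
      <version>LATEST</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
</project>'''

println findMetaVersionDeps(serverPom)
```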

There is a further difficulty caused by this change if you use Groovy's Grapes feature in Groovy scripts to fetch dependencies. Grapes uses Ivy to resolve dependencies, and it does not understand this LATEST/RELEASE syntax in POM files either.

To show this, let's use an ivysettings.xml file as follows:

 
<ivysettings>
  <resolvers>
    <ibiblio 
      name="downloadGrapes" 
      m2compatible="true" 
      root="file:///Users/jhurst/tmp/repo"/>
  </resolvers>
  <settings defaultResolver="downloadGrapes"/>
</ivysettings>

We tell Groovy to use it via the grape.config system property:

 
export JAVA_OPTS="-Dgrape.config=$PWD/ivysettings.xml" 

Here is a Groovy script with a dependency on the "common" module:

 
@Grab("example:common:01.00") 
import example.Common 
 
println Common.simpleName 

When we run this, it fetches the dependency and runs successfully:

 
➜  groovy git:(master) ✗ groovy ./grabcommon.groovy 
Common 

Let's try another Groovy script that has a dependency on "server" instead:

 
@Grab("example:server:01.00") 
import example.Server 
 
println Server.simpleName 

When we run this, we get a failure similar to the one we saw with Gradle earlier:

 
➜  groovy git:(master) ✗ groovy ./grabserver.groovy 
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: 
General error during conversion: Error grabbing Grapes -- 
  [unresolved dependency: example#common;LATEST: not found] 
 
java.lang.RuntimeException: Error grabbing Grapes -- 
  [unresolved dependency: example#common;LATEST: not found] 
... 

This one is not so easy to solve.

Gradle originally used Ivy's dependency resolution code, but then switched to using its own code. Groovy's Grapes feature still uses Ivy.

We need either to improve Ivy to support this syntax in POM files, or else perhaps it would be better to get Groovy Grapes to use Gradle's dependency resolution code instead of Ivy. But given that Grapes is configured using an ivysettings.xml, and Gradle does not provide any analogous way to tell a Groovy script where to look for dependencies, it is not obvious how we would switch Groovy to use Gradle's code. Besides, Groovy needs to continue to support ivysettings.xml and all Ivy features, for backwards compatibility.

A colleague of mine pointed out that one solution is to use the long form of @Grab, with transitive = false, like this:

@Grab(group = "example", module = "server", version = "01.00", transitive = false)
@Grab("example:common:01.00")
import example.Common
import example.Server

println Server.simpleName
println Common.simpleName

This works, but it excludes all of server's transitive dependencies, so each one that is needed must be grabbed explicitly, as common is here. If you have a lot of third-party dependencies and need this exclusion only for your local modules, that quickly becomes tedious.