# 3.4. Moving Data between Java and R Code

If you have read the chapter Evaluating R Language Code in this guide, you already know how to execute R code from a Java application. In this chapter we will take things a little further and explain how you can move data between Java and R code.

Renjin provides a mapping from R language types to Java objects. To use this mapping effectively you should have at least a basic understanding of R’s object types. The next section provides a short introduction which is essentially a condensed version of the relevant material in the R Language Definition manual. If you are already familiar with R’s object types you can skip this section and head straight to the section Pulling data from R into Java or Pushing data from Java to R.

## 3.4.1. A Java Developer’s Guide to R Objects

R has a number of object types that are referred to as *basic types*. Of these,
we only discuss those that are most frequently encountered by users of R:
vectors, lists, functions, and the `NULL` object. We also discuss the two common
compound objects in R, namely data frames and factors.

### 3.4.1.1. Attributes

Before we discuss these objects, it is important to know that all objects
except the `NULL` object can have one or more attributes. Common attributes
are the `names` attribute which contains the element names, the `class`
attribute which stores the name of the class of the object, and the `dim`
attribute and (optionally) its `dimnames` companion to store the size of each
dimension (and the name of each dimension) of the object. For each object, the
`attributes()` command will return a list with the attributes and their
values. The value of a specific attribute can be obtained using the `attr()`
function. For example, `attr(x, "class")` will return the name of the class
of the object (or `NULL` if the attribute is not defined).

### 3.4.1.2. Vectors

There are six basic vector types which are referred to as the *atomic vector
types*. These are:

- logical: a boolean value (for example: `TRUE`)
- integer: an integer value (for example: `1L`)
- double: a real number (for example: `1.5`)
- character: a character string (for example: `"foobar"`)
- complex: a complex number (for example: `1+2i`)
- raw: uninterpreted bytes (forget about this one)

These vectors have a length and can be indexed using `[` as the following sample
R session demonstrates:

```
> x <- 2
> length(x)
[1] 1
> y <- c(2, 3)
> y[2]
[1] 3
```

As you can see, even single numbers are vectors with length equal to one.
Vectors in R can have missing values that are represented as `NA`. Because all
elements in a vector must be of the same type (i.e. logical, double, integer, etc.)
there are multiple types of `NA`. However, the casual R user will generally
not be concerned with the different types for `NA`.

```
> x <- c(1, NA, 3)
> x
[1] 1 NA 3
> y <- as.character(NA)
> y
[1] NA
> typeof(NA) # default type of NA is logical
[1] "logical"
> typeof(y) # but we have coerced 'y' to a character vector
[1] "character"
```

R’s `typeof()` function returns the internal type of each object. In the
example above, `y` is a character vector.

### 3.4.1.3. Factors

Factors are one of R’s compound data types. Internally, they are represented by
integer vectors with a `levels` attribute. The following sample R session
creates such a factor from a character vector:

```
> x <- sample(c("A", "B", "C"), size = 10, replace = TRUE)
> x
[1] "C" "B" "B" "C" "A" "A" "B" "B" "C" "B"
> as.factor(x)
[1] C B B C A A B B C B
Levels: A B C
```

Internally, the factor in this example is stored as an integer vector
`c(3, 2, 2, 3, 1, 1, 2, 2, 3, 2)` whose elements are the indices of the letters
in the character vector `c("A", "B", "C")` stored in the `levels` attribute.
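The mapping from integer codes to labels can be sketched in plain Java. This is only an illustration of the storage scheme described above: `FactorSketch` and `decode` are hypothetical names that are not part of Renjin, and the sketch assumes that the factor contains no missing values.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Renjin): turns factor codes into labels.
final class FactorSketch {

    // 'codes' holds 1-based indices into 'levels', as R stores them.
    static String[] decode(int[] codes, String[] levels) {
        String[] labels = new String[codes.length];
        for (int i = 0; i < codes.length; i++) {
            // shift from R's 1-based index to Java's 0-based index
            labels[i] = levels[codes[i] - 1];
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] codes = {3, 2, 2, 3, 1, 1, 2, 2, 3, 2};
        String[] levels = {"A", "B", "C"};
        // prints [C, B, B, C, A, A, B, B, C, B]
        System.out.println(Arrays.toString(decode(codes, levels)));
    }
}
```

The printed labels match the output of `as.factor(x)` in the R session above.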

### 3.4.1.4. Lists

Lists are R’s go-to structure for representing structured data. They can
contain multiple elements, each of which can be of a different type. Record-like
structures can be created by naming each element in the list. The `lm()`
function, for example, returns a list that contains many details about the
fitted linear model. The following R session shows the difference between a list
and a list with named elements:

```
> l <- list("Jane", 23, c(6, 7, 9, 8))
> l
[[1]]
[1] "Jane"
[[2]]
[1] 23
[[3]]
[1] 6 7 9 8
> l <- list(name = "Jane", age = 23, scores = c(6, 7, 9, 8))
> l
$name
[1] "Jane"
$age
[1] 23
$scores
[1] 6 7 9 8
```

In R, lists are also known as *generic vectors*. They have a length that is
equal to the number of elements in the list.
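For a Java developer it may help to picture a named list as positional values plus a parallel vector of names. The following is a rough model only: `NamedListSketch` is a hypothetical class that is unrelated to Renjin's own list implementation, and returning `null` for an unknown name is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an R list: values by position, names in parallel.
final class NamedListSketch {
    private final List<String> names;
    private final List<Object> values;

    NamedListSketch(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        this.names = new ArrayList<>(names);
        this.values = new ArrayList<>(values);
    }

    // The length of a list is its number of elements.
    int length() {
        return values.size();
    }

    // 0-based position of the first element with this name, or -1.
    int indexOfName(String name) {
        return names.indexOf(name);
    }

    // Lookup by name is a lookup by position in disguise.
    Object get(String name) {
        int i = indexOfName(name);
        return i < 0 ? null : values.get(i);
    }

    public static void main(String[] args) {
        NamedListSketch l = new NamedListSketch(
            Arrays.asList("name", "age", "scores"),
            Arrays.<Object>asList("Jane", 23.0, new double[] {6, 7, 9, 8}));
        System.out.println(l.length());           // prints 3
        System.out.println(l.indexOfName("age")); // prints 1
        System.out.println(l.get("name"));        // prints Jane
    }
}
```

Each element keeps its own type, which is why the values are held as `Object` in this sketch.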

### 3.4.1.5. Data frames

Data frames are one of R’s compound data types. They are lists of vectors, factors and/or matrices, all having the same length (or, for matrices, the same number of rows). The data frame is one of the most important concepts in statistics and has equivalent implementations in SAS and SPSS.

The following sample R session shows how a data frame is constructed, what its attributes are and that it is indeed a list:

```
> df <- data.frame(x = seq(5), y = runif(5))
> df
  x         y
1 1 0.8773874
2 2 0.4977048
3 3 0.6719721
4 4 0.2135386
5 5 0.3834681
> class(df)
[1] "data.frame"
> attributes(df)
$names
[1] "x" "y"
$row.names
[1] 1 2 3 4 5
$class
[1] "data.frame"
> is.list(df)
[1] TRUE
```
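Because a data frame is a list of equal-length columns, a "row" is simply the i-th element of every column. The sketch below models this in plain Java under simplifying assumptions: `DataFrameSketch` is a hypothetical class, not a Renjin type, and every column is stored as `double[]` even though `x` is an integer vector in the R session above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a data frame: named columns of equal length.
final class DataFrameSketch {
    // LinkedHashMap keeps the columns in insertion order, like a list.
    private final Map<String, double[]> columns = new LinkedHashMap<>();
    private int rowCount = -1; // unknown until the first column is added

    void addColumn(String name, double[] values) {
        if (rowCount >= 0 && values.length != rowCount) {
            throw new IllegalArgumentException(
                "column '" + name + "' has length " + values.length
                + ", expected " + rowCount);
        }
        rowCount = values.length;
        columns.put(name, values);
    }

    int nrow() {
        return Math.max(rowCount, 0);
    }

    int ncol() {
        return columns.size();
    }

    // Collects element i (0-based) of every column.
    double[] row(int i) {
        double[] row = new double[columns.size()];
        int j = 0;
        for (double[] column : columns.values()) {
            row[j++] = column[i];
        }
        return row;
    }

    public static void main(String[] args) {
        DataFrameSketch df = new DataFrameSketch();
        df.addColumn("x", new double[] {1, 2, 3, 4, 5});
        df.addColumn("y", new double[] {0.8773874, 0.4977048, 0.6719721, 0.2135386, 0.3834681});
        System.out.println(df.nrow() + " rows, " + df.ncol() + " columns"); // prints 5 rows, 2 columns
        System.out.println(Arrays.toString(df.row(2)));                     // prints [3.0, 0.6719721]
    }
}
```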

### 3.4.1.6. Matrices and arrays

Besides one-dimensional vectors, R also knows two other classes to represent
array-like data types: `matrix` and `array`. A matrix is simply an atomic
vector with a `dim` attribute that contains a numeric vector of length two:

```
> x <- seq(9)
> class(x)
[1] "integer"
> dim(x) <- c(3, 3)
> class(x)
[1] "matrix"
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
```

Likewise, an array is also a vector with a `dim` attribute that contains a
numeric vector of length greater than two:

```
> y <- seq(8)
> dim(y) <- c(2,2,2)
> class(y)
[1] "array"
```

The example with the matrix shows that the elements in an array are stored in column-major order which is important to know when we want to access R arrays from a Java application.
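In column-major order the first subscript varies fastest, so the position of an element in the flat vector can be computed from the `dim` attribute alone. The following sketch shows that arithmetic in plain Java; `ColumnMajorSketch` and `offset` are hypothetical names that are not part of Renjin, and all subscripts are 0-based as is usual in Java.

```java
// Hypothetical helper (not part of Renjin): maps array subscripts to the
// position in the flat, column-major vector behind an R matrix or array.
final class ColumnMajorSketch {

    // dim: the extents from the 'dim' attribute.
    // subscripts: 0-based, one per dimension.
    static int offset(int[] dim, int... subscripts) {
        if (subscripts.length != dim.length) {
            throw new IllegalArgumentException("expected " + dim.length + " subscripts");
        }
        int offset = 0;
        int stride = 1; // the first dimension varies fastest
        for (int d = 0; d < dim.length; d++) {
            if (subscripts[d] < 0 || subscripts[d] >= dim[d]) {
                throw new IndexOutOfBoundsException("subscript " + d + " is out of range");
            }
            offset += subscripts[d] * stride;
            stride *= dim[d];
        }
        return offset;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9}; // seq(9)
        int[] dim = {3, 3};
        // R's x[2, 3] is row 1, column 2 when counting from zero.
        System.out.println(values[offset(dim, 1, 2)]); // prints 8
    }
}
```

The value `8` agrees with row 2, column 3 of the matrix printed above. The same function covers arrays with more than two dimensions, such as the 2x2x2 array `y`.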

Note

In both examples for the `matrix` and `array` objects, the `class()`
function derives the class from the fact that the object is an atomic vector
with the `dim` attribute set. Unlike data frames, these objects do not
have a `class` attribute.

## 3.4.2. Overview of Renjin’s type system

Renjin has corresponding classes for all of the R object types discussed in the
section A Java Developer’s Guide to R Objects. The table
Renjin’s Java classes for common R object types summarizes these object types and their Java
classes. In R, the object type is returned by the `typeof()` function.

R object type | Renjin class |
---|---|
logical | LogicalVector |
integer | IntVector |
double | DoubleVector |
character | StringVector |
complex | ComplexVector |
raw | RawVector |
list | ListVector |
function | Function |
environment | Environment |
NULL | Null |

There is a certain hierarchy in Renjin’s Java classes for the different object
types in R. Figure Hierarchy in Renjin’s type system gives a full picture of all
classes that make up Renjin’s type system. These classes are contained in the
*org.renjin.sexp* Java package. The vector classes listed in table
Renjin’s Java classes for common R object types are in fact abstract classes that can have
different implementations. For example, the `DoubleArrayVector` (not shown in
the figure) is an implementation of the `DoubleVector` abstract class. The
`SEXP`, `Vector`, and `AtomicVector` classes are all Java interfaces.

Note

Renjin does not have classes for all classes of objects that are known to
(base) R. This includes objects of class `matrix` and `array`, which are
represented by one of the `AtomicVector` classes, and R’s compound objects
`factor` and `data.frame`, which are represented by an `IntVector` and a
`ListVector` respectively.

## 3.4.3. Pulling data from R into Java

Now that you have a good understanding of both R’s object types and how these types are mapped to Renjin’s Java classes, we can start by pulling data from R code into our Java application. A typical scenario is one where an R script performs a calculation and the result is pulled into the Java application for further processing.

Using the Renjin Script Engine as introduced in the chapter Evaluating R Language Code, we can
store the result of a calculation from R into a Java object. By default, the
`eval()` method of `javax.script.ScriptEngine` returns an `Object`, i.e. Java’s
object superclass. We can always cast this result to a `SEXP` object. The
following Java snippet shows how this is done and how the `Object.getClass()`
and `Class.getName()` methods can be used to determine the actual class
of the R result:

```
// evaluate Renjin code from String:
SEXP res = (SEXP)engine.eval("a <- 2; b <- 3; a*b");
// print the result to stdout:
System.out.println("The result of a*b is: " + res);
// determine the Java class of the result:
Class<?> objectType = res.getClass();
System.out.println("Java class of 'res' is: " + objectType.getName());
// use the getTypeName() method of the SEXP object to get R's type name:
System.out.println("In R, typeof(res) would give '" + res.getTypeName() + "'");
```

This should write the following to the standard output:

```
The result of a*b is: 6.0
Java class of 'res' is: org.renjin.sexp.DoubleArrayVector
In R, typeof(res) would give 'double'
```

As you can see, the `getTypeName` method of the `SEXP` interface
will return a String object with R’s name for the object type.

Note

Don’t forget to import `org.renjin.sexp.*` to make Renjin’s type classes
available to your application.

In the example above we could have also cast R’s result to a *DoubleVector*
object:

```
DoubleVector res = (DoubleVector)engine.eval("a <- 2; b <- 3; a*b");
```

or you could cast it to a *Vector*:

```
Vector res = (Vector)engine.eval("a <- 2; b <- 3; a*b");
```

You can’t cast R integer results to a `DoubleVector`: the following snippet
will throw a `ClassCastException`:

```
// use R's 'L' suffix to define an integer:
DoubleVector res = (DoubleVector)engine.eval("1L");
```

### 3.4.3.1. Accessing individual elements of vectors

Now that we know how to pull R objects into our Java application we want to work with these data types in Java. In this section we show how individual elements of the Vector objects can be accessed in Java.

As you know, each vector type in R, and thus also in Renjin, has a length which
can be obtained with the `length()` method. Individual elements of a vector
can be obtained with the `getElementAsXXX()` methods where `XXX` is one of
`Double`, `Int`, `String`, `Logical`, and `Complex`. The following
snippet demonstrates this:

```
Vector x = (Vector)engine.eval("x <- c(6, 7, 8, 9)");
System.out.println("The vector 'x' has length " + x.length());
for (int i = 0; i < x.length(); i++) {
    System.out.println("Element x[" + (i + 1) + "] is " + x.getElementAsDouble(i));
}
```

This will write the following to the standard output:

```
The vector 'x' has length 4
Element x[1] is 6.0
Element x[2] is 7.0
Element x[3] is 8.0
Element x[4] is 9.0
```

As we have seen in the Lists section above, lists in R are also known
as *generic vectors*, but accessing the individual elements and their elements
requires a bit more care. If an element (i.e. a vector) of a list has length
equal to one, we can access this element directly using one of the
`getElementAsXXX()` methods. For example:

```
ListVector x =
(ListVector)engine.eval("x <- list(name = \"Jane\", age = 23, scores = c(6, 7, 8, 9))");
System.out.println("List 'x' has length " + x.length());
// directly access the first (and only) element of the vector 'x$name':
System.out.println("x$name is '" + x.getElementAsString(0) + "'");
```

which will result in:

```
List 'x' has length 3
x$name is 'Jane'
```

being printed to standard output. However, this approach will not work for the
third element of the list as this is a vector with length greater than one.
The preferred approach for lists is to get each element as a `SEXP` object
first and then to handle each of these accordingly. For example:

```
DoubleVector scores = (DoubleVector)x.getElementAsSEXP(2);
```

### 3.4.3.2. Dealing with matrices

As described in the section Matrices and arrays above, matrices are
simply vectors with the `dim` attribute set to an integer vector of length
two. In order to identify a matrix in Renjin, we therefore need to check for
the presence of this attribute and its value. Since any object in R can have
one or more attributes, the `SEXP` interface defines a number of
methods for dealing with attributes. In particular, `hasAttributes`
will return `true` if there are any attributes defined in an object and
`getAttributes` will return these attributes as an `AttributeMap`.

```
Vector res = (Vector)engine.eval("matrix(seq(9), nrow = 3)");
if (res.hasAttributes()) {
    AttributeMap attributes = res.getAttributes();
    Vector dim = attributes.getDim();
    if (dim == null) {
        System.out.println("Result is a vector of length " +
            res.length());
    } else {
        if (dim.length() == 2) {
            System.out.println("Result is a " +
                dim.getElementAsInt(0) + "x" +
                dim.getElementAsInt(1) + " matrix.");
        } else {
            System.out.println("Result is an array with " +
                dim.length() + " dimensions.");
        }
    }
}
```

Output:

```
Result is a 3x3 matrix.
```

For convenience, Renjin includes a wrapper class `Matrix` that provides
easier access to the number of rows and columns.

Example:

```
// required import(s):
import org.renjin.primitives.matrix.*;
Vector res = (Vector)engine.eval("matrix(seq(9), nrow = 3)");
try {
    Matrix m = new Matrix(res);
    System.out.println("Result is a " + m.getNumRows() + "x"
        + m.getNumCols() + " matrix.");
} catch(IllegalArgumentException e) {
    System.out.println("Result is not a matrix: " + e);
}
```

Output:

```
Result is a 3x3 matrix.
```

### 3.4.3.3. Dealing with lists and data frames

The `ListVector` class contains several convenience methods to access
a list’s components from Java. For example, we can extract a component
of a fitted linear model by the name of the element that contains it:

```
ListVector model = (ListVector)engine.eval("x <- 1:10; y <- x*3; lm(y ~ x)");
Vector coefficients = model.getElementAsVector("coefficients");
// same result, but less convenient:
// int i = model.indexOfName("coefficients");
// Vector coefficients = (Vector)model.getElementAsSEXP(i);
System.out.println("intercept = " + coefficients.getElementAsDouble(0));
System.out.println("slope = " + coefficients.getElementAsDouble(1));
```

Output:

```
intercept = -4.4938668397781774E-15
slope = 3.0
```

## 3.4.4. Handling errors generated by the R code

Up to now we have been able to execute R code without any concern for possible errors that may occur when the R code is evaluated. There are two common exceptions that may be thrown by the R code:

- `ParseException`: an exception thrown by Renjin’s R parser due to a syntax error, and
- `EvalException`: an exception thrown by Renjin when the R code generates an error condition, for example by the `stop()` function.

Here is an example which catches an exception from Renjin’s parser:

```
// required import(s):
import org.renjin.parser.ParseException;
try {
    engine.eval("x <- 1 +/ 1");
} catch (ParseException e) {
    System.out.println("R script parse error: " + e.getMessage());
}
```

Output:

```
R script parse error: Syntax error at line 1 char 0: syntax error, unexpected '/'
```

And here’s an example which catches an error condition thrown by the R interpreter:

```
// required import(s):
import org.renjin.eval.EvalException;
try {
    engine.eval("stop(\"Hello world!\")");
} catch (EvalException e) {
    // getCondition() returns the condition as an R list:
    Vector condition = (Vector)e.getCondition();
    // the first element of the list contains the actual error message:
    String msg = condition.getElementAsString(0);
    System.out.println("The R script threw an error: " + msg);
}
```

Output:

```
The R script threw an error: Hello world!
```

`EvalException.getCondition()` is required to pull the condition
message from the R interpreter into Java.

## 3.4.5. Pushing data from Java to R

Like many dynamic languages, R scripts are evaluated in the context of an
environment that looks a lot like a dictionary. You can define new variables in
this environment using the `javax.script` API. This is achieved using
the `ScriptEngine.put()` method.

Example:

```
engine.put("x", 4);
engine.put("y", new double[] { 1d, 2d, 3d, 4d });
engine.put("z", new DoubleArrayVector(1,2,3,4,5));
engine.put("hashMap", new java.util.HashMap());
// some R magic to print all objects and their class with a for-loop:
engine.eval("for (obj in ls()) { " +
"cmd <- parse(text = paste('typeof(', obj, ')', sep = ''));" +
"cat('type of ', obj, ' is ', eval(cmd), '\\n', sep = '') }");
```

Output:

```
type of hashMap is externalptr
type of x is integer
type of y is double
type of z is double
```

Renjin will implicitly convert primitives, arrays of primitives and
`String` instances to R objects. Other Java objects will be wrapped as R
`externalptr` objects. The example also shows the use of the
`DoubleArrayVector` constructor to create a double vector in R. You see
that we managed to put a Java `java.util.HashMap` object into the
global environment of the R session: this is the topic of the chapter
Importing Java classes into R code.