Exploring inside R functions


Problem

The R help documentation is great at providing a short introduction to a function, but less helpful for a beginner who wants to understand how to use a function or, on seeing a function for the first time, to figure out exactly what it does.

The biggest problem for me when I first learned R was understanding what variable types to pass into a function. Does the function expect a list, a vector, a data frame? I spent too much time treating a function as a black box, never sure what to expect back.

Many times I needed to understand what was happening inside a function, and for that you need to view the code. R makes this easy, but you have to know where to look, and sometimes the code is buried a couple of layers down. This stymied me for a long time, until recently. This post will help you get beneath those layers.

Solution

We begin by looking at a few things about the function, which will help determine where to look for the code. Sometimes, as I just said, we may have to dig a little deeper. As a last resort, we can always go to the R project site (CRAN), browse until we find the package we need, and download its source code.

Listing packages and their functions: library(), library(help="package-name")

The first thing to do is see which packages are installed on your PC. Calling library() with no arguments does this.

library()

Here are the first ten entries on my PC:

base 
boot 
class 
cluster 
codetools 
compiler 
datasets 
foreign 
graphics 
grDevices 
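If you want the installed-package list as a character vector you can filter or count, rather than the pager display that library() opens, a minimal sketch using base R's .packages() and utils::installed.packages() follows. Both are standard functions; the exact list, and the version numbers, will differ from machine to machine.

```r
# Names of all packages installed in the library trees on .libPaths()
pkgs <- .packages(all.available = TRUE)
head(sort(pkgs), 10)

# installed.packages() returns a matrix with one row per installed package
# (row names are package names); the "Version" column is often handy
ip <- utils::installed.packages()
ip["base", "Version"]
```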

Let's take a closer look at the base package, which contains many of the basic functions used in R. Use library(help="package-name") to list the function names inside a package. For example, here is an excerpt of the list for base:

library(help="base")
any                     Are Some Values True? 
aperm                   Array Transposition 
append                  Vector Merging 
apply                   Apply Functions Over Array Margins 
args                    Argument List of a Function 
array                   Multi-way Arrays 
as.Date                 Date Conversion Functions to and from Character 
as.POSIXct              Date-time Conversion Functions 
as.data.frame           Coerce to a Data Frame
as.environment
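The same names can also be pulled out as a plain character vector, which is easier to search than the help index. A minimal sketch, assuming the package is attached (that is what makes the "package:base" environment exist on the search path):

```r
# Every (non-dot) name exported by the attached base package, sorted
base_funs <- ls("package:base")
length(base_funs)
head(base_funs, 10)

# Narrow the list with a regular expression, e.g. names starting with "as."
grep("^as\\.", base_funs, value = TRUE)[1:5]
```

For a package you have attached with library(), such as zoo, the same call is ls("package:zoo").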

help(functionname) or ?functionname

To view the help documentation for a known function name:

help(functionname)
help(mean) # for example
help("mean") # quoting the name is optional; single or double quotes both work

Since we are digging inside packages, let me digress for a moment. After you attach a package with library(), you might end up with duplicate function names. The package attached last takes precedence and masks the earlier one, so to reach a masked function you may need to prefix its name with the package name.

For instance, zoo is a common time series package. If we were to attach the package, like so:

library(zoo)

We get this message:

Attaching package: 'zoo'
The following object(s) are masked from 'package:base':
    as.Date, as.Date.numeric

To view the help file for a specific package's version of the function, prefix the name with the package:

?zoo::as.Date   # or help(zoo::as.Date)
?base::as.Date  # or help(base::as.Date)

The double colon separates the package name from the function name. I'll say more about packages at a later time.
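The double colon works for calling a function too, not just for help. A short sketch of a few related base tools for sorting out masking: utils::find() reports which attached packages define a name (in search-path order, so with zoo attached, find("as.Date") should list package:zoo before package:base), and conflicts() lists names defined in more than one place. Only base packages are used here, so the output in your session may differ.

```r
# Which attached packages define a given name? Searched in search-path order
utils::find("mean")          # "package:base" in a fresh session

# Call one package's version explicitly, regardless of what masks it
base::mean(c(1, 2, 3, 10))

# Names defined more than once on the search path, grouped by package
conflicts(detail = TRUE)
```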

example()

example(functionname) is a pretty slick function that runs all the example code embedded in a help file. Let's try it with mean:

example(mean)
#
mean> x <- c(0:10, 50)
mean> xm <- mean(x)
mean> c(xm, mean(x, trim = 0.10))
[1] 8.75 5.50

Rather than running every example for a function blindly, you can open the help file, study the examples, and then copy and paste any lines you want to examine more closely into the console. I do this quite often.
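A middle ground between those two approaches is example()'s give.lines argument, which returns the example code as a character vector without running it. A minimal sketch: read the lines first, then run only the ones you care about by hand.

```r
# Get the example code from help("mean") as text, without executing it
ex <- example(mean, give.lines = TRUE)
cat(ex, sep = "\n")

# Then run just the lines you want to study
x <- c(0:10, 50)
mean(x, trim = 0.10)
```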

args()

The help file can be very informative: usually, sometimes, well, almost sometimes. It typically provides a brief description, usage, arguments with additional details, references, and examples. We've seen that example() is a shortcut to the examples; similarly, args() is a shortcut to the argument list, which is handy when you just want a quick reminder of what the arguments are.

Here are some examples:

args(lm) # to fit a linear model
#
function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
#
args(optim) # general purpose optimization function
#
function (par, fn, gr = NULL, ..., method = c("Nelder-Mead", 
    "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf, 
    upper = Inf, control = list(), hessian = FALSE) 
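args() prints the signature; if you want to work with the arguments and their defaults as data, formals() returns them as a pairlist. One wrinkle worth knowing: primitive functions such as sum() have no formals of their own, but args() builds a stand-in function that formals() can read. A short sketch:

```r
# The argument list of lm(), with default values, as a pairlist
lm_args <- formals(stats::lm)
names(lm_args)
lm_args$method               # the default value of 'method'

# Primitives: formals() returns NULL, but formals(args(...)) works
formals(sum)
names(formals(args(sum)))
```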

To view the code

Sometimes the args() output or the help file is not quite enough, so the next step is to look at the function's source code. I usually do this to get a better understanding of what to pass into a function, how it uses its arguments, and what it returns. To view the code, simply type the function name without any parentheses.

# source code for function array()
array
function (data = NA, dim = length(data), dimnames = NULL) 
{
    data <- as.vector(data)
    dim <- as.integer(dim)
    vl <- prod(dim)
    if (length(data) != vl) {
        if (vl > .Machine$integer.max) 
            stop("'dim' specifies too large an array")
        data <- rep(data, length.out = vl)
    }
    if (length(dim)) 
        dim(data) <- dim
    if (is.list(dimnames) && length(dimnames)) 
        dimnames(data) <- dimnames
    data
}
<bytecode: 0x4f46a48>
<environment: namespace:base>

Straightforward enough, but alas, this is not always so.

And that is where the rabbit hole begins.
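Before heading down that hole, note that the printed listing can also be taken apart piece by piece, which helps when a function is long. A minimal sketch using base R: body() gives the code, environment() shows which namespace the function lives in (the <environment: namespace:base> line above), and deparse() returns the source as text. Primitives such as sum() are implemented in C and have no R body at all, which is one case where typing the name shows almost nothing.

```r
# The code between the braces
body(array)

# The namespace the function belongs to
environmentName(environment(array))

# The whole definition as a character vector, one element per line
src <- deparse(array)
head(src, 3)

# Primitive functions have no R-level body to inspect
is.primitive(sum)
body(sum)
```

The array() body printed by a current version of R may not match the listing above exactly, since base functions change between releases.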

mean() example

Here is an example that tells us very little.

args(mean)
## function (x, ...) 
## NULL

So let's look at the source code:

mean
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x5430a98>
## <environment: namespace:base>

For a beginner, this is not much more helpful. But the call UseMethod("mean") tells us that mean() is an S3 generic function: it hands its work to a method chosen by the class of its first argument. The help file for mean also mentions an S3 method with a different argument list. So let's try methods(mean), which lists the methods (i.e., functions) defined for this generic.

methods(mean)
## [1] mean.data.frame mean.Date       mean.default    mean.difftime  
## [5] mean.POSIXct    mean.POSIXlt    mean.yearmon*   mean.yearqtr*  
## 
##    Non-visible functions are asterisked

There is a function named mean.default, so let's look at it:

mean.default
## function (x, trim = 0, na.rm = FALSE, ...) 
## {
##     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
##         warning("argument is not numeric or logical: returning NA")
##         return(NA_real_)
##     }
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     if (!is.numeric(trim) || length(trim) != 1L) 
##         stop("'trim' must be numeric of length one")
##     n <- length(x)
##     if (trim > 0 && n) {
##         if (is.complex(x)) 
##             stop("trimmed means are not defined for complex data")
##         if (any(is.na(x))) 
##             return(NA_real_)
##         if (trim >= 0.5) 
##             return(stats::median(x, na.rm = FALSE))
##         lo <- floor(n * trim) + 1
##         hi <- n + 1 - lo
##         x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
##     }
##     .Internal(mean(x))
## }
## <bytecode: 0x543b0a0>
## <environment: namespace:base>

Now we can see what kind of variables may be passed into mean: x should be numeric, complex, or logical, and anything else triggers a warning and returns NA. We can also see how missing values are handled and what the trim argument does. The last line shows that, after all the preprocessing, the vector is finally passed to an internal function written in C. Together with the help file, this gives a much better understanding of the function.
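Once you have read the code, it is worth confirming your reading with a few quick calls. This sketch checks the branches of mean.default shown above: the trimming arithmetic, the fall-through to median when trim >= 0.5, NA handling, and the warning-not-error path for non-numeric input.

```r
x <- c(0:10, 50)                  # n = 12
mean(x, trim = 0.10)              # lo = floor(12 * 0.1) + 1 = 2, so one value dropped from each end
mean(x, trim = 0.5)               # trim >= 0.5 returns stats::median(x)
median(x)

mean(c(1, NA, 3))                 # NA, because na.rm defaults to FALSE
mean(c(1, NA, 3), na.rm = TRUE)

# Non-numeric input gives a warning and NA, not an error
suppressWarnings(mean("a"))
```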

median() example

However, the help file doesn't always tell us everything. help(median) looks pretty simple, and args(median) lists the same arguments as the help file, but if we type median at the prompt:

median
## function (x, na.rm = FALSE) 
## UseMethod("median")
## <bytecode: 0x47eb5d0>
## <environment: namespace:stats>

We get back the shortened function definition again. But it says UseMethod("median"), so let's try:

methods(median)
## [1] median.default

and we see that there is a median.default function. If we try to list the code for that, we get:

median.default
## function (x, na.rm = FALSE) 
## {
##     if (is.factor(x) || is.data.frame(x)) 
##         stop("need numeric data")
##     if (length(names(x))) 
##         names(x) <- NULL
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     else if (any(is.na(x))) 
##         return(x[FALSE][NA])
##     n <- length(x)
##     if (n == 0L) 
##         return(x[FALSE][NA])
##     half <- (n + 1L)%/%2L
##     if (n%%2L == 1L) 
##         sort(x, partial = half)[half]
##     else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
## }
## <bytecode: 0x490f358>
## <environment: namespace:stats>
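The code tells us several things that are easy to miss in the help file: factors and data frames are rejected outright, any NA returns NA unless na.rm = TRUE, an empty vector returns NA rather than failing, and even-length input averages the two middle values. A quick sketch that confirms each branch:

```r
median(c(4, 1, 3, 2))             # even length: mean of the two middle values
median(c(5, 1, 3))                # odd length: the middle value
median(c(1, NA, 3))               # NA, because na.rm = FALSE
median(c(1, NA, 3), na.rm = TRUE)
median(integer(0))                # empty input: NA, not an error

# Factors are rejected by the first check in median.default
msg <- tryCatch(median(factor(c("a", "b"))), error = conditionMessage)
msg
```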

Invisible functions and :::

Lastly, methods() may mark some function names with an asterisk and print the message “Non-visible functions are asterisked”. Typing such a function's name at the prompt, without parentheses, gives an error instead of the source code.

However, if we know the package a non-visible function comes from, we can use three colons (package:::function) to view its source code. For instance, with zoo attached, methods(mean) included mean.yearqtr* as one of the methods. At the prompt, type zoo:::mean.yearqtr to see the code of that non-visible function.

zoo:::mean.yearqtr
## function (x, ...) 
## as.yearqtr(mean(unclass(x), ...))
## <environment: namespace:zoo>
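The same technique works without installing anything extra. In a standard R installation, the stats package's t.test() has non-visible default and formula methods, so it makes a convenient test case. This sketch shows three ways to reach the hidden code: :::, utils::getS3method(), and utils::getAnywhere(), which also reports where it found the object.

```r
# t.test.default and t.test.formula are listed with an asterisk
methods(t.test)

# Not on the search path, so typing the bare name fails
exists("t.test.default")

# Three ways to get at the hidden code
f1 <- stats:::t.test.default
f2 <- utils::getS3method("t.test", "default")
utils::getAnywhere("t.test.default")

identical(f1, f2)
```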

PLEASE NOTE

You shouldn't need to call these hidden functions directly. Just use median(), mean(), or whatever, as described in the help file; the generic function chooses the right method based on the class of the object you pass in.
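To see that dispatch in action, here is a toy generic. The name describe and its two methods are hypothetical, made up for this illustration; the mechanism (UseMethod() plus functions named generic.class) is the same one mean() and median() use.

```r
# Hypothetical generic: hands off to a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no more specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1], "with length", length(x))
}

# Method chosen when x has class "data.frame"
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(1:3)                            # no integer or numeric method, so default runs
describe(data.frame(a = 1:2, b = 3:4))   # dispatches to describe.data.frame
```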

Our original purpose was merely to better understand how a function uses its arguments, which variable types it accepts, what preprocessing it performs, and what it returns.

IF ALL ELSE FAILS

Download the source code for the package. Go to the package's page on CRAN (or its project page) and look for the package source file, a .tar.gz archive. Download it, extract it, and browse the code; the R functions are in the archive's R/ directory.

Hope this helps. Next time, how to inspect objects.

ETF Choices


These are the ETFs that I am going to focus on for now.

   Ticker  Family       Name
1  EFA     ishares      MSCI EAFE Index Fund
2  EWJ     ishares      MSCI Japan Index Fund
3  GLD     statestreet  SPDR Gold Shares
4  IEF     ishares      Barclays 7-10 Year Treasury Bond Fund
5  IWB     ishares      Russell 1000 Index Fund
6  IWM     ishares      Russell 2000 Index Fund
7  RWX     statestreet  SPDR Dow Jones International Real Estate ETF
8  TLT     ishares      Barclays 20+ Year Treasury Bond Fund
9  VNQ     vanguard     REIT ETF
10 VWO     vanguard     MSCI Emerging Markets ETF
11 SLV     ishares      Silver Trust

The US stocks segment is split into large and mid cap stocks based on the Russell 1000 index (IWB) and small cap stocks based on the Russell 2000 index (IWM).

Foreign stocks are split into Europe, Australasia and Far East (EFA), Japan (EWJ), and emerging markets (VWO).

Bonds are split into intermediate-term Treasury bonds (IEF, 7-10 years) and long-term Treasury bonds (TLT, 20+ years). Later I will consider whether to use short-term bonds (SHY) or a money market account to hold cash.

Real estate is split into US REITs (VNQ) and international REITs (RWX).

Commodities are split into physical gold (GLD) and physical silver (SLV). I'll have more to say later about alternatives, for example exchange-traded products that track a broad commodity index, such as DBC.

My Reasoning For Choosing These ETFs

Each of the equity and bond ETFs is based on a broad index, which provides diversification within that asset type. GLD and SLV are the exceptions: they hold only gold and silver bullion. Each ETF covers a different portion of the economy and, hopefully, reacts differently and at different times to changing economic conditions, giving me a second layer of diversification, i.e., across asset types.

I'll dig deeper into the math of why this is beneficial next time.

Disclaimer

I own EFA, VWO, GLD, SLV, RWX, VNQ, IEF, and SHY at this time, and could buy or sell at any time.