R Style Sheets

Posted on January 23, 2013 by charlesS

I by no means have the definitive take on how R should be written. This is just a collection of others' suggested best practices. Pick one that suits you.

Exploring inside R functions

Posted on December 10, 2012 by charlesS

Exploring R functions

Problem

The R help documentation is great at providing a short introduction to a function, but not so great for a beginner who wants to understand how to use it or who, on seeing a function for the first time, is trying to figure out exactly what it does.

The biggest problem for me when I first learned R was understanding what variable types to pass into a function. Does the function expect a list, a vector, a data frame? I spent too much time treating a function as a black box and not being sure what to expect back.

Many a time I needed to understand what was happening inside a function, and for that you need to view the code. R makes this easy, but you have to know where to look. Sometimes the code is buried a couple of layers down. This stymied me for a long time, until recently. This post will help you get beneath those layers.

Solution

We begin by looking at a few things about the function, which will help to determine where we need to look for the code. Sometimes, as I just said, we may have to dig a little deeper. As a last resort, we can always go to the R project page, browse till we find the package we need, and download the source code.

For a complete list of all functions in a package: library(), library(help="package-name")

The first thing is to see which packages you have installed on your PC. The library() function without any arguments will do this.

library()

Here are the first ten lines on my pc.

base 
boot 
class 
cluster 
codetools 
compiler 
datasets 
foreign 
graphics 
grDevices

Let's take a closer look at the base package, which contains many of the basic functions used in R. Use library(help="package-name") to list the function names inside a package. For example, here are the first few functions in base:

library(help="base")

any                     Are Some Values True? 
aperm                   Array Transposition 
append                  Vector Merging 
apply                   Apply Functions Over Array Margins 
args                    Argument List of a Function 
array                   Multi-way Arrays 
as.Date                 Date Conversion Functions to and from Character 
as.POSIXct              Date-time Conversion Functions 
as.data.frame           Coerce to a Data Frame 
as.environment
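The same listing can also be pulled into a vector for programmatic use. This is only a minimal sketch, assuming you want just the functions (base also exports data objects such as letters); the variable names are my own:

```r
# Names of everything exported by the attached base package
base_env  <- as.environment("package:base")
base_objs <- ls(base_env)

# Keep only the names that are bound to functions
is_fun <- vapply(base_objs,
                 function(nm) is.function(get(nm, envir = base_env)),
                 logical(1))
base_funs <- base_objs[is_fun]

head(base_funs, 10)   # peek at the first few names
length(base_funs)     # how many functions base exports
```

Swap "package:base" for any other attached package, such as "package:stats", to list its functions instead.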

help(functionname) or ?functionname

To view the help documentation for a known function name:

help(functionname)
help(mean) # for example
help("mean") # the quotes are optional here; single or double both work

Since we are digging inside packages, let me digress for a moment. After you attach a package using the library() function, you might end up with two functions that share a name. The package attached last takes precedence, so to reach the other version you need to prefix the function name with its package name.

For instance, zoo is a common time series package. If we were to attach the package, like so:

library(zoo)

We get this message:

Attaching package: 'zoo'
The following object(s) are masked from 'package:base':
    as.Date, as.Date.numeric

To view the help file for the function you desire, do this.

?zoo::as.Date   # or help("as.Date", package = "zoo")
?base::as.Date  # or help("as.Date", package = "base")

The double colon separates the package name from the function name. I'll say more about packages at a later time.
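The same double colon works for calling a function, not just for looking up its help. Here is a small sketch of masking that does not need zoo installed: the replacement as.Date below is a hypothetical stand-in I made up, playing the role that zoo's version plays above.

```r
# A made-up definition that masks base's as.Date (stand-in for zoo's version)
as.Date <- function(x, ...) "my own as.Date"

masked   <- as.Date("2013-01-23")        # plain name: the newest definition wins
explicit <- base::as.Date("2013-01-23")  # package::name always reaches base's version

find("as.Date")   # lists every place on the search path that defines the name

rm(as.Date)       # remove the stand-in so the plain name means base's version again
```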

example()

example(functionname) is a pretty slick function that runs all the example code embedded in a help file. Let's do this for the function mean.

example(mean)
#
mean> x <- c(0:10, 50)
mean> xm <- mean(x)
mean> c(xm, mean(x, trim = 0.10))
[1] 8.75 5.50

Rather than running every example for a function blindly, an alternative is to open the help file, study the examples, and then copy and paste the lines you want a closer look at into the console. I do this quite often.

args()

The help file can be very informative, usually, sometimes, well almost sometimes. It typically provides a brief description, usage, arguments with additional details, references, and examples. We've seen that example() is a shortcut to the examples; likewise, args() is a shortcut to the argument list. This is helpful if you just want a quick reminder of what the arguments are.

Here are some examples:

args(lm) # to fit a linear model
#
function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
#
args(optim) # general purpose optimization function
#
function (par, fn, gr = NULL, ..., method = c("Nelder-Mead", 
    "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf, 
    upper = Inf, control = list(), hessian = FALSE)
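If you want the argument list as data rather than printed text, formals() returns it as a named list. A brief sketch using the same two functions (the variable names are illustrative):

```r
# Argument names and defaults of lm() as a named list
lm_args <- formals(stats::lm)
names(lm_args)     # just the argument names
lm_args$method     # the default for 'method'

# optim() stores its 'method' default as an unevaluated call to c();
# evaluating it yields the vector of allowed choices
optim_methods <- eval(formals(stats::optim)$method)
optim_methods
```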

To view the code

Sometimes the args() results or the help file are not quite enough, so the next step is to look at the source code for that function. I usually do this to get a better understanding of what to pass into a function, how the function uses its arguments, and what gets returned. To view the code, simply type in the function name without any parentheses.

# source code for function array()
array

function (data = NA, dim = length(data), dimnames = NULL) 
{
    data <- as.vector(data)
    dim <- as.integer(dim)
    vl <- prod(dim)
    if (length(data) != vl) {
        if (vl > .Machine$integer.max) 
            stop("'dim' specifies too large an array")
        data <- rep(data, length.out = vl)
    }
    if (length(dim)) 
        dim(data) <- dim
    if (is.list(dimnames) && length(dimnames)) 
        dimnames(data) <- dimnames
    data
}
<bytecode: 0x4f46a48>
<environment: namespace:base>

Straightforward enough, but alas this is not always so.

And that is where the rabbit hole begins.
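Before following it down, note how much the array() listing above already tells us: data is coerced to a vector, and when its length does not match prod(dim) it is recycled with rep(). A quick check of that reading (a sketch, not a full test of array()):

```r
# Three values asked to fill a 2 x 3 array: the source says they are recycled
a <- array(1:3, dim = c(2, 3))

dim(a)         # the requested dimensions
as.vector(a)   # the recycled data, in column-major order
```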

mean() example

Here is an example that tells us very little.

args(mean)

## function (x, ...) 
## NULL

So let's look at the source code:

mean

## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x5430a98>
## <environment: namespace:base>

For a beginner, this is not much more helpful. But the call to UseMethod("mean") tells us that mean is a generic function, and that the real work is done by a method defined somewhere else. The help file for mean also says that there is an S3 method with a different argument list. So let's try methods(mean), which lists the methods (i.e., functions) defined for that generic.

methods(mean)

## [1] mean.data.frame mean.Date       mean.default    mean.difftime  
## [5] mean.POSIXct    mean.POSIXlt    mean.yearmon*   mean.yearqtr*  
## 
##    Non-visible functions are asterisked

There is a function named mean.default, so let's look at it:

mean.default

## function (x, trim = 0, na.rm = FALSE, ...) 
## {
##     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
##         warning("argument is not numeric or logical: returning NA")
##         return(NA_real_)
##     }
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     if (!is.numeric(trim) || length(trim) != 1L) 
##         stop("'trim' must be numeric of length one")
##     n <- length(x)
##     if (trim > 0 && n) {
##         if (is.complex(x)) 
##             stop("trimmed means are not defined for complex data")
##         if (any(is.na(x))) 
##             return(NA_real_)
##         if (trim >= 0.5) 
##             return(stats::median(x, na.rm = FALSE))
##         lo <- floor(n * trim) + 1
##         hi <- n + 1 - lo
##         x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
##     }
##     .Internal(mean(x))
## }
## <bytecode: 0x543b0a0>
## <environment: namespace:base>

Now we can see what kinds of variables may be passed into the function mean: in this case x must be numeric, complex or logical. We can also see how it handles missing values, and what the trim argument does. The last line shows that after all the preprocessing, the vector is finally passed to an internal function written in C. Together with the help file, this gives us a much better understanding.
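To convince yourself that you have read the trimming logic correctly, you can redo it by hand and compare. This sketch reuses the data from example(mean) and copies the lo and hi formulas from the listing; everything else is my own scaffolding:

```r
x    <- c(0:10, 50)
trim <- 0.10
n    <- length(x)

# Same index arithmetic as in mean.default
lo <- floor(n * trim) + 1
hi <- n + 1 - lo

manual  <- mean(sort(x)[lo:hi])   # drop the lowest and highest values, then average
builtin <- mean(x, trim = trim)   # let mean() do the trimming

c(manual, builtin)

# The first check in the listing: non-numeric input gives a warning and NA
not_numeric <- suppressWarnings(mean("a"))
```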

median() example

However, the help file doesn't always tell us everything. If we look at help(median), the help file looks pretty simple. args(median) lists the same arguments as the help file, but if we type median at the prompt:

median

## function (x, na.rm = FALSE) 
## UseMethod("median")
## <bytecode: 0x47eb5d0>
## <environment: namespace:stats>

We get back only the short generic definition. But it does say UseMethod("median"), so let's try:

methods(median)

## [1] median.default

and we see that there is a median.default function. If we try to list the code for that, we get:

median.default

## function (x, na.rm = FALSE) 
## {
##     if (is.factor(x) || is.data.frame(x)) 
##         stop("need numeric data")
##     if (length(names(x))) 
##         names(x) <- NULL
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     else if (any(is.na(x))) 
##         return(x[FALSE][NA])
##     n <- length(x)
##     if (n == 0L) 
##         return(x[FALSE][NA])
##     half <- (n + 1L)%/%2L
##     if (n%%2L == 1L) 
##         sort(x, partial = half)[half]
##     else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
## }
## <bytecode: 0x490f358>
## <environment: namespace:stats>
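Each branch in that listing can be exercised directly, which is a quick way to confirm what the function accepts and what it hands back. A small sketch with values chosen only for illustration:

```r
even_case <- median(c(1, 3, 2, 4))             # even length: mean of the two middle values
with_na   <- median(c(1, NA, 3))               # an NA in the data gives NA back
dropped   <- median(c(1, NA, 3), na.rm = TRUE) # unless na.rm = TRUE removes it first

# Factors are rejected by the first check in median.default
err <- tryCatch(median(factor(c("a", "b"))), error = function(e) e)
conditionMessage(err)
```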

Invisible functions and :::

Lastly, the methods() function may display some function names and a message that states “Non-visible functions are asterisked”. If we try to view the source code of one of these by typing just the function name with no parentheses, we get an error message.

However, if we can figure out the package name, we can use three colons to view the source code. For instance, with zoo attached, methods(mean) included mean.yearqtr* as one of the methods. At the prompt, type zoo:::mean.yearqtr to get at the code in the invisible function.

zoo:::mean.yearqtr

## function (x, ...) 
## as.yearqtr(mean(unclass(x), ...))
## <environment: namespace:zoo>
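If you do not know which package holds a hidden method, getS3method() will fetch it by generic name and class. A sketch using t.test from the stats package, whose default method is one of the asterisked ones, so no extra package is needed:

```r
# Fetch the hidden default method by generic name and class
f1 <- utils::getS3method("t.test", "default")

# The same function, reached with three colons once the package is known
f2 <- stats:::t.test.default

names(formals(f1))   # the argument list the generic does not show
```

getAnywhere("t.test.default") is another option: it searches everywhere R can see and reports where the function was found.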

PLEASE NOTE

You shouldn't have to refer directly to these hidden functions. Just use median(), or mean(), or whatever, as described in the help file. R picks the appropriate method based on the class of the object you pass in.
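How that selection works is easiest to see with a toy generic. Everything named describe below is hypothetical, invented for this sketch; only UseMethod() is R's own:

```r
# A generic: its only job is to dispatch on the class of x
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions named generic.class
describe.default    <- function(x, ...) "something else"
describe.numeric    <- function(x, ...) "a number"
describe.data.frame <- function(x, ...) paste("a data frame with", nrow(x), "rows")

r1 <- describe(42)                   # dispatches to describe.numeric
r2 <- describe("text")               # no character method, so describe.default
r3 <- describe(data.frame(a = 1:3))  # dispatches to describe.data.frame
```

This is the same pattern that sits behind mean() and mean.default above.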

Our original purpose was merely to better understand how a function uses its arguments, which variable types are used, what preprocessing is performed, and what is returned.

IF ALL ELSE FAILS

Download the source code for the package. Go to the project page for that package and look for the zip or gzip file. Download it, extract it, and browse the code.

Hope this helps. Next time, how to inspect objects.

ETF Choices

Posted on December 4, 2012 by charlesS

These are the ETFs that I am going to focus on for now.

	Ticker	Family	Name
1	EFA	ishares	MSCI EAFE Index Fund
2	EWJ	ishares	MSCI Japan Index Fund
3	GLD	statestreet	SPDR Gold Shares
4	IEF	ishares	Barclays 7-10 Year Treasury Bond Fund
5	IWB	ishares	Russell 1000 Index Fund
6	IWM	ishares	Russell 2000 Index Fund
7	RWX	statestreet	SPDR Dow Jones International Real Estate ETF
8	TLT	ishares	Barclays 20+ Year Treasury Bond Fund
9	VNQ	vanguard	REIT ETF
10	VWO	vanguard	MSCI Emerging Markets ETF
11	SLV	ishares	Silver Trust

The US stocks segment is split into large cap stocks based on the Russell 1000 index (IWB) and small to medium cap stocks based on the Russell 2000 (IWM) index.

Foreign stocks are split into developed markets in Europe, Australasia and the Far East (EFA), Japan (EWJ), and emerging markets (VWO).

Bonds are split into mid-term bonds (IEF) and longer term bonds (TLT). Later I will consider whether to use short term bonds (SHY), or a money market account to hold cash.

Real estate is split into US REITs (VNQ) and international REITs (RWX).

Commodities are split into physical gold (GLD) and physical silver (SLV). I'll have more to say later about alternatives, for example exchange traded notes that track a commodities based index, such as DBC.

My Reasoning For Choosing These ETFs

Each of the equity and bond ETFs is based on a broad index, which provides diversification within that asset type. The exceptions are GLD and SLV, which concentrate on gold bullion and silver bullion. Each ETF covers a different portion of the economy and will hopefully react differently, and at different times, to changing economic conditions, thereby giving me a second layer of diversification, i.e., across asset types.

I'll dig deeper into the math of why this is beneficial next time.

Disclaimer

I own EFA, VWO, GLD, SLV, RWX, VNQ, IEF, SHY at this time, and could buy and sell at any time.

ETF Primer

Posted on November 28, 2012 by charlesS

What is an ETF

Exchange traded funds (ETFs) are essentially mutual funds that trade during the day. The price you pay for a share in an ETF is whatever price you agree to with the person taking the opposite side of the trade, just like trading an individual stock. By contrast, the price at which you buy or sell a mutual fund is set once, at the end of the day, by the “net asset value” of a share.

ETFs may hold hundreds or thousands of securities in a particular asset class, such as stocks, bonds, currencies, metals, futures, and other strange animals. Much like a closed-end mutual fund, there is a set number of shares that trade, where each share represents a small fraction of the total assets in that fund. However, unlike a closed-end fund, new shares are created as needed. Problems ensue whenever a fund sponsor is unable to create new shares. When this happens, the ETF shares act more like those of a closed-end fund, and could deviate from the index they intend to track.

This Forbes article, The Godfather of ETFs, describes in good detail how a market maker in ETFs operates. The three largest ETF trading firms are Goldman, Merrill Lynch and Knight Capital.

The market is large. According to BlackRock's quarterly review of all exchange traded products, titled ETP Landscape – Industry Highlights – Q3 2012, total assets under management for all exchange traded products are in excess of 1.84 trillion dollars, spread over 4748 funds. The Forbes article pegged the assets under management for traditional mutual funds at over 10 trillion dollars.

	AssetType	Assets($Billions)	NumFunds
1	Equity	1290.2	2597
2	FixedIncome	321.4	618
3	Commodities	206.7	906
4	Alternative	6.8	124
5	Currency	5.7	139
6	Other	14.7	364

Largest ETF providers

The four largest fund providers by total assets under management are (ibid):

	Name	Assets($Billions)	Market Share(%)
1	iShares	711.8	38.6
2	State Street Global Advisors	333.4	18.1
3	Vanguard	230.8	12.5
4	Powershares	76.4	4.1
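As a quick sanity check, the totals quoted above and the market shares in this table can be reproduced from the two tables themselves. A short sketch, with the figures typed in by hand from the tables and variable names of my own choosing:

```r
# Assets ($ billions) and fund counts by asset type, from the first table
assets <- c(Equity = 1290.2, FixedIncome = 321.4, Commodities = 206.7,
            Alternative = 6.8, Currency = 5.7, Other = 14.7)
funds  <- c(Equity = 2597, FixedIncome = 618, Commodities = 906,
            Alternative = 124, Currency = 139, Other = 364)

total_assets <- sum(assets)   # a little over 1.84 trillion dollars
total_funds  <- sum(funds)    # the 4748 funds quoted in the text

# Provider assets ($ billions), from the second table
providers <- c(iShares = 711.8, StateStreet = 333.4,
               Vanguard = 230.8, Powershares = 76.4)

# Market share as a percentage of all exchange traded product assets
share <- round(100 * providers / total_assets, 1)
share
```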

What investments are in an ETF

Stocks: Initially, ETFs replicated major indexes such as the S&P500. For instance, the State Street SPDR S&P500 is the largest of all ETFs at $118 billion. There are ETFs that track almost any of the major known indexes. There are ETFs for sectors of the stock market (e.g., Technology, Health, Utilities); style and size indexes (e.g., Value, Growth, Small Cap, Large Cap); and international stock markets (by region, by country, by sector, and combinations). These are meant to be passive, in that the index is specified up front and the investment manager attempts to adhere as closely as possible to that index. Active investment management by the portfolio manager is a new feature only recently made available in some ETFs. In this case, the portfolio manager has some leeway to make individual investment decisions.

Bonds: Similarly, bond ETFs may track various indexes based on duration (short-term 0-3 years, mid-term 5-10 years, long-term over 10 years), source (government, corporate), and country (US or otherwise).

Commodities: Usually, these are structured as exchange traded notes and use futures contracts to track an index. I'll say more about these at a later time. A few do hold physical metals, such as the State Street gold ETF (GLD), which is actually structured as a trust.

REITs: Real estate investment trusts are securities that are committed to passing along 90% of their income as dividends to shareholders. REIT-based ETFs hold shares in these companies.

Other: There are ETFs for currencies, short equities, leveraged equities, short leveraged equities, volatility, and many other flavors of indexing.

Resources

Here are some additional sources of information on ETFs. The general information websites all offer varying degrees of free information and premium services. I usually go straight to the fund family websites, but probably should spend more time reading the reports and news items from the general sites.

General information sites

Morningstar: If your local library has a subscription, see whether you can access Morningstar through the library's web site.
SmartMoney: Some free, some premium services.
ETF Trends has many educational articles, and an ETF screener.
ETFdb has more news and screeners.

Fund Family websites

Getting Help in R

Posted on November 27, 2012 by charlesS

Problem

Where to find help when using R

Solution

This is a consolidation of many sources, in particular a talk given by Rob Hyndman, which is available as a video, and as a pdf. Although the information is pretty basic, there were a couple of tricks that were new to me.

If you know the function name

Simply key in ?function_name at the prompt, e.g. ?mean or ?sd.

If you know the function name, but not where to find it

??function_name will search within your installed packages. The output is a nicely formatted list of package names and descriptions. The help system does approximate string matching on a default set of fields contained in the help files. Here are some examples:

??DEoptim
??"DEoptim"
help.search("DEoptim")  # same as ??DEoptim or ??"DEoptim"; with help.search() the quotes are required
??stats::sd  # restricts the search to just the stats package

If you get the chance, check out ?help.search. There are many more options to the help.search function.
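A lighter-weight complement, not mentioned in the talk as far as I know, is apropos(). It does not search help files at all; it matches a regular expression against the names of objects in the packages you currently have attached. A minimal sketch:

```r
# Names starting with "medi" in the attached packages
hits <- apropos("^medi")
hits

# Restrict the match to functions whose name contains "optim"
fun_hits <- apropos("optim", mode = "function")
fun_hits
```

Because it only sees attached packages, apropos() is quick, but it will miss anything in a package you have installed and not yet loaded; ?? covers that case.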

If you need to search on something other than a known function name

RSiteSearch("search string") will do the trick. This will open a window in your browser at search.r-project.org. It searches mailing list archives, help pages, vignettes and task views.

RSiteSearch("differential evolution optimization") # returned 33 items that contained all three search words

Check out these search options for use in the RSiteSearch() function. Options include AND; OR; NOT; grouping with parentheses; * wildcard as a prefix, postfix, or both; regular expression matching with /.../ as delimiters; and lastly searches restricted to specified fields.

RSiteSearch("differential evolution optimization") # returned 1000+ matches
RSiteSearch("diff* evol* optim*") # returned 1500 matches
RSiteSearch("genetic algorithm not neural") # initially found 507, the NOT reduced final results to 476 matches
RSiteSearch("/svm/") # found 33 in task views and vignettes, too many in functions

The home page for http://search.r-project.org/ has lots of other links for searching R. Well worth taking a further look.

If you are looking for a particular function, but can't remember the name exactly, or what package it is in

The third-party package “sos” provides the findFn() function. This function uses RSiteSearch(string, "function") to search only the functions in packages for the specified string. The results are returned in a data frame that is then displayed as HTML in the browser.

install.packages("sos") # if not installed already
library(sos)
findFn("psoptim") # displays an html page with 6 results, all in the pso package

Other search facilities

These are repeated from the http://www.r-project.org/search.html page.

RSiteSearch as discussed above.
Searchable mail archives
Gmane's R-lists
RSeek searches multiple R sites.
R Nabble

If you want to see what packages are available in your subject area

Task Views provide edited lists of related packages on various subjects.
Announcements of package updates

What if you want to install every package mentioned in a Task View

Package ctv will do just that:

install.packages("ctv")  # only if not already installed
library(ctv)
available.views() # will display the names of each view, R is case-sensitive, so you need the exact name
install.views("MachineLearning") # careful, this will download and install every package in that Task View
update.views("MachineLearning")

Working with packages

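library() at the prompt will list all the packages installed on your system, and the directories where they are installed.

installed.packages() returns the details of the installed packages as a character matrix, one row per package, which is easily converted to a data frame with as.data.frame().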
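A short sketch of working with that result; the choice of columns is mine, and the full set is listed in ?installed.packages:

```r
ip <- installed.packages()
class(ip)   # a character matrix with one row per installed package

# Keep a few columns and convert to a data frame for easier filtering
pkgs <- as.data.frame(ip[, c("Package", "Version", "LibPath")],
                      stringsAsFactors = FALSE)
rownames(pkgs) <- NULL

head(pkgs)
nrow(pkgs)  # how many packages are installed
```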

In Conclusion

That is enough for now. Thanks.

Introduction

Posted on November 27, 2012 by charlesS

To paraphrase an ironic curse: May you invest in interesting times.

This has indeed been true over the past ten years. An investment in an S&P500 index fund ten years ago has returned a total of 50% (including dividends), which works out to 6.2% annually. Not only that, but you suffered through a decline in value of over 50% from the peak in October 2007 to the bottom in March 2009. Only now would your investment have fully recovered to that peak.
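For readers who want to move between a cumulative return and an annual one, the conversion is plain compounding. A minimal sketch with two helper functions of my own, assuming a constant yearly rate and no contributions or withdrawals:

```r
# Cumulative return over 'years' -> equivalent constant annual return
annualise <- function(total_return, years) (1 + total_return)^(1 / years) - 1

# Constant annual return -> cumulative return over 'years'
accumulate <- function(annual_return, years) (1 + annual_return)^years - 1

from_total  <- annualise(0.50, 10)    # roughly 4.1% a year
from_annual <- accumulate(0.062, 10)  # roughly 82% in total

round(100 * c(from_total, from_annual), 1)
```

Under this simple compounding the two figures quoted above do not convert exactly into one another, so they may have been measured on different bases; take both as approximate.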

For those who have a lifetime of earnings ahead of them, constant and disciplined periodic savings over many years may overcome the severities of the market. However, for those of us much closer to retirement, capital preservation becomes an overriding concern.

Consequently, my investment goals now include:

the need for greater diversification, beyond just a mix of stocks and bonds, into other types of assets;
diversification within each of those asset types;
the need to know when to overweight or underweight a particular asset, or to stay out of it entirely;
a way to reduce the maximum loss of the portfolio;
and maybe a way to squeeze out a couple of extra percentage points in the annual return.

That's not asking too much, is it?

Fortunately, what has become known as tactical asset allocation and the widespread use of exchange traded funds would seem to meet that need.

Over the rest of the winter, I will use this blog to discuss how I built such a portfolio. Keep in mind that this is just one investment strategy, and one approach. Learn from it if you wish.

Basically, I use exchange traded funds (ETFs), which are highly liquid, passively indexed, and internally well diversified. I limit the choices to a broad selection that covers many asset types, select those that have performed best within that small subset, and rebalance periodically. I adjust the size of the positions so as to control the volatility of the portfolio. I will also consider putting a floor of some kind under each investment to control losses.

There are many choices at each step, avenues to research and experiment with, and decisions to be made. I will start by discussing the following questions in the next few posts:

What is an ETF?
Who are the largest providers?
What assets are available in an ETF?
Which assets should I invest in? Compare this to the institutions, endowments, and target date mutual funds.
Of so many similar sounding ETF's, which ones to use?
Risks of ETF's – there are some, believe it or not
Biases – Hindsight and other animals
Start with a simple equal weighted portfolio of say, 5 or 10 ETF's and see how that would have performed over the past ten years.

Preface

Posted on November 20, 2012 by charlesS

Intended Audience

Hopefully anyone can find something to take away from this discussion. I start from the simplest portfolios and add more sophisticated layers while building upon more advanced concepts. You may choose to incorporate any level that is compatible with your level of risk, understanding of the concepts, and how much time you choose to devote to managing your portfolios.

How is this different from other similar approaches

Tactical asset allocation has been a significant development over the past several years, although many aspects of creating such a portfolio have been researched over the past twenty to thirty years.

The focus is strictly on building well-diversified portfolios of exchange traded funds. Starting with a variety of ETFs that cover most asset types, I gradually add more sophisticated layers, some easy to implement and others maybe not so easy. I will explain each layer as I add it.

Some of the layers I investigate may be found in the academic literature under such concepts as endowments, institutional management, sector rotation, global asset allocation, momentum trading, as well as the Capital Asset Pricing Model, the efficient frontier, and mean-variance strategies.

In addition, I will track a real-money portfolio here at this website: mostlyquant.com. I have staked my retirement savings on the ideas discussed on this website.

Necessary Caveats

Let’s get the usual disclaimer out of the way.

As with any investment, there are risks. Any investment may fall in price at any time, seemingly for unfathomable reasons. Anything contained within must not be construed as individual advice. Only you can determine the appropriateness of any particular investment or strategy.

I may buy, sell, or hold any security mentioned within, at any time.

Who am I

I have a BS in Math from the University of Illinois at Urbana-Champaign, and an MBA from the University of Illinois at Chicago. I’ve been a programmer, a systems analyst, a bookstore owner, and a lifelong investor.

I have been around long enough to have watched the 1987 crash from the visitor gallery above the Mercantile Exchange floor; converted busted Savings and Loan institutions into a larger bank in the late 80’s; sold off stocks leading into the tech bubble in the late 90’s; and, most recently, sat anxiously by as the housing bubble burst and three rounds of quantitative easing so far have kept that last shoe from dropping.

Securing a comfortable retirement should not depend upon getting lucky a few times in your life. Tactical asset allocation is a way to tweak a passive indexing approach while at the same time hopefully limiting the downside risks that have become such a part of life lately, and maybe adding a couple of percentage points of extra return on top of a buy and hold strategy.

That is the goal. I hope to share what I have learned, and the decisions I have made to enhance a tactical asset allocation strategy.

Initial Portfolio post

Posted on November 5, 2012 by charlesS

I will be posting my portfolio holdings on a regular basis, starting with a portfolio of ETFs based upon a tactical asset allocation strategy similar to that of “The Ivy Portfolio: How to Invest Like the Top Endowments and Avoid Bear Markets”, by Mebane Faber and Eric Richardson. I will go into which of my decisions differ from the book, how I made those decisions, and the alternatives.

I am still exploring the capabilities of the wordpress.com blog engine, so bear with me as I find the best way to show my updates on a regular basis.

Introduction

Posted on November 2, 2012 by charlesS

Welcome to my blog. I hope to share code snippets, tutorials, and more as they relate to my investigations into quantitative investing. Thanks for visiting.

Mostly Quantitative

Explorations into R programming and Finance
