The main strategy for creating a successful portfolio is to diversify your investments. If you put all your funds into one asset, you can easily lose a lot of money. When you invest in assets that behave differently in the market, you reduce the risk that all of them will lose value at the same time.
Expert knowledge vs. data-driven solutions
There are two ways to approach a data science problem: expert knowledge and data-driven solutions. In our case, expert knowledge would mean value-based investing. You must know the companies in your current portfolio and try to find others that will behave differently. For example, you can look at industry sectors that do not depend on the sectors already in your portfolio.
The data-driven approach would be to look at a mathematical measure of diversification, i.e., correlation. If the prices of two assets are not correlated, they might be independent. Therefore, the key to portfolio diversification would be to add an asset that is not correlated with the others.
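As a minimal sketch of what "not correlated" means in numbers, the snippet below computes the Pearson correlation coefficient of two price series using only the standard library. The price values are made up for illustration, and the function name `pearson` is an assumption, not something from the article.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up prices, for illustration only.
asset_a = [10.0, 11.0, 12.0, 13.0, 14.0]
asset_b = [20.0, 22.0, 24.0, 26.0, 28.0]  # moves in lockstep with A
asset_c = [5.0, 4.0, 6.0, 4.0, 5.0]       # no linear relation with A

corr_ab = pearson(asset_a, asset_b)  # close to 1: adds little diversification
corr_ac = pearson(asset_a, asset_c)  # close to 0: a diversification candidate
```

A coefficient near 1 or -1 means the two series move together or in opposite directions; a coefficient near 0 means there is no linear relationship between them.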
Asset correlation
When a portfolio holds more than two assets, it is harder to judge whether they are correlated. Three assets have three pairwise correlation coefficients, four assets have six, and in general n assets have n(n-1)/2. Because the number of values to track changes with the size of the portfolio, comparing portfolios pair by pair is impractical. One number summarizes how uncorrelated a whole set of assets is: the determinant of its correlation matrix. It ranges from 0, when at least one asset is perfectly correlated with a combination of the others, to 1, when all assets are pairwise uncorrelated. The higher the determinant, the more uncorrelated the portfolio is.
We could compute this determinant for every potential portfolio (our current assets plus one candidate asset) and pick the candidate with the highest determinant. However, that approach is time-consuming, and the time required grows with the size of our portfolio. Instead, we can use an approximate solution. Let’s approximate our portfolio using the weighted mean of the assets’ prices. The weights are each asset’s share of the total portfolio value. For example, say we have three assets:
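The determinant idea can be sketched as follows, assuming made-up price series and hypothetical helper names (`pair_count`, `correlation_matrix`, `determinant`). The determinant is computed with plain Gaussian elimination so the example needs no third-party library; in practice a numerical library would likely be used instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pair_count(n_assets):
    """Number of pairwise correlations among n assets: n(n-1)/2."""
    return n_assets * (n_assets - 1) // 2


def correlation_matrix(series):
    """Square matrix of pairwise Pearson correlations."""
    return [[pearson(a, b) for b in series] for a in series]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix: some asset is redundant
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


# Made-up price series, chosen so that x, y and z are mutually uncorrelated.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 6.0, 4.0, 5.0]
z = [1.0, 4.0, 5.0, 4.0, 1.0]
x_scaled = [2.0 * v for v in x]  # perfectly correlated with x

det_diversified = determinant(correlation_matrix([x, y, z]))
det_concentrated = determinant(correlation_matrix([x, x_scaled, y]))
```

With this data the first portfolio gives a determinant close to 1 and the second a determinant of 0, the two extremes for a correlation matrix.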
– A worth $250
– B worth $750
– C worth $1000
The total portfolio value is $2000, so A is 12.5%, B is 37.5% and C is 50% of the total portfolio value. When averaging the portfolio price over time, we will use these values as weights. Using the approximation, we can compute a single correlation factor for every candidate asset, namely the correlation between the candidate’s price and the weighted portfolio price. This approach is less time-consuming and more resource-friendly.
To check that the approximation works, we can compute the determinants for the example portfolio extended by each candidate and plot them against the approximated correlation factors. We should see a descending trend: the more a candidate correlates with the approximated portfolio, the lower the determinant.
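The weighting scheme above can be sketched as follows. The asset values come from the example; the price series, the candidate names `X` and `Y`, and the choice to rank candidates by the absolute value of the correlation are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Position values from the example above.
values = {"A": 250.0, "B": 750.0, "C": 1000.0}
total = sum(values.values())
weights = {name: value / total for name, value in values.items()}

# Made-up price histories, for illustration only.
prices = {
    "A": [10.0, 11.0, 12.0, 13.0, 14.0],
    "B": [20.0, 21.0, 23.0, 24.0, 26.0],
    "C": [30.0, 31.0, 31.0, 33.0, 34.0],
}
n_steps = len(prices["A"])

# Weighted mean price of the portfolio at each time step.
portfolio_series = [
    sum(weights[name] * prices[name][t] for name in prices)
    for t in range(n_steps)
]

# Hypothetical candidates, also with made-up prices.
candidates = {
    "X": [5.0, 6.0, 7.0, 8.0, 9.0],
    "Y": [5.0, 4.0, 6.0, 4.0, 5.0],
}
factors = {
    name: pearson(portfolio_series, series)
    for name, series in candidates.items()
}

# Assumption: "least correlated" is read as smallest absolute correlation.
best = min(factors, key=lambda name: abs(factors[name]))
```

Each candidate needs only one correlation against a single series, regardless of how many assets the portfolio holds, which is where the savings over the determinant come from.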
Picking asset candidates
When selecting a stock to add to our portfolio, lack of correlation is not the only factor to consider. We should also consider the stock’s likelihood of performing well in the future. Again, we have two ways to tackle that problem. The expert knowledge approach would be to find a set of least correlated assets and analyze each one to find the most promising asset. However, that solution could be tricky, since the selected stocks may fall outside our area of expertise.
The data-driven approach would be to use machine learning to find the most promising assets and pick one of the least correlated ones. The most straightforward model to accomplish this would be one that predicts stock prices. Many have tried doing that, and many have failed. A less demanding model would predict only whether a particular asset will outperform some benchmark. In the case of the US stock market, that would be the S&P 500 Index.
Data we can use for such a model consists of companies’ income statements, balance sheets, and cash flow statements. We can also use variables derived from the stock prices such as returns from the last month, quarter, and year. Feeding all that data into an algorithm such as eXtreme Gradient Boosted Trees yields a model that allows us to order assets and look only at the most promising ones.
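To make the modeling setup more concrete, here is a hedged sketch of how one training example might be assembled from price data alone: trailing returns as features and an "outperformed the index" flag as the label. The function names, the lookback and horizon values, and all prices are assumptions for illustration. Fundamentals from financial statements would be added as further features, and the resulting rows would then be passed to a gradient boosted tree classifier, which is not shown here.

```python
def simple_return(prices, start, end):
    """Simple return between two positions in a price series."""
    return prices[end] / prices[start] - 1.0


def build_example(asset, index, t, lookbacks, horizon):
    """Features use prices up to step t; the label uses steps t..t+horizon."""
    features = {
        f"ret_{lb}": simple_return(asset, t - lb, t) for lb in lookbacks
    }
    asset_future = simple_return(asset, t, t + horizon)
    index_future = simple_return(index, t, t + horizon)
    label = int(asset_future > index_future)  # 1 = beat the benchmark
    return features, label


# Made-up prices; real lookbacks would span a month, a quarter and a year.
asset_prices = [100.0, 102.0, 101.0, 105.0, 110.0, 120.0]
index_prices = [1000.0, 1010.0, 1020.0, 1030.0, 1040.0, 1050.0]

features, label = build_example(
    asset_prices, index_prices, t=3, lookbacks=(1, 3), horizon=2
)
```

Keeping the features strictly before step t and the label strictly after it avoids leaking future information into the model.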
Data-driven solution to portfolio diversification
Now that we have all the building blocks, we can recap our data-driven solution. The first step would be ranking stocks based on their likelihood of outperforming the S&P 500 Index. For the high-ranking stocks, we would then compute their correlation factor with our portfolio. We would approximate our portfolio using the weighted mean of the individual assets’ prices and choose the asset that correlates the least. To determine how much of the asset to buy, we could use Markowitz’s modern portfolio theory, which optimizes the distribution of capital between assets to reach the best return-to-risk ratio.
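The selection steps of this recap can be sketched in a few lines. The model scores below stand in for the output of the ranking model and are invented, as are the candidate names, prices, and the `pick_candidate` helper. The position-sizing step is left out.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pick_candidate(scores, candidate_prices, portfolio_series, top_k):
    """Keep the top_k highest-scoring stocks, then pick the least correlated."""
    shortlist = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Assumption: "correlates the least" means smallest absolute correlation.
    return min(
        shortlist,
        key=lambda name: abs(pearson(portfolio_series, candidate_prices[name])),
    )


# Weighted mean price of the current portfolio (made-up values).
portfolio_series = [23.75, 24.75, 25.625, 27.125, 28.5]

# Hypothetical candidates with invented scores and prices.
scores = {"X": 0.9, "Y": 0.7, "Z": 0.2}
candidate_prices = {
    "X": [5.0, 6.0, 7.0, 8.0, 9.0],
    "Y": [5.0, 4.0, 6.0, 4.0, 5.0],
    "Z": [9.0, 7.0, 8.0, 6.0, 5.0],
}

choice = pick_candidate(scores, candidate_prices, portfolio_series, top_k=2)
```

The `top_k` cutoff controls the trade-off between the two criteria: a small value favors expected performance, a large value favors low correlation.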