Simpson's Paradox in Investing: When Data Deceives

Aki Kakko
Sep 23, 2023

Updated: Feb 13, 2024

Simpson's Paradox is a statistical phenomenon in which a trend that appears within several different groups disappears or even reverses when those groups are combined. This can lead to counterintuitive results and has important implications in many fields, including investing. For investors, understanding the paradox is crucial: it helps them avoid misguided decisions based on misleading aggregate data. This article explains what Simpson's Paradox is and why it occurs, and works through examples tailored for investors.

What is Simpson's Paradox?

In essence, Simpson's Paradox occurs when the aggregate data of several groups contradicts the individual data of those groups. This means that while each group might show a certain trend, the combined data of all groups might show a completely different trend.

Why Does It Occur?

There are several reasons why Simpson's Paradox can occur:

Unequal Group Sizes: One group might be much larger than others and hence influence the aggregate data disproportionately.
Lurking Variables: There might be a hidden variable that isn't considered in the aggregated data but influences the outcome.
Different Group Dynamics: Each group might have unique characteristics or dynamics that get overshadowed when combined.
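
The effect of unequal group sizes can be sketched in a few lines of Python. The fund names, segments, and trade counts below are hypothetical, chosen only to make the reversal visible. Fund A has the higher share of profitable trades in each segment, yet Fund B has the higher share once the segments are pooled, because most of Fund B's trades fall in the segment where both funds do well.

```python
# Hypothetical (profitable trades, total trades) per market segment.
trades = {
    "Fund A": {"large_cap": (81, 87), "small_cap": (192, 263)},
    "Fund B": {"large_cap": (234, 270), "small_cap": (55, 80)},
}


def win_rate(wins, total):
    """Share of trades that were profitable."""
    return wins / total


def pooled_win_rate(segments):
    """Win rate after adding up wins and totals across all segments."""
    wins = sum(w for w, _ in segments.values())
    total = sum(t for _, t in segments.values())
    return wins / total


for fund, segments in trades.items():
    per_segment = {
        name: round(win_rate(*counts), 3) for name, counts in segments.items()
    }
    print(fund, per_segment, "pooled:", round(pooled_win_rate(segments), 3))
```

The pooled figure is a weighted average of the segment figures, with trade counts as the weights, so a fund's mix of segments can matter more than its skill within each one.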

Example 1: Stock Performance

Imagine two stocks, Stock A and Stock B. You are comparing their monthly returns over a year.

Month	Stock A Return	Stock B Return
January	+5%	+2%
February	+4%	+3%
...	...	...
December	+6%	+4%

Raw Performance: At a glance, Stock A outperforms Stock B every month. On percentage returns alone, Stock A always appears to gain more value than Stock B. However, the raw percentage return doesn't account for the size of the company or the trading volume of its stock. Here's why that matters:

Volume of Transactions: This refers to how many shares of the stock change hands over a period. A stock with high transaction volume has more liquidity and is generally considered more stable. Big fluctuations in a low-volume stock might be caused by just a few trades, whereas in a high-volume stock they indicate a broader consensus among investors.
Market Capitalization: This is the total market value of a company's outstanding shares of stock. It's calculated by multiplying the company's stock price by its total number of outstanding shares. A company with a higher market cap is generally considered to be more stable and less risky than one with a lower market cap.

Now suppose Stock A is a smaller player in the market, while Stock B is a major one. Even if Stock A's percentage returns are higher every month, those returns are earned on a much smaller base value or on fewer transactions. By contrast, even a smaller percentage gain for Stock B can represent a significant amount of money because it's a larger stock with more transactions. For instance, a 5% gain on a small company worth $1 million is $50,000, but a 3% gain on a large company worth $1 billion is $30 million. So while the percentage gain is higher for the smaller company, the actual monetary gain is vastly higher for the larger one.
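
The arithmetic in this comparison is easy to check. The short Python sketch below uses the figures from the text; the function name `dollar_gain` is just an illustrative helper.

```python
def dollar_gain(company_value, pct_return):
    """Monetary gain for a given company value and percentage return."""
    return company_value * pct_return / 100


small_company_gain = dollar_gain(1_000_000, 5)      # 5% on $1 million
large_company_gain = dollar_gain(1_000_000_000, 3)  # 3% on $1 billion

print(f"Small company: ${small_company_gain:,.0f}")
print(f"Large company: ${large_company_gain:,.0f}")
```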

Weighted Aggregate Return: When evaluating the overall performance of stocks, especially when comparing them, it's common to weight their returns by factors like market cap or volume. This gives a more accurate picture of their real-world impact and value. In our example, when you take into account the sheer size and volume of Stock B, its returns, although smaller in percentage, might have a greater overall impact on the market or an investment portfolio than the returns of Stock A. Hence, when weighted, Stock B might appear as the better performer over the year in terms of actual value generated.
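
As a rough illustration of weighting, the sketch below compares a simple average of the two returns with a market-cap-weighted average. It reuses the 5% and 3% returns and the $1 million and $1 billion company values from the example above. The helper name `weighted_return` is an assumption for illustration, not a standard library function.

```python
def weighted_return(returns, weights):
    """Average of returns, each weighted by the matching entry in weights."""
    total_weight = sum(weights)
    return sum(r * w for r, w in zip(returns, weights)) / total_weight


returns_pct = [5.0, 3.0]                   # Stock A, Stock B
market_caps = [1_000_000, 1_000_000_000]   # Stock A, Stock B

simple_average = sum(returns_pct) / len(returns_pct)
cap_weighted = weighted_return(returns_pct, market_caps)

print(f"Simple average: {simple_average:.3f}%")
print(f"Cap-weighted:   {cap_weighted:.3f}%")
```

The cap-weighted figure sits almost exactly on Stock B's return, which shows how heavily the larger stock dominates the aggregate.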

In essence, while percentage returns give an initial idea of stock performance, understanding the context behind those numbers (like market cap and volume) is crucial for a more comprehensive analysis.

Example 2: Portfolio Diversification

Consider two investment portfolios. Portfolio X invests in tech companies and Portfolio Y in healthcare. Over five years, both portfolios show positive returns each year:

Year	Portfolio X Return	Portfolio Y Return
1	+10%	+8%
2	+9%	+7%
...	...	...
5	+11%	+9%

However, the two return series on their own do not tell you how a single portfolio holding both will behave. For any one year, the return of the combined portfolio is the capital-weighted average of the two returns, so it always falls between them and cannot be lower than both. What the separate figures hide is how the holdings move relative to each other, which shapes the combined portfolio's volatility, and how much capital sits in each sector, which shapes its aggregate return.

Combining Portfolios: When we merge Portfolio X (tech companies) and Portfolio Y (healthcare companies) into a single diversified portfolio, we take the stocks from both portfolios and create a new, combined set of investments.
Aggregate Return of the Combined Portfolio: The "aggregate return" of this new portfolio is the overall performance derived from both tech and healthcare stocks. It is not a simple average of the two percentages but an average weighted by how much capital is invested in each sector.

Correlation and Its Impact: Correlation, in this context, describes how tech stocks relate to healthcare stocks in terms of their price movements.

Positive Correlation: Tech and healthcare stocks move in tandem. If tech stocks rise, healthcare stocks also rise, and vice versa.
Negative Correlation: Tech and healthcare stocks move in opposite directions. If tech stocks rise, healthcare stocks fall, and vice versa.

In our scenario, let's assume some of the tech stocks in Portfolio X and some of the healthcare stocks in Portfolio Y have a negative correlation. Say, for instance, that a particular tech stock tends to rise on positive tech industry news while a specific healthcare stock drops because investors are shifting money from healthcare to tech. Inside the combined portfolio these movements partly offset each other, so the merged portfolio swings less from month to month than either portfolio does alone. The offsetting does not push the yearly return below both components: if X returns +10% and Y returns +8%, an evenly split portfolio returns +9% for that year. The aggregate can still surprise you over several years if the split between the sectors changes, for example if most of the capital sits in the weaker sector during its weakest years. The multi-year figure is then pulled toward those years, which is the same unequal-weighting effect that drives Simpson's Paradox.

In essence, while diversifying investments is a strategy to spread and potentially reduce risk, it's also essential to understand how the assets within a diversified portfolio interact. In our example, the negative correlations between some tech and healthcare stocks smooth the combined portfolio's returns, while the weight given to each sector determines where its aggregate return lands.
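
One way to check how correlation acts on a combined portfolio is to compute the figures directly. The sketch below is a minimal illustration, not a model of any real portfolio. The yearly returns are hypothetical and deliberately chosen to move in opposite directions, and the 50/50 split, rebalanced every year, is an assumption.

```python
import statistics

# Hypothetical yearly returns in percent, chosen to move in opposite directions.
tech_returns = [12.0, 4.0, 14.0, 2.0, 13.0]
health_returns = [3.0, 11.0, 2.0, 12.0, 4.0]
tech_weight = 0.5  # assumed 50/50 split, rebalanced every year

# Each year's combined return is the weighted average of the two components.
combined_returns = [
    tech_weight * t + (1 - tech_weight) * h
    for t, h in zip(tech_returns, health_returns)
]

for label, series in [
    ("Tech", tech_returns),
    ("Healthcare", health_returns),
    ("Combined", combined_returns),
]:
    mean = statistics.mean(series)
    spread = statistics.stdev(series)
    print(f"{label:<10} mean {mean:.2f}%  stdev {spread:.2f}")
```

With these numbers the combined return in every year lies between the two component returns, while its year-to-year spread is much smaller than that of either component. In this sketch, negative correlation shows up in the volatility rather than in the level of the return.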

Example 3: Sector Analysis

When analyzing sectors, an investor might find that within each sector, companies with higher R&D spending have higher profits. But when aggregating all sectors, the data might show that companies with higher R&D spending have lower profits. This could be because sectors with naturally lower R&D expenses (like utilities) are more profitable than sectors with high R&D expenses (like biotech). Combining them might distort the individual sector trends.
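
The reversal described here can be reproduced with a small Python sketch. The R&D and profit-margin figures are hypothetical, picked only so that the trend is positive inside each sector and negative once the sectors are pooled. The slope is computed with ordinary least squares in a small helper, `ols_slope`, written out so the example needs no external libraries.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical (R&D spend as % of revenue, profit margin in %) per company.
sectors = {
    "utilities": ([1, 2, 3], [20, 21, 22]),
    "biotech": ([15, 20, 25], [5, 6, 7]),
}

sector_slopes = {name: ols_slope(rd, margin) for name, (rd, margin) in sectors.items()}

# Pool every company into one data set, ignoring which sector it belongs to.
all_rd = [x for rd, _ in sectors.values() for x in rd]
all_margin = [y for _, margin in sectors.values() for y in margin]
pooled_slope = ols_slope(all_rd, all_margin)

print("Within-sector slopes:", sector_slopes)
print("Pooled slope:", round(pooled_slope, 3))
```

Sector membership is the lurking variable: it determines both how much a company spends on R&D and how profitable it tends to be, so the pooled slope mostly measures the gap between sectors.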

How Can Investors Avoid Being Misled?

Always Dig Deeper: Never take aggregate data at face value. Delve into the individual components to ensure you aren't missing important trends.
Be Aware of Lurking Variables: Always consider if there might be a hidden variable that can change the interpretation of your data.
Seek Expertise: If you're not confident in your data interpretation, seeking the advice of a financial analyst or statistician can be invaluable.

Simpson's Paradox is a potent reminder that data interpretation is not always straightforward. This matters especially for investors, whose decisions can have significant financial implications. By always digging deeper into the data and staying conscious of potential lurking variables, investors can make more informed and accurate decisions.

Alphanome - AI Research Lab & Venture Studio
