My Life Stats: I Tracked My Habits for a Year, and This Is What I Learned | by Pau Blasco i Roca | Nov, 2023


I first looked at the individual time series for 4 variables: Sleep, Studying, Socializing and Mood. I used Microsoft Excel to quickly draw some plots. They represent the daily number of hours spent (blue) and the 5-day moving average¹ MA(5) (purple), which I considered to be a good measure for my situation. The Mood variable was rated from 10 (the best!) to 0 (terrible!).
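Although the original plots were drawn in Excel, an MA(5) like the one above takes two lines with pandas. A minimal sketch, using a made-up week of sleep values (the real series spans the whole year):

```python
import pandas as pd

# A made-up week of daily sleep hours, for illustration only
sleep = pd.Series([7.5, 6.0, 8.0, 7.0, 6.5, 9.0, 7.5])

# MA(5): the mean of each day and the 4 days before it
ma5 = sleep.rolling(window=5).mean()

print(ma5.tolist())  # the first 4 entries are NaN until the window fills up
```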

Regarding the data contained in the footnote of each plot: the total is the sum of the values of the series, the mean is the arithmetic mean of the series, the STD is the standard deviation, and the relative deviation is the STD divided by the mean.
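These four footnote statistics can be reproduced directly with NumPy. A sketch with invented values (the article's footnotes use the full year of data):

```python
import numpy as np

# A made-up week of daily hours, for illustration only
hours = np.array([7.5, 6.0, 8.0, 7.0, 6.5, 9.0, 7.5])

total = hours.sum()      # sum of the values of the series
mean = hours.mean()      # arithmetic mean
std = hours.std()        # standard deviation
rel_dev = std / mean     # relative deviation: STD divided by the mean

print(f"Total: {total}h | Mean: {mean:.1f}h | STD: {std:.1f}h | Rel. dev.: {rel_dev:.1%}")
```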

Total: 2361h. Mean: 7.1h. STD: 1.1h. Relative deviation: 15.5% (image by author).

All things accounted for, I did well enough with sleep. I had rough days, like everyone else, but I think the trend is pretty stable. In fact, it is one of the least-varying of my studied variables.

Total: 589.1h. Mean: 1.8h. STD: 2.2h. Relative deviation: 122% (image by author).

These are the hours I devoted to my academic career. It fluctuates a lot — finding balance between work and studying often means having to cram projects on the weekends — but still, I consider myself satisfied with it.

Total: 1440.9h. Mean: 4.3h. STD: 4.7h. Relative deviation: 107% (image by author).

Regarding this chart, all I can say is that I'm surprised. The grand total is larger than I expected, given that I'm an introvert. Of course, hours with my colleagues at university also count. In terms of variability, the STD is really high, which makes sense given how hard it is to keep an established routine when it comes to socializing.

Mean: 8.0h. STD: 0.9h. Relative deviation: 11.3% (image by author).

This is the least variable series — its relative deviation is the lowest among my studied variables. A priori, I'm happy with the observed trend. I think it's positive to keep a fairly stable mood — and even better if it's a good one.

After looking at the trends for the main variables, I decided to dive deeper and study the potential correlations² between them. Since my goal was being able to mathematically model and predict (or at least explain) "Mood", correlations were an important metric to consider. From them, I could extract relationships like the following: "the days that I study the most are the ones that I sleep the least", "I usually study languages and music together", etc.

Before we do anything else, let's open up a Python file and import some key libraries for series analysis. I usually use aliases for them, as it's a common practice and makes things less verbose in the actual code.

import pandas as pd              #1.4.4
import numpy as np               #1.22.4
import seaborn as sns            #0.12.0
import matplotlib.pyplot as plt  #3.5.2
from pmdarima import arima       #2.0.4

We'll make two different studies regarding correlation. We'll look into the Pearson Correlation Coefficient³ (for linear relationships between variables) and the Spearman Correlation Coefficient⁴ (which studies monotonic relationships between variables). We will be using their implementation⁵ in pandas.

Pearson Correlation matrix

The Pearson Correlation Coefficient between two variables X and Y is computed as follows:

ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)

where cov is the covariance, σ_X is std(X) and σ_Y is std(Y).
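The definition can be sanity-checked against pandas' built-in implementation. A minimal sketch with synthetic data (the series and seed are invented for illustration):

```python
import numpy as np
import pandas as pd

# Two illustrative, linearly related series
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=100))
y = 0.5 * x + pd.Series(rng.normal(size=100))

# Pearson straight from the definition: cov(X, Y) / (std(X) * std(Y))
manual = x.cov(y) / (x.std() * y.std())

# pandas' built-in implementation
builtin = x.corr(y, method='pearson')

print(manual, builtin)  # the two values agree
```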

We can quickly calculate a correlation matrix, where every possible pairwise correlation is computed.

#read, select and normalize the data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

#compute the correlation matrix
corr = numerics.corr(method='pearson')

#generate the heatmap
sns.heatmap(corr, annot=True)

#draw the plot
plt.show()

This is the raw Pearson Correlation matrix obtained from my data.

Pearson Correlation matrix for my variables (image by author).

And these are the significant values⁶ — the ones that are, with 95% confidence, different from zero. We perform a t-test⁷ with the following formula. For each correlation value ρ, we discard it if:

|ρ| < 2/√n

where n is the sample size. We can recycle the code from before and add in this filter.

#constants
N = 332 #number of samples
STEST = 2/np.sqrt(N)

def significance_pearson(val):
    if np.abs(val) < STEST:
        return True
    return False

#read data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

#calculate correlation
corr = numerics.corr(method='pearson')

#prepare masks
mask = corr.copy().applymap(significance_pearson)
mask2 = np.triu(np.ones_like(corr, dtype=bool)) #remove upper triangle
mask_comb = np.logical_or(mask, mask2)

c = sns.heatmap(corr, annot=True, mask=mask_comb)
c.set_xticklabels(c.get_xticklabels(), rotation=-45)
plt.show()

The values that have been discarded could just be noise, wrongfully representing trends or relationships. In any case, it's better to assume a true relationship is meaningless than to consider a meaningless one meaningful (what we refer to as favoring type II error over type I error). This is especially true in a study with rather subjective measurements.

Filtered Pearson Correlation matrix. Non-significant values (and the upper triangle) have been filtered out. (image by author)

Spearman’s rank correlation coefficient

The Spearman correlation coefficient can be calculated as follows:

ρ = cov(R(X), R(Y)) / (σ_R(X) · σ_R(Y))

where R denotes the rank variable⁸ — the rest of the variables are the same ones described for the Pearson coefficient.
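In other words, Spearman's coefficient is just Pearson's applied to the rank-transformed data — which is easy to verify in pandas. A sketch with invented, monotonically related series:

```python
import numpy as np
import pandas as pd

# Illustrative data with a monotonic but non-linear relationship
rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=50))
y = x**3 + rng.normal(scale=0.1, size=50)

df = pd.DataFrame({"x": x, "y": y})

# Pearson computed on the ranks...
via_ranks = df.rank().corr(method='pearson').loc["x", "y"]
# ...matches the built-in Spearman implementation
builtin = df.corr(method='spearman').loc["x", "y"]

print(via_ranks, builtin)
```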

As we did before, we can quickly compute the correlation matrix:

#read, select and normalize the data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

#compute the correlation matrix
corr = numerics.corr(method='spearman') #pay attention to this change!

#generate the heatmap
sns.heatmap(corr, annot=True)

#draw the plot
plt.show()

This is the raw Spearman's Rank Correlation matrix obtained from my data:

Spearman Correlation matrix for my variables (image by author).

Let's see which values are actually significant. The formula to check for significance is the following:

t = r · √((n − 2) / (1 − r²))

where r is Spearman's coefficient. Here, t follows a Student's t distribution with n − 2 degrees of freedom.

Here, we will filter out all values whose t-statistic is lower (in absolute value) than 1.96. Again, the reason they've been discarded is that we aren't sure whether they're noise — random chance — or an actual trend. Let's code it up:

#constants
N = 332 #number of samples
TTEST = 1.96

def significance_spearman(val):
    if np.abs(val) == 1: #perfect correlation (the diagonal); avoids dividing by zero
        return True
    t = val * np.sqrt((N-2)/(1-val*val))
    if np.abs(t) < TTEST:
        return True
    return False

#read data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

#calculate correlation
corr = numerics.corr(method='spearman')

#prepare masks
mask = corr.copy().applymap(significance_spearman)
mask2 = np.triu(np.ones_like(corr, dtype=bool)) #remove upper triangle
mask_comb = np.logical_or(mask, mask2)

#plot the results
c = sns.heatmap(corr, annot=True, mask=mask_comb)
c.set_xticklabels(c.get_xticklabels(), rotation=-45)
plt.show()

These are the significant values.

Correlation Matrix with significant values. (image by author)

I believe this chart better explains the apparent relationships between variables, as its criterion is more "natural" (it considers monotonic⁹, and not only linear, functions and relationships). It's also not as impacted by outliers as the other one (a couple of very bad days related to a certain variable won't skew the overall correlation coefficient).
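That robustness claim is easy to demonstrate on toy data: a single extreme value wrecks the Pearson coefficient while barely denting Spearman's. A sketch with invented values:

```python
import pandas as pd

# A perfectly monotonic toy relationship, plus one "very bad day"
x = pd.Series(range(30), dtype=float)
y = x + 1.0
y.iloc[-1] = -100.0  # a single extreme outlier

df = pd.DataFrame({"x": x, "y": y})
pearson = df.corr(method='pearson').loc["x", "y"]
spearman = df.corr(method='spearman').loc["x", "y"]

# Pearson collapses toward zero; Spearman stays high
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```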

Still, I'll leave both charts for the reader to judge and to extract their own conclusions from.
