data analysis – Data Science Portfolio

February 12, 2021 Clustering

Aggregation methods in R

In this post we apply three clustering methods (k-means, hierarchical clustering, and model-based clustering) and compare their accuracy. We show how to select the optimal number of clusters for each method and compute metrics to choose the best one.

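The post's R code is not part of this excerpt. As a rough illustration of how the number of clusters can be chosen with a metric, here is a minimal Python sketch that assumes the mean silhouette width as the selection criterion (the post itself may use other criteria, such as the elbow method, or BIC for model-based clustering). It runs plain k-means with a deterministic farthest-point initialisation for several values of k and keeps the k with the highest silhouette. The functions `kmeans`, `silhouette` and `best_k` and the synthetic data are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    # Deterministic farthest-point initialisation (a simplification of
    # k-means++): start at the first point, then repeatedly add the point
    # farthest from its nearest already-chosen centroid.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width: s(i) = (b - a) / max(a, b), averaged over points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treat as no structure
    total = 0.0
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # Mean distance from point i to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d)
        a = mean_dist[own]  # cohesion: distance within the point's own cluster
        b = min(v for c, v in mean_dist.items() if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Assumed criterion: pick the k with the highest mean silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical data: three well-separated Gaussian blobs of 30 points each.
rng = random.Random(42)
centres = [(0, 0), (10, 0), (0, 10)]
data = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
        for cx, cy in centres for _ in range(30)]
k, scores = best_k(data, range(2, 6))
print(k, {key: round(v, 2) for key, v in scores.items()})
```

Because the silhouette depends only on the labels and the distances, the same function can score the partitions from hierarchical or model-based clustering, which makes it one possible metric for comparing the methods against each other as well as for choosing k.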

January 16, 2021 statistics

Inferential analysis in R

This post continues the analysis of the Mid-Atlantic Wage dataset with some inferential statistics in R. It uses the same data as the post Descriptive analysis in R. We start by reading the dataset and applying the transformations described in that article.


January 15, 2021 statistics

Descriptive analysis in R

This post presents a simple descriptive statistical analysis of the Mid-Atlantic Wage Data, using boxplots and checking whether the data are normally distributed. The dataset can be found here: https://github.com/selva86/datasets/blob/master/Wage.csv

The fields in the data are the following:

year: year when the data was collected.
maritl: marital status: 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated.
age: worker’s age.
race: 1. White, 2. Black, 3. Asian, and 4. Other.
education: Education level:

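The excerpt does not say which normality check the post uses (in R this is often `shapiro.test` or a Q-Q plot). As a hedged stand-in, the Python sketch below implements the Jarque-Bera test, which measures departure from normality through the sample skewness and kurtosis. The function name `jarque_bera` and the simulated samples are hypothetical, and the p-value relies on an asymptotic chi-squared approximation that is only trustworthy for large samples.

```python
import math
import random


def jarque_bera(values):
    """Jarque-Bera normality test: returns (statistic, asymptotic p-value)."""
    n = len(values)
    mean = sum(values) / n
    # Central sample moments, dividing by n (biased estimators).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2)


# Hypothetical data: a normal sample and a right-skewed (log-normal) one,
# the latter loosely mimicking a wage-like variable.
rng = random.Random(1)
samples = {
    "normal": [rng.gauss(100, 15) for _ in range(3000)],
    "lognormal": [rng.lognormvariate(4.5, 0.5) for _ in range(3000)],
}
results = {name: jarque_bera(xs) for name, xs in samples.items()}
for name, (jb, p) in results.items():
    print(f"{name}: JB = {jb:.1f}, p = {p:.3g}")
```

A small p-value is evidence against normality; the strongly skewed log-normal sample should be rejected, while the normal sample usually is not. For the smaller samples common in practice, a Shapiro-Wilk test is generally preferred.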