In this post we apply three clustering methods (k-means, hierarchical clustering, and model-based clustering) and compare their accuracy. We show how to choose the optimal number of clusters for each method and compute metrics to select the best one.
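The following is a minimal base-R sketch of choosing k and comparing methods. It makes several assumptions. The built-in `iris` data stands in for the post's data. The elbow method is used to choose k. Purity (the share of points whose cluster's majority label matches their own) serves as an illustrative accuracy metric; the post's own metrics may differ. Model-based clustering is left out here to avoid package dependencies. One common R option for it is the mclust package, whose `Mclust()` chooses the number of mixture components by BIC.

```r
# Assumption: iris stands in for the post's data; only base R is used.
x <- scale(iris[, 1:4])   # standardize the four numeric columns
set.seed(42)              # k-means uses random starting centers

# Elbow method: total within-cluster sum of squares for k = 1..8.
# Plot wss against k and look for the bend ("elbow") in the curve.
wss <- sapply(1:8, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})
print(round(wss, 1))

# Fit k-means and Ward hierarchical clustering with k = 3
km <- kmeans(x, centers = 3, nstart = 25)
hc <- hclust(dist(x), method = "ward.D2")
hc_clusters <- cutree(hc, k = 3)

# Purity: for each cluster, count its most frequent true label; sum and divide by n
purity <- function(clusters, labels) {
  tab <- table(clusters, labels)
  sum(apply(tab, 1, max)) / length(labels)
}
km_purity <- purity(km$cluster, iris$Species)
hc_purity <- purity(hc_clusters, iris$Species)
print(c(kmeans = km_purity, hclust = hc_purity))
```

Purity only makes sense when true labels are known. Without labels, internal criteria such as the within-cluster sum of squares above are used instead.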
Inferential analysis in R
This post continues the analysis of the Mid-Atlantic Wage dataset by applying inferential statistics in R. It uses the same data as the post about descriptive analysis in R, so we start by reading the dataset and applying the transformations described in that article.
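As one common example of an inferential test, here is a Welch two-sample t-test comparing mean wage between two groups. This is a hypothetical sketch: the data is simulated, and the group labels "A" and "B" are invented. It is not necessarily one of the tests used in the post.

```r
# Hypothetical data: simulated wages for two invented groups "A" and "B"
set.seed(1)
wages <- data.frame(
  wage  = c(rnorm(100, mean = 100, sd = 30), rnorm(100, mean = 115, sd = 30)),
  group = rep(c("A", "B"), each = 100)
)

# Welch t-test (t.test does not assume equal variances by default)
result <- t.test(wage ~ group, data = wages)
print(result)

# H0: equal group means; reject at the 5% significance level if p < 0.05
reject_h0 <- result$p.value < 0.05
print(reject_h0)
```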
Descriptive analysis in R
This post presents a simple descriptive statistical analysis of the Mid-Atlantic Wage dataset, including some boxplots and a check for normality. The dataset is available at https://github.com/selva86/datasets/blob/master/Wage.csv. Its fields include:
- year: year in which the data was collected.
- maritl: marital status: 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated.
- age: worker's age.
- race: 1. White, 2. Black, 3. Asian, and 4. Other.
- education: education level:
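Here is a minimal sketch of the boxplot and normality steps. The data is simulated, and the column names `wage` and `maritl` are assumptions about how Wage.csv is loaded. Only the marital-status labels come from the field list above.

```r
# Hypothetical stand-in for Wage.csv: simulated wages and two marital-status labels
set.seed(123)
wage_data <- data.frame(
  wage   = rlnorm(300, meanlog = 4.6, sdlog = 0.35),
  maritl = factor(sample(c("1. Never Married", "2. Married"), 300, replace = TRUE))
)

# Boxplot of wage by marital status; the returned list holds the five-number summaries
bp <- boxplot(wage ~ maritl, data = wage_data,
              xlab = "Marital status", ylab = "Wage")

# Normality checks: Q-Q plot and Shapiro-Wilk test (requires 3 to 5000 observations)
qqnorm(wage_data$wage)
qqline(wage_data$wage)
sw_raw <- shapiro.test(wage_data$wage)
sw_log <- shapiro.test(log(wage_data$wage))  # log scale, often checked for skewed data
print(sw_raw)
print(sw_log)
```

A small p-value from `shapiro.test` is evidence against normality. With large samples even mild departures become significant, so the Q-Q plot is worth reading alongside it.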