Data Mining in R

This post describes an analysis of an online news dataset. We perform data cleaning, data transformation, and dimensionality reduction, and then try several supervised and unsupervised models, such as decision trees, clustering, and logistic models, to evaluate how accurately they predict the popularity of news articles.

Continue reading
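The workflow above (dimensionality reduction, then an unsupervised and a supervised model scored on accuracy) can be sketched in base R. This is a minimal illustration on simulated data, not the post's code: the feature matrix `X`, the binary target `popular`, the 80% variance threshold, and the 70/30 split are all hypothetical choices. PCA (`prcomp`) stands in for the dimensionality-reduction step, and decision trees are left out to keep the sketch dependency-free.

```r
# A minimal sketch on SIMULATED data, not the original online news dataset.
# Feature matrix X and the binary target `popular` are hypothetical.
set.seed(42)

n <- 500
X <- matrix(rnorm(n * 10), ncol = 10)                   # 10 numeric features
popular <- rbinom(n, 1, plogis(X[, 1] - 0.5 * X[, 2]))   # 1 = popular, 0 = not

# Dimensionality reduction: PCA on standardized features, keeping the
# smallest number of components that explains at least 80% of the variance
pca <- prcomp(X, center = TRUE, scale. = TRUE)
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.8)[1]
Z <- pca$x[, 1:k, drop = FALSE]

# Unsupervised model: k-means with 2 clusters on the reduced data,
# cross-tabulated against the (hypothetical) popularity label
clusters <- kmeans(Z, centers = 2, nstart = 10)
print(table(clusters$cluster, popular))

# Supervised model: logistic regression on a 70/30 train/test split
df <- data.frame(Z, popular = popular)
train <- sample(n, size = round(0.7 * n))
fit <- glm(popular ~ ., data = df[train, ], family = binomial)
prob <- predict(fit, newdata = df[-train, ], type = "response")

# Test-set accuracy with a 0.5 probability cut-off
accuracy <- mean((prob > 0.5) == df$popular[-train])
print(accuracy)
```

Scoring on a held-out split rather than on the training rows keeps the accuracy estimate from being inflated by overfitting.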

Analysis of Variance (ANOVA) with R

In this post we perform an analysis of variance (ANOVA) in R to study how different variables, such as race, education level, or job class, influence wage. The data is the same as in the post Descriptive Analysis with R, so visit that post for more details about the data used. Let’s start the analysis. Discussion: By means of ANOVA we have…

Continue reading
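A one-way ANOVA of wage on a single factor, such as education level, can be sketched with `aov()` from the `stats` package. This is a minimal illustration on simulated data: the three education levels, their group means, and the standard deviation are hypothetical values, not taken from the Wage dataset. A Tukey HSD post-hoc test is added as one common way to see which pairs of levels differ once the F test rejects equal means.

```r
# A minimal sketch of a one-way ANOVA on SIMULATED data; the factor levels
# and wage values are hypothetical, not the Wage dataset itself.
set.seed(1)

education <- factor(rep(c("HS Grad", "Some College", "College Grad"), each = 50))
# Hypothetical group means so that education does affect wage
group_mean <- c("HS Grad" = 95, "Some College" = 105, "College Grad" = 125)
wage <- rnorm(length(education), mean = group_mean[as.character(education)], sd = 20)

df <- data.frame(wage, education)

# H0: mean wage is equal across all education levels
fit <- aov(wage ~ education, data = df)
print(summary(fit))

# p-value of the F test for the education factor
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)

# Post-hoc pairwise comparisons (Tukey HSD) between education levels
print(TukeyHSD(fit))
```

With the real data, further factors (race, job class) can be added to the formula, e.g. `wage ~ education + race`, for a multi-way ANOVA.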

Linear and logistic regression with R

This post applies linear and logistic regression to data containing several health parameters of 2353 patients who underwent surgery. We explore the relationships among some of the parameters and predict the probability of developing a post-surgical infection. Discussion: According to the results obtained, when studied separately, all the variables influence the probability of developing a post-surgical infection (diabetes, malnutrition,…

Continue reading
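The logistic part of that analysis, studying each variable separately and then together, can be sketched with `glm(..., family = binomial)`. This is a minimal illustration on simulated patients: the variables `diabetes`, `malnutrition`, `age`, and `infection`, and the coefficients used to generate them, are hypothetical, so the numbers it prints say nothing about the post's actual results. Exponentiated coefficients are reported as odds ratios.

```r
# A minimal sketch on SIMULATED patients; variable names and effect sizes
# are hypothetical, chosen only to generate illustrative data.
set.seed(2353)
n <- 2353
diabetes     <- rbinom(n, 1, 0.15)
malnutrition <- rbinom(n, 1, 0.10)
age          <- round(runif(n, 20, 90))
lp <- -3 + 0.9 * diabetes + 1.1 * malnutrition + 0.02 * age
infection <- rbinom(n, 1, plogis(lp))
df <- data.frame(infection, diabetes, malnutrition, age)

# Each variable studied separately: one univariate logistic model per variable,
# keeping the odds ratio exp(beta) for a one-unit increase
univariate_or <- sapply(c("diabetes", "malnutrition", "age"), function(v) {
  fit <- glm(reformulate(v, response = "infection"), data = df, family = binomial)
  exp(coef(fit)[[v]])
})
print(univariate_or)

# All variables together in one multivariable model
full <- glm(infection ~ diabetes + malnutrition + age, data = df, family = binomial)
print(summary(full)$coefficients)

# Predicted infection probability for a hypothetical new patient
new_patient <- data.frame(diabetes = 1, malnutrition = 0, age = 70)
p_infection <- predict(full, newdata = new_patient, type = "response")
print(p_infection)
```

Comparing the univariate odds ratios with the multivariable coefficients is what reveals whether a variable keeps its influence once the others are accounted for.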

Inferential analysis in R

This post continues the analysis of the Mid-Atlantic Wage dataset by applying inferential statistics in R. The data used is the same as in the post about Descriptive Analysis with R. We start by reading the dataset and applying the transformations described in that article:

Continue reading

Descriptive analysis in R

This post shows a simple descriptive statistical analysis of the Mid-Atlantic Wage Data, including some boxplots and a check for data normality. The dataset can be found here: https://github.com/selva86/datasets/blob/master/Wage.csv

The fields in the data are the following:

- year: year when the data was collected.
- maritl: marital status: 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated.
- age: worker’s age.
- race: 1. White, 2. Black, 3. Asian, and 4. Other.
- education: education level: …

Continue reading
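The two checks mentioned in that post, boxplots and a normality test, can be sketched with base R's `boxplot()` and `shapiro.test()`. This is a minimal illustration on simulated data: the `jobclass` levels and the log-normal wage values are hypothetical stand-ins, not the Wage.csv records. The boxplot is only drawn in an interactive session; otherwise `plot = FALSE` just returns the five-number summaries the boxplot would show.

```r
# A minimal sketch on SIMULATED data; `jobclass` levels and wage values are
# hypothetical stand-ins, not the Wage.csv records.
set.seed(3000)
n <- 3000
jobclass <- factor(sample(c("Industrial", "Information"), n, replace = TRUE))
# Right-skewed (log-normal) hypothetical wages
wage <- exp(rnorm(n, mean = log(100) + 0.1 * (jobclass == "Information"), sd = 0.35))
df <- data.frame(wage, jobclass)

# Boxplot of wage by job class; drawn only when running interactively,
# otherwise just the statistics are computed
bp <- boxplot(wage ~ jobclass, data = df, plot = interactive())
print(bp$stats)   # rows: lower whisker, Q1, median, Q3, upper whisker

# Normality check: Shapiro-Wilk test (R accepts 3 to 5000 observations)
sw <- shapiro.test(df$wage)
print(sw)

# Visual normality check: normal Q-Q plot, interactive sessions only
if (interactive()) {
  qqnorm(df$wage)
  qqline(df$wage)
}
```

A small p-value from `shapiro.test()` rejects normality; with thousands of rows even mild departures get flagged, which is why the Q-Q plot is a useful companion to the test.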