NLP: Sentiment Analysis with PyTorch

In this work we build a sentiment analysis model, a BERT-GRU architecture trained on Tripadvisor data, to predict whether an opinion is positive or negative. BERT (Bidirectional Encoder Representations from Transformers) is a pretrained transformer-based model that takes the context of each word into account. A GRU layer is used here instead of an LSTM.
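To make the GRU-versus-LSTM remark concrete, here is a minimal sketch of a single GRU step on scalars, written in plain Python rather than PyTorch. A GRU keeps only one hidden state and two gates (update and reset), whereas an LSTM carries an extra cell state and three gates. The weights in `PARAMS` and the input sequence are made-up values for illustration only; in a BERT-GRU model the inputs would presumably be BERT's contextual embeddings and the weights would be learned.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x: float, h: float, p: dict) -> float:
    """One scalar GRU step, using the same gate convention as torch.nn.GRU."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])          # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])          # reset gate
    n = math.tanh(p["w_n"] * x + r * (p["u_n"] * h) + p["b_n"])  # candidate state
    # Blend the candidate with the previous state; z close to 1 keeps the old state.
    return (1.0 - z) * n + z * h


def run_gru(xs, p, h0=0.0):
    """Feed a sequence through the cell and return the final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h


# Hypothetical weights and inputs, chosen only to make the example run.
PARAMS = {
    "w_z": 0.5, "u_z": -0.3, "b_z": 0.0,
    "w_r": 0.8, "u_r": 0.1, "b_r": 0.0,
    "w_n": 1.2, "u_n": 0.7, "b_n": 0.0,
}
final_state = run_gru([0.2, -0.5, 1.0], PARAMS)
print(final_state)
```

In a classifier, the final hidden state would typically be passed to a linear layer that outputs the positive/negative score.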

Continue reading

NLP: Target and aspect detection with Python

In this post we perform target and aspect detection on a dataset of Tripadvisor opinions. The target (or topic) is what an opinion is about; aspects are parts or features of the target. We explore target detection using word embeddings (Word2Vec), which find similar words by context, and try to extract aspects of the target by searching for related words using the WordNet synsets. First, we perform data preprocessing by removing stopwords

Continue reading

Wine dataset analysis with Python

In this post we explore the wine dataset. First, we perform descriptive and exploratory data analysis. Next, we run dimensionality reduction with the PCA and t-SNE algorithms to see how they behave. Finally, we implement a random forest classifier and compare different parameter values to see how they affect the classifier's results.
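As a rough illustration of what PCA measures, the sketch below computes the share of variance captured by each principal component for two-feature data, using only the standard library. It is not the post's code: the function name `explained_variance_2d` and the sample values are hypothetical, and a real analysis would more likely rely on a library such as scikit-learn (`sklearn.decomposition.PCA`).

```python
import math
from statistics import mean


def explained_variance_2d(xs, ys):
    """Explained-variance ratios of the two principal components of 2-D data."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    total = lam1 + lam2
    return lam1 / total, lam2 / total


# Made-up measurements for two correlated features (not taken from the wine data).
feature_a = [12.1, 13.4, 12.9, 14.2, 13.8, 12.5]
feature_b = [2.3, 2.9, 2.6, 3.4, 3.1, 2.4]
first, second = explained_variance_2d(feature_a, feature_b)
print(f"PC1: {first:.3f}  PC2: {second:.3f}")
```

When the first ratio is close to 1, a single component preserves most of the information, which is the situation in which reducing dimensionality costs little.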

Continue reading

Analysis of Variance (ANOVA) with R

In this post we perform an analysis of variance (ANOVA) with R to analyze the influence of variables such as race, education level, or job class on the wage. The data is the same as in the post Descriptive Analysis with R, so you can visit that post for more detail about the data used. Let’s start the analysis. Discussion: By means of ANOVA we have
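For readers unfamiliar with what ANOVA computes, here is a minimal sketch of the one-way F statistic: the ratio of the variance between group means to the variance within groups. It is written in plain Python purely for illustration (the post itself works in R, where `aov` performs this analysis), and the three groups are toy numbers, not values from the wage data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three hypothetical groups of three observations each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with the returned degrees of freedom suggests that at least one group mean differs from the others.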

Continue reading

Linear and logistic regression with R

This post applies linear and logistic regression to provided data containing health parameters of 2353 patients who underwent surgery. We try to discover the relationships among some of the parameters and to predict the probability of suffering an infection during the surgery. Discussion: According to the results obtained, when studied separately, all the variables have an influence on the probability of suffering a post-surgical infection (diabetes, malnutrition,

Continue reading

Inferential analysis in R

This post continues the analysis of the Mid-Atlantic Wage dataset by performing some inferential statistics in R. The data used is the same as in the post about Descriptive Analysis with R. We start by reading the dataset and performing the transformations described in that article:

Continue reading

Descriptive analysis in R

This post shows a simple descriptive statistical analysis of the Mid-Atlantic Wage Data, with some boxplots and a check for data normality. The dataset can be found here: https://github.com/selva86/datasets/blob/master/Wage.csv The fields in the data are the following: year: year when the data was collected. maritl: marital status: 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated. age: worker’s age. race: 1. White, 2. Black, 3. Asian, and 4. Other. education: Education level:

Continue reading