Comparing Data Augmentation Techniques to Deal with an Unbalanced Dataset (Pollution Levels Predictions)

Predicting NO2 levels in Madrid

While looking for data to develop my data science skills, I came up with the idea of searching open data portals. I wanted to look at actual datasets and find out what they were like. For this purpose, I chose open data from the Madrid Open Data Portal (https://datos.madrid.es/portal/site/egob). I will try to predict NO2 concentration using weather and traffic data. This is not meant to be a definitive prediction

Continue reading

Wine dataset analysis with Python

In this post we explore the wine dataset. First, we perform descriptive and exploratory data analysis. Next, we apply dimensionality reduction with the PCA and t-SNE algorithms to see how they behave on this data. Finally, a random forest classifier is implemented, comparing different parameter values to see how they affect the classifier's results.

Continue reading