Data Science Portfolio – My data science projects.

September 24, 2025 Uncategorized

Full Project RAG (Retrieval-Augmented Generation) – IV

The Reranker and the Power of the Prompt

Welcome to the fourth installment of our full RAG (Retrieval-Augmented Generation) project. If you’ve followed the series so far, you already know how to prepare documents, build the index, and retrieve relevant results using multiple strategies. Now it’s time to go one level deeper: filtering, refining, and shaping the final answer. This is where one of the most important pieces of the RAG puzzle comes into play:

Continue reading
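The excerpt stops before the reranker itself is described, so this is only a minimal, hypothetical sketch of the general idea: score each retrieved passage against the query, keep the best few, and place them in a grounded prompt. `rerank`, `overlap_score` and `build_prompt` are names I'm assuming for illustration, not code from the post, and `overlap_score` is a toy stand-in for a real scoring model such as a cross-encoder.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score (assumed stand-in for a cross-encoder):
    fraction of the query's terms that also appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, docs: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair and keep the top_k highest-scoring docs."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, ranked: List[Tuple[str, float]]) -> str:
    """Number the surviving passages and ask the LLM to answer only from them."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, (doc, _) in enumerate(ranked))
    return ("Answer the question using only the context below. "
            "If the answer is not in the context, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


# Hypothetical retrieved passages for a hypothetical question.
docs = [
    "Grants for SMEs in Spain are listed by region.",
    "The weather in Madrid is sunny.",
    "SMEs can apply for digitalization grants online.",
]
query = "digitalization grants for SMEs"
ranked = rerank(query, docs, overlap_score, top_k=2)
prompt = build_prompt(query, ranked)
print(prompt)
```

Because the scorer is passed in as a function, swapping the toy scorer for a model-based one leaves the rest of the pipeline unchanged.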

September 20, 2025 Retrieval augmented generation

Full project: RAG (Retrieval-Augmented Generation) III

Combining Self-Query and MMR Retrievers in RAG Pipelines: A Practical Guide

In Retrieval-Augmented Generation (RAG) pipelines, the retriever plays a central role. Before the LLM can generate answers, it needs relevant information, and retrievers are the components in charge of finding it. Whether they pull from a vector database, a search index, or a hybrid of both, retrievers define what information the model can see. In this post, we walk through a Python implementation using

Continue reading
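MMR (Maximal Marginal Relevance) picks results that are relevant to the query but not redundant with each other. Below is a minimal NumPy sketch of the greedy selection rule, assuming the query and documents are already embedded as vectors; the function names, the `lambda_mult` weight and the toy vectors are illustrative assumptions, not the post's implementation, and the self-query part is not covered.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def mmr(query_vec: np.ndarray, doc_vecs: np.ndarray,
        k: int = 2, lambda_mult: float = 0.5) -> list:
    """Greedy MMR: at each step pick the candidate d maximizing
    lambda * sim(q, d) - (1 - lambda) * max_{s in selected} sim(d, s)."""
    query_sim = cosine_sim(query_vec[None, :], doc_vecs)[0]  # relevance to the query
    doc_sim = cosine_sim(doc_vecs, doc_vecs)                 # redundancy between docs
    selected = [int(np.argmax(query_sim))]                   # start with the most relevant doc
    candidates = [i for i in range(len(doc_vecs)) if i != selected[0]]
    while len(selected) < k and candidates:
        redundancy = doc_sim[np.ix_(candidates, selected)].max(axis=1)
        scores = lambda_mult * query_sim[candidates] - (1 - lambda_mult) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-D "embeddings": docs 0 and 1 are near-duplicates, doc 2 is different.
query = np.array([1.0, 0.3])
docs = np.array([[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]])
selected = mmr(query, docs, k=2, lambda_mult=0.5)
sims = cosine_sim(query[None, :], docs)[0]
print("MMR:", selected, "plain top-2:", list(np.argsort(-sims)[:2]))
```

With these toy vectors, MMR returns `[1, 2]`, while ranking by similarity alone returns the near-duplicates `[1, 0]`. Lowering `lambda_mult` pushes the selection further toward diversity.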

August 30, 2025 Retrieval augmented generation

Full project: RAG (Retrieval-Augmented Generation) II

Optimizing Document Retrieval for RAG Systems: Enhancing the Search Process for SMEs Using Metadata and PGVector

For small and medium-sized businesses (SMEs) in Spain, accessing up-to-date, trustworthy information on government grants and financial support is essential. One effective way to help SMEs navigate this complex landscape is a Retrieval-Augmented Generation (RAG) system. In this project, I used a specialized resource, the Plataforma Pyme guide to government grants, as the source for a RAG system

Continue reading
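As a rough, in-memory illustration of metadata-filtered retrieval (this is not PGVector itself), the sketch below first keeps only the chunks whose metadata match a filter and then ranks the survivors by cosine similarity. The `Chunk` class, the `region` field and the two-dimensional embeddings are hypothetical, chosen only to keep the example small.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    text: str
    embedding: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)  # hypothetical metadata fields


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_emb: List[float], chunks: List[Chunk],
                    filters: Dict[str, str], k: int = 2) -> List[Chunk]:
    # 1) Metadata filter: keep chunks matching every key/value pair in `filters`.
    candidates = [c for c in chunks
                  if all(c.metadata.get(key) == value for key, value in filters.items())]
    # 2) Rank the remaining chunks by similarity to the query embedding.
    candidates.sort(key=lambda c: cosine(query_emb, c.embedding), reverse=True)
    return candidates[:k]


chunks = [
    Chunk("Grant A", [1.0, 0.0], {"region": "Madrid"}),
    Chunk("Grant B", [0.9, 0.1], {"region": "Andalucía"}),  # similar, but filtered out
    Chunk("Grant C", [0.5, 0.5], {"region": "Madrid"}),
]
results = filtered_search([1.0, 0.0], chunks, {"region": "Madrid"})
print([c.text for c in results])
```

In PGVector the same idea is usually expressed as a single SQL query, with a `WHERE` condition on the metadata column and an `ORDER BY` on vector distance, so the database filters and ranks in one step.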

April 8, 2025 Retrieval augmented generation

Full project: RAG (Retrieval-Augmented Generation) I

Retrieval-Augmented Generation (RAG) harnesses the capabilities of LLMs (Large Language Models) to provide an effective way of accessing external information. The LLM interprets the user’s question and uses the contextual details in the supplied external documents to formulate its answer. This process bridges the gap between the vast general knowledge of an LLM and the specific, often niche, information contained in a user’s own documents. The aim is to create a

Continue reading
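To make the retrieval half of RAG concrete, here is a minimal, self-contained sketch: split a document into overlapping word chunks, then return the chunk most similar to the question. The bag-of-words "embedding" is a deliberate simplification standing in for a real embedding model, and the chunk size, overlap and sample text are arbitrary assumptions.

```python
import math
from collections import Counter
from typing import List


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    assert 0 <= overlap < size
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def bow(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = bow(question)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]


text = ("the plataforma pyme guide lists public grants for small businesses "
        "while the annual report covers sales figures and staff numbers")
chunks = chunk_words(text)
print(retrieve("sales figures in the annual report", chunks))
```

The retrieved chunk is what would then be inserted into the LLM’s prompt as context.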

February 25, 2022 NLP / Recommender Systems

Using NLP to Create a Recommender System

In the article Using Scikit-Surprise to Create a Simple Recipe Collaborative Filtering Recommender System, we built a very simple recommender system with the scikit-surprise package and saw how to use its built-in algorithms, such as KNN and SVD. Now I’d like to take my recommender-systems practice a step further and create my own prediction algorithm. Surprise allows you to override its core classes and methods in order to tailor your own algorithm and try to improve

Continue reading
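The excerpt doesn’t show the algorithm itself, so this is only a hedged sketch of one way text can drive rating predictions: estimate a user’s rating for an item as the similarity-weighted average of their ratings on other items, where similarity is the cosine between bag-of-words representations of the item descriptions (for recipes, for example, ingredient lists). In Surprise, logic of this kind would go in the `estimate` method of a custom `AlgoBase` subclass; here it is a plain function so it runs without the library. All names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def text_cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_rating(user_ratings: Dict[str, float], item_texts: Dict[str, str],
                   target_item: str) -> float:
    """Similarity-weighted average of the user's ratings on other items.
    Falls back to the user's mean rating when no rated item shares any words."""
    target_vec = bag_of_words(item_texts[target_item])
    num = den = 0.0
    for item, rating in user_ratings.items():
        sim = text_cosine(target_vec, bag_of_words(item_texts[item]))
        num += sim * rating
        den += sim  # bag-of-words cosines are never negative
    if den == 0.0:
        return sum(user_ratings.values()) / len(user_ratings)
    return num / den


# Hypothetical recipes described by their ingredients.
item_texts = {
    "r1": "tomato basil pasta",
    "r2": "chocolate cake sugar",
    "r3": "tomato pasta garlic",
    "r4": "lemon sorbet",
}
user_ratings = {"r1": 5.0, "r2": 1.0}
pred = predict_rating(user_ratings, item_texts, "r3")      # shares words only with r1
fallback = predict_rating(user_ratings, item_texts, "r4")  # shares no words: user mean
print(pred, fallback)
```

The user-mean fallback is one simple choice for items with no textual overlap; a global mean or a baseline estimate would work as well.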

February 8, 2022 Recommender Systems

Using Scikit-Surprise to Create a Simple Recipe Collaborative Filtering Recommender System.

Companies all over the world are increasingly using recommender systems. Online stores, streaming services, and social networks use these algorithms to recommend items to users based on their previous behavior (items they have consumed or searched for). There are several approaches to building a recommender system. We can base it on the content of the items, so that the system recommends items similar to the ones the user usually likes (Content-Based recommender

Continue reading
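On the collaborative-filtering side, here is a minimal user-based KNN sketch in NumPy. Among the users who rated the item, it finds the k most similar to the target user (by cosine similarity over their rating vectors) and averages their ratings, weighted by similarity. Treating unrated cells as 0 is a simplifying assumption; scikit-surprise’s KNN algorithms compute similarities over co-rated items instead. The rating matrix is made-up toy data.

```python
import numpy as np


def predict_user_based(R: np.ndarray, user: int, item: int, k: int = 2) -> float:
    """Predict R[user, item] from the k most similar users who rated the item.
    R is a users x items matrix where 0 means 'not rated' (simplifying assumption)."""
    candidates = np.flatnonzero(R[:, item] > 0)   # users who rated the item
    candidates = candidates[candidates != user]
    user_mean = float(R[user][R[user] > 0].mean())
    if candidates.size == 0:
        return user_mean
    norms = np.linalg.norm(R, axis=1)
    sims = (R[candidates] @ R[user]) / (norms[candidates] * norms[user])
    order = np.argsort(-sims)[:k]                 # k nearest neighbours
    neighbours, weights = candidates[order], sims[order]
    if weights.sum() <= 0:
        return user_mean
    return float(weights @ R[neighbours, item] / weights.sum())


# Toy ratings: rows are users, columns are items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],   # user 0: item 2 unknown
    [5, 5, 4, 1],   # similar taste to user 0
    [1, 1, 2, 5],   # opposite taste
    [4, 4, 5, 0],   # fairly similar taste
], dtype=float)
pred = predict_user_based(R, user=0, item=2, k=2)
print(round(pred, 2))
```

User 0 is closest to users 1 and 3, who rated item 2 with 4 and 5, so the prediction lands between those values (about 4.46).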

October 7, 2021 Time Series

Forecasting Time Series with Auto-Arima

In this article, I compare the results of the auto-ARIMA function with the ARIMA model we developed in the article Forecasting Time Series with ARIMA (https://www.alldatascience.com/time-series/forecasting-time-series-with-arima/). I wanted to see how it works and how the results differ. The parameters selected by auto-ARIMA are slightly different from the ones I selected in the other article. Auto-ARIMA has the advantage of attempting to find the best ARIMA parameters by comparing the

Continue reading
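To illustrate the kind of automatic order selection auto-ARIMA performs, here is a deliberately simplified sketch. It fits only pure autoregressive AR(p) models by least squares and picks the order with the lowest AIC (Gaussian form, up to a constant). Real auto-ARIMA implementations also search over the differencing order and MA terms and use proper maximum-likelihood fits. The simulated series and `max_p=5` are assumptions for the example.

```python
import numpy as np


def ar_aic(y: np.ndarray, p: int, max_p: int) -> float:
    """Fit AR(p) with an intercept by OLS on y[max_p:] and return its AIC.
    Every order is fitted on the same observations so the AICs are comparable."""
    n = len(y) - max_p
    target = y[max_p:]
    # Column for lag L holds y[t - L] aligned with target y[t].
    cols = [np.ones(n)] + [y[max_p - lag: len(y) - lag] for lag in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    n_params = p + 2  # AR coefficients + intercept + noise variance
    return n * np.log(rss / n) + 2 * n_params


def select_ar_order(y: np.ndarray, max_p: int = 5):
    aics = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated stationary AR(2) series: y_t = 0.6 y_{t-1} - 0.5 y_{t-2} + noise.
rng = np.random.default_rng(0)
n_obs = 400
eps = rng.normal(size=n_obs)
y = np.zeros(n_obs)
for t in range(2, n_obs):
    y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 2] + eps[t]

best_p, aics = select_ar_order(y, max_p=5)
print("selected order:", best_p)
```

On this simulated AR(2) series, the strong lag-2 term makes orders 0 and 1 fit much worse, so the selected order should be at least 2. AIC can occasionally prefer a slightly larger order than the true one.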

September 14, 2021 Time Series

Forecasting time series with ARIMA

In this post, I’ll show how to forecast time series data using ARIMA (autoregressive integrated moving average). As usual, I practice with «real-world» data, which can easily be obtained by downloading open data from government websites. I chose the unemployment rate of the European Union’s 27 member countries. The data were obtained from the OECD data portal (https://dataportal.oecd.org/). First of all, I’m going to clean up the data, in this

Continue reading
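The "I" (integrated) in ARIMA means the model differences the series d times, fits the ARMA part on the differenced values, and then undoes the differencing to return forecasts on the original scale. Here is a small NumPy sketch of both directions. The series is made up for the example and is not the OECD data.

```python
import numpy as np


def difference(series, d: int = 1) -> np.ndarray:
    """Apply d rounds of first differencing: x_t - x_{t-1}."""
    out = np.asarray(series, dtype=float)
    for _ in range(d):
        out = np.diff(out)
    return out


def undifference(diffs, initial_values) -> np.ndarray:
    """Invert differencing. initial_values[j] must be the first value of the
    series after j rounds of differencing (j = 0 is the original series)."""
    out = np.asarray(diffs, dtype=float)
    for first in reversed(initial_values):
        # Cumulative sum rebuilds the previous level, anchored at its first value.
        out = np.concatenate(([first], first + np.cumsum(out)))
    return out


x = np.array([7.0, 7.2, 7.5, 7.3, 7.1, 6.9])  # made-up series
d1 = difference(x, 1)
d2 = difference(x, 2)
restored_1 = undifference(d1, [x[0]])
restored_2 = undifference(d2, [x[0], d1[0]])
print(d1, restored_2)
```

Forecasts produced on the differenced scale can be appended to `d1` before calling `undifference`, which turns them back into levels.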

August 9, 2021 Classification

Comparing Data Augmentation Techniques to Deal with an Unbalanced Dataset (Pollution Levels Predictions)

Predicting NO2 levels in Madrid

While looking for data to develop my data science skills, I came up with the idea of searching open data portals. I wanted to look at actual datasets and find out what they were like. For this purpose, I chose open data from the Madrid Open Data Portal (https://datos.madrid.es/portal/site/egob). I will try to predict NO2 concentration using weather and traffic data. This is not meant to be a definitive prediction

Continue reading
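The excerpt doesn’t list which augmentation techniques were compared. As one example from this family of methods, here is a simplified SMOTE-style oversampler: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a teaching sketch that assumes numeric features and Euclidean distance. The `imbalanced-learn` package provides a full `SMOTE` implementation.

```python
import numpy as np


def smote_like(X_min: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and a random one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances inside the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))             # random minority sample
        j = neighbours[i, rng.integers(k)]       # one of its nearest neighbours
        gap = rng.random()                       # position along the segment, in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic


# Toy minority class (e.g. rare high-pollution episodes), 2 numeric features.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like(X_min, n_new=10, k=2)
X_min_augmented = np.vstack([X_min, synthetic])
print(X_min_augmented.shape)
```

The synthetic rows take the minority label. Oversampling should be applied to the training split only, so that synthetic points never leak into evaluation.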

February 18, 2021 Deep Learning

Deep Learning: COVID-19 detection in X-Ray with CNN

In this project, we develop a deep learning detector of Covid-19 in chest radiographs. For this purpose, we use images from the «Covid-chestxray-dataset» [3], generated by researchers from the Mila research group and the University of Montreal [4]. We also use radiographs of healthy patients and of patients with bacterial pneumonia, taken from Kaggle’s «Chest X-Ray Images (Pneumonia)» competition [5]. In total, we have 426 images, divided into training (339 images), validation (42 images)

Continue reading
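The excerpt doesn’t describe the network architecture. As a hedged illustration of what a CNN computes, the sketch below implements three basic building blocks (a "valid" 2-D convolution, ReLU and 2x2 max pooling) in NumPy on a toy 6x6 "image" containing a vertical edge. A real detector would be built with a deep learning framework and would learn its kernel values during training. The hand-set edge-detecting kernel here is an assumption for the example.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, the operation a conv layer applies per channel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


# Toy image: dark left half, bright right half (a vertical edge).
img = np.zeros((6, 6))
img[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])  # responds to dark-to-bright transitions

fmap = relu(conv2d_valid(img, kernel))  # 5x5 feature map, strong response at the edge
pooled = max_pool2x2(fmap)              # 2x2 summary that keeps the edge response
print(pooled)
```

Stacking many such learned filters, nonlinearities and pooling stages, followed by a classifier head, is what lets a CNN pick up patterns in radiographs.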