Extracting keywords from text

Keywords are used in several fields to summarize and categorize natural text, such as papers, essays, news articles or blog posts. Keywords, or tags, allow quick and easy categorization of available information, making it easier to access and search. However, while many existing texts have been assigned keywords, either by their authors or by third parties, the vast majority of the information out there is currently untagged and, in many cases, virtually inaccessible. This raises the question: how difficult is it to develop a system that performs automatic and reliable tagging of written documents?

Style over Matter

I always wanted to be a great painter, able to beautifully mix colors and shapes and create works of art that would mesmerize people. But I was never good enough, so I am writing this post instead. One of the many things about art, and paintings in particular, that has always fascinated me is the striking balance between style and substance, technique and content, that talented artists seem to achieve so seamlessly in their work. One may, in fact, be able to replicate objects or shapes exactly, but it is the manner in which this is done that ultimately makes the difference between a nice picture and a great work of art. But while it is fairly easy to tell a Monet from a Raffaello, it is not trivial to identify the line separating the “style” from the “content” of a picture or image. Well, until this paper came about, showing that style and content are, in fact, separable, and illustrating the procedure to tell a machine to do it for you.

Exploring Recommender Systems

Recommender systems are ubiquitous nowadays: they exploit patterns in people’s preferences and tastes to provide personalized recommendations to users. Collaborative Filtering Recommenders (CFR) are possibly the most common and powerful of these engines and are widely used in a variety of domains.

Serving multiple Jekyll-GitHub sites on a custom domain

Here I share my experience of hosting two Jekyll-powered websites on GitHub (using their GitHub Pages service) and serving them from one custom domain. While there is a plethora of tutorials and posts out there that show how to do this, everyone’s needs are personal and ultimately different and, if your search has brought you as far as this post, perhaps you have not found the right one for you yet.

Analysis of retractions in peer-reviewed journals

In this post, I look at a problem that has become more and more common in academia in recent years: the body of papers that, whether through misconduct, negligence, or simple human error, present inaccurate results and are ultimately retracted.

Machine learning from Hurricane Sandy

In this post, we will look at how machine learning can be used to identify areas that are vulnerable to natural disasters. The application is specifically tailored to analyze the damage patterns caused by Hurricane Sandy when it hit the East Coast in the fall of 2012.

Downloading datasets from Kaggle using Python

In this brief post, I will outline a simple procedure to automate the download of datasets from Kaggle. The script may be useful when one wants to run a model on a remote machine (e.g. an AWS instance) and does not want to spend time moving files between local and remote machines.
