Semantic Pill 16



  • Weather Signal: Big Data Meets Forecasting, a Scientific American blog post, discusses the impact of Big Data on forecasting in areas ranging from weather to health, and raises the following controversial point: The philosophy of Big Data is that insights can be drawn from a large volume of ‘dirty’ (or ‘noisy’) data, rather than simply relying on a small number of precise observations – a subject covered in detail by Viktor Mayer-Schönberger and Kenneth Cukier in their recent book ‘Big Data’. One good example of the success of the ‘Big Data’ approach can be seen in Google’s Flu Trends, which uses Google searches to track the spread of flu outbreaks worldwide. It is also important to remember that Big Data, when used on its own, can only provide probabilistic insights based on correlation. The true benefit of Big Data is that it drives correlative insights, which are achieved through the comparison of independent datasets. It is this that buttresses the Big Data philosophy of ‘more data is better data’; you do not necessarily know what use the data you are collecting will have until you can investigate and compare it with other datasets.


  • Mike2.0 is an open-source, collaborative private initiative that aims to build and lead an Information Management community;
  • MINE (Maximal Information-based Nonparametric Exploration) deals with the visualization of datasets made up of “pairs” of variables plotted on a Cartesian X-Y map. To “see more and better” in these maps, you first need to know MIC: the Maximal Information Coefficient measures the strength of linear or nonlinear associations between X and Y. MIC belongs to a class of statistics used experimentally for Detecting Novel Associations in Large Data Sets (Jun 2012):


Imagine a dataset with hundreds of variables, which may contain important, undiscovered relationships. There are tens of thousands of variable pairs—far too many to examine manually. If you do not already know what kinds of relationships to search for, how do you efficiently identify the important ones? Datasets of this size are increasingly common in fields as varied as genomics, physics, political science, and economics, making this question an important and growing challenge. One way to begin exploring a large dataset is to search for pairs of variables that are closely associated. To do this, we could calculate some measure of dependence for each pair, rank the pairs by their scores, and examine the top-scoring pairs. For this strategy to work, the statistic we use to measure dependence should have two heuristic properties: generality and equitability.
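
The "score every pair, rank, inspect the top" strategy described in the excerpt can be sketched in a few lines of Python. This is only an illustration and not the MINE software: instead of MIC it uses a much simpler stand-in, the mutual information of the two variables on a fixed 4x4 equal-frequency grid, normalized to the range 0 to 1. MIC itself searches over many grid resolutions, which this sketch does not attempt. The function names, the toy columns, and the choice of four bins are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def binned_mutual_information(xs, ys, n_bins=4):
    """Mutual information on a fixed grid, divided by log(n_bins).

    A simplified stand-in for MIC (assumption): one fixed grid only.
    """
    bx = equal_frequency_bins(xs, n_bins)
    by = equal_frequency_bins(ys, n_bins)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi / math.log(n_bins)


def rank_pairs(columns, n_bins=4):
    """Score every pair of columns and return them sorted, strongest first."""
    scores = [
        (binned_mutual_information(columns[a], columns[b], n_bins), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy dataset: one linear and one non-linear relationship.
x = list(range(40))
columns = {
    "x": x,
    "linear": [2 * v + 1 for v in x],
    "parabola": [(v - 19.5) ** 2 for v in x],
}

ranked = rank_pairs(columns)
for score, a, b in ranked:
    print(f"{a:>8} ~ {b:<8} score = {score:.2f}")
```

On this toy data the linear pair scores highest, and the parabola still receives a clearly non-zero score even though its linear (Pearson) correlation with x is zero by symmetry. That is the intuition behind "generality": a useful dependence measure should not be limited to one kind of relationship.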



 Source: Scientific American, Smartphone Weather Signal Dashboard