Semantic Pill 25



  • TDA (Topological Data Analysis) deserves careful study, at least at a conceptual level if mathematics is not your strength, because it is fundamental to Data Mining, Visualization and Semantics, and is now embedded in Big Data:

The main problems are: 1. how one infers high-dimensional structure from low-dimensional representations; and 2. how one assembles discrete points into global structure. The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points. The main method used by topological data analysis consists of three steps: a) Replace a set of data points with a family of simplicial complexes, indexed by a proximity parameter; b) Analyze these topological complexes via algebraic topology, specifically via the theory of persistent homology; c) Encode the persistent homology of a data set in the form of a parameterized version of a Betti number, which is called a barcode.
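The three steps above can be sketched for the simplest case, connected components (Betti-0). This is only a minimal illustration, not a full persistent homology computation: it assumes a Vietoris-Rips style filtration in which two points are joined as soon as their distance is at most the proximity parameter, and the function names `h0_barcode` and `betti0` are our own, not taken from any library.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Return (birth, death) bars for the connected components of `points`.

    Assumption: an edge appears when the distance between two points
    is <= the proximity parameter (Vietoris-Rips style filtration).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step a: the family of complexes is indexed by the sorted edge lengths.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    # Step b: track when components merge as the parameter grows.
    bars = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, eps))  # one component dies at this scale

    # Step c: the barcode; the last surviving component never dies.
    bars.append((0.0, inf))
    return bars


def betti0(bars, eps):
    """Number of components alive at proximity parameter `eps`."""
    return sum(1 for birth, death in bars if birth <= eps < death)


if __name__ == "__main__":
    cloud = [(0, 0), (1, 0), (10, 0), (11, 0)]  # two well separated pairs
    bars = h0_barcode(cloud)
    print(bars)
    print(betti0(bars, 5.0))
```

Long bars in the output correspond to structure that persists over many scales (here, the two clusters), while short bars are treated as noise.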



  • “To see more and better”: searched as an exact phrase, this returns 1,340,000 results on Google, where it appears as a kind of “meme” or goal of research and innovation. It is also the “motto” of our Darwin Methodology: to build tools, like the Darwin Semantic Glasses, to “see more and better” on the Web.
  • Tuples: a tuple is an ordered list of elements, and a Tuple Space is a space of tuples to be used sometime, somewhere and somehow:


A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider a group of processors that produce pieces of data and a group of processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that matches a certain pattern.
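The producer/consumer pattern described above can be sketched in a few lines. This is a simplified, single-process illustration under our own assumptions: the class `TupleSpace`, its methods `out` and `take`, and the wildcard `ANY` are hypothetical names chosen for this example, and a real tuple space would be distributed across machines.

```python
import threading

ANY = object()  # wildcard: matches any value in that position


class TupleSpace:
    """A tiny in-memory tuple space shared by threads."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Producer side: post a tuple into the space."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()  # wake consumers waiting for a match

    @staticmethod
    def _matches(pattern, tup):
        return len(pattern) == len(tup) and all(
            p is ANY or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        return next(
            (t for t in self._tuples if self._matches(pattern, t)), None
        )

    def take(self, pattern, timeout=None):
        """Consumer side: remove and return the first matching tuple.

        Blocks until a match is posted; returns None if `timeout`
        (in seconds) expires first.
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


if __name__ == "__main__":
    space = TupleSpace()
    space.out(("temp", "sensor1", 21.5))
    space.out(("temp", "sensor2", 19.0))
    print(space.take(("temp", "sensor2", ANY)))
```

The consumer never names the producer: it only states the pattern it is interested in, which is what makes the memory "associative".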



  • Vector Processing refers to processing whole vectors at a time instead of processing single data items, or “scalars”, one at a time. The technique can be used not only as an architecture for building supercomputers but also as a programming approach. Our Darwin Methodology processes “by textons”, which resemble vectors of Web documents;
  • Watkins Q-Learning Algorithm points to Watkins's thesis (1989), Learning from Delayed Rewards, a crucial work of 220 pages. The thesis addresses a key question that behavioral scientists ask themselves: how might animals learn optimal policies from their experience? And, going a little deeper: is it possible to give a systematic analysis of possible computational methods of learning efficient behavior?
  • Weather Forecast has been reviewed in previous pills; however, we suggest reading Big Data Reshapes Weather Channel Predictions, an article about The Weather Company from

 "Weather is the original big data application," says Bryson Koehler, executive VP and CIO at the Weather Company. "When mainframes first came about, one of the first applications was a weather forecasting model."


Flash forward to today and the Weather Company ingests some 20 terabytes of data per day to spin out what Koehler bills as the world's most accurate forecasts. To stay ahead of its competition, the Weather Company is in the process of rolling out a new platform built on Basho's Riak NoSQL database and running globally in the Amazon Web Services (AWS) cloud.
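Returning to the Watkins Q-Learning entry above, the idea of learning an optimal policy from delayed rewards can be illustrated with a small tabular sketch. Everything specific here is our own assumption for illustration, not taken from the thesis: a toy corridor of five states with a reward only at the right end, the function name `q_learning`, and the parameter values. The update line is the standard tabular Q-learning rule.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States 0..n_states-1; actions 0 = left, 1 = right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy choice, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([k for k in (0, 1) if q[s][k] == best])

            s2 = min(max(s + moves[a], 0), goal)  # walls at both ends
            r = 1.0 if s2 == goal else 0.0        # delayed reward

            # Q-learning update: move Q(s, a) toward
            # r + gamma * max over a' of Q(s2, a').
            # q[goal] is never updated, so it stays 0 (terminal state).
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

After training, the value of moving right exceeds the value of moving left in every non-terminal state, so the greedy policy walks straight to the reward even though the reward is only seen at the very end.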


Source: DARPA Topological Data Analysis, from Big Data, Wikipedia




Additional information