17.07.2015: “Big Data — Small Devices”, Prof. Dr. Katharina Morik (TU Dortmund)

17 July 2015

Prof. Dr. Katharina Morik (TU Dortmund) gave a talk titled “Big Data – Small Devices”.

Abstract:
How can we learn from the data of small ubiquitous systems? Do we need to send the data to a server or cloud and do all the learning there? Or can we learn directly on some small devices? How complex can learning be allowed to be in times of big data? What about graphical models? Can they be applied on small devices, or even learned on restricted processors?
Big data are produced by various sources. Most often, they are stored in a distributed fashion at computing farms or in clouds. Analytics on the Hadoop Distributed File System (HDFS) then follows the MapReduce programming model. According to the Lambda architecture of Nathan Marz and James Warren, this is the batch layer. It is complemented by the speed layer, which aggregates and integrates incoming data streams in real time. When considering big data and small devices, we naturally imagine the small devices hosting only the speed layer. Analytics on the small devices is restricted by memory and computation resources.
The interplay of streaming and batch analytics offers a multitude of configurations. The collaborative research center SFB 876 investigates data analytics for and on small devices regarding runtime, memory and energy consumption. In this talk, we investigate graphical models, which generate the probabilities for connected (sensor) nodes.
• First, we present spatio-temporal random fields that take data from small devices as input, are computed at a server, and send results to (possibly different) small devices.
• Second, we go even further: the Integer Markov Random Field approximates the likelihood estimates such that it can be computed on small devices.