User-Friendly Distributed Frameworks for Processing Big Data


February 16, 2018 | 2:30 - 3:30 p.m.


Campbell Hall 443


Da Yan, Department of Computer Science, UAB


Big Data frameworks emphasize two aspects: "programming simplicity" and "efficiency". The aim is to let users write a distributed algorithm in just a few lines of code, while the underlying execution engine fully utilizes the hardware (CPUs, disks, and the network) of a cluster. Examples include Google's MapReduce, Pregel, and TensorFlow. This talk introduces three such frameworks developed in my group: (1) a library for the distributed sorting of generic data using the TeraSort algorithm; (2) a framework for data-intensive graph analytics (e.g., computing PageRank values and connected components) where users "think like a vertex" when writing programs; and (3) a framework for compute-intensive graph analytics (e.g., community detection and subgraph matching) where users "think like a subgraph". A demo of each framework will be given to illustrate its efficiency.