User-Friendly Distributed Frameworks for Processing Big Data


February 16, 2018 | 2:30 - 3:30 p.m.


Campbell Hall 443


Da Yan, Department of Computer Science, UAB


Big Data frameworks emphasize two aspects: "programming simplicity" and "efficiency". The aim is to let users write a distributed algorithm in just a few lines of code while the underlying execution engine fully utilizes the hardware (CPUs, disks and the network) of a cluster. Examples include Google's MapReduce, Pregel and TensorFlow. This talk introduces three such frameworks developed in my group: (1) a library for the distributed sorting of generic data using the TeraSort algorithm; (2) a framework for data-intensive graph analytics (e.g., computing PageRank and connected components), where users think like a vertex when writing programs; and (3) a framework for compute-intensive graph analytics (e.g., community detection and subgraph matching), where users think like a subgraph. A demo of each framework will be given to illustrate its efficiency.
