Files in This Item:
File: b1432223.mp4
Format: Streaming Video
Title: Making Sense of Big Data with the Berkeley Data Analytics Stack
Originating Office: IAS
Speaker: Franklin, Michael J
Issue Date: 25-Nov-2014
Event Date: 25-Nov-2014
Group/Series/Folder: Record Group 8.15 - Institute for Advanced Study
Series 3 - Audio-visual Materials
Location: 8.15:3 EF
Notes: IAS Seminar Series on Big Data.
Title from opening screen.
Abstract: The Berkeley AMPLab is creating a new approach to data analytics. Launched in early 2011, the lab aims to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and in crowds). The lab is realizing its ideas through the development of a freely available open source software stack called BDAS: the Berkeley Data Analytics Stack. In the nearly four years the lab has been in operation, the speaker and his research group have released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving 'up the stack' to better integrate and support advanced analytics and to make people a full-fledged resource for making sense of data. In this talk, the speaker will first outline the motivation and insights behind his research approach and describe how the research group has organized itself to address the cross-disciplinary nature of Big Data challenges. He will then describe the current state of BDAS with an emphasis on the group's newest efforts, including some or all of: the GraphX graph processing system, the MLBase machine learning platform, and the SampleClean framework for combining sampling and hybrid human/computer data cleaning.
Finally, he will present his current views on how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost, and quality requirements throughout the analytics lifecycle.
Prof Michael Franklin received his PhD in Computer Sciences from the University of Wisconsin-Madison in 1993. He was on the faculty of the University of Maryland from 1993 to 2001. He is currently Thomas M. Siebel Professor of Computer Science, Chair of the Computer Science Division, and Director of the Algorithms, Machines and People Lab (AMPLab) at the University of California at Berkeley.
Prof Franklin works primarily in the Database and Operating Systems and Networking Technology areas. The AMPLab, which he directs, specializes in data management, cloud computing, statistical machine learning, and other topics necessary for making sense of vast amounts of varied and unruly data. It currently works with 23 industrial sponsors, including founding sponsors Amazon Web Services, Google, and SAP, and has received a National Science Foundation CISE 'Expeditions in Computing' Award, which was announced as part of the White House Big Data Research initiative in 2012.
Prof Franklin has received numerous awards, including the ACM SIGMOD 'Test of Time' Award, the IBM Faculty Award, the Siemens Faculty Development Award, and the US National Science Foundation CAREER Award. He is a Fellow of the Association for Computing Machinery.
Duration: 88 min.
Appears in Series: 8.15:3 - Audio-visual Materials
Videos for Public -- Distinguished Lectures