Founder’s Field Guide

Episode 18: The Past, Present, and Future of Big Data

Ali Ghodsi is the founder and CEO of Databricks, a data analytics platform for data scientists and developers. We cover the history of distributed computing, best practices for using data, and the lessons he has learned as CEO of Databricks and while working at UC Berkeley's AMPLab.

This episode is sponsored by:

Klaviyo. Klaviyo is the ultimate marketing platform for e-commerce. With targeted segmentation, email automation, SMS marketing, and more, Klaviyo helps you create your ideal customer experience. See why Klaviyo is trusted by more than 50,000 brands, like Living Proof, Solo Stove, and Nomad, to help them grow their business. For a free trial, check out klaviyo.com/founders.

Vanta. Vanta has built software that makes it easier to both get and maintain your SOC 2 report, at a fraction of the normal cost. Founder’s Field Guide listeners can redeem a $1k off coupon at vanta.com/patrick.

[00:02:48] – [First question] – What is Databricks?

[00:03:34] – History of distributed computing

[00:05:35] – Hardware that made this all possible

[00:07:20] – Early challenges in building out these systems

[00:09:43] – What has made networking technology better

[00:10:35] – Doing something in storage vs with memory

[00:11:45] – Origins of Hadoop

[00:12:42] – Use cases of distributed data in 2010 that weren’t possible in 2000

[00:13:35] – Origins of Spark

[00:15:25] – Early Spark and then the transformation into Databricks

[00:16:50] – Early use cases

[00:17:37] – Their relationship to the open-source project

[00:21:07] – What customers need in order to work with Databricks

[00:23:11] – Their customer interaction

[00:26:27] – How they think about making investments

[00:28:24] – Their competitive advantage

[00:30:13] – Other companies moving the needle in building the distributed computing industry

[00:32:10] – Walls that need to be broken down today

[00:34:02] – Best practices for companies when it comes to their data

[00:38:47] – Lessons from being a CEO

[00:39:53] – Working at UC Berkeley’s AMPLab

[00:41:56] – What excites him about the future

[00:43:29] – Kindest thing anyone has done for him

The Past, Present, and Future of Big Data

Introduction

Patrick
My guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at UC Berkeley's computer science department. Our conversation ranges from the origins of distributed computing to modern data infrastructure, how companies can leverage their massive data sets, and the transformation of Databricks through its phases of growth as a business. While technical, it's exactly the kind of conversation I like to have on this show. I hope you enjoy my great conversation with Ali Ghodsi.

History of Distributed Computing

Patrick
Ali, I'd love to start our conversation at the end, with what Databricks is today to level set for the audience exactly what you do, what your focus is, and what the business does for customers. Could you just walk us through, as we sit here at the end of 2020, what the company looks like and the service or problem it solves for customers?
