In this episode we dive into big data, the massive collections of information that can deliver all kinds of insights. These include the ability to predict popular TV shows, determine the best way to combat crime, and identify air pollution hot spots. Big data also calls for a big-time guest, and oh baby did we get one! Kate Brandt, the ever-smiling Lead for Sustainability at Google, joins us to explain how Google uses big data to advance sustainability inside the company and around the world.
What we’ll cover:
- What is Big Data?
- How did it start and what drives its growth?
- Why should we care about Big Data?
- How can big data be used to advance sustainability?
- Kate Brandt, Lead for Sustainability at Google
What is Big Data?
Well, first let’s start with data generally. Fundamentally, the word “data” can be applied to any group of information. Think an individual farmer tracking how many bushels of corn grew this year, the number of kids who got A’s in your high school English class, or the time it took for Scott to cook dinner each night. Some may be familiar with the business phrase “what gets measured, gets managed.” We as a society value information because it allows us to make better-informed decisions on just about anything.
What we’re talking about today though is called Big Data, which refers to sets of data that are so large or complex that traditional data processing methods are inadequate to deal with them (1). Whereas the examples we just mentioned may have up to a couple hundred individual data points that aren’t too mind-boggling to study, Big Data concerns amounts of data large enough for us to throw our hands up and say, “why don’t you take a crack at this, computer?” Think things like: the number of Google searches per day (4.5 trillion) (2) or the number of SMS texts sent each day (18.7 billion) (3). Technology plays a huge role here in that it allows us to both create all this data and capture it too, so we can derive correlations and insights from it.
So, ok, you should feel free to consider big data as basically data so large that we need a computer to analyze it (that’s certainly where I stopped trying to understand it). But according to many technology research groups, there’s a bit more to it than that. They say one should consider the 3 V’s that constitute big data (4):
- Volume: the amount of data (each day we create 2.3 trillion gigabytes of it!) (5). For context: streaming 1 hour of standard definition video on Netflix consumes about 1 GB of data.
- Velocity: the speed at which data arrives (which is always in flux) and is processed (analysis of streaming data to produce real-time or near-real-time results)
- Variety: different types of data, both structured (i.e., organized) and unstructured
If the 3 V’s kind of went over your head, may we suggest remembering this about big data: the value in Big Data is not the data points themselves, but rather the observations, conclusions, and predictions we can derive from them. Don’t worry, we’ll dive into this in just a bit.
How did Big Data start and what has fueled its (crazy) growth?
The growth of big data is directly related to changes in the way we share information (6). Back in what’s called the “Analog Age,” we shared information on things like paper, film, audiotapes, vinyl, and VHS tapes. However, once we crossed into the Digital Age (which scholars indicate was in 2002), we started sharing information on digital platforms that have the capacity to hold vastly more data than analog sources. Think CDs, DVDs, and portable media platforms before moving to things like computer servers, hard drives, and ultimately the cloud (ooooh!!).
Ok, so we can store more stuff but are we also capturing more? The answer is yes. Data sets can now grow rapidly thanks to the proliferation of wirelessly connected instruments and devices that capture data (think mobile phones, satellites, thermostats, wireless sensors, etc.) and our ability to store more data. In fact, the world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s (6).
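To get a feel for what "doubling every 40 months" adds up to, here is a quick back-of-the-envelope sketch. The 40-month figure comes from the stat above; the function name and the time spans are just our own picks for illustration.

```python
def growth_factor(years, doubling_months=40):
    """How many times capacity multiplies over `years` if it doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


# Illustrative time spans only; the source gives the doubling rate, not these totals.
for years in (10, 20, 30):
    print(f"{years} years -> about {growth_factor(years):,.0f}x the storage capacity")
```

In other words, steady doubling every 40 months works out to roughly 8x per decade, which compounds very quickly.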
Ready for a crazy, mind-blowing stat on how much data we’ve been generating recently? Are you all seated for this? Hopefully not on the cloud... According to a new IBM report, 90% of the data in the world today has been created in the last two years alone! (7)
Why should we care about Big Data?
Well, as Kate Crawford (a researcher at Microsoft and a lecturer at MIT and NYU) puts it, "There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem." She and others argue that it’s the analysis of this data that’s so impactful, because it can reveal correlations that were previously out of reach. These correlations can help us spot business trends, prevent diseases, combat crime, and, as we’ll discuss in a second, combat climate change (1).
It’s easy to see Big Data’s value from a business perspective. Data-driven analysis helps you understand your customers and the market way better. For example, Netflix used big data to ensure House of Cards would be a hit. Yeah, in fact, Netflix was so confident that it committed to two seasons, which is 26 episodes of the show, at a rate of almost $4 million per episode before it even aired. This confidence came from analyzing its millions of plays, ratings, and searches and discovering that fans of the UK version of House of Cards were also watching movies that starred Kevin Spacey and were directed by David Fincher (one of the show’s executive producers) (8).
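For the curious, the kind of "fans of X also watch Y" question described above can be sketched in a few lines. This is a toy illustration, not Netflix's actual method: the viewing histories, the labels, and the `overlap_share` helper are all made up for the example.

```python
# Hypothetical viewing histories: viewer -> set of things they watched.
histories = {
    "ana": {"House of Cards (UK)", "Kevin Spacey films", "David Fincher films"},
    "ben": {"House of Cards (UK)", "Kevin Spacey films"},
    "cy": {"House of Cards (UK)", "David Fincher films"},
    "dee": {"Kevin Spacey films"},
    "eli": {"Cooking shows"},
}


def overlap_share(histories, base, other):
    """Share of viewers who watched `base` and also watched `other`."""
    base_viewers = [watched for watched in histories.values() if base in watched]
    if not base_viewers:
        return 0.0
    return sum(other in watched for watched in base_viewers) / len(base_viewers)


for other in ("Kevin Spacey films", "David Fincher films", "Cooking shows"):
    share = overlap_share(histories, "House of Cards (UK)", other)
    print(f"{share:.0%} of UK House of Cards viewers also watched: {other}")
```

With millions of real histories instead of five invented ones, a high overlap is the sort of signal that could make a studio more comfortable with a big bet.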
How can Big Data be used to advance sustainability?
Having big data means that we can find hot spots for greenhouse gases, inefficient water use, food waste, etc. and then smartly go about addressing those hot spots. One company that is using big data to solve environmental issues is Google.
One of its applications is Project Sunroof. With solar costs dropping dramatically (85% drop from 2009 to 2016) (9), many people are starting to ask: does solar power actually make sense on my rooftop? Since its initial launch in 2015, Project Sunroof has used imagery from Google Maps and Google Earth, 3D modeling and machine learning to help answer those questions accurately and at scale.
So how it works is you put in your address and then it gives you your recommended solar installation size, your potential environmental impact, your financing options (check out our solar financing episode for more info on that), and solar providers in the area. Solar isn’t right for every home right now. For example, according to Project Sunroof, solar installation on my row house in DC would save $2,000 over 20 years, but adding solar to my childhood home in the suburbs of Chicago wouldn't make economic sense. What's great is that by aggregating all these different data sources, this tool gives you the information you need in seconds to decide if solar is the right option for you.
Ok. Side note. We said Google is using machine learning to make this tool work. What is machine learning? We’ll hear it mentioned in our interview as well so let’s define it. Machine learning is an algorithm or model that learns patterns in data and then predicts similar patterns in new data (10). In other words, machine learning is a kind of artificial intelligence where we give machines access to data and then they use that data to better perform a task (11). An example is Spotify’s algorithm that has access to what you play and then uses that information to make a prediction about what other music you might enjoy (12).
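If you want to see the "learn patterns, then predict" idea in miniature, here is a toy sketch. To be clear, this is not Spotify's algorithm: it is a simple co-occurrence counter we made up for illustration, and the listening histories and artist names are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations


def learn_co_plays(histories):
    """Learning step: count how often two artists show up in the same listener's history."""
    co_plays = defaultdict(Counter)
    for artists in histories:
        for a, b in permutations(set(artists), 2):
            co_plays[a][b] += 1
    return co_plays


def recommend(co_plays, listened, top_n=2):
    """Prediction step: suggest the artists most often played alongside what you already play."""
    scores = Counter()
    for artist in listened:
        for other, count in co_plays.get(artist, {}).items():
            if other not in listened:
                scores[other] += count
    return [artist for artist, _ in scores.most_common(top_n)]


# Invented listening histories, one list per listener.
histories = [
    ["Artist A", "Artist B", "Artist C"],
    ["Artist A", "Artist B"],
    ["Artist A", "Artist B", "Artist D"],
    ["Artist A", "Artist C"],
]

model = learn_co_plays(histories)
print(recommend(model, {"Artist A"}))  # ['Artist B', 'Artist C']
```

The more listening data the "learning" step sees, the better the guesses in the "prediction" step tend to get, which is the basic bargain of machine learning.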
Another cool application of Big Data and sustainability is Google’s air pollution tracking (13). They’ve partnered with the Environmental Defense Fund and Aclima to measure air quality using equipment mounted on Google Street View cars. The end result is interactive maps where you can zoom in to see street-level details that show how pollution can change block by block. For example, the area where the Bay Bridge meets I-80, a major freeway, has sustained higher pollution levels due to vehicles speeding up to cross under I-80 and merge onto the bridge. Knowing where these areas of higher pollution are allows regulators to identify opportunities to achieve greater air quality improvements.
One last cool application is Google’s Global Fishing Watch, which they’ve created with Oceana and SkyTruth. The initiative combines cloud computing technology with satellite data to provide the world’s first global view of commercial fishing activities. It gives anyone around the world (citizens, governments, industry, and researchers) a free online platform to track fishing activity worldwide. This data can help inform sustainable policy and identify suspicious behaviors for further investigation. By understanding what areas of the ocean are being heavily fished, agencies and governments can make important decisions about how much fishing should be allowed in any given area. This is a timely tool considering that for every ten fish in the ocean fifty years ago, only one remains (14).
About Kate Brandt
Kate Brandt is Google’s Lead for Sustainability, spearheading sustainability across Google’s worldwide operations and products. In this role, Kate coordinates with Google’s data centers, real estate, supply chain, and product teams to ensure the company is capitalizing on opportunities to advance sustainability and the circular economy (don’t know what the circular economy is? Check out episode 22).
Previously, Kate served as the nation's first Chief Sustainability Officer as well as in several other senior roles in the US Federal Government. Kate received a Master's degree in International Relations from the University of Cambridge, where she was a Gates Cambridge Scholar. She graduated with honors from Brown University. In other words, she’s a smarty pants.