tv Today in Washington CSPAN May 1, 2013 7:30am-9:01am EDT
7:30 am
you are a human being living in the 21st century. you are allowed to have time. you are allowed to have some time. and ironically the more time you take, the more time you have. it works the opposite way. the more you try to catch up, the less time you have. the faster you answer your e-mails, the more of them come in. the slower you answer your e-mails, the more you see people somehow solve those problems without you. and they really do. so i guess it's that, you know, that books are such a wonderful discipline. both as an author and the reader, they say: no, i'm reading this book. if i can create words for people to say no, i'm reading a book, then i've done my job. >> thank you. >> thank you. [applause] >> do you have any questions?
7:31 am
>> please wait for the microphone. she will be in charge. >> [inaudible] >> do you see any sort of correlations of doctor? >> absolutely. i mean, first of all, and this sounds awful but since when is unemployment a bad thing? seriously. everybody, we've got to create jobs, we've got to create jobs. do you want jobs? no, you want stuff. you don't want jobs, you want the stuff you can get from having a job. jobs are an artifact of the industrial age. people didn't used to have jobs. people made stuff and sold it. it wasn't until chartered corporations came and put everybody out of business by law that people went to go work for a company and instead of being paid to make a
7:32 am
thing, you are paid for the time that you put in for the corporation. you were selling your time. that's what the bible calls slavery. that's indentured servitude. the only reason we need, we don't need jobs. we have enough stuff. there's more than enough houses. they are burning down houses and destroying them in california to keep the prices high because so many are in foreclosure. you can't let people just live there. we've got to tear them down. we are destroying food every month. we burn food to keep the market price high. so what if they're starving? the only reason we need people to have jobs is so that we can justify doling out the stuff we have to them. not because we need them to make more stuff but instead we create new excuses for people to have jobs to make more stuff that we end up putting in storage units. there's too much stuff. we've got more stuff than we need. we create new excuses for people
7:33 am
to buy stuff on black friday. can we get them to consume more? are we tearing down more woods and building more houses? if not it's not healthy. why? because of the industrial age requirement of the economy to grow. so i think, i don't know if in our lifetime but i do think we'll get to the place where we realize, we actually can have robots tilling the soil and doing all the stuff in the fields, and we can just eat it and that's not a bad thing. when we work we work for me. when we work we work to make things better. when we work we are teaching children and feeding ourselves, not just trying to import more plastic crap from china in order to keep this economic machine going. it's a program that has outlived its welcome. >> do you think that the reason that digital platform lacks
7:34 am
narrative that you were saying earlier is based on the fact that there's a lack of context awareness maybe? maybe how we were feeling, which it seems to be going that way, that would kind of remedy the problem? >> it might remedy the problem or they might create a new one. the illusion of narrative in digital technology is in what they call predictive modeling. so they use big data analysis to figure out john is 12 but we can tell from his statistical profile that he will be gay by the time he is 14, right? mary is 36 and we can tell from the way she is tweeting and whatever that she's probably going to be dealing with infertility issues. so they can then send you the ads and things that can help you manifest the person that you're most likely to be. that's not storytelling. that's life creation. that's turning people into
7:35 am
programs rather than letting people be the unpredictable quirky weird thing that they are. i do think that the context that you can get in digital spaces, but that context is much more like beavis & butthead or the simpsons or south park or mystery science theater. it's more of a kind of madison's ability that you see frames within frames within frames. you see the media in more of a fractal sense than you do in a linear sense. so the way we make sense of things is by recognizing, like when you watch the simpsons, what's the hit on the simpsons? is it that homer saves the nuclear power plant from disaster? no. it's that you recognize this is a satire. when you recognize that, when you make the connection you feel more oriented. so what we're moving towards is a much more moment to moment -- moment that we do get from screens within screens from relationship of things to other things, from this to that.
7:36 am
getting the joke really more than getting to the end. >> you have the mic upside down. perfect moment. >> how do you think present shock is affecting -- [inaudible] >> that's interesting. i mean, in one sense we are learning to think of wars less as something we win -- you don't win wars it turns out. you never really won the war. you just won the battle and killed people. but there is this sense of war now as an ongoing state. one really interesting phenomenon is, and i talk a lot about it in the book, is drone fighting.
7:37 am
drone fighting is a digiphrenic approach to war. so here you are, you're the soldier, now you are in a room outside las vegas, right, flying a plane that's in afghanistan killing people far away. then you take off your thing, get in your car, go home and have dinner with your wife and kids. it turns out that drone pilots doing virtual combat experience higher levels of post-traumatic stress disorder than the ones on an actual battlefield. i would argue that's because of this digiphrenia, because of this new way of fighting war in the present, a present shock universe where you're trying to manifest two selves that are ultimately incompatible. and i think that's really what is going to start to happen now. and in some ways that's a positive sign. because we are saying oh, my gosh, no, there's this other
7:38 am
self-interest part of me, part of us doing this thing that's not consonant with our value of who we are, with what we believe in. we have to reconcile those two things. but the more alienated we are from it the more it does, that's when it does get pretty weird. >> unita for the mass media audience. >> i think it's kind of misleading -- [inaudible]. >> why don't you just say a machine world? >> i think there's a difference between living in a machine world and living in a more digital world. i don't think we live in a digital world. i think we live in the real world. we live in one that's now dominated by kind of a digital
7:39 am
bias as opposed to the mechanical bias. the difference is mechanical age technology does the thing it does. a shovel digs. a car drives. a steam shovel does this. >> [inaudible] >> the difference, the kind of things i look at as digital age technologies, and you can argue they all just create more choice, what i look at as digital age technology, like computers, robotics, genomics, nanotechnology. i look at things that you set in motion and then have something like a life of their own. they try to survive. they replicate. they change things. they keep going.
7:40 am
so i think there's a different bias. you can argue it's all part of the same continuum. it's just getting -- >> [inaudible]. >> you can, you can, but, you can but -- >> [inaudible] >> the kind of culture that builds around invention of that ends up for whatever reason being different than the kind of culture that builds up around the printing press. which for whatever reason is different than the kind of culture that builds up around computers and digital technology. it could be the reasons why could be completely stupid, right, and based on nothing but our perception of how those devices work in different periods. but they do create different what we call media environments. they're different media environments. there's a light bulb, right, a
7:41 am
light bulb creates an environment of light. we don't care about the content. there's no content in the light bulb until you put, in other words. but the light bulb itself creates an environment. air-conditioning is a technology that creates an environment. fire creates an environment. television creates an environment. telegraph creates an environment. and digital technology creates an environment, too, but without being too technologically determinist about it, as our culture changes we adopt digital tools. our values change. we change the tools that we develop and then those tools that we develop change the way we think -- the way we see things. in the back. >> how much, by cursing differences between the culture around print, how many of these things are apparent to environment of the digital culture how much, it's a new
7:42 am
media and is then unregulated and it's been a wild, wild west and you see it less like that? >> it's interesting. i mean, i always have felt like we're in danger of folding the digital media environment into the industrial age. that's been the main thing that i've been kicking and screaming about since 1998. i did this talk why futuristic. the thing i got mad about is the thing was here. it was here. here we've got the technology. digital, digital means the digits. we can actually start making the world. and then i saw everyone talk about this thing is coming. "wired" magazine said this anon is coming that will change your business, invest in that. the long boom is happening so the industrial capitalism can keep growing and growing forever. and then on the same day that jerry garcia died, netscape went
7:43 am
public. and i thought, i wonder, i wonder if the potential that i'm seeing for a new kind of digital media environment is going to be subsumed by the industrial age. it may very well be but i believe if it is, most of us are going to die. i really believe that. i think that we've reached the limit of that way of doing things. and i'm trying to create the most appetizing way of describing what it might be like to live in a world where we have a more kind of steady-state, sustainable approach to life together, where we stop looking at, at life as individuals or life as a nation, as this thing that we're going to win, rather than thinking of it as a thing we're going to keep doing.
7:44 am
>> [inaudible] and using accounts or life is different. ask we can change all these other things different, too, like nations, all this other stuff as opposed to just letting you go back to the way it was. >> i don't know if you will go back to the way it was anyway. when i look at, and i've gotten to be in the boardrooms, when i look at the way corporations exist now, they're dying. corporations are sitting on all this money. corporations live on an economic operating system that was designed to help them collect money. they've gotten good at collecting money but they don't know how to make money with their money anymore. corporate profitability over assets has been going down steadily over the last 50 or 60 years. they don't know how to keep doing it. so i feel like they are at the crossroads, too. lots of signs of hope, glimmers of hope. i also think of a genuine media
7:45 am
renaissance but i don't call it a revolution. revolution is so 20th century, too. again it's around the clock, around we go, revolution. i think of it more as a renaissance where we retrieve old ideas and they are reborn in a new context. so the kinds of things that got repressed by the original renaissance, when we became centralized, when we became monarchies with centralized currency and chartered corporations and all that stuff, they got repressed. at a time if you like those are just seeking a. we are at the beginning of it. these things take hundreds of years to happen or thousands of years to happen. i'm trying to remain hopeful. >> do you think digital technologies are helping or hurting societal goals like the end of poverty? >> i think they can help or
7:46 am
hurt. you know, when we're using digital technology to sell more cell phones to more people in less time and we stick kids in case, it's hurting. when we use digital technology and open source planning to give developing nations open-source blueprints on how to build agricultural machinery right where they are, it's helping. when we use it to exacerbate the monopolies of corporate capitalism it's hurting. when we use it to allow for peer-to-peer marketplaces to emerge and alternative currencies and new economic models, it helps. i hate to call it a double edged sword but it seems to be. it's very, very powerful stuff which is why i've always been advocating for people to learn how to use it. a digitally literate populace is like a population that knows kung fu. and we can actually, at least if we don't know how to program
7:47 am
ourselves we will at least be aware of the biases of the digital environment we are inhabiting. if you look at something like facebook, like i recently did and said this is not fun for me. this is actually making me feel vulnerable and yucky in all these ways. i'm just going to stop. as long as i'm stopping i will publish this article. make some hay off of it and get some other people to follow me. there's always that. it's not progress in the grand scheme of things but to the extent that we feel that we are allowed to make choices about what technologies enhance our lives and what don't, it's sort of the beginning of doing good things with tech rather than bad. >> i think it's come up, the obvious reaction to digital stuff everywhere, to consider returning back to things the way they were.
7:48 am
[inaudible] i think you said something really interesting about, like the values that got repressed before the revolution. i'm curious for you to expand on that. you touched on indigenous cultures. what were some of those values that you might see come back? >> right. well, the beauty of a renaissance is everything old is new again. it's not about going back to the good old days. you can't go back. and i'm a progressive, but neither can you lean forward. lean forward, sort of the opposite of traditional values. you can't go either way. we are actually right here now. but right here now means we can now be available to things that were unavailable to us before. so we do see, i mean, in its more primitive form, you go to burning man and it gets all like that because that's where we associate those means from,
7:49 am
those activities. the last memory we have of them is the medieval bazaar. so we bring that back. but ideally what we do is bring it back in a new context. instead of having a peer-to-peer grain-based, going to a grain store to get your stuff, you end up with a peer-to-peer that is on your iphone that you can use peer-to-peer authentication and actually go somewhere else with it. you bring back the old form. in terms of indigenous culture though, i mean, i do think they have access to certain things that we are only now finding again through science, kind of stuff like gosh, you say, for me to say we're just, there's a lunar cycle. everyone in the world knows there's a lunar cycle. men are like, really? i get it. we are dumb. in terms of western scientists
7:50 am
to go, every week of a lunar cycle there's sort of a different dominant mood and character and neurochemistry. you go to any shaman and they would be like you know, that's called the moon. that's why we do this. if you go back to the ancient jewish calendar before the romans corrupted it, they had a true lunar calendar that was respecting a lot of these different rhythms but we lost that. so yes some of those are old and they can find, we can revive some ancient wisdom again. you start reading and you start saying they understood that there was a shape in time. there's a shape to time and maybe they knew it and we can combine that with computers. it was sort of like we have james joyce that part of it. now we are sort of trying to actually do it.
7:51 am
>> i'll sign books and stuff later. >> [inaudible] how does that logic speak to a place like india? >> i don't really, really know. i haven't spent time there. i've ended up thinking more about china maybe just because they are more in the mood than in the. and i look at moments but i'm looking at culture as inspectors i look at chinese olympics, 10,000 people standing there doing tai chi and unlike other kind of chemistry the industrial age promise or are they demonstrating an older clock and things are involved in? i sometimes worry for cultures
7:52 am
that, like india, are adopting a digital age kind of mentality without passing through the same kind of things. these are such powerful tools. i work at a place trying to teach code to people. in america people go on codecademy to learn code because they want to create a product. they want to make something and get started and learn how to code and launch an iphone app. when i e-mail with people in india they are like i want a degree, can you get a certificate so i can go and get a job? people are really poor, they do been allowed to do anything that they can get a job and make progress, it's like it's a great and beautiful thing but i also wonder, do they even know what they're getting themselves into? they will become the work farms for american digital companies. we already have these labor mills in china and india where
7:53 am
they're doing the boring amazon repetitive tasks. this is a book, really this is a book more about culture than it is about anything else. because i'm a westerner, i'm an american, i'm writing it, it's really way more about us than it is about them, although i'm really interested. i'm more interested in what the reaction would be. in other words, will i get e-mails from india about this book saying this is really interesting, we are suffering from the same things? or will it be, silly americans, you didn't know this all along? it's something i guess i will find out. >> unfortunately, we have to stop there but he'll probably be happy to answer more questions. he will sign books over here on the history table. thank you all so much for coming.
7:54 am
[applause] >> this morning on c-span2, the center for strategic and international studies releases a report on the future of u.s. ground forces. panelists will discuss military budget cuts and the strategy of rebalancing from iraq and afghanistan to the asia-pacific region. live coverage begins at 9 a.m. eastern time. later, former state department officials from the george w. bush administration discuss the use of drones and targeted killings of u.s. citizens abroad. live coverage from the bipartisan policy center begins at 10 eastern on c-span. >> we do know that she was and what you got to the white house. but people think that she in fact didn't participate much and that isn't exactly true. she was very, very involved and she set up her own bedroom right across from the president's office basically, and she was
7:55 am
always able to hear what was going on. she was very active. she read daily newspapers, brought different points of view to the president, was able to calm him down constantly. and, of course, she was the grandmother of the house as well as taking care of her daughter and grandchildren. >> our conversation on eliza johnson is now available on our website c-span.org/firstladies. tune in monday for our next program on first lady julia grant. >> the new technological trend is the use of large amounts of data to predict human behavior and events. that's the central idea explored in the book "big data." this discussion is just over one hour. >> good evening and welcome to today's program of the commonwealth club of california. a place where you are in the know. i'm host of tech nation which airs on npr.org in its 24-hour
7:56 am
stream and also on the npr channel on xm sirius radio. i am your moderator for this program this evening. tonight's program is being held in association with the commonwealth club's science technology forum, exploring visions of the future through science and technology. find us on the internet at commonwealthclub.org. or download our iphone and android app for program and schedule information. now it is my pleasure to introduce today's distinguished guests, viktor mayer-schonberger, professor of internet governance and regulation at oxford university, and kenneth cukier, data editor for the economist. together they have written a new book, "big data: a revolution that will transform how we live, work, and think." i had the distinct pleasure of interviewing the professor earlier today for broadcast to
7:57 am
be aired in the coming weeks. and i thought you should know a few other things about these fellows. professor viktor mayer-schonberger has more than one law degree, only one of which is from harvard. he's not just a lawyer, he's a lawyer's lawyer, and he has earned a master's in economics from the london school of economics. with over 100 academic papers and several books to his credit, i think my favorite title is "delete: the virtue of forgetting in the digital age." his co-author, kenneth cukier, you best know through the economist. prior to being the data editor, he -- japan's business and finance editor and global technology correspondent. you might also know him as a technology editor for the asian "wall street journal" in hong kong, all very important because big data isn't just here in the united states, big data is global. so please welcome viktor mayer-schonberger and kenneth
7:58 am
cukier. [applause] >> thank you very much. it's a pleasure to be here. welcome. "big data" is going to change how we live, work and think. and our journey begins with a story. the story begins with the flu. every year the winter flu kills tens of thousands of people around the world. but in 2009 a new virus was discovered, and experts feared it might kill tens of millions. there was no vaccine available, so the best health authorities could do was to slow its spread, but to do that they needed to know where it was. in the u.s. the centers for disease control have doctors report cases, but collecting the data and analyzing it takes time. so the cdc's picture of the
7:59 am
crisis was always a week or two behind. which is an eternity when a pandemic is under way. around the same time engineers at google developed an alternative way to predict the spread of the flu, not just nationally but down to regions in the united states. they used google searches. google handles more than 3 billion searches a day and saves them all. google took 50 million of the most common search terms that americans use and compared when and where these terms were searched for with flu data going back five years. the idea was to predict the spread of the flu through web searches alone. they struck gold. what you're looking at right now is a graph, and the graph is showing that after crunching
8:00 am
through almost half a billion mathematical models, google identified 45 search terms that predicted the spread of the flu with a high degree of accuracy. here you can see the official data of the cdc and alongside are google's predicted data from its search queries. but where the cdc has a two-week reporting lag, google can spot the spread of the flu almost in real time. strikingly, google's method does not involve distributing mouth swabs or contacting physicians' offices. instead it is built on big data, the ability to harvest data to produce an insight. let's look at another example. in 2003, a computer science professor was taking an airplane and he knew to do what we all think we know to do, which is he bought his
8:01 am
ticket well in advance of the day of departure. that made sense. at 30,000 feet the devil got the better of him and he couldn't help but ask a passenger next to him how much he paid, and the person paid considerably less. he asked another passenger how much the person paid. he also paid less, even though they both bought the ticket much later than he had. he was upset. who wouldn't be? but he was a computer science professor so not only did he get upset, what he realized is he didn't need to know all the reasons on how to save money on airfare, whether you should buy in advance, whether there's something called a saturday night stay that might affect the price. instead he realized the answer was kind of hidden in plain sight. it was there for the taking, which is to say, all you needed to know was the price that every
8:02 am
other passenger paid on every single other airline, for every single seat, for every single route, for all of america, for an entire year or longer. this is a big data problem. he scraped a little bit of data and he found out that he could predict with a high degree of accuracy whether a price he's presented online at a travel site is a good price and you should buy the ticket right away or whether you should wait and buy it later because the price is likely to go down. he called his research project hamlet: to buy or not to buy, that is the question. but a little data got him a good prediction. a few years later he was crunching 75 billion flight price records with which to make his prediction, almost every single flight in american civil aviation for an entire year, and that was very good indeed.
8:03 am
microsoft knocked on his door and he sold his company for $100 million. the point here is that data was generated for one purpose, to be used for another. information has become a raw material of business. it has become a new economic input. it's tempting to think of big data in terms of size. it's true, our world is awash with data and the amount of digital information being collected is growing fast, doubling almost every three years. the trend is obvious when you look at the sciences. when the sloan telescope began in 2000, it gathered more data in its first few weeks than had been amassed in the entire history of astronomy. over 10 years, the telescope
8:04 am
collected astronomy data exceeding 140 terabytes of information, but the successor of the telescope due to come online in 2016 will acquire that amount of data every five days. companies are drowning in data. twitter exceed 400 million. youtube has more than 800 million monthly users who upload an hour of video every single second. on facebook, over 10 million photos are uploaded every hour. google processes -- around 100 times the quantity of all printed material in the u.s. library of congress. the quantity of data in the world is estimated in 2013 to reach around 1.2 zettabytes of
8:05 am
which only a small percentage is non-digital. it's tempting to follow the cycle of silicon valley and to see big data as one characterized by the sheer size of digital information collected and used worldwide. but that would be like describing an elephant by the size of its footprint. on the contrary, we suggest big data is more than just about the volume. we suggest three reinforcing qualities to characterize big data: more, messy and correlation. first, more. today we can collect and analyze far more data about a particular problem or phenomenon than ever before. the point is the relative size, relative to the phenomenon we study. that gets us a remarkably clear
8:06 am
view of the granular details that conventional sampling misses. we also can let the data speak, and that often reveals insights that we never would have thought of. the second quality of big data is this embrace of messiness. looking at vastly more data, this loosens up our desire for exactitude. our ability to measure was limited. we had to treat what we did bother to quantify as precisely as possible. in contrast big data is often messy in quality but rather than going after exactitude and measuring and collecting small quantities of data at high cost, with big data we'll accept a little bit of messiness. and often be satisfied with a sense of general direction rather than striving to know a phenomenon down to the inch, the
8:07 am
penny, the atom. we don't give up exactitude entirely. we only give up our singular devotion to it. what we lose in accuracy at the micro level we gain in insight at the macro level. these two shifts, more and messy, lead to a third and more important change. a move away from the age old search for causality. instead of asking why, looking for elusive causal relationships, in many instances we can simply ask what. and often that is good enough. now, that's hard for us humans to comprehend. because as humans we are conditioned, some might even argue hard wired, to understand the world as a series of causes and effects. it makes the world comprehensible.
8:08 am
comforting, it's reassuring. and oftentimes it's just plain wrong. if we fall sick after we ate at a new restaurant, our hunch will tell us that it was the food. even though it's far more likely that we got the stomach bug by shaking hands with a colleague. these quick causal hunches often lead us down the wrong path. with big data we now have an alternative available. instead of looking for the causes, we can go for correlations, for uncovering connections and associations between variables that we might not have known otherwise. correlations make predictions and recommendations to customers. correlations are at the heart of google's translation service. they do not tell us why and they do not know why, but what. at a crucial moment and a time
8:09 am
for us to act. >> these three features of big data, more, messy and correlations, are used today to save lives. premature babies are prone to infection. it's very important to know infections very early on, but how do you do that? in an analog small data world you would take vital signs every couple of hours, oxygenation level, heartbeat, heart rate, these types of things. now, as part of a research project in canada, researchers collect 16 real-time data flows from premature babies and collect about over 1000 real-time data points each second from them. then they combine the data and
8:10 am
look for patterns, look for correlations. and they're able to spot the onset of an infection 24 hours in advance. way before the first symptoms would manifest themselves. it's incredibly important for these preemies because then they can receive medication well before the infection is strong and perhaps cannot be battled successfully. perhaps intriguingly, the best predictor is not that the vital signs go haywire, but that they actually stabilize. we don't know why, but we do know that in the small data age a doctor would look at the stabilization of vital signs and say, the baby is doing well, i can go home for the night. now we know that that means the baby might actually be in
8:11 am
trouble and might need extra monitoring. it's also a wonderful example of the fundamental features of big data, more, messy and correlations. the data was much more than we typically processed. the data was so vast that it wasn't all clean. it was messy, and the findings were correlations. they answered what was happening, but not why, not the biological mechanism at work. >> now, often big data has been portrayed as a consequence of the digital age, but that misses the point. what really matters is that we are taking things that we never really thought of as informational, and rendering them into data form. once it is data we can use it,
8:12 am
process it, store it, analyze it and extract new value from it. think of location. people have always existed somewhere. nature has always existed somewhere but it was only recently that we've added on longitude and latitude, then gps, and now a smartphone that we're all probably carrying in our pockets. now our location has been datafied. all the time. think of books. think of words. in the past we would look up at the temple of delphi to see two mottos etched in stone. later, we had books and even more recently we scanned those books. google, for example, went to many libraries and scanned books. the first thing that was a
8:13 am
digital rendering of what was on the page. it was digitized. the book was digitized and the words were digital. we get some benefits: we can store it easily, we can't process it per se, but we can certainly share it. what we can't do is analyze it. it's simply an image file, and the words themselves have not been datafied. so what happens when we can take those words, extract them and treat them as data? suddenly what researchers are doing is looking back at all the journal articles in the medical sciences, going back centuries, these are hundreds of thousands of articles, and they're looking for side effects. a human being reading these journals for centuries would not be able to spot some of the weird correlations of drug side effects. but a machine can. big data can, and that's what you get when the words are datafied. all of you in the audience right
8:14 am
now, think of it in terms of something as fundamental as posture. the way that you are sitting, and you are sitting, and you, and you. it's all different. in fact, the way you are sitting is a function of your weight and the distribution of your weight and your leg length. and if we were to measure it and instrument it, the way you sit would be personal. it would look like a fingerprint. one would sit differently than another. so what do we do with this? researchers in tokyo right now are placing sensors into car seats. it's an antitheft device. suddenly the car would know when someone else is driving it, and maybe you would put in controls so that, when that was happening, you would cut out the engine. if you have a teenager this might be a very useful thing, to say that you're not allowed to drive the beamer after 10 p.m., and just like cinderella's coach turned into a pumpkin, the car engine doesn't start. that's great.
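a minimal sketch of the seat-sensor idea above, assuming (hypothetically) that a seat reports a short list of pressure readings and that the owner's typical reading was stored in advance. the names `enrolled_profile` and `threshold`, and all the sample numbers, are illustrative assumptions, not details of the tokyo research.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length lists of pressure readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_known_driver(reading, enrolled_profile, threshold=5.0):
    """Return True when the current seat reading is close to the stored profile.

    The threshold is an assumed tuning value; a real system would have to
    choose it from measured data.
    """
    return distance(reading, enrolled_profile) <= threshold


# Hypothetical pressure readings from four seat sensors.
owner_profile = [42.0, 40.5, 18.0, 17.5]
owner_today = [41.0, 41.0, 18.5, 17.0]
someone_else = [55.0, 30.0, 25.0, 10.0]

allow_owner = is_known_driver(owner_today, owner_profile)
allow_other = is_known_driver(someone_else, owner_profile)
```

the point of the sketch is only that once posture is turned into numbers, "who is sitting here" becomes a comparison of two lists.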
8:15 am
8:16 am
>> traditionally, data was processed for its primary purpose with little thought given to reuse, but this is changing. the core economic point is that a myriad of reuses of the information are possible that can unleash new services or improve existing ones. the value of data shifts from the reason it was collected and the immediate uses on the surface to the subsequent uses that are not apparent initially, but are worth a lot. think of delivery vehicles. ups has 60,000 vans on the road. they need maintenance. it's a problem, but a problem that can be fixed with information. when a car breaks down, it doesn't break down all at once.
8:17 am
it sort of lets you know. for example, you might be driving it, and it feels funny; right? there's a strange sound it normally doesn't have. if we place sensors in the engine, we can datafy it, measure the vibration or measure the heat, and we can compare that signature with what a normal engine sounds like and what the likely problem is, and suddenly what we can do, and what ups does to save money, is predict a breakdown. it's called predictive maintenance. they are able to identify when the reading tells them the heat's going up or is out of bounds, and you need to bring the van into a service station, get it a tune-up and replace a part. they are able to replace the part before it breaks. another company uses data from a hundred million cars to predict
8:18 am
traffic flow, and its business model is to predict how long it's going to take for you to go from one place to another. it's a traffic prediction service. here, what it is doing is reusing the data and turning it into a new form of economic value, because there's a correlation between the road traffic in a city and its economic health. but there's more. one investment fund uses the data on the weekend traffic around a large national retailer because it correlates very strongly with its sales. you can see where this is headed. it can measure the road traffic in the proximity of the stores, and then it can trade that company's shares prior to the quarterly earnings announcement because it has a lens into whether the sales are going to increase or decrease.
8:19 am
big data offers extraordinary benefits. unfortunately, it also has a dark side. as we just heard, so much of data's value remains hidden, ready to be unearthed by secondary uses. this puts big data on a direct collision course with how we currently protect information privacy, through notice and consent: telling individuals at the point of collection why we gather data and asking for their consent. but in the big data age, we simply do not know when we collect the data for what purposes we'll be using it in the future, so as we reap the benefits of big data,
8:20 am
the core mechanism of privacy protection is rendered ineffective. but there's another dark side: algorithms predicting what we are likely to do, how we will behave rather than how we have behaved, and penalizing us for it before we even have committed the infraction. and if you think of "minority report," that's exactly right. in a way, that provides value; right? isn't prevention through probabilities better than punishment after the fact? yet, such a big data use would be terribly misguided. for starters, predictions only reflect statistical probability. we would punish people without
8:21 am
certainty, negating a fundamental tenet of justice. by intervening before an action has taken place and punishing the individuals involved in it, we essentially deny them human volition, the ability to live our lives freely and to decide whether and when to act. in a world of predictive punishment, we never know whether or not somebody would actually have committed the crime, because we do not let fate play out, holding people responsible on the basis of a big data analysis that can never be disproven. let's be careful. let's be careful here. the culprit is not big data itself. the culprit is how we use it. the crux is that holding people responsible for actions they have yet to commit is using big data correlations, the what, to
8:22 am
make causal decisions about individual responsibility, the why. as we have explained, big data correlations cannot tell us about the why, the causality of things. often, that's good enough, but it makes correlations singularly unfit to decide who to punish and who to hold responsible. the trouble is that we humans are trying to see the world through the lens of cause and effect. thus big data is under constant threat of being abused for causal purposes, and it threatens to imprison us, perhaps literally, in probabilities. what can we do? to begin with, there's no denying big data's dark side. we can reap the benefits of big data if
8:23 am
we also expose its evils and discuss them openly, and we need to think about how to contain these evils and how to prevent the dark side from taking control. one suggestion is that privacy in the big data age needs a modified foundation. privacy protection by the individual has to be augmented by direct accountability of the data users. second, and importantly, on the dangers of punishing people based on predictions rather than actual behavior, we have to expand our understanding of justice. justice is just different in the big data age than in the small data age. the big data age requires us to enact safeguards for human free will as much as we currently protect procedural justice. government must never hold an
8:24 am
individual responsible for what they are only predicted to do. third, most big data analysis going into the future is too complex for the individuals affected to comprehend. if we want to protect privacy in the big data age, we need help, professional help. much like privacy officers aid in ensuring privacy measures are in place, we envision a new class of experts, call them algorithmists if you want, reviewers of big data predictions. we see them take a vow of impartiality, of confidentiality, and of professionalism, like civil engineers or
8:25 am
doctors do. of course, big data requires more than these individual rights safeguards to fulfill its amazing potential. for instance, we may need to ensure data is not held by an ever smaller number of big data holders. much like previous generations rose to the challenge posed by the robber barons that dominated railways and steel manufacturing in the 19th century, we may need to constrain the reach of national data barons and to ensure big data markets stay competitive. we have seen the risks of big data and how to control them. there's a unique challenge in the big data age that society has to be extra vigilant to
8:26 am
guard against, what we call the dictatorship of data. it's the idea that we fetishize the data and endow it with more meaning and importance than it deserves. as big data starts to play a part in all areas of life, this tendency to place trust in the data and cut off our common sense may only grow. placing one's trust in data without a deep appreciation of what the data means and an understanding of its limitations can lead to terrible consequences. in american history there has been a war fought on behalf of a data point: the war in vietnam, and the data point was the body count, used to measure progress when the situation was far, far more complex. so in the big data age, it's critical we do not follow blindly the path that big data
8:27 am
seems to set. big data will help us understand the world better and improve how we make decisions, from what medical treatments work to how we educate our children to how a car can drive itself. it can also bring challenges and dangers. we have to harness the technology understanding that we remain its master, that just as there's a vital need to learn from data, we also need to carve out a space for the human, for our reason, our imagination, for acting in defiance of what the data says, because the data is always just a shadow of reality, and, therefore, it is always imperfect, always
8:28 am
incomplete. as we walk into the big data age, we have to do so with humility and humanity. thank you very much. [applause] >> wonderful, our thanks to viktor mayer-schönberger, professor at oxford university, and kenneth cukier, data editor for the economist, co-authors of "big data: a revolution that will transform how we live, work, and think." [applause] now it's time for the audience question period. we have a number of questions, and i looked at some of them already. if you have more, please turn them in, and if i could ask to have those over on this side as well, we'll do that. i want to get to everyone's questions. i asked the last one. i saw the title before i met
8:29 am
these guys who i'm crazy about, and we could talk for another three days, and it's that everyone always writes these books where the subtitle is live, work, and play, but not them, it's live, work, and think. they are all business. on top of that, it's not like being at work, my kind of work, and i really, really like the questions, and it starts with a couple of the dark sides. what is the worst? the negative, the loss of privacy, well, the list goes on. what's the takeaway? >> i mentioned the danger of propensity, the dictatorship of data, and the privacy challenge. the privacy challenge is one
8:30 am
because the mechanisms by which we protect privacy are rendered ineffective, but, writing the book, we thought more that the propensity challenge is one that often gets overlooked, but going forward is going to become incredibly important, and so what we really want to impress on the audience is not only that big data may challenge the informational privacy that we have, but that it really does challenge the role of free will and human volition. there's a number of possibilities to address that, but that's really what keeps me awake at night. >> well, in a real sense, we're
8:31 am
all automatically collecting and disseminating all the data, much of which we generate ourselves. if you've been to a hospital, you give data out, not just what you sign on a form. do we really have an expectation of privacy in the big data age? >> well, in some instances, we have to ask the question, should we have an expectation of privacy? let's take health care as an example. we developed a legal regime that actively blocks the sharing of health care data. you can just imagine in a hundred years our children are going to look back on us and be bewildered how we could have ever let priceless information that would improve care just slip away, and have a federal government that actively blocked it, not just here in america, but around the world. in fact, what we have to do is have a healthy debate and change the narrative entirely and say, well, perhaps we should make it a condition of citizenship that all health care data of the
8:32 am
individual gets shared. now, it is true that there is a problem. there's a risk of inadvertent disclosure that leads to bad consequences for individuals. we have to look at mechanisms that try to prevent that and police that, but just learning from the data is certainly a social goal. >> you know, it's very interesting because, to say we'll deidentify or encrypt that, i defy you to leave the room and not leave dna behind. we know who you are. with dna data on somebody, we know who you are, so in a real sense, this is a conundrum. go ahead, victor. >> right, but what's happening right now is that a lot of the health care data gets collected, and then it is being used by the health care provider or the insurance company, perhaps to discriminate, perhaps not to discriminate, depending on the regulatory regime in which you live. what the data is rarely used for is research into what could actually help you if you have the condition or how that
8:33 am
condition could be prevented, but what we really need to do is unleash the power of big data on the research side rather than unleash the power of big data on the sort of cost effectiveness of the insurance policy side, and we're pretty bad at that. >> you know, in science, we have, you know, 20 people in the study or 16 people in the study, or even 500 people in the study, and very few long term clinical studies with 15,000 to 20,000 people in them. with big data, we can actually change the face of science; right? >> absolutely, and it's almost laughable because we were in silicon valley today, speaking at facebook, and you absolutely know that facebook would never dare change a pixel on the website unless it actually tested it with millions of people, yet we approve drugs with a few hundred. it's laughable. it really underscores the degree to which we need a new mind set. >> reminding you you're
8:34 am
listening to the commonwealth club radio program. our guests are victor, professor of internet governance and regulation at oxford university, and ken, data editor for "the economist," discussing the pluses and pitfalls of big data. there's video of commonwealth programs online at fora.tv, and everybody wants to know, ken, you know, data editor, what is -- you're not a data input clerk; right? >> i'm not. >> okay. now we have a promotion from the universe. what's a data editor? >> it's a new title. i'm the first one. we recognize there are new techniques in visualizing data and using data as the basis of stories instead of, if you will, anecdote-based journalism where
8:35 am
you talk to a source, moving to pattern-recognition stories where we can interview a database. just as our sources lie to us as journalists, data can lie as well. we have to keep our suspicions up, but what we can do is crunch the numbers, visualize them, and tell a story, and act as a service provider to the rest of the organization to see that that happens. >> in january and february, there was a flu outbreak, the midwest, the east coast, google flu trends telling us where it was, and apparently, google flu trends overestimated the flu outbreak. what happened? >> well, first of all, it's a prediction. it's a prediction that tells us
8:36 am
that 15% of the time you're wrong. it's a prediction game. of course, this is a dynamic world in which you need to rerun your models all the time because if cnn reports on flu trends or reports on the flu season, people might google the flu even though they don't have it. there's a feedback mechanism in place, and, of course, google is compared to the centers for disease control data, so maybe the fluke is in the cdc data rather than the google data. we don't know, and so what we should not do is immediately create a causal link and say, oh, this must be because google's model is wrong. that "because," that causality part, that's dangerous in a big data world. we shouldn't do that. when we look at these spikes and so forth, we should investigate
8:37 am
with an open eye and mind. >> yeah, i would just underline what victor said, which is to say the presumption in the question is that actually the cdc represents the true flu cases; right? google is just simply a shadow of that; right? that may not be true. it could be the inverse. for example, when they did the fitting of the model, we were not in the recession. perhaps now people with symptoms don't go to the doctor because they can't afford to take off a day of work or can't afford the visit. the google trend could be accurate, and the cdc data could have more variability. >> well, i'll let you introduce the cdc. they'll be delighted to hear that. >> they will be. >> no, i'm actually going with you in all directions.
8:38 am
once you know something is being collected, in science, you know that, it's why there are double-blind studies. it's like, if you know you're doing it, you game the system, you know. i mean, do i go out and find out if i have flu symptoms? i mean, we don't know. once it becomes public, how people behave changes, and the data collection is affected, so that's tough. as well, i think it's a massive point that you have, that the cdc is only reporting those people who go to doctors. that changes dramatically, even with the internet, you know, so we have to really be good at this big data role of analytics. for some people, algorithmists, that's based on the algorithm, and some 50 years ago, donald knuth,
8:39 am
to this day the best way to explain it, a stanford professor in computer science, a wonderful guy, obviously, wrote a book called "fundamental algorithms," a textbook, and all undergraduate computer science majors have to take this, and the beauty of the book is that in the beginning, he quotes from the betty crocker cookbook, how it's recipes, step by step: we checked everything we can, and we tried to bring you precisely how we are doing everything, we studied it. and that's all an algorithm is. when you have an algorithmist, this is someone coming from the algorithm and how we look at big data and account for what we toss around here, because it's a dynamic kind of a thing.
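a minimal sketch of the "recipe, step by step" idea above. the example chosen here is euclid's method for the greatest common divisor, a standard textbook algorithm; the speaker does not name any particular algorithm, so the choice is an assumption made for illustration.

```python
def gcd(m, n):
    """Greatest common divisor of two non-negative integers, by Euclid's method."""
    # Step 1: if nothing is left over, m is the answer.
    while n != 0:
        # Step 2: divide m by n and keep only the remainder.
        remainder = m % n
        # Step 3: replace m with n and n with the remainder, then repeat.
        m, n = n, remainder
    return m


result_a = gcd(48, 18)
result_b = gcd(17, 5)
```

like a recipe, each step is precise enough that anyone, or any machine, following it arrives at the same result: here `result_a` is 6 and `result_b` is 1.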
8:40 am
you did speak, victor, about this new job category, if you will. what are the qualifications? who goes into this? what does one need? >> an excellent question. the coming generation of algorithmists needs to know how to put data in storage, not in the old-fashioned structured way, but in the more unstructured storage that we see today, and then they need to look at the data and analyze it. they need to use statistical packages, network analysis tools; there's a wide variety of tools and methods available. they might also need good grounding in the latest of statistics. a lot of the statistical methods we use were designed for the small data age. there could be need to upgrade
8:41 am
or improve them to an extent, and then they might need a sense of visualizing the data as we go into the big data age, and in addition to all of that, we like to imbue them with a theoretical grounding of not just mathematics, but perhaps law and more general theory. oftentimes people who are doing very well as algorithmists are those who come from the natural sciences, particularly physicists who are well trained to deal with huge amounts of data, either in astronomy, through telescopes and the data gathering there, or with particle accelerators, and so that is the kind of interdisciplinary, multidisciplinary mix we need, and unfortunately, relatively
8:42 am
few universities around the world have programs yet to educate algorithmists. i hope that changes. >> now, we have traditional statistics, which i did not do well in, probably not alone there, and we all remember them, standard t-tests, chi-square, levels, etc. do we have new techniques for data, data merging itself? >> yes, there are. we are looking for more approaches than linear regression, which looks for linear relationships: if a increases, then b will increase or decrease in the same way. a lot of times, that's not the case. it's much more complex, the relationship might be more difficult than that, and so we
8:43 am
need some advances there. we need insight there. we need better ways to measure the fitness of a model to data, and today, statisticians have ways to talk about how a particular model fits data. in the big data world, we have to upgrade a lot of these tools, these methods available. that doesn't mean the tools are bad. that just means there's room for improvement. >> victor, also, is it possible to -- what can we regulate with respect to big data and what can't we? >> boy, you always throw curve ball questions. [laughter] what can we regulate? i think what we need to do is to make sure that we are not stifling innovation in the big data age. we both agree strongly that the benefits of big
8:44 am
data outweigh the risks and the drawbacks, but that doesn't mean we can take the risks lightly, and so we need to focus on the risks. we talked about the privacy challenge, the propensity challenge, and the dictatorship of data challenge, and we need to find pragmatic solutions, pragmatic safeguards. in the book, we go into quite a bit of detail in chapter 8 on how to do that, in an innovation- and market-friendly way that still ensures society and the individual are going to be protected. so you ask, what can we not regulate? that's the very question -- here's not a very good answer. i'll crystallize the issue. [laughter] right now, if we go to a doctor, and we are told we have to have an operation, we can ask the doctor, why, and the doctor can tell us, saying, well, i learned this in medical school, this is
8:45 am
why you need the operation. he can point to the textbook or literature, but imagine 30 years from now that the doctor is not making decisions relying on belief, but using big data algorithms, like a commercial airline pilot would never dare land a plane without the benefit of the instruments of the autopilot. we would ask the doctor, why do i need the operation? the risk is that the doctor says, i don't know. you can also say this, you know, more generally. you may ask the bank that denies you a loan, why was i denied a loan? today it's because of your credit rating and these factors. this is how you score. what if, of a thousand variables looked at, there were 400 strong variables and 600 weak, and all of them, in a complicated formula tailored to the individual and always changing over time, were the reason why you were denied a loan? where would the fairness and
8:46 am
transparency be? hence the role of the algorithmist is there to break open the box and give the public the confidence it needs for big data to go forward. >> i think the whole idea of social responsibility, i think, society and government, we can't break them apart. in this case, here's the city of san francisco, roughly three-quarters of a million people in the center, a much larger bay area, say 10 million people if you go that far, and what should the city of san francisco do, priority one, two, three, about big data? >> well, simple. the first thing you need -- >> i love that he said "simple." yeah. >> yeah, great confidence. it's simple because, first, san francisco is in the leadership position in the united states in terms of collecting data and using data and opening the data up. we should applaud the -- >> in what way? i didn't know that. >> yeah, well, there was a
8:47 am
gentleman, i want to say the name chris vine comes to mind, the cto of the city of san francisco who now, i believe, works at the white house, who kind of took a strong leadership position in getting the government to open up data such as crime reports and the public transport data so that developers can build apps alongside it. he's hosted developers to come in and build apps at things called hack-a-thons to bring developers together. san francisco is doing very good things, but where the real gemstone is in the united states is in new york city, and there they have a director of analytics, so san francisco might want to look at that model, and what he's done is the fellow created a small team to act as a service provider to the other administrative agencies in the city. one of the problems the city faces is overcrowded buildings. basically, you imagine you stuffed in ten times or a hundred times as many people in
8:48 am
a building as it should have, and those buildings are at huge risk of fires, and when those buildings catch on fire, the likelihood of a fireman being injured or dying is extremely high. so what he's done is said, we don't know which buildings at the outset are those at most risk for fire, the worst offenders, versus those that are just a problem. we have 65,000 complaints a year to the help line and we just have 200 inspectors. how can data help us? he built a predictive model bringing in all of the data from other agencies, like ambulance visits, whether there's a financial lien, and whether there's brick work done, all in the model, and now when the inspector goes, in the past there was a vacate order to clear the building in 15% of the visits, and now they do it in 70% of the visits, a five-fold increase in efficiency,
8:49 am
so the inspectors love it because they are more effective. the mayor loves it, bloomberg, a data guy, doing more for less in the age of austerity, and the fire department loves it because it means less danger for the firemen. >> if we get the right data and right analytics to the right people, we can make a big difference? now, i always get chagrined, and others as well, when someone says, no, we talked to 12 people and it describes 12 million, and this is the thing: if i don't believe you with some people, how will i believe you with all these people? what's the argument there? is it just going to get worse? >> well, in a way, if i take your question in a larger context, in a way that gets to the heart of what big data is. in a way, in the -- in the small
8:50 am
data age, the way we approached problem solving, decision making, was that, because we were starved for data, we would have a theory about how the world would work, and based on this theory, would then develop a hypothesis, and then based on the hypothesis, we would go out and collect the data necessary, the sample necessary to prove or disprove the hypothesis, and then if it was disproven, we would go back and change the hypothesis a little and try all over again to collect the data, do the analysis and so forth. that was called the scientific method, a trial and error, step-by-step approach working reasonably well in the small data age, but it is an artifact of the small data age. in the big data age, we can do more. if we have not just a sample of data, but close to all of it, we can look at details,
8:51 am
general areas, look at subgroups and subcategories we couldn't look at before, but we can also let the data speak in the sense that we can use big data to produce hypotheses and test them. take the example of google flu trends. when they tried to find out, from 50 million search terms, which 45 were the best to predict the spread of the flu, they had no clue which of the 50 million to pick. in the old-fashioned days, you would pick the first, try it, it didn't work, pick the next, try it, it didn't work, and every time, sample again; right? potentially. now, that's crazy to do that. what is the exact combination? that's crazy to do. what you want is a method by which you can create a way of
8:52 am
producing hypotheses and testing them. in a way, we are using big data analysis not just to tell us whether we're right or wrong about a hypothesis, but to come up with good hypotheses, and that's just the teaching. >> now, are they checking the terms just because they computationally can be checked at the same time, or is there a new strategy for coming up with the terms? >> yes, so what they are doing is they take the 50 million most common search terms, and they essentially try each one to see if it improves the model, and if there's one that's good with the model, they try another. >> are they saying, does it say "elephant" or "sniffles"? >> no, that's the point. it's essential. they are not making prejudgments of what the useful term is. so, for example, in the top 100 terms was the term "high school
8:53 am
basketball." high school basketball is played in winter time, seasonal flu happens in the winter. there's a fit there; right? there's a correlation, but keep in mind it was deep in the 60s or 70s, that term. the model tried the 44th term. it was good. it improved the predictability of the model. it tried the 45th term. it worked. it improved the predictability of the model. it tried the 46th term, the model deteriorated, and they cut it off at 45. >> so what's the model -- what do they compare it to? to what the cdc says? >> cdc historic data. >> okay, which we decided is okay, no problem. >> keep in mind, there's another wrinkle to it. if you get new data from the cdc, you rerun the model; right? you continue to learn from the past. >> and you check it against the past as well to see this. it's, obviously, tricky,
8:54 am
progressive, it gets better over time as long as you don't think at any one point in time you have the data, and there is the answer. i think that's part of it. we're moving ahead now, building a history of big data now that we didn't have before, really, in the big thing, and, honestly, i did not come up with this question. this person asks, given the field is growing so fast, what's your estimate of the shelf life of your book? >> you know, a book is timeless. >> it will be on the shelf the rest of your lives, how's that? >> the book is timeless, and the reason why -- [laughter] >> he would say that, would he not; right? [laughter] >> and the reason is that it is the first book out of the gate to define this new trend, and this trend is not just going to affect business. it's not just going to affect government. it's not just going to affect health care. it's going to affect everything. it's like computing in the 1950s. if you're asked, well, where do you think computing's going to
8:55 am
go next? in what industries would it be useful? the person has to honestly answer that's not the right question, because by the year 2013, computers will have wormed their way into everything until they are almost invisible; right? so, too, society now is going to learn from data. it's going to be datafying, learning from big data. it will change the way we approach things. the future is going to be based on information. the way we have self-driving cars is not because we program a computer to drive a car. we tried that. it failed. it's because we pour in a lot of data and let the statistics of the machine teach itself, infer what to do. the light is red, stop; the light is green, accelerate. >> now, remember that social network participants are not representative, and this is just one example within really large
8:56 am
data sets, like all the dna, all the dna generated by the complete genomes generated by the new national cancer hub, etc. we're talking about large, complex data sets; if you just can't look at it easily in an excel spreadsheet, it's big data. you are correct. i think that asks some of the questions raised here. how do you judge the quality of the decisions that we get? >> i think that you need to understand the limitations of the data sets that you collect, otherwise you run the risk of repeating the problem of 1936, if i recall correctly, where the reader's digest erroneously
8:57 am
predicted a republican landslide in the presidential election. they did that because their sample was biased. now, they had a large sample, and if you have half a percent of the population you sample, but you do it well, then that gives you a good position of what the public thinks. if there's 3% or 5% bias in the sample, that doesn't improve anything, that makes it worse, but if you, in the big data age, collect 97 or 99% of the whole data, then even if that is slightly biased, that 1% that you're not collecting is not going to undo it, but we have
8:58 am
the right direction, which is often good enough. >> it's like, how can a hundred frenchmen be wrong? it's not right, but it's such a great body here, and intuitively you know, and there's not an example to disprove that. now, can you discuss -- this is the last question unless you don't have an answer, then there's another one -- because we actually have come to the end here -- can you discuss big data as it pertains to climate change? >> yes, obviously. big data is important for all global challenges, but at the first step, what we need to do is quantify the problem, and so the era of quantification -- quantification is moving to datafication. we are all sensors with the
8:59 am
mobile phone. a company has a clever app, take a photograph of a leaf or an animal on a path, and it'll tell you what it is. the service is not really designed for you to be able to identify what the leaf is and the tree, but it's now when you have many, many people doing this is able to identify is spring coming early this week? or this year, rather, or are these what exist in a certain climate suggesting that climate change is creeping up further. this is to say as we instrument the lives, bedrock of society, we are going to be able to put a measure and a quantity to things and things like climate change, may not avert it with big data, probably can, but we can identify it and take steps. >> thanks to victor, professor of internet governance and regulation at oxford university,
9:00 am
and dennis, co-editor for the "economist," authors of "big data: a revolution that will transform how we live, work, and think." we thank the audiences here and on the internet held by the commonwealth club technology forum exploring visions of the future through science and technology. we also want to remind everyone here that copies of the guests' new book are in the lobby on sale and they are pleased to sign them outside the room immediately following the program, and we appreciate you letting them make their way to the signing table as quickly as possible. i'm host of tech nation on npr, and now this meeting of the commonwealth club of california, the place where you're in the know, is adjourned. [applause]