tv The Communicators Brewster Kahle Internet Archive CSPAN April 4, 2020 5:07am-5:48am EDT
5:07 am
continues, to provide an unfiltered view of government. already this year, we have brought you the presidential impeachment process and now, the federal response to the coronavirus. you can watch all of c-span's programming on television, online, or listen on our free radio app and be part of the national conversation through c-span's daily "the washington journal" program or our social media feed. c-span, created by private industry as a public service and brought to you today by your television provider. the communicators is at the state of the net conference in washington, d.c. we will show you interviews conducted with members of congress, government officials, and technology leaders. brewster kahle, what do you do for a living? >> i run the internet archive, an internet library on the internet
5:08 am
that gives away software for free, trying to build the internet into the library of alexandria for the digital age. peter: that sounds like the internet, doesn't it? brewster: the internet is getting there, but the average life of a webpage is only 100 days before it is changed or deleted. peter: 100 days? brewster: so we've built our culture on ever-shifting sand. the archive takes a snapshot of the webpages on websites every two months. snapshot, snapshot. it's been doing it since 1996 and offers it as a free service, the wayback machine on archive.org, and it is used by hundreds of thousands of people a day. all of these things have disappeared, either maliciously or sometimes just -- they drop off the net. peter: how many websites are there today? brewster: hundreds of millions, and they are coming and going all of the time, but we collect
5:09 am
about 800 million pages every day. the total collection is about 800 billion urls. it is kind of huge, and it turns out that is only part of what we do. we also archive television, abc, nbc, fox, but also international television, and if you go to tv.archive.org, you can find clips of what other people said and put those on blog posts. the idea is so people can quote, compare and contrast, think critically about what happens on television. with the daily show, jon stewart, he did something like that, going and saying he said this, now he said that. can we do that now? it is used by journalists and end-users all the time. it is a free library, a library on the internet. peter: why couldn't i just go to google and type in jon stewart?
5:10 am
brewster: you can still find the jon stewart show and they may have put up certain clips from there on youtube, you might see a smattering, but you don't know what show it came from, it doesn't have that context of television. ours is just a run of television. you can pick bits and pieces of television before we shut it down, to make it so the publishers aren't unhappy with us, but if you want the whole thing, we print it on a dvd or thumb drive and lend it to you, and you have to send it back. people want it for documentaries and the like, and go to the publishers to say can i use this clip for my documentary? it's just like a library in the sense that you are borrowing things from the library. we also do this with books. we digitize several thousand books a day, about a million books a year now, and digitizing
5:11 am
these and weaving them into the net so that more and more wikipedia footnotes, if you go to a footnote and it has a page number, click on it and it opens right to the right page. you can page back and page forward, but if you want more of it, you have to borrow it, and if someone has already checked it out, you have to wait, but at least you get a couple of pages so you can fact check and go deeper than wikipedia. if wikipedia is the encyclopedia of the internet, we want to be the library of the internet. where do you go deeper? how do you get to the published pages of humankind? peter: what kind of law department do you have to have at the internet archive to handle all the rights? brewster: there isn't any law department at all. we are a library.
5:12 am
the idea is to not offend people or make them feel like they have been taken advantage of, so we don't make any money. we're very much a nonprofit library and we cut things short, like television. it is just clips. with the music collection, with cd's, we try to link it over to spotify, so we have the album art, but it is only selections -- 30 seconds. the older stuff is wacky and fun, so those are downloadable and you can listen to them, but they sound -- like the ones you crank up, the horn and the dog, like that. that century is largely forgotten because it wasn't put on to records and cds. peter: how are you funded?
5:13 am
brewster: the same way wikipedia or npr is funded. end of the year "please" donations, we get grants. a third of our donations come from libraries to collect webpages for them. we collect webpages for the national archives and the library of congress. we have a room inside the adams building. you should go visit it. it is part of a room in the library of congress where they bring book carts down and we are digitizing all day long. we have 20 locations around the country and now the world, digitizing books. ok, you think, shouldn't this all be done by a robot, or hasn't it been done by now? it is that it hasn't been done. if you look at the number of books on the internet archive, it goes up, up, up to 1923 and
5:14 am
there is copyright. everything beyond that is somewhat restricted, so it goes up and then crash. then decades of almost nothing online, then it comes back up again at the end of the 20th century or 21st century. we are missing the 20th century, and amazon itself -- all right, so it is not online, i can just buy the book. if you go to amazon, and people are studying what books by decade are available on amazon new, it goes up, up, up, 1923, crash. the 20th century is basically not online, so we think there is so much information on it, and there is. some of this is good, but the 20th century, the published material is almost nonexistent. it is almost not there, so we are raising a generation, and ourselves, really, on not the best we have to offer.
5:15 am
we basically have amnesia about the 20th century. that's a pretty important century to not forget. we will be doomed to repeat it if we just forget the lessons from other times, so we are trying to go through the 20th century. better world books is donating all the books we don't already have to the internet archive, and they get those from -- and we are trying to basically fill in the 20th century and make it so all those wikipedia footnotes turn live. we even went and fixed the broken links in wikipedia. the executive director of wikipedia was afraid that the truth might fracture if we didn't work on trying to make wikipedia stronger, cited by better sources, that people would be citing sources that are available, but not good, and though citation words behind the
5:16 am
scene on articles are based on how good those articles are and if you can see them. we committed to going and fixing all the broken links and filling in all the books and the journal literature that is linked to from wikipedia. we fixed 11 million broken links in wikipedia in the last couple of years and now, we are going through all of the books, finding them and replacing those with a blue link so you can click on it and go to it. if the books are missing, we try to find those books, digitize them, put them up. peter: how did you come up with this idea? brewster: it was a vision of the internet that a bunch of us, certainly i had, of what i wanted the internet to be. in 1980, why don't we go and make the library of alexandria for the digital age? we had to build the computers and the internet and the world wide web, and i helped participate in this.
5:17 am
yeah, internet hall of fame, i've been at this stuff for a long time. i was building things before the web, helped get the publishers on the web, but by 1996, we had enough momentum that i thought i could turn to building a library. the idea is to make all the published works of mankind just one click away. if you are in a rural place in africa and want access, you should have access. that was the dream i signed onto. we are in 2020 and still not there yet, but there is a mounting number of us saying, let's get there. it's a good idea to make a hyper-connected set of information. let's do that. some of what is motivating me is misinformation, fake news. people are just making stuff up and not being called on it
5:18 am
because you can't get to the cited material. you can't go and say here's information. people are just making stuff up and we can't live that way, so we've convinced the whole generation to turn to the net. we don't go to libraries anymore the same way. probably not to pull books, except kids books and things like that, audiobooks, great. reference materials? it is the net, and the net isn't good enough yet. we are working on it. we are the 300th most popular website. we have 4 million users every day that come to us and look for information. some people just want to live in their bubbles, but an awful lot want to go deeper and the internet archive is part of that ecosystem. peter: you had a little company called alexa at one
5:19 am
point. brewster: it was a company amazon.com bought. it is actually not the little talking widget. alexa was named for the library of alexandria. i worked directly for jeff bezos for three years, terrific time, really smart guy, and hopefully -- peter: hopefully he paid you in stock. brewster: he did, and the smartest thing i did was not sell all of that, so it has helped the internet archive grow and grow. thank you to jeff bezos and steve case, who bought my company before that. he ran america online. a company that america online bought. i've been very fortunate, but it was all toward the goal of building the library. since 1980, i've only had one idea and so i'm just trying to stay at it.
5:20 am
by 2020, october 2020 -- i set this goal years ago -- let's be able to say the internet is a library. the internet is a library and it will have all the features that we grew up with, whether it is the old periodicals, reliable access, or a card catalog so that you can find things. can we actually make the library of the digital age come to be, one that has enough to raise educated citizens? if we don't, we are going to end up with a generation that learns from whatever they have in front of them, and if it is paid-for stuff from political points of view or foreign points of view or just trolling people that are making stuff up, we are going to end up with a mess. we are sort of seeing that play out, so why don't we go and
5:21 am
stand up and help out the facebooks, the twitters, that are trying to make referenceable material -- not as much as they should be -- but how do we make it possible so people can go and know what it is they are looking at? it may be made up, but at least you can know it is made up based on the analysis of the authors of the materials. how can we build an internet that is a global brain that we can learn to trust? right now, we are in this position where it is starting to be scary out there. people are starting to worry that maybe the internet is just -- but we don't have another alternative to go to otherwise, so how do we go and reinforce, make the websites that want to be better able to be better, more referenceable? how do we help authors, contributors? how do we give them access to the
5:22 am
library of the books in the library so they can reference right to it? how can we give the readers -- my favorite thing recently, with weaving the books into the web with wikipedia, was my next-door neighbor. she's 15 years old, and i was telling her we are going to digitize books, weave them into wikipedia. she lit up. she said i want that. i never get a rise out of my 15-year-old next-door neighbor, and i said why do you want that? she said my school doesn't let me quote wikipedia in my research papers. that's not good enough. you have to follow through, and if i could click on it and open the book, i could do my homework in the middle of the night. that's good, right? that's what we want. we want people to be able to go deeper and make it so that publishers still sell books up a storm, and sell even more books, but readers get the best of books, music,
5:23 am
radio, old periodicals, so that they know where it came from and what they can trust. peter: you have nine months for your 40-year-old goal. are you going to make it? brewster: well, we are trying to get -- they say in silicon valley, the minimal viable product. can we have enough to do this? phillips academy andover, they went and had the full library, they lent it to us so we could digitize it, and we now have the full library of one of the best prep schools in the country, as a high school library for anybody that wants that access. isn't that great? marygrove college, which is a college that just went out of business, unfortunately, in detroit. it was a catholic girls' school and became coed, but just last year was its last time, and what they did with their library was
5:24 am
they donated it to the internet archive, and now, we're in the process of digitizing it. over the next nine months, we will have a college library and a complete prep school library, plus about 1.2 million other books, and if we can get up to a total of about 4 million books, an $80 million project, so a lot of money but doable, we would have a yale, princeton, or boston public class library available to anybody who wanted it on the internet. that's the dream of what we are going for. we will start with these first steps, and weaving them into wikipedia for people to find them. that's just on the book side. the web side is going well and we are using it to help journalists be able to know when things are being disappeared by people, and being able to keep some of the web referenceable, even though they may have been taken away.
5:25 am
peter: what are the mechanics of digitization? does someone have to stand there page by page by page? brewster: let's take book digitization. it holds the book like this so it doesn't break the binding, and it raises and lowers glass with a foot pedal. think of it as a workout. you raise and lower the glass, a person turns the page, and it goes click-click. ok, shouldn't that all be done with a robot? we tried. i tried to create a robot company that would get this to work, and it ripped books and was inefficient and broke a lot, so we just said, let's just have people do it. people are doing this now at a couple thousand books a day. google has already digitized an enormous number of books, and some of them are available, but
5:26 am
they got caught up in copyright, so our approach is digitize and lend, where we have a physical copy, we digitize it, and only one reader at a time can read it. so you can get a couple of pages to preview it, like amazon's look inside the book, but if you want the whole thing, you check it out for two weeks. then it comes back and the next person who wants it can get it. any time there is one book or three copies or other libraries have those, they can lend them out, as well. it is restricted. it is not even all that great because it is pretty restricted, balanced with the copyright interest to make sure there are no more copies floating around than were originally purchased from publishers. peter: brewster kahle, in 1980, when you came up with this idea, was it a lightning strike or was it just a gradual thought
5:27 am
process? what were you doing? brewster: i was walking over the charles river. a friend of mine posed this question, which has really haunted me, although it has directed me all of these years, which was: brewster kahle, you are a technologist, you are also a utopian idealist. paint a portrait that is positive of your technology. we are good at complaining about things, whether it is nuclear war or nicaragua, but coming up with a positive vision was much harder. i could only come up with two ideas. one was trying to save people's privacy, even though people are going to throw it away. the other was build a library of everything. i thought a library of everything was too obvious, so i started working on the privacy one and i found it was too difficult to try to make
5:28 am
cost-effective privacy devices by making chips in 1980, so i went to plan b, and i've never turned back. there are a number of us who had this vision of what the internet, the world wide web should be, and it is time to deliver. we've made progress. it is easy to say the internet is terrible, but it also has all sorts of terrific things and participation by lots of people. but we need better tools to make our way through it. it feels like a deluge. it feels sometimes even threatening to people, and with people actively spreading disinformation and misinformation, we need better tools, so i'm not going to let this go the wrong way. there is a large number -- we are 150 people at the internet archive, but there are thousands and thousands of others who are
5:29 am
all participating. wikipedia, the digital public library of america, mozilla, the open source world, they all have the same general dream of building something that is more than just ourselves. it is an information interconnection that connects people with information that they need. it gives them an idea of what they can leave behind by writing things that will endure. that is the dream of the internet that i'm still after and many, many others are, as well. peter: what was your role in the development of the internet and the world wide web? did you have one? brewster: the actual internet -- i was on the side, more or less. for a time, i was part of the engineering steering group of the internet, how you build it, but i was not the leader of that.
5:30 am
there was a system that was the first publishing system on the internet and i did that, it was called wais. it came before the web, which is probably why i am in the internet hall of fame, but when tim berners-lee got the technology going, all of these technologies folded in. it was part of that, but the web was better. i tried to get publishers online. i got "the," "the new york ," "ap," i got them all on board by getting these things online, so the open world worked. this is a time when it could have been in the small silos of lexisnexis or compuserve, aol,
5:31 am
they were very controlled, but we wanted an open environment where everyone could be a publisher. a little bit of wild west. i was a key part of that era, but once that era started going and i sold to aol, i started building the library itself. beyond that, we are trying to re-architect the web to be more decentralized. can we make a decentralized web? even though you may be blocked in some countries, you still get access to it, or if one publisher goes away, then it is still replicated in other places. a peer-to-peer backend for the web is a new and exciting development that is coming out of some of the same people in bitcoin and other decentralized technologies. can we keep the web architecture itself moving forward? peter: you mentioned the charles river in boston, cambridge. were you employed by m.i.t. at the time? brewster: i was a student at
5:32 am
m.i.t., studying artificial intelligence, and my minor was buddhism. some of the era. i got to learn -- one of the great things i learned was think big. come up with a goal that you won't achieve in your lifetime. achieving your goal is a little overstated. you can come up with a big idea, whether it is artificial intelligence, which seemed like a good idea at the time, and for me it was universal access to all knowledge. that was something bigger than just me, that a large number of us could all work on together without having to work for each other. besides, you just don't achieve it. peter: ai in 1980. what were you working on? brewster: neural networks were in style. some of the neural nets that became the vision systems of the tesla or the siri on your
5:33 am
telephone, all of those were neural nets that were actively being worked on at that time, but i felt that we were data starved. i thought we were basically trying to build a machine, but without the memory. we were trying to study and have a machine learn from too little. if we are going to build the next generation beyond humans, let's have them read good books. i figured that was a good idea. we could go and do that, so i talked with richard feynman and stephen wolfram and said why don't we do this? that's a great idea. so we tried to figure out how large was the library of congress, the largest library in the world, and when we did this
5:34 am
in 1980, 1981, 1982, we knew computers were getting faster and bigger, so we knew exactly what we would be able to store: all the words in the library of congress, and we knew all the movies. we just charted it out and that curve has been true, but what was a little depressing is that we've long since passed that. all the words in the library of congress: 28 million books, a million characters, a megabyte, per book, 28 million megabytes, 28 terabytes, that is four hard drives that you can buy in best buy for less than a month's rent. all the words in the library of congress, so it has been true for a long time that we can do this. why hasn't it been done? i've covered a little of the institutional g, lack of funding, lack of support in the next generation's learning tools, but we knew back then how long it would take us to do this.
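The storage estimate quoted above can be checked with a few lines of arithmetic. This is only a rough sketch of the figures as spoken in the interview: the one-byte-per-character assumption, the decimal definition of a terabyte, and the function name `library_size_bytes` are illustrative choices, not anything stated by the speaker.

```python
# Back-of-the-envelope check of the figures quoted in the interview.
BOOKS = 28_000_000          # "28 million books"
BYTES_PER_BOOK = 1_000_000  # "a megabyte per book" (assumes ~1 byte per character)
DRIVES = 4                  # "four hard drives"


def library_size_bytes(books=BOOKS, bytes_per_book=BYTES_PER_BOOK):
    """Total plain-text size in bytes for the given number of books."""
    return books * bytes_per_book


# Decimal terabytes (10**12 bytes), the unit drive vendors use (assumption).
total_tb = library_size_bytes() / 10**12
per_drive_tb = total_tb / DRIVES

print(f"{total_tb:.0f} TB total, {per_drive_tb:.0f} TB per drive")
```

Under these assumptions the text comes to 28 TB, which is about 7 TB on each of the four drives mentioned.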
5:35 am
i started on the chart, we need a supercomputer to be able to do this, so i helped build the connection machine supercomputer, and one of the first applications after we got the chips to work on the supercomputer was to make a search engine. we used it at dow jones electronics. 400 magazines and newspapers, and you could just ask it freeform questions. you could go and find articles and say i like that one. find me more like that one. all the google things, this was back in the 1980's. it was 15 years before google. it was the first search engine on the internet, and it came early. we all use google, which is much better than what we had then, but it is all from these technologies that all of us built. we built parallel computers, we networked them, all of this to
5:36 am
get this vision built. how do we build a smart machine? how do we build a global brain? how do we build ourselves into a better, smarter society where you have computers and networks of other people to augment what you are and how smart you are? that was the idea. that's the dream, and it is actually a large part of what makes the internet so fun and wacky. it's wacky, and i really never would have known about icelandic goth music, but i'm listening to it right now. you just got what was on the radio. on the internet, it is a much broader range and you can go and explore. how do you do that? by networks, computers, search engines, by people participating and trusting that they can go and expose themselves and their ideas on this open internet.
5:37 am
99.9% of those people are not getting paid. we've basically made a system that people are building together as a big societal project, and that's the wonder of the world wide web. i think in these times we beat up on ourselves and say this isn't going right and there are all these things that aren't going right, but there are a lot of things that are. i think the internet and how sharing people have been, how trusting people have been, is something we should absolutely preserve. we should make sure that the system doesn't make people all go back to the keep your head down. my parents' advice from their parents when they went off to college: keep your head down. be careful what club you join. not only for what they are, but what they might become, because of all the mccarthy stuff. keep it quiet, just do your job. keep it down. that wasn't my generation at all.
5:38 am
get out there, try stuff, put stuff out there, fail fast. let's give things a try. keep a north star that allows us to keep going and doing what we want to do. let's make a world for this next generation that is as supportive as it has been for all of us in this generation, and let's go and think a little more critically sometimes about how it can go wrong, be abused by government, how it can be abused by corporations, and keep our greed down, because some of these technologies are just too powerful to be in the hands of just a few people, and they have accumulated too much power. so i'm not saying it is all roses out there, but let's not go backwards to a pre-internet world where there were such restrictions on information that you only got the state-sponsored approach toward particular
5:39 am
ideas. let's not go back to just three hours of television news every evening. that's all there was for us growing up, and it was very restrictive. it was very prescribed. let's try to figure out how to have multiple points of view, but without the abuse in there. that is the challenge now. how do we not throw out the baby with the bathwater? how do we put the greed under some control without squashing freedom of expression? peter: you predicted my next question, which is the monetization of the societal project that we are all contributing to. is that the biggest danger, in your view? brewster: i think it is one of the biggest failings. roger mcnamee is right in his critique of facebook, the advertising model. if i have one real regret for being involved in all of these eras of development of the
5:40 am
internet, it is not putting in place a better business model. we didn't have a digital money system that worked very well. we couldn't make it so you could do transactions. my favorite business model is the royalty system. advertising has this problem. it is winner take all, so you might start with your own advertisers but there are these advantages to having an aggregated ad network. those aggregated ad networks leverage the same sales team for multiple properties. that's when you get magazines all collapsing on each other or television networks all collapsing in on each other. on the internet, it is even worse than that because of surveillance capitalism. companies want to watch you across all sorts of websites and it is spooky and terrible. i'm not sure there is much to save the advertising model from being spooky and terrible. i'm not sure how you regulate them, but there are other models. there are subscription-based models, fairly winner takes all.
5:41 am
but books had a royalty model. gutenberg started in 1452. by 1610, we had a model that worked quite well, which was royalties. you paid one person, the bookseller, and a little bit of money went back upstream. not enough. it wasn't that authors were ever paid enough. don't let me say those wonderful days, no. if you have a strict point of control, then you are in trouble. there were many readers, many booksellers, many printers, many publishers, many authors, and they didn't get all constricted. if there is ever a constriction point, someone can exploit that monopoly and go back upstream and cream everybody out. what is happening in book publishing is we are down to five major publishers and they are all under threat by another
5:42 am
because they control so much of the marketing that they can go and set prices or content restrictions. they can do anything they want, so they control so much of the pipe that i'm really worried. my model of how to make a successful industry work out of the internet, even books are under threat. a royalty system would have been great. we didn't put it in place. we have some of the tools for having digital currency and being able to go and buy smoothly on the internet, but if you want to sell something on the internet, you have to post it to amazon or itunes, and they will handle all the transactions for you. that is too much control in one place. peter: brewster kahle, you mentioned your parents. who were they and what did they do? brewster: i grew up outside of
5:43 am
new york city. my parents, my father worked for a fortune 500 company, commuting on the train every morning into manhattan. i tried that for a summer. gosh, that was a sacrifice my parents made for the kids that i wouldn't do for mine. living in the 'burbs and spending all your time commuting. i live in san francisco right in the city, but i really appreciate the good schooling that they afforded me. i could graduate college with no debt, so i could pursue going and working in a three-person little startup making children's toys out of college rather than the ibm offer i had also gotten. i wanted to do something else, so that was a support that my parents gave in terms of letting me grow, and i really appreciate it, and i just wish there
5:44 am
were more kids able to graduate college with no school debt. what a burden. what a burden that would be if you didn't find a job right away. i feel very fortunate, but san francisco is home. it is -- i love the dreamers. when i was trying to figure out where to put the internet -- this is in the late 1980's, and bill dunn, the head of dow jones electronics. he came up with the term metadata. he said the metadata is more important than the data itself. this was revolutionary in its time, bill dunn. where should we do this? go someplace where people don't think you are crazy. where would that be? he said, if you were starting a publishing company, he said he would think about l.a.
5:45 am
you should maybe go from boston, which is where i had been for 10 years, and strike out. i struck out for san francisco, and the wonder of san francisco is people don't call you crazy. they'll say, what's your idea? that's a neat idea, can i help? is there some way i can participate in this? there was a support network that was all yes's, rather than no's, which is what i was feeling in the 1980's on the east coast. there was also a recession. that's when i picked up, and it turned out to be right. a lot of the internet and new technologies and the web was developed at cern, and tim berners-lee moved to m.i.t. that was at the same time i was reestablishing in san francisco, because that's where i thought the wacky people would go and try to figure out what they
5:46 am
should be. not everything in san francisco is roses, and i think it has changed a lot. there's an enormous concentration on just making lots and lots of money, which i don't think is -- it is not like people engrave on their tombstone, i'm awesome because i made lots of money. that isn't the way the world works. you do good things, relate to good people, feel good for what you have done. i wish we could tone down some of the greed. peter: brewster kahle is the founder of the internet archive and he has been our guest. brewster: thank you very much. peter: a reminder that this communicators program, as well as all others, are available as podcasts. >> television has changed since c-span began 41 years ago, but our mission continues.
5:47 am
to provide an unfiltered view of government. already this year we've brought you primary coverage, the presidential impeachment process, and now the federal response to the coronavirus. you can watch all of c-span's public affairs programming on television, online, or listen on our free radio app and be part of the national conversation through c-span's daily "the washington journal" programs or our social media feed. c-span, created by private industry, america's cable companies, as a public service and brought to you today by your television provider. >> officials with the world health organization held a briefing in geneva, switzerland on the coronavirus pandemic. they confirmed there are over a million cases globally and more than 50,000 deaths. the managing director of the international monetary fund described how they are collaborating with the who in responding to financial needs.