tv Tonight From Washington CSPAN January 18, 2010 8:30pm-11:00pm EST
8:30 pm
>> google recently threatened to leave china after cyber attacks that originated there. we look at that dispute on "washington journal" for 40 minutes. >> host: robert knake is a fellow with the council on foreign relations, here to explain the dispute between google and china. take us back to last week. what happened? >> guest: well, what happened last week is that google took the unusual step of letting the world know that they had been hacked, probably by the chinese or by agents or groups affiliated with the chinese government. and the target had been not only
8:31 pm
information on chinese dissidents who use google products like gmail, but also their intellectual property and the intellectual property of up to 33 other groups. >> host: what could the end result of that be for google? could the result be that they take their business entirely out of china? >> guest: that's what they're threatening to do. google has for three years, since 2006, been in china and allowed the chinese to restrict access to google search terms that they find objectionable: tiananmen square, june 4th, tibet, the dalai lama, things like that. and google said they would no longer do that, beginning immediately. >> host: for viewers, most everybody uses google every day here in the u.s. and around the world. google has different sites for different countries around the world. are other countries able to say to google the same sorts of things, we
8:32 pm
want to have this much input into what sort of content you can deliver to users in that particular country, say the uk or france or any other country where google may have a presence? >> guest: there are other countries that will restrict very limited search terms. countries like france and germany do restrict things like searching for nazi propaganda, or buying nazi paraphernalia like swastikas on the ebay auction site. so that is a common practice; countries other than the united states have some restrictions. but few countries have made the investment that china has made in restricting the internet and restricting freedom of information into the country. >> host: even though access is restricted, this is one of a number of pictures from late last week of chinese, not protesting, but certainly showing some support for google. how is google viewed by most of the chinese? >> guest: google has a third of the total search market. baidu, a local company, gets
8:33 pm
the majority of chinese traffic. but google tends to attract wealthier, better educated, more influential middle-class chinese in the major urban areas. and so google will be a major loss for them, because they rely on google to get more information than they can or do get from other companies that tend to be more in line with the chinese communist party's interests. >> host: we're going to open up our phone lines, and of course you can send us a tweet. (202)737-0002 for democrats. (202)737-0001 for independents. (202)628-0568. so what is the next step for google? what is the next logical thing that is supposed to happen here? >> guest: google in china and
8:34 pm
the chinese government are currently trying to figure out what the next steps are, since google has opened up their google.cn website and is no longer restricting the search terms, which is contrary to chinese law. so right now there's a discussion going on privately between google and the chinese government as to whether or not they will be able to maintain that site and maintain any presence in china at all, since they are unwilling to filter their search results any longer. that's the next play for google in china. >> host: in terms of government response, this article from saturday's newspaper is about plans to issue an official protest to china over the attack on google. in the article, they write that the state department's planned action coincides with the speech on internet freedom that secretary of state hillary rodham clinton is to deliver on thursday. and she's expected to allude to the incident. quote, when she talks about this issue, china will be one of the countries she points to, an
8:35 pm
administration official said. how important is the speech and the administration's action going to be? >> guest: i think there are two issues here in the google-china dispute. one is the censorship and the united states' position in regard to that. the other is the issue of chinese espionage, particularly economic espionage targeted at companies. the state department's first response, the demarche, is going to simply ask for an explanation. can you help us understand how this happened? can you help us understand why it happened? and can you help us understand who carried it out? what secretary clinton will also do next week is make an announcement about how the state department is going to sponsor internet freedom initiatives, how we can, through aiding and abetting human rights organizations and internet freedom organizations, help chinese dissidents and other dissidents
8:36 pm
in iran and north korea and elsewhere gain access to the internet. >> host: we wanted to give folks an idea of just some background on google. it was founded in 1998 by larry page and sergey brin, and the chairman is eric schmidt, also an advisor to president obama. it incorporated in september of 1998. it also has annual revenues of some 5.94 billion. how much of that income comes from their china operations? >> guest: very little. about 1%. they have about 300 million, i believe, market share in china. that's not that significant. what is significant is that china is now the largest internet market in the world. there are 360 million internet users and that number will double over the next decade if not sooner. so by leaving this market, while right now it's not a big part of
8:37 pm
google's revenue, it is a big part of the internet revenue that they can expect in the future. they also are putting at stake the potential billions they could make in china in the phone market, with their new android platform. >> host: let's hear from viewers. detroit, charlie, democrats line, for robert knake. hello there. detroit, you're on the air. >> caller: good morning. good morning to the guest. google makes its revenue with advertisements. something in china is different from in india, the united states, or europe. it depends on what the searchers want to know. in china, they're not interested in your father's column discover
8:38 pm
america political for freedom. there is 1.3 billion chinese that went through what they want and the lessons of their culture so, you young man sitting there, trying to bring the cold war phenomenon divided in groups, freedoms and restrictions, so on and so forth. it is 21st century. >> host: we'll get a response. thank you for the call. >> guest: the first thing to understand is that technically google was operating its own site within china, google.cn. cn is for china, the chinese national domain registry. the reason they chose to do that in 2006 was because the google.com website that most americans will go to when they want to search something was
8:39 pm
often being blocked or censored. so they went into china and they worked with the chinese government to establish the google.cn domain and they've been targeting and tailoring their content to the chinese market. and i think that they have become fairly successful at doing that, particularly within a coveted demographic, an elite that's got money to spend, that can respond to their advertising. >> host: democrats line, hello there. >> caller: i've been listening since early this morning with the martin luther king day trying to get in touch with these guys. i'd exactly just wanted to ask, can i do online school. i basically want to ask, how do you know that google is the only search engine that they have access to? how do we know that they haven't tapped into something else or
8:40 pm
hacked into something because on the released information we've got information no better than mentioning anything or they know better of the same sort of thing to transform us. twitter, myspace, there's so many search engines that so many kids, young kids, older kids, elderly people that we all go on to. i pay my bills, everything online. like, how do i know that's going to be safe? >> host: i think he mentioned that china's largest engine is another company, baidu. >> guest: yes. i think what the caller is asking about relates to the chinese hacking, to the espionage targets and the other companies. are other online sites like twitter safe to use, is facebook safe to use, and is google safe to use? and i think the answer to that is that we know in this hacking attempt, yes, there were at least 20 and possibly up to
8:41 pm
30 other companies that were targeted. the reason we know this is that when google found out they were being hacked, they hacked back. and they got onto a server in taiwan that had information on it that had been exfiltrated, not only from google but from dow chemical, joachim martin, from an american think tank, from an american law firm and from other companies. so this incident, yes, is fairly widespread. more broadly, there is a consistent and evolving problem involving network infiltration and insecurity. it's not that easy to secure these systems. >> host: and i'm reading a report that this exposed some problems with internet explorer in this attack, correct? >> guest: yes, that's true. what the attackers did was they used a previously unknown vulnerability, what is called a zero day threat, i'm sorry, zero day vulnerability, to carry out
8:42 pm
their attack. so there was an error in the coding of internet explorer that nobody else knew about, that microsoft didn't even know about. and they used that to get into the system. >> host: i want to ask our producer to bring up, if we can, the google site in china. this is google.cn, so here in the united states we're able to see the site. and i believe he has typed falun gong into the search box. we can look at the site and compare side by side, typing search items into google.com, the domain we see, and also the one in china, and see the differences here. but the chinese cannot see google.com in china, correct? >> guest: that's correct. there are ways to get around it. china has, though, what some people call the great firewall of china. essentially they've taken their portion of the internet and secured it at the borders so they can filter the content going in and out and limit it, and then within their borders, they've allowed
8:43 pm
sites like google.cn, which they can directly control. instead of ebay they've got a local auction site called alibaba. and so that way they are able not only to stop content from getting in, but also to control the content that's produced domestically. >> host: we'll go to coatesville, pennsylvania. and good morning, marie, on our independent line. >> caller: good morning, america, how are you? i think we need to be very careful about talking about spying and censorship in china when we have that in our own country by our very own government. witness the aid of the verizon and at&t phone companies in the spying upon america due to the patriot act, weirdly named, i
8:44 pm
think it is the most unpatriotic act ever passed. also, the silencing of the press in our country on the war in iraq. witness the protests, millions of us out in the street, and our press was afraid to cover it. any criticism of our government at that time was totally silenced. also, we should be wary of google's reputation. what took them so long? they've been going along with this. is it because they got caught pirating? >> host: marie, thanks for the call. she asked, what took them so long? >> guest: i think what took them so long is they did a three-year experiment in china to decide if they could make that market compatible with their company values. i think this was essentially the
8:45 pm
last straw for them when they realized the chinese government was not only censoring their public, but was also breaking into companies like google that had been cooperating with them to steal information so that they could further repress dissent within their country. and i think that sergey brin, one of the founders of google, he's from a family of people who fled the soviet union because they were being persecuted for their religion. so i think he has been, in many ways, the moral compass of google. and i think he pressed very hard to say okay, this experiment is not working. i think google can be defended for going in originally. i think in the three years that google has been there, though they have restricted some searching, a lot more information has been made available through google than through any other website or any other web portal into china. so there is a degree to which google is completely forced out
8:46 pm
of the country. there will be a loss for the chinese people. >> host: didn't yahoo!, jerry yang, come under fire for some business practices? >> guest: and led to the imprisonment and they were heavily criticized for doing so. yahoo! has since sold their stake in yahoo! china to another company within china. they maintain some investment and yahoo! is no longer directly in china in the market because of that incident. and yahoo! has stepped up as one of the few companies to support google in its protest against the chinese. >> host: and as we go to our next caller, we give viewers a look at and listeners a reading of comparisons of the two top search engines in china, baidu
8:47 pm
and also google. by revenue, baidu with 53 -- 62% of the revenue, google with 33%. baidu, 62% of searches, google 15%, and you can see the other search engines are far below that. let's hear from mountain view, california. robert is on the republican line. >> caller: good morning, everyone, and thank you so much for c-span. i'm here in silicon valley and i've been in china for about six years working back and forth. and i also have been in the software outsourcing business. and, you know, one thing that's kind of ironic is the investment money that started alibaba actually came out of the investment companies here in silicon valley. and there's a very large contingent of chinese software engineers who work at google, who work at yahoo!.
8:48 pm
there is such an integration between china and silicon valley. it's almost like the same and if you will. so robert, have you looked into this? have you really seen the reality of our american investment companies that have invested in china? would have duplicated the google model because that's, you know, that's what they are doing. >> guest: i think if i understand the question correctly, it's have i looked at the degree to which silicon valley and the american investors have invested in china. and yes, i think on the whole china is a huge market. it's an expanding market and it's a very important market for our technology companies. i think in no way would i guess
8:49 pm
that google's act is going to show a further pull out of the market by other american companies. microsoft already stood up and said, absolutely not. they're in china for the long haul. they're going to stay there and they don't see the benefit to them as a company or to the chinese dissidents if they pulled out. >> host: address the political issue on this. what sort of pressures does the administration feel? what do they have to do in response outside of what google may do? what would the administration do? what would congress want the administration to do in terms of defining u.s. business practices in other countries? >> guest: there's a bill before the house right now that would essentially ban american companies from doing business in any country that restricted the freedom of the internet. so that if you were -- if you are a company that wanted to do business in china, you would not be able to -- i don't see that bill going very far.
8:50 pm
i think what has happened because of this is, one, on the censorship issue, president obama had already raised that very quietly and very subtly with the chinese in his visit. i think now he's going to raise that a lot less subtly and i think secretary clinton is also going to raise that issue. i also think we'll see some support coming out of secretary clinton's speech next week for developing anti-censorship networks that would allow chinese dissidents and the chinese public to get outside the great firewall of china and be able to access the rest of the internet uncensored. i also think this has created a lot more pressure on the issue of cyber espionage, particularly directed against the economy. this is something that was never on the president's agenda with china. it was never something that would be publicly discussed. mostly we understand that the administration has quietly protested to the chinese over certain hacking of government systems that they really find
8:51 pm
beyond the pale. i think now the issue of targeted espionage and theft of intellectual property by the chinese is going to become a major issue. domestically, over the last year, the president and his administration have not wanted to tackle cyber security as an issue. it took almost a year to appoint a cyber czar and during that time there was a lot of concern that any new cyber security could harm the economic competitiveness of companies like google. i think now that seems a little ironic. i think now the issue of cyber security and helping the private sector to protect the wealth is going to become part of the obama administration's platform. >> host: john in michigan, go ahead. >> caller: good morning. i appreciate that this is available on the calling show here. i recall being initially disappointed by what i heard about google's willingness to
8:52 pm
act with us for the censorship as a condition of their business in china. i believe it was something in quite a noble experiment, however in retrospect it went a little more distance from it personally. really my question would be for the guest and perhaps for the other callers that it seems to me at this point google has now given a taste of what i believe is the corporate culture of google, you know, the freedom of demands by the book bangs etc. as being something that if it goes away, then they will miss it and appreciate it to what extent it could have been a free
8:53 pm
uncensored opportunity for learning as the net really is. and really, my question would be, don't you suppose, really as a matter of world civics, that it would be a fine thing if it were to be the case that google were to hold the line? i would say that would be a freedom loving and patriotic type of thing to do, although i don't want to introduce the government of national patriotism here. >> host: all right, we'll get a reply. >> guest: i think google is acting out of their self-interest in many ways by taking this stand. it's unlikely that china is going to cave, but i think it's become very important for google and other internet companies to see the spread of the internet. there are about 1.5, 1.6 billion internet users today.
8:54 pm
google can continue to grow at the pace that it has, but only if that number continues to increase and take in a larger population of the planet. so i think google needs to see an open, a free internet in order for their model of making the world's information available to the world's users to work. >> host: coral springs, florida, and this is chris on our democrats line. go ahead for robert knake. >> caller: when google first entered china [inaudible] and we wouldn't build is the classic image. so i just heard that they open that up and tried that test again on the chinese version of google. it is the seventh most popular image, the most popular result for that phrase. so really the likelihood was the
8:55 pm
chinese users of google to meet the circumstances of that firewall and find out the image to your destruction of google. >> guest: i think that if you were a fairly technically savvy user, though not an expert, and you had the means, you could certainly circumvent the great firewall of china. you could use a virtual private network where you encrypt your traffic out to a server somewhere outside china, in the united states or europe, and then from there you could conduct searches. so the capability to do that certainly existed. again, i think most users in china just weren't motivated to do that, and doing so did violate the law. but it certainly was possible. the percentage of chinese internet users who availed themselves of those services, i don't know.
8:56 pm
>> host: robert knake, our guest, talking about google in china. about 15 more minutes with your calls. we go to savannah. good morning, on our independent line. >> caller: i'd like to know what the chinese people's reaction is to this and, for the long-term future, what is this going to do to their access to information? >> guest: i think that the chinese people's reaction has been fairly muted because news of the story has also been censored within china. i think certain populations had become aware of it. and so, we've seen some beautiful demonstrations of users within china's love for google and love for the information that they got through google. they've lit candles, they've laid flowers outside google's headquarters in china. since then, the chinese universities have banned that activity for their students, so they would risk expulsion. so we've seen a lot of evidence that there are people within
8:57 pm
china, within that elite community of google users who really do love the service. i think there's very little chance that the chinese government is going to acquiesce and say google, you can be in china without any censorship. but i think over the long haul, this issue will start to undermine and threaten the chinese communist party's control over the country. i think since tiananmen there's been somewhat of a deal between the chinese elites and the communist party, which is: we won't ask for democracy, we won't demand freedom like we did in '89, as long as our economy continues to improve and as long as our daily life and our standing in the world grows. i think in this incident, there is the potential that some chinese elites are going to start saying, hey wait a second, you really are starting to impact our national
8:58 pm
competitiveness. we don't have freedom of information. if i can't google the same way that an american can google, that hurts me and i don't like that. >> host: what sort of role did the summer olympics in 2008 have in sort of expanding the awareness of searches, of information available? >> guest: i think that in 2008, the chinese government did open up a lot of portals. they also put more restrictions on their own people, so they were able to, i think, create something of a façade where foreigners traveled and they had access to foreign sites, but a lot of the chinese population didn't. the chinese have incredible control over their portion of the internet. when there were riots in the region of china, china was able to cut off internet access for six months to the entire region. everyone in it. so i think we shouldn't underestimate the degree of control that they do have over their internal web. >> host: and on that point,
8:59 pm
"the new york times" reports this morning that china has restored text messaging, and xinjiang is the name of the province. the chinese government continued to ease its six-month-old blackout in the northwest region by restoring some text messages on sunday. that's according to the state news media. next up, san diego. >> caller: i'm just not buying this. google is absolutely wretched in dealing with china. they worked with the central government in establishing the great firewall of china. the reason google is reacting to this intrusion is because their revenue stream will be based on cloud computing. they're willing to walk away from the billions they threw into the chinese investment because more billions are at stake in google. i am not buying any of the spin. >> host: he used the term cloud
9:00 pm
9:01 pm
acknowledge to the world that they were hacked and had data loss. most companies don't disclose that, either publicly or to the government. in this case, google did both. they informed the u.s. government and the fbi in december and this last week they went public globally. so if they wanted to protect their reputation and security, the best way to do that would have been to have said nothing. >> host: michigan, good morning to becky on the democrats' line. >> caller: good morning. i just wanted to say go google because i remember when microsoft first went into china and i heard they were going to go along with others, not putting certain things on the search engine and i couldn't believe american companies, especially microsoft that does so many good things. and, you know, to think that they were actually going to go
9:02 pm
along with this, and i feel the same way i guess about google originally doing it. but what they are doing now, i say go, you know, and i just hope they don't get hurt by it. i would stand by them any way i can because it is just wrong, wrong, wrong for people not to be able to know things. >> host: next is a need new jersey adam on the independent line. good morning. >> caller: how are you guys doing today? my first question is could you explain the great firewall? i heard that on the news a couple weeks ago and can't really figure out what that is. and also, is your current speaker a member of the group or has he ever attended their meetings? thank you. >> host: great firewall question if you want. >> guest: the great firewall and green dam youth escort are two different programs. the great firewall of china is a moniker we in the west have applied to what
9:03 pm
the chinese have done, but essentially they've taken portions of the internet and at the points where it leaves their country they've placed controls on those routers so they can screen all traffic inbound and outbound through the country and eliminate speech that they find offensive, block websites they find offensive, screen for pornography. green dam youth escort was a program they wanted to put on every new pc to be sold in china, including those made by american companies like dell, h-p and apple, and it would have essentially been a censorship technology on the desktop. it would have given the chinese control over basically any new desktop sold in the country. >> host: and a user wouldn't be able to override. >> guest: a user wouldn't be able to override that. what it turned out is the software had a lot of vulnerabilities and it probably would have reduced china's
9:04 pm
cybersecurity overall rather than increased it. >> host: to texas, robert, good morning on the republican line. >> caller: good morning. i would like to make a comment about the chinese. i am 81 years old and i am a korean veteran and i fought over there against the chinese. i can remember so plainly they would send these korean citizens blowing their pupils after we fired through them here comes the chinese. we didn't win korea because of the chinese, and we lost in vietnam. my brother was in vietnam, the chinese ran us out of there. >> host: thanks for the call this morning. albuquerque, new mexico. john on the democratic side. go ahead for robert knake.
9:05 pm
>> caller: thank you for taking my call. first to the korean veteran, guys like that, without them we wouldn't be free. would that have anything to do with them shipping their goods? i know the state department got involved in the nike situation. the shoe sellers are selling shoes made in the nike factories; they didn't get the licenses through beaverton, oregon. and all of the golf manufacturers are outraged because they will come out with an expensive set of clubs and produce the same, like, to come from the same factory with a generic name and sell them cheap on ebay or other sites that come out of china on the internet. free trade ought to be for everybody, not just the big corporations. >> host: thank you. we will get a response. >> guest: i think the caller is right. there is something related to the misuse. this is at heart separate from
9:06 pm
the censorship issue: the theft of intellectual property, where chinese companies or chinese individuals or the chinese government have been perpetrating attacks against american companies, trying to steal their intellectual property, taking the designs for everything from fighter jets to pharmaceuticals off companies' servers, bringing them back to china and producing goods in china without having to pay the development costs themselves or without having to license the development. so i think it is very much a related issue, and on the whole the security of the supply chain is something that will increasingly be important where we are importing goods from china that may contain spyware or malware, in things like digital cameras, pocket thumb drives and other electronics
9:07 pm
that may be compromised before you have even plugged them into your computer and have gotten them on the web. >> host: next missouri, stephanie. go ahead. >> caller: my name is stevan. >> host: i'm sorry, go ahead with your question for robert knake. >> caller: my question is what is with all of the covering up they're doing over in china, why all of the security and blocks and everything, what are they trying to hide over there? >> guest: i am not a china expert. but my understanding is that the chinese government fears that too much information, too much information on democracy, too much external influence could destabilize the country and hurt their power base. a lot of totalitarian and authoritarian regimes have been afraid of the internet since it came on line, and i think in countries like north korea they
9:08 pm
made the decision to simply restrict access to the internet. very few people have internet access at all and those that do can only go to a couple of web sites within the country, like the leader's home page. china has taken a more open approach where they've said we see value in the internet and in giving our citizens internet access, but on a very limited basis, simply because they want to maintain their power and maintain their economic growth without having any kind of popular democratic distractions. >> host: cnn news wrote a piece on friday, or thursday of last week, about google and other businesses in china. what considerations are businesses like, well, yahoo! to its limited extent, or other businesses considering now in the wake of these -- this
9:09 pm
conflict between google and china? >> guest: i think a lot of people have raised the issue of microsoft. microsoft has come out very clearly and said they don't support google's action and they're going to stay in the market. the bing search engine hasn't caught on in china very well, but this has opened the door for microsoft to move in with bing and they're going to make a heavy push to become the number two search engine behind the local baidu. the issue for most american companies isn't so much about what they sell in the chinese market but whether or not their security is good enough to prevent their intellectual property from being stolen, and i think when they look at their mainland china operations they're going to be increasingly concerned about whether that is a place where their intellectual property is slipping out into their competitors' hands. >> host: one more call, lexington, kentucky, on the democrats' line. george, make sure you mute your television or radio for feedback
9:10 pm
and go ahead with your question or comment. >> caller: good morning. i think you all had a guest that called in, a businessman that moved his business to china, and inspectors were to come into the business and the workers were told they only work five days a week, 40 hours a week, when in reality they worked seven days a week, 24 hours a day and he was very unhappy but moving to china was the only way he could make money. i am behind google if they pull out of china just to make a statement. they are a communist country,
9:11 pm
and i think it is high time that companies not tolerate the totalitarianism even as american companies. >> host: any final thoughts? >> guest: in the coming weeks we are going to see this issue play out more. i'm not sure we are going to learn more about what happened to google or these other companies, but i think we are going to see the chinese respond and take some action either to block google completely or restrict the search engines in some other ways, and then i think we will probably see the state department respond with a tax strategy that is going to give, hopefully, people living in the second regimes access to a free and unfettered internet. >> host: robert knake with foreign relations, thanks for coming by this morning. >> guest: thanks for having me. >> now a discussion of the role of google in history and its program to digitize books.
9:12 pm
speakers include the manager of google books and media and information professors from george mason university and the university of california. from the annual meeting of the american historical association, this is two hours. [inaudible conversations] >> hello to everybody and thank you very much for coming. on and shom martin from the university of pennsylvania and executive director of the american association for history and computing, and we are extremely happy to be cosponsoring this session with the research division of the american historical association on is google good for history, which is a very engaging and interesting topic, and i hope a lot of you already have questions and thoughts in mind for discussion. so we have three great speakers that are going to be talking for about ten to 15 minutes each, maybe a little bit more, and i think what i will do is introduce all
9:13 pm
of them to you now. they will talk in that order and then we will have time, plenty of time i think, for questions at the end. so, without further ado, i will introduce the three speakers, and the first is daniel cohen, associate professor in the department of history and art history at george mason university and director of the center for history and new media. his own research is in european and american intellectual history and history of science, and most notably for this group he is co-author of digital history: a guide to gathering, preserving, and presenting the past on the web, from the university of pennsylvania press. no personal connection there, i can assure you. our second speaker is paul duguid, adjunct professor of information at the university of california, berkeley, and professorial research fellow at queen mary university of london and visiting fellow in business management at york university in
9:14 pm
the u.k., and also honorary fellow for the institute for of abortion and enterprise management. his current research focuses on the history and development of trade marks, and again, notably for this forum, he's written two articles about google and the issues google raises for history in particular: one, inheritance and loss? a brief survey of google books, in first monday, and also limits of self-organization: peer production and laws of quality, also from first monday. and last but certainly not least is brandon badger, product manager for google books. he's been at google for four years and worked on google earth and google maps, and he studied computer science in college and assures me he got a b-minus in one of his history courses, so he is well qualified to talk to us today. without further ado i will let the speakers go.
9:15 pm
thank you. >> it's great to be here and to talk about this topic. is google good for history? of course it is. historians are searchers and sifters of information and evidence, and google is probably the most powerful tool ever given to us or any other human being to do that. it has constructed a deceptively simple, powerful way to scan billions of documents instantaneously. it spent hundreds of millions of dollars of its own money to allow us to read millions of books in our pajamas. good? how about great? but then historians, like other humanities scholars, are natural born critics. we can find fault with nearly anything, and we do, and this disposition is unsurprisingly exacerbated
9:16 pm
when a large company consisting mostly of better-paid graduates from the other side of the campus muscles into our turf. had google spent hundreds of millions of dollars on the library at harvard we would have complained about all of those steps to the front entrance. partly out of fear and envy, i think, it is easy to take shots at google. while it seems an obsessive book about google comes out every week, where are all the volumes of criticism about elsevier or other large information companies that serve the academic market in more troubling ways? these companies, which also provide search and information services, charge universities exorbitant rates for the privilege of access. they leech money out of budgets every year that could be going to other, more productive uses. google, on the other hand, gives us google books, google scholar, scanned newspaper archives and more, often besting commercial
9:17 pm
offerings and often being completely free. in the bigger picture, away from the myopic obsession with the biggest company of the moment, and i'm sure some of you can remember the same obsessions and diatribes against microsoft and ibm in prior eras, google has been good for history and historians, and one can only hope they continue to exert pressure on those who provide costly alternatives. of course, like many others who feel a special bond with books and our cultural heritage, i wish google books was a project not under the control of a private entity. for years i've called for a public project, as have others, or at least a university consortium, to scan google books, excuse me, scan books on the scale google is attempting, and i admit i'm envious of the announcement in france to spend a billion dollars on public scanning. in addition, the center i work at and direct, for history and the
9:18 pm
media has a longstanding partnership with the internet archive, to put content in a nonprofit environment that will maximize its utility and distribution and make content truly free in all senses of the word. i would much rather see google books at the internet archive or the library of congress or somewhere else. but the likelihood of a publicly funded scanning project in the age of tea party reactionism is slim. long-term readers of my blog and other writings know that i've not pulled punches when it comes to google. to this day the biggest spike in readership on my blog was when, early in google's book scanning project, i put up a scan of a human hand covering a page of plato, and the post ended up on digg, and since then it's been one of the many examples detractors of google have used to show a lack of quality in the library project. let's discuss these quality issues for a brief moment since it is one point of obsession in the academy.
9:19 pm
it is an obsession i feel, as you will see, is slightly misplaced. of course google has poor scans, and as the saying goes, haste makes waste. i have yet to see a scientific survey of the overall percentage of pages that are unreadable or missing, and they are a minuscule fraction of the total, and as john of google books has noted, when you're dealing with a trillion pieces of metadata you are sure to get a million or 2 million wrong. let us also not pretend the bibliographical world beyond google is perfect. many of the metadata problems in google books come from library partners and others outside of google. more important, google has remedies, and i assume that brandon will talk about these, for many of these inadequacies. google is improving its ocr capability and metadata correction, often in clever ways. it recently purchased, or acquired, the recaptcha system, some of you may have heard of it, from carnegie mellon, which uses
9:20 pm
unwitting humans to transcribe difficult or smudged words from old books when they log in to sites, and they've recently added a feedback mechanism for users to report poor scans on the page they are viewing. so i find myself a bit nonplussed by quality complaints about google books that have engineering solutions. that is what google does. it solves engineering problems extremely well. indeed, we should recognize, and not without criticism, as i will note momentarily, that at its heart google books is the outcome, like so many other things at google, of an engineering challenge and its associated series of mathematical problems. how can you scan tens of millions of books in a decade? it's easy to say they should do a better job and get all the details right, but if you do the calculations with those variables, as i assume brandon and his team have done, you will probably see that a nearly perfect library scanning project would take 100 years rather than ten. that might be a fine tradeoff,
9:21 pm
but that's a different argument or a different project, and those involved in ocr know that getting from 99% accuracy to 99.9% accuracy, which by the way would still have hundreds of thousands of errors, would probably take an order of magnitude longer and an order of magnitude greater expense. so that is a trade-off google decided to make, and as a company interested in search, where near-100% accuracy is unnecessary, and considering the possibilities for iterating toward perfection from an imperfect first version, it must've been an easy decision for the company to make. google books is incredibly useful even with these flaws. although i was trained at places with large research libraries of google books scale, i am now at an institution far more typical of higher ed, with a mere million volumes and few rare works. at places like that, google books is a savior, enabling research that could only be done if you got into the right places.
9:22 pm
i regularly have students find new topics to research and new discoveries through searches on google books, and you can only imagine how historical researchers and other scholars and students feel in even less privileged places. despite its flaws, it's undoubtedly true google books will have tremendous impact on historical scholarship around the globe over the coming decades. it is a tremendous leveler, a democratizer of access to historical resources. google is also good for history in that it challenges age-old assumptions about the way that we've done history. before the dawn of massive digitization projects and, equally important, indices, we necessarily had to pick and choose from a sea of analog documents. all that searching and sifting we did, and the documents and evidence we chose to write on and continue to choose to write on, were and are, let's admit it, prone to error. read it all, we were told in graduate school. but who ever does?
9:23 pm
we sift through large archives based on intuition. sometimes we find important evidence out of sheer luck. we sometimes made mountains out of molehills because we only have time to sift through molehills, not mountains. regardless of our techniques, we always leave something out in this analog world. this world has rarely been a world of comprehensive historical search. this widespread problem of anecdotal history, as i have called it, will only get worse as more documents are scanned and go on line. many works of historical scholarship will be exposed as flimsy and haphazard. the existence of modern research technology should push us to improve historical research. it should tell us that our analog, necessarily partial methods have hidden from us the potential of taking a more comprehensive view, aided by less capricious retrieval mechanisms which, despite what detractors might say, are often more
9:24 pm
objective than leafing rapidly through the folios on a time-limited visit to an archive. in addition, listening to google may open up new avenues for exploring the past. in my book equations from god i argue mathematics was generally considered a divine language in 1800, but over the course of the 19th century it was secularized. part of my evidence came from mathematical treatises, which often contained religious, divine language in the early 19th century but lost that language toward the end of the century. by necessity, doing my research in the pre-google books world, i had to pick and choose. my textual evidence had to be limited. i could only read a certain number of these mathematical works. and so, and i'm sure this sounds familiar, i chose to focus on the writings of some high-profile mathematicians. the vastness of google books for the first time presents me and others the opportunity to do more comprehensive scanning, for me of victorian mathematical writing, for evidence of
9:25 pm
religious language. this holds true for many historical research projects. so google has provided us not only with a free research tool but also with a helpful, direct challenge to our research methodology, for which we should be grateful. is google good for history? of course. but does that mean we cannot provide constructive criticism of google to make it the best it can be, especially for historians? of course not. i would like to focus on one serious issue i feel ripples through many parts of google books. for a company that is a champion of openness, google remains strangely closed when it comes to google books. google books seems to operate in ways very different from other google properties, where google aims to give it all away. for instance, i still cannot understand why google doesn't make it easier for historians
9:26 pm
such as myself who want to do technical analysis of historical books to download them more easily. if it wanted to, google could make a portal tomorrow to allow people to download all public domain books. i've heard the excuse from google: we spent millions to digitize this, we are not going to just give it away. and yet google has also spent a similar amount, hundreds of millions of dollars, on android and wave and chrome and other software projects, and they are giving those away. google's hesitance with regard to the book project shows openness only goes so far at google. i suppose we should understand that; google is a company. it's not a public library. but that is not the philanthropic aura cast about google books at its inception, or even today in dramatic op-eds touting the social benefit of google books. in short, complaining about the
9:27 pm
quality of the google scans, i feel, distracts us from the larger problem at google books. the problem, especially for those in the digital humanities but increasingly for all others, is that google books is only open in the read-a-book-in-my-pajamas way. to be sure, you can download pdfs of many public domain books, but they make it difficult to download the ocr text from multiple books, which is what you need for more sophisticated historical research. and when we move beyond the public domain, google has pushed for a troubling restricted regime for millions of so-called orphan books. i would like to see a settlement that offers greater, not lesser, access to those works, in addition to greater availability for what cliff lynch called computational access to google books: a higher level of access that is less about reading a particular page image on your computer than applying digital tools to many pages or books at one time to create new knowledge and
9:28 pm
understanding. this is partially promised, i should say, in the google books settlement in the form of text mining research centers. just two of them. but the centers will be behind a velvet rope, and i suspect the casual historian, which includes most of us, will be unlikely to ever use the centers. google has a liberal api, or application programming interface, for most of its services. it provides only the most superficial access to google books. for a company that thrives on openness and the empowerment of users and software developers, google books is thus a bit of a puzzlement. with much fanfare google recently launched, evidently out of internal agitation, what it calls a data liberation front to ensure portability of data and openness throughout the company. on dataliberation.org, the web site for this front, which i encourage you to visit, there is a list of 25 google projects and how to maximize portability and
9:29 pm
openness. these are virtually all of the services at google. sadly, google books is nowhere to be seen on the site, even though it also includes user-created data such as the google books library feature, not to mention all of the data, that is books, that we paid for with our tax dollars and tuition. so while the che guevaras put up their revolutionary fists on the one side of google, on the other side colleagues work with a circumscribed group of authors and publishers to place restrictions onto large swaths of our cultural heritage through a settlement that few in the academy support. john and dan and brandon badger have done a remarkable job explaining the internal process at google books and i applaud them for that. it's better than the engineers and product managers at places like microsoft or ibm. but still, the project feels removed and alien in a way other google products are not. that is partly because they are
9:30 pm
lawyered up and hamstrung from responding to some questions academics have or from instituting more liberal policies and features. the same chutzpah that would lead a company to digitize entire libraries also led it to go too far with in-copyright books, leading to a breakdown with authors and publishers and the settlement we have in front of us today. we should remember the reason we are in a settlement now is google didn't have enough chutzpah to take the higher, tougher road: a direct challenge in the courts and public opinion or congress to the intellectual property regime that governs many books and makes them difficult to bring on line even though their authors and publishers are long gone. while google regularly uses its power to alter markets radically, it has been uncharacteristically weak in attacking head-on this property tower and its powerful corporate defenders. had google taken a stronger stance, historians would likely have been fully behind their
9:31 pm
efforts, since we face the annoyances that unbalanced copyright law places on the pedagogical and scholarly use of textual, visual, audio and video evidence. i would much rather have historians and google work together. while google as a research tool challenges the traditional methods of historians, historians may very well have the ability to challenge and make better what google does. historical and humanistic questions are often at the height of complexity among engineering challenges google faces, similar to and even beyond, for instance, machine translation. and google's engineers might learn a great deal from our scholarly practice. google has been optimized over the last decade to search through the hyperlinked documents of the web. but the same algorithms falter when faced with the challenges of change over centuries and the alienness of the past and the old books and documents that historians examine daily.
9:32 pm
because google books is the product of engineers with tremendous talent in computer science but less sense of the history of the book, or the book as an object rather than bits, it founders in many respects. google still has no decent sense of how to rank search results in humanities corpora. bibliometrics and text mining work poorly on these sources, as opposed to the highly structured scientific papers google scholar specializes in. studying how professional historians rank and sort primary and secondary sources might tell google a lot, which it could use in turn to help scholars. so ultimately the interesting question for me isn't is google good for history. it might be is history good for google, and to both questions my answer is yes. thank you. [applause]
9:33 pm
>> good afternoon. thanks very much to whoever it was, i'm not quite sure who, invited me here. [laughter] i'm grateful to be invited and slightly apprehensive, both for the same reason: because i'm not a historian, and so talking to you about a historical question is tricky. i'm also not an engineer, so i have a feeling i am a shell on the platform here. i do have perhaps a slightly embarrassing division. i teach a history of information which in 15 weeks goes from twitter. the claims it around that are not very strong. i was looking at my evaluations for last semester, which asked what do you not like about this course, history of information, and there was a one word answer: history. [laughter] so i've not done a good job of representing you in other parts of the world, i'm afraid. let me, if i may, looking at the
9:34 pm
broad question again, make pretty much the same step daniel made, and so where the question says is google good for history, talk mostly about google books, the sort of shift, i think, i was going from the famous scribble, scribble to scan, scan, mr. google. two reasons for doing this one is the was the premise, maybe three, because i sound off on the topic and a third because life in splendid by historians for doing so. so that we see if i can't get back up further in my letters as it seems to be fought. but my answer in fact is exactly the same as dan's. is google books good for history? absolutely. is it good enough? my answer is no. dan set the tone of the answer, but i look at different points, some of which i think he wanted to dismiss and say are not important. so let me try and outline three issues around this. i will go through three. i want to say what i think is wrong with it as manifested at
9:35 pm
the moment, and why i think it went wrong and therefore will be harder to fix than we might imagine, and then finally get to the point of why it seems sometimes surprisingly hard to criticize google. so that is a different take. dan seems to think that google is an easy target and everybody's taking shots. i actually take the other side: people tend to think too highly of google. and that i would remember my kristoff to deserve a lot of criticism and was a good thing to do. and i think we must criticize -- i think that probably the people i am most at odds with, and this is the one remaining neutral on the platform, were actually librarians. i think librarians -- wi-fi gun-free site. libraries, it seems to me, in general, and i'm going to make some exceptions with regard to google, are the people that could have held google's feet to the fire and failed to
9:36 pm
do so. so in a way there is a sort of story, which i haven't found the punch line for, of the an englishman and an irishman went into a bar kind: an engineer, a librarian, and a scholar embarked on this project. who won, was the punch line i am still trying to see. let me first say what i think is wrong. i will explain this a little later, what i think of, and i really think of it this way from working in silicon valley, as a resource and constraint issue. i will amplify that later, but we look in some ways at books as paper-bound collections of information, and if we can come along and extract the information and leave the paper behind, that will make life a lot easier. and in many ways it does. but i think it overlooks the fact that books are actually, and i'll make this point again, enormously complex and annoying things. but they are that way in part because they embody physically a
9:37 pm
resolution to a lot of very complex questions about human communications. they may not do it perfectly and they often get in the way, but a lot of the difficulties we have in communicating are resolved in the book. when you take that away you have to think a lot about what you are losing as well as what you are gaining. my argument would be that the one reason google books, i think, is fraught with problems is because the metadata is bad. dan thinks it isn't such a large problem; i tend to think it is a large problem, because when you take away those almost invisible resources in the book, the way it was built, you actually have to provide yet better metadata to help resolve the remaining information that you have. information doesn't speak for itself and its own validity. if you write a check to somebody and they look quizzically as
9:38 pm
if to say should i trust it or not, it doesn't actually help to lean forward and write good for payment on the front of the check. that isn't going to get someone to rush off immediately and say that makes it okay, that's a little more information. my general theory is you have in some ways to think a lot about the ways in which you try and complete data and the way it has been trying to lead in the past, and i think that google has made a hash of that because they didn't understand the problem. let me say again, you always have to say this, but i use google and google books extensively. in fact i just finished a piece on the contribution of the union label to innovation in trademarks in the united states. google books was fantastic in many ways for lots of ways i was able to make the argument: things i wouldn't have known existed, things i couldn't have said. i rely on it enormously, but i was also at the same time using something like the
9:39 pm
database infuriating those days, i was using 18th-century books online, and many of you may have read the wonderful piece on the inadequacies of that. the difference is quite extraordinary, and the point is not to say simply, and i agree with dan on this entirely, wouldn't it be wonderful if they had managed to scan to the extent that google does, but on the other hand wouldn't it be wonderful if google managed in some way to arrange some kind of metadata in the way that they have, and that is what i would like to have come together. the question of metadata i'm going to pass over quickly but point you all to a paper my colleague wrote for the chronicle of higher education, geoffrey nunberg, about google books as a metadata train wreck. he pulled up series after series of things that google had gotten wrong. what is interesting, and i will mention this later, is he got vilified for that, not so much by historians but very much by librarians. the people who thanked him for it were actually
9:40 pm
the people that dan has mentioned. both dan and john went to google and thanked jeff for the fact of these millions of metadata errors in the database. so they took it seriously. what are the things they got wrong? misidentification by title, i will refer to one of those later. masses of misidentification by author, with henry james writing madame bovary. masses of misidentification by classification, with thomas brown's tabare real classified under gardening. and masses and masses of misidentification by date, so i was delighted to find i had written a book published in 1879. you may wonder why i remain looking so young. that last point of course, it seems to me, is critical for historians. what do historians have going for them if it isn't dates? [laughter] and famous updates, you guys haven't got a lot left. [laughter] okay, so is it surprising they went down this road? no, i don't think so.
9:41 pm
i don't remember him talking to me in 2004 and saying we are going to scan books, and i said which ones are you going to scan? he said every one of them. that will resolve all the problems of selection. that is what we got a very good way to begin, and i think that, talking to people at google and talking to other people that work on google books, my sense is, though they will never say it publicly, that they took on a lot more than they realized they were taking on. the other thing i discovered, and i haven't had this confirmed by google, but i talked to a lot of the early libraries, the first six that were working with google: when google came to them, they asked for the books and were offered metadata. and google said we don't need metadata. that's library stuff. we can do it with algorithms. in fact they tiptoed back a little later, as dan clancy will admit, and said that stuff is called metadata, we wouldn't mind if we had a little of that. and to get away with them.
9:42 pm
to do away with metadata, you need, as they have suggested, very good algorithms. i suspect that is not enough, but you need very good algorithms, and google has many of those because people like brandon are enormously intelligent people. but you also need to understand the problem, and my hunch is google didn't understand the problem, because that in a way is what the scholars do, and if you push dan clancy he'll say yes, we don't really understand that stuff. and why didn't you talk to the scholars? one reason, i think, is librarians stood as a kind of proxy between the two, and that wasn't very helpful. so what is the issue? in a way i think this is it. what we have got out of google books is a splendid, wonderful, fascinating, early and irreplaceable, marvelous, fantastic, can i say it again and again, bunch of books. but it isn't a corpus. and that is a huge difference, because if we are going to invest in the library of the
9:43 pm
future, that is what we want. we don't simply want the old library with a great running over it. we want something a lot more adventurous than that, and we haven't got that, i think. my way of thinking about what went on is to quote from the admirable george bush and say it was a case of misunderestimation. [laughter] i think they didn't really understand what the problem was, and some evidence for that of the and things like how they were discussing the project under way it got to me. when jeff, my colleague, was describing some of the rotten metadata at google, a librarian stood up, a senior librarian, and said don't worry about all the broken metadata. you can find that book another way. and when i made a few remarks about google books, just as well in fact, a rather sort of critical historian who dislikes almost everything i say about google wrote online: google books
9:44 pm
is wonderful. saved me many trips to the library and a lot through the internet -- into the library. and i hung my head and thought, if that is really what we have invested huge resources in, because libraries have invested, not just google, and all we are getting is a way to save a trip to the library and the serendipitous finding on search, that is a pretty sad outcome. and i think others have argued this is probably a once and for all scanning. put clancy to the wall again and he will admit, you have to push him quite hard, nobody is ever likely to take this task on again. google squeezed that space out. okay. so it's also stopped a lot of other scholarly endeavors. i talked to major research centers with a huge amount of money whose directors are saying to them give up that scanning, google is doing it purely as a
9:45 pm
people with particular domain expertise are being told to stop because a bunch of people who have no domain expertise at all are going to do it better. and in fact if you look in the google settlement, one of the things they are claiming is they are going to start building on the fly sub-corpora so they can sell these to libraries at slightly lower prices, so they have to decide what is a domain and what is a topic and what are the books inside it using the data they've built. and that is a pretty good concept. the other thing we have to worry about, while trying to be critical, is also that google might give up, and then we are left with a half finished project, badly executed, that nobody else is willing to take on. and i say that because, brandon, again, if you look some of the people at google in the eye, there's a lot of fear about what this has taken on and how much it has cost. it frightened them as far as i can tell. i talked to several people at google. then the question, if i'm right that this is kind of what is wrong:
9:46 pm
why did it go wrong? i think one of the things is splendidly naive romanticism. they actually thought books were a whole lot easier than they really are, because books look deceptively easy. now, how can you call those hard-headed engineers splendidly naive romantics? this is a little unfair, but let me say one reason why: they often start off in one direction before you get a course correction. currently the predominant business model for commercial search engines is advertising. the goals of the advertising business model do not always correspond to providing quality search to users. we expect advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers. search engine bias is particularly insidious. and it goes on; you'll recognize the words written by page and brin in their outline for the google search tool. or you get a letter talking and
9:47 pm
brin said people don't buy books anymore, you should put books online for free. so auletta asked him who would pay me the salary to work on the book, and who would pay for my trips to google including airfare, hotel and car, who would edit the book and do the tour and marketing and prepare the index, etc., etc. by the end of my questions, he wanted to change the subject. the reason, i think, is his innocent faith in the internet and inadequate knowledge about how books are published. so these people are enormously smart, but when you hear them talk about books you shudder a little. in his article, which he was writing while we were actually collectively at the conference in new york about the google library, what was his justification for the library? [inaudible] a reference to an electric car, and that idea that what we have got is just some serendipitous tool that can pull out little facts, and again that makes the project
9:48 pm
frivolous beyond all measure. one of the reasons i think this is problematic, and i have to go back to what i said before, is that my experience, working in silicon valley ten or 12 years and being around engineers a lot, is that there is a very deep conviction that you can divide the world into resources and constraints, and that what a good engineering solution does is remove the constraints so you are left with the resources. but so often the world doesn't divide easily that way, and the paper in books is one example in which it doesn't. and we start to see the issues that come about when you try to remove it. let's remember 1909 as the first example of someone saying within a couple of years we are going to have newspapers without paper. and we are still battling a century later and we still haven't got there, and nobody knows what the business model will be. and even people at google whom i know very well are absolutely perplexed about what the future is. so that idea that you can
9:49 pm
suddenly remove constraints and end up with resources i think is deep, and that goes along with two enormously powerful but also problematic engineering traits, or one really, and that is the notion of proof of concept. that is, i can come up with something with the tools i've got and a smart team that will show how this will work, and at that point a lot of the very best engineers tend to walk away. and that is why things remain in beta, because actually the hard work of seeing how you get from the proof of concept to serious execution can hang around for a long while. let me just give one brief example, and i know this will drive dan crazy because he asked for proof and i don't have scientific proof, it is all anecdotal, but let me tell of one place in which i crossed swords with some of the people. it was an enjoyable encounter. they put up on the google blog their latest idea, and i don't think brandon was involved in this, so without causing offense, they said we can pick illustrations
9:50 pm
out of books and put them on the cover, and won't that look pretty for all of those listed books [inaudible], and there on the blog were three or four examples of books they had done this with. the first was on butterflies of the 19th century, the next was about ballooning in france, and the next was studies of american fungi. all three were misidentified. the first had probably the wrong title and missed all the serious information that went along with it. [inaudible] the book on butterflies of the 19th century was classified as juvenile nonfiction. the one about ballooning in france in the 18th century was also classified as juvenile nonfiction, and the studies of american fungi was classified as cooking. [laughter] i then looked inside the book, because this was now not only about whether they could get the
9:51 pm
information from the title page right, no, but if they're boasting about the illustrations let's have a look at the illustrations, and you go to the first illustration and of course it is a foldout illustration, and google can't capture foldout illustrations, so you get this mass of folding. so then i discovered what the real skill of the algorithm was: going into the book, finding the illustrations that google had, and putting those on the cover. the thing then was, i pointed this out on a list which i share with several people from google, and what was intriguing is they went and changed the title and author of the book and they got it even worse. my point was they didn't know they were wrong in the first place, and they used this book to boast about how skillful they were. they wanted to say look what a good job we can do. when i pointed out there was a problem they rushed to fix it, and they didn't know what they were doing when they fixed it, so they made the title and author even worse than they
9:52 pm
had been before. and in that sense they don't -- what was interesting was the complaints i got about my complaint. the first person said what on earth are you doing looking inside of this book, as if that were some kind of outrageous behavior, and then the second, which i'm sure brandon would understand, somebody said for heaven's sake this is just an engineer. [laughter] the first point is nobody says about someone at google they are just an engineer. it's like saying someone is just a member of the institute for advanced study. it doesn't work that way. but anyway, google has clever ideas. the latest one, because google has chaos in dealing with multivolume works, is the very nice idea that when you find an index in the back of a book you could actually make hyperlinks in the index, so you can find the page in the index, click on it and get to page 43. that is fine unless it happens to be a four-volume work, because the link will only take you to the page within the book the
9:53 pm
index appears in, and the other volumes are left out, so it is actually meaningless to click on samuel johnson when he isn't mentioned in that section of the book. etc., etc. so, a lovely idea, but [inaudible]. one of the issues i think for me then is the question which dan brought up, which concerns me a lot, and this is where as scholars we ought to put enormous pressure, is the question of quality in google books. how do you judge quality? one of the ways in which people often, i think naively, judge is to have faith in the old books. those old books, and this is what got me into this problem in the first place, that is the faith in the project that said we will now load all these old books onto the net. what you have to remember is that the late 19th century is the home of some of the most atrocious publishing that ever happened. and those books luckily have
9:54 pm
been put out of the reach of most innocent readers. if you are a scholar and want to look at bad editions you can find them, but they are not the first copy that you will put your hand on. why are they not the first copy? most libraries in fact have put them through a winnowing system and moved them into different repositories, and you have to go down into the dungeons or drive down roads or fly across the state to find those books, and that was actually a quality assessment process that was enormously powerful. and with good publishing and good editing and better scholarship we ended up in a position where the books on the shelves, one way or another, you could say were reasonably the best quality. now google has gone in and they have sucked books back out of those repositories, because those are the first places they were given. they didn't go to the shelves to which we have access.
9:55 pm
libraries said go down the road, to the store down there with all of the junk we have got. so the first books that came out onto the net through google were the worst editions, and i've been making this point for about four years, and each time i go back and look it is even worse on google books. the first, or the first dozen, books are incomprehensible to anybody who understands anything about the book, apart from the fact that you land in volume three of one edition and then in volume two of another edition that was published 100 years later, etc. so what we have done in a way with this great step forward is push it back into the past, partly because of the effect of copyright laws but also because of a naivety, and if you say this to dan clancy, he says that is where the librarians sent us, so google should think nothing about it. so finally, and i've gone on too long, but let me say a couple of points on why it is difficult to criticize this project.
9:56 pm
one thing is, it is great. it's absolutely wonderful. i would hate to be without it. and one of the worrisome things about the settlement under the department of justice guidelines is it wants to push us back to snippet view for an awful lot of books, and for anyone who has used snippet view, the thought that all books will go back to that is disastrous. and while i don't necessarily support the settlement, that is one reason we should think about it. it is free. i have this problem when i criticize it: here are all these people giving their labor, how can you come along and criticize? it is like going to a high school bake sale and saying that doesn't look like it came from [inaudible]. but that surely isn't the issue, so simply. i don't think we can give it a pass just because we are getting it for free. there's an awful lot of money and an awful lot of library resources that have gone into this, and an awful lot of opportunities turned down to do this that will never open again. so free isn't good enough for me to let them off the hook. i think one of the things that
9:57 pm
makes it difficult is google has a remarkable chameleon character. when they want to call it a library, as in "the new york times" article he wrote, they call it a library. and when you say this doesn't look to me like a library, they say it's not a library. you can't call this a library. sometimes they come out and say it is for the general reader. if you say it is a research tool, people will say no, it is just for the general reader, you can't hold it to those standards. if you say it is for the general reader you get strange things like, look, i've got volume one and i can't find volume two of the same book, and the librarian said to me people don't read that way, and that is kind of an outrage, since going from volume one to volume two isn't a bad thing to do. the other thing people do is say, look, it is just going to get better. maybe, maybe not. i have been following one particular book. they know every criticism, and i talked to some of them about it, and it hasn't gotten any better. it's probably gotten worse [inaudible] over time. the other thing
9:58 pm
is we can't simply allow that inevitability, that technological determinism about the future. if it is going to get better we've got to come in and we have got to shout and say what is wrong with it, and ask why scholarly libraries have been cooperating in something that doesn't look in the least scholarly. if they want to go to the public library, go to the public library. the other thing is of course if you criticize things like this you start to look like [inaudible], and it is one of the most critical things in dealing with librarians. as i said, geoffrey nunberg at a stroke pointed out 200,000 errors in the catalog and he was [inaudible] by google. librarians went up the wall. and one of the things they did, and one of the things i think is disturbing most of all, and dan mentioned this, is to say libraries do things worse, and you will find senior librarians standing up there, whenever you criticize the google metadata, saying we make a mess of metadata. we are awful. one thing is they don't list [inaudible] thousand books as dated 1889 at the stroke of a
9:59 pm
keyboard. that isn't even in the capability of libraries. they don't have systematic problems. they are different kinds of problems. they don't have the same problems. if you say google gets the volumes wrong, they say, well, we get volume sets wrong. the two problems are entirely different. libraries don't tend to get four-volume works published at the same time wrong. they get multivolume works published over 70 years wrong, and who can blame them. but that isn't the same problem. but librarians, i think, and this is what worries me, if you talk to senior librarians, the ones who talk about me as ranting about google, they will spend a great deal of their time redeeming google books by trashing their own stuff, saying we are incompetent, therefore why should we worry about google being bad. i think that is a deeply disturbing thing. it's even more disturbing in some ways because when we suddenly say, well, look, google is bad, they say we've got a solution now. we are coming up with the hathi
10:00 pm
trust. hathitrust, organized by libraries, is going to redeem all of the errors google has made and all of the failures of the libraries in the first place in setting any standards of quality on google. and what do you read about the trust? that the trust is a no-worry, pain-free solution to archiving digital content, and that you can rely on the expertise of librarians and information technologists. ..
10:01 pm
to ask that volume two follows volume one, or that you should be able to find whether madame bovary was written by henry james or not, is called bibliographic fastidiousness. god help us all. of course the other charges are that you don't use it, or that you are introducing nuance into the debate, or indeed that you are pandering to special interests. none of that holds up for me. yes, google books is great, yes, it is good for history, but equally if historians don't start putting pressure, not necessarily on google but on their libraries, to think a little more seriously about what the digital collections of the future should look like, and turning them not simply into a bunch of digitized books but into
10:02 pm
a reliable corpus, then we are all wasting our time. thanks very much. [applause] >> thank you. those were both good talks. my name is brandon badger and i'm a product manager for google books, and i must say i'm here for selfish reasons, because you guys are our power users, and one of the things we try to do at google is listen to our users, and particularly larry and sergey are harping on us to build for the power users, because if you do that well, your product will work really well for everyone else. and so for example when i worked on google earth and google maps, because i had a chance to work with that, you can go and spy on your neighbors and see what they are building in their backyard. so in that world i was meeting a lot with geologists, people who work with the world, and i tried to soak up all of their learnings so we could improve google maps and google earth
10:03 pm
and we still have lots of problems. if you do a search for how to drive from new york to london, it tells you to swim across the atlantic ocean. or it tells you to drive off a bridge. so there are lots of errors, right, and we are constantly working to improve that. part of that is improving the sources of our data, part of it is improving the algorithm. you have all these data points of where the addresses are located and where the roads are, and algorithms trying to pick the shortest path between points. you can improve that, and also on google maps we worked a lot to help the users fix things, so now if you type in your address for your home and it is in the wrong spot you can go to google maps and correct it, and it uses that to correct the system. so there are a lot of analogies for what we can learn, now that i'm on google books, about how we can fix things on google books. so i think it helps to ask, why is google doing this?
10:04 pm
why isn't microsoft doing this? we are a corporation and we have shareholders. it is not a charity. why are we doing this? i think it helps to understand our culture a little bit, and i think it is very unique, which is what makes it an exciting and fun place to work. you have probably seen some of the videos where you can ride around on scooters, and it is a fun environment, but it is an engineering driven company. i've worked at other places where it is the mbas in their suits doing market research and telling the engineers what to do, and this is a unique company where that is flipped, and it is very much bottom-up driven. a lot of products are coming from the engineers' ideas. if anything, management told them that was a stupid idea, [inaudible], but the engineer felt passionate and was given the freedom to continue to work on it, and projects like google books and many others are the
10:05 pm
same way. someone is passionate about the topic, and google is a unique environment where you are given the freedom and resources to be able to pursue those. and with google, larry and sergey, going back to their days at stanford, they were a little older than me, they are problem solvers and they like to solve their own problems, and one of the first problems was how difficult it was to find information in scholarly journals and books. so one of their first ideas, even before their idea for the search engine, was scanning books and making those more accessible, which seemed like a crazy idea, so they went on to their other crazy idea, which was making a search engine, and luckily that went very well. [laughter] in a lot of ways it is that success that has given them the opportunity to do this project, which they are very passionate about. so this isn't part of google.org. it is not a charity project, but
10:06 pm
it is hard to qualify it as necessarily just business oriented as well. there's something driven from the passion of google. we are people and we feel passionate and we are excited about what we work on, and that does drive a large percentage of what we work on here. so what are some of the things i am excited about? or, continuing on that story, why is google-- you have heard the mission of google, to organize all the world's information and make it universally accessible, so the idea is, what about the information that is pre-web? for example the information in books, and what can we do? so that is where this came from, and there are a lot of difficult engineering problems there. scanning books, you have seen the hands and fingerprints, and some of those are mine. [laughter] here is a book i was reading on the plane. i am trying to learn how to chip in
10:07 pm
my golf game. if you were to tell me to scan this book, i could get a high-quality camera and i could spend all day taking pictures of it and working on it, and i could spend time researching it and getting the metadata to do a really good job. so our challenge is, how can you make that scale? how can you do that not for one book but really all the world's books, because even at google we don't have the money. there are resource constraints, and it is money per book and also time. you could do it in a way where you do every book perfectly from the beginning and it takes 100 years, but then we are all dead. so this really is the start of a long project. one analogy is, do you remember earlier in the web days when you loaded an image on a web page, the jpeg images would look perfect but fill in slowly? that is one way to fill in
10:08 pm
an image. now you see it loads the whole image and it is sort of blurry and fills in until it is perfect, and the analogy is that is what we are doing with google books. we are trying to make this information as accessible and useful as we can for many of you within our lifetimes, and then we can continue to go back and fill in and work on the gaps. and so what else gets me excited? i think obviously the democratization of this information and making-- it is exciting for me to think a student anywhere in the world can access this information. and so i think at this point i will leave time for questions, but thanks. [applause] >> okay, thank you again to
10:09 pm
all of our speakers, and we do have plenty of time for questions. i have been encouraged, since we have the television here, to ask that anyone who has a question please go to the microphone and speak it in there, but otherwise i will leave it open to the audience. anybody, yeah. >> my name is jim goodell. i have 21 books published, and my concern with google books is that in a lot of the stuff i do i have dealt in a somewhat classified environment. i have gotten it all in unclassified form, but i spent many years researching people, and having little red dots on my chest in the desert. you know, what is my incentive, if everything i do potentially goes online, to continue to do what i do, because i have seen some of my stuff that only i have [inaudible].
10:10 pm
i have seen it on the net from places in europe and asia and wherever. that is one of my main concerns. i know we have copyright and trademark standards here in the u.s. and a lot sign up to them, but there are a lot of characters out there that really couldn't care less about intellectual property. >> that is a really good question, and i think the internet offers a lot of opportunities but a lot of difficulties and challenges as well. on google, we are basically trying to help you reach the audience of users who want to purchase your books. so what we are trying to do, what is the topic you wrote your books on? [inaudible] >> what i am hoping for is when someone goes to google and searches for the lockheed blackbird plane, we want to be able to show your book, and ideally give them a little
10:11 pm
snippet that reassures them that chapter 3 of your book answers the question, and then we want to have buy links where they can purchase the book. so right now on google books we have links to amazon, barnes & noble and borders, and basically the book retailers can input their price in the system and then we send users off to them. we are also working on an initiative that we have talked about called google editions, where we will sell digital access. users will be able to click on your books, pay $25 or whatever, you get to pick the price for your book, and then they can have immediate access to the whole book right in google books. and what is exciting about that, with digital reading, i don't think we will ever lose the paper book, but with digital reading the idea is you can keep your books in the cloud. google books can store the fact that you own a book, and we are making available lots of different access points for the users to
10:12 pm
read that, so the idea is that you can be at home and maybe it is a sony e-book reader or samsung book reader. you can start to read your book, and then when i am on the go and i have my phone with me, i can continue where i left off. so i think it is a compelling offer for consumers, and then ideally you would sell more books through this exposure. but you are right, there is a lot of piracy on the web. when twilight or some big book comes out, it seems like the next day it is-- and those are not scans from google. they have even crowdsourced it, where they basically say everyone on our web site, everyone types up one page, and they put it all together so they are quickly able to transcribe these popular books. i am not sure there is a solution. one solution is having an alternative legal way users can buy things quickly, and we have seen that in the music industry. for a long time the music
10:13 pm
publishers were resistant to selling digital music, so there wasn't a legal avenue for consumers to buy music, so there was a lot of piracy. it seems to me when steve jobs and apple came out with itunes it mitigated that. oftentimes it is not worth your time to go try to find the free music. it is easier to buy the song for 1 dollar. as we work together to make it easier to buy digital books, hopefully that should mitigate some of the piracy. >> thank you for your brilliant overview. my name is christopher and i'm a researcher from hawaii. i understand the new google iphone and the opportunity to look at some new trends in history research that can help a lot of students. if you look at the area of, let's say, wikipedia, it is basically familiarizing people with a broad overview before they get into very detailed specialized
10:14 pm
research. are there ways to visualize or conceptualize the visualization process, whether it is mind mapping or media timelines? is there a new way to help people look at that from [inaudible] period? there are some good books on timelines, and the ability to look at timelines, stretch and shrink them and then go right in may be a helpful, simple way to bring history together for a lot of people, to look at multiple trends, technical, ecological trends within a particular time period, and through a visual structure similar to gis, where you are looking at layered informational systems, ecological, economic, and using a layered system along a timeline, that would really help to visualize and familiarize that system in a way that makes sense. i work with one of the leading geniuses of gis, professor
10:15 pm
mccardle, and he started with a simple principle of a layer cake construction of reality. when you are a little child you have a birthday, and the birthday is celebrated with a birthday cake, and there is that wonderful thrill of cutting through the layers and understanding all the different elements. i think if you would look into how to enable most of the world to understand the huge complexities of the world ecosystem's behavior with this simple understanding of the layer cake structure, with a system similarly following that model of stretch timelines, with scientific, technological layers and historic context that you could go in or go out of, then it is simply organized in a way that is very powerful. >> i think there are two key things there. once you have digitized a book and scanned it, it allows you to study information in interesting ways that you otherwise couldn't, and beyond that it allows you to interact with the content and
10:16 pm
learn in new ways as well. on the interesting trends and analysis you can do with the digital book, for example on google books we have an interesting project where, now that we have scanned the books, we are able to look through and find place names. so let's say it is a book like around the world in 80 days, we are actually able to extract the places and then we can do a map of the book. we also launched this on google earth, so you go to google earth and you can zoom in on a location, like zoom in on london, and then you see the references in any book we know about to that location. it is an interesting way to find a book that way too. google web search has something where they extract from webpages and then are able to display timelines, so you can search for, like, george washington or some figure from history and then see the spikes
10:17 pm
on the timeline of the books that were referencing that person. that is on the data extraction side, which is interesting, and i think it is very exciting how you can augment books and make it easier for students to engage with the content. so here is my golf book and it has got photos for how to do a chip shot, but what i really want to see is a video. and if i'm studying a history book, i was just out on the aircraft carrier, if i am reading a book about that i want to see the videos from the world war ii footage and i want to see maps, see where this aircraft carrier went, and more photos. and wouldn't it be great if users could contribute, like you said, layers. on google maps, people can create these maps where people add their photos and videos to it. if we could do the same thing with books, where you still have a core book, you are not messing with that, but on top of this you could have these layers people could turn on and off and share. i think back to my days going to
10:18 pm
school, i would read a book and maybe i would understand 20% of it. my professor would help me get a deeper understanding of it, and we would have that classroom discussion and we would all give our input. in the digital world where we are all connected it is very exciting to think about virtual book clubs or university classes where you can get together and discuss things, and maybe a professor writes their authoritative layer that sort of slices and dices the book and explains it, and i can find that as well. >> if i can just comment on that, because one of the comments i was trying to get out earlier is that right now all those layers that exist in google books are written by google engineers. they are things you can see that google has access to. they clearly have a function that they can write to see all the books, even if it is not cited. we don't have access, and we haven't been able to create anything on top of that. the google properties that i was
10:19 pm
talking about before act as platforms on which scholars, software developers and users can build things. we have not seen that yet out of google books, and some of this is because of the legal problems, but still the intent, when you look at it, and here is where i very much agree with paul, is there's a tone deafness to the way history is rich and complex. so the kinds of things engineers do with history is they extract place names and make a map and hope that is interesting, but there are others among us that want to do other things on top of google books. we don't have the kind of access to the programming environment that we would like. >> if i can just add to that, dan and i are going to become the best of friends if we're not careful. the point is absolutely right and underlines what i say about proof of concept. i think the google algorithm is great, except it is wonderful to discover that king lear takes place in upstate new york because of the duke of albany.
10:20 pm
there is a place name for you. or for instance that trollope, when he wrote the warden, was writing about maternity and child bearing, because when google took out those wonderful quotations in the books they forgot the advertisements in the front and back of books, so if you look at the most quoted passages in trollope they happen to be from books published by the same publisher that were advertised in the front matter. so the proof of concept was wonderful, but the detail, the execution to get it right so that your data doesn't take you to the wrong place in the wrong time, needs exactly the sort of input a) that dan is calling for and b) that google doesn't allow, and that is why in a way some of these tools are enormously powerful and completely under fulfilled. >> i am andrew lee. i'm writing my dissertation in history, but my real money job, since history is not enough money, is as a librarian at new york university, and i'm one
10:21 pm
of those luddites. this is the university mac, because they couldn't buy me a pc laptop, so i'm learning how to use a mac. i think that books are still important and those algorithms are a problem. we are using algorithms to weed half a million books out of the library, books that have not been circulated. but humanists, and a lot of other fields, don't check out every book that they use. they go to look up a reference. that book doesn't get circulated. it is taken off the shelf, gets looked at, and some go back and some don't. they don't get counted. so my experience with google books is from when it was first coming up. my dissertation is on gender and anarchism in spain. there are so many problems with mapping, because when the spanish republic came in all the streets were renamed, and when the revolution
10:22 pm
took place suddenly things were renamed, and then when franco won the streets were renamed. then when the monarchy came back and franco died the streets were renamed. and oslo, the capital of norway, was-- for a long time, so those are problems i think google has to solve. there are problems with the way people use it, and it concerns me, and we see it in my university and when i talk to other colleagues and people in the international association of labor history institutions, so i am very interested in the kenyan article. people are not coming to the library because they think they are finding everything they need to find using google and looking at things uncritically, and they are not. i don't know how many students come to me and say i want to write about the battle of saratoga and i need to find accounts, or even better, recently a student wanted to write about sailors,
10:23 pm
sailors meaning ranked seamen, accounts of the napoleonic wars, of being a sailor, and i said you will be hard-pressed to find first-person accounts because most sailors were not literate. but they used google and found an article about a diary that was sold, and they want to know why they couldn't find it in the library. they were unable to make the distinction. and i think dan is right that google books is great; proquest, elsevier, god rest whatever soul they may have, are much more of a problem, because we get gouged in an inelastic market where over 50 percent of our budget goes to electronic products. and what many of you may not know is that the product itself may not be that expensive. they hit us every year with maintenance fees, but with electronic-- electronic journals we are paying the maintenance, and that every year escalates. and i am also on the text mining projects, on board with the center for history and new media
10:24 pm
and some of those projects are really great, and i really look forward to seeing what can come out and being able to do some of the stuff that dan says [inaudible] and making it available so we can get people to start using them. but my concern as a librarian is that more people come to me and say, i can't find anything on the forcible removal of greeks from turkish territory after world war i, and they don't realize that if they don't know greek and they don't know turkish, google is not going to find it for them. >> definitely google books is just a tool, and i wish i could make-- but there's more to it than that, and i'm sure you guys can attest to it. i think as soon as you see it as a tool, so maybe it starts to point you in the direction of the libraries, but you still need to get out of your pajamas and go to the library, i think, and interact with the original books as well.
10:25 pm
go ahead. >> it seems to me that there are two different strands of criticism of google books. on the one hand questions about access, and actually some of the ways in which google allows access, as against things like proquest and elsevier, and on the other hand questions about the quality of scans or individual things. so i am curious to have our panel of critics, paul and dan, and from a wider perspective sean if he wants to add things, weigh out how you would evaluate what is more important, because i think one of the points dan tried to make is that the access questions aren't being asked as much as we hear about hands on pages or things out of order, meta-data issues, and i'm just curious about how each of you would weigh those out? >> i would say very quickly dan in many ways offers a solution.
10:26 pm
i tried to talk about some of the problems; opening it up will make a huge difference. [inaudible] but i am not very excited about-- >> just a brief comment: you could compare google's meta-data to the internet archive's open library project, which they have completely crowdsourced, and they have gotten very strong support from the library of congress, which has donated meta-data records, and from other places. i would love to hear-- there was at some point a big meeting about bibliographic control and some kind of unified system of meta-data improvement and sharing. and again, so i feel a lot of the quality can flow out of a more open environment where we have a better sense of what is going on. errors can be corrected in broader ways with the
10:27 pm
institutions and individuals. >> the question of which is more important: i think they are both important and we have to fix both of them. so going back to the maps analogy, and how can you fix the bad directions and whatnot, part of this is fixing the algorithms, but part of it is fixing the raw data and building tools so the experts-- who knows better about where your house is located than you? so in that world we let you place on the map where your house is. of course you have to build tools for the prankster high school kids who will start moving the hospital around, but you can solve that. two people say that the hospital is over here; you can track the credibility and whatnot. so we have taken baby steps towards that on google books. as i mentioned, we do have a feedback button, so when you find a bug you can flag it, and that brings it to our attention so we can have a human operator look at it and try to fix it, but that is 5% of where we need to go and that is
10:28 pm
definitely the direction where we are going. so i want to get to a world where, when you see an ocr error, where we have done something funny and messed up in the transcription there, why shouldn't you be able to type it in and fix it for us? and then we can have a system that takes this input, and if two people say this word should be this, we can trust that, and i definitely think that is where we are going. >> just one comment about the meta-data issue. i have written about this elsewhere. i think meta-data is pretty weird stuff, and i looked at a project which simply tries to open source the identification of tracks on pop albums, the archive that identifies them when you use itunes. it goes astray very quickly. it actually is a hard thing to crowdsource, and i'm not sure the archive is actually having the success that it hoped with
10:29 pm
the wiki on that. so in principle i think it is a very good idea. i think meta-data is very difficult. >> i agree, and it doesn't completely solve it. just turning meta-data into wikis doesn't completely solve it, but i think it is one step. >> as chair i try to step back and not make a comment, but since it has been pointed out several times that i am the only librarian up on this platform, i feel obligated to say a little something here from a librarian perspective on all of these issues. are there lots of criticisms to be made about google? certainly. however, we can certainly say, at least (i used to work at the university of michigan, although i wasn't directly involved in this project) that we had digitization programs going on for a long time before this where we were dealing with all of these issues, but the problems, the intellectual problems, are so huge that it was usually just easier not to deal with them, so we didn't.
10:30 pm
and for the first time google comes along and says we can do all of this, we can digitize all of these books, which at the university of michigan would have taken over 1,000 years to do with the staff we currently have, and i think we owe them a tremendous debt of gratitude in that sense, or if nothing else for realizing these issues exist and we haven't been dealing with them. so from my perspective, a lot of the time what i think we really ought to be focusing on here, and what is important, is that we have tremendous possibilities that google books and things like google books open up. is it perfect? no. are we ever going to live in a perfect world? no. so what kinds of things can we do? and some of the things that paul brought up with mapping, how can we work to try and get something that is maybe not perfect but at least can pass? also kind of the things you were bringing up as an author.
10:31 pm
myself too, i want to get my work out there and distributed, and how can i do that electronically in a way that is-- again, is there going to be piracy? yes, but i'm also going to get my work out there to a much broader audience than would have been possible before. so i guess from a librarian point of view this is really how we see it, that this was a tremendous opportunity to bring these issues to the fore, and now it is a matter of, there are lots of imperfections, but how do we work together to try to get them into something that is at least passable and most of us can say is okay? so, thank you. >> i first want to thank all of you. i was surprised to find how much sympathy i was having with all of your arguments as we went along. i am all there for openness. i want that data. i am trained as a medievalist
10:32 pm
in literature, so when you start talking about the books, and talking especially about the meta-data about those books, the problem of who owned it and where have they been, i am right there with you. i now work as a strange beast called an instructional technologist. i am working on trying to code up-- it is not quite up to google's standards but it is pretty cool stuff. it is interesting, what you were starting to get into and focusing on, and the question i have for you, because what i heard from everyone, i think the recurrent theme here is trust. how do we decide when we trust data that we have, and how do we decide whether we trust the answers that google is giving us? so i was wondering if i could just ask all of you to maybe look back on what you have got so far and think, as a
10:33 pm
scholar trying to produce an argument, trying to get my information together, what are the standards of trust that i would like to see at work and in play in a project like this? >> that is a good question. >> i will start. you know, i think trust is a good question, and i am not sure, as a historian and an academic, i mean, it is always good to not be fully trusting, right? we have all gone into archives, i certainly have, where people lie in letters that we read, and forge dates, and airbrush people out of paintings and photographs, and it is not too bad to have a little bit of paul with us all the time to be the skeptic, the knowledgeable skeptic-- [laughter] who can see that there may be reasons to not trust things. i think that is not a bad feeling and i want to give a
10:34 pm
quick example because i have been in brandon's position at the center for history and new media, when we-- cocreated with the american project, which was a very large archive of materials that we gathered via the web, in partnership with the library of congress and the smithsonian, who were collecting physical artifacts, and we ended up with 150,000 objects that we gathered from the web. i remember going to the society of american archivists to present this project and the results of it, and immediately people started standing up: it is not an archive; and do you know that someone lied when they said they were colin powell on your web site, and what are you going to do about that? i guess my blasé response at that time was, you know, that we just, we don't check our brains at the door when we look at these things, that we have to
10:35 pm
go in with some amount of skepticism, and i think paul's very good point is that we actually need more meta-data. when we did the september 11 archive we tried to provide researchers who look at our archive, which is now at the library of congress, with as much meta-data as possible. this was the ip address, here's the geolocation of that ip address. we tried to e-mail the person. the e-mail did not go through. here are all of the things we did. but at the end of the day what we had was a raw archive that we knew was problematic, and we tried our best to iterate toward something that was decent and usable, but we left it to the historian in 50 or 100 years to analyze all that material and measure it against other sources, as we do every day. so at some point we are going to hit that wall, i think, with google books. we are not there yet. i think brandon will admit,
10:36 pm
as in free speech, more free speech is better, more meta-data is better. where did this meta-data come from? all of these things can be helpful to the scholar to assess veracity and trust. >> the only thing i would add to that is one of the key things for trust is having sources. for example, we have probably scanned tens of versions of a particular book, hamlet, so if there were a case where one book had something airbrushed out or scribbled out, then ideally we are helping by giving you access to all versions of that book from ten different libraries. so on the meta-data as well, with more sources you can make a better decision about what you trust. we are trying to make our corp. web site better. traditionally our site has been focused around search and we are trying to make it easier to browse by subject. i got it set up so you can browse them, so you click
10:37 pm
on the computer subject. i did this just the other day. why is hamlet at the beginning of the computer subject? so we start looking into it, and we get meta-data from 30 different sources, and 20 all say that this book should be in the computer section, but then we have one random data source that says it is a romance novel, so our algorithms were not set up to deal with that. we were too trusting in the case of that data source. but the advantage is we often have multiple sources, so you can start to trust that if some people say this book belongs in the computer section, that is more likely than the one who says it is a romance book. >> i think the question is a very important one. it is a tricky one. in general there is a way in which this is shoved off on someone else, often shoved off on librarians, where people talk about a concept called digital
10:38 pm
literacy and assume there is some magical way-- i would invite you to do a google search on anything and look at the results you get, and then think about which one you click on, and think in a way what kinds of years and years of embodied knowledge go into the decision about the first link you click on and which ones you avoid. how do you shortcut that? i don't know, but i think it is very tricky. one of the things google has going for it, and i come back to the early page and brin because i think google search is getting worse in many ways and is weighed down by commercial concerns, but they have an enormously reliable ranking, and in general you don't, unless you have something strange, go much beyond the first page, let alone the first three pages. but dan brought this up and it is a terribly important point. there is no way of understanding how the first ten books rank,
10:39 pm
what that means. it is certainly not the standard pagerank algorithm or anything like it, so what that means and how you are meant to make sense of that i don't know. and that would be all right, but somehow-- the fact that it looks exactly like the standard google search, and therefore somehow you are meant to trust it in the same way as google search, strikes me as enormously problematic, and there are all these ways in which you say you should not disrupt that order because it doesn't work. another thing in fact i would just say, although i know a lot of people i used to work with at xerox went to google, and i admire it a great deal, but it somehow worries me that the first thing you see on google is, to be blunt, a lie. if you look at the page count for the number of hits it is almost always utterly false. why? well, some of the-- it would take us a couple of nanoseconds
10:40 pm
longer to get that figure right. take a couple of nanoseconds, but if you want people to trust you, don't begin by lying to them with the very first thing they see. >> i think to that point there is a lot of difficult engineering that goes into figuring out how many hits a certain query hits on, but go ahead. >> i apologize in advance if this question is confusing, because the subject matter i find very confusing. i was wondering if the first two speakers might comment on the underlying tension between, i guess what i would call the fragmentation on one hand and comprehensiveness on the other hand, when we think about information resources. it seems to me that there are
10:41 pm
costs and benefits to having inquiry approaches that either kind of embrace fragmentation or go for a comprehensive approach, on either side. often human life can feel quite fragmented, and so the modern world may be best understood from this angle or that angle, and at the same time historical societies often like to aim for having a comprehensive view of culture and knowledge, et cetera, and perhaps that got harder after dante's time, i am not sure. but i guess what i am looking for from you is how google books contributes to the latest round in this tension, and kind of the societal perceptions of whether the sources they use are fragments or comprehensive to start, and
10:42 pm
maybe the interesting analog is the example you brought up of the book, which looks very beguilingly simple and commands an authority of some kind, when in fact there is much more complexity underlying that. so maybe this is a roundabout way of asking, is kind of the comprehensiveness that google is hoping to get, and may not achieve, is that really just the new fragmentation for our era, and what does that mean to users at the end of the day? maybe you can make something out of that muddled question. >> sure. [laughter] look, what google excelled at first and continues to excel at, with the exception of google books, is that it is a master of aggregating resources that are web scale and providing search and access to those things.
10:43 pm
that is what it was also set up to do. i said earlier that i found google books to be an odd project, and here again i feel it is very strange they are trying to come up with a comprehensive library of congress when what they have done all along is say, go forth and have 1,000 servers with 1,000 books and we will provide overarching access to those materials. one could have imagined a different google books which really was a google search that would have, as paul noted, subject domain experts digitizing small quantities at high quality and providing, as they do with web sites, access to google, with google providing what it is good at. what is strange about google books is that they are trying to do the aggregation themselves, and for me that is not web scale, so again i find it to be a very odd project for them to take on, and i wonder if there is a different version of this that would take what has been scanned that is scattered around the globe and do something on top
10:44 pm
of that as they did with web search. >> i think that is a very on this point. >> very early on google talked about the library project, before they started denying it was a library project, and they said our goal is to create a comprehensive, searchable, virtual card catalog of all books in all languages, and they changed from that. in some ways, if they had done that and let other people do the scanning, and provided ways, as they say, to help readers discover books around the world, that would have been, i think, in many ways playing more to their strengths. now they could certainly have, we hope, given some of the money they put into the project to help other people do this scanning, but it might have been better if they had left that to other people. the basic question you asked, i think, you know, this has all sorts of antecedents, the alexandria myth, the idea we can collect everything in one place, and of course again, i mean, i don't want to keep going on
10:45 pm
about this, but there is a wonderful naivety which got the whole project started. when they began they said they were going to organize all the world's information. conceptually that makes no sense at all. there's no such thing as all the world's information. maybe it was a good idea trying to start off doing that because you get something done, but it creates an idea of conceptual-- so to come back to the question in a way that you asked me about the book, i draw attention to my colleague who wrote a very good essay in the collection the future of the book, and he began by pointing out that we talk about the book as if it were a unified object, and it is not. one of the first things that scanning and digitizing did was it pulled apart many of the different aspects of the book, so for instance it was very easy to take the boeing parts catalog or the railway timetables, and those will never appear as books
10:46 pm
again. they were not naturally books. they just happened to be in codex form. whereas we know the scholarly historical monograph is causing all sorts of problems, so the way in which different genres are invested in that particular material property is very different, and we confuse ourselves when we suddenly lump them all together. libraries have lived with that confusion for a long time but found ways around it. it glosses over more problems than it reveals. >> on the fragmentation, there's a story from a professor that they assigned a topic to their students, a research topic, and the students came back with all the source material, all books that were available on google books. so i think the risk is that google books is too useful and too easy for doing research in your pajamas, and scholars use it
10:47 pm
as the only tool. so i think it is up to you to see it as what it is, one tool, and you still need to go to the library and extend your research, so we don't want to get you into a world where it is fragmented and when someone goes and researches a topic they only do the research from google books. >> can we talk about how google communicates what it does? i think that is a fine call to arms for all of us, but since you are trying to kind of create your user-- >> you might want to talk into the microphone. >> since you are trying to create the optical-- optical-- how will you direct us into what it is and what it isn't, where you say we are going to be comprehensive kind of here, but we recognize there is everything over here, and we are going to give you these platform tools that speak to the rest of life, as opposed to kind of maybe making people think this is kind of the la jolla or something
10:48 pm
like that. >> that is a good question. we are doing our best to scan as many of the books and make them available as we can, and we are adding tools that the user community can use on top of that. we do have programming interfaces available where you can search for books and they return the meta-data that we have, you can embed our preview on your own site and build on top of that, so that is definitely the direction we are going. and going back to google, there is google books and there's also google the search engine, and our hope is other people will continue to scan books and do similar projects, and then from google search we point the users to whoever has the best source of information. >> and just to build a little bit on what brandon was saying, and again speaking from a formerly university of michigan standpoint, at least the books we owned that were our books and scanned by google, we built our own database for
10:49 pm
searching of those books, and in many ways we were trying to improve on some of these issues we have already been talking about. we did some things; indexing, for instance, is a lot easier to do with m books, which is the implementation of all the google books that we had at the time and are continuing to build on. and also trying to correct some of the meta-data issues. so really from a library perspective this was a great way to get the books scanned, and then we could kind of handle all the problems, at least for our own stuff, as they came along. and as these books get google indexed, presumably if people search on a particular topic our books will come up there as well as the google books that they scanned. so i really don't see these things as being mutually exclusive; in a lot of ways they are trying to work together, so building more meta-data, more functionality is really, i think, incumbent not just on historians
10:50 pm
but librarians, to think about the things we need. >> my name is alex from the-- state college, and my question is, it seems to me the criticisms of the academics and the librarians about the way google has put it together and the way that meta-data is organized are probably very accurate, but it doesn't sound like there is any incentive for google to respond to any of their suggestions, however academic they may be. so i'm guessing it may be because google owns and operates the means of physical storage for all of the electronic information. i suspect at some point it will be-- so why don't you team up? why don't libraries and academic institutions help share the cost for the physical storage of google books, and in that way you can have more of an advantageous position for getting the information stored correctly? >> definitely we want to work together, right? we have expertise that we are
10:51 pm
willing to take on, sort of like, we're the computer scientists, and that has benefits, especially when you are dealing with millions and millions of books. if there's a problem in the meta-data for this book, it is real easy to fix this one book, but our general frame of mind is not just to fix the one problem but to ask how can we fix it so that when we fix the algorithm we're not just fixing it for this one book but for all of them. so we bring that expertise to the table, but obviously historians and librarians and people have their own expertise, and we do want to work together with the different groups. >> i feel like i am talking too much here, but from a library perspective again, i don't think google is the only one storing this. michigan was storing all of the books they scanned in our own databases, so really for us as librarians, an issue that is incredibly important is preservation of this material, which is not an important issue
10:52 pm
for google, so it is really important that these things be stored in databases other than google, and i think librarians certainly see that as an important issue and have been dealing with that and are dealing with that and storing these in lots of multiple places. >> i am a graduate student from the university. i would also like to follow up on a comment you made earlier about google books being a real, extraordinary democratizing influence for students like myself who are not at harvard, michigan or stanford but now have access to all those libraries. i have got a phone that is talking to me. one of the concerns i do have, and you alluded to it earlier when you were discussing why it is that google books is important to google, and one of the things you highlighted was the fact that it is very critical to part of the mission of google. my concern of course is that
10:53 pm
founders take up golf, they have grandkids, and if anyone has been following apple, they get sick. and that is my concern, that at some point what is potentially going to be the most extraordinary digital archive in existence is going to potentially be able to be turned off when the suits take control, and the suits always end up taking control. [laughter] i think you can-- that is true. as a shareholder of google i at some point want the suits to take control, and it is critical that it stays affordable for people like me to be able to continue to research and access this data. and that is probably an unfair question because it is not really something you can answer, but it does seem to highlight the point that there has to be a way for this information, if something does happen with google, that it has to be stored in different places, because if google turns it off then it is done.
10:54 pm
i currently have 1600 volumes of medical journals from the 19th century in my library, and if google decides one day, due to the current laws, that right now this is not worth the hassle, and they switch it off, i have lost all that data. >> that is a good question. obviously, hopefully the founders don't fly in airplanes together at any one moment; they are not allowed to in case there's an accident. but i think the solution, they have done a really good job of pushing out the culture throughout the company and it is a part of basically everyone there, so i think even if the founders did go away and some evil businessmen came in and took control, i think we would kick them out pretty quickly. [laughter] there is enough control, it really is bottom-up. the power is distributed at google. there is really no master plan from the top down. it is really spread out to the core engineers who are making a lot of these decisions, so you
10:55 pm
have all heard the "don't be evil," and it sounds kind of silly, but people really do take it seriously. it is a company built on trying to be different from your stereotypical thoughts of what you might have thought of microsoft or some of these other companies as well. but you know, could google stock tank and go away? what would happen to these books? i think it is a valid question and that is why it is important for these librarians, so when we check out the books they archive that, and that is why it is important that other people do this as well. when we check out a book from the michigan library, just like if you were to check out a book, we check it out, we scan it, then give it back. it is important that other people-- when we do that we don't damage the book. we definitely don't want to be the only ones with the digital version, like the heritage's literature. also to say, we all wish the
10:56 pm
best for google obviously, and i feel somewhat ironic since i am wearing a suit. but paul mentioned the hathitrust negatively, but without going into that comment, what it was founded to do was bring all of the, whatever it is now, the google libraries that were having their books digitized together and try to figure out a way to preserve this content so that, if heaven forbid google went bankrupt and goes away, then at least there is somewhere that has all this content together and is working hard to preserve it in some kind of managed way. so libraries have thought about this issue and are working on this issue. >> i just want to pay a compliment to google and google books, which i know nothing about, but google earth is an absolutely magnificent product,
10:57 pm
and it gets its appeal from the amount of data and how easy it is to move about, and you just feel like you have the whole world in your hands. is there in google books-- without revealing your secrets, where are you going with novel ways of moving about all this huge amount of information besides just keyword search, which i imagine is a very good beginning? >> i am in charge of the web site and in a lot of ways i'm almost embarrassed by the web site at times. step one has been sort of getting all the content and getting these books, and i think the next phase is, now that we have all these books, how do you make it easier to find information and to find books? like the music world, if you have access to all the world's music the problem is what do you want to listen to right now? it is the paradigm, the paradox
10:58 pm
of choice. so i think we have, you go to the google books homepage and you know there are a million books there but you don't know where to begin, so that is something we need to start to address. some of the interesting trends: i think social is going to be an interesting way to discover content. for example, that is how we discover what to read often in the physical world, like your friend starts reading a book and recommends it to you, or someone read the book and you want to read it so you can discuss it together. it will be exciting as we connect users to the book, and then you have your contacts and your friends. if we could start having people with their reading lists and things like that, the big thing now with facebook and friendfeed and some of these other activity streams, what that is is basically pushing updates, so when you check your activity stream you see here is a picture of your friend bob, or jane updated her status, so an interesting way to
10:59 pm
find books is you get updates that bob finished this book or jane started to review this book, and you start to find books to read that way. but then also giving people the tools to create their own collections, and so, if you are interested in golf, someone creates their top ten books for golf, and you allow the community basically to vote on and filter those collections as well. a lot of sites do this pretty well, where you search on the topic and you can search those as well, so i agree with you that it is a problem. the problem now is you have so many books, how can you do a better job of helping people find the books for them? >> we still have time for a few more questions if anyone has one. ..