

J**D
All the King's Words
Erez Aiden and Jean-Baptiste Michel are interested in word and phrase frequency and what it can reveal about history and culture. They illustrate their approach with a timeline graph of the phrases "The United States are" and "the United States is." We are unsurprised to see the "is" phrase increase in frequency after the Civil War, as the "are" phrase fades from view. This example supports our intuitions about allegiance to the Union supplanting allegiance to one's home state. It also builds our confidence in their historical profiling method for those other times when it finds a counterintuitive result.
The authors are confident in the value of historical word frequency analysis. "Big data is going to change the humanities, transform the social sciences, and renegotiate the relationship between the world of commerce and the ivory tower." They begin searching for larger and larger collections of text to analyze. They eventually wind up in the office of Peter Norvig, Google's Director of Research. They convince him to grant them access to Google Books, a tremendous digital library containing more books than have ever before been collected online. Not only do Aiden and Michel spend several years conducting historical-linguistic research, but they also author a tool (available at books dot google dot com forward-slash ngrams) that allows everyone else to do the same kind of studies.
Their book outlines how word and phrase frequency can be used to learn about cultural and historical change. It tells the story of Google Books and how the authors began to use this collection of digitized documents in their research. And it provides examples of interesting trends they have brought to light.
Examples include:
- Tracing the relative "fame" of Neil Armstrong and Buzz Aldrin following the 1969 moon landing.
- Illustrating the effect of official persecution by tracing references to banned European authors before, during, and after World War II.
- The same approach is used to illustrate the effect of Hollywood blacklisting during the McCarthy era.
- The effects of "flashbulb" events such as the sinking of the Lusitania in 1915, the Japanese attack on Pearl Harbor in 1941, and the 1972 Watergate scandal.
- Graphs of the relative popularity of various world population centers (cities).
- The explosive increase in use of George Carlin's "seven words you can't say on television."
The book introduces some of the techniques of text analysis and "big data" in an accessible way. However, it is lighter on methodological detail than I would have liked. Having stimulated my interest, the authors might have done more to teach me how to do their kind of trend analysis. I have to forgive them because of the extensive and readable Notes section at the end of the book. There is a lot of information here that I am still digesting. Slowly, I am learning more about their methods.
This book is worth reading, particularly if you are interested in history, culture, and language. Be sure to check out the authors' online ngram tool, too. It's worth spending some time with.
R**D
Is Big Data the End of Scholarship?
Two young research scientists from Harvard University, Erez Aiden and Jean-Baptiste Michel, teamed up with Google in 2010 to create the Ngram Viewer. It sifts through millions of digitized books and charts the frequency with which words have been used. On the day that the Ngram Viewer debuted, more than one million queries were run through it. Some consider it to be at the center of a major revolution.
In an interview with Studio 360's Kurt Andersen, Aiden and Michel said how pleased they are that the new technology can open up academic research to the "independently curious."
"It's good that a tool that's at the leading edge of science can generate so much enthusiasm in the general public." Michel cautions, however, "it's inevitable that a tool like that will generate a large number of discussions that are actually irrelevant or that are flat-out wrong . . . it's still important that bona fide experts are the ones interpreting the research." [1]
In their new book Uncharted: Big Data as a Lens on Human Culture, however, they are nowhere near so humble about the so-called "big data revolution," nor are they convinced about the value of "bona fide experts."
"At its core, this big data revolution is about how humans create and preserve a historical record of their activities. Its consequences will transform how we look at ourselves. It will enable the creation of new scopes that make it possible for our society to more effectively probe its own nature. Big data is going to change the humanities, transform the social sciences, and renegotiate the relationship between the world of commerce and the ivory tower." [2]
Well, if for whatever reason this is going to be a contest between capital and academia, or academics versus the "independently curious," then let's hear first from the so-called "ivory tower." The following passage is from Simon Schama's introduction to his The Embarrassment of Riches: An Interpretation of Dutch Culture in the Golden Age:
". . . there is nothing especially daring about a working definition of culture drawn from social anthropology. I follow the kind of characterization offered by Mary Douglas of cultural bias as "an array of beliefs locked together into relational patterns." In the same essay, however, she cautions that for those beliefs to be considered the matrix of a culture, they should be treated as part of the [social] action and not separated from it." I have tried to follow this rather Durkheimian command in what is, essentially, a descriptive enterprise that emphasizes social process rather than social structure, habits rather than intuitions. Acting upon one another, beliefs and customs together form what Emile Durkheim called "a determinate system that has its own life: . . . the collective or common conscience . . . it is by definition diffuse in every reach of society. Nevertheless it has specific conditions that make it a distinct reality." [3]
Now, let's hear from the big data revolutionaries:
"Consider the following question: Which would help you more if your quest was to learn about contemporary human society--unfettered access to a leading university's department of sociology, packed with experts on how societies function, or unfettered access to Facebook, a company whose goal is to help mediate human social relationships online?"
"On the one hand, the members of the sociology faculty benefit from brilliant insights culled from many lifetimes dedicated to learning and study."
"On the other hand, Facebook is part of the day-to-day social lives of a billion people. It knows where they live and work, where they play and with whom, what they like, when they get sick, and what they talk about with their friends. So the answer to our question may very well be Facebook. And if it isn't--yet--then what about a world twenty years down the line, when Facebook or some other site like it stores ten thousand times as much information, about every single person on the planet?" [4]
Aside from the vague and uninformed illogicality that pervades Uncharted, I am particularly struck by the air of self-congratulatory triumph that permeates the entire book, suggesting that big data has already won--hands down.
Why are so many enthralled by this stuff? All I can say is, "In the land of the blind, the one-eyed man is king."
[1] From Studio 360, Public Radio International, broadcast August 9, 2013.
[2] Aiden, Erez; Michel, Jean-Baptiste (2013-12-26). Uncharted: Big Data as a Lens on Human Culture (Kindle Locations 133-137). Penguin Group US. Kindle Edition.
[3] Simon Schama. The Embarrassment of Riches: An Interpretation of Dutch Culture in the Golden Age. New York: Random House, 1987, p. 9.
[4] Aiden, Erez; Michel, Jean-Baptiste (2013-12-26). Uncharted: Big Data as a Lens on Human Culture (Kindle Locations 185-189). Penguin Group US. Kindle Edition.
T**S
I found this statistical approach fascinating and an update on things I worked on myself.
I have been employed more or less full time since my college days making computers do what other people wanted to accomplish: first researchers in academic institutions, then a computer timesharing vendor, then a software and services company, and finally modern "groupware" clients. Along the way I have tracked the emergence of "Big Data" and the automated search refinement algorithms. Back in the 1990s, I worked in the "skunk works" of a fading software development and services company that had pioneered "touch screen" and automated text-based search and retrieval options for businesses. In those days we did not have the vast, inexpensive server farm full of data to mine, so one of my personal projects was what I called "content based garbage cleaning," which used stored search statements to automatically "throw away" dated material in a data repository, thereby keeping its size and search speed manageable for the equipment we had available. The kind of work these Harvard social scientists did as a demonstration of what is now possible fascinates me. The one "issue" I have with their approach is a rather esoteric but essential one that has to do with "sampling bias." The universe of text data that was made available and analyzed suffers from a whole range of possible sources of selection bias. Their interpretations cannot speak authoritatively to the differences in "living" cultures, where individuals now use some of the same tools to selectively filter and focus what their culture contains on a day-to-day basis; they can only hint at what can be learned from repositories of digitized English-language books.
M**R
Quantifying historical change
Google has been creating its own version of the ancient Library of Alexandria by digitising books for its Google Books project. This project has had many obstacles, none greater than American copyright law, which has extended the copyright of a book to 70 years after the author's death. Thus, a large proportion of books published in the 20th century are still under copyright. Despite this, two young researchers at Harvard convinced Google that they could access Google Books in a general way without infringing the copyright. This involved searching the text of these books for n-grams, where a 1-gram is a single word, a 2-gram is a two-word phrase, a 3-gram is a three-word phrase, etc. The results of their research included the creation of the Google Books N-gram Viewer and this book.
Most big data has been collected in the last few decades. Google Books is unusual in being both big data and long data, where long data indicates its historical reach. The authors were interested in the history of change of English grammar, which is a perfect subject for the data held in Google Books. They investigated the frequency over time for the 1-grams "burnt" and "burned", presenting their results in a chart of frequency against time showing how "burned" is taking over. Other questions were posed and the results charted. Near the start of the book the authors are careful to point out that correlation of data does not mean causation. The fact that charts seem to show correlations between n-gram frequencies does not mean that any underlying reason has been discovered; that would require further analysis. However, the N-gram Viewer is an interesting and powerful tool. It can reinforce assumptions. For example, looking at "London" and "New York" from 1800 to the present, "London" has a constant frequency throughout the period but "New York" overtakes it in about 1900. "France" is overtaken by "China" in the late 1980s. A chart of "Trotsky" using a dataset of Russian books rises rapidly in 1917, falls rapidly in 1956 then flat-lines until 1988; that is, it changes at the onset of the Russian Revolution, the Stalinist purges and perestroika.
THE BOOK has 212 pages plus a 24 page Appendix containing n-gram charts, with 2 charts per page, a 30 page Notes section and a short Index. Other charts are scattered throughout the text. They have a hand-drawn quality, which the authors say is deliberate and was inspired by the xkcd Web comic style. When there are multiple plots on a chart they are often differentiated in greyscales rather than by line pattern, suggesting that the originals were produced in colour. This makes it difficult to tell the lines apart.
CONCLUSION: The book's subtitle is "Big Data as a Lens on Human Culture". This is a little misleading. If you are looking for a technical account of investigations on big data you will be disappointed. A better subtitle would be "Non-technical Adventures in Google Books". The writing style is chatty and humorous, which can be off-putting. They explain things carefully and in detail. This is irritating if you already understand the ideas, e.g. half-life, statistical bias, false positives and false negatives; but it can also be of use if these things are new to the reader, e.g. Zipf's Law, Andvord's cohort method, Ebbinghaus's learning curve. However, the book is easy to read and to understand. No prior technical knowledge is needed. The first three chapters give the background to the story. The following chapters describe the investigations undertaken by the authors. I was expecting this book to be physically bigger and to be more technical. Somehow, big data suggested this. Instead I found an interesting, non-technical account of a new tool.
It looks like a lot of fun to use and it is capable of delivering unexpected results, some of which will prompt further investigation and may provide new insights.
LINKS: The results of the authors' research were published in a paper in the scientific journal Science titled "Quantitative Analysis of Culture Using Millions of Digitized Books"; the abstract of this paper can be found online. Google created the Google Books N-gram Viewer and made it available on the Internet. The authors also created a web page about their work, called culturomics. See Comment for links.
==================== N-gram frequencies in this review ====================
1-grams
frequency 51: the
frequency 26: a
frequency 24: of
frequency 17: and, in, is
frequency 12: to
frequency 9: this
frequency 8: it
frequency 7: are, be, book, for, has, that
frequency 6: authors
frequency 5: books, by, they, n-gram, results, which
frequency 4: can, charts, copyright, on, page, were
frequency 3: at, been, chart, data, frequency, Google, if, its, new, technical, their, viewer, you
frequency 2: about, account, also, an, burned, but, chapters, created, culture, does, e.g., found, further, however, I, interesting, investigations, links, London, looking, mean, non-technical, not, other, over, paper, project, published, rapidly, research, style, subtitle, text, than, these, things, throughout, time, tool, two, understand, use, using, was, Web, where, will, would
frequency 1: 2, 24, 30, 70, 212, 1800, 1900, 1917, 1956, 1988, 1980s, 1-gram, 1-grams, 20th, 2-gram, 3-gram, abstract, access, adventures, after, against, already, American, analysis, ancient, Andvord, any, apart, appendix, as, assumptions, author, available, background, being, better, between, big, bigger, both, burnt, called, capable, carefully ..................................... three, three-word, thus, titled, Trotsky, under, underlying, undertaken, unexpected, until, unusual, version, way, when, with, without, word, words, work, writing, xkcd, years, young
2-grams
Google Books 7, big data 4, long data 2, New York 2, false negatives 1, false positives 1, learning curve 1, quantitative analysis 1, Russian Revolution 1, statistical bias 1, Zipf's Law 1
3-grams
Library of Alexandria 1
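A tally like the one above can be reproduced with a few lines of code. The following is only a minimal sketch and not the authors' or Google's method: the tokenizer (lowercase runs of letters, digits, apostrophes and hyphens) and the names `ngrams`, `ngram_counts` and `sample` are assumptions made for illustration. Note that it lets n-grams run across sentence boundaries and counts every adjacent pair, whereas the list above picks out only selected phrases.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive tokens in text, joined by spaces."""
    # Assumed tokenizer: lowercase, keep letters, digits, apostrophes, hyphens.
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each n-gram occurs in text."""
    return Counter(ngrams(text, n))


# Hypothetical sample sentence, used only to exercise the functions.
sample = "Google Books is big data and long data. Google Books is unusual."
counts1 = ngram_counts(sample, 1)
counts2 = ngram_counts(sample, 2)

# List 2-grams from most to least frequent, as in the tally above.
for phrase, freq in counts2.most_common():
    print(freq, phrase)
```

The N-gram Viewer charts this kind of count year by year, relative to the total number of n-grams published in that year, which this sketch does not attempt.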
W**R
No flow, no glue
While the topic is interesting, and while the book contains interesting information and good examples, it rambles on in such a way that it becomes boring and confusing. One starts to wonder if they've read the same passages previously. There seems to be no glue that holds the text together, and at the same time there is no direction or flow to it. Perhaps it is best to read a few sections, and then at a much later time read the next few, etc., and just pay attention to the information conveyed.
G**N
Intelligent - and funny
An amusing and intelligent read that throws new light on how we see our culture. Look forward to reading more on this topic.
A**Y
good quality of book
quick delivery, good quality of book.
I**A
Interesting and provocative
It is intriguing to be able to analyze how the frequency of use of each word (ngram) evolves over time and how it relates to culture. The closing reflections on the possibility of completely recording an individual's life and thoughts produce an obvious unease.