http://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data
I really needed her level set. Big Data is overwhelming. Yes, there's a lot of information out there. Yes, it's amazing that we suddenly have access to it all. Yes, there are 100 Big Data jobs for every single applicant. Yes, zettabyte is a real word. Yes, in 2015 we create as much data every ten minutes as was created in all of history up to the year 2003. Yes, we're awesome. So what?
What do we do with all this information? Why is it useful? The article titled "The Promise of Big Data," by the Harvard T.H. Chan School of Public Health, showed how extremely useful Big Data can be. Who WOULDN'T like to prevent TB patients from dying when access to Big Data might reveal a cure? In contrast, The Consumerist article about SAT test prep price differences due to poorly programmed Big Data pointed to an extreme negative. After all, I DO live in an affluent Dallas suburb with three children who have taken or will take the SAT. I don't like that I may have paid more for one of their SAT prep courses.
The benefits of Big Data from a genomic and/or medical perspective are pretty clear. But is that how the majority of Big Data is being used? Or is Big Data primarily being sold and culled with the end result of my privacy being compromised? The latter seems to be the case. The uber-paranoid article, "The hypocrisy of the Internet journalist," seems to support my theory (a weird, ego-centric article, by the way).
But I come back to Susan Etlinger, a strong proponent of Big Data. She's very clear, though, that Big Data, consisting of images, text, video, and audio files, requires context to be useful. As we've studied throughout the semester, data itself, let alone how it's collected and its collection framework, is entirely open to interpretation and, conversely, misinterpretation. A 2014 article by Nicole Fallon on Big Data in Business News Daily states, "technology does the tedious work of pulling the data apart and categorizing it, so humans can spend time truly deriving insights and taking action."
Etlinger mentions an "accident truth" in a misquote by Ronald Reagan during the 1988 Republican National Convention. He stated that "facts are stupid things." He meant to quote John Adams, "Facts are stubborn things." Well, facts ARE stupid; as Ms. Etlinger states, "Data doesn't create meaning; we do."
All that being said, Big Data is helpful in developing cures for diseases and understanding societal behaviors and trends (there's good and bad in that). And Big Data is certainly inevitable. So how do we tame it? Regarding the usefulness of Big Data, Etlinger warns against the fallacy of "post hoc ergo propter hoc." Meaning, "just because something happens after something else has happened does not mean that second thing had anything to do with the first thing" (Huffington Post article definition). We need to collect data while honoring people's privacy and giving consumers an "opt in" option. We need to speak clearly about our hypotheses and conclusions. And, importantly, we need to "show our math" when drawing conclusions. These steps will help increase participation, and hence the amount of data, as well as expose any fundamental framework and process flaws. As with high school algebra, showing our data work will reveal where we may have broken down.
In an interview with Science Node, Etlinger explained further:
"The key, ultimately, is context, and, without getting too metaphysical, what constitutes context is contextual. Ultimately I think it comes down to clarifying the desired research question, and making it as specific as possible:
- Is there a link between X and Y?
- Can we detect whether it is correlative or causal?
- What is the likely impact of X on Y population?
- With what confidence level are we able to determine this?"
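To make those questions concrete for myself, here is a minimal sketch of what "showing our math" might look like for the first and last of them. It is my own illustration, not anything Etlinger or Science Node provided: the numbers are made up, and the names `pearson_r` and `permutation_p_value` are hypothetical helpers. It measures how strongly X and Y move together, then shuffles Y many times to see how often chance alone produces a link that strong.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of shuffled pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Made-up numbers, for illustration only.
hours_online = [1, 2, 3, 4, 5, 6, 7, 8]
purchases = [2, 1, 4, 3, 6, 5, 8, 7]

r = pearson_r(hours_online, purchases)
p = permutation_p_value(hours_online, purchases)
print(f"r = {r:.2f}, p = {p:.3f}")
```

A strong r with a small p only says the link is unlikely to be a fluke. It says nothing about whether X causes Y, which is exactly the post hoc trap: answering the "correlative or causal" question still takes context and a well-designed study.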
But how do we ensure that everyone culling Big Data is using the same set of principles? We can't, really.