Exploratory Data Analysis
In his seminal work on the subject, Exploratory Data Analysis (Addison-Wesley, 1977),
John Tukey proposed a new approach to data analysis, based heavily on visualization,
as an alternative to classical (mathematical) data analysis.
Because it depends on graphics, this approach became practical only with
the advent of modern computers. Beyond advocating the graphical
techniques of visual data analysis, he proposed a methodology of data exploration
in which a model of the phenomena is inferred from the data rather than imposed in advance.
It is this combination of visual analysis and exploration that led him to coin the phrase
"exploratory data analysis", commonly referred to simply as "EDA".
John W. Tukey
1915-2000
The exploratory approach suits data analysis because it lets you
examine your data with an open mind. The graphical techniques of visual exploration,
combined with your natural pattern-recognition abilities and knowledge of the
subject, help you discover the underlying structure of your data.
Tukey suggests that you think of exploratory analysis as the first step in a two-step
process similar to that used in criminal investigations. In the first step,
you search for evidence using all of the investigative tools that are available.
In the second step, that of confirmatory data analysis, you evaluate the strength
of the evidence and judge its merits and applicability. It is in this second
step that you would typically evaluate the model(s) inferred during
your exploration and apply the techniques of classical data analysis.
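
The two-step process described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the sample values are
hypothetical, the five-number summary stands in for the exploratory step, and
the permutation test is simply one convenient confirmatory technique chosen
here, not a method prescribed by Tukey or used by any particular tool.

```python
import random
import statistics


def five_number_summary(values):
    """Exploratory step: minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def permutation_p_value(a, b, rounds=2000, seed=0):
    """Confirmatory step: share of random relabelings whose gap between
    group means is at least as large as the observed gap."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        gap = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    return hits / rounds


# Hypothetical measurements for two groups (illustrative values only).
group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]

# Step 1: explore. The summaries suggest the groups differ in location.
print("A:", five_number_summary(group_a))
print("B:", five_number_summary(group_b))

# Step 2: confirm. Weigh the evidence for the difference suggested in step 1.
print("p-value:", permutation_p_value(group_a, group_b))
```

The point of the sketch is the ordering: the hypothesis that the groups differ
comes out of looking at the summaries first, and only then is it tested.
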
It is common, though incorrect, to think of exploratory data
analysis as synonymous with visual data analysis. Many tools share
this shortcoming. A tool does not support exploratory data
analysis simply by providing graphing capabilities. Only with the addition
of an exploration paradigm does a tool truly facilitate this methodology.
VisiCube
VisiCube is such a tool. It supports both data exploration and visual data analysis.
In fact, it is a pure exploratory data analysis tool.
It does not include features that apply only to competing methodologies
(such as data mining or mathematical data analysis).
Its scope is deliberately limited so that it can be used naturally and easily.
Unlike some competitors, it is not a collection of unrelated analysis tools.
More info
For an extensive description of EDA,
I suggest you refer to the EDA section of the
Engineering Statistics Handbook
provided online by the National Institute of Standards and Technology (NIST).