This expository course presents the fundamental concepts of Exploratory Analysis, covering the main mathematical and statistical tools used to understand and analyze data sets, and the main concepts of Data Visualization, with the goal of training students to build good graphics, both exploratory and explanatory/communicative. The course covers the following topics: types of variables; main measures of centrality and dispersion; data cleaning, missing data, and outliers; descriptive statistics; hypothesis testing; clustering (k-means and hierarchical clustering); types of graphs; interactive graphics; design principles and presentation of results. Tools: R (ggplot2, ggthemes, esquisse); Python (matplotlib, plotly, seaborn, and bokeh); Tableau and Power BI.
Basic Information
Mandatory:
- Tukey, J. Exploratory Data Analysis. Pearson, 1977.
- Wes McKinney. Python for Data Analysis. O'Reilly, 2017.
- Cole Nussbaumer Knaflic. Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley, 2015
Complementary:
- Philipp K. Janert. Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists. O'Reilly, 2011.
- Osvaldo Martin. Bayesian Analysis with Python. Packt, 2016.
- Shai Vaingast. Beginning Python Visualization: Crafting Visual Transformation Scripts. Apress, 2014.
- Theodore Petrou. Pandas Cookbook: Recipes for Scientific Computing, Time Series Analysis and Data Visualization.
- Cyrille Rossant. IPython Interactive Computing and Visualization Cookbook.