
Statistics – Data visualization:

Data visualization is the technique of converting a set of data into visual insight. Its main aim is to give the data a meaningful representation. To make multi-variable data understandable at a glance, it can be displayed as 2D or 3D images using techniques such as colorization, 3D imaging, animation and spatial annotation.

According to Fisher, data visualization is “the visual interpretation of complex relationships in multidimensional data.” In other words, it means presenting data in a way that is pleasing to the eye and that informs and provides value to the user.

A pivotal concept behind data visualization is the aim of producing statistical stories. An audience is more likely to learn an idea from a story than to remember the actual data, and displaying data visually creates many more opportunities for this to occur. A statistical story should grab the user’s attention, provoke thought, be informative and, ideally, be entertaining.

The primary objective of any statistical story should be to inform its audience and be newsworthy. It must use the statistics available to provide substance and stimulate interest. It should sift through the large pool of data and surface only those details that are useful and pertinent to the needs of the user. Once this data has been uncovered, the next step is to ensure that the story is presented in a format that is understandable and easy to use. All statistical stories have a target audience, and it is critical that the needs of that audience are considered.

There are many different tools for visualizing statistical data. Some are free or open source and others are commercial. Some of the best-known tools are:

Excel:

You can actually do some pretty complex things with Excel, from tables of cells to scatter plots. As an entry-level tool, it can be a good way of quickly exploring data or creating visualizations for internal use, but it has a limited default set of colors, lines and styles. Excel is part of the Microsoft Office suite, so if you don’t have access to it, Google’s spreadsheets can do many of the same things.

R:

R is a free software environment for statistical computing and graphics. A statistical package used to analyze large data sets, R is a very complex tool, but it has a strong community and a package library that keeps growing. Its learning curve is one of the steepest of the tools listed here, but being comfortable with it is necessary if you want to work at this level.

Tableau:

With Tableau we can create and share visualizations in real time. Tableau Public is a popular, completely free data visualization tool that is packed with graphs, charts, maps and more. Users can drag and drop data into the system and see it update in real time, which makes it easier to collaborate with other team members for quick project turnaround.

Qualitative and quantitative data variables call for different presentations.

Qualitative variables:

The bar chart (or bar graph) is one of the most common ways of displaying categorical (qualitative) data. A bar graph mainly holds two variables, a response (dependent variable) and a predictor (independent variable), which are arranged on the horizontal and vertical axes of the graph. The relationship between the predictor and the response is shown by a mark of some sort (usually a rectangular bar) extending from one variable’s value to the other’s.

Example:

A survey of 145 people asked them “Which is the nicest fruit?”:

Fruit: Apple, Orange, Banana, Pomegranate, Grapes, Mango
No. of votes: 30, 25, 15, 5, 20, 35

Let's see how we can use Excel to represent this data as a bar chart.

Open Microsoft Excel and enter the table. Select the table so that it is highlighted.


Insert menu -> Bar -> 2-D Bar

[Figure: bar chart in Excel]

We get the bar chart, displayed horizontally.

[Figure: bar chart]
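For readers without Excel, the same horizontal bar chart can be approximated in a few lines of Python. This is only a minimal sketch using the standard library: the function name text_bar_chart and the scale parameter are illustrative choices, and the vote counts assume the survey row reads as 30, 25, 15, 5, 20 and 35.

```python
# Survey data from the example above. The vote counts assume the flattened
# table row reads as 30, 25, 15, 5, 20, 35.
votes = {
    "Apple": 30,
    "Orange": 25,
    "Banana": 15,
    "Pomegranate": 5,
    "Grapes": 20,
    "Mango": 35,
}


def text_bar_chart(data, scale=1):
    """Return one line per category, each with a bar of '#' marks.

    Every '#' stands for `scale` units of the value (hypothetical helper).
    """
    width = max(len(name) for name in data)  # pad labels to a common width
    lines = []
    for name, value in data.items():
        bar = "#" * (value // scale)
        lines.append(f"{name:<{width}} | {bar} {value}")
    return lines


chart = text_bar_chart(votes, scale=5)
print("\n".join(chart))
```

Each category (the predictor) gets one row, and the length of its bar encodes the number of votes (the response), which is the same mapping Excel applies when it draws the 2-D bar chart.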

A histogram is similar to a bar chart but is used for continuous data. Usually, there is no space between adjacent columns. Each column is positioned over a label that represents a continuous variable; the label can be either a single value or a range of values. When the ranges are of equal width, the height of a column equals the size of the group. In general, the area covered by each bar is proportional to the frequency of the data.
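The grouping step behind a histogram can be sketched in Python as follows. The sample values are the student heights from the scatter plot example later in this article, read as 180, 178, 170 and so on; the bin start of 140 cm, the bin width of 10 cm and the helper name histogram_counts are assumptions made purely for illustration.

```python
# Student heights in cm, taken from the scatter plot example in this article.
heights = [180, 178, 170, 150, 145, 165, 162, 158]


def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each bin [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the covered range
            counts[index] += 1
    return counts


# Assumed binning: five equal-width bins covering 140 cm up to 190 cm.
bin_start, bin_width, n_bins = 140, 10, 5
counts = histogram_counts(heights, bin_start, bin_width, n_bins)

for i, count in enumerate(counts):
    lo = bin_start + i * bin_width
    print(f"{lo}-{lo + bin_width}: {'#' * count}")
```

Because all bins have the same width here, the height of each column is simply the number of values in that range.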

A pie chart (or pie diagram) is a graphical device: a circle divided into sectors. The area of each sector is proportional to the part it represents. The sectors may be colored differently to show the relationship of the parts to the whole.

Let's check out the pie chart version of this data:

Fruit: Apple, Orange, Banana, Pomegranate, Grapes, Mango
No. of votes: 30, 25, 15, 5, 20, 35
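The proportionality rule for pie sectors can be made concrete with a short Python sketch. It assumes the vote row reads as 30, 25, 15, 5, 20 and 35, and it takes each share relative to the votes listed (which sum to 130 under this reading) rather than the number of people surveyed.

```python
# Vote counts as read from the table above (an assumed reading of the row).
votes = {
    "Apple": 30,
    "Orange": 25,
    "Banana": 15,
    "Pomegranate": 5,
    "Grapes": 20,
    "Mango": 35,
}

total = sum(votes.values())

# Each sector's angle is its share of the total, scaled to 360 degrees.
angles = {fruit: count / total * 360 for fruit, count in votes.items()}

for fruit, angle in angles.items():
    share = votes[fruit] / total * 100
    print(f"{fruit:<12} {share:5.1f}%  {angle:6.1f} degrees")
```

The angles always add up to a full circle, so a larger sector directly signals a larger part of the whole.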

 

A line graph shows, for example, what happened to something (a variable) during a specific time period (also a variable). Usually a line graph is plotted from a table that lists the relationship between the two variables as pairs of values. As in any 2D graph, each pair corresponds to a specific point on the graph, and the points are connected to one another to form a line.

Example:

Sales of ice cream over a week in the month of August are:

Day: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Sales in Rs.: 750, 600, 800, 990, 1075, 1125, 985

Let's see the line graph in Excel:

[Figure: line graph in Excel]
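The way a line graph is built from pairs of values can be sketched in Python. The sales figures assume the table row reads as 750, 600, 800, 990, 1075, 1125 and 985; the variable names are illustrative only.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
# Sales in Rs., as read from the table above (an assumed reading of the row).
sales = [750, 600, 800, 990, 1075, 1125, 985]

# The points the line passes through, in order along the horizontal axis.
points = list(zip(days, sales))

# Change between consecutive days: the rise or fall of each line segment.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]

# The highest point on the line.
peak_day, peak_sales = max(points, key=lambda p: p[1])

for (day, value), change in zip(points[1:], changes):
    print(f"{day:<9} {value:>5}  ({change:+d})")
print(f"Peak: {peak_day} with Rs. {peak_sales}")
```

A positive change corresponds to a segment that slopes upward and a negative change to one that slopes downward, which is what the eye picks up from the plotted line.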

A scatter plot is used to show the relationship between two numeric variables. A scatter plot matrix is a collection of pairwise scatter plots of numeric variables.

Example:

We have height and weight data for the students of a class as follows:

Height (in cm)    Weight (in kg)
180               72
178               71
170               69
150               65
145               50
165               60
162               58
158               48
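As a complement to the plot, the strength of the relationship a scatter plot shows can be summarized with a single number. The sketch below pairs each height with its weight, as a scatter plot would, and computes the Pearson correlation coefficient by hand. The coefficient is not part of the original example and is added here only as an illustration; pearson_r is a hypothetical helper name, and the values assume each table row reads as a three-digit height followed by a two-digit weight.

```python
import math

# Heights in cm and weights in kg, as read from the table above.
heights = [180, 178, 170, 150, 145, 165, 162, 158]
weights = [72, 71, 69, 65, 50, 60, 58, 48]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Each (height, weight) pair is one point on the scatter plot.
points = list(zip(heights, weights))
r = pearson_r(heights, weights)

print(points)
print(f"r = {r:.2f}")
```

A value of r close to +1 means the points rise together along a line, a value close to -1 means one variable falls as the other rises, and a value near 0 means no linear pattern is visible.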


We can choose the presentation best suited to the type of the data variables.