Tutorial:Descriptive Statistics


Origin provides comprehensive Descriptive Statistics support including basic statistics (mean, median, variance, etc.), frequency counts, and correlation coefficients of data you select. In addition to strong plotting features, Origin's statistical tools help you summarize and analyze your data.

This tutorial will show you how to:

  • Use the Statistics on Column Dialog to calculate descriptive statistics for grouped data.
  • Copy statistical results to a new worksheet for further processing.
  • Use Unstack Columns to rearrange the results and plot them in a graph.
  • Analyze data sets with the Correlation Coefficient Tool.

Minimum Origin Version Required: Origin 8.0 SR6

Finding Frequency Information for Groups

Start with some data. We can use the Discrete Frequency Tool to quickly obtain frequency information for groups of data.

  1. Start with a new project or a new workbook. Import the data file \Samples\Statistics\automobile.dat by clicking the Import Single ASCII button.
  2. Highlight the first two columns. Select Statistics: Descriptive Statistics: Discrete Frequency to open a dialog. Column A and Column B are automatically picked as Input Data. Click OK.

The discrete frequency results are sorted in descending order of Count, so the most frequently occurring values appear first. You can still rearrange the results by sorting the worksheet, even though the result columns are locked.
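For readers who want to check the idea outside Origin, the following is a minimal sketch of a discrete frequency count in Python. It is not how Origin implements the tool. The function name discrete_frequency and the sample values are assumptions made for illustration and are not taken from automobile.dat.

```python
from collections import Counter

# Illustrative values only; these are NOT read from automobile.dat.
makes = ["Honda", "Ford", "Honda", "Toyota", "Ford", "Honda"]


def discrete_frequency(values):
    """Return (value, count) pairs sorted by descending count.

    Counter.most_common() orders by count; ties keep first-seen order.
    """
    return Counter(values).most_common()


freq = discrete_frequency(makes)
for value, count in freq:
    print(f"{value}: {count}")
```

As in the Origin result sheet, the most frequent value comes first.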

Calculating Descriptive Statistics on Grouped Data

Using the Statistics on Columns tool, we can find basic statistics for each group of data.

  1. Switch back to the first sheet.
  2. Select Statistics: Descriptive Statistics: Statistics on Columns to open the Statistics on Columns dialog.
  3. Open the Range 1 branch and click the interactive button. The dialog will "roll up" so that you can set Data Range to Column C ~ Column G by selecting C(Y) and dragging to G(Y) in the worksheet. Click the button in the rolled-up dialog to restore the dialog. To set Group Range to B(Y): Make, click the triangle button next to Grouping Range and select B(Y) : Make.
  4. Here, we will show how to make a box plot for the grouped data and put all groups in a graph for a quick comparison. Do the following: 1) Expand the Output Settings branch and the Graph Arrangement sub-branch. Select the Arrange Plots of Same Type in One Graph check box. 2) Expand the Plots branch, and select the Box Charts check box.

    Image:Tutorial Descriptive Statistics on Grouped Data 001.png
  5. Click the OK button to get the results in a report sheet.

    Image:Tutorial Descriptive Statistics on Grouped Data 004.png

You can double-click to open the graph containing the box plot and customize the graph. Click the Close button on the graph to restore the modified graph to the Report Worksheet.
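The grouping logic behind Statistics on Columns can be sketched in a few lines of Python. This is an illustrative approximation, not Origin's implementation. The function name grouped_stats and the (make, power) rows are made up for the example. The quartiles computed here use Python's default "exclusive" method, which may differ from the quartile method Origin uses for its box charts.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev

# Illustrative (make, power) rows; values are NOT from automobile.dat.
rows = [
    ("Honda", 100.0),
    ("Honda", 120.0),
    ("Ford", 150.0),
    ("Ford", 170.0),
    ("Ford", 160.0),
]


def grouped_stats(rows):
    """Compute basic descriptive statistics for each group.

    Each group needs at least two values, because the sample standard
    deviation and the quartiles are undefined for a single point.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)  # quartile cut points
        result[key] = {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),  # sample standard deviation (n - 1)
            "min": min(values),
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": max(values),
        }
    return result


stats = grouped_stats(rows)
for make, s in stats.items():
    print(make, s["n"], s["mean"], round(s["sd"], 3))
```

The min, quartile, median, and max values are the quantities a box plot typically displays for each group.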

Using Statistical Results for Further Operations

After using the Statistics on Columns dialog to produce a report tree, you may wish to do further analysis and plotting on the statistical results.

For example, to get average attribute values (i.e. horsepower, 0-60 mph time, weight, mileage) by vehicle Make from 1992 to 2004, perform the following:

  1. In the report sheet, right-click on the title of the Descriptive Statistics table and select Create Copy as New Sheet from the short-cut menu.

    Image:Tutorial Descriptive Statistics on Grouped Data 005.png
  2. When the new sheet is active, select Worksheet: Unstack Columns.
  3. In the dialog that opens, set columns D and E as Data to be Unstacked. Because the fly-out menu of the triangle button supports only one selection, use the interactive button to select both columns.
  4. Set column A as Group Variables.
  5. Select the Include Other Columns check box and set Other Columns to column B.
  6. Set the Put Grouping Info. to option to Long Name. Click the OK button.

    Image:Tutorial Descriptive Statistics on Grouped Data 006.png
  7. In the result of Unstack Columns, we get the mean and standard deviation of Power, 0~60 mph time, Weight, Gas Mileage and Engine Displacement for the 18 different car makes.
  8. Highlight the whole result worksheet. Select Plot: Multi-Curve: Stack from the main menu.
  9. In the pop-up dialog, all columns in the worksheet are automatically set as Input. Set Plot Type to Scatter and click the OK button.

    Image:Tutorial Descriptive Statistics on Grouped Data 0055.png

In the above screenshot, the top X-Axis Tick Labels have been rotated 45 degrees for clarity. To do this, double-click on the tick labels to open the X-Axis dialog. Set the Rotation on the Custom Tick Labels tab.
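Conceptually, the Unstack Columns steps above turn a "long" table (one row per attribute and make) into a "wide" table (one row per make, with a mean and a standard deviation column for each attribute). The sketch below illustrates that reshaping in plain Python. The function name unstack, the output column labels, and all numbers are assumptions for illustration. The code does not reproduce Origin's output format.

```python
# Illustrative long-format rows: (attribute, make, mean, sd).
# Values are made up and are NOT results from automobile.dat.
long_rows = [
    ("Power", "Honda", 110.0, 14.1),
    ("Power", "Ford", 160.0, 10.0),
    ("Weight", "Honda", 1200.0, 50.0),
    ("Weight", "Ford", 1500.0, 80.0),
]


def unstack(rows):
    """Reshape long rows into one record per make.

    The attribute name plays the role of the group variable, and the
    make plays the role of the "other column" carried along.
    """
    wide = {}
    for attribute, make, mean_value, sd_value in rows:
        record = wide.setdefault(make, {})
        record[f"{attribute} Mean"] = mean_value
        record[f"{attribute} SD"] = sd_value
    return wide


wide = unstack(long_rows)
for make, record in wide.items():
    print(make, record)
```

Each record now holds every attribute for one make, which is the layout that makes a stacked multi-panel plot by make straightforward.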

Analyzing the Relationship between Different Indicators

We can use a correlation coefficient to explore the relationship between columns of our automobile data. In addition, we can plot a scatter matrix with a confidence ellipse to get a graphical representation of the correlation.

  1. Go to the original worksheet with the source data. Highlight the last five columns.
  2. Select Statistics: Descriptive Statistics: Correlation Coefficient from the Origin menu to open the Correlation Coefficient tool. Note that Pearson is the default selection. This method is suitable for quantitative data.
  3. Under the Plots branch, select the Add Confidence Ellipse check box. The Scatter Plot check box should then be automatically selected. This means that the tool will create a scatter matrix with a confidence ellipse added to each scatter plot. Click OK.
    Image:Tutorial Descriptive Statistics on Grouped Data 007.png

Note the high positive correlation between Engine Displacement and Power and the high negative correlation between Gas Mileage and Engine Displacement.
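To make the Pearson coefficient concrete, here is a minimal Python sketch of the standard formula: the sum of products of deviations divided by the square root of the product of the summed squared deviations. The function name pearson and the sample lists are assumptions for illustration. The numbers are deliberately perfectly linear toy data and are not taken from automobile.dat.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation coefficient of two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Toy data, perfectly linear by construction (illustrative only).
displacement = [1.0, 2.0, 3.0, 4.0]
power = [2.0, 4.0, 6.0, 8.0]      # rises with displacement
mileage = [8.0, 6.0, 4.0, 2.0]    # falls with displacement

r_power = pearson(displacement, power)
r_mileage = pearson(displacement, mileage)
print(r_power, r_mileage)
```

A value near +1 indicates a strong positive linear relationship and a value near -1 a strong negative one, which is the pattern noted above for Engine Displacement against Power and Gas Mileage.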
