Once your data is organized and you are familiar with your variables, you can begin your analysis and look for general trends. Most researchers like to begin this step by taking a broad look at the data. What trends pop out immediately? Which results are most common? Most divergent? Most unexpected? To help you answer these questions, start with descriptive analysis, also called summary analysis.
Summary analysis allows you to describe and summarize your data.
The first part of this stage is just getting a general sense of your data. The following measures are meant to give us a look at how people responded to the survey questions. They tell us about the people in our sample, but not necessarily about the population of people that they represent.
What is a Typical Response?
One of the first things a reader will likely want to know about your survey questions is “how did people typically respond to each question?”
To answer that, we use what are called “measures of central tendency”: the mean, median, and mode. Of these measures, the mean is the one we use most frequently. The mean is what people normally think of as the “average.” Although it is the most popular of the three, the mean can be misleading when the data contains unusually large or small values (also known as “outliers”), because a few extreme values pull it away from the typical response.
The median, the middle value when the responses are sorted from lowest to highest, is an excellent measure to use when our data contains outliers. Because the median is largely unaffected by an outlier, it is commonly used to describe topics that are more likely to include values far from what is normal in a given population, such as income.
Mean and median, however, are only useful when discussing numeric responses. For categories, scales, and similar information, the mode serves us better. Mode refers to the most frequently occurring value. It is not uncommon for a question to have two or more modes if two or more responses are equally the most popular. It is also not unusual for some analytic programs to report only one mode, even if another category or number appears in the data just as frequently. So, if you are describing categories or scale responses, it is often better to report how frequently each category was selected.
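As a rough illustration of the three measures, here is a minimal sketch using Python's built-in statistics module. The income and rating values are invented for the example and do not come from any real survey.

```python
from statistics import mean, median, multimode

# Hypothetical annual incomes in thousands of dollars; the last value is an outlier.
incomes = [32, 35, 38, 40, 41, 45, 400]

print("mean:", round(mean(incomes), 2))  # pulled upward by the single large value
print("median:", median(incomes))        # stays in the middle of the typical incomes

# Hypothetical answers on a 1-5 scale where two answers tie for most popular.
ratings = [4, 2, 4, 5, 2, 3]
print("modes:", multimode(ratings))      # lists every mode, not just one
```

In this made-up sample the mean lands far above what most respondents reported, while the median does not. The multimode function is used because it returns all tied values, which sidesteps the single-mode reporting issue mentioned above.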
How “Popular” are the Responses?
For questions that ask people to select one item or several items from a list, one of the more intuitive ways to understand the responses is to show how frequently each item was selected. Although magazines and informal venues will often give a “top five,” “top ten,” or similar list, that is not likely to be very helpful to the reader without a little more information.
When assessing multiple-choice questions, it is a good idea to document how popular each option in your list was. You can do this through a raw count of how many respondents selected each item. That can be helpful but not necessarily meaningful, especially with large samples where many of the responses could look “popular.” It is often more helpful to report what percentage of respondents chose each item. That will give clearer insight into the people in your sample. If you are using percentages, however, make sure to state how many people responded to each question. Otherwise, a statement that 100% of surveyed respondents chose a particular option may hide the fact that only one or a few people answered that question.
If you have information about your respondents, such as their vocation, income category, or where they get information they trust, then you can get an even more nuanced view of the people who responded to your survey. You can break down the responses by group to explore the similarities and differences in their responses. One common way to do this is by using pivot tables or creating subsets of the data by category.
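One way to report counts and percentages together with the number of people who answered is sketched below. The answer options are hypothetical, and the use of None to mark a skipped question is an assumption of this example; your survey tool may record skipped questions differently.

```python
from collections import Counter

# Hypothetical answers to one multiple-choice question.
# None marks a respondent who skipped the question (an assumption of this sketch).
answers = ["Email", "Radio", "Email", None, "Newsletter", "Email", "Radio", None]

answered = [a for a in answers if a is not None]
n = len(answered)           # how many people actually answered
counts = Counter(answered)  # raw count for each option

print(f"n = {n} of {len(answers)} respondents answered")
for option, count in counts.most_common():
    print(f"{option}: {count} ({count / n:.0%})")
```

Percentages here are calculated out of the people who answered the question, not everyone surveyed, which is why n is printed alongside them.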
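If you are not working in a spreadsheet, the same kind of group-by-answer breakdown that a pivot table gives can be sketched in a few lines of Python. The vocations and answers below are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (vocation, answer) pairs, one per respondent.
responses = [
    ("Farmer", "Yes"), ("Farmer", "No"), ("Farmer", "Yes"),
    ("Educator", "Yes"), ("Educator", "Yes"),
    ("Retailer", "No"),
]

# Build one answer tally per group, similar to the rows of a pivot table.
by_group = defaultdict(Counter)
for group, answer in responses:
    by_group[group][answer] += 1

for group, counts in by_group.items():
    total = sum(counts.values())
    summary = ", ".join(
        f"{answer}: {count} ({count / total:.0%})"
        for answer, count in sorted(counts.items())
    )
    print(f"{group} (n = {total}): {summary}")
```

Printing the group size next to each percentage matters even more here, because subsets are smaller than the full sample and a single respondent can account for 100% of a group.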
How Varied Were the Responses?
You can also analyze your data by looking at how varied the responses to your questions were. Measures such as range and standard deviation describe this variation.
Standard deviation describes how varied responses were by calculating roughly how far, on average, each response was from the mean response. Smaller values of standard deviation indicate that people were fairly consistent in the way they responded to the questions, whereas larger values demonstrate less consistency. Even so, standard deviation will not necessarily tell you much about what the extremes of the responses were. For that, we use range.
Range is expressed as either the lowest and highest numeric responses, or the difference between the two. Presenting the lowest and highest response values will give your readers a sense of where the extreme responses were. By evaluating the highs and lows, you can demonstrate whether people tended to be varied across the scale in their responses, or if they tended to answer only on one end of a potential spectrum. By presenting the difference between the lowest and highest response values, readers will quickly see only how far apart the most extreme values were.
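The sketch below compares two hypothetical sets of 1-5 scale answers that share the same mean but differ in spread. It uses pstdev, which describes the responses in hand; if you want to estimate the spread of a larger population from your sample, the stdev function in the same module is the usual choice. Both the data and that choice are assumptions of this example.

```python
from statistics import mean, pstdev

# Hypothetical 1-5 scale answers to two questions with the same mean.
consistent = [3, 3, 4, 3, 2, 3]
varied = [1, 5, 1, 5, 3, 3]

for label, data in [("consistent", consistent), ("varied", varied)]:
    low, high = min(data), max(data)
    print(
        f"{label}: mean = {mean(data)}, "
        f"standard deviation = {pstdev(data):.3f}, "
        f"range = {low} to {high} (difference of {high - low})"
    )
```

Both questions have a mean of 3, so the mean alone would hide the fact that answers to the second question sit at opposite ends of the scale.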
Ethics check
If people tend to answer on only one end of a scaled-response question, it may indicate that the question is biased toward one type of response.
Sometimes, measures of variation will raise new questions as well as provide answers. If you see a lot of variation, you may wonder whether there is an error in how the responses were recorded. People may sometimes hit an extra digit when entering their age or some similar type of response. In other cases, one or more survey participants may have been unusual (an outlier) in terms of the population you are describing.
You may find it helpful to visualize the responses using histograms, boxplots, or stem-and-leaf plots if you are interested in assessing whether there are any potential outliers or actual problems with your data.
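As a numeric companion to a boxplot, the sketch below applies the 1.5 × IQR rule that many boxplots use to mark points beyond the whiskers. The ages are hypothetical, and the rule is only a screening aid: a flagged value still needs a human to decide whether it is a data-entry error or a genuinely unusual respondent.

```python
from statistics import quantiles

# Hypothetical ages as entered; 410 may be 41 typed with an extra digit.
ages = [23, 25, 27, 29, 31, 34, 36, 38, 41, 410]

q1, _, q3 = quantiles(ages, n=4)  # the three quartile cut points
iqr = q3 - q1                     # interquartile range
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Values outside the fences are candidates for a closer look.
flagged = [age for age in ages if age < low_fence or age > high_fence]
print("values to double-check:", flagged)
```

Different programs calculate quartiles in slightly different ways, so the exact fences may vary a little from tool to tool.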
Personal Project
Use this information about summary analysis to get a big-picture look at your data. From there, refer back to your research questions and think about what specific comparisons you can make to get the most detailed answers to your questions and the most interesting information out of your survey.