Coding and categorization are two essential elements of qualitative data analysis. You will find these processes helpful both for identifying themes in your data and for interpreting your findings when you write up a conclusion. Now, let’s familiarize ourselves with some approaches to categorization and coding.
Coding data
Coding is the process of combing through data to identify themes, ideas and categories and then attaching a label to related or similar segments of the data. You can then use these labels to compare passages, detect patterns, and build an overarching theory (Gibbs and Taylor, 2010). You may think of coding as identifying a person in a collection of photos and then using the labeled photos to tell the person’s story. And because coding is an iterative process, you can start doing it as soon as you begin to collect data and plan for data analysis.
As we discussed in Guiding Theories for Analysis, we often code the data in multiple cycles to make sure we end up with a representative and universal set of codes. Let’s look at some common techniques we can use in each cycle to achieve that goal:
- In the First Cycle of coding, you will select portions of data that range from a single word to large sections and assign them a primary code (Saldaña, 2013). This coding phase aims to capture your initial understanding of the selected data through the lens of your chosen coding approach. After completing this initial coding phase, you are ready for the next cycle.
- The Second Cycle of coding repeats the activities of the first cycle while also reorganizing and further developing your codes. During this phase, you will often use subcoding to condense your original primary codes into a smaller set that best represents your data overall. In addition, rearranging and reclassifying your codes lets you develop categories from your new interpretation.
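The two cycles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segment ids (`S1` to `S5`) are placeholders, the code labels reuse examples that appear later in this section, and the category names other than "time" are hypothetical.

```python
from collections import defaultdict

# First Cycle output: (segment id, primary code) pairs.
# The segment ids are placeholders, not real data.
first_cycle = [
    ("S1", "controlling pace in class"),
    ("S2", "devoting time outside of class"),
    ("S3", "offering opportunities"),
    ("S4", "passion for work"),
    ("S5", "devoting time outside of class"),
]

# Second Cycle: a hypothetical reclassification of primary codes into categories.
code_to_category = {
    "controlling pace in class": "time",
    "devoting time outside of class": "time",
    "offering opportunities": "support for students",
    "passion for work": "motivation",
}

def build_categories(coded_segments, mapping):
    """Group segment ids under the category assigned to their primary code."""
    categories = defaultdict(list)
    for segment_id, code in coded_segments:
        # Codes without a category yet stay visible instead of being dropped.
        categories[mapping.get(code, "uncategorized")].append(segment_id)
    return dict(categories)

categories = build_categories(first_cycle, code_to_category)
for name, segment_ids in categories.items():
    print(name, segment_ids)
```

Keeping the mapping separate from the first-cycle codes means you can revise your categories as often as you like without recoding the segments themselves.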
Now that you have learned a bit about coding, let’s explore some different approaches to carrying it out. The following techniques describe some of the most common ways data is coded:
Coding Techniques
Generating the first codes
Your initial set of codes is the starting point for your analysis. You could either let them emerge from your observation of the data, or generate them from themes you want to examine even before looking at the data.
In the first approach, you set aside your prior ideas and stick to what is present, or grounded, in the data. You will accumulate and revise the codes as you move through the dataset until you have a set of codes representative of the entire dataset. This is the approach suggested by grounded theorists. Grounded codes reflect the current themes and patterns in the dataset, but they can be laborious to generate and may not always be the most relevant to your research question.
In the second approach you create a priori codes based on ideas from your research question, previous research or theory on the topic, questions in your interview guide, and even your notes from the data collection process. The flexible coding theory we discussed earlier would support this approach. A priori codes allow us to focus on themes that are most relevant to the research question, but we also want to revise them to incorporate unexpected patterns that emerge as we code the data.
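If you keep your codes in a script or spreadsheet, one way to combine the two approaches is to record where each code came from. The sketch below is a minimal illustration, not a prescribed method: the `add_code` helper and all of the code labels are hypothetical.

```python
def add_code(codebook, label, origin):
    """Register a code once; an existing code keeps its original origin."""
    codebook.setdefault(label, origin)

codebook = {}  # maps code label -> "a priori" or "grounded"

# Codes drafted from the research question before reading the data (hypothetical).
for label in ["healthy eating", "peer influence"]:
    add_code(codebook, label, "a priori")

# Codes noticed while reading the data (hypothetical). "peer influence" was
# already planned, so only the unexpected codes are added as grounded.
for label in ["peer influence", "dental health", "taste"]:
    add_code(codebook, label, "grounded")

for label, origin in codebook.items():
    print(f"{label}: {origin}")
```

Tracking the origin makes it easy to see later how much of your final codebook was planned and how much emerged from the data.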
Descriptive Coding
When we retell a story or a conversation to others, we often start by summarizing the main events that happened in it. Similarly, we can start coding our data by first summarizing each section of it with a noun or phrase. These descriptive codes answer questions such as the following (Gibbs and Taylor, 2010): What are the people doing and saying? What concepts do they use in their statements? What is the outcome of their actions or beliefs? Descriptive codes let us quickly grasp the content of each section, and so serve as the starting point for later analysis where we look at the implied contexts and social structures that drive the events.
You can try creating codes by yourself on this snippet of an interview from Burnard et al. (2008):
Try it yourself: experiment with a priori codes, grounded codes and descriptive coding on the interview segment below.
Interviewer: ‘Can you tell me about what you like to eat?’
Child: ‘I like crisps, chips, sweets. I like sweets and chocolate the most. I like apples, grapes and oranges. Oh and pizza, I really like pizza.’
Interviewer: ‘What do you like about those things?’
Child: ‘…Well the apples and the other fruit I just really like the taste and they are healthy I suppose. We eat those in school now and my friends like them, so I eat them with my friends.
‘I really like sweets and chocolates though, they are my favorites but I know they aren’t really good for you. If you eat too many they can be bad for your teeth. They can make them go brown or drop out.’
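As one possible starting point, the sketch below attaches descriptive codes to short passages from the interview excerpt above and then groups the passages by code. The code labels are illustrative choices made for this example, not a definitive reading of the data, and the `index_by_code` helper is hypothetical.

```python
from collections import Counter

# (passage from the interview excerpt, descriptive code).
# The code labels are illustrative assumptions, not an authoritative coding.
coded_segments = [
    ("I like crisps, chips, sweets.", "liked foods"),
    ("I like sweets and chocolate the most.", "favorite foods"),
    ("I like apples, grapes and oranges.", "liked foods"),
    ("I just really like the taste", "taste"),
    ("they are healthy I suppose", "healthiness"),
    ("We eat those in school now", "eating at school"),
    ("so I eat them with my friends", "eating with friends"),
    ("I know they aren't really good for you", "healthiness"),
    ("they can be bad for your teeth", "dental effects"),
]

def index_by_code(coded):
    """Return a dict mapping each code to the passages labeled with it."""
    index = {}
    for passage, code in coded:
        index.setdefault(code, []).append(passage)
    return index

index = index_by_code(coded_segments)
counts = Counter(code for _, code in coded_segments)

# List the codes from most to least frequently applied.
for code, n in counts.most_common():
    print(f"{code} ({n}): {index[code]}")
```

Your own codes will probably differ, and that is expected: descriptive coding records your first summary of each passage, which you can revise in later cycles.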
Organizing Codes
Now that we have descriptive codes to summarize each piece of our data, we can either organize them into larger categories or create subordinate codes, called subcodes, that capture more precise, detailed information.
By adding subcodes, you can identify how more specific segments of the data relate to one another and to their parent code(s). For example, in the piece of data labeled with “time”, we could further point out segments corresponding to “examples of time commitment”, “contexts of devoting time”, and “causes of devoting time”. We can arrange the data hierarchically by grouping these subcodes under their parent code, like branches from a tree (Gibbs and Taylor, 2010). That is why this approach is also called tree coding. You may find this arrangement helpful when you want to interpret your findings theme by theme and find examples to support each.
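A tree of codes maps naturally onto a nested data structure. The sketch below is one minimal way to represent it, assuming a hypothetical `Code` class and placeholder segments; the labels come from the “time” example above.

```python
from dataclasses import dataclass, field

@dataclass
class Code:
    """A code with its own segments and any subcodes grouped beneath it."""
    label: str
    segments: list = field(default_factory=list)
    subcodes: list = field(default_factory=list)

    def add_subcode(self, label):
        child = Code(label)
        self.subcodes.append(child)
        return child

    def all_segments(self):
        """Collect segments from this code and every subcode beneath it."""
        found = list(self.segments)
        for child in self.subcodes:
            found.extend(child.all_segments())
        return found

# Parent code and subcodes taken from the "time" example.
time_code = Code("time")
examples = time_code.add_subcode("examples of time commitment")
contexts = time_code.add_subcode("contexts of devoting time")
causes = time_code.add_subcode("causes of devoting time")

# Placeholder segments standing in for real passages of data.
examples.segments.append("placeholder segment A")
causes.segments.append("placeholder segment B")

print(time_code.all_segments())
```

Calling `all_segments()` on a parent code gathers every supporting passage for that theme, which is the kind of theme-by-theme lookup that tree coding is meant to support.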
The alternative to subcoding is to present all the codes in a non-hierarchical arrangement, like a list. In our example, this approach, called flat coding, may produce codes like “controlling pace in class”, “devoting time outside of class”, “offering opportunities”, and “passion for work”.
Personal Project
Memos
If you feel overwhelmed by the amount of analysis work, or want to keep track of the most noteworthy codes and segments of data, consider writing memos. As a “note to self”, a memo helps us keep track of important insights that can otherwise fade quickly from our memory.
We can also record in a memo what we were thinking when we made changes to codes or code categories. It is easy to think that we will remember what drove us to make these decisions. But, again, those thoughts fade as we immerse ourselves in the text once more.
One last advantage of memos is that they make it a lot easier to communicate what you are doing to others when you are working as a group. Just as you cannot expect to recall what drove a small decision two or more days ago, it is also unreasonable to assume that your colleagues will automatically understand anything new that you introduce. Memos make it easier to get such things across to others.
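Memos can live in a notebook or a shared document, but if you already manage your codes in a script, a small log works too. The sketch below is only one possible layout: the `Memo` fields, the helper functions, and the example note are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Memo:
    written_on: date   # when the note was written
    code: str          # the code or category the note is about
    note: str          # what you were thinking at the time

def write_memo(memos, code, note, written_on=None):
    """Append a memo to the log, dated today unless a date is given."""
    memo = Memo(written_on or date.today(), code, note)
    memos.append(memo)
    return memo

def memos_for(memos, code):
    """Return every memo written about one code, oldest first."""
    return sorted((m for m in memos if m.code == code), key=lambda m: m.written_on)

memos = []
write_memo(memos, "time", "Split into three subcodes; the parent was too broad.",
           written_on=date(2024, 1, 15))  # hypothetical date and note
write_memo(memos, "passion for work", "Overlaps with 'time'? Revisit next cycle.",
           written_on=date(2024, 1, 16))

for memo in memos_for(memos, "time"):
    print(memo.written_on, memo.note)
```

Dating each memo and tying it to a code lets you, or a colleague, trace why a code changed without relying on memory.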
Code mapping
From time to time, you may want to step back from the data to examine your progress and how your established codes are interacting with each other. One way to do this is through code mapping, where you visualize the relationship between your codes. Depending on how you organize the codes, this technique could involve mapping out a hierarchical diagram for tree coding or a list of codes and their definitions for flat coding. Having your analytical thinking visualized in this way will make it easier to develop a theoretical framework as you begin to analyze these concepts.
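Even a plain-text rendering can serve as a simple code map. The sketch below prints an indented diagram for tree coding and a list of codes with definitions for flat coding. The `render_tree` and `render_flat` helpers are hypothetical, and the definitions are placeholder wording written for this example.

```python
def render_tree(tree, depth=0):
    """Return indented lines for a nested dict of {code label: subcodes}."""
    lines = []
    for label, subcodes in tree.items():
        lines.append("  " * depth + "- " + label)
        lines.extend(render_tree(subcodes, depth + 1))
    return lines

def render_flat(definitions):
    """Return one 'code: definition' line per code, sorted alphabetically."""
    return [f"{label}: {text}" for label, text in sorted(definitions.items())]

# Tree coding: the "time" example, with subcodes nested under the parent code.
tree_codes = {
    "time": {
        "examples of time commitment": {},
        "contexts of devoting time": {},
        "causes of devoting time": {},
    }
}

# Flat coding: code labels from this section with placeholder definitions.
flat_codes = {
    "controlling pace in class": "placeholder definition",
    "devoting time outside of class": "placeholder definition",
    "offering opportunities": "placeholder definition",
    "passion for work": "placeholder definition",
}

print("\n".join(render_tree(tree_codes)))
print("\n".join(render_flat(flat_codes)))
```

For larger projects you would likely draw the map by hand or in diagramming software, but a text outline like this is quick to regenerate each time your codes change.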