# How to read survey data

Some of our data comes from surveys. Surveys take responses from a sample rather than the whole group.

For example, a survey might ask 10% of parents their opinion on changing the curriculum rather than asking all parents.

The quality of survey data depends on:

- how people were selected
- the number of people surveyed (the ‘sample size’)
- how the questions were asked

Once survey data has been collected and processed, analysts can use techniques to make it more representative or to assess how reliable the estimates are.

## How surveys are completed

The way a survey is completed can affect how accurate the answers are, which in turn affects the results.

People are usually more honest when completing surveys about sensitive topics without an interviewer present, for example when giving details about smoking or drinking.

However, people are more likely to make mistakes when responding to online or paper surveys. Mistakes are less likely in face-to-face or phone surveys because people can ask for questions to be repeated.

## Making survey data more representative

Different techniques can be used to make survey data more representative of the whole group, and to assess the reliability of estimates and differences.

‘Weighting’ adjusts the results of a survey so they better represent the whole group.

For example, a survey of 25 women and 75 men may not reflect the views of the general population (which is 50% men, 50% women).

Analysts can ‘weight’ the results to more accurately represent the split of men and women in the general population.

Survey weights help to make sure the survey sample has broadly the same gender, age, ethnic and geographic proportions as the general population.
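As a rough sketch of how the weighting in the example above could work, each group's weight can be calculated as its share of the population divided by its share of the sample. The function name and the 'yes' counts below are made up for illustration. Real survey weights usually adjust for several characteristics at once and are more involved than this.

```python
def group_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


# Sample from the example: 25 women and 75 men; population is 50% / 50%.
sample_counts = {"women": 25, "men": 75}
population_shares = {"women": 0.5, "men": 0.5}
weights = group_weights(sample_counts, population_shares)
# Each woman counts as 2 people; each man counts as about 0.67 of a person.

# Hypothetical results: number in each group who answered "yes".
yes_counts = {"women": 20, "men": 30}

unweighted = sum(yes_counts.values()) / sum(sample_counts.values())
weighted = sum(yes_counts[g] * weights[g] for g in yes_counts) / sum(
    sample_counts[g] * weights[g] for g in sample_counts
)

print(f"Unweighted: {unweighted:.0%}")
print(f"Weighted: {weighted:.0%}")
```

With these made-up figures, 50% of the sample answered 'yes', but the weighted estimate is 60%. This is because women, who were more likely to answer 'yes', were under-represented in the sample.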

## Confidence intervals

A confidence interval is a range of values that the 'true' value for the whole group is highly likely to lie within. The estimate is the central point of this range.

Example:

Data from 2018 shows that 15.0% of White adults smoked cigarettes. This is an estimate based on a sample of around 135,000 people. The 'true' figure was very likely to be between 14.7% and 15.2% – the upper and lower bounds of the confidence interval.

A confidence interval of 95% means that if we took 100 random samples and calculated a confidence interval for each one, we would expect around 95 of those intervals to contain the 'true' value. Confidence intervals of 90% and 99% are also sometimes used.
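The 95% figure can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method used for any published statistics: it assumes a made-up population with a true rate of 15%, simple random sampling, and a basic normal-approximation formula for the interval. The function names are hypothetical.

```python
import math
import random


def interval_95(successes, n):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def coverage(true_rate, sample_size, repeats, seed=1):
    """Share of repeated samples whose interval contains the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one random sample and count how many people say "yes".
        successes = sum(rng.random() < true_rate for _ in range(sample_size))
        lower, upper = interval_95(successes, sample_size)
        if lower <= true_rate <= upper:
            hits += 1
    return hits / repeats


# Hypothetical population where the true rate is 15%.
share = coverage(true_rate=0.15, sample_size=1000, repeats=2000)
print(f"Intervals containing the true rate: {share:.1%}")
```

The share printed should be close to 95%. Each sample gives a slightly different estimate and interval, and most of those intervals, but not all, contain the true rate.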

### Using confidence intervals

The wider the confidence interval, the less reliable the estimate. Wide confidence intervals can be the result of small samples, or large variation within survey responses.

Example:

Data from 2018 shows that 7.7% of adults from the Chinese ethnic group smoked cigarettes, based on a sample of 729 people. There is greater uncertainty about the estimate because of the small sample size. This is reflected in a wider confidence interval, of between 4.9% and 10.5%.
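The effect of sample size on the width of the interval can be sketched with a common textbook approximation. The formula below assumes a simple random sample, which is a simplification: the published intervals in the examples are wider than this formula gives, most likely because they account for how the survey was designed and weighted. The function names are hypothetical.

```python
import math


def approx_interval_95(rate, sample_size):
    """Approximate 95% interval for a proportion, assuming a simple random sample."""
    margin = 1.96 * math.sqrt(rate * (1 - rate) / sample_size)
    return rate - margin, rate + margin


def width(interval):
    """Distance between the lower and upper bounds."""
    lower, upper = interval
    return upper - lower


# Rates and sample sizes taken from the two smoking examples.
large = approx_interval_95(0.150, 135_000)
small = approx_interval_95(0.077, 729)

print(f"Sample of 135,000: width {width(large) * 100:.1f} percentage points")
print(f"Sample of 729: width {width(small) * 100:.1f} percentage points")
```

Under these assumptions, the interval for the sample of 729 is around 10 times wider than the interval for the sample of 135,000.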

Confidence intervals can be used to find out if differences between the statistics for 2 groups are meaningful, for example differences between 2 ethnic groups or 2 time periods.

Differences are meaningful ('statistically significant') if the confidence intervals for the 2 sets of statistics don't overlap, meaning they don't have any values in common. If they overlap, we cannot say that the difference between the 2 estimates is meaningful. This means the findings cannot tell us if the difference found in the sample also exists in the whole population.
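The overlap rule described above can be written as a simple check. This is a minimal sketch with hypothetical function names. It uses the intervals from the 2 smoking examples, plus one made-up interval to show an overlapping case.

```python
def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_significant(a, b):
    """Apply the rule above: no overlap means the difference is meaningful."""
    return not intervals_overlap(a, b)


# Confidence intervals (in %) from the two smoking examples.
white = (14.7, 15.2)
chinese = (4.9, 10.5)
print(is_significant(white, chinese))  # True: the intervals do not overlap

# Hypothetical group whose interval overlaps with the Chinese group's interval.
other = (9.0, 12.0)
print(is_significant(chinese, other))  # False: values from 9.0 to 10.5 are shared
```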

## Combining multiple years of data

We sometimes combine multiple years of data to make sure estimates are reliable. We might do this if the number of people surveyed in a single year was small.

Example:

Our estimates for pensioner income are based on 3 years of data. The percentages shown are an average for a 3-year period.

Estimates are updated every year, which means time periods will overlap. For example, data from April 2016 to March 2017 appears in the 3-year averages for:

- April 2014 to March 2017
- April 2015 to March 2018
- April 2016 to March 2019

We advise against comparing data for time periods that overlap. This is because the samples are not separate from each other – some of the same survey responses are included in both periods.
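One way to check whether 2 periods overlap is sketched below. Representing each period as a pair of years (April of the first year to March of the second) is an assumption made for this example, and the function names are hypothetical.

```python
def three_year_periods(first_start, last_start):
    """3-year periods as (start, end) years, running from April to March."""
    return [(start, start + 3) for start in range(first_start, last_start + 1)]


def periods_containing(periods, year_start):
    """Periods that include the 12 months starting in April of year_start."""
    return [p for p in periods if p[0] <= year_start and year_start + 1 <= p[1]]


def periods_overlap(a, b):
    """True if two periods share at least one year of data."""
    return a[0] < b[1] and b[0] < a[1]


periods = three_year_periods(2014, 2017)
# April 2016 to March 2017 appears in three of the periods.
print(periods_containing(periods, 2016))  # [(2014, 2017), (2015, 2018), (2016, 2019)]

print(periods_overlap((2014, 2017), (2016, 2019)))  # True: do not compare
print(periods_overlap((2014, 2017), (2017, 2020)))  # False: safe to compare
```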

Estimates based on multiple years of data can be tested for significance so we know that differences are reliable. For example, we can test for differences between ethnic groups within the same time period. We can also test for significance across time periods that do not overlap.

If significance testing cannot be done, we only comment on differences that are large enough to be reliable.

## Commonly used surveys

### Population

The Census of England and Wales is carried out every 10 years. Every household has to complete it.

Around 23.5 million households completed the last Census in March 2011. Find out how we use Census data on our website.

The Annual Population Survey is a continuous household survey which covers the UK. Around 320,000 people aged 16 and over take part.

The topics covered include:

- employment and unemployment
- housing
- ethnicity
- religion
- health
- education

### Housing

The English Housing Survey is a continuous national survey.

The survey collects information about people’s housing circumstances, and the condition and energy efficiency of housing in England.

Around 13,300 households a year take part.

### Crime

The Crime Survey for England and Wales asks people about their experiences of crime. It involves a structured interview which usually takes place in respondents' homes.

The types of crime covered include:

- fraud
- online crime
- theft
- assault

Around 35,000 households take part every year.

### Travel

The National Travel Survey collects information on how, why, when and where people travel, and things that affect travel, such as access to a car.

Around 16,000 people aged 16 and over in 7,000 households in England take part every year.

### Culture, media and sport

The Taking Part Survey collects data on people’s engagement in:

- art
- museums and galleries
- archives
- libraries
- heritage
- sport

Around 8,000 people were interviewed for the 2018 to 2019 survey.