The Health Poverty Index (HPI) Visualisation Tool

Indicator IB1: Lifestyle

Definition: Measures of healthy lifestyles
Dimension: Intervening factors
Sector: Behaviours and environments (individual)
Components:
  • IB1_1 Smoking prevalence
  • IB1_2 Fresh fruit intake
  • IB1_3 Alcohol abuse
  • IB1_4 Drug misuse
  • IB1_5 Doing under 3 hours of physical activity in a week
Source: Various – see component details

Component IB1_1: Smoking prevalence

Definition: Modelled estimate of the proportion of cigarette smokers
Source:

2001: Health Survey for England, 2001 (Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health), General Household Survey 2000-2001 (Office for National Statistics) and the Omnibus Survey, Jan, Mar, April, June, July, Sep, Oct and Nov 2001 (Office for National Statistics) (See: Health Survey for England)

2001 Ethnic: Health Survey for England, 1998-2001 (Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health) (See: Health Survey for England)

2003: Health Survey for England, 2001-2003 (Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health) (See: Health Survey for England)

Additional details

In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of smoking. However, there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates. These include small or non-existent samples in some areas, which lead to large variances and unstable estimates, and biases introduced by particular sampling strategies.

A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.
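The trade-off described in the preceding paragraph can be written out explicitly. The following is a sketch in standard textbook notation, not taken from the HPI documentation: the symbols \hat{\theta}_D (direct estimate), \hat{\theta}_M (modelled estimate) and the weight w are assumptions introduced here for illustration.

```latex
% Mean Square Error of any estimator splits into variance and squared bias
\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \bigl[\mathrm{Bias}(\hat{\theta})\bigr]^{2}

% A composite estimator combines the direct and modelled estimates
\hat{\theta}_C = w\,\hat{\theta}_D + (1 - w)\,\hat{\theta}_M, \qquad 0 \le w \le 1
```

A direct estimate contributes mainly through the variance term and a modelled estimate mainly through the bias term, so a weight between 0 and 1 can give a smaller Mean Square Error than either estimate alone.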

The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity).

Data were gathered from the Health Survey for England (HSE, 1998-2003) for estimates for both the whole population and the ethnic groups. In addition, for the 2001 whole population estimate, the General Household Survey 2000-2001 along with all eight phases of the Omnibus Survey were used to create the dataset of smokers (see below).

Survey point   Section
Jan 2001       M210_1    210: Consumption of tobacco
Mar            M210_1    210: Consumption of tobacco
April          M210_1    210: Consumption of tobacco
June           M210_1    210: Consumption of tobacco
July           M210_1    210: Consumption of tobacco
Sep            M210_1    210: Consumption of tobacco
Oct            M130_2    130: smoking
Nov            M130_2    130: smoking

The questions used from the surveys were:

  • Omnibus “Do you smoke at all nowadays?”
  • GHS “Do you smoke cigarettes at all nowadays?”
  • HSE “Do you smoke cigarettes at all nowadays?”

Fewer than 3% of smokers smoke only pipes, so the bias introduced by the Omnibus Survey asking about all smoking, rather than cigarette smoking specifically, was not believed to be great.

In 1999, the focus of the HSE was the health of minority ethnic groups, with the aim of increasing understanding by monitoring trends and enabling predictions to be made. For this purpose a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese and Irish. For the purpose of this estimate, Irish is included under White, and the Black African group was added (although this sample was not boosted, hence the low numbers). The table below shows the number of respondents in each ethnic group, for each year, used in the modelling of estimates for ethnic groups:

Ethnic group        1998    1999    2000    2001    Total
White              18019   10437    8851   17322    54629
Black Caribbean      183    2029     143     296     2651
Black African        143      73      98     172      486
Indian               321    1909     203     287     2720
Pakistani            198    2148      91     225     2662
Bangladeshi           73    1905      64      83     2125
Chinese               39     961      17      37     1054
Total              18976   19462    9467   18422    66327

Only the main adult sample, and not the oversampled ‘special populations’, was included in the modelling process for the whole population. For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost sample were used.

Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts, logistic model was run, with level one being the individual i, level two the primary sampling unit j and level three the LAD k. Covariates from within the survey, shown in lower case, and LAD level data, shown in upper case, were used to predict the individual level behaviour.

$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$

where $P$ is a vector of probabilities associated with individual $i$ in Primary Sampling Unit (PSU) $j$ within LAD $k$, $B$ is a vector of regression coefficients, $X$ is a matrix of covariates associated with the individual and measured within the survey, $U$ is a random vector of area effects associated with the PSU, $V$ is the corresponding random vector of area effects associated with the LAD, and $E$ is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. The covariates for both the whole population and the ethnic groups are given in the tables below:

2001 Total Population - Smoking
Effect type          Covariate                              Coefficient
                     Constant                               -0.814
Individual effects   20-24 years                             0.562
                     25-29 years                             0.628
                     30-34 years                             0.467
                     35-39 years                             0.334
                     40-44 years                             0.219
                     45-49 years                             0.246
                     50-54 years                             0.126
                     55-59 years                            -0.072
                     60-64 years                            -0.214
                     65-69 years                            -0.497
                     70-74 years                            -0.538
                     75+ years                              -1.208
                     Male                                    0.087
                     Income Support recipient                0.674
PSU area effects     Proportion Asian                       -0.650
                     Proportion higher social class         -0.609
                     Proportion Income Support recipient     0.299
LAD area effects     Proportion Income Support recipient     0.847
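As an illustration of how the fixed-effect coefficients in the 2001 total population smoking table turn into a predicted probability, here is a minimal Python sketch. Several things in it are assumptions rather than part of the HPI documentation: the dictionary key names are hypothetical, the example person and area proportions are invented, proportions are assumed to be on a 0 to 1 scale, the youngest age band is assumed to be the reference category, and the random effects U, V and E are set to zero.

```python
import math

# Fixed-effect coefficients copied from the 2001 total population smoking table.
# Key names are hypothetical labels chosen for this sketch.
COEF = {
    "constant": -0.814,
    "age_20_24": 0.562, "age_25_29": 0.628, "age_30_34": 0.467,
    "age_35_39": 0.334, "age_40_44": 0.219, "age_45_49": 0.246,
    "age_50_54": 0.126, "age_55_59": -0.072, "age_60_64": -0.214,
    "age_65_69": -0.497, "age_70_74": -0.538, "age_75_plus": -1.208,
    "male": 0.087,
    "income_support": 0.674,
    "psu_prop_asian": -0.650,
    "psu_prop_higher_class": -0.609,
    "psu_prop_income_support": 0.299,
    "lad_prop_income_support": 0.847,
}


def anti_logit(eta):
    """Inverse of the logit function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def smoking_probability(covariates):
    """Fixed-effects-only prediction; random effects are assumed to be zero."""
    eta = COEF["constant"]
    for name, value in covariates.items():
        eta += COEF[name] * value
    return anti_logit(eta)


# Invented example: a man aged 25-29, not on Income Support,
# living in a PSU and LAD with made-up area proportions.
person = {
    "age_25_29": 1,
    "male": 1,
    "income_support": 0,
    "psu_prop_asian": 0.05,
    "psu_prop_higher_class": 0.30,
    "psu_prop_income_support": 0.10,
    "lad_prop_income_support": 0.08,
}
p = smoking_probability(person)
print(f"Predicted probability of smoking: {p:.3f}")
```

Only the covariates that apply to the person need to be supplied; age bands left out contribute nothing to the linear predictor, which is what treating the youngest band as the reference category amounts to.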

 

2001 Ethnic Groups - Smoking
Effect type          Covariate                  Coefficient
                     Constant                   -1.152
Individual effects   Bangladeshi                -3.921
                     Black African              -0.824
                     Black Caribbean            -0.211
                     Chinese                    -1.185
                     Indian                     -1.57
                     Pakistani                  -2.256
                     20-24 years                 0.468
                     25-29 years                 0.307
                     30-34 years                 0.19
                     35-39 years                 0.09
                     40-44 years                -0.009
                     45-49 years                -0.046
                     50-54 years                -0.143
                     55-59 years                -0.319
                     60-64 years                -0.467
                     65-69 years                -0.596
                     70-74 years                -0.839
                     75+ years                  -1.419
                     Male: Bangladeshi           4.297
                     Male: Black African         0.678
                     Male: Black Caribbean       0.596
                     Male: Chinese               0.94
                     Male: Indian                1.399
                     Male: Pakistani             2.107

 

2003 Total Population - Smoking
Effect type          Covariate                              Coefficient
                     Constant                               -0.724
Individual effects   20-24 years                             0.506
                     25-29 years                             0.463
                     30-34 years                             0.261
                     35-39 years                             0.223
                     40-44 years                             0.180
                     45-49 years                             0.104
                     50-54 years                            -0.092
                     55-59 years                            -1.158
                     60-64 years                            -0.377
                     65-69 years                            -0.664
                     70-74 years                            -0.731
                     75+ years                              -1.422
                     Male                                    0.062
                     Income Support recipient                0.820
                     Higher Social Class                    -0.372
PSU area effects     Proportion Asian                       -0.815
                     Proportion higher social class         -0.594
                     Proportion Income Support recipient     0.642
LAD area effects     Proportion Income Support recipient     0.304

Step 2
The fixed effects part of each model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The resulting probabilities were then summed and averaged over the LAD to produce a vector of synthetic LAD level estimates:

$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{logit}^{-1}\left(X_{ijk} B + V_k\right)$$

where $N_k$ is the number of individuals in LAD $k$.

This method does not use weighting to remove the bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates, in which case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the models. If they were not predictors of Y, the sample design would not bias the standard estimators of the parameters.

Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that, where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.
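The Step 2 calculation, averaging the anti-logit of the linear predictor over everyone in an LAD, can be sketched in Python as follows. This is an illustration only: the function names and the example values are hypothetical, and each individual's fixed-effects linear predictor (X B) is assumed to have been computed already.

```python
import math


def anti_logit(eta):
    """Inverse logit: converts a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def synthetic_lad_estimate(linear_predictors, lad_effect=0.0):
    """Average predicted probability over all individuals in one LAD.

    linear_predictors: the fixed-effects value X B for each individual.
    lad_effect: the random LAD effect V_k, or 0.0 where none is available.
    """
    if not linear_predictors:
        raise ValueError("an LAD needs at least one individual")
    probabilities = [anti_logit(xb + lad_effect) for xb in linear_predictors]
    return sum(probabilities) / len(probabilities)


# Invented linear predictors for four individuals in a hypothetical LAD.
xb = [-1.2, -0.4, 0.0, 0.3]
estimate = synthetic_lad_estimate(xb, lad_effect=0.1)
print(f"Synthetic LAD estimate: {estimate:.3f}")
```

Defaulting the LAD effect to zero mirrors the text's note that the random LAD effect was added only where it was available; whether zero was the value actually used in those cases is an assumption of this sketch.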


Component IB1_2: Fresh fruit intake

Definition: Modelled estimate of the adult population consuming fewer than 5 portions of fruit and vegetables a day
Source: 2001, 2001 Ethnic, 2003: Health Survey for England, 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health

Additional details

In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of fresh fruit intake. However, there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates. These include small or non-existent samples in some areas, which lead to large variances and unstable estimates, and biases introduced by particular sampling strategies.

A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.

The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 – 2001). Due to changes in the question asked in the HSE, resulting in inconsistency in the definition used over time, the data has been frozen, and one model used.

In 1999, the focus of the HSE was the health of minority ethnic groups, with the aim of increasing understanding by monitoring trends and enabling predictions to be made. For this purpose a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese and Irish. For the purpose of this estimate, Irish is included under White, and the Black African group was added (although this sample was not boosted, hence the low numbers). The table below shows the number of respondents in each ethnic group, for each year, used in the modelling of estimates for ethnic groups:

Ethnic group        1998    1999    2000    2001    Total
White              18019   10437    8851   17322    54629
Black Caribbean      183    2029     143     296     2651
Black African        143      73      98     172      486
Indian               321    1909     203     287     2720
Pakistani            198    2148      91     225     2662
Bangladeshi           73    1905      64      83     2125
Chinese               39     961      17      37     1054
Total              18976   19462    9467   18422    66327

For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost sample were used.

Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts, logistic model was run, with level one being the individual i, level two the primary sampling unit j and level three the LAD k. Covariates from within the survey, shown in lower case, and LAD level data, shown in upper case, were used to predict the individual level behaviour.

$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$

where $P$ is a vector of probabilities associated with individual $i$ in Primary Sampling Unit (PSU) $j$ within LAD $k$, $B$ is a vector of regression coefficients, $X$ is a matrix of covariates associated with the individual and measured within the survey, $U$ is a random vector of area effects associated with the PSU, $V$ is the corresponding random vector of area effects associated with the LAD, and $E$ is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. These covariates are given in the table below:

2001 and 2003 Total Population and 2001 Ethnic Groups - Fresh Fruit Intake
Effect type              Covariate          Coefficient
                         Constant            0.414
Individual effects (x)   Bangladeshi        -0.187
                         Black African       0.82
                         Black Caribbean     0.144
                         Chinese            -1.304
                         Indian              0.196
                         Pakistani           0.325
                         20-24 years         0.039
                         25-29 years        -0.194
                         30-34 years        -0.403
                         35-39 years        -0.481
                         40-44 years        -0.642
                         45-49 years        -0.762
                         50-54 years        -0.925
                         55-59 years        -1.255
                         60-64 years        -1.095
                         65-69 years        -0.984
                         70-74 years        -1.012
                         75+ years          -1.085
                         Male                0.551

Step 2
The fixed effects part of the model was then taken and applied to the matrix X of small area covariates held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The resulting probabilities were then summed and averaged over the LAD to produce a vector of synthetic LAD level estimates:

$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{logit}^{-1}\left(X_{ijk} B + V_k\right)$$

where $N_k$ is the number of individuals in LAD $k$.

This method does not use weighting to remove the bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates, in which case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not predictors of Y, the sample design would not bias the standard estimators of the parameters.

Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.


Component IB1_3: Alcohol abuse

Definition: Directly age and gender standardised rate of admissions to hospital for alcohol related conditions
Source (numerator):

2001, 2001 Ethnic: All ethnically coded admissions to hospital for alcohol related conditions, Hospital Episode Statistics (HES), 1998/99, 1999/00, 2000/01, 2001/02, Department of Health

2003: All admissions to hospital for alcohol related conditions, Hospital Episode Statistics (HES), 1999/00, 2000/01, 2001/02, 2002/03, Department of Health
Source (denominator):

2001, 2001 Ethnic: Mid year population estimate 2001, ONS

2003: Mid year population estimate 2003, ONS

Additional details

There are many factors which influence how much someone drinks: occupation and family background, the acceptability of drinking in a culture, and the occurrence of stressful or major life events. Statistics show that people’s drinking habits have changed over the last 30 years. Recent years have seen an increase in the number of women drinking above recommended levels, and a worrying trend for teenagers to drink large quantities. There is evidence also of substantial numbers of men and women drinking heavily and in a binge drinking pattern. Alcohol is a major contributor not only to death, injury and illness, but also social damage such as crime and disorder, and social exclusion (2001, Annual Report of the Chief Medical Officer).

Alcohol abuse, captured by the rate of admissions to hospital for alcohol related conditions, is one indicator of unhealthy behaviour leading to poor health outcomes, as well as wider social problems.

The International Classification of Diseases Version 10 (ICD-10) codes used to extract data on admissions for alcohol related conditions from the HES dataset were:

  • E52, F10, G312, G621, G721, I426, K292, K70, K860, O354, P043, Q860, R780, T506, T510, T519, X65, Y15, Y573, Y90, Y91, Z133, Z502, Z637, Z714, Z721, Z811, Z864.

Cases were used if one or more of these codes were found in any of the seven diagnosis fields. Individuals who had more than one admission for an alcohol related condition in a given year were counted once only.
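The case selection rule above (a match in any diagnosis field, each individual counted once per year) can be sketched in Python. This is a hedged illustration, not the HES extraction code: the record layout, field names and example records are invented, codes are assumed to be stored without a decimal point, and listed codes are assumed to match by prefix so that, for example, K70 also captures K701.

```python
# ICD-10 codes for alcohol related conditions, as listed in the text.
ALCOHOL_CODES = (
    "E52", "F10", "G312", "G621", "G721", "I426", "K292", "K70", "K860",
    "O354", "P043", "Q860", "R780", "T506", "T510", "T519", "X65", "Y15",
    "Y573", "Y90", "Y91", "Z133", "Z502", "Z637", "Z714", "Z721", "Z811",
    "Z864",
)


def is_alcohol_related(diagnoses):
    """True if any diagnosis field starts with one of the listed codes."""
    return any(
        code and code.upper().replace(".", "").startswith(ALCOHOL_CODES)
        for code in diagnoses
    )


def count_individuals_per_year(admissions):
    """Count each person at most once per year.

    admissions: iterable of (person_id, year, diagnosis_fields) tuples,
    a hypothetical layout chosen for this sketch.
    """
    seen = set()
    for person_id, year, diagnoses in admissions:
        if is_alcohol_related(diagnoses):
            seen.add((person_id, year))
    counts = {}
    for _, year in seen:
        counts[year] = counts.get(year, 0) + 1
    return counts


# Invented admissions; None marks an empty diagnosis field.
admissions = [
    ("p1", 2001, ["K701", "I10"]),        # matches K70 by prefix
    ("p1", 2001, ["F102"]),               # same person and year: not recounted
    ("p2", 2001, ["I10", None, "T510"]),  # match in a later diagnosis field
    ("p3", 2001, ["J45"]),                # no alcohol related code
    ("p1", 2002, ["F10"]),                # same person, different year
]
counts = count_individuals_per_year(admissions)
print(counts)
```

Collecting (person, year) pairs in a set is what enforces the "counted once only" rule; the same approach would apply to any other list of codes.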

To control for differences in the age and gender structure across small areas, direct standardisation was used. Direct standardisation involves applying the age- and gender-specific admission rates of each small area to a standard population, which in this instance is derived from the HES data. This produces an expected number of events (admissions for alcohol abuse) in the standard population, as if the risk profile of the individual area were in place. This is contrasted with the actual number of observed events in the standard population to give a ratio, which measures whether admissions for alcohol abuse occur more or less often than expected.
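A minimal Python sketch of direct standardisation as it is usually defined: each area's age- and gender-specific rates are applied to the standard population, and the expected events are compared with the events observed in that standard population. The function name, strata and all figures below are invented for illustration, and every stratum is assumed to have a non-zero area population.

```python
def directly_standardised_ratio(area_events, area_pop, std_events, std_pop):
    """Ratio of expected to observed events in the standard population.

    Each argument maps a (gender, age band) stratum to a count.
    """
    expected = 0.0
    for stratum, std_n in std_pop.items():
        # Area-specific rate for this stratum, applied to the standard population.
        area_rate = area_events.get(stratum, 0) / area_pop[stratum]
        expected += area_rate * std_n
    observed = sum(std_events.values())
    return expected / observed


# Invented standard population and its observed admissions.
std_pop = {("M", "16-44"): 1000, ("F", "16-44"): 1000,
           ("M", "45+"): 500, ("F", "45+"): 500}
std_events = {("M", "16-44"): 10, ("F", "16-44"): 5,
              ("M", "45+"): 10, ("F", "45+"): 5}

# Invented small area with its own population structure and admissions.
area_pop = {("M", "16-44"): 200, ("F", "16-44"): 100,
            ("M", "45+"): 50, ("F", "45+"): 150}
area_events = {("M", "16-44"): 4, ("F", "16-44"): 1,
               ("M", "45+"): 2, ("F", "45+"): 3}

ratio = directly_standardised_ratio(area_events, area_pop, std_events, std_pop)
print(f"Standardised ratio: {ratio:.2f}")
```

A ratio above 1 indicates more admissions than expected given the standard population's own experience, and a ratio below 1 indicates fewer.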

For indicators derived from the Hospital Episode Statistics (HES) the estimates are based on the relationship between all hospital stays, and those recorded for a specific condition of interest. Detail is added from census data to depict the spatial distribution of individuals in ethnic groups. All estimates are statistically smoothed to reduce noise within the distribution, enabling the underlying trend to be highlighted. For more details see the discussion paper. <link to be added >


Component IB1_4: Drug misuse

Definition: Directly age and sex standardised rate of admissions to hospital for drug related conditions
Source (numerator):

2001, 2001 Ethnic: All ethnically coded admissions to hospital for drug related conditions, Hospital Episode Statistics (HES), 1998/99, 1999/00, 2000/01, 2001/02, Department of Health

2003: All admissions to hospital for drug related conditions, Hospital Episode Statistics (HES), 1998/99, 1999/00, 2000/01, 2001/02, Department of Health
Source (denominator):

2001, 2001 Ethnic: Mid year population estimate 2001, ONS

2003: Mid year population estimate 2003, ONS

Additional details

Around 4 million people use at least one illicit drug each year and around 1 million use at least one of the most dangerous drugs classified as Class A. Many of these individuals will take drugs once but for thousands of problematic drug users in England and Wales, drugs cause considerable harm to themselves and others (2002, Home Office Updated Drug Strategy).

Obviously there are significant health risks associated with drugs. Drug misuse, captured by the rate of admissions to hospital for drug related conditions, is one indicator of unhealthy behaviour leading to poor health outcomes. Research suggests that there are all kinds of reasons for misuse, key factors including unemployment, low self esteem, educational failure, boredom and physical, psychological or family problems.

There are also strong links between drug misuse and crime, violence and hidden social problems - in homes and schools, on the roads and in the workplace. This indicator can also reflect such problems in society.

The International Classification of Diseases Version 10 (ICD-10) codes used to extract data on admissions for drug related conditions from the HES dataset were:

  • F11, F12, F13, F14, F15, F16, F18, F19

Cases were used if one or more of these codes were found in any of the seven diagnosis fields. Individuals who had more than one admission for a drug related condition in a given year were counted once only.

To control for differences in the age and gender structure across small areas, direct standardisation was used. Direct standardisation involves applying the age- and gender-specific admission rates of each small area to a standard population, which in this instance is derived from the HES data. This produces an expected number of events (admissions for drug misuse) in the standard population, as if the risk profile of the individual area were in place. This is contrasted with the actual number of observed events in the standard population to give a ratio, which measures whether admissions for drug misuse occur more or less often than expected.

For indicators derived from the Hospital Episode Statistics (HES) the estimates are based on the relationship between all hospital stays, and those recorded for a specific condition of interest. Detail is added from census data to depict the spatial distribution of individuals in ethnic groups. All estimates are statistically smoothed to reduce noise within the distribution, enabling the underlying trend to be highlighted. For more details see the discussion paper. <link to be added >


Component IB1_5: Physical Activity in a week

Definition: Modelled estimate of proportion doing under five hours of physical activity in a week
Source: 2001, 2001 Ethnic and 2003: Health Survey for England, 1998 to 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health
(See: Health Survey for England)
Note: Due to changes in the question asked in the HSE, which have resulted in inconsistency in the definition used over time, the data has been frozen and one model used.

Additional details

In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of physical activity. However, there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates. These include small or non-existent samples in some areas, which lead to large variances and unstable estimates, and biases introduced by particular sampling strategies.

A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.

The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 – 2001). Due to changes in the question asked in the HSE, resulting in inconsistency in the definition used over time, the data has been frozen, and one model used.

In 1999, the focus of the HSE was the health of minority ethnic groups, with the aim of increasing understanding by monitoring trends and enabling predictions to be made. For this purpose a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese and Irish. For the purpose of this estimate, Irish is included under White, and the Black African group was added (although this sample was not boosted, hence the low numbers). The table below shows the number of respondents in each ethnic group, for each year, used in the modelling:

 
Ethnic group        1998    1999    2000    2001    Total
White              18019   10437    8851   17322    54629
Black Caribbean      183    2029     143     296     2651
Black African        143      73      98     172      486
Indian               321    1909     203     287     2720
Pakistani            198    2148      91     225     2662
Bangladeshi           73    1905      64      83     2125
Chinese               39     961      17      37     1054
Total              18976   19462    9467   18422    66327

For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost sample were used.

Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts, logistic model was run, with level one being the individual i, level two the primary sampling unit j and level three the LAD k. Covariates from within the survey, shown in lower case, and LAD level data, shown in upper case, were used to predict the individual level behaviour.

$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$

where $P$ is a vector of probabilities associated with individual $i$ in Primary Sampling Unit (PSU) $j$ within LAD $k$, $B$ is a vector of regression coefficients, $X$ is a matrix of covariates associated with the individual and measured within the survey, $U$ is a random vector of area effects associated with the PSU, $V$ is the corresponding random vector of area effects associated with the LAD, and $E$ is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. These covariates are given in the table below:

Effect type          Covariate                         Coefficient
                     Constant                          -0.816
Individual effects   Bangladeshi                        1.295
                     Black African                      0.751
                     Black Caribbean                    0.118
                     Chinese                            0.796
                     Indian                             0.757
                     Pakistani                          1.097
                     20-24 years                        0.064
                     25-29 years                        0.046
                     30-34 years                        0.136
                     35-39 years                        0.139
                     40-44 years                        0.254
                     45-49 years                        0.298
                     50-54 years                        0.373
                     55-59 years                        0.577
                     60-64 years                        1.002
                     65-69 years                        1.284
                     70-74 years                        1.53
                     75+ years                          2.481
                     Male                              -0.434
LAD area effects     Proportion higher social class     0.287
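Coefficients from a logistic model are on the log-odds scale, so exponentiating them gives odds ratios, which can be easier to read. The sketch below does this for a few of the individual-level coefficients in the table above. The reference categories are not stated in the source, so reading each ratio as a comparison with an omitted baseline group (for example the youngest age band, or women) is an assumption.

```python
import math

# A few individual-level coefficients copied from the physical activity table.
COEFFICIENTS = {
    "Male": -0.434,
    "60-64 years": 1.002,
    "75+ years": 2.481,
}


def odds_ratio(coefficient):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coefficient)


odds_ratios = {name: odds_ratio(b) for name, b in COEFFICIENTS.items()}
for name, value in odds_ratios.items():
    print(f"{name}: odds ratio {value:.2f}")
```

Under those assumptions, the 75+ coefficient corresponds to odds of low physical activity roughly twelve times those of the baseline age group, while the negative coefficient for men corresponds to an odds ratio below 1.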

Step 2
The fixed effects part of the model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The resulting probabilities were then summed and averaged over the LAD to produce a vector of synthetic LAD level estimates:

$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{logit}^{-1}\left(X_{ijk} B + V_k\right)$$

where $N_k$ is the number of individuals in LAD $k$.

This method does not use weighting to remove the bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates, in which case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not predictors of Y, the sample design would not bias the standard estimators of the parameters.

Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.

Further Information

The HPI tool is in the third phase of development. We would welcome your feedback.

Please remember to reference the project if you use the data or charts from this site.

Dibben, C., Sims, A., Watson, J., Barnes, H., Smith, T., Sigala, M., Hill, A. and Manley, D. (2004) The Health Poverty Index. South East Public Health Observatory, Oxford.
