Statistik Bisnis 1 Week 11 Sampling and Sampling Distribution
Learning Objectives In this chapter, you learn: • To distinguish between different sampling methods • The concept of the sampling distribution • To compute probabilities related to the sample mean and the sample proportion • The importance of the Central Limit Theorem
SAMPLING
Why Sample? • Selecting a sample is less time-consuming than selecting every item in the population (census). • Selecting a sample is less costly than selecting every item in the population. • An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
A Sampling Process Begins With A Sampling Frame • The sampling frame is a listing of items that make up the population • Frames are data sources such as population lists, directories, or maps • Inaccurate or biased results can result if a frame excludes certain portions of the population • Using different frames to generate data can lead to dissimilar conclusions
Types of Samples
• Non-Probability Samples: Judgment, Convenience
• Probability Samples: Simple Random, Systematic, Stratified, Cluster
Types of Samples: Nonprobability Sample • In a nonprobability sample, items included are chosen without regard to their probability of occurrence. – In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample. – In a judgment sample, you get the opinions of preselected experts in the subject matter.
Types of Samples: Probability Sample
• In a probability sample, items in the sample are chosen on the basis of known probabilities.
• Probability samples: Simple Random, Systematic, Stratified, Cluster
Probability Sample: Simple Random Sample • Every individual or item from the frame has an equal chance of being selected • Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame). • Samples obtained from table of random numbers or computer random number generators.
Selecting a Simple Random Sample Using a Random Number Table

Sampling frame for a population with 850 items:

Item Name    Item #
Bev R.       001
Ulan X.      002
. . .        . . .
Joann P.     849
Paul F.      850

Portion of a random number table:
49280 88924 35779 00283 81163 07275 11100 02340 12860 74697 96644 89439 09893 23997 20048 49420 88872 08401

Reading the table three digits at a time gives the first 5 items in a simple random sample:
• Item # 492
• Item # 808
• Item # 892 (does not exist, so ignore)
• Item # 435
• Item # 779
• Item # 002
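The table-reading procedure above can be sketched in Python. This is only an illustration: the function name `sample_from_random_digits` and its parameters are hypothetical, and skipping repeated item numbers (sampling without replacement) is an assumption the slide does not state.

```python
def sample_from_random_digits(table, frame_size, n, width=3):
    """Read `width`-digit item numbers from a random number table.

    Numbers outside 1..frame_size are ignored. Repeats are skipped,
    which assumes sampling without replacement.
    """
    digits = "".join(table.split())          # drop the spaces between blocks
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        item = int(digits[start:start + width])
        if 1 <= item <= frame_size and item not in chosen:
            chosen.append(item)
        if len(chosen) == n:
            break
    return chosen


# First nine blocks of the random number table on the slide.
TABLE = "49280 88924 35779 00283 81163 07275 11100 02340 12860"
sample = sample_from_random_digits(TABLE, frame_size=850, n=5)
print(sample)  # [492, 808, 435, 779, 2]
```

Item 892 is skipped because the frame only has 850 items, which matches the slide.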
Probability Sample: Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k individuals: k = N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter

Example: N = 40, n = 4, k = 10
[Figure: frame of N = 40 individuals with the first group of k = 10 marked]
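A minimal sketch of the four steps, using the slide's values N = 40 and n = 4. The function name `systematic_sample` and the seeded generator are illustrative assumptions, and the sketch assumes N is an exact multiple of n.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Pick one random start in the first group of k items, then every kth item."""
    k = len(frame) // n            # group size, k = N / n
    start = rng.randrange(k)       # random position inside the first group
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 41))         # N = 40 individuals, numbered 1..40
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)
```

Whatever the random start, consecutive selections are always k = 10 positions apart.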
Probability Sample: Stratified Sample • Divide population into two or more subgroups (called strata) according to some common characteristic • A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes • Samples from subgroups are combined into one • This is a common technique when sampling population of voters, stratifying across racial or socio-economic lines. Population Divided into 4 strata
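Proportional allocation can be sketched as follows. The population of 100 items in four strata is made up for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random


def stratified_sample(strata, n, rng=random):
    """Draw a simple random sample from each stratum, sized in proportion to it."""
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        n_stratum = round(n * len(items) / total)   # proportional allocation
        sample[name] = rng.sample(items, n_stratum)
    return sample


# Hypothetical population of 100 items divided into 4 strata.
strata = {
    "A": list(range(0, 40)),
    "B": list(range(40, 70)),
    "C": list(range(70, 90)),
    "D": list(range(90, 100)),
}
sample = stratified_sample(strata, n=10, rng=random.Random(1))
sizes = {name: len(items) for name, items in sample.items()}
print(sizes)  # {'A': 4, 'B': 3, 'C': 2, 'D': 1}
```

With stratum sizes that do not divide evenly, rounding can make the total differ slightly from n, so a real design would need a rule for the remainder.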
Probability Sample Cluster Sample • Population is divided into several “clusters,” each representative of the population • A simple random sample of clusters is selected • All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique • A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled. Population divided into 16 clusters.
Randomly selected clusters for sample
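A short sketch of cluster sampling in which every item of each selected cluster is kept. The 16 clusters of 5 items and the helper name `cluster_sample` are assumptions for illustration only.

```python
import random


def cluster_sample(clusters, n_clusters, rng=random):
    """Select whole clusters at random and keep every item in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {cid: clusters[cid] for cid in chosen}


# Hypothetical population: 16 clusters of 5 items each.
clusters = {cid: [f"c{cid}-item{i}" for i in range(5)] for cid in range(16)}
picked = cluster_sample(clusters, n_clusters=4, rng=random.Random(2))
print(sorted(picked))
```

To subsample within clusters instead, the dictionary values could be passed through another probability sampling step.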
Probability Sample: Comparing Sampling Methods • Simple random sample and Systematic sample – Simple to use – May not be a good representation of the population’s underlying characteristics
• Stratified sample – Ensures representation of individuals across the entire population
• Cluster sample – More cost effective – Less efficient (need larger sample to acquire the same level of precision)
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit good responses
• Sampling error – always exists
Types of Survey Errors • Coverage error or selection bias – Exists if some groups are excluded from the frame and have no chance of being selected
• Non response error or bias – People who do not respond may be different from those who do respond
• Sampling error – Variation from sample to sample will always exist
• Measurement error – Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent (“Hawthorne effect”)
Types of Survey Errors (continued)
• Coverage error: excluded from frame
• Non response error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
SAMPLING DISTRIBUTION
Sampling Distributions
• A sampling distribution is a distribution of all of the possible values of a sample statistic for a given size sample selected from a population.
• For example, suppose you sample 50 students from your college and compute their mean GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all the potential mean GPAs we might calculate for samples of 50 students.
Developing a Sampling Distribution
• Assume there is a population …
• Population size N = 4 (individuals A, B, C, D)
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
Developing a Sampling Distribution (continued)

Summary Measures for the Population Distribution:

$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$

[Figure: Uniform Distribution, P(x) versus x for x = 18 (A), 20 (B), 22 (C), 24 (D)]
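The two population summary measures can be checked with Python's standard library. The variable names are illustrative. `pstdev` divides by N, as in the population formula above.

```python
from statistics import mean, pstdev

ages = [18, 20, 22, 24]          # values of X for individuals A, B, C, D

mu = mean(ages)                  # population mean
sigma = pstdev(ages)             # population standard deviation (divides by N)

print(float(mu), round(sigma, 3))  # 21.0 2.236
```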
Developing a Sampling Distribution (continued)

Now consider all possible samples of size n = 2

16 possible samples (sampling with replacement):

1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24

16 Sample Means:

1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24
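The 16 samples and their means can be enumerated directly. This is a sketch with illustrative variable names. `itertools.product` with `repeat=2` generates ordered pairs, which corresponds to sampling with replacement.

```python
from collections import Counter
from itertools import product

ages = [18, 20, 22, 24]

samples = list(product(ages, repeat=2))        # 16 ordered samples of size 2
means = [(a + b) / 2 for a, b in samples]      # one sample mean per sample

counts = Counter(means)
for value in sorted(counts):
    # sample mean, and its relative frequency among the 16 samples
    print(value, counts[value] / len(means))
```

The relative frequencies rise from 1/16 at 18 to 4/16 at 21 and fall back to 1/16 at 24, so the distribution of the sample mean is not uniform even though the population is.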
Developing a Sampling Distribution (continued)

Sampling Distribution of All Sample Means

[Figure: Sample Means Distribution, $P(\bar{X})$ versus $\bar{X}$ for the 16 sample means, $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24 (no longer uniform)]
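Developing a Sampling Distribution (continued)

Summary Measures of this Sampling Distribution:

$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$

$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$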
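As a check on these summary measures, the sketch below recomputes them from the 16 sample means and compares the result with $\sigma / \sqrt{n}$. Variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]
sample_means = [mean(pair) for pair in product(ages, repeat=2)]

mu_xbar = mean(sample_means)          # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)     # their standard deviation (divides by 16)

print(float(mu_xbar), round(sigma_xbar, 2))   # 21.0 1.58
print(round(pstdev(ages) / sqrt(2), 2))       # 1.58, i.e. sigma / sqrt(n) with n = 2
```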
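Comparing the Population Distribution to the Sample Means Distribution

Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: P(X) versus X for the population (18 A, 20 B, 22 C, 24 D) next to $P(\bar{X})$ versus $\bar{X}$ for the sample means (18 through 24)]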
Sample Mean Sampling Distribution: Standard Error of the Mean
• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean (this assumes that sampling is with replacement, or without replacement from an infinite population):

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

• Note that the standard error of the mean decreases as the sample size increases
Sample Mean Sampling Distribution: If the Population is Normal
• If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Z-value for Sampling Distribution of the Mean
• Z-value for the sampling distribution of $\bar{X}$:

$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

where:
$\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties
• $\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)

[Figure: Normal Population Distribution centered at μ, and Normal Sampling Distribution centered at $\mu_{\bar{x}}$ (has the same mean)]
Sampling Distribution Properties (continued)
• As n increases, $\sigma_{\bar{x}}$ decreases

[Figure: sampling distributions of $\bar{x}$ centered at μ for a larger sample size and a smaller sample size]
Example
Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with the package label, boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, so some boxes are underfilled and others are overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams.

Example
Because weighing every box would be too time-consuming, costly, and inefficient, you must take a sample. For each sample you select, you plan to weigh the individual boxes and calculate the sample mean. You need to determine the probability that such a sample mean could have come from a population whose mean is 368 grams. Based on your analysis, you must decide whether to maintain, adjust, or shut down the cereal-filling process.

Example
a. If you randomly select 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains far less than 5% of the population. Given that the standard deviation of the cereal-filling process is 15 grams, compute the standard error of the mean.

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3$
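Example
b. How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes?

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5$

Multiplying the sample size by 4 cuts the standard error in half, from 3 to 1.5.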
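Parts (a) and (b) can be reproduced with a one-line helper. The function name `standard_error` is illustrative.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)


se_25 = standard_error(15, 25)
se_100 = standard_error(15, 100)
print(se_25, se_100)  # 3.0 1.5
```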
Example
c. If you select 100 boxes, what is the probability that the sample mean is below 365 grams?

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{365 - 368}{15 / \sqrt{100}} = \frac{-3}{1.5} = -2.00$

$P(\bar{X} < 365) = P(Z < -2.00) = 0.0228$

[Figure: sampling distribution of $\bar{X}$ centered at 368 with the area below 365 shaded]
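The same probability can be obtained without a Z table by using `statistics.NormalDist` from the Python standard library. This is a sketch with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 100

# Sampling distribution of the sample mean: normal with mean mu and sd sigma / sqrt(n).
sampling_dist = NormalDist(mu, sigma / sqrt(n))

z = (365 - mu) / sampling_dist.stdev
p = sampling_dist.cdf(365)            # P(sample mean < 365)
print(z, round(p, 4))  # -2.0 0.0228
```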
Example
d. Find an interval, symmetric around the population mean, that includes 95% of the sample means when samples of 25 boxes are taken.

[Figure: sampling distribution of $\bar{X}$ centered at 368 with the middle 95% lying between $\bar{X}_L$ and $\bar{X}_U$]

Thus: $P(\bar{X} < \bar{X}_L) = 0.025$ and $P(\bar{X} < \bar{X}_U) = 0.975$

$Z_{\bar{X}_L} = -1.96$ and $Z_{\bar{X}_U} = +1.96$

$\bar{X}_L = 368 - 1.96 \frac{15}{\sqrt{25}} = 368 - 1.96(3) = 362.12$

$\bar{X}_U = 368 + 1.96 \frac{15}{\sqrt{25}} = 368 + 1.96(3) = 373.88$
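A sketch of part (d) in Python. It looks up the critical value with `NormalDist().inv_cdf` instead of taking 1.96 from a table. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
se = sigma / sqrt(n)                     # standard error = 3

z = NormalDist().inv_cdf(0.975)          # about 1.96
lower = mu - z * se
upper = mu + z * se
print(round(lower, 2), round(upper, 2))  # 362.12 373.88
```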
EXERCISE
Exercise 1
The U.S. Census Bureau announced that the median sales price of new houses in 2009 was $215,600 and the mean sales price was $270,100. Assume that the standard deviation of the sales prices is $90,000.
a. If you select samples of n = 2, describe the shape of the sampling distribution of $\bar{X}$.
b. If you select samples of n = 100, describe the shape of the sampling distribution of $\bar{X}$.
c. If you select a sample of n = 100, what is the probability that the sample mean will be less than $300,000?
d. If you select a sample of n = 100, what is the probability that the sample mean will be between $275,000 and $290,000?

Exercise 2
Time spent using e-mail per session is normally distributed, with μ = 8 minutes and σ = 2 minutes. If you select a random sample of 25 sessions,
a. What is the probability that the sample mean is between 7.8 and 8.2 minutes?
b. What is the probability that the sample mean is between 7.5 and 8 minutes?
c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes?
d. Explain the difference in the results of (a) and (c).

Exercise 3
The amount of time a bank teller spends with each customer has a mean μ = 3.10 minutes and a standard deviation σ = 0.40 minutes. If you select a random sample of 16 customers,
a. What is the probability that the mean time spent per customer is at least 3 minutes?
b. There is an 85% chance that the sample mean is less than how many minutes?
c. What assumption must you make in order to solve (a) and (b)?
d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean is less than how many minutes?
ANSWER
Exercise 1
a. Because the mean is greater than the median, the population distribution of sales prices is skewed to the right. Because n = 2 (n < 30), the sampling distribution of $\bar{X}$ is also skewed to the right.
b. Because n = 100, the sampling distribution of $\bar{X}$ is approximately normal, with mean $270,100 and standard error $9,000.
c. 0.9996
d. 0.2796

Exercise 3
a. $P(\bar{X} > 3) = P(Z > -1.00) = 1.0 - 0.1587 = 0.8413$
b. $P(Z < 1.04) = 0.85$, so $\bar{X} = 3.10 + 1.04(0.1) = 3.204$
c. The population distribution must be at least symmetric.
d. $P(Z < 1.04) = 0.85$, so $\bar{X} = 3.10 + 1.04(0.05) = 3.152$
CENTRAL LIMIT THEOREM
Central Limit Theorem
• As the sample size gets large enough (n↑), the sampling distribution of $\bar{x}$ becomes almost normal regardless of the shape of the population
Sample Mean Sampling Distribution: If the Population is not Normal
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n > 15 will usually give a sampling distribution that is almost normal
• For normal population distributions, the sampling distribution of the mean is always normally distributed
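The Central Limit Theorem can be illustrated with a small simulation. The setup here is an assumption chosen for the sketch, not taken from the slides: an exponential population with mean 1 and standard deviation 1 (strongly right-skewed), samples of n = 30, and 5000 repetitions with a fixed seed.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)
n, repetitions = 30, 5000

# Each entry is the mean of one sample of n draws from the skewed population.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

center = mean(sample_means)      # should be close to the population mean, 1
spread = pstdev(sample_means)    # should be close to sigma / sqrt(n)
se = 1 / n ** 0.5

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this share would be about 0.95.
within = sum(abs(m - 1) <= 1.96 * se for m in sample_means) / repetitions
print(round(center, 3), round(spread, 3), round(within, 3))
```

Rerunning with a smaller n, such as 2 or 5, should show the share drifting away from 0.95 as the skewness of the population carries over to the sample means.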
POPULATION PROPORTION
Population Proportions
• π = the proportion of the population having some characteristic
• Sample proportion (p) provides an estimate of π:

$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$

• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large (assuming sampling with replacement from a finite population or without replacement from an infinite population)
Sampling Distribution of p
• Approximated by a normal distribution if:

$n\pi \ge 5$ and $n(1 - \pi) \ge 5$

where

$\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$

(where π = population proportion)

[Figure: Sampling Distribution, P(p) versus p for p from 0 to 1]
Z-Value for Proportions
Standardize p to a Z value with the formula:

$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
Example
• A local bank manager determines that 40% of the bank's customers have more than one account.
• If you select a random sample of 200 customers, then because nπ = 200(0.40) = 80 ≥ 5 and n(1 – π) = 200(0.60) = 120 ≥ 5, the sample size is large enough to assume that the sampling distribution of the proportion is approximately normal.
• Compute the probability that the sample proportion of customers with more than one account is less than 0.30.

Example

$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\frac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\frac{0.24}{200}}} = -2.89$

P(Z < -2.89) = 0.0019

If the population proportion is 0.40, only 0.19% of samples of n = 200 will have a sample proportion less than 0.30.
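The bank example can be sketched as a small function that first checks the nπ ≥ 5 and n(1 – π) ≥ 5 conditions. The name `proportion_below` is hypothetical. The unrounded Z gives a probability very slightly above the table value for Z = -2.89.

```python
from math import sqrt
from statistics import NormalDist


def proportion_below(p, pi, n):
    """Return (Z, P(sample proportion < p)) under the normal approximation."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("sample too small for the normal approximation")
    sigma_p = sqrt(pi * (1 - pi) / n)     # standard error of the proportion
    z = (p - pi) / sigma_p
    return z, NormalDist().cdf(z)


z, prob = proportion_below(0.30, pi=0.40, n=200)
print(round(z, 2), round(prob, 3))  # -2.89 0.002
```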
EXERCISE
Exercise 4
An independent polling organization conducts a quick count of election results. Suppose there are two candidates. If a candidate receives at least 55% of the vote in the sample, that candidate will be forecast as the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be forecast as the winner when
a. the candidate's population percentage of the vote is 50.1%?
b. the candidate's population percentage of the vote is 60%?
c. the candidate's population percentage of the vote is 49% (and the candidate will actually lose the election)?
d. If the sample size is increased to 400, what are your answers to (a) through (c)? Discuss.

Exercise 5
In a recent survey of full-time female workers ages 22 to 35 years, 46% said that they would rather give up some of their salary for more personal time. (Data extracted from “I’d Rather Give Up,” USA Today, March 4, 2010, p. 1B.) Suppose you select a sample of 100 full-time female workers 22 to 35 years old.
a. What is the probability that in the sample, fewer than 50% would rather give up some of their salary for more personal time?
b. What is the probability that in the sample, between 40% and 50% would rather give up some of their salary for more personal time?
c. What is the probability that in the sample, more than 40% would rather give up some of their salary for more personal time?
d. If a sample of 400 is taken, how does this change your answers to (a) through (c)?
ANSWER
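Exercise 4
a. π = 0.501

$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.501(1 - 0.501)}{100}} = 0.05$

P(p > 0.55) = P(Z > 0.98) = 1.0 – 0.8365 = 0.1635

b. π = 0.60

$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.6(1 - 0.6)}{100}} = 0.04899$

P(p > 0.55) = P(Z > -1.021) = 1.0 – 0.1539 = 0.8461

c. π = 0.49

$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.49(1 - 0.49)}{100}} = 0.05$

P(p > 0.55) = P(Z > 1.20) = 1.0 – 0.8849 = 0.1151

d. With n = 400 the standard error is half as large as with n = 100, so the sample proportion is more tightly concentrated around π:
(a) π = 0.501: $\sigma_p = 0.025$, P(p > 0.55) = P(Z > 1.96) = 1.0 – 0.9750 = 0.0250
(b) π = 0.60: $\sigma_p = 0.0245$, P(p > 0.55) = P(Z > -2.04) = 1.0 – 0.0207 = 0.9793
(c) π = 0.49: $\sigma_p = 0.025$, P(p > 0.55) = P(Z > 2.40) = 1.0 – 0.9918 = 0.0082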
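The Exercise 4 probabilities for n = 100, and for the n = 400 case that part (d) asks about, can be recomputed with a short sketch. The function name `prob_forecast_winner` is hypothetical. Results may differ from table-based answers in the third or fourth decimal place because Z is not rounded here.

```python
from math import sqrt
from statistics import NormalDist


def prob_forecast_winner(pi, n, threshold=0.55):
    """P(sample proportion > threshold) under the normal approximation."""
    sigma_p = sqrt(pi * (1 - pi) / n)     # standard error of the proportion
    z = (threshold - pi) / sigma_p
    return 1 - NormalDist().cdf(z)


for n in (100, 400):
    for pi in (0.501, 0.60, 0.49):
        print(n, pi, round(prob_forecast_winner(pi, n), 4))
```

Comparing the two sample sizes shows the effect of the smaller standard error: a candidate with 60% support is forecast as the winner more often, and a candidate near or below 50% is forecast as the winner less often.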