Business Statistics 1 (Statistik Bisnis 1). Week 11: Sampling and Sampling Distribution


Learning Objectives
In this chapter, you learn:
• To distinguish between different sampling methods
• The concept of the sampling distribution
• To compute probabilities related to the sample mean and the sample proportion
• The importance of the Central Limit Theorem

SAMPLING

Why Sample?
• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.

A Sampling Process Begins With a Sampling Frame
• The sampling frame is a listing of the items that make up the population.
• Frames are data sources such as population lists, directories, or maps.
• Inaccurate or biased results can occur if a frame excludes certain portions of the population.
• Using different frames to generate data can lead to dissimilar conclusions.

Types of Samples
• Non-Probability Samples: Judgment, Convenience
• Probability Samples: Simple Random, Systematic, Stratified, Cluster

Types of Samples: Nonprobability Sample
• In a nonprobability sample, items included are chosen without regard to their probability of occurrence.
  – In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
  – In a judgment sample, you get the opinions of preselected experts in the subject matter.

Types of Samples: Probability Sample
• In a probability sample, items in the sample are chosen on the basis of known probabilities.
• Probability samples: Simple Random, Systematic, Stratified, Cluster

Probability Sample: Simple Random Sample
• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (a selected individual is returned to the frame for possible reselection) or without replacement (a selected individual is not returned to the frame).
• Samples can be obtained from a table of random numbers or from computer random number generators.
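A minimal Python sketch of both selection modes, assuming the frame is held in a Python list. `simple_random_sample` and the `item-001` style labels are hypothetical names for illustration; the sketch relies on the standard-library `random` module, where `Random.sample` draws without replacement and `Random.choices` draws with replacement.

```python
import random

def simple_random_sample(frame, n, with_replacement=False, seed=None):
    """Draw a simple random sample of size n from a sampling frame (a list)."""
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    if with_replacement:
        # Each draw is independent, so an item may be selected more than once.
        return rng.choices(frame, k=n)
    # Without replacement: each item can appear at most once.
    return rng.sample(frame, n)

# Hypothetical frame of 850 labelled items.
frame = [f"item-{i:03d}" for i in range(1, 851)]
print(simple_random_sample(frame, 5, seed=42))
print(simple_random_sample(frame, 5, with_replacement=True, seed=42))
```

Seeding is only there to make a draw repeatable; leaving `seed=None` gives a fresh sample each run.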

Selecting a Simple Random Sample Using a Random Number Table
Sampling frame for a population with 850 items:
Item Name    Item #
Bev R.       001
Ulan X.      002
. . .        . . .
Joann P.     849
Paul F.      850

Portion of a random number table:
49280 88924 35779 00283 81163 07275 11100 02340 12860 74697 96644 89439 09893 23997 20048 49420 88872 08401

Reading the digits in groups of three (one group per three-digit item number), the first 5 items in the simple random sample are:
Item # 492
Item # 808
Item # 892 -- does not exist, so ignore
Item # 435
Item # 779
Item # 002
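The table procedure above can be sketched in Python. This assumes the digits are read left to right in non-overlapping groups of three, that out-of-range numbers (such as 892, or 000) are skipped, and that a number already chosen is also skipped (sampling without replacement; the slide does not say how repeats are handled). `select_from_table` is a hypothetical helper.

```python
RANDOM_DIGITS = ("49280 88924 35779 00283 81163 07275 11100 02340 12860 "
                 "74697 96644 89439 09893 23997 20048 49420 88872 08401")

def select_from_table(digits, population_size, sample_size, width=3):
    """Pick item numbers from a random number table, skipping invalid ones."""
    stream = digits.replace(" ", "")  # treat the table as one digit stream
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        number = int(stream[start:start + width])
        # Keep only numbers that exist in the frame (001..N) and are new.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen

items = select_from_table(RANDOM_DIGITS, population_size=850, sample_size=5)
print([f"{i:03d}" for i in items])  # ['492', '808', '435', '779', '002']
```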

Probability Sample: Systematic Sample
• Decide on sample size: n
• Divide the frame of N individuals into groups of k individuals: k = N/n
• Randomly select one individual from the 1st group
• Select every kth individual thereafter
Example: N = 40, n = 4, k = 10
[Figure: frame of 40 individuals divided into groups of 10, with the first group labeled]
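A sketch of the systematic procedure in Python, assuming N is a multiple of n so that k = N/n is a whole number (as in the N = 40, n = 4, k = 10 example). `systematic_sample` is a hypothetical helper; positions are 0-based inside the code.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start within the first group of k, then every k-th item."""
    N = len(frame)
    k = N // n  # group size k = N/n (assumed to divide evenly)
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position inside the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 41))  # individuals numbered 1..40
print(systematic_sample(frame, 4, seed=3))
```

Only the starting point is random; once it is fixed, the remaining n - 1 selections are determined, which is why the method is simple to use.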

Probability Sample: Stratified Sample
• Divide the population into two or more subgroups (called strata) according to some common characteristic.
• A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes.
• Samples from subgroups are combined into one.
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
[Figure: population divided into 4 strata]
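A sketch of proportional allocation in Python. The four strata and their sizes (16, 12, 8 and 4 members, N = 40) are hypothetical values chosen so that the proportional sample sizes come out as whole numbers; `stratified_sample` is likewise a hypothetical helper. With other sizes, rounding can make the stratum sizes add up to slightly more or less than n.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional-allocation stratified sample.

    strata maps a stratum name to the list of its members.
    Returns a dict of per-stratum simple random samples.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)  # size proportional to stratum share
        sample[name] = rng.sample(members, n_h)  # SRS within the stratum
    return sample

sizes = {"S1": 16, "S2": 12, "S3": 8, "S4": 4}  # hypothetical strata sizes
strata = {name: [f"{name}-{i}" for i in range(size)] for name, size in sizes.items()}
by_stratum = stratified_sample(strata, 10, seed=0)
combined = [x for members in by_stratum.values() for x in members]  # one sample
print({name: len(members) for name, members in by_stratum.items()})
```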

Probability Sample: Cluster Sample
• The population is divided into several “clusters,” each representative of the population.
• A simple random sample of clusters is selected.
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.
[Figure: population divided into 16 clusters, with the randomly selected clusters for the sample highlighted]
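A sketch of one-stage cluster sampling in Python, using the first option above (all items in each selected cluster are kept). The 16 clusters of 5 items each are hypothetical, as is the helper name `cluster_sample`.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and keep every item in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster labels
    return [item for c in chosen for item in clusters[c]]

# Hypothetical population: 16 clusters, 5 items each.
clusters = {c: [f"C{c}-{j}" for j in range(5)] for c in range(1, 17)}
print(cluster_sample(clusters, 3, seed=11))
```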

Probability Sample: Comparing Sampling Methods
• Simple random sample and systematic sample
  – Simple to use
  – May not be a good representation of the population’s underlying characteristics
• Stratified sample
  – Ensures representation of individuals across the entire population
• Cluster sample
  – More cost effective
  – Less efficient (need a larger sample to acquire the same level of precision)

Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit good responses
• Sampling error – always exists

Types of Survey Errors
• Coverage error or selection bias
  – Exists if some groups are excluded from the frame and have no chance of being selected
• Nonresponse error or bias
  – People who do not respond may be different from those who do respond
• Sampling error
  – Variation from sample to sample will always exist
• Measurement error
  – Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent (“Hawthorne effect”)

Types of Survey Errors (continued)
• Coverage error: excluded from frame
• Nonresponse error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question

SAMPLING DISTRIBUTION

Sampling Distributions
• A sampling distribution is a distribution of all of the possible values of a sample statistic for a given sample size selected from a population.
• For example, suppose you sample 50 students from your college regarding their GPA. If you obtained many different samples of 50, you would compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for any given sample of 50 students.

Developing a Sampling Distribution
• Assume there is a population …
• Population size N = 4 (individuals A, B, C, D)
• Random variable, X, is the age of individuals
• Values of X: 18, 20, 22, 24 (years)

Developing a Sampling Distribution (continued)
Summary Measures for the Population Distribution:
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$
[Figure: Uniform distribution, P(x) plotted for x = 18 (A), 20 (B), 22 (C), 24 (D)]
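The two summary measures can be checked with Python's standard `statistics` module. Note that `pstdev` divides by N, matching the population formula above (`stdev` would divide by N - 1 and give a different number).

```python
import statistics

ages = {"A": 18, "B": 20, "C": 22, "D": 24}
values = list(ages.values())

mu = statistics.fmean(values)     # population mean
sigma = statistics.pstdev(values)  # population standard deviation (divides by N)
print(mu, round(sigma, 3))         # 21.0 2.236
```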

Developing a Sampling Distribution (continued)
Now consider all possible samples of size n = 2 (sampling with replacement): 16 possible samples.

1st Obs \ 2nd Obs   18      20      22      24
18                  18,18   18,20   18,22   18,24
20                  20,18   20,20   20,22   20,24
22                  22,18   22,20   22,22   22,24
24                  24,18   24,20   24,22   24,24

16 Sample Means
1st Obs \ 2nd Obs   18   20   22   24
18                  18   19   20   21
20                  19   20   21   22
22                  20   21   22   23
24                  21   22   23   24

Sampling Distribution of All Sample Means
[Figure: sample means distribution, P(X̄) plotted for X̄ = 18, 19, 20, 21, 22, 23, 24 using the 16 sample means above; the distribution is no longer uniform]


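Summary Measures of this Sampling Distribution (here N = 16, the number of possible sample means):
$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$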
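The whole construction (16 samples, their means, the frequency of each mean, and the two summary measures) can be reproduced by brute-force enumeration. This sketch uses `itertools.product` to list every ordered sample of size 2 drawn with replacement.

```python
import itertools
import statistics
from collections import Counter

population = [18, 20, 22, 24]
n = 2

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples.
samples = list(itertools.product(population, repeat=n))
means = [statistics.fmean(s) for s in samples]

# Frequency of each sample mean: the sampling distribution of X-bar.
counts = Counter(means)
for xbar in sorted(counts):
    print(f"{xbar:.0f}: {counts[xbar]}/{len(means)}")

mu_xbar = statistics.fmean(means)      # mean of all sample means
sigma_xbar = statistics.pstdev(means)  # divides by the 16 sample means
print(mu_xbar, round(sigma_xbar, 2))   # 21.0 1.58
```

The counts rise from 1/16 at the extremes to 4/16 at 21, which is why the plot of sample means is no longer uniform.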

Comparing the Population Distribution to the Sample Means Distribution
• Population (N = 4): $\mu = 21$, $\sigma = 2.236$
• Sample means distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: left, P(X) for the population values 18 (A), 20 (B), 22 (C), 24 (D); right, P(X̄) for the sample means 18 through 24]

Sample Mean Sampling Distribution: Standard Error of the Mean
• Different samples of the same size from the same population will yield different sample means.
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean (this assumes that sampling is with replacement or sampling is without replacement from an infinite population):
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• Note that the standard error of the mean decreases as the sample size increases.
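Applying the formula to the age population (σ = √5 ≈ 2.236) gives the same 1.58 that the enumeration of all 16 samples produced for n = 2, because those samples were drawn with replacement. The loop over n is a small illustration of the last bullet; `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)

sigma = math.sqrt(5)  # population sd of the ages 18, 20, 22, 24
for n in (1, 2, 4, 16):
    # Quadrupling n halves the standard error.
    print(n, round(standard_error(sigma, n), 3))
```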

Sample Mean Sampling Distribution: If the Population is Normal
• If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of X̄ is also normally distributed with
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Z-value for Sampling Distribution of the Mean
• Z-value for the sampling distribution of X̄:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where: X̄ = sample mean, μ = population mean, σ = population standard deviation, n = sample size

Sampling Distribution Properties
• $\mu_{\bar{X}} = \mu$ (i.e., X̄ is unbiased)
[Figure: a normal population distribution centered at μ, and the normal sampling distribution of X̄, which has the same mean]

Sampling Distribution Properties (continued)
• As n increases, $\sigma_{\bar{X}}$ decreases
[Figure: sampling distributions of X̄ for a larger and a smaller sample size, both centered at μ]

Example
Oxford Cereals fills thousands of cereal boxes during one shift (8 hours). As operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with the package labeling, the boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, so some boxes are underfilled and others are overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams.

Example
Because weighing every box would be too time-consuming, costly, and inefficient, you must take a sample. For each sample you select, you plan to weigh the individual boxes and calculate a sample mean. You need to determine the probability that such a sample mean could have come from a population whose mean is 368 grams. Based on your analysis, you will have to decide whether to maintain, adjust, or shut down the cereal-filling process.

Example
a. If you randomly select 25 boxes without replacement from the thousands of boxes filled during a shift, the sample is much less than 5% of the population, so the population can be treated as infinite. Given that the standard deviation of the cereal-filling process is 15 grams, what is the standard error of the mean?
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3$$

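Example
b. How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes?
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5$$
Quadrupling the sample size cuts the standard error in half, from 3 grams to 1.5 grams.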

Example
c. If you select 100 boxes, what is the probability that the sample mean is below 365 grams?
[Figure: normal curve of X̄ centered at 368, with the area below 365 shaded]
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{365 - 368}{15 / \sqrt{100}} = \frac{-3}{1.5} = -2.0$$
$$P(\bar{X} < 365) = P(Z < -2.0) = 0.0228$$
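The same calculation can be done with the standard-library `statistics.NormalDist` class, whose `cdf` method replaces the normal table lookup. The sketch standardizes first to mirror the Z formula above.

```python
import math
from statistics import NormalDist

mu, sigma, n = 368, 15, 100
se = sigma / math.sqrt(n)   # standard error: 1.5 grams
z = (365 - mu) / se         # standardized sample mean: -2.0
p = NormalDist().cdf(z)     # P(Z < -2.0) under the standard normal
print(se, z, round(p, 4))   # 1.5 -2.0 0.0228
```

Equivalently, `NormalDist(mu, se).cdf(365)` skips the explicit standardization and gives the same probability.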

Example
d. Find a symmetric interval around the population mean that will include 95% of the sample means, if the sample size is 25 boxes.
[Figure: normal curve of X̄ centered at 368, with the middle 95% of sample means between X̄_L and X̄_U]
Thus:
$$P(\bar{X} < \bar{X}_L) = 0.025 \qquad P(\bar{X} < \bar{X}_U) = 0.975$$
$$Z_L = -1.96 \qquad Z_U = 1.96$$
$$\bar{X}_L = 368 - 1.96 \cdot \frac{15}{\sqrt{25}} = 368 - 1.96(3) = 362.12$$
$$\bar{X}_U = 368 + 1.96 \cdot \frac{15}{\sqrt{25}} = 368 + 1.96(3) = 373.88$$
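The interval endpoints are the 2.5th and 97.5th percentiles of the sampling distribution of X̄, which `NormalDist.inv_cdf` returns directly. The exact Z value is about 1.95996 rather than the table's 1.96, which does not change the result at two decimals.

```python
import math
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))  # X-bar ~ N(368, 3)

lower = sampling_dist.inv_cdf(0.025)  # 2.5% of sample means fall below this
upper = sampling_dist.inv_cdf(0.975)  # 97.5% of sample means fall below this
print(round(lower, 2), round(upper, 2))  # 362.12 373.88
```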

EXERCISE

Exercise 1 The U.S. Census Bureau announced that the median sales price of new houses sold in 2009 was $215,600, and the mean sales price was $270,100. Assume that the standard deviation of the prices is $90,000. a. If you select samples of n = 2, describe the shape of the sampling distribution of X̄. b. If you select samples of n = 100, describe the shape of the sampling distribution of X̄. c. If you select a random sample of n = 100, what is the probability that the sample mean will be less than $300,000? d. If you select a random sample of n = 100, what is the probability that the sample mean will be between $275,000 and $290,000?

Exercise 2 The time spent using e-mail per session is normally distributed, with μ = 8 minutes and σ = 2 minutes. If you select a random sample of 25 sessions, a. What is the probability that the sample mean is between 7.8 and 8.2 minutes? b. What is the probability that the sample mean is between 7.5 and 8 minutes? c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes? d. Explain the difference in the results of (a) and (c).

Exercise 3 The amount of time a bank teller spends with each customer has a mean μ = 3.10 minutes and a standard deviation σ = 0.40 minute. If you select a random sample of 16 customers, a. What is the probability that the mean time spent per customer is at least 3 minutes? b. There is an 85% chance that the sample mean will be less than how many minutes? c. What assumption must you make in order to solve (a) and (b)? d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean will be less than how many minutes?

ANSWER

Exercise 1 a. Because the mean > median, the population distribution of sales prices is right-skewed. Because n = 2 (n < 30), the sampling distribution of X̄ will also be right-skewed. b. Because n = 100, the sampling distribution of X̄ will be approximately normal, with a mean of $270,100 and a standard error of $90,000/√100 = $9,000. c. 0.9996 d. 0.2796
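Parts (c) and (d) can be checked numerically, assuming (as in part b) that the sampling distribution of X̄ for n = 100 is approximately normal with the mean μ = $270,100 and σ = $90,000 given in the problem statement. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 270_100, 90_000, 100
xbar = NormalDist(mu, sigma / math.sqrt(n))  # standard error = 9,000

p_c = xbar.cdf(300_000)                        # P(X-bar < 300,000)
p_d = xbar.cdf(290_000) - xbar.cdf(275_000)    # P(275,000 < X-bar < 290,000)
print(round(p_c, 4), round(p_d, 4))  # compare with the answers 0.9996 and 0.2796
```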

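Exercise 3
a. $\sigma_{\bar{X}} = 0.40/\sqrt{16} = 0.1$, so P(X̄ ≥ 3) = P(Z ≥ -1.00) = 1.0 – 0.1587 = 0.8413
b. P(Z < 1.04) = 0.85, so X̄ = 3.10 + 1.04(0.1) = 3.204 minutes
c. The population distribution must be at least approximately symmetric, because n = 16 is too small to rely on the Central Limit Theorem for a skewed population.
d. $\sigma_{\bar{X}} = 0.40/\sqrt{64} = 0.05$; P(Z < 1.04) = 0.85, so X̄ = 3.10 + 1.04(0.05) = 3.152 minutes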

CENTRAL LIMIT THEOREM

Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution of X̄ becomes almost normal regardless of the shape of the population.

Sample Mean Sampling Distribution: If the Population is not Normal

How Large is Large Enough?
• For most distributions, n > 30 will give a sampling distribution that is nearly normal.
• For fairly symmetric distributions, n > 15 will usually give a sampling distribution that is almost normal.
• For normal population distributions, the sampling distribution of the mean is always normally distributed.
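A small simulation can make the Central Limit Theorem concrete. This sketch assumes a strongly right-skewed population, Exponential(1) (mean 1, skewness 2), chosen only for illustration; in theory the skewness of X̄ for this population is 2/√n, so it shrinks toward 0 (the symmetric normal shape) as n grows. The helper names and the seed are arbitrary.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(xs):
    """Moment skewness: the average cubed z-score (about 0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(2024)
for n in (2, 5, 15, 30):
    means = simulate_sample_means(n, 10_000, rng)
    # The mean of the sample means stays near the population mean (1.0),
    # while the skewness of the sample means falls as n grows.
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 2))
```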

POPULATION PROPORTION

Population Proportions
π = the proportion of the population having some characteristic
The sample proportion (p) provides an estimate of π:
$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$
• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
• (assuming sampling with replacement from a finite population or without replacement from an infinite population)

Sampling Distribution of p
• Approximated by a normal distribution if
$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$
where
$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
(where π = population proportion)
[Figure: sampling distribution of p, P(p) plotted against p from 0 to 1]

Z-Value for Proportions
Standardize p to a Z value with the formula:
$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$

Example
• A local bank manager determines that 40% of the bank’s customers have more than one account.
• If you select a random sample of 200 customers, then because nπ = 200(0.40) = 80 ≥ 5 and n(1 – π) = 200(0.60) = 120 ≥ 5, the sample size is large enough to assume that the sampling distribution of p is approximately normal.
• Compute the probability that the sample proportion of customers with more than one account is less than 0.30.

Example
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\dfrac{0.24}{200}}} = -2.89$$
P(Z < -2.89) = 0.0019
If the population proportion is 0.40, only 0.19% of samples of size n = 200 will have a sample proportion less than 0.30.
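A sketch of the same calculation in Python that also checks the nπ ≥ 5 and n(1 – π) ≥ 5 conditions before using the normal approximation. `prob_sample_proportion_below` is a hypothetical helper; it uses the exact Z (about -2.887) rather than the table-rounded -2.89, which gives the same answer at four decimals.

```python
import math
from statistics import NormalDist

def prob_sample_proportion_below(pi, n, p):
    """P(sample proportion < p) using the normal approximation."""
    if n * pi < 5 or n * (1 - pi) < 5:
        raise ValueError("normal approximation needs n*pi >= 5 and n*(1-pi) >= 5")
    sigma_p = math.sqrt(pi * (1 - pi) / n)  # standard error of p
    return NormalDist(mu=pi, sigma=sigma_p).cdf(p)

print(round(prob_sample_proportion_below(0.40, 200, 0.30), 4))  # 0.0019
```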

EXERCISE

Exercise 4 An independent survey organization conducts a quick count of election results. Suppose there are two candidates; if a candidate receives at least 55% of the votes in the sample, that candidate is predicted to be the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be predicted as the winner if a. the candidate’s share of the population vote is 50.1%? b. the candidate’s share of the population vote is 60%? c. the candidate’s share of the population vote is 49% (and the candidate actually loses the election)? d. If the sample size is increased to 400, how do the answers to (a) through (c) change? Discuss.

Exercise 5 In a recent survey of full-time female workers ages 22 to 35 years, 46% said that they would rather give up some of their salary for more personal time. (Data extracted from “I’d Rather Give Up,” USA Today, March 4, 2010, p. 1B.) Suppose you select a sample of 100 full-time female workers ages 22 to 35 years. a. What is the probability that in the sample fewer than 50% would rather give up some of their salary for more personal time? b. What is the probability that in the sample between 40% and 50% would rather give up some of their salary for more personal time? c. What is the probability that in the sample more than 40% would rather give up some of their salary for more personal time? d. If a sample of 400 is taken, how do your answers to (a) through (c) change?

ANSWER

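Exercise 4
a. π = 0.501: $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.501(1 - 0.501)}{100}} = 0.05$
P(p > 0.55) = P(Z > 0.98) = 1.0 – 0.8365 = 0.1635
b. π = 0.60: $\sigma_p = \sqrt{\frac{0.6(1 - 0.6)}{100}} = 0.04899$
P(p > 0.55) = P(Z > -1.021) = 1.0 – 0.1539 = 0.8461
c. π = 0.49: $\sigma_p = \sqrt{\frac{0.49(1 - 0.49)}{100}} = 0.05$
P(p > 0.55) = P(Z > 1.20) = 1.0 – 0.8849 = 0.1151
d. With n = 400, each standard error is halved:
π = 0.501: $\sigma_p = 0.025$, P(p > 0.55) = P(Z > 1.96) = 1.0 – 0.9750 = 0.0250
π = 0.60: $\sigma_p = 0.02449$, P(p > 0.55) = P(Z > -2.04) = 1.0 – 0.0207 = 0.9793
π = 0.49: $\sigma_p = 0.025$, P(p > 0.55) = P(Z > 2.40) = 1.0 – 0.9918 = 0.0082
The larger sample makes a wrong prediction less likely: the chance of predicting a candidate with 49% support as the winner drops from 0.1151 to 0.0082, while the chance of predicting a candidate with 60% support as the winner rises from 0.8461 to 0.9793.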
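The probabilities for both sample sizes asked about in this exercise can be generated in one loop. This sketch uses the normal approximation with exact Z values, so results may differ slightly in the fourth decimal from answers that round Z to two decimals for a table lookup. `prob_predicted_winner` is a hypothetical helper, and the 0.55 threshold comes from the exercise.

```python
import math
from statistics import NormalDist

def prob_predicted_winner(pi, n, threshold=0.55):
    """P(sample proportion > threshold) under the normal approximation."""
    sigma_p = math.sqrt(pi * (1 - pi) / n)  # standard error of p
    return 1 - NormalDist(mu=pi, sigma=sigma_p).cdf(threshold)

for n in (100, 400):
    for pi in (0.501, 0.60, 0.49):
        print(f"n={n}, pi={pi}: {prob_predicted_winner(pi, n):.4f}")
```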

THANK YOU
