Written by: STATISTICA 3/9/2010 12:44 PM
A customer recently asked how to create a random subset of a dataset, and I thought this would be a good topic for a blog post.
Let us pretend...
We want to develop a credit scoring model that can be used to determine whether a new applicant is a good or a bad credit risk. But we want to build it on a random subset of the data.
Start by opening STATISTICA's example dataset, CreditScore.sta. It has 1000 rows of data.
You don't know where the example datasets are located? Select the Open Example menu under the File menu (or Home tab / Open). See the Datasets folder? Select it and browse for CreditScore.sta.
Select the Data menu or Data tab. If you are using the classic menus, then look for the Random Sampling menu. If you are using the Ribbonbar, then look for Sampling on the far right.
On the Simple Sampling tab, select the Exact checkbox. Type 25 into the Approximate % field. Click OK.
You now have a random subset with 250 rows of data (25% of the original 1000).
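For readers who want to see the same idea outside of the STATISTICA dialog, here is a minimal Python sketch of an exact 25% simple random sample. It is not how STATISTICA implements the feature; the function name simple_random_subset, the seed value, and the use of row numbers 1 to 1000 as a stand-in for CreditScore.sta are all assumptions made for illustration.

```python
import random

def simple_random_subset(rows, fraction, seed=None):
    """Draw an exact-size simple random sample without replacement.

    Hypothetical helper: the sample size is round(len(rows) * fraction).
    """
    rng = random.Random(seed)          # seeded so the draw is repeatable
    k = round(len(rows) * fraction)    # exact number of rows to keep
    return rng.sample(rows, k)         # every row has the same chance

# Stand-in for the 1000 rows of CreditScore.sta: just row numbers 1..1000.
rows = list(range(1, 1001))

subset = simple_random_subset(rows, 0.25, seed=42)
print(len(subset))  # 250
```

Because the rows are drawn without replacement, no row can appear twice in the subset, which matches what you would expect from an exact 25% sample.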
1 comment(s) so far...
Re: How To Create Random Subset of Your Data
Instead of simple sampling, you may want to do stratified sampling (www.statsoft.com/textbook/statistics-glossary/s.aspx?button=s#Stratified Random Sampling). This particular dataset has 300 bad ratings and 700 good ratings. If you draw a 25% stratified random sample, you get 75 bad and 175 good ratings, so the sample keeps the same 30/70 split as the full dataset.
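The arithmetic in this comment can be checked with a small Python sketch that samples 25% from each rating group separately. The helper name stratified_subset, the seed, and the synthetic rows (300 labeled "bad", 700 labeled "good") are assumptions standing in for the real dataset; this is not STATISTICA's own routine.

```python
import random
from collections import Counter, defaultdict

def stratified_subset(rows, key, fraction, seed=None):
    """Sample the same fraction from every stratum, without replacement.

    Hypothetical helper: key(row) returns the stratum label of a row.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)   # group rows by their label

    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # per-stratum sample size
        sample.extend(rng.sample(members, k))
    return sample

# Synthetic stand-in for the dataset: 300 bad ratings and 700 good ratings.
rows = [(i, "bad") for i in range(300)] + [(i, "good") for i in range(300, 1000)]

sample = stratified_subset(rows, key=lambda r: r[1], fraction=0.25, seed=42)
counts = Counter(label for _, label in sample)
print(counts["bad"], counts["good"])  # 75 175
```

A simple random sample of 250 rows would only give roughly 75 bad ratings on average; sampling within each group fixes the counts at exactly 75 and 175.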