* Recommended materials here:
* http://dss.princeton.edu/training/
* http://libguides.princeton.edu/dss

* Check working directory
pwd

* Set working directory
cd h:

* Creating log file
log using nov29.log

* You can use Stata to download any file from the web.
* Download the file: http://dss.princeton.edu/training/students.xls
* Files are saved to the working directory, so check it first (type pwd)
copy http://dss.princeton.edu/training/students.xls students.xls
copy http://dss.princeton.edu/training/students.csv students.csv

* From Excel to Stata
* Stata 12+ can read Excel files directly
import excel "H:\Students.xls", sheet("Full") firstrow clear /* Full Excel sheet */
import excel "H:\public_html\Stata\Students.xls", sheet("Full") cellrange(A1:J18) firstrow clear /* Part of the Excel sheet */

* All Stata versions can read comma/tab-delimited files (*.csv or *.txt)
* Several options: copy-and-paste, see here
* http://dss.princeton.edu/training/StataTutorial.pdf#page=17

* Using 'insheet'
* The case option keeps variable names as they appear in the file;
* without it, insheet converts them to lowercase and 'tab Major Gender' fails
insheet using "H:\students.csv", clear case
browse
describe
summarize

* Crosstabulations
tab Major Gender, col row

* Read data from an online resource
use http://statistics.ats.ucla.edu/stat/stata/examples/ara/prestige.dta, clear

* Exploring the data
browse
describe
summarize

* Renaming variables
rename educat education
rename percwomn women
rename occ_code census

* Recoding variables
recode occ_type (2=1 "bc") (4=2 "wc") (3=3 "prof") (else=.), gen(type) label(type)

* Adding a label to a variable
label variable type "Type of occupation"

* Dropping a variable
drop occ_type

* Replacing a selected value in a variable
replace type=3 if occtitle=="PILOTS"

* Generating a variable from other variables
gen lnincome = ln(income)

* Creating a variable from the sum of two variables
* (educat was renamed to education above)
gen educincome = education + income /* For illustration purposes */

* Creating a variable from the difference of two variables
gen educincome_dif = education - income /* For illustration purposes */

* See here for other options on creating variables:
* http://dss.princeton.edu/training/StataTutorial.pdf#page=35

* Frequencies and descriptive statistics
* (contents() is the table syntax used before Stata 17)
tab type
table type, contents(freq mean education mean income mean women mean prestige)

* Descriptive statistics
tabstat education income women prestige, statistic(mean median sd var count range min max)

* Descriptive statistics by group
tabstat prestige education income women, statistic(mean median sd var count range min max) by(type)

* Correlation matrix
pwcorr prestige education women income lnincome, star(0.05) sig

* Scatterplots
graph matrix prestige education women income lnincome, half
twoway scatter prestige education || lfit prestige education
twoway scatter prestige education, mlabel(occtitle) || lfit prestige education, yline(60) xline(12)

* Running a linear regression
regress prestige education income i.type

* To interpret regression output see here
* http://dss.princeton.edu/training/Regression101.pdf#page=6

* Predicting variables
predict yhat1 /* Predicted values of y */

* Estimating residuals
predict res1, resid /* Residuals */

* Leverage, Cook's distance and studentized residuals
predict hat1, hat /* Leverage: the potential influence of observation i on the fitted values; high-leverage points pull the regression line toward themselves */
predict stud1, rstudent /* Studentized residuals: values larger than 2 in absolute value may be problematic */
predict cook1, cooksd /* Cook's distance: flags observations that influence the overall model */

* Bubble plot to identify regression outliers
* Bubble size is proportional to Cook's distance; labels mark cases with
* |studentized residual| > 2 or leverage more than one sd above its mean
sum hat1
local lowx = r(mean)-r(sd)
local hix = r(mean)+r(sd)
twoway scatter stud1 hat1 [aw=cook1], msymbol(oh) yline(2) yline(-2) xline(`lowx') xline(`hix') || ///
    scatter stud1 hat1 if abs(stud1) > 2 | hat1 > `hix', mlabel(occtitle) msymbol(i) ///
    title("Studentized residuals, Hat values and Cook's Distance")

* Linear regression excluding selected cases
regress prestige education income i.type ///
    if occtitle!="MEDICAL_TECHNICIANS" & occtitle!="ELECTRONIC_WORKERS" & occtitle!="GENERAL_MANAGERS"
predict yhat1a
twoway lfit prestige yhat1 || lfit prestige yhat1a

* Added-variable plots after regression
regress prestige education income i.type
avplot education
avplots /* For all predictors */

* Creating nice regression tables using outreg2, see here
* http://dss.princeton.edu/training/Regression101.pdf#page=33
* http://dss.princeton.edu/training/Regression101.pdf#page=34
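
* A minimal sketch of exporting the regression above with outreg2.
* Assumptions: outreg2 is a user-written command, so it has to be installed
* from SSC before first use; the output file name prestige_models.doc and
* the column title "Model 1" are hypothetical placeholders.
ssc install outreg2, replace
regress prestige education income i.type
outreg2 using prestige_models.doc, replace ctitle(Model 1)

* Closing the log file opened with 'log using' at the start of the session
log close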