Saturday, September 21, 2013

Guns are cool

Reddit has a subreddit Guns are cool which in turn contains the MASS SHOOTINGS IN 2013 page. I pulled those data to see if there was anything of interest to plot or examine. I ended with modelling the frequency of shootings (once a day, Poisson distribution), and number of victims per shootings, a mixture of distributions, 2% of shootings are from a Poisson distribution with Lambda=15.

Data

Data are 264 records, spanning from January 1st to September 19th at the moment or writing. Rule for entry in this table: 'The only requirement is that four or more people are shot in a spree, likely without a cooling off period. This may include the gunman himself (because they often suicide by cop or use a gun to kill themselves to escape punishment), or police shootings of civilians around the gunman. The reasoning behind the latter being that if the shooter is arrested, he will often be charged with injuring people the police actually shot, as that is a foreseeable result of a shooting spree.' The data are not entered with the objective of machine processing, for instance records up to 6 show there are sometimes number of death, sometimes injured, sometimes both, in either order. The final entries on a row are location and state, the first are number and date, between it is messy, with random usage of commas and random number of entries between them.
Number 1: 1/1/13, Julian Sims, 4 injured, McKeesport, PA
Number 2: 1/1/13, Unknown, 1 dead 3 injured, Hawthorne, CA
Number 3: 1/1/13, Desmen Noble, Damian Bell, 1 dead 4 injured, Lorain, OH
Number 4: 1/5/13, Sonny Archuleta, 4 dead, Aurora, CO
Number 5: 1/7/13, Sandra Palmer, 2 dead 2 injured, Greensboro, NC
Number 6: 1/7/13, Herbert Bland Jr., 23, 3 dead 1 injured, Dinwiddie, VA
They have been processed using the following assumptions and steps:
  • The number before dead is the number of dead, same for injured. 
  • Final field is state. I know of one exception (Number 117: 5/26/13, Esteban J. Smith, 3 dead 5 injured, Eden, TX & Jacksonville, NC), but since I did not analyzed states I left it as processed.
  • US date format (MM/DD/YY)
  • anything between dead / injured and state is location.
  • anything between date and dead / injured can be ignored.
  • record 249 is incomplete: it misses '4 injured, Marion, NC'.
  • Washington DC, I know DC is not a state, but it sure made my processing easy to handle it as such.
  • Puerto Rico is removed from the analysis

Number of shootings in a day

The number of shootings very nicely follows a Poisson distribution with Lambda =1 (sd=0.06). The figure below shows the data (histogram) with fitted values (line).

Number of victims per shooting

For this analysis I will take the number of victims as number of dead plus number of injured. It is an interesting analysis, first of all data below 4 not registered. This lead me to choosing JAGS rather than fitdistr() to get an estimate. It is also appears a single Poisson distribution does not describe the data well. In the end a mixture of four distributions was used. The table below shows a small part (2%) of the shootings to come from a Poisson distribution with Lambda = 15. Combined with the number of shootings being 1 per day, this gives 3 to 4 really bad shootings per year. 
     Lambda      SD     Proportion     SD
0.87 0.36 0.55 0.16
3.1  1.2  0.34 0.15
7.1  2.7  0.09 0.08
15.4  2.4  0.02 0.01
The plot shows the data in a histogram, Poisson fit (blue) and four component mixture Poisson distribution (purple). The fit might be improved upon, but four distributions is pushing it with this limited range in data.

R-scripts

r1 <- readLines(textConnection('
Number 1: 1/1/13, Julian Sims, 4 injured, McKeesport, PA
Number 2: 1/1/13, Unknown, 1 dead 3 injured, Hawthorne, CA
Number 3: 1/1/13, Desmen Noble, Damian Bell, 1 dead 4 injured, Lorain, OH
Number 4: 1/5/13, Sonny Archuleta, 4 dead, Aurora, CO
Number 5: 1/7/13, Sandra Palmer, 2 dead 2 injured, Greensboro, NC
Number 6: 1/7/13, Herbert Bland Jr., 23, 3 dead 1 injured, Dinwiddie, VA
Number 7: 1/7/13, Cedric and James Poore (brothers), 4 dead, Tulsa, OK
Number 8: 1/11/13, Unknown, 4 dead, Sacramento, CA
Number 9: 1/19/13, Nehemiah Griego, 5 dead, Albuquerque, NM
Number 10: 1/21/13, Unknown, 4 injured, Brentwood, CA
Number 11: 1/25/13, Unknown, 1 dead 4 injured, St. Louis, MO
Number 12: 1/26/13, Unknown, 5 injured, Washington DC
Number 13: 1/27/13, Wilbert Thibodeaux, 1 dead 3 injured, Charenton, LA
Number 14: 1/30/13, Douglas Harmon, 3 dead 1 injured, Phoenix, AZ
Number 15: 2/1/13, Unknown, 1 dead 3 injured, Oakland, CA
Number 16: 2/2/13, Sundra Payne, 5 injured, Memphis, TN
Number 17: 2/3/13, Kong Meng Vue, Ryan Cha, Ken Cha, 1 dead 3 injured, Olivehurst, CA
Number 18: 2/3/13, Christopher Dorner, 5 dead 4 injured, Began in Riverside, CA
Number 19: 2/3/13, Jackie Spears & Kelcie Stewart, 2 dead 2 injured, Memphis, TN
Number 20: 2/6/13, Mayra Perez, 3 dead 1 injured, Denver, CO
Number 21: 2/7/13, Unknown, 4 injured, Chicago, IL
Number 22: 2/9/13, Malcolm Hall, Brandon Brown, Deron Bridgewater, 4 injured, New Orleans, LA
Number 23: 2/11/13, Nhan Lap Tran, 1 dead 4 injured, Oakdale, MN
Number 24: 2/11/13, Unknown, 1 dead 4 injured, Vallejo, CA
Number 25: 2/11/13, Thomas Matusiewicz, 3 dead 2 injured, Wilmington, DE
Number 26: 2/12/13, Two suspects, David Fresques in custody and Davis Fotu at large, 3 dead 1 injured, Suburban Salt Lake City(Midvale), UT
Number 27: 2/13/13, Joseph Matteson, 2 dead 2 injured, Red Springs, NC
Number 28: 2/16/13, Unknown, 1 dead 3 injured, Winston-Salem, NC
Number 29: 2/19/13, Ali Sayed, 4 dead 3 injured, Tustin, CA
Number 30: 2/21/13, Carlos Zuniga, 2 dead 2 injured, Miami, FL
Number 31: 2/21/13, Mark Hopkins, 1 dead 3 injured, Tulsa, OK
Number 32: 2/22/13, Unknown, 4 injured, Grand Rapids, MI
Number 33: 2/23/13, Unknown, 4 injured, Lancaster, CA
Number 34: 2/23/13, Unknown, 4 injured, Oakland, CA
Number 35: 2/24/13, Unknown, 8 injured, Macon, GA
Number 36: 3/2/13, Unknown, 1 dead 3 injured, Shreveport, LA
Number 37: 3/3/13, Unknown, 6 injured, Houston, TX
Number 38: 3/3/13, Unknown, 1 dead 3 injured, Moultrie, GA
Number 39: 3/4/13, Unknown, 4 injured, Saginaw, MI
Number 40: 3/4/13, Unknown, 1 dead 3 injured, Los Banos, CA
Number 41: 3/5/13, Unknown, 1 dead 3 injured, Indianapolis, IN
Number 42: 3/5/13, Unknown, 4 injured, Fuquay-Varina, NC
Number 43: 3/7/13, Joshua Hurst, 2 dead 2 injured, Jackson, MS
Number 44: 3/8/13, Unknown, 1 dead 3 injured, Buffalo, NY
Number 45: 3/10/13, Taleb Salameh, 3 dead 1 injured, North Liberty, IA
Number 46: 3/11/13, Taleb Hussein Yousef Salameh, 1 dead 3 injured, Iowa City, IA
Number 47: 3/11/13, Andrew Davon Allen and Craig Willson, 13 injured, Washington, DC
Number 48: 3/13/13, Unknown, 2 dead 2 injured, San Diego, CA
Number 49: 3/13/13, Kurt Myers, 5 dead 2 injured, Herkimer, NY
Number 50: 3/14/13, Unknown, 4 injured, Modesto, CA
Number 51: 3/15/13, Angelica Vazquez, 4 dead, Mesquite, TX
Number 52: 3/16/13, Unknown, 7 injured, Galt, CA
Number 53: 3/17/13, Unknown, 5 injured, Belle Glade, FL
Number 54: 3/17/13, Unknown, 2 dead 4 injured, Stockton, CA
Number 55: 3/20/13, Brandon Menefee, 3 dead 1 injured, Jefferson County, AL
Number 56: 3/21/13, Unknown, 7 injured, Chicago, IL
Number 57: 3/21/13, Unknown, 1 dead 4 injured, Kansas City, MO
Number 58: 3/22/13, Unknown, 1 dead 3 injured, New York, NY
Number 59: 3/26/13, Unknown, 4 injured, St. Petersberg, FL
Number 60: 3/31/13, Unknown, 2 dead 3 injured, Merced County, CA
Number 61: 3/31/13, Unknown, 3 dead 1 injured, Auburn, WA
Number 62: 4/2/13, Unknown, 4 dead 1 injured, San Juan, Puerto Rico
Number 63: 4/6/13, Unknown, 1 dead 4 injured, Greenwood, SC
Number 64: 4/6/13, Unknown spree shooter, 4 injured, Philadelphia, PA
Number 65: 4/7/13, Unknown, 1 dead 3 injured, Long Beach, CA
Number 66: 4/7/13, 1 in custody, name withheld, 1 dead 3 injured, Manhatten, Kansas
Number 67: 4/9/13, Unknown, 1 dead 3 injured, Philadelphia, PA
Number 68: 4/10/13, Unknown, 4 injured, Vallejo, CA
Number 69: 4/13/13, Trenton Ore, Joven Covington, 5 injured, Norfolk, VA
Number 70: 4/13/13, Unknown, 4 injured, Merced, CA
Number 71: 4/14/13, Unknown, 1 dead 4 injured, Lexington, KY
Number 72: 4/14/13, Unknown, 2 dead 4 injured, Phoenix, AZ
Number 73: 4/18/13, Unknown, 4 dead, Akron, OH
Number 74: 4/20/13, Unknown, 2 dead 2 injured, Modesto, CA
Number 75: 4/20/13, Christine Squire, 2 dead 2 injured, Richmond, VA
Number 76: 4/21/13, Unknown, 5 dead 2 injured, Federal Way, WA
Number 77: 4/22/13, Davante Robertson, Charlie A. Gumms, Frankie Hookman, Jr and Lashawn Davis, 5 injured, Havey, Louisiana
Number 78: 4/22/13, Unknown, 4 injured, Englewood, Illinois
Number 79: 4/24/13, Rick Odell Smith, 6 dead 1 injured, Manchester, IL
Number 80: 4/25/13, Sean M. Woodings, 4 injured, Oberlin, Ohio
Number 81: 4/27/13, Unknown, 1 dead 4 injured, Williston, FL
Number 82: 4/28/13, Unknown, 5 injured, Charlotte, NC
Number 83: 4/28/13, Neville Lynch, Mark Rose, 4 injured, Lauderdale Lakes, FL
Number 84: 4/28/13, Unknown, 2 dead 2 injured, Jackson, Tennessee
Number 85: 4/28/13, Unknown drive-by, 1 dead 3 injured, Chester, PA
Number 86: 5/2/13, Unknown, 5 injured, Newark, NJ
Number 87: 5/4/13, Unknown drive-by, 4 dead 6 injured, Aguas Buenas, PR
Number 88: 5/4/13, Krystal Olymica David, 4 injured, Smithfield, NC
Number 89: 5/5/13, Unknown, 1 dead 3 injured, Oakland, CA
Number 90: 5/5/13, Unknown drive-by, 6 injured, E. Palo Alto, CA
Number 91: 5/6/13, Two boys, 10, 11, 4 injured, Ocean Township, NJ
Number 92: 5/6/13, Unknown, 4 injured, Johnstown, PA
Number 93: 5/8/13, Ralph Robert Warren III, 4 dead, Hendersonville, NC
Number 94: 5/10/13, Unknown, 3 dead 1 injured, Harbor Gateway (Los Angeles), CA
Number 95: 5/11/13, Unknown, 4 injured, East Germantown, PA
Number 96: 5/11/13, Samuel E. Sallee, 4 dead, Waynesville, IN
Number 97: 5/12/13, Unknown, 5 injured, Newark, NJ
Number 98: 5/12/13, Unknown, 4 injured, Hollister, CA
Number 99: 5/12/13, Unknown, 19 injured, New Orleans, LA
Number 100: 5/12/13, Unknown, 4 injured, Apache Junction, AZ
Number 101: 5/12/13, Unknown, 4 injured, Jersey City, NJ
Number 102: 5/13/13, Unknown, 5 injured, Winton Hills, OH
Number 103: 5/15/13, Arrest made 5/21, 1 dead 4 injured, Detroit, MI
Number 104: 5/15/13, Jeremiah Bean, 5 dead, Fernley, NV
Number 105: 5/16/13, Unknown, 4 injured, Philladelphia, PA
Number 106: 5/18/13, Unknown, 3 dead 3 injured, Las Piedras, PR
Number 107: 5/19/13, Unknown, 4 injured, Detroit, MI
Number 108: 5/19/13, Codarrell Lee Yates, 4 injured, Lunenburg, VA
Number 109: 5/19/13, Unknown, 2 dead 2 injured, South Memphis, TN
Number 110: 5/20/13, Unknown, 4 injured, Chicago, IL
Number 111: 5/21/13, Unknown, 4 injured, Madison County, AL
Number 112: 5/23/13, Jason Brian Holt, 2 dead 2 injured, Knoxville, TN
Number 113: 5/23/13, Evellis T. McGee and Karon D. Thomas, 1 dead 3 injured, Saginaw, MI
Number 114: 5/24/13, Julio Jesus Romero, 2 dead 2 injured, Bakersfield, CA
Number 115: 5/25/13, Antonio King Green, 1 dead 3 injured, Flint, MI
Number 116: 5/25/13, Ryan Taybron, 15, and Eric Nixon, 17, 1 dead 4 injured, Hampton, VA
Number 117: 5/26/13, Esteban J. Smith, 3 dead 5 injured, Eden, TX & Jacksonville, NC
Number 118: 5/28/13, Unknown, 1 dead 3 injured, Memphis, TN
Number 119: 5/28/13, Unknown, 5 dead, Sells, AZ
Number 120: 5/29/13, Unknown, 4 injured, Chicago (Bronzeville), IL
Number 121: 5/31/13, Unknown, 4 injured, Atlanta, GA
Number 122: 6/1/13, Unknown, 2 dead 2 injured, Vallejo, CA
Number 123: 6/1/13, Unknown, 4 injured, Milwaukee, WI
Number 124: 6/2/13, Unknown, 4 injured, Indianapolis, IN
Number 125: 6/2/13, Unknown, 4 injured, Virginia Beach, VA
Number 126: 6/2/13, Unknown, 5 injured, Roanoke, VA
Number 127: 6/2/13, Xavier Edmondson and Lewis Antonio, 7 injured, LaGrange, GA
Number 128: 6/3/13, Manuel Mata III, 2 dead 2 injured, Las Vegas, NV
Number 129: 6/4/13, Johnny Simpson, 2 dead 2 injured, Shoreview, MN
Number 130: 6/5/13, St. Charles resident, 10 injured, Elburn, IL
Number 131: 6/7/13, John Zawahri, 5 dead 5 injured, Santa Monica, CA
Number 132: 6/9/13, Unknown, 4 injured, York City, PA
Number 133: 6/10/13, Unknown, 1 dead 5 injured, Chicago, IL
Number 134: 6/10/13, Davonta Coleman, 6 injured, St. Louis, MO
Number 135: 6/11/13, David Andrus, 4 dead, Darien, IL
Number 136: 6/12/13, Ahmed Dirir, 4 dead, St. Louis, MO
Number 137: 6/14/13, Unknown, 4 injured, High Point, NC
Number 138: 6/15/13, Earnest Woodley, 4 injured, Nashville, TN
Number 139: 6/15/13, Unknown, 1 dead 3 injured, Houston, TX
Number 140: 6/16/13, Unknown, 1 dead 3 injured, Chicago, IL
Number 141: 6/18/13, Unknown, 1 dead 3 injured, Berkeley, MO
Number 142: 6/19/13, Gary W. Stewart Jr., 3 dead 1 injured, Louisville, KY
Number 143: 6/21/13, Darren Lamont Roberts and Kyle Edward Thornton, 6 injured, Norfolk, VA
Number 144: 6/21/13, Unknown, 1 dead 3 injured, (Kelvyn Park) Chicago, IL
Number 145: 6/22/13, Unknown, 1 dead 4 injured, Baltimore, MD
Number 146: 6/22/13, Kamal “Rico” Edge, 1 dead 4 injured, Paterson, NJ
Number 147: 6/22/13, Unknown, 4 injured, Providence, RI
Number 148: 6/22/13, Lakim Anthony Faust, 5 injured, Greenville, NC
Number 149: 6/23/13, Unknown, 4 injured, Gentilly, LA
Number 150: 6/23/13, Unknown, 1 dead 3 injured, Chattanooga, TN
Number 151: 6/23/13, Unknown, 1 dead 4 injured, Virginia Beach, VA
Number 152: 6/23/13, Unknown, 1 dead 8 injured, Kansas City, MO
Number 153: 6/23/13, Elijah Rodgers, 1 dead 3 injured, Sacramento, CA
Number 154: 6/24/13, Unknown, 1 dead 3 injured, Kansas City, MO
Number 155: 6/25/13, Unknown, 1 dead 4 injured, Chicago, IL
Number 156: 6/27/13, Manuel Talamantez, Christina Martinez, 2 dead 2 injured, Three Rivers, CA
Number 157: 6/27/13, Tierra Fallin, 1 dead 3 injured, Baltimore, MD
Number 158: 6/28/13, Unknown, 6 injured, Chicago, IL
Number 159: 6/29/13, Ronald Reid, Barry Stinson, 3 dead 1 injured, North Charleston, SC
Number 160: 6/29/13, Unknown, 4 injured, Aurora, CO
Number 161: 6/30/13, Unknown, 9 injured, Brooklyn, NY
Number 162: 7/1/13, Amos Wells, 4 dead, Fort Worth, TX
Number 163: 7/2/13, Unknown, 1 dead 3 injured, Montgomery, AL
Number 164: 7/3/13, Unknown, 1 dead 3 injured, Calais, Maine
Number 165: 7/4/13, Robert Marion Naylor, 1 dead 6 injured, Pontiac, MI
Number 166: 7/4/13, Unknown, 4 injured, Woodlawn, IL
Number 167: 7/5/13, Unknown, 1 dead 3 injured, Macon, GA
Number 168: 7/6/13, Brandon Reese, 1 dead 3 injured, Brooklyn, NY
Number 169: 7/6/13, Unknown, 1 dead 7 injured, Chicago, IL
Number 170: 7/6/13, Chauncy Laray Mitchell, 4 injured, Florence, AL
Number 171: 7/6/13, Unknown, 1 dead 3 injured, Los Angeles, CA
Number 172: 7/7/13, Unknown, 1 dead 4 injured, Stockton, CA
Number 173: 7/7/13, Unknown, 1 dead 3 injured, Pompano Beach, FL
Number 174: 7/7/13, Unknown, 4 injured, Meridian, Miss.
Number 175: 7/7/13, Unknown, 1 dead 5 injured, Chicago, IL
Number 176: 7/9/13, Lamont Jones, 4 injured, Balimore, MD
Number 177: 7/9/13, Unknown, 2 dead 2 injured, Rockford, IL
Number 178: 7/10/13, Konrad Schafer, 15, and David Damus, 20, 2 dead 12 injured, Kissimmee, FL
Number 179: 7/11/13, Unknown, 1 dead 3 injured, Charlotte, NC
Number 180: 7/12/13, Unknown, 2 dead 2 injured, Greensburg, KY
Number 181: 7/12/13, Unknown, 1 dead 3 injured, San Francicsco, CA
Number 182: 7/12/13, Unknown, 1 dead 4 injured, Hamilton Township, NJ
Number 183: 7/13/13, Unknown, 4 injured, Washington DC
Number 184: 7/13/13, Unknown, 1 dead 3 injured, Oklahoma City, OK
Number 185: 7/13/13, Unknown, 5 dead 1 injured, New Orleans, LA
Number 186: 7/14/13, Unknown, 1 dead 4 injured, Wichita, KS
Number 187: 7/14/13, Suspect in custody., 5 injured, Kentwood, MI
Number 188: 7/14/13, Phillip Brierly Jr., 2 dead 2 injured, Lexington, SC
Number 189: 7/17/13, Unknown, 3 dead 1 injured, Oakland, CA
Number 190: 7/19/13, Unknown, 4 injured, Madera, CA
Number 191: 7/19/13, Unknown, 4 injured, Hartford, CT
Number 192: 7/20/13, Devin Spann, 5 injured, Campbell, OH
Number 193: 7/21/13, Unknown, 4 injured, Brooklyn (Crown Heights), NY
Number 194: 7/21/13, Unknown, 5 injured, Brooklyn (Bushwick), NY
Number 195: 7/21/13, Unknown, 2 dead 2 injured, Fort Pierce, FL
Number 196: 7/21/13, Terrence Lynom, Angelo Clark, 1 dead 3 injured, Chicago, IL
Number 197: 7/24/13, Unknown, 4 injured, Topeka, KS
Number 198: 7/25/13, Unknown, 4 injured, Inkster, MI
Number 199: 7/26/13, Sidney A. Muller, 4 dead, Clarksburg, WV
Number 200: 7/26/13, Unknown, 7 dead, Hialea, FL
Number 201: 7/27/13, Unknown, 1 dead 3 injured, Farmington, NM
Number 202: 7/28/13, Unknown, 4 injured, Granger, WA
Number 203: 7/28/13, Unknown, 1 dead 3 injured, Irvington, NJ
Number 204: 7/28/13, Unknown, 5 injured, Dallas, TX
Number 205: 7/30/13, Parrish Chee, 4 injured, Lea County (Hobbs), NM
Number 206: 8/2/13, Tristan Crayton, 4 injured, Broad Ripple, IN
Number 207: 8/2/13, Unknown, 4 dead, Whitesburg, KY
Number 208: 8/2/13, Unknown, 2 dead 3 injured, Newark, NJ
Number 209: 8/3/13, Unknown, 5 injured, Detriot, MI
Number 210: 8/4/13, Giovanni Pacheco, 3 dead 4 injured, Salinas, CA
Number 211: 8/4/13, Unknown, 1 dead 3 injured, Kansas City, Mo
Number 212: 8/5/13, Rockne Newell, 3 dead 4 injured, Ross Township, PA
Number 213: 8/5/13, Suspect known., 4 injured, Montclair, NJ
Number 214: 8/7/13, Erbie Bowser, 4 dead 4 injured, Dallas, TX
Number 215: 8/11/13, Nikko Jenkins, 4 dead, Omaha, NE
Number 216: 8/11/13, Unknown, 1 dead 3 injured, Flatbush, NY
Number 217: 8/11/13, Unknown, 1 dead 4 injured, Portsmouth, VA
Number 218: 8/11/13, Unknown, 4 injured, Wilmington, Del.
Number 219: 8/11/13, Unknown, 2 dead 2 injured, St. Louis, MO
Number 220: 8/12/13, Unknown, 4 injured, Cincinnati, OH
Number 221: 8/16/13, Unknown, 4 injured, Kingsessing, PA
Number 222: 8/17/13, Demetrius Ward, 4 injured, Oakland, CA
Number 223: 8/18/13, Unknown, 1 dead 3 injured, Chicago, IL
Number 224: 8/18/13, Darryl Sain, 4 injured, St. Louis, MO
Number 225: 8/18/13, Dontrel Shyhee Blakeney, 17, 1 dead 4 injured, Chesterfield County, SC
Number 226: 8/18/13, Unknown, 1 dead 3 injured, Rochester, NY
Number 227: 8/18/13, Unknown, 4 injured, Port Norris, NJ
Number 228: 8/18/13, Unknown, 4 injured, Toledo, OH
Number 229: 8/19/13, Suspects in custody., 1 dead 4 injured, Chicago, IL
Number 230: 8/20/13, Unknown, 4 injured, Vanceboro, NC
Number 231: 8/20/13, Daniel Livingston Green, 4 dead, Oklahoma City, OK
Number 232: 8/20/13, Melville Mason, 2 dead 2 injured, Baltimore, MD
Number 233: 8/22/13, Jim Edwards, 2 dead 2 injured, Shaler, PA
Number 234: 8/23/13, Unknown, 1 dead 6 injured, Baltimore, MD
Number 235: 8/24/13, Unknown, 4 injured, Indianapolis, IN
Number 236: 8/25/13, Hubert Allen, Jr., 3 dead 2 injured, Lake Butler, FL
Number 237: 8/25/13, Unknown, 2 dead 2 injured, Minneapolis, MN
Number 238: 8/25/13, Unknown, 1 dead 4 injured, Dillon County, SC
Number 239: 8/25/13, Unknown, 4 injured, Oakland, CA
Number 240: 8/28/13, Devonere Simmonds, 3 dead 2 injured, Columbus, OH
Number 241: 9/5/13, Unknown, 4 injured, Charlotte, NC
Number 242: 9/7/13, Unknown, 2 dead 4 injured, Gary, IN
Number 243: 9/7/13, Darnell Hollings, 1 dead 3 injured, St. Louis, MO
Number 244: 9/10/13, Roderick Rodgers, 1 dead 4 injured, Bridgeport, CT
Number 245: 9/11/13, Unknown, 1 dead 3 injured, Washington DC
Number 246: 9/11/13, Unknown, 4 injured, New York, NY
Number 247: 9/12/13, Jacob Bennett, Brittany Moser, 4 dead, Renegade Mountain, TN
Number 248: 9/13/13, Unknown, 4 injured, Centreville, IL
Number 249: 9/14/13, Earl Spencer Boyce
Number 250: 9/15/13, Unknown, 6 injured, Yakima, WA
Number 251: 9/15/13, Unknown, 5 injured, Colorado Springs, CO
Number 252: 9/15/13, Unknown, 7 injured, Fresno, CA
Number 253: 9/15/13, Unknown, 2 dead 2 injured, Bakersfield, CA
Number 254: 9/15/13, Robert E. Bell, 3 dead 1 injured, Snellville, GA
Number 255: 9/15/13, Unknown, 4 injured, Wilmington, DE
Number 256: 9/16/13, Aaron Alexis, 13 dead 8 injured, Washington DC
Number 257: 9/17/13, Unknown, 1 dead 4 injured, Stockton, CA
Number 258: 9/17/13, Unknown, 4 injured, Lansing, MI
Number 259: 9/17/13, Unknown, 4 injured, Las Vegas, NV
Number 260: 9/17/13, Unknown, 4 dead, West Broward, FL
Number 261: 9/18/13, Unknown, 1 dead 4 injured, Whitehaven, TN
Number 262: 9/18/13, Unknown, 5 injured, Kissimmee, FL
Number 263: 9/18/13, Unknown, 1 dead 4 injured, Bakersfield, CA
Number 264: 9/19/13, Unknown, 13 injured, Chicago, IL
'),)
#empty first & last line
r1[r1=='Number 249: 9/14/13, Earl Spencer Boyce'] <-
    'Number 249: 9/14/13, Earl Spencer Boyce, 4 injured, Marion, NC'
r2 <- gsub('Number ','',r1[r1!=''])
mydf <- data.frame(Number=as.numeric(gsub(':.*$','',r2)))
r3 <- gsub('^.*: ','',r2)
mydf$Date <- as.Date(gsub('/13, .*$','/13',r3),format='%m/%d/%y')
r4 <- gsub('^.*/13, ', '',r3)
r4 <- gsub('dead ','dead, ',r4)
r4 <- gsub('injured ','injured, ',r4)
r4 <- gsub('Washington DC','Washington, DC',r4)
r5 <- strsplit(r4,',')
r4[sapply(r5,length)>5]
grep('TX & ',r1,value=TRUE)
#mydf$nfield <- sapply(r5,length)

mydf$state <- unlist(sapply(r5,function(x) 
          gsub(' ','',x[length(x)])))
mydf$dead <- sapply(r5,function(x) 
      as.numeric(gsub('dead','',grep('dead',x,value=TRUE)[1])))
mydf$injured <- sapply(r5,function(x) 
      as.numeric(gsub('injured','',grep('injured',x,value=TRUE)[1])))

mydf$location <- sapply(r5,function(x) {
      end <- length(x)-1
      begin <- max(grep('injured|dead',x))+1
      if (begin==-Inf) print(x) 
      substring(paste(x[begin:end]),2)[1]
    })
mydf$dead[is.na(mydf$dead)] <- 0
mydf$injured[is.na(mydf$injured)] <- 0

mydf$state[mydf$state=='Illinois'] <- 'IL'
mydf$state[mydf$state=='Kansas'] <- 'KS'
mydf$state[mydf$state=='Louisiana'] <- 'LA'
mydf$state[mydf$state=='Tennessee'] <- 'TN'
mydf$state[mydf$state=='Maine'] <- 'ME'
mydf$state[mydf$state=='Ohio'] <- 'OH'
mydf$state[mydf$state=='Miss.'] <- 'MS'
mydf$state[mydf$state=='Del.'] <- 'DE'

mydf <- mydf[mydf$state!='PuertoRico',]

mydf$Victims <- mydf$dead+mydf$injured
plot(dead ~ injured,data=mydf)

# How often in a day?
library(ggplot2)

mydf$fDate <- factor(
    levels=as.character(seq(min(mydf$Date),max(mydf$Date),by=1)),
    as.character(mydf$Date))
fperday <- as.data.frame(table(table(mydf$fDate)))
png('freqinday.png')
g1 <- ggplot(fperday,aes(x=Var1,y=Freq))
g1 + geom_bar(stat='identity') +
    labs(x='Number of Shootings',
        y='Number of days') +
    geom_line(data=
            data.frame(Var1=1:7,
                Freq=dpois(0:6,
                    lambda=MASS::fitdistr(table(mydf$fDate),
                        densfun='Poisson')$estimate)*nrow(mydf)
            ),
        colour='blue')
dev.off()
str(MASS::fitdistr(table(mydf$fDate),
        densfun='Poisson'))
# # of victims per shooting
mydf$fVictims <- factor(levels=4:max(mydf$Victims),mydf$Victims)
freqdf <- as.data.frame(table(mydf$fVictims))
freqdf$Nvic <- as.numeric(as.character(freqdf$Var1))
library(R2jags)
datain <- list(Victims=mydf$Victims,
    N=nrow(mydf))

Fmodel <- function() {
  for (i in 1:N) {
    Victims[i] ~ dpois(lambda) %_% T(4,)
  }
  lambda ~ dunif( 0, 20)  
}

params <- c('lambda')
inits <- function() {
  list(lambda = runif(1,1,3))
}

jagsfit <- jags(datain, model=Fmodel, inits=inits, parameters=params,
    progress.bar="gui")

#jagsfit
freqdf$exp1 <- dpois(freqdf$Nvic,lambda=2.890)*sum(mydf$Victims)
#2.890
freqdf

#g1 <- ggplot(freqdf,aes(x=Nvic,y=Freq))
#g1 + geom_bar(stat='identity') +
#    labs(x='Number of victims',
#        y='Frequency') +
#    geom_line(data=
#            data.frame(Nvic=4:15,
#                Freq=dpois(4:15,
#                    lambda=2.890)*sum(mydf$Victims)
#            ),
#        colour='blue')

#http://doingbayesiandataanalysis.blogspot.nl/2012/06/mixture-of-normal-distributions.html

Fmodel3 <- function() {
  for (i in 1:N) {
    j[i] ~ dcat(pp) 
    Victims[i] ~ dpois(lambdas[j[i]]  )   %_% T(4,)
  }
  lambda[1] ~ dunif( 0, 20)
  lambda[2] ~ dunif(0,20 )
  lambda[3] ~ dunif(0,20 )
  lambda[4] ~ dunif(0,20 )
  lambdas <- sort(lambda)
  p[1] <- 3
  p[2] <- 3
  p[3] <- 1
  p[4] <- 1
  pp[1:4] ~ ddirch( p )
}

params3 <- c('lambdas','pp')
inits3 <- function() {
  list(lambda = c(runif(1,1,3),runif(1,1,3),runif(1,6,8),runif(1,8,20)))
}

jagsfit3 <- jags(datain, model=Fmodel3, inits=inits3, parameters=params3,
    progress.bar="gui",n.iter=10000)
#jagsfit3
#mcmc3 <- as.mcmc(jagsfit3)
#par(ask=TRUE)
#plot(mcmc3)

freqdf$exp4a <- sum(mydf$Victims)*dpois(freqdf$Nvic,lambda=0.87)*(.54)
freqdf$exp4b <- sum(mydf$Victims)*dpois(freqdf$Nvic,lambda=3.097)*(.337)
freqdf$exp4c <- sum(mydf$Victims)*dpois(freqdf$Nvic,lambda=7.101)*(.093)
freqdf$exp4d <- sum(mydf$Victims)*dpois(freqdf$Nvic,lambda=15.396)*(.023)


freqdf$exp4 <- freqdf$exp4a+freqdf$exp4b+freqdf$exp4c+freqdf$exp4d
png('numvictim.png')
g1 <- ggplot(freqdf,aes(x=Nvic,y=Freq))
g1 + geom_bar(stat='identity') +
    labs(x='Number of victims',
        y='Frequency') +
    geom_line(data=data.frame(Nvic=freqdf$Nvic,Freq=freqdf$exp1), colour='blue') +
    geom_line(data=data.frame(Nvic=freqdf$Nvic,Freq=freqdf$exp4),colour='purple') 
dev.off()

Saturday, September 14, 2013

Mixed models; Random Coefficients, part 2

Continuing from random coefficients part 1, it is time for the second part. To quote the SAS/STAT manual 'a random coefficients model with error terms that follow a nested structure'. The additional random variable is monthc, which is a factor containing the months and nested under batch. Hence there is one additional statement in generating the data
rc2$Monthc <- factor(rc2$Month)

lme4

Running the analysis in lme4 is not difficult. However, it is noticed the random effect for month:batch provides an estimation problem. Somewhere between the month fixed effect and monthc:batch random effect lme4 has no room left for the month:batch effect. However, it is still needed later on. During simulation full against restricted model the restricted model lacks the month fixed effect and hence the random month:batch effect is not equal to 0. Hence this term has influence on significance of the fixed term.
lmer1 <- lmer(Y ~ Month +(Month-1|Batch) + (1|Batch:Monthc),data=rc2)
summary(lmer1)
Linear mixed model fit by REML 
Formula: Y ~ Month + (Month - 1 | Batch) + (1 | Batch:Monthc) 
   Data: rc2 
   AIC   BIC logLik deviance REMLdev
 287.3 299.4 -138.6    275.2   277.3
Random effects:
 Groups       Name        Variance   Std.Dev.  
 Batch:Monthc (Intercept) 4.1669e+00 2.0413e+00
 Batch        Month       1.1006e-11 3.3175e-06
 Residual                 7.9670e-01 8.9258e-01
Number of obs: 84, groups: Batch:Monthc, 18; Batch, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept) 102.5558     0.7674  133.64
Month        -0.5002     0.1140   -4.39

Correlation of Fixed Effects:
      (Intr)
Month -0.768

Detail on random effects

It is almost the same solution as PROC MIXED when compensated for continuous month effect. For instance, the last number (0.885355535) is close to 12*0.07600+0.002293= 0.914293 from PROC MIXED.

ranef(lmer1)

$`Batch:Monthc`

      (Intercept)
1:0   0.220538869
1:1  -2.582118495
1:3  -2.319238635
1:6   1.880673690
1:9  -1.244284881
1:12  0.772296027
2:0  -0.005588532
2:1  -2.295803984
2:3   2.905996019
2:6   1.642080584
2:9  -2.103224341
2:12 -3.330296971
3:0   1.964950249
3:1  -0.816514780
3:3   0.520042537
3:6   1.236464739
3:9   2.668672370
3:12  0.885355535

$Batch
          Month
1 -2.779928e-11
2 -6.342200e-09
3  6.369999e-09

Fixed effects

Fixed effects are easy after all these exercises. Again we see a discrepancy between anova and simulation, with simulation (p=0.033) making the month effect just significant and anova making it highly significant.
lmer1b <- lmer(Y ~ 1 +(Month-1|Batch)+ (1|Batch:Monthc) ,data=rc2)
anova(lmer1,lmer1b)
Data: rc2
Models:
lmer1b: Y ~ 1 + (Month - 1 | Batch) + (1 | Batch:Monthc)
lmer1: Y ~ Month + (Month - 1 | Batch) + (1 | Batch:Monthc)
       Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)   
lmer1b  4 290.91 300.63 -141.45                            
lmer1   5 285.18 297.33 -137.59 7.7251      1   0.005446 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pboot <- function(m0,m1) {
  s <- simulate(m0)
  L0 <- logLik(refit(m0,s))
  L1 <- logLik(refit(m1,s))
  2*(L1-L0)
}
obsdev <- 2*( logLik(lmer1)-logLik(lmer1b))
#
set.seed(1001)
reps <- replicate(1000,pboot(lmer1b,lmer1))
mean(reps>obsdev)
[1] 0.033

nlme

It is not obvious how to run this model in nlme. It appears (r-help post) that one can make a groupedData object. The random effect in this object gets used in the model, even though not explicitly in the model call. Then there is some trickery in using pdBlocked. However, the output is very straightforward. The month effect is highly significant.
rc3 <- groupedData(formula=Y ~ Month | Batch,
    data=rc2)
lme1 <- lme(Y ~ Month,
    random= pdBlocked(list(pdIdent(~ Monthc - 1),pdIdent(~ Month - 1))),
    data=rc3)
summary(lme1)
Linear mixed-effects model fit by REML
 Data: rc3 
      AIC      BIC    logLik
  286.901 298.9346 -138.4505

Random effects:
 Composite Structure: Blocked

 Block 1: Monthc0, Monthc1, Monthc3, Monthc6, Monthc9, Monthc12
 Formula: ~Monthc - 1 | Batch
 Structure: Multiple of an Identity
         Monthc0  Monthc1  Monthc3  Monthc6  Monthc9 Monthc12
StdDev: 1.934191 1.934191 1.934191 1.934191 1.934191 1.934191

 Block 2: Month
 Formula: ~Month - 1 | Batch
           Month  Residual
StdDev: 0.111472 0.8926711

Fixed effects: Y ~ Month 
                Value Std.Error DF   t-value p-value
(Intercept) 102.55643 0.7287180 80 140.73542   0e+00
Month        -0.50034 0.1259348 80  -3.97302   2e-04
 Correlation: 
      (Intr)
Month -0.66 

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.0466072 -0.6347078  0.0557648  0.4304314  2.4608117 

Number of Observations: 84
Number of Groups: 3 
Conversion of standard deviations to variances for checking:
c(1.934191,0.111472, 0.8926711)^2
[1] 3.74109482 0.01242601 0.79686169

Details on Random effects

Exactly the same as in PROC MIXED.
ranef(lme1)
       Monthc0    Monthc1    Monthc3   Monthc6   Monthc9     Monthc12
1  0.219126119 -2.5690021 -2.3067185 1.8726057 -1.235035  0.773610734
2 -0.006207775 -2.2125547  3.1063194 2.0649346 -1.444999 -2.440480024
3  1.957416153 -0.8849589  0.3005989 0.7971893  2.005861  0.002293605
          Month
1 -0.0002840204
2 -0.0757124467
3  0.0759964671

MCMCglmm

For once this seems most easy. Notice that monthc is not used in the model. Absence of us() around month is sufficient for MCMCglm to interpret it as levels rather than continuous. By now I am not surprised by a mismatch between random effects between PROC MIXED and MCMCglmm. The month effect is significant. 
library(MCMCglmm)
MCMCglmm1 <- MCMCglmm(Y ~ Month, 
    random= ~ us(Month):Batch + Month:Batch, # two terms
    pr=TRUE,
    data=rc2,family='gaussian',
    nitt=1000000,thin=20,burnin=100000,
    verbose=FALSE)
summary(MCMCglmm1)

 Iterations = 100001:999981
 Thinning interval  = 20
 Sample size  = 45000 

 DIC: 238.8215 

 G-structure:  ~us(Month):Batch

                  post.mean  l-95% CI u-95% CI eff.samp
Month:Month.Batch   0.02221 1.386e-17  0.03053    45000

               ~Month:Batch

            post.mean l-95% CI u-95% CI eff.samp
Month:Batch     4.695    1.741    8.649    45000

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units    0.8223   0.5518    1.114    45000

 Location effects: Y ~ Month 

            post.mean l-95% CI u-95% CI eff.samp   pMCMC    
(Intercept)  102.5577 100.9234 104.1365    45961 < 2e-05 ***
Month         -0.4994  -0.7563  -0.2401    45000 0.00467 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Details on random effects.

Again a small calculation for the last item shows 0.737419049+12*0.011875820= 0.8799289 against 0.91 and 0.88 for PROX MIXED and lme4 respectively. The random effects are quire close.

colMeans(MCMCglmm1$Sol)

        (Intercept)               Month Batch.Month.Batch.1 Batch.Month.Batch.2 

      102.557650677        -0.499353562        -0.000757751        -0.013238892 
Batch.Month.Batch.3     Month:Batch.0.1     Month:Batch.1.1     Month:Batch.3.1 
        0.011875820         0.218504027        -2.577571780        -2.317749883 
    Month:Batch.6.1     Month:Batch.9.1    Month:Batch.12.1     Month:Batch.0.2 
        1.873068384        -1.245499371         0.768104528        -0.008015210 
    Month:Batch.1.2     Month:Batch.3.2     Month:Batch.6.2     Month:Batch.9.2 
       -2.278469327         2.929929878         1.705485501        -1.995548530 
   Month:Batch.12.2     Month:Batch.0.3     Month:Batch.1.3     Month:Batch.3.3 
       -3.183017853         1.960380450        -0.828147383         0.483852028 
    Month:Batch.6.3     Month:Batch.9.3    Month:Batch.12.3 
        1.158358351         2.551876100         0.737419049 

Difference in fixed effects between all functions

All functions now make the fixed month effect at approximately -0.5. All have it significant at the conventional 5% level. There are still quite some differences. PROC MIXED and lme4 simulations have p-values close to 5%. Lme4 anova and MCMCglmm have the p-value in the 0.005 region. nlme has 0.0002. This is worrying; the moment I am performing such calculations with real data I need to be more certain on my p-values than implied here.

Sunday, September 8, 2013

Mixed models; Random Coefficients, part 1

Continuing with my exploration of mixed models I am now at the first part of random coefficients: example 59.5 for proc mixed (page 5034 of the SAS/STAT 12.3 Manual). This means I skipped examples 59.3 (plotting the likelihood) and 59.4 (known G and R). The latter I might want to do later, though I find this to be quite a strong prior.

Data 

To quote the SAS/STAT manual 'The observed responses are replicate assay results, expressed in percent of label claim, at various shelf ages, expressed in months. The desired mixed model involves three batches of product that differ randomly in intercept (initial potency) and slope (degradation rate).'
rc1 <- read.table(textConnection('
1 0 101.2 103.3 103.3 102.1 104.4 102.4
1 1 98.8 99.4 99.7 99.5 . .
1 3 98.4 99.0 97.3 99.8 . .
1 6 101.5 100.2 101.7 102.7 . .
1 9 96.3 97.2 97.2 96.3 . .
1 12 97.3 97.9 96.8 97.7 97.7 96.7
2 0 102.6 102.7 102.4 102.1 102.9 102.6
2 1 99.1 99.0 99.9 100.6 . .
2 3 105.7 103.3 103.4 104.0 . .
2 6 101.3 101.5 100.9 101.4 . .
2 9 94.1 96.5 97.2 95.6 . .
2 12 93.1 92.8 95.4 92.2 92.2 93.0
3 0 105.1 103.9 106.1 104.1 103.7 104.6
3 1 102.2 102.0 100.8 99.8 . .
3 3 101.2 101.8 100.8 102.6 . .
3 6 101.1 102.0 100.1 100.2 . .
3 9 100.9 99.5 102.2 100.8 . .
3 12 97.8 98.3 96.9 98.4 96.9 96.5
'),
    na.strings='.',
    col.names=c('Batch','Month',paste('Y',1:6,sep='')))
rc2 <- reshape(rc1,
    direction='long',
    varying=list(Y=paste('Y',1:6,sep='')),
    v.names='Y',
    timevar='i',
    times=1:6)
rc2$Batch <- factor(rc2$Batch)
rc2 <- rc2[complete.cases(rc2),]

Analysis

In this post the first model will be attempted, for nlme, lme4 and MCMCglmm. MCMCglmm was clearly the most difficult to formulate, especially with regards to the prior. It appeared the models have different results for the significance of the fixed month effect. Model quote 'The two random effects are Int and Month, modeling random intercepts and slopes, respectively. Note that Intercept and Month are used as both fixed and random effects.'

lme4

The basic call is not difficult at all. The summary also shows residual variance and variances for intercept and month (within batch).
library(lme4)
lmer1 <- lmer(Y ~ Month + (Month|Batch),data=rc2)
summary(lmer1)
Linear mixed model fit by REML 
Formula: Y ~ Month + (Month | Batch) 
   Data: rc2 
   AIC   BIC logLik deviance REMLdev
 362.3 376.9 -175.2    348.4   350.3
Random effects:
 Groups   Name        Variance Std.Dev. Corr   
 Batch    (Intercept) 0.976805 0.98833         
          Month       0.037166 0.19278  -0.548 
 Residual             3.293234 1.81473         
Number of obs: 84, groups: Batch, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept) 102.7038     0.6456  159.08
Month        -0.5259     0.1194   -4.41

Correlation of Fixed Effects:
      (Intr)
Month -0.580
There is not as much default output as PROC MIXED. Extra output for variances is needed to retrieve the Month-Intercept covariance. 
VarCorr(lmer1)
$Batch
            (Intercept)       Month
(Intercept)   0.9768046 -0.10448120
Month        -0.1044812  0.03716553
attr(,"stddev")
(Intercept)       Month 
  0.9883343   0.1927836 
attr(,"correlation")
            (Intercept)      Month
(Intercept)   1.0000000 -0.5483579
Month        -0.5483579  1.0000000

attr(,"sc")
[1] 1.814727
Significance of Month effect can be calculated via anova and simulation. As I cannot see the point in estimating significance of intercept effect when a main effect is present this is skipped. Anova makes it clearly non-significant, while simulation is close to the p-value PROC MIXED provides, but there are 30 simulations with convergence problems.
lmer1b <- lmer(Y ~ 1 + (Month|Batch),data=rc2)
anova(lmer1,lmer1b)
Data: rc2
Models:
lmer1b: Y ~ 1 + (Month | Batch)
lmer1: Y ~ Month + (Month | Batch)
       Df    AIC    BIC  logLik Chisq Chi Df Pr(>Chisq)   
lmer1b  5 365.36 377.51 -177.68                           
lmer1   6 360.44 375.03 -174.22 6.914      1   0.008552 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pboot <- function(m0,m1) {
  s <- simulate(m0)
  L0 <- logLik(refit(m0,s))
  L1 <- logLik(refit(m1,s))
  2*(L1-L0)
}
obsdev <- 2*( logLik(lmer1)-logLik(lmer1b))
#
set.seed(1001)
reps <- replicate(1000,pboot(lmer1b,lmer1))
mean(reps>obsdev)
[1] 0.055
Random effects are same:
ranef(lmer1)
$Batch
  (Intercept)       Month
1  -1.0010539  0.12865763
2   0.3934426 -0.20598652
3   0.6076112  0.07732889

nlme

library(nlme)
lme1 <- lme(Y ~ Month,
    random= ~Month|Batch,
    data=rc2)
summary(lme1)
Linear mixed-effects model fit by REML
 Data: rc2 
       AIC      BIC    logLik
  362.3281 376.7685 -175.1641

Random effects:
 Formula: ~Month | Batch
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.9883331 (Intr)
Month       0.1927833 -0.548
Residual    1.8147270       

Fixed effects: Y ~ Month 
                Value Std.Error DF   t-value p-value
(Intercept) 102.70380 0.6456110 80 159.08001       0
Month        -0.52594 0.1193732 80  -4.40589       0
 Correlation: 
      (Intr)
Month -0.58 

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.85442335 -0.68272916 -0.09994707  0.62635493  2.64425526 

Number of Observations: 84
Number of Groups: 3 
Random effects are slightly different displayed but the same. 
#variances
c(.9883331,.1927844,1.814727)^2
[1] 0.97680232 0.03716582 3.29323408
#covariance
.9883331*.1927844*-.548
[1] -0.1044133
Fixed effects are same as in PROC MIXED, but degrees of freedom are off, making the month effect significant. Random effects are exactly the same
ranef(lme1)
  (Intercept)       Month
1  -1.0010483  0.12868117
2   0.3934057 -0.20599206
3   0.6076426  0.07731089

MCMCglmm

MCMCglmm has me baffled for this one:. After reviewing the manual the obvious model was not so difficult to formulate. However,a prior is needed. Following MCMCglmm course notes example, page 78 I copied a prior. The standard errors are larger than PROC MIXED. In addition, month is not a significant fixed effect.
prior1 <- list(R=list(V=1e-7,nu=-2),
    G=list(G1=list(V=diag(2),nu=2) )
)

MCMCglmm1 <- MCMCglmm(Y ~ Month, 
    random= ~ us(Month+1):Batch,
    pr=TRUE,
    data=rc2,family='gaussian',
    nitt=500000,thin=20,burnin=50000,
    prior=prior1,
    verbose=FALSE)
summary(MCMCglmm1)
 Iterations = 50001:499981
 Thinning interval  = 20
 Sample size  = 22500 

 DIC: 346.489 

 G-structure:  ~us(Month + 1):Batch

                              post.mean l-95% CI u-95% CI eff.samp
(Intercept):(Intercept).Batch    4.3457   0.1418   11.519    22500
Month:(Intercept).Batch         -0.3678  -4.8534    3.988    22500
(Intercept):Month.Batch         -0.3678  -4.8534    3.988    22500
Month:Month.Batch                2.0747   0.1247    5.766    22500

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units     3.474    2.456    4.658    22500

 Location effects: Y ~ Month 

            post.mean l-95% CI u-95% CI eff.samp  pMCMC    
(Intercept)  102.7132 100.5880 105.0134    22257 <4e-05 ***
Month         -0.5211  -2.0054   1.0162    22500  0.356    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
colMeans(MCMCglmm1$Sol)
              (Intercept)                     Month Batch.(Intercept).Batch.1 
             102.71316233               -0.52109218               -1.00404017 
Batch.(Intercept).Batch.2 Batch.(Intercept).Batch.3       Batch.Month.Batch.1 
               0.38305996                0.60774619                0.12522274 
      Batch.Month.Batch.2       Batch.Month.Batch.3 
              -0.22493139                0.08335665 
apply(MCMCglmm1$Sol,2,sd)
              (Intercept)                     Month Batch.(Intercept).Batch.1 
                1.1851105                 0.8264831                 1.2192814 
Batch.(Intercept).Batch.2 Batch.(Intercept).Batch.3       Batch.Month.Batch.1 
                1.2049515                 1.2168346                 0.8271416 
      Batch.Month.Batch.2       Batch.Month.Batch.3 
                0.8274522                 0.8288230 
To return to this all important prior, where experimenting showed a different choice clearly had influence, we can plot it. This shows that the variance is suggested to be close to 1 and lower than 10. 
nu.ast <- prior1$G$G1$nu-dim(prior1$G$G1$V)[1]+1
V.ast <- prior1$G$G1$V[1,1]*(prior1$G$G1$nu/nu.ast)
xv=seq(1e-16,10,length.out=100)
dv=MCMCpack::dinvgamma(xv,shape=nu.ast/2,
    scale=(nu.ast*V.ast)/2)
plot(dv~xv,type='l')


Sunday, September 1, 2013

Mixed models exercise 2. Repeated measurements

Continuing my exploration of mixed models, I now understand what is happening in the second SAS(R)/STAT example for proc mixed (page 5007 of the SAS/STAT 12.3 Manual). It is all about correlation between the time-points within subjects. The data as such is simple, size measurements of children at ages 8, 10, 12 and 14. The subject of analysis is growth. There is correlation between the data at different ages. Then there are the usual subjects; Random subjects in intercept, fixed effects for gender and age effect and the interaction age*gender.
To summarize the results. With lme I get the fixed structure of the first two models, but the R structure has too high values. With MCMCglm I only was able to create the first model. Here the R structure was quite different, but equivalent to yet another lme model. For interpretation purposes, the fixed model parts are the same. The difference between the two lme models escapes me for now.  

Data

Data from SAS/STAT guide. To have the base levels as the SAS analysis there is a relevel for Gender, other than that it is straightforward reading data and a reshape().
r1 <- read.table(textConnection('
1 F 21.0 20.0 21.5 23.0
2 F 21.0 21.5 24.0 25.5
3 F 20.5 24.0 24.5 26.0
4 F 23.5 24.5 25.0 26.5
5 F 21.5 23.0 22.5 23.5
6 F 20.0 21.0 21.0 22.5
7 F 21.5 22.5 23.0 25.0
8 F 23.0 23.0 23.5 24.0
9 F 20.0 21.0 22.0 21.5
10 F 16.5 19.0 19.0 19.5
11 F 24.5 25.0 28.0 28.0
12 M 26.0 25.0 29.0 31.0
13 M 21.5 22.5 23.0 26.5
14 M 23.0 22.5 24.0 27.5
15 M 25.5 27.5 26.5 27.0
16 M 20.0 23.5 22.5 26.0
17 M 24.5 25.5 27.0 28.5
18 M 22.0 22.0 24.5 26.5
19 M 24.0 21.5 24.5 25.5
20 M 23.0 20.5 31.0 26.0
21 M 27.5 28.0 31.0 31.5
22 M 23.0 23.0 23.5 25.0
23 M 21.5 23.5 24.0 28.0
24 M 17.0 24.5 26.0 29.5
25 M 22.5 25.5 25.5 26.0
26 M 23.0 24.5 26.0 30.0
27 M 22.0 21.5 23.5 25.0
'),col.names=c('Person','Gender','Age8','Age10','Age12','Age14'),
colClasses=c('factor','factor',rep('numeric',4)))
rm <- reshape(r1,direction='long',
    varying=list(c('Age8','Age10','Age12','Age14')),
    timevar='Age',idvar=c('Person','Gender'),
    v.names='y',
    times=c(8,10,12,14))
rm$Gender <- relevel(rm$Gender,ref='M')
rm$fage=factor(rm$Age)

nlme

Ignoring correlation, the analysis is simple. However, this is not the analysis which is presented in the STAT Users Guide.
lNull <- lme(y ~  Age * Gender,data=rm, random= ~ 1 | Person,
    method='REML')
summary(lNull)
Linear mixed-effects model fit by REML
 Data: rm 
       AIC      BIC    logLik
  445.7572 461.6236 -216.8786

Random effects:
 Formula: ~1 | Person
        (Intercept) Residual
StdDev:    1.816214 1.386382

Fixed effects: y ~ Age * Gender 
                Value Std.Error DF   t-value p-value
(Intercept) 16.340625 0.9813122 79 16.651810  0.0000
Age          0.784375 0.0775011 79 10.120823  0.0000
GenderF      1.032102 1.5374208 25  0.671321  0.5082
Age:GenderF -0.304830 0.1214209 79 -2.510520  0.0141
 Correlation: 
            (Intr) Age    GendrF
Age         -0.869              
GenderF     -0.638  0.555       
Age:GenderF  0.555 -0.638 -0.869

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.59804400 -0.45461690  0.01578365  0.50244658  3.68620792 

Number of Observations: 108
Number of Groups: 27 

First Model

The first model in the guide should be general symmetric in R structure. While I first modeled this in the correlation term (see below), I ended up building this in the random term. The current model has fixed effects exactly like PROC MIXED, associated test very close, but the R matrix is twice as large.
lSymm <- lme(y ~  Age * Gender,
    data=rm, random= list(Person =pdSymm(~ fage-1)),method='ML')
summary(lSymm)
Linear mixed-effects model fit by maximum likelihood
 Data: rm 
      AIC     BIC    logLik
  449.477 489.709 -209.7385

Random effects:
 Formula: ~fage - 1 | Person
 Structure: General positive-definite
         StdDev    Corr               
fage8    2.1650807 fage8 fage10 fage12
fage10   1.8698474 0.603              
fage12   2.3554563 0.708 0.617        
fage14   2.0460614 0.569 0.800  0.793 
Residual 0.6569738                    

Fixed effects: y ~ Age * Gender 
                Value Std.Error DF   t-value p-value
(Intercept) 15.842297 0.9534264 79 16.616171  0.0000
Age          0.826803 0.0806212 79 10.255400  0.0000
GenderF      1.583072 1.4937321 25  1.059810  0.2994
Age:GenderF -0.350438 0.1263091 79 -2.774446  0.0069
 Correlation: 
            (Intr) Age    GendrF
Age         -0.875              
GenderF     -0.638  0.559       
Age:GenderF  0.559 -0.638 -0.875

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.34109378 -0.30114043  0.03182357  0.26360983  1.69966216 

Number of Observations: 108
Number of Groups: 27 
as.matrix(lSymm$modelStruct$reStruct)
$Person
           fage8   fage10    fage12   fage14
fage8  10.860557 5.655273  8.365118 5.843739
fage10  5.655273 8.100582  6.296156 7.095082
fage12  8.365118 6.296156 12.854465 8.858492
fage14  5.843739 7.095082  8.858492 9.699319

Second model

The second model is compound symmetry. Again, the statements are easy, but the R values twice as large. Fixed values match the PROC MIXED example output.
lCSymm <- lme(y ~  Age * Gender,
    data=rm, random= list(Person =pdCompSymm(~ fage-1)),method='ML')
summary(lCSymm)
Linear mixed-effects model fit by maximum likelihood
 Data: rm 
       AIC     BIC    logLik
  442.6391 461.414 -214.3195

Random effects:
 Formula: ~fage - 1 | Person
 Structure: Compound Symmetry
         StdDev    Corr             
fage8    2.1024306                  
fage10   2.1024306 0.686            
fage12   2.1024306 0.686 0.686      
fage14   2.1024306 0.686 0.686 0.686
Residual 0.6963793                  

Fixed effects: y ~ Age * Gender 
                Value Std.Error DF   t-value p-value
(Intercept) 16.340625 0.9814310 79 16.649795  0.0000
Age          0.784375 0.0779963 79 10.056564  0.0000
GenderF      1.032102 1.5376069 25  0.671239  0.5082
Age:GenderF -0.304830 0.1221968 79 -2.494580  0.0147
 Correlation: 
            (Intr) Age    GendrF
Age         -0.874              
GenderF     -0.638  0.558       
Age:GenderF  0.558 -0.638 -0.874

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-1.854861527 -0.235701022  0.007918635  0.265357548  1.898850367 

Number of Observations: 108
Number of Groups: 27 
as.matrix(lCSymm$modelStruct$reStruct)
$Person
          fage8   fage10   fage12   fage14
fage8  9.114895 6.249301 6.249301 6.249301
fage10 6.249301 9.114895 6.249301 6.249301
fage12 6.249301 6.249301 9.114895 6.249301
fage14 6.249301 6.249301 6.249301 9.114895

MCMCglmm

I ran into the limits of MCMCglmm pretty quickly. However, the first model was possible after defining a prior. (As a note, a slightly more verbose output while setting the prior would have made life a bit more easy). The R structure is quite far from PROC MIXED. 
library(MCMCglmm)
rm$fage=factor(rm$Age)
prior1 <- list(R=list(V=diag(4),nu=.01),
    G=list(G1=list(V=diag(1),nu=.01) 

m1 <- MCMCglmm(y ~ Age* Gender , 
    random= ~ Person ,
    rcov=~ us(fage) :Person,
    data=rm,family='gaussian',
    nitt=500000,thin=20,burnin=50000,
    verbose=FALSE
 ,   prior=prior1
)
summary(m1)
 Iterations = 50001:499981
 Thinning interval  = 20
 Sample size  = 22500 

 DIC: 123.8128 

 G-structure:  ~Person

       post.mean l-95% CI u-95% CI eff.samp
Person     4.209    2.136    6.963     5615

 R-structure:  ~us(fage):Person

             post.mean l-95% CI u-95% CI eff.samp
8:8.Person      2.6261  0.45693   5.5948    562.4
10:8.Person    -0.5219 -1.89671   1.0416    659.6
12:8.Person     0.3808 -1.55471   2.6806    517.4
14:8.Person    -0.7821 -2.32815   0.8618    862.4
8:10.Person    -0.5219 -1.89671   1.0416    659.6
10:10.Person    1.4867  0.06493   3.4580    493.6
12:10.Person   -0.4788 -1.89693   1.1443    756.1
14:10.Person    0.1536 -1.06686   1.7433    515.8
8:12.Person     0.3808 -1.55471   2.6806    517.4
10:12.Person   -0.4788 -1.89693   1.1443    756.1
12:12.Person    3.0773  0.55387   6.2826    546.7
14:12.Person    0.6970 -1.03654   2.9108    505.7
8:14.Person    -0.7821 -2.32815   0.8618    862.4
10:14.Person    0.1536 -1.06686   1.7433    515.8
12:14.Person    0.6970 -1.03654   2.9108    505.7
14:14.Person    1.9824  0.21537   4.3931    529.9

 Location effects: y ~ Age * Gender 

            post.mean l-95% CI u-95% CI eff.samp  pMCMC    
(Intercept)  15.85825 13.71176 18.11169    18083 <4e-05 ***
Age           0.82697  0.64584  1.00791    22500 <4e-05 ***
GenderF       1.55014 -1.84468  4.93357    22500 0.3507    
Age:GenderF  -0.35005 -0.62738 -0.06956    22500 0.0163 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lme comparison

It is interesting to notice the R correlation is close to slightly different formulation of lme, where not the random structure, but rather the correlation is set. The fixed effects seem the same between model formulations.
lSymm <- lme(y ~  Age * Gender,corr=corSymm(form= ~ I(Age/2-3)),
    data=rm, random= ~ 1 | Person,method='ML')
summary(lSymm)

Linear mixed-effects model fit by maximum likelihood
 Data: rm 
       AIC      BIC    logLik
  445.2348 477.4204 -210.6174

Random effects:
 Formula: ~1 | Person
        (Intercept) Residual
StdDev:    1.753195 1.356352

Correlation Structure: General
 Formula: ~I(Age/2 - 3) | Person 
 Parameter estimate(s):
 Correlation: 
  1      2      3     
2 -0.203              
3  0.006 -0.192       
4 -0.313  0.322  0.214
Fixed effects: y ~ Age * Gender 
                Value Std.Error DF   t-value p-value
(Intercept) 15.920019 0.9766449 79 16.300724  0.0000
Age          0.824758 0.0810365 79 10.177603  0.0000
GenderF      1.488148 1.5301085 25  0.972577  0.3401
Age:GenderF -0.348628 0.1269599 79 -2.745971  0.0075
 Correlation: 
            (Intr) Age    GendrF
Age         -0.874              
GenderF     -0.638  0.558       
Age:GenderF  0.558 -0.638 -0.874

Standardized Within-Group Residuals:
          Min            Q1           Med            Q3           Max 
-3.2236086696 -0.5564845586  0.0009453569  0.5124328827  3.7675873271 

Number of Observations: 108
Number of Groups: 27 
as.matrix(lSymm$modelStruct$corStruct)[[2]].
             [,1]       [,2]         [,3]       [,4]
[1,]  1.000000000 -0.2027774  0.005515599 -0.3128891
[2,] -0.202777439  1.0000000 -0.192400735  0.3222584
[3,]  0.005515599 -0.1924007  1.000000000  0.2142248
[4,] -0.312889054  0.3222584  0.214224798  1.0000000
cov2cor(matrix(colMeans(m1$VCV[,-1]),nrow=4))
           [,1]        [,2]       [,3]        [,4]
[1,]  1.0000000 -0.26413228  0.1339446 -0.34279042
[2,] -0.2641323  1.00000000 -0.2238354  0.08945827
[3,]  0.1339446 -0.22383536  1.0000000  0.28219796
[4,] -0.3427904  0.08945827  0.2821980  1.00000000