Sign up today for an exclusive discount along with our 30-day GUARANTEE — Love us or leave, with your money back! Click here to become a part of our growing community and learn how to stop gambling with your investments. We will teach you to BE THE HOUSE — Not the Gambler!

Click here to see some testimonials from our members!

You’re the worst thing that’s ever happened to me.

Let’s talk about relationships, you know, the statistical ones. 


Hoping that our readers might enjoy a sneak preview of what we were working on, we decided to post our preliminary results on the Chrysler dealer data we crunched all weekend.

It will be a cold day in hell before we do that again, you can be sure.

The reaction was complicated by my typing "Why would there be a significant and highly positive correlation between dealer survival and Clinton donors?"

I didn’t mean "statistically significant," but that was all it took. A number of blogs fixated on this sentence and ignored the next two: "Granted, that P-Value (0.125) isn’t enough to reject the null hypothesis at 95% confidence intervals (our null hypothesis being that the effect is due to random chance), but a 12.5% chance of a Type I error in rejecting a null hypothesis (false rejection of a true hypothesis) is at least eyebrow raising. Most statisticians would not call this a 'find' as 95% confidence intervals are the gold standard for this sort of work." You can imagine the result.

As a measure of how partisan this issue is, both sides of the aisle had it out over sentence structure all day. We were severely chastised (in my case, rightly so), and matters were further complicated by my posting, without explanation, the results of six regressions absent any narrative about our experimental design, our order of testing and the like. This subjected us to "data fishing" criticisms (probably warranted to some small degree, though some readers went so far as to accuse us of outright fraud). What began as an attempt to give our readers some transparency devolved into a mess of sound bites. The fault is entirely mine for expecting civil peer review, or anything like it, in such a forum. It is a mistake I shall not repeat with early findings.

The slew of email I received ranged from "thanks" to "you are the spawn of the devil." The latter is obviously closest to the truth, so we are sending that reader the Marla Singer Stolen Jeans prize. His name was Robert Paulson.

We planned to release the data publicly today, but we are having second thoughts about that particular plan. Nothing makes a researcher feel less like sharing than being told what a moron they are after 48 hours of straight dataset preparation.

We have come up with a few ideas instead.

1. We are considering creating a members-only section for research and data like this. This is where we would release preliminary findings and datasets into which we have put a great deal of (uncompensated) (wo)man-hours. We may or may not then release final results to the public, or we may do so only after a substantial delay.

2. We may send datasets like this one to qualified and interested parties who request them by email. (This is where we are leaning for this particular set at the moment.)

This means that everyone who isn’t a certified space monkey is going to have to wait for the integration of the GM dealer closing data.

As our dataset is a derivative of the Center for Responsive Politics (CRP) data, we will be distributing it (to the extent that we do) under the Creative Commons Attribution-Noncommercial-Share Alike license (though we may ask that you respect an embargo period before we release it publicly).

I’ll decide at the end of this note.

About the Data – Dataset creation:

Individual donor records from the Center for Responsive Politics for the 2008 election cycle (~800 MB, n=3,542,585) were converted to a .csv file. (Please consider donating to the Center here: https://secure.groundspring.org/dn/index.php?aid=17127 ) Chrysler dealer records were compiled from dealer closing and dealer survival records (n=3,129 after the elimination of mangled or unusable entries). These were obtained from bankruptcy documents and converted to Excel. Both datasets were imported into an SQL database, after abortive attempts to run the matching with less effective but more amusing methods. At this size Excel and Access were useless, so we had to rely on custom SQL queries for the initial extraction.
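For readers who want to reproduce the import step, here is a minimal Python sketch that streams a CSV export into SQLite using only the standard library. It is an illustration, not the original SQL workflow: the table name `crp_indiv` and the column names `FullName`, `Zip` and `RecipID` are hypothetical placeholders, and every value is stored as text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file, columns):
    """Stream rows from an open CSV file into an SQLite table; return its row count.

    `csv_file` must have a header row containing every name in `columns`.
    All values are stored as TEXT. Rows are inserted from a generator, so the
    whole file is never held in memory at once.
    """
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = ([row[c] for c in columns] for row in csv.DictReader(csv_file))
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Tiny in-memory sample with hypothetical column names. For a real export,
# pass a file opened with open(path, newline="", encoding=...) instead.
sample = io.StringIO("FullName,Zip,RecipID\nSMITH JOHN,48201,N00009638\n")
conn = sqlite3.connect(":memory:")
print(load_csv_into_sqlite(conn, "crp_indiv", sample, ["FullName", "Zip", "RecipID"]))  # prints 1
```

Streaming through `executemany` keeps memory use flat, which is the property that matters for a file of several hundred megabytes.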

The dealer dataset was matched against the CRP dataset by comparing the full last name and the first two characters of the first name in the dealer "majority owner" field with the donor name field. (Single-initial first names were also matched.) Records that matched were merged into a new file. The matching produced a new data subset (n=~10,000). This subset was then filtered on the first two characters of the ZIP code. The resulting initial match dataset yielded n=~6,200 records, including all political donations made by potential name matches and preserving multiple donations from one majority owner name. Where a single majority owner owned several dealerships, a record was created for each dealership/donation pair. In this way we could link dealership fates with their owners’ political acts.
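The matching rule above can be sketched in Python as follows. This is an assumed reconstruction, not the original SQL: it presumes both name fields have already been split into last and first names, and the field names (`dealer_id`, `owner_last`, `owner_first`, `donor_last`, `donor_first`, `zip`) are hypothetical. Unlike the two-stage process described (name match, then ZIP filter), it applies both tests in one pass, and it emits one pair per dealership/donation combination.

```python
def _norm(name):
    """Upper-case, drop periods and surrounding whitespace."""
    return name.replace(".", "").strip().upper()


def first_name_compatible(owner_first, donor_first):
    """Match on the first two characters of the first name.

    If either name is a single initial, only that initial is compared.
    """
    a, b = _norm(owner_first), _norm(donor_first)
    if not a or not b:
        return False
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a[:2] == b[:2]


def match_owners_to_donors(dealers, donors):
    """Return one (dealer, donor) pair per candidate dealership/donation match.

    dealers: dicts with 'dealer_id', 'owner_last', 'owner_first', 'zip'
    donors:  dicts with 'donor_last', 'donor_first', 'zip' (+ any other fields)
    """
    # Index donation records by normalized last name so each dealer is
    # compared only against donors who share the owner's surname.
    by_last = {}
    for donor in donors:
        by_last.setdefault(_norm(donor["donor_last"]), []).append(donor)

    pairs = []
    for dealer in dealers:
        for donor in by_last.get(_norm(dealer["owner_last"]), []):
            if not first_name_compatible(dealer["owner_first"], donor["donor_first"]):
                continue
            # ZIP filter: the first two characters must agree.
            if dealer["zip"].strip()[:2] != donor["zip"].strip()[:2]:
                continue
            pairs.append((dealer, donor))
    return pairs
```

Because the outer loop runs over dealerships, an owner with several dealerships produces one pair per dealership for each matching donation, mirroring the dealership/donation records described above.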

As the CRP data include self-identified profession and employer fields, we edited the automated match list by hand, first looking for a match between the CRP employer or profession and the name of the dealership in the dealership dataset. We also cross-checked full-name entries: in cases where small details like the middle initial disagreed but the two datasets agreed on a dealership as employer, we preserved the record. If no clear match was found with employer/profession, we then looked for other connections, for instance CRP donor codes shared with other records where the donor listed a known dealer as employer/profession. Some professions (self-employed, business owner, etc.) automatically prompted us to look for other evidence, which often led us to find donors we might otherwise have passed by.
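The hand edit cannot be automated faithfully, but a first-cut triage in the same spirit might look like the hypothetical Python helper below. It labels a candidate pair `keep` when the employer or profession shares a distinctive word with the dealership name, `review` when the profession is one of the catch-all categories mentioned above, and `unclear` otherwise. The `GENERIC` word list and the exact profession strings are assumptions, and a real pass would still need a human to confirm each `keep`.

```python
import re

# Words too generic to count as evidence that an employer string names a
# particular dealership. This list is an illustrative assumption.
GENERIC = frozenset({
    "INC", "LLC", "CO", "CORP", "THE", "OF", "AND",
    "CHRYSLER", "DODGE", "JEEP", "MOTORS", "MOTOR", "AUTO", "SALES",
})

# Catch-all professions that prompt a search for other evidence (assumed spellings).
LOOK_FURTHER = frozenset({"SELF-EMPLOYED", "SELF EMPLOYED", "BUSINESS OWNER"})


def _tokens(text):
    """Upper-case alphanumeric words in `text`, minus generic words."""
    return set(re.findall(r"[A-Z0-9]+", text.upper())) - GENERIC


def triage_match(dealer_name, employer, profession):
    """First-cut label for one candidate pair: 'keep', 'review' or 'unclear'."""
    if _tokens(dealer_name) & (_tokens(employer) | _tokens(profession)):
        return "keep"      # employer/profession shares a distinctive word with the dealer name
    if employer.strip().upper() in LOOK_FURTHER or profession.strip().upper() in LOOK_FURTHER:
        return "review"    # look for other evidence, e.g. shared CRP donor codes
    return "unclear"
```

Dropping brand words such as CHRYSLER keeps an employer string like "Chrysler LLC" from counting as a match for every dealership.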

We had finished the first pass of this edit when we ran our first regressions and published the preliminary results.

We’ve since made a second pass and caught some more bad matches.

The newest hand-edited dataset (n=5,117) includes a single entry for each dealership with no matched political donations and at least one entry for each dealership whose majority owners donated in a fashion tracked by the CRP individual donor data.

Each individual dealership was assigned a numeric ID code. After filtering out duplicate entries and removing mangled or unreadable records, the dataset contained 2,923 individual dealer codes. This is somewhat fewer than the ~3,200 dealers Chrysler reported as current.

Using a pivot table keyed on the individual dealer ID code, we tabulated several variables from the CRP/dealer data, including:

Source/Field: Description

Dealer/Majority Owner (full name)
CRP/FullName
CRP/Profession
CRP/Employer
Dealer/CompanyName
Dealer/CompanyAddress
Dealer/Zip Code
Dealer/DealerTerminationStatus: (1=terminated)
Created/ID: Dealer Unique ID
CRP/DonorCode: Individual Donor Code
CRP/Recipient: CRP Unique Recipient ID
CRP/DateDonation
CRP/AmountDonation
CRP/Party – Codes:

First Character:
D=Democratic
R=Republican
3=3rd Party
U=Unknown
P= PAC

Second Character:

Party:
W=Winner
L=Loser
I=Incumbent
C=Challenger
O=Open Seat
N=Non-incumbent

PAC:
B=Business
L=Labor
I=Ideological
O=Other
U=Unknown

We further split these party codes into boolean (1/0) indicator variables, one for each party/PAC code.
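A minimal sketch of that expansion, assuming each record carries the two-character code as a plain string (e.g. `DW`, `PB`). The output column names (`party_D`, `status_W`, `pac_B` and so on) are hypothetical. Because letters such as L, I and O mean different things for candidates and PACs, the second character is read as a PAC type only when the first character is `P`.

```python
# Code letters as listed in the text above.
PARTY = ("D", "R", "3", "U", "P")                  # first character
CANDIDATE_STATUS = ("W", "L", "I", "C", "O", "N")  # second character, candidates
PAC_TYPE = ("B", "L", "I", "O", "U")               # second character, PACs


def party_indicators(code):
    """Expand a two-character CRP party code into 0/1 indicator columns."""
    code = code.strip().upper()
    first, second = code[:1], code[1:2]
    is_pac = first == "P"
    out = {f"party_{p}": int(first == p) for p in PARTY}
    # Candidate status and PAC type share letters, so keep them in separate columns.
    out.update({f"status_{s}": int(not is_pac and second == s) for s in CANDIDATE_STATUS})
    out.update({f"pac_{s}": int(is_pac and second == s) for s in PAC_TYPE})
    return out
```

Every well-formed code sets exactly two indicators: one for the first character and one for the second.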

We then added calculated fields for:

Clinton (0 or 1 where Recipient ID = N00000019)
Obama (0 or 1 where Recipient ID = N00009638)
McCain (0 or 1 where Recipient ID = N00006424)

D (Any Democratic donation)
R (Any Republican donation)

None: No donation match found
Some: Any donation match found
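Per donation record, the candidate and party fields reduce to simple comparisons. This sketch uses the recipient IDs listed above; the function name and the assumption that D/R come from the first character of the CRP party code are illustrative, not a record of how the spreadsheet formulas were written. The dealer-level None/Some fields only make sense after aggregation, so they are not computed here.

```python
# CRP recipient IDs given in the text above.
CANDIDATE_IDS = {"Clinton": "N00000019", "Obama": "N00009638", "McCain": "N00006424"}


def donation_flags(recipient_id, party_code):
    """0/1 calculated fields for a single matched CRP donation record."""
    rid = recipient_id.strip().upper()
    flags = {name: int(rid == cid) for name, cid in CANDIDATE_IDS.items()}
    first = party_code.strip().upper()[:1]
    flags["D"] = int(first == "D")   # Democratic recipient
    flags["R"] = int(first == "R")   # Republican recipient
    return flags
```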

Using pivot tables in Excel sorted by dealer code, we then aggregated all donation data by dealer ID. In this way we captured the entire donation profile of a given majority owner and assigned it to each of his/her dealerships. We used this dataset in our regression analysis.
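Outside Excel, the same pivot-table aggregation can be sketched as a group-by in Python. The row layout is an assumption: one dict per dealership/donation pair, with `amount` set to `None` for dealers that had no matched donation, and `terminated` coded 1 for terminated as in DealerTerminationStatus. Each 0/1 flag is aggregated with a maximum ("any donation of that kind"), and `safe` is derived as `1 - terminated`, the response used in the regressions.

```python
FLAG_FIELDS = ("Clinton", "Obama", "McCain", "D", "R")


def aggregate_by_dealer(rows):
    """Collapse dealership/donation rows into one record per dealer ID.

    Each row is a dict with 'dealer_id', 'terminated' (1 = terminated),
    'amount' (None for a dealer with no matched donation) and any of the
    0/1 FLAG_FIELDS; a missing flag counts as 0.
    """
    dealers = {}
    for row in rows:
        agg = dealers.setdefault(row["dealer_id"], {
            "terminated": row["terminated"], "total": 0.0, "Some": 0,
            **{f: 0 for f in FLAG_FIELDS},
        })
        if row["amount"] is not None:
            agg["Some"] = 1
            agg["total"] += row["amount"]
        for f in FLAG_FIELDS:
            # "Any donation to X" is the max of the 0/1 flag across the owner's rows.
            agg[f] = max(agg[f], row.get(f, 0))
    for agg in dealers.values():
        agg["None"] = 1 - agg["Some"]
        agg["safe"] = 1 - agg["terminated"]
    return dealers
```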

Initially we were interested in testing the hypothesis that donating to Obama in the 2008 election cycle might result in higher than average survival rates among dealers.

The data didn’t come close to rejecting the null hypothesis there (p-value ~0.7).

Here’s that regression with the final version data:

Binary Logistic Regression: safe versus Obama 

Step Log-Likelihood
0 -1611.36
1 -1611.34
2 -1611.34
3 -1611.34


Link Function: Logit


Response Information

Variable  Value  Count
safe      1       2221  (Event)
          0        702
          Total   2923


Logistic Regression Table

                                                Odds    95% CI
Predictor       Coef    SE Coef      Z      P  Ratio  Lower  Upper
Constant     1.15086  0.0434894  26.46  0.000
Obama
  1         0.101900   0.464948   0.22  0.827   1.11   0.45   2.75


Log-Likelihood = -1611.335
Test that all slopes are zero: G = 0.049, DF = 1, P-Value = 0.825

* NOTE * No goodness of fit test performed.
* NOTE * The model uses all degrees of freedom.


Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs         Number  Percent  Summary Measures
Concordant     14616      0.9  Somers' D             0.00
Discordant     13200      0.8  Goodman-Kruskal Gamma 0.05
Ties         1531326     98.2  Kendall's Tau-a       0.00
Total        1559142    100.0

* NOTE * 1 time(s) the standardized Pearson residuals, delta chi-square, delta
deviance, delta beta (standardized) and delta beta could not be
computed because leverage (Hi) is equal to 1.
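For readers less familiar with Minitab output: the Z, P, odds ratio and confidence interval columns all follow from Coef and SE Coef. Z is Coef divided by SE Coef, P is the two-sided normal tail probability of Z, and the odds ratio interval is exp(Coef ± 1.96 × SE Coef). A small Python sketch; plugging in the Obama row should agree with the printed Z, P, odds ratio and interval up to rounding.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5th percentile, for a 95% interval


def wald_summary(coef, se):
    """Recompute Z, P, odds ratio and 95% CI for one logistic-regression coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P(|Z| > |z|) under N(0, 1)
    return {
        "z": z,
        "p": p,
        "odds_ratio": math.exp(coef),
        "ci_low": math.exp(coef - Z_975 * se),
        "ci_high": math.exp(coef + Z_975 * se),
    }


# Obama row from the output above: Coef 0.101900, SE Coef 0.464948.
obama = wald_summary(0.101900, 0.464948)
print({name: round(value, 3) for name, value in obama.items()})
```

An interval running from 0.45 to 2.75 comfortably straddles 1, which is the same "no effect" conclusion as the P of 0.827.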

We ran Clinton next. Here are the new results (note a drop in p-value).

Binary Logistic Regression: safe versus Clinton 

Step Log-Likelihood
0 -1611.36
1 -1610.29
2 -1610.26
3 -1610.26
4 -1610.26


Link Function: Logit


Response Information

Variable  Value  Count
safe      1       2221  (Event)
          0        702
          Total   2923


Logistic Regression Table

                                                Odds    95% CI
Predictor       Coef    SE Coef      Z      P  Ratio  Lower  Upper
Constant     1.14409  0.0435562  26.27  0.000
Clinton
  1         0.573566   0.412789   1.39  0.165   1.77   0.79   3.99


Log-Likelihood = -1610.265
Test that all slopes are zero: G = 2.190, DF = 1, P-Value = 0.139

* NOTE * No goodness of fit test performed.
* NOTE * The model uses all degrees of freedom.


Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs         Number  Percent  Summary Measures
Concordant     27105      1.7  Somers' D             0.01
Discordant     15274      1.0  Goodman-Kruskal Gamma 0.28
Ties         1516763     97.3  Kendall's Tau-a       0.00
Total        1559142    100.0

* NOTE * 1 time(s) the standardized Pearson residuals, delta chi-square, delta
deviance, delta beta (standardized) and delta beta could not be
computed because leverage (Hi) is equal to 1.

We still think there is enough here to be curious about, but clearly our model is insufficient to establish what’s going on in any statistically significant way. Readers will have to decide for themselves whether such a p-value is eyebrow-raising. (Perhaps someone will conduct a Bayesian analysis of this p-value and the Maureen White connection for us.)
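The p-values in these outputs come from two different tests, which is why the Clinton model shows P = 0.165 in the coefficient table but P-Value = 0.139 on the "Test that all slopes are zero" line. The latter is a likelihood-ratio test: G is twice the gain in log-likelihood over the intercept-only model (Step 0, -1611.36), compared against a chi-square distribution with 1 degree of freedom. A minimal Python sketch that recomputes it from the printed numbers:

```python
import math


def null_log_likelihood(events, non_events):
    """Log-likelihood of the intercept-only logistic model (Minitab's Step 0)."""
    n = events + non_events
    return events * math.log(events / n) + non_events * math.log(non_events / n)


def likelihood_ratio_test(ll_model, ll_null):
    """G = 2 * (LL_model - LL_null) and its chi-square (1 DF) upper-tail p-value."""
    g = max(0.0, 2.0 * (ll_model - ll_null))  # clamp tiny negatives from rounding
    # For 1 DF, P(chi2 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


ll0 = null_log_likelihood(2221, 702)          # 2221 safe, 702 terminated
g, p = likelihood_ratio_test(-1610.265, ll0)  # Clinton model log-likelihood
print(round(ll0, 2), round(g, 3), round(p, 3))
```

Since the printed model log-likelihood is rounded to three decimals, the recomputed G can differ from Minitab's in the last digit.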

And for the "Anti-Republican" conspiracy buffs:

Binary Logistic Regression: term versus R 

Step Log-Likelihood
0 -1611.36
1 -1611.29
2 -1611.29
3 -1611.29


Link Function: Logit


Response Information

Variable  Value  Count
term      1        702  (Event)
          0       2221
          Total   2923


Logistic Regression Table

                                                  Odds    95% CI
Predictor        Coef    SE Coef       Z      P  Ratio  Lower  Upper
Constant     -1.14513  0.0467939  -24.47  0.000
R
  1        -0.0457553   0.123407   -0.37  0.711   0.96   0.75   1.22


Log-Likelihood = -1611.291
Test that all slopes are zero: G = 0.138, DF = 1, P-Value = 0.710

* NOTE * No goodness of fit test performed.
* NOTE * The model uses all degrees of freedom.


Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs         Number  Percent  Summary Measures
Concordant    198058     12.7  Somers' D             0.01
Discordant    189200     12.1  Goodman-Kruskal Gamma 0.02
Ties         1171884     75.2  Kendall's Tau-a       0.00
Total        1559142    100.0
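The Measures of Association block is also plain arithmetic on the pair counts. Every pair consists of one event and one non-event observation, so the total is 2221 × 702 = 1,559,142. Somers' D divides concordant minus discordant by that total, Goodman-Kruskal gamma divides it by concordant plus discordant only, and Kendall's tau-a divides it by all n(n-1)/2 pairs of the 2,923 observations. A sketch, checked here against the "term versus R" output:

```python
def association_measures(concordant, discordant, ties, n_obs):
    """Somers' D, Goodman-Kruskal gamma and Kendall's tau-a from pair counts."""
    pairs = concordant + discordant + ties  # every event/non-event pair
    diff = concordant - discordant
    return {
        "somers_d": diff / pairs,
        "gamma": diff / (concordant + discordant),
        "tau_a": diff / (n_obs * (n_obs - 1) / 2),
    }


# "term versus R" counts from the output above; 2923 dealers in total.
r_model = association_measures(198058, 189200, 1171884, 2923)
print({name: round(value, 2) for name, value in r_model.items()})
```

With a single 0/1 predictor most pairs are tied, so gamma (which ignores ties) can look larger than Somers' D while saying very little about predictive power.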

I don’t plan to be so silly as to offer more commentary this time, and I’m only putting up the new regressions to quell the flames.

Alright, I’ve decided.

If you’d like the raw, merged dataset, drop me an email (marla @ zerohedge d o t com) with your stats qualifications and the like and we’ll work something out. You’ll probably get at least a week with the stuff on your own before we open the floodgates.

If you’d just like us to run some tests against what we have, let us know and we’ll see what we can do for you.

Likewise, if you think we’ve blown something, why not write us a nice note telling us how it should be done to your satisfaction? We’re happy to give it a shot, if you’re polite about the whole thing.







