Regression Discussion > How to deal with
Thanks for the question! It's a great one, though a tough one. There isn't a single answer, since the right technique depends on several factors, but it sounds to me like Poisson regression would be a great place to start.
In general, Poisson regression is a kind of regression analysis used when the outcome is a count and the assumptions of normality are not met. Common examples are count variables (which tend to have a lot of zeros because of a "basement effect") and "event history" data, where events occur infrequently, so the data contain many zero values.
While a full explanation of how to run a Poisson regression probably isn't realistic for this forum post, there are some great resources. One great option is David Garson's website (link to his Poisson page below). If you have additional questions after you get into it, don't hesitate to post them here!
Good Luck!
LINK: http://faculty.chass.ncsu.edu/garson/PA765/logit.htm#poisson
Jeremy
"The Stats Make Me Cry Guy"

I've sometimes wondered whether Poisson regression would be the best tool for predictive modeling in fundraising. Typically half or more of the cases have lifetime giving of zero dollars (the floor), then there are a large number of cases with small giving amounts, and finally a few very significant donors. So the distribution is severely non-normal. When using multiple linear regression with 'lifetime giving' as the outcome variable, I log-transform the variable - actually, I take the log of LT Giving + 1, because I don't want to lose any cases. The variable and the model are much improved, but that still doesn't really help with the huge spike of zero (now near-zero) values. If Poisson would be an improvement, I would use it, but at the moment I don't know how, so I'll be following the link you've provided. Thanks!
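For readers following along, the log(giving + 1) transform described above looks like this in Python; the dollar amounts are made up for illustration.

```python
import numpy as np

# A typical heavily skewed giving distribution: many zeros, many small
# gifts, a few very large donors.
giving = np.array([0, 0, 0, 0, 25, 50, 100, 500, 250000], dtype=float)

# log10 of (giving + 1): adding 1 keeps the zero-dollar cases in the
# data set, since log(0) is undefined; a zero maps exactly to 0.
log_giving = np.log10(giving + 1)
```

The transform compresses the long right tail (the $250,000 donor ends up around 5.4 on the log10 scale), but, as the comment notes, the spike of zeros is unchanged: every zero-dollar case still sits at exactly 0.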

I'm glad you found it useful, Keven! Please let us know if you found it applicable to what you are doing!

I am trying to develop a predictive model for water pipe breakage. I use 10 predictors (inputs): diameter, length, material, age, roughness, thickness, and the hydraulic loss of the pipes, in addition to the type of soil around the pipes, the location of the pipes, and the depth of the trench. These are the influential factors on breakage rate, and breakage rate is the modeling target (output). Normally the predictive model depends on the historical breakage records. My problem is that 40% of the previous break data are zeros, and this condition makes the results poor. The remaining values range from 0.0005 to 0.06 breaks/year/unit pipe length.