I started off today by taking a brief look at the temperature data that Matt retrieved. When I asked Matt about his initial findings, he told me that there appeared to be some correlation between power and temperature. However, he is unsure whether the data is faulty or whether there is a deeper relationship between the two. I plan to look more closely at the temperature data later on, once I have more statistical tools at my disposal. I then decided to look into the Poisson Distribution and Poisson Regression that Matt mentioned on his blog.
The Poisson Distribution is a model that gives the probability that a certain number of events takes place in a fixed interval, given the average number of times the event takes place per interval. There are a few restrictions that determine whether or not a Poisson Distribution can be used. The events must be independent of one another, the rate at which events occur must be constant, and the probability of an event occurring in a small interval must be proportional to the length of that interval. Thinking about how this can be applied to our energy data, it seems to me that one way to use Poisson Distributions is to separate dates into groups that use similar amounts of energy (seasonal, monthly, bimonthly, etc.) and fit a separate Poisson Distribution to each group. Within each group, we can then treat the rate at which events occur as approximately constant.
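To make the definition concrete, here is a minimal sketch of the Poisson probability P(X = k) = λ^k e^(-λ) / k!, where λ is the average number of events per interval and k is the number of events observed. The function name `poisson_pmf` is only an illustration and does not come from Matt's code or from any particular library.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k events when the average is lam per interval."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0
    # Work in log space so that large k does not overflow the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Example: with an average of 2 events per interval,
# how likely is it to see exactly 3 events?
print(poisson_pmf(3, 2.0))
```

Summing `poisson_pmf(k, lam)` over all k from 0 upward gives 1, which is a quick sanity check that the function behaves like a probability distribution.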
One thing to watch with these distributions is that each group contains neither too many nor too few dates. If a group spans too many dates, a change in energy usage caused by an ordinary temperature fluctuation could be mistaken for an inefficient day. If a group contains too few dates, then even the smallest change in energy output may be flagged as an anomaly. In the next couple of days, I hope to do a bit more research on this topic and also do some testing with code. The Poisson Distribution is a powerful tool, but its assumptions have to hold before it can be used.
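The grouping idea could be tested with something like the sketch below. It rests on several assumptions that are not settled in this post: the readings are whole-number counts per day (a Poisson model needs counts, so real energy values would have to be rounded or binned first), the groups are calendar months, and a day is flagged when the probability of seeing a count at least that high falls below a hypothetical threshold `alpha`. The names `poisson_upper_tail` and `flag_unusual_days` are made up for this example.

```python
import math
from collections import defaultdict
from datetime import date

def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)

def flag_unusual_days(readings: dict, alpha: float = 0.01) -> list:
    """Group daily counts by (year, month) and flag unusually high days.

    readings maps a datetime.date to a non-negative integer count.
    The rate for each group is estimated as the mean count in that group.
    """
    groups = defaultdict(list)
    for day, count in readings.items():
        groups[(day.year, day.month)].append((day, count))

    flagged = []
    for members in groups.values():
        lam = sum(count for _, count in members) / len(members)
        if lam == 0:
            continue  # nothing happened in this group, so nothing to compare
        for day, count in members:
            if poisson_upper_tail(count, lam) < alpha:
                flagged.append(day)
    return sorted(flagged)

# Made-up data: a steady month with one much higher day at the end.
readings = {date(2020, 1, d): 10 for d in range(1, 31)}
readings[date(2020, 1, 31)] = 30
print(flag_unusual_days(readings))
```

Changing the grouping key from `(day.year, day.month)` to a season or a two-month window is the knob that controls the trade-off described above: wider groups smooth over real seasonal shifts, while narrower groups leave too few days to estimate the rate reliably.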