dbTalk Databases Forums  

PLEASE HELP ME WITH DM PROBLEM

comp.databases.olap


dataminer101
 

PLEASE HELP ME WITH DM PROBLEM - 03-14-2006, 08:22 AM






Hi Guys,

I would appreciate your help with my DM (data mining) project. It is
in the field of agriculture, and its objective is to find the "optimal
values" of the operating conditions that affect the outcome (the
amount of meat produced, i.e. the weight) of an animal production
(chicken broilers in my case). To do so, I have to use historical data
from previous productions as my training dataset. A production cycle
is typically around 44 days long. For each production, a data
acquisition system stores the real-time and historical data of
hundreds of parameters. These parameters are sensor measurements of
the operating conditions (current temperature, set point temperature,
humidity, static pressure, etc.), and they are what I refer to as the
inputs. The operating costs and the production outcome are what I
refer to as the outputs. The operating cost is computed indirectly
from parameters like water consumption, feed consumption,
heater/cooling runtimes, and lighting runtime; the outcome of a
production is defined by parameters like animal mortality and
conversion factor (the amount of feed in lb needed to produce 1 lb of
meat). So the main objective of this project is to find the set of
"optimal daily values" (1 value/day) for the inputs that would
minimize the operating costs and the conversion ratio.

The biggest problem I am facing right now is this: the historical data
in the DB are time series, one for each measured parameter. Some of
these series follow a cyclic pattern (e.g. daily water/feed
consumption), while others follow an increasing or decreasing trend
(animal weight, total heater run time, total water/feed consumption,
etc.). My goal is a model that suggests a set of curves for the
optimal daily values throughout the production cycle, one curve for
each measured input/output parameter. This model would let the farmer
monitor his production daily and make sure its parameters follow the
"optimal curves" suggested by the model. I have looked at ANNs
(artificial neural networks), and I think they might be the solution
since they can model problems with multiple inputs and outputs (am I
wrong?), but I could not figure out how to model the inputs/outputs as
time series (an array of values for each parameter). As far as I know,
all kinds of classifiers accept only single-valued samples.

One approach would be to create one classifier per day (e.g. for day
1: extract a single value for each parameter, use these values as one
training sample, and repeat this for all previous productions to
construct the training set). The problem with this approach is that 44
or so classifiers would have to be built, which is hard to manage, and
each resulting ANN would represent some kind of "typical average" of
the training data, not necessarily the "optimal values" leading to the
best production outcome, if I am not mistaken.
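
A minimal sketch of this per-day layout, in plain Python with made-up
numbers (3 days instead of 44). The `cycles` structure, the parameter
names, and the `keep_fraction` cutoff are assumptions for illustration
only. It also shows one possible way around the "typical average"
concern: build the daily reference from the best past cycles only,
rather than from all of them.

```python
from statistics import mean

# Hypothetical data: each past cycle has a daily series for one input
# and a single final outcome. Values are invented for illustration.
cycles = [
    {"temperature": [32.0, 31.5, 31.0], "conversion_ratio": 1.95},
    {"temperature": [33.0, 32.0, 31.5], "conversion_ratio": 1.80},
    {"temperature": [31.0, 30.5, 30.0], "conversion_ratio": 2.10},
    {"temperature": [33.5, 32.5, 31.0], "conversion_ratio": 1.78},
]

def samples_for_day(cycles, param, day):
    """One (input value, outcome) training sample per past cycle for a 0-based day."""
    return [(c[param][day], c["conversion_ratio"]) for c in cycles]

def reference_curve(cycles, param, keep_fraction=0.5):
    """Average the daily values over the best cycles only.

    'Best' here means lowest final conversion ratio, so the curve
    reflects good productions rather than the typical one.
    """
    ranked = sorted(cycles, key=lambda c: c["conversion_ratio"])
    n_keep = max(1, int(len(ranked) * keep_fraction))
    best = ranked[:n_keep]
    n_days = len(best[0][param])
    return [mean(c[param][d] for c in best) for d in range(n_days)]

day0 = samples_for_day(cycles, "temperature", 0)
curve = reference_curve(cycles, "temperature")
```

Averaging the top cycles is only a heuristic; it says nothing about
whether those values caused the good outcome.
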
Another approach would be to feed in the inputs and outputs as time
series (an array of 44 daily values for each input/output parameter).
In this case there would be only one resulting ANN, and each training
sample would be a set of arrays, one per parameter, as opposed to the
single daily parameter values of the first approach. The problem is
that I could not find any classifier that would allow me to do that.
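
One common workaround, sketched below under assumptions, is to flatten
the arrays: a model that only accepts single-valued inputs can still
take a whole cycle if the 44 daily values of each parameter are
concatenated into one long vector. The `flatten_cycle` helper, the
parameter names, and the numbers are hypothetical.

```python
N_DAYS = 44

def flatten_cycle(cycle, params):
    """Concatenate the daily series of each listed parameter into one flat vector."""
    vector = []
    for name in params:
        vector.extend(cycle[name])
    return vector

# One made-up production cycle: a daily series per parameter.
cycle = {
    "temperature": [33.0 - 0.25 * d for d in range(N_DAYS)],
    "humidity": [60.0] * N_DAYS,
    "feed_consumption": [10.0 + 5.0 * d for d in range(N_DAYS)],
}

# One training sample = one whole cycle.
inputs = flatten_cycle(cycle, ["temperature", "humidity"])   # 2 * 44 values
targets = flatten_cycle(cycle, ["feed_consumption"])         # 1 * 44 values
```

The catch is that each past cycle then yields only one training
sample, while the number of inputs grows with the number of parameters
times the number of days.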

Another issue is the amount of data. While a single production cycle
can represent 1-2 GB of data, the length of the cycle (44 days) makes
it difficult to collect hundreds of cycles of historical data, since I
could gather data for no more than 7 full cycles/year. Fortunately, a
farm can have many production units (5-10 barns/site at big sites), so
this makes it possible to have 40-70 cycles/yr. My question is: would
this be enough to come up with an acceptably accurate model, or is it
necessary to have hundreds of samples?
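
A quick back-of-the-envelope check of these figures. Taking the stated
7 cycles/year and 5-10 barns at face value, the lower bound works out
to 35 rather than 40. `N_PARAMS = 100` is an assumption standing in
for "hundreds of parameters".

```python
CYCLES_PER_YEAR = 7          # per barn, from the post
BARNS_MIN, BARNS_MAX = 5, 10 # barns per site, from the post
N_DAYS = 44                  # days per cycle, from the post
N_PARAMS = 100               # assumption: low end of "hundreds of parameters"

samples_low = CYCLES_PER_YEAR * BARNS_MIN    # cycles per year, small site
samples_high = CYCLES_PER_YEAR * BARNS_MAX   # cycles per year, big site

# If every daily value of every parameter were used as a separate input:
features_per_cycle = N_PARAMS * N_DAYS

print(samples_low, samples_high, features_per_cycle)
```

With these numbers there would be far more input values per cycle than
cycles per year, which is why the sample count matters here.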

Thanks for taking the time to read this lengthy post. I really
appreciate your help.

Cheers.


--CELKO--
 

Re: PLEASE HELP ME WITH DM PROBLEM - 03-15-2006, 01:53 PM






I seem to remember "Evolutionary Operation" -- EvOp -- from chemical
manufacturing. The basic idea is to make small adjustments in multiple
factors to reach an optimal setting for a process. It assumes there is
a local optimum among the parameters, but it needs only a relatively
small sample to adjust things.
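
The idea as described above can be sketched roughly as follows. This
is only a toy illustration, not the formal EVOP procedure:
`evop_step`, `toy_cost`, and all the numbers are made up, and in a real
barn the response would be a measured production outcome, not a
function call.

```python
from itertools import product

def evop_step(response, center, step):
    """Try the current setting plus every +/- step corner around it.

    Returns the candidate with the lowest response (we are minimizing).
    """
    candidates = [tuple(center)]
    for signs in product((-1, 1), repeat=len(center)):
        candidates.append(
            tuple(c + s * d for c, s, d in zip(center, signs, step))
        )
    return min(candidates, key=response)

def toy_cost(point):
    """Stand-in for a measured cost with one local optimum at (30, 60)."""
    temperature, humidity = point
    return (temperature - 30) ** 2 + (humidity - 60) ** 2

# Start away from the optimum and nudge both factors a little each round.
point = (26, 68)
for _ in range(10):
    point = evop_step(toy_cost, point, step=(1, 2))
```

Each round costs only a handful of trial settings, which matches the
point about needing a relatively small sample.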

Sorry to be so vague.


dataminer101
 

Re: PLEASE HELP ME WITH DM PROBLEM - 03-17-2006, 10:04 AM



Hi CELKO and thanks for the reply.

I am not familiar with EvOp. As I told another person who kindly
replied to my post, the model will be used as part of an
alarm/flagging system, so I will have to produce a curve for each
parameter of interest using 4 values/day (one every 6 h) over the 44
days, in order to flag and correct any abnormal behaviour ASAP. The
whole curve would therefore have 4*44=176 values. E.g. for the water
consumption curve: day 1: 12AM=65 Gal, 6AM=150, 12PM ... day 44:
6PM=1500 Gal. I would have to come up with similar curves for each of
the parameters of interest (inputs/outputs). Now, as far as ANNs are
concerned, do I have to produce 176 of them, one for each predicted
value? For example, ANN1 would have input1 (temperature value at
Day1@12AM), input2 (humidity value at Day1@12AM), ... and output1
(feed consumption value at Day1@12AM), output2 (heater runtime value
at Day1@12AM), ..., and would be trained with the 50-60 samples
(Day1@12AM) from previous productions. This would produce an ANN that
predicts the value of each parameter at Day1@12AM for future
productions, and so on for the other 175 time points. This would be
quite intensive computationally, so I am wondering whether there is a
better way: feed in all 176 values of each time series in one shot,
with something like input1 (temperature values 1-176), input2
(humidity values 1-176), ... output1 (feed consumption values 1-176),
output2 (heater runtime values 1-176), ..., so that only one ANN is
produced, which would predict the 176 values of all parameters for
future productions?
I would really appreciate your help as I am really stuck on this.
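
Whatever model ends up producing the curves, the 176-point layout and
the flagging step can be sketched on their own. This is a minimal
sketch under assumptions: `slot_index`, `flag_deviations`, and the 10%
relative tolerance are hypothetical choices, and the sample readings
other than 65 and 150 Gal are invented.

```python
READINGS_PER_DAY = 4
N_DAYS = 44
HOURS = (0, 6, 12, 18)  # 12AM, 6AM, 12PM, 6PM

def slot_index(day, hour):
    """Map (day 1..44, hour in HOURS) to a position 0..175 on the curve."""
    return (day - 1) * READINGS_PER_DAY + HOURS.index(hour)

def flag_deviations(reference, observed, tolerance=0.10):
    """Return the slots where the observed value differs from the
    reference by more than `tolerance` (relative to the reference)."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, observed))
        if abs(obs - ref) > tolerance * abs(ref)
    ]

# First three slots of a water consumption curve (Gal).
reference = [65, 150, 200]
observed = [66, 190, 205]
alarms = flag_deviations(reference, observed)
```

Here only the 6AM reading on day 1 (slot 1) is flagged, since 190 is
more than 10% away from the reference value of 150.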

Cheers.

