Page 53 - Flytxt
statistics of the data must be dealt with on an ongoing basis for infinite data sets. Extra-sensory unrecorded events like political upheaval and natural disaster, and changes in macro-economic, cultural and other trends are inevitable however; and these lead to non-homogeneity and non-stationarity of the distribution. Furthermore, data set representations will change over time for infinite data sets, including global notions of outliers and patterns, and local notions of improved or degraded ability of sensors and variability in availability of streams.

Infinite data sets require data scientists to advance models and test them at a speed comparable to the arrival of new data. Also, multiple models may work on multiple data representations; hence the ability to have many models being evaluated in parallel is necessary.

Data scientists today build their models assuming homogeneity, stationarity and regularity of representation in the data set. Then they retrain the models when their (perhaps manual) observation of the target variables belies these assumptions.

Target values are used for decisions that affect personal and professional lives. Human training of models, however, implies that there is an allowance for inefficient decisions until the training is complete and the model is retrained. Thus models needing frequent manual training don’t suit the cadence and assurance of decisions required for infinite data sets, and data science must evolve to address this problem.

Methods to deal with infinite data

Infinite data sets require data scientists to advance models and test them at a speed comparable to the arrival of new data; and to balance this speed with the accuracy of the decision the models
INSIGHTZ - VOLUME 03, 2018 53

