Page 52 - Flytxt
INFINITE DATA SETS AND
THE EVOLUTION OF DATA SCIENCE
Knowledge workers readily acknowledge that events and transactions, recorded as data, form an integral part of our personal and professional lives. As the first generation to have access to such a vast historical record in digital form, we are understandably excited about the opportunities to use this data. Successive generations will evolve the fields of big data and data science further. This article takes a peek at some elements of this evolution.
- Dr. Prateek Kapadia, CTO, Flytxt
Big data essentially concerns itself with large collections of data about events and transactions recorded in the past. Allied terms like "fast data" extend this further, bringing faster updates to this history. But the underlying analytics processes on big data analyze the past to predict the future. Data sets in this discourse are large, but always finite.

"Properties of finite data sets vary more slowly than those of infinite data sets. Furthermore, data set representations will change over time for infinite data sets."

Fundamentally, however, the physical universe is different. Data sets that correspond to the digital capture of information from events and transactions by and among humans and machines aren't actually finite: these events, transactions and their data capture also exist in the future. For example, customer behaviours have events in the past as well as in the future. This philosophy of allowing data sets to extend into the (infinite) future requires the data scientist to think and prepare for a future beyond merely "big" data. Thus, data scientists who predict future customer behaviours will have to contend with infinite data sets.

Dealing with infinite data is different

Theoretically, some function must exist that will map the historical data set (the history) to target values (the prediction). The canonical machine learning problem is to find a computationally efficient approximation of this function. Data scientists then build predictive models using this approximation to foretell future behaviour.

Approximations are derived from compactly representable properties of data sets, most commonly the statistical distributions that fit the set. However, properties of finite data sets will vary more slowly than those of infinite data sets. Changes in the
52 INSIGHTZ - VOLUME 03, 2018

