
INFINITE DATA SETS AND THE EVOLUTION OF DATA SCIENCE



Knowledge workers easily acknowledge that events and transactions, recorded as data, form an integral part of our personal and professional lives. Being the first generation to have access to such a vast historical record in digital form has led to our extreme excitement about opportunities to use this data. Successive generations will evolve the fields of big data and data science further. This article takes a peek at some elements of this evolution.

                                                                            - Dr. Prateek Kapadia, CTO, Flytxt




Big data essentially concerns itself with large collections of data about events and transactions recorded in the past. Allied terms like “fast data” extend this idea to faster updates of that history. But the underlying analytics processes on big data analyze the past to predict the future. Data sets in this discourse are large, but always finite.

“Properties of finite data sets vary at a relatively lower rate than those of infinite data sets. Furthermore, data set representations will change over time for infinite data sets.”

Fundamentally, however, the physical universe is different. Data sets that correspond to the digital capture of information from events and transactions by and among humans and machines aren’t actually finite: these events, transactions and their data capture will also exist in the future. For example, customer behaviours have events in the past as well as in the future. This philosophy of allowing data sets to extend into the (infinite) future requires the data scientist to think about and prepare for a future beyond merely “big” data. Thus, data scientists who predict future customer behaviours will have to contend with infinite data sets.

Dealing with infinite data is different

Theoretically, some function must exist that maps the historical data set (the history) to target values (the prediction). The canonical machine learning problem is to find a computationally efficient approximation of this function. Data scientists then build predictive models using this approximation to foretell future behaviour.

Approximations are derived from compactly representable properties of data sets, most commonly the statistical distributions that fit the set. However, properties of finite data sets vary at a relatively lower rate than those of infinite data sets. Changes in the


       52                                                                              INSIGHTZ - VOLUME 03, 2018