<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Speed on FastDataScience.eu</title>
    <link>https://fastdatascience.eu/tags/speed/</link>
    <description>FastDataScience.eu (Speed)</description>
    <generator>Hugo -- gohugo.io</generator>
    <copyright>en-us</copyright>
    <lastBuildDate>Sun, 08 May 2022 00:00:00 +0000</lastBuildDate>
    
    <atom:link href="https://fastdatascience.eu/tags/speed/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Why Speed Matters</title>
      <link>https://fastdatascience.eu/post/2022-05-08-why_speed_matters/</link>
      <pubDate>Sun, 08 May 2022 00:00:00 +0000</pubDate>
      <guid>https://fastdatascience.eu/post/2022-05-08-why_speed_matters/</guid>
      <description>&lt;p&gt;The vast bulk of data science work is done using &lt;a href=&#34;https://python.org&#34;&gt;Python&lt;/a&gt;
and &lt;a href=&#34;https://www.r-project.org&#34;&gt;R&lt;/a&gt;, and that&amp;rsquo;s fine. Those languages are
well suited to analytics, and make available a rich infrastructure of libraries
and documentation.&lt;/p&gt;
&lt;p&gt;Looking at just Python (R is similar), there is, however, a problem. Python is
&lt;em&gt;slow&lt;/em&gt;, due to its interpreted execution model and dynamic typing. As a Python
program runs, it is constantly checking for the types of different variables
and data, and for the feasibility of certain operations such as converting
data types and expanding lists. While this makes for fast development and
prototyping, it can be very slow for some types of analysis.&lt;/p&gt;
&lt;p&gt;The penny dropped for me after I worked on a price optimization project for
a global beer company, which optimized wholesale and retail prices across
four countries, partly using complex procedural logic to calculate the
impact of price changes on volume. The Python optimization used simulated
annealing from the standard scikit-learn library. The optimization
took &lt;em&gt;twelve minutes&lt;/em&gt; to run, and defeated our hopes of running it in real
time behind an interactive user interface.&lt;/p&gt;
&lt;p&gt;The problem was that the objective function (which needs to run hundreds or
thousands of times as the optimization explores the solution space) consisted
of about 200 lines of Python. While the simulated annealing was presumably
efficient, this complex objective function code made the optimization a
slow process.&lt;/p&gt;
&lt;h2 id=&#34;discovering-alternatives&#34; &gt;Discovering Alternatives
&lt;span&gt;
    &lt;a href=&#34;#discovering-alternatives&#34;&gt;
        &lt;svg viewBox=&#34;0 0 28 23&#34; height=&#34;100%&#34; width=&#34;19&#34; xmlns=&#34;http://www.w3.org/2000/svg&#34;&gt;&lt;path d=&#34;M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71&#34; fill=&#34;none&#34; stroke-linecap=&#34;round&#34; stroke-miterlimit=&#34;10&#34; stroke-width=&#34;2&#34;/&gt;&lt;path d=&#34;M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71&#34; fill=&#34;none&#34; stroke-linecap=&#34;round&#34; stroke-miterlimit=&#34;10&#34; stroke-width=&#34;2&#34;/&gt;&lt;/svg&gt;
    &lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;During some vacation after the project, I took advantage of the down-time
to learn &lt;a href=&#34;https://julialang.org&#34;&gt;Julia&lt;/a&gt;. As a specific project, I rewrote the
optimization in Julia, using its optional type annotations and an open-source
simulated annealing library, and the execution time went from &lt;em&gt;twelve minutes
down to six seconds&lt;/em&gt;. This massive speed improvement (a factor of 120)
brought the idea of a user interface running the optimization in the background
into the realm of possibility.&lt;/p&gt;
&lt;p&gt;A few months later, I decided to learn &lt;a href=&#34;https://go.dev&#34;&gt;Go&lt;/a&gt;. Again, I rewrote
the optimization, using an open-source simulated annealing implementation
around the recoded objective function. This time, execution was even faster,
reaching &lt;em&gt;0.6 seconds&lt;/em&gt;. This was now performant enough to enable the
interactivity we had been hoping for, and Go&amp;rsquo;s suitability for microservices
was another strong enabler of this vision.&lt;/p&gt;
&lt;p&gt;It should be noted that neither Julia nor Go involved a massive rewrite of
the original Python objective function. Both languages allow for a procedural
style and syntax that is not very far from Python&amp;rsquo;s, so the translations were
reasonably straightforward, and took about half a day in each language.&lt;/p&gt;
&lt;h2 id=&#34;its-not-just-about-saving-time&#34; &gt;It&amp;rsquo;s not just about saving time
&lt;span&gt;
    &lt;a href=&#34;#its-not-just-about-saving-time&#34;&gt;
        &lt;svg viewBox=&#34;0 0 28 23&#34; height=&#34;100%&#34; width=&#34;19&#34; xmlns=&#34;http://www.w3.org/2000/svg&#34;&gt;&lt;path d=&#34;M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71&#34; fill=&#34;none&#34; stroke-linecap=&#34;round&#34; stroke-miterlimit=&#34;10&#34; stroke-width=&#34;2&#34;/&gt;&lt;path d=&#34;M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71&#34; fill=&#34;none&#34; stroke-linecap=&#34;round&#34; stroke-miterlimit=&#34;10&#34; stroke-width=&#34;2&#34;/&gt;&lt;/svg&gt;
    &lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;The point that excited me was not the speed per se (since we routinely
tolerate code that takes a long time to run, and adapt our workflows
accordingly). It was the &lt;strong&gt;new set of possibilities&lt;/strong&gt;, either through the
quick calculation of a lot more parameters or scenarios, or the ability
to do calculations fast enough for users to explore the problem space
in an interactive way, which is not possible when it takes twelve minutes
or longer to recalculate.&lt;/p&gt;
&lt;p&gt;On this blog, then, I&amp;rsquo;d like to share my continuing journey in
fast data science, using different languages, architectures, and
algorithms to enable new explorations in data science.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
