Solving Time Series Problem with Genetic Algorithm

Samuel Guedj
3 min readMay 25, 2021

Filter_bubble

Various personalization algorithms are applied in e-commerce and other kinds of websites/apps in order to increase purchases or user engagement.

The following time series were collected over a period of 20 months from a large online retail store. In this study, during the first 10 months (normalized as the period from month -10 to month 0) a contextual personalization algorithm was applied in order to increase user engagement (measured as the accumulated number of ‘Likes’ on products shared via Facebook). As in the case with many contextual approaches, this algorithm suffered from the ‘filter bubble’ problem (https://en.wikipedia.org/wiki/Filter_bubble), and therefore reached a plateau quite quickly.

Nonetheless, this step enabled a more advanced collaborative filtering algorithm to analyze the collected data, learn, and increase user engagement once it was applied in the following 10 months. A classic inflection point was detected after the transition to the new algorithm. A domain expert envisioned the inflection dynamics will behave similarly to the formula f(x) = ax³ + bx² + cx + d. The values of a, b, c and d were found to be 4.8, 12.1, 53.2 and 6219, respectively.

<< Please note: This function is an analytical solution, used here for simplifying the actual study, where a dynamic model was created with ODEs >>

We are given a sample sparse data of users from 10 different European countries. In this project, we will implement a simple GA in order to fit our data to the above-mentioned function, and discover the parameters for a, b, c and d.

Clone https://github.com/sgu-fai/y-data or get y-data/4-Unsupersived/week6-GA/TimeSerie/

I will not detail all code in the post but everything is in the notebook:

  • Import the correct library
  • Manipulate the Excel file
  • parameters of the algorithm
  • the main code/loop generating the GA solution

Let’s focus on the 4 mains GA functions:

  • fitness_function: how close a given design solution is to achieving the set aims.
  • biased_selection: Individual solutions are selected through a fitness-based process
  • recombination (Genetic operators): generate a second generation population of solutions from those selected
  • mutation: alters one or more gene values in a chromosome from its initial state

After playing with those functions for a few hours you start seing the whole world differently. You start feeling in a matrix :)

Here is an implement of those functions

The best analytic solution is pop_analytic = [4.8, 12.1, 53.2 , 6219]. We will compare it with the result of our GA. So let’s generate the analytic curve:

Now let’s plot our data: GA prediction, Analytic solution, and our original data

Analytic solution: [4.8, 12.1, 53.2, 6219] — GA solution: [ 6, 16, -88, 5959]

Conclusion: after a few generations we obtain a relatively close to the analytic solution.

--

--