Project overview
The goal
The main goal of the project is to identify hidden patterns of trial version usage (patterns not yet known to the Customer) that increase the probability of a user purchasing the full version of the product.
Business case
The Customer’s main product is utility software that optimises macOS-based computers. There is a trial version (full-featured but time-limited, after which it becomes batch-size limited) that can be purchased and converted into the full, unlimited version. The product can also be purchased through a range of distribution channels, including the company website. The Customer’s Sales and Marketing Team hypothesised that certain patterns of user behavior in the trial version of the main product, in other products, and on the website indicate a higher or lower probability of purchase.
Analysis of the collected user behavior data and product purchase data confirmed the hypothesis and identified several non-obvious trial version usage patterns.
Challenges
- Data volume issue. The volume and growth rate of the historical data were a challenge. The data covers events generated during product usage, purchase transactions, app store visits, reading of feedback, viewing of publications provided by experts, reading of advertising and marketing materials, etc.
- Identification of previous purchase issue. The Customer started gathering historical data only a few years ago, so for users who had bought the product before the historical data storage was set up it was hard to identify which version (trial or full) they were using. The key reason is that the trial and full versions generate the same user behavior data. As a result, trial and full version usage was hard to distinguish when no purchase transaction record was available.
- Lack of usage session distinction issue. The historical data contains no records such as “program session start” or “program session finish”, which was a barrier to analysis from the user session point of view (session durations, user behavior within a session, session clustering, etc.).
- Repetition of user behavior data issue. The same event could occur many times within a short period of time.
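This overview does not state how the last two issues were addressed. A common workaround, shown here only as a minimal sketch, is to collapse bursts of identical events and to split the event stream into sessions wherever the gap between consecutive events exceeds an inactivity threshold. The event schema (timestamp in seconds, event name), the function names, and the 2-second and 30-minute thresholds are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical event schema: (timestamp in seconds, event name).
Event = Tuple[float, str]


def collapse_repeats(events: List[Event], window_s: float = 2.0) -> List[Event]:
    """Drop an event if the same event name was last seen less than window_s earlier.

    Because the "last seen" time is updated on every occurrence, an
    uninterrupted burst of identical events collapses to its first event.
    """
    kept: List[Event] = []
    last_seen: Dict[str, float] = {}
    for ts, name in sorted(events):
        prev = last_seen.get(name)
        if prev is None or ts - prev >= window_s:
            kept.append((ts, name))
        last_seen[name] = ts
    return kept


def split_sessions(events: List[Event], gap_s: float = 1800.0) -> List[List[Event]]:
    """Start a new session whenever the gap since the previous event exceeds gap_s."""
    sessions: List[List[Event]] = []
    for ts, name in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= gap_s:
            sessions[-1].append((ts, name))
        else:
            sessions.append([(ts, name)])
    return sessions


# Made-up events, for illustration only.
raw = [
    (0.0, "scan_start"),
    (0.5, "scan_start"),
    (1.0, "scan_start"),
    (60.0, "scan_finish"),
    (5000.0, "scan_start"),
]
clean = collapse_repeats(raw)
sessions = split_sessions(clean)
```

With these thresholds the three near-simultaneous "scan_start" events become one, and the event at 5000 seconds opens a second session. Suitable thresholds would have to be chosen from the actual data.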
Business value
- Previously unknown usage patterns that increase the purchase probability were identified. For known usage patterns such as basic product operations, the cycle time, batch size, time alignment, and frequency of the operations were identified as metrics that have a high impact on the user’s decision to buy: the higher these metrics, the higher the purchase probability. As a result, a purchase prediction model was developed and trained. Using only the duration and size of recent scans, the prediction accuracy was 77.7%. Conversely, user actions such as interrupting a scan were shown to have a negative impact on purchasing.
- A model that predicts the purchase probability was built and trained. The highest prediction accuracy (95.8%) was achieved when two weeks of historical data were available. Prediction based on the first hour of data alone reached an accuracy of 94.5%. This first-hour case is a reasonable trade-off between accuracy, complexity, and the resources needed for data analysis.
- The results make it possible to build a personalized online discount system based on trial version usage analysis and a dynamically calculated purchase probability.
- Inefficient advertising sites were identified, so the Customer can reduce the cost of marketing campaigns.
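To make the idea of a discount driven by a dynamically calculated purchase probability more concrete, the sketch below scores a trial user with a logistic function and maps the score to a discount tier. It is not the model built in the project: the feature names, the coefficients, the tier boundaries, and the policy of offering larger discounts to users less likely to buy are all hypothetical. Only the signs of the coefficients follow the findings above (longer and larger scans raise the probability, interrupted scans lower it).

```python
import math
from typing import Dict

# Hypothetical coefficients, for illustration only.
# A real model would learn them from the historical data.
WEIGHTS: Dict[str, float] = {
    "scan_minutes": 0.08,       # recent scan duration
    "scan_gb": 0.05,            # recent scan size
    "interrupted_scans": -0.6,  # interruptions reduce the score
}
BIAS = -1.5


def purchase_probability(features: Dict[str, float]) -> float:
    """Logistic score in (0, 1) computed from trial usage features."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def discount_percent(p: float) -> int:
    """Assumed policy: the lower the purchase probability, the larger the discount."""
    if p >= 0.8:
        return 0
    if p >= 0.5:
        return 10
    return 20


# Made-up trial users, for illustration only.
engaged = purchase_probability({"scan_minutes": 30, "scan_gb": 40})
hesitant = purchase_probability({"scan_minutes": 5, "scan_gb": 2, "interrupted_scans": 3})
offer_engaged = discount_percent(engaged)
offer_hesitant = discount_percent(hesitant)
```

In practice the score would be recomputed as new usage events arrive, and the tiers would be tuned against actual conversion and revenue data.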