Now, Later, Never:
A Study of Urgency in Mobile Push-notifications

Beatriz Esteves, Kieran Fraser, Shridhar Kulkarni, Owen Conlan, Víctor Rodríguez-Doncel

Paper Overview

Push-notifications, by design, attempt to grab the attention of subscribers and impart new or valuable information in a particular context. These nudges are commonly initiated by marketing teams or triggered autonomously by fixed rule sets, so the resulting interruptions tend to conflict with subscriber priorities and activities. In this work, we present an ontology used to aid in the annotation of urgency within a notification, based on its text content. We also demonstrate a variety of models capable of distinguishing multiple levels of urgency in a notification. These models could be used to help subscribers better prioritize information pushed at them, aid marketers creating campaigns, and facilitate improved transparency with respect to the delivery time chosen for pushed notifications.

Springer chapter Preprint

Push-notification and its components

Key features of a common mobile push-notification:

Title - the title of the notification as it appears in the notification drawer.
Ticker - the text description of the notification as it appears in the notification drawer.
Icon - the image related to the notification being pushed.
App - the app which published the notification.
Date posted - the date and time the notification was first pushed to the device, alerting the user.
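
As an illustration only, the features listed above could be held in a small record type. This is a minimal sketch: the class name, field types and the text() helper (which joins title and ticker) are assumptions made for this example, not part of the published tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PushNotification:
    """Hypothetical container for the notification features listed above."""
    title: str             # title shown in the notification drawer
    ticker: str            # text description shown in the notification drawer
    icon: str              # assumed here to be a path or URL to the image
    app: str               # app which published the notification
    date_posted: datetime  # when the notification first reached the device

    def text(self) -> str:
        # Assumption: the text content is the title followed by the ticker.
        return f"{self.title} {self.ticker}".strip()


example = PushNotification(
    title="Match starting",
    ticker="Kick-off in 10 minutes",
    icon="icons/sports.png",
    app="com.example.sports",
    date_posted=datetime(2021, 5, 1, 14, 50),
)
print(example.text())
```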

Crowdsourced Annotation

Notification Data: A push-notification social listening tool, developed by EmPushy, was used to collect a variety of features from notifications pushed in real-time over a period of 550 days. The social listening tool was subscribed to apps sourced from 37 categories of the Google Play Store, casting a wide net for notifications associated with differing types and levels of urgency and relevancy.
Balancing Script: A balancing script was created to ensure that notifications selected for annotation were evenly distributed amongst app categories and amongst individual apps within those categories. In addition, the text content of each notification was combined and converted to a sentence embedding, and the embeddings were ranked using cosine similarity. The notifications which were least similar were included in the final data set for annotation.
Shared Ontology: The APN ontology was used to teach the workers annotating push-notifications the categories of urgency that were identified and their respective definitions.
Gold Notifications & Test: A gold-standard dataset was used to educate and evaluate workers as they proceeded with the task. Workers were required to annotate 20 notifications sampled from the gold-standard dataset before they could proceed. They had to achieve a minimum trust score of 90% before beginning the task and maintain it throughout (a test question was scheduled to appear after every 20 annotations).
Annotation: The Appen Platform was used for annotation, as it provided a global workforce and a self-service tool set for creating and managing the annotation task at scale. Each notification instance was set to be annotated by at least 3 separate workers, and this was extended up to 5 workers unless 3 workers were in agreement. The urgency labels were not mutually exclusive, as a notification could be attributed multiple labels at once.
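
The similarity ranking in the balancing step can be sketched as follows. This is a minimal illustration, not the script used in the study: it assumes the sentence embeddings are already available as plain lists of floats, and it assumes that "least similar" means the lowest mean cosine similarity to all other notifications. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def least_similar(embeddings, k):
    """Return the indices of the k items with the lowest mean similarity to the rest."""
    n = len(embeddings)
    scores = []
    for i in range(n):
        others = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    # Ascending order: the most dissimilar notifications come first.
    return sorted(range(n), key=lambda i: scores[i])[:k]


# Toy 2-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
selected = least_similar(toy_embeddings, k=2)
print(selected)
```

In this toy input the first two vectors are near-duplicates, so the third vector is ranked first and only one of the near-duplicates is kept.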

App Push Statistics

Sports apps were the largest generators of notifications, with each app pushing an average of 8 notifications daily, followed by News & Magazines (4 notifications).
The apps generating the fewest notifications per day on average were Watch Apps (1 notification), Libraries & Demo (1 notification) and Augmented Reality (1 notification).
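
One way such per-category averages could be computed is sketched below. The exact definition used in the study is not stated here, so this example assumes the average is taken over app-days on which at least one notification was pushed. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def mean_daily_pushes(records):
    """records: iterable of (category, app, day) tuples, one per notification.

    Returns the mean number of notifications per app per active day,
    grouped by app category (assumed definition, see lead-in).
    """
    per_app_day = defaultdict(int)
    for category, app, day in records:
        per_app_day[(category, app, day)] += 1

    per_category = defaultdict(list)
    for (category, _app, _day), count in per_app_day.items():
        per_category[category].append(count)

    return {c: sum(v) / len(v) for c, v in per_category.items()}


# Made-up records for illustration only.
records = (
    [("Sports", "app_a", "2021-05-01")] * 3
    + [("Sports", "app_a", "2021-05-02")] * 5
    + [("News & Magazines", "app_b", "2021-05-01")] * 4
)
averages = mean_daily_pushes(records)
print(averages)
```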

Text Features Statistics

The advertools, TextBlob and Codeq-NLP Python packages were used to engineer a number of text features, which were shown to differ with statistical significance across app categories, as illustrated in the table below:

Feature	χ²	p
count_stopwords	31526.06	<0.01
count_emojis	56577.36	<0.01
count_capital_words	16594.54	<0.01
count_characters	26342.83	<0.01
avg_word_length	18291.99	<0.01
count_words	22336.36	<0.01
count_numeric_chars	17007.64	<0.01
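
To make the feature names in the table concrete, here is a rough standard-library sketch of how such counts might be derived from a notification's text. It does not reproduce the advertools, TextBlob or Codeq-NLP implementations: the tokenisation, the tiny stopword list, the emoji ranges and the definition of a "capital word" (fully upper-case, longer than one character) are all assumptions for illustration.

```python
import re

# Tiny illustrative stopword list; real packages ship far larger ones.
STOPWORDS = {"the", "a", "an", "is", "to", "in", "of", "and", "for", "on"}

# Rough emoji matcher covering the main emoji blocks; not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def text_features(text):
    """Compute simple surface features from notification text."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    return {
        "count_characters": len(text),
        "count_words": len(words),
        "count_stopwords": sum(w.lower() in STOPWORDS for w in words),
        "count_capital_words": sum(w.isupper() and len(w) > 1 for w in words),
        "count_numeric_chars": sum(ch.isdigit() for ch in text),
        "count_emojis": len(EMOJI_RE.findall(text)),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
    }


features = text_features("SALE ends in 2 hours \U0001F525")
print(features)
```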

Classification Algorithms

Related research has shown that the following Machine Learning algorithms have worked well in classification tasks:

Naive Bayes: Zhang, H. (2004). The optimality of naive Bayes. In: Proceedings of the 17th International FLAIRS conference (FLAIRS2004). p. 562-567. URL: https://www.aaai.org/Papers/FLAIRS/2004/Flairs04-097.pdf
Random Forest: Breiman, L. (2001). Random Forests. Machine Learning, 45(1), p. 5-32. URL: https://doi.org/10.1023/A:1010933404324
AdaBoost: Hastie, T., Rosset, S., Zhu, J., & Zou, H. (2009). Multi-class AdaBoost. Statistics and its Interface, 2(3), p. 349-360. URL: https://dx.doi.org/10.4310/SII.2009.v2.n3.a8
XGBoost: Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'16). p. 785-794. URL: https://doi.org/10.1145/2939672.2939785

The annotated dataset was split into train (80%) and test (20%) sets and two problem transformation approaches were applied for facilitating multi-label classification:

Classifier Chains: a classifier was created for every urgency label and the classifiers were ordered as a chain, such that the first classifier ingested only the input features and each subsequent classifier ingested the input features plus the outputs of the previous classifiers.
Binary Relevance: for every urgency label, a single binary-classifier was created. The final output was the union of predictions made by each individual classifier.
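
The difference between the two transformations can be sketched in plain Python. This is a toy illustration under stated assumptions, not the experimental code: a 1-nearest-neighbour classifier stands in for the algorithms listed above, the two label columns are hypothetical, and the chain is trained with the true values of earlier labels as extra features (a common choice, assumed here).

```python
class NearestNeighbour:
    """Toy 1-NN binary classifier standing in for the real base learners."""

    def fit(self, X, y):
        self.X = [list(x) for x in X]
        self.y = list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            distances = [sum((p - q) ** 2 for p, q in zip(x, t)) for t in self.X]
            predictions.append(self.y[distances.index(min(distances))])
        return predictions


def fit_binary_relevance(X, Y, make_clf):
    """One independent binary classifier per label column of Y."""
    return [make_clf().fit(X, [row[j] for row in Y]) for j in range(len(Y[0]))]


def predict_binary_relevance(models, X):
    columns = [m.predict(X) for m in models]
    return [list(row) for row in zip(*columns)]


def fit_chain(X, Y, make_clf):
    """Classifier j sees the input features plus labels 0..j-1."""
    models, X_aug = [], [list(x) for x in X]
    for j in range(len(Y[0])):
        y_j = [row[j] for row in Y]
        models.append(make_clf().fit(X_aug, y_j))
        X_aug = [x + [label] for x, label in zip(X_aug, y_j)]
    return models


def predict_chain(models, X):
    X_aug, columns = [list(x) for x in X], []
    for m in models:
        p = m.predict(X_aug)
        columns.append(p)
        # Earlier predictions become extra features for the next link.
        X_aug = [x + [v] for x, v in zip(X_aug, p)]
    return [list(row) for row in zip(*columns)]


# Toy data: two features, two hypothetical urgency labels.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
br_pred = predict_binary_relevance(fit_binary_relevance(X, Y, NearestNeighbour), [[0.9, 0.1]])
cc_pred = predict_chain(fit_chain(X, Y, NearestNeighbour), [[0.9, 0.1]])
print(br_pred, cc_pred)
```

Binary relevance ignores any dependence between labels, whereas the chain lets later classifiers condition on earlier decisions, at the cost of making the result depend on the label order.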

Experiment 1 - Baseline

Experiment 2 - Data Augmentation

Experiment 3 - Time Expressions

List of works used to extract time expressions from the notification text:

SemEval-2013 Task 1 - TempEval-3: UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., & Pustejovsky, J. (2013). Evaluating time expressions, events, and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). p. 1-9. URL: https://aclanthology.org/S13-2001.pdf
SemEval-2007 Task 15: Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Katz, G., & Pustejovsky, J. (2007). TempEval Temporal Relation Identification. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007). p. 75-80. URL: https://aclanthology.org/S07-1014.pdf
SemEval-2010 Task 13: TempEval-2: Verhagen, M., Sauri, R., Caselli, T., & Pustejovsky, J. (2010). SemEval-2010 Task 13: TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation. p. 57-62. URL: https://aclanthology.org/S10-1010.pdf
SemEval-2018 Task 6: Laparra, E., Xu, D., Elsayed, A., Bethard, S., & Palmer, M. (2018). Parsing Time Normalizations. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018). p. 88-96. URL: https://aclanthology.org/S18-1011.pdf
SemEval-2021 Task 10: Laparra, E., Su, X., Zhao, Y., Uzuner, O., Miller, T., & Bethard, S. (2021). Source-free domain adaptation for semantic processing. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). p. 348-356. URL: https://aclanthology.org/2021.semeval-1.42/

The figure below shows the performance improvement of the urgency classification algorithms when time-expression information was included as an input feature.

The table below illustrates the 10 most frequent time-expressions identified.

Time-expression Label	Num. Notifications
B-Calendar-Interval	2148
B-This	1288
B-Period	830
B-Number	658
B-Frequency	574
B-After	440
B-Year	366
B-Part-Of-Day	322
B-Season-Of-Year	287
B-Last	267
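
The "B-" prefix on the labels above suggests BIO-style tagging, in which "B-" marks the first token of a span. Under that assumption, the per-label notification counts could be tallied as sketched below. The input format (one tag sequence per notification), the "I-" and "O" tags and the function name are hypothetical; each notification is counted at most once per label.

```python
from collections import Counter


def count_notifications_per_label(tagged_notifications):
    """Count how many notifications contain each B- label at least once."""
    counts = Counter()
    for tags in tagged_notifications:
        # A set ensures a label repeated within one notification counts once.
        counts.update({t for t in tags if t.startswith("B-")})
    return counts


# Made-up tag sequences for illustration only.
tagged = [
    ["O", "B-Period", "I-Period"],
    ["B-This", "B-Calendar-Interval", "B-This"],
    ["O"],
    ["B-Period"],
]
label_counts = count_notifications_per_label(tagged)
print(label_counts.most_common(1))
```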

Experiment 4 - Delivery Date

Get in touch

Feel free to reach out to us regarding the research presented.

kieran.fraser [at] adaptcentre.ie
beatriz.gesteves [at] upm.es
ADAPT Centre,
Trinity College Dublin,
Ireland
Ontology Engineering Group,
Universidad Politécnica de Madrid,
Spain