I’m trying to implement an article recommendation system for a news site with 100k unique readers per day.

The model I want to use is session- and probability-based: the system looks at the previous 3 (or so) articles you’ve read and recommends 4 articles that you’ll likely want to read next, based on other readers with a similar (recent) reading history. I’m essentially trying to implement chapter III.A of this study.

Computation-wise, this means collecting state transitions from all readers, where a 2-tuple of a user’s previous reading history and their updated history (a.k.a. a state transition) is sent to a collection for probability calculation on demand. A state transition event could e.g. be: ({1,3,5}, {3,5,12}), where the numbers are IDs of articles.
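To make the computation concrete, here is a minimal in-memory sketch of the idea (not the study’s implementation, and all names are hypothetical): count observed (previous history, updated history) transitions, derive conditional probabilities on demand, and recommend the new article from each likely next state.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: state = the set of a reader's last 3 article IDs
# (order ignored here for simplicity).
transition_counts = defaultdict(Counter)

def record_transition(prev_state, next_state):
    """Store one observed ({previous history}, {updated history}) event."""
    transition_counts[frozenset(prev_state)][frozenset(next_state)] += 1

def next_state_probabilities(state):
    """Estimate P(next_state | state) from the collected counts."""
    counts = transition_counts[frozenset(state)]
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}

def recommend(state, k=4):
    """Top-k candidate articles: the newly read article in each likely next state."""
    state = frozenset(state)
    scores = Counter()
    for nxt, p in next_state_probabilities(state).items():
        for article in nxt - state:  # the article(s) added in this transition
            scores[article] += p
    return [a for a, _ in scores.most_common(k)]

# Example events, using the transition from the post: ({1,3,5}, {3,5,12})
record_transition({1, 3, 5}, {3, 5, 12})
record_transition({1, 3, 5}, {3, 5, 12})
record_transition({1, 3, 5}, {1, 3, 7})
```

With those three events, `recommend({1, 3, 5})` scores article 12 at 2/3 and article 7 at 1/3, so it returns `[12, 7]`. A stream processor (or even a periodic batch job) would essentially maintain `transition_counts` at scale.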

Now I’m trying to figure out a reasonable architecture for this. Are streams the best choice? If so, which: Apache Kafka, Spark Streaming / Spark SQL, AWS Kinesis Data Streams / Analytics? What’s a cost-efficient choice here? I don’t want to pay hundreds of dollars a month just for this to run.

Or, are streams overkill? Are there simpler/better options?

Source: https://www.reddit.com/r/bigdata/comments/8cnk0f/_article_recommendation/
