Designing a Feed-Based System

Overview of the Search Rank chapter from educative.io and other sources:

Problem: Design a Twitter feed system that shows the most relevant tweets for a user based on their social graph.

  • Timestamp-based approach: All tweets generated by a user's followees since the user's last visit were displayed in reverse chronological order.

  • We need to rank tweets so that the most relevant ones are shown first.

  • Scale:

    • 500 million daily active users (DAU); on average, each user is connected to 100 other users.
    • Each user fetches their feed 10 times a day.

    500 million × 10 = 5 billion runs of the ranking system per day.

    “Given a list of tweets, train an ML model that predicts the probability of engagement for each tweet and orders the tweets by that score.”
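A minimal Python sketch tying these points together: the daily ranking load implied by the scale figures (the per-second average is derived here, not stated in the source), the timestamp-based baseline, and ordering by predicted engagement probability. The `Tweet` type, its `engagement_prob` field (a stand-in for the output of an unspecified ML model), and the sample data are hypothetical. Both orderings use the same since-last-visit candidates only to keep the comparison simple.

```python
from dataclasses import dataclass

# Scale figures from the notes above.
DAU = 500_000_000           # daily active users
FETCHES_PER_USER = 10       # feed fetches per user per day
SECONDS_PER_DAY = 24 * 60 * 60


def daily_ranking_runs(dau: int, fetches_per_user: int) -> int:
    """Each feed fetch triggers one run of the ranking system."""
    return dau * fetches_per_user


@dataclass
class Tweet:
    tweet_id: str
    created_at: int          # Unix timestamp in seconds
    engagement_prob: float   # hypothetical output of an engagement-prediction model


def chronological_feed(tweets: list[Tweet], last_visit: int) -> list[Tweet]:
    """Timestamp-based baseline: tweets since the last visit, newest first."""
    recent = [t for t in tweets if t.created_at > last_visit]
    return sorted(recent, key=lambda t: t.created_at, reverse=True)


def ranked_feed(tweets: list[Tweet], last_visit: int) -> list[Tweet]:
    """Same candidates, ordered by predicted engagement probability, highest first."""
    recent = [t for t in tweets if t.created_at > last_visit]
    return sorted(recent, key=lambda t: t.engagement_prob, reverse=True)


if __name__ == "__main__":
    runs = daily_ranking_runs(DAU, FETCHES_PER_USER)
    print(f"{runs:,} ranking runs per day")                    # 5,000,000,000 ranking runs per day
    print(f"~{runs / SECONDS_PER_DAY:,.0f} runs per second")   # ~57,870 runs per second

    candidates = [
        Tweet("a", created_at=300, engagement_prob=0.12),
        Tweet("b", created_at=200, engagement_prob=0.48),
        Tweet("c", created_at=100, engagement_prob=0.30),
        Tweet("old", created_at=50, engagement_prob=0.90),  # posted before the last visit
    ]
    print([t.tweet_id for t in chronological_feed(candidates, last_visit=60)])  # ['a', 'b', 'c']
    print([t.tweet_id for t in ranked_feed(candidates, last_visit=60)])         # ['b', 'c', 'a']
```

The averaged ~57,870 runs per second understates peak load, since feed fetches are not spread evenly across the day.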

  • Goal: Maximize user engagement.

  • User actions can be positive or negative:

    • Positive actions:
      • Time spent viewing the tweet
      • Liking
      • Retweeting
      • Commenting
    • Negative actions:
      • Hiding a tweet
      • Reporting a tweet as inappropriate
  • User engagement metrics. Candidate metrics to optimize in order to increase user engagement:
    • the number of comments
    • overall engagement, i.e. comments, likes, and retweets
    • time spent on Twitter
    • the average number of negative actions per user (to be minimized)
  • Not all engagements are equally important, so each action type is given a different weight.

The above metric is calculated as follows:

  • In a day, 2,000 tweets were viewed.
  • There were 70 likes, 80 comments, 20 retweets, and 5 reports.
  • The weighted impact of each action is calculated by multiplying its number of occurrences by its weight.
  • The weighted impacts are summed to determine the score.
  • The score is normalized by the total number of users.
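The calculation above as a minimal Python sketch. The notes do not give the action weights or the number of users, so `ACTION_WEIGHTS` and `NUM_USERS` below are hypothetical values chosen for illustration; the 2,000 views are kept as context and are not weighted.

```python
# Hypothetical weights: the notes say each action has its own weight but give no values.
ACTION_WEIGHTS = {
    "like": 1.0,
    "comment": 2.0,
    "retweet": 3.0,
    "report": -5.0,   # a negative action lowers the score
}


def weighted_engagement_score(counts: dict[str, int],
                              weights: dict[str, float],
                              num_users: int) -> float:
    """Sum occurrences * weight over all actions, then normalize by the number of users."""
    if num_users <= 0:
        raise ValueError("num_users must be positive")
    # Weighted impact of each action = number of occurrences * weight.
    weighted_impacts = {action: counts.get(action, 0) * w for action, w in weights.items()}
    return sum(weighted_impacts.values()) / num_users


# One day of activity from the example above (2,000 tweets viewed).
daily_counts = {"like": 70, "comment": 80, "retweet": 20, "report": 5}
NUM_USERS = 100  # hypothetical user count; not given in the notes

score = weighted_engagement_score(daily_counts, ACTION_WEIGHTS, NUM_USERS)
print(score)  # 2.65  (70*1 + 80*2 + 20*3 + 5*(-5) = 265, divided by 100 users)
```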

Why is normalization important?

  • The score is calculated over a period of time for a given number of users. Raw scores calculated over different periods or for different numbers of users are not directly comparable; normalizing puts them on a common scale.
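A small sketch of why this matters, using hypothetical cohorts: two groups with the same per-user daily behavior produce very different raw scores but the same normalized score. Dividing by the number of days as well as users is an assumption added here to cover the "different period" case; the notes themselves only normalize by the number of users.

```python
def normalized_score(raw_score: float, num_users: int, num_days: int = 1) -> float:
    """Raw weighted score per user per day.

    Dividing by days is an assumption added here; the notes only normalize by users.
    """
    if num_users <= 0 or num_days <= 0:
        raise ValueError("num_users and num_days must be positive")
    return raw_score / (num_users * num_days)


# Hypothetical cohorts with identical per-user daily behavior:
# A is 100 users over 1 day; B is 1,000 users over 7 days.
raw_a, users_a, days_a = 265.0, 100, 1
raw_b, users_b, days_b = 18_550.0, 1_000, 7

print(raw_a < raw_b)  # True: raw scores make B look 70x more engaged
print(normalized_score(raw_a, users_a, days_a), normalized_score(raw_b, users_b, days_b))  # 2.65 2.65
```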

Architecture:

References:

Youtube
