Designing a Feed-Based System
Overview of Search Rank Chapter from educative.io and other sources:
Problem: Design a Twitter Feed system that will show the most relevant tweets for a user based on their social graph
Timestamp-based approach: All tweets generated by a user's followees since the user's last visit were displayed in reverse chronological order.
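The timestamp-based approach can be sketched as follows. This is a minimal illustration, not Twitter's actual implementation; the `Tweet` record, the `chronological_feed` function, and the sample data are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record; the field names are assumptions for illustration.
    author: str
    text: str
    created_at: datetime


def chronological_feed(tweets, followees, last_visit):
    """Return tweets by followees posted after last_visit, newest first."""
    fresh = [
        t for t in tweets
        if t.author in followees and t.created_at > last_visit
    ]
    return sorted(fresh, key=lambda t: t.created_at, reverse=True)


# Made-up sample data.
tweets = [
    Tweet("alice", "first", datetime(2024, 1, 1, 9, 0)),
    Tweet("bob", "second", datetime(2024, 1, 1, 10, 0)),
    Tweet("carol", "not followed", datetime(2024, 1, 1, 11, 0)),
    Tweet("alice", "third", datetime(2024, 1, 1, 12, 0)),
]
feed = chronological_feed(
    tweets,
    followees={"alice", "bob"},
    last_visit=datetime(2024, 1, 1, 9, 30),
)
print([t.text for t in feed])  # ['third', 'second']
```

No notion of relevance enters here: ordering depends only on `created_at`, which is the limitation that motivates ranking.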
We need to rank tweets so that the most relevant ones are shown first.
Scale:
- 500 million DAU (daily active users), and on average each user is connected to 100 users.
- Each user fetches their feed 10 times a day.
- 500 million * 10 = 5 billion times per day that the ranking system will run.
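The estimate above can be checked with a few lines of arithmetic. The per-second figure is an addition that assumes requests are spread evenly across the day, which real traffic is not, so peak load would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_active_users = 500_000_000  # from the scale estimate above
fetches_per_user_per_day = 10     # from the scale estimate above

ranking_runs_per_day = daily_active_users * fetches_per_user_per_day

# Assumption: uniform load over the day; this is only an average.
average_runs_per_second = ranking_runs_per_day / SECONDS_PER_DAY

print(f"{ranking_runs_per_day:,} ranking runs per day")            # 5,000,000,000
print(f"~{average_runs_per_second:,.0f} ranking runs per second")  # ~57,870
```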
“Given a list of tweets, train an ML model that predicts the probability of engagement of tweets and orders them based on that score”
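A rough sketch of the "score, then order" part of this problem statement is shown below. It does not train anything: hand-set coefficients stand in for a trained model, and the feature names (`author_affinity`, `tweet_popularity`, `hours_since_posted`), the weights, and the candidate tweets are all hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients standing in for a trained logistic model.
WEIGHTS = {
    "author_affinity": 2.0,
    "tweet_popularity": 1.0,
    "hours_since_posted": -0.1,
}
BIAS = -1.0


def engagement_probability(features):
    """Predicted probability that the user engages with the tweet."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return sigmoid(z)


def rank_tweets(candidates):
    """Order candidate tweets by predicted engagement, highest first."""
    return sorted(
        candidates,
        key=lambda c: engagement_probability(c["features"]),
        reverse=True,
    )


# Made-up candidate tweets with made-up feature values.
candidates = [
    {"id": "t1", "features": {"author_affinity": 0.9, "tweet_popularity": 0.2, "hours_since_posted": 1.0}},
    {"id": "t2", "features": {"author_affinity": 0.1, "tweet_popularity": 0.8, "hours_since_posted": 5.0}},
    {"id": "t3", "features": {"author_affinity": 0.5, "tweet_popularity": 0.5, "hours_since_posted": 0.5}},
]
ranked = rank_tweets(candidates)
print([c["id"] for c in ranked])  # ['t1', 't3', 't2']
```

In a real system the coefficients would come from training on logged engagement data, and the model would likely be far richer than a single logistic layer.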
Goal: Maximize user engagement.
User actions can be positive or negative:
- Positive actions:
  - Time spent viewing the tweet
  - Liking
  - Retweeting
  - Commenting
- Negative actions:
  - Hiding a tweet
  - Reporting a tweet as inappropriate
User engagement metrics:
- Increase user engagement:
  - Focus on increasing the number of comments
  - Increase overall engagement, i.e., comments, likes, and retweets
  - Increase time spent on Twitter
- Average negative actions per user
- Not all engagements are equally important, so each action gets a different weight.
The weighted metric above is calculated as follows:
- In a day, 2000 tweets were viewed.
- There were 70 likes, 80 comments, 20 retweets, and 5 reports.
- The weighted impact of each action is calculated by multiplying its number of occurrences by its weight.
- The weighted impacts are summed to determine the score.
- The score is normalized by the total number of users.
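The calculation steps above can be sketched as follows. The action counts come from the example; the weights and the number of users are not given in these notes, so the values used here are hypothetical placeholders.

```python
# Daily action counts from the example above.
action_counts = {"like": 70, "comment": 80, "retweet": 20, "report": 5}

# Hypothetical weights (assumption): the notes do not specify them.
# Negative actions get a negative weight so that they lower the score.
action_weights = {"like": 1.0, "comment": 2.0, "retweet": 3.0, "report": -10.0}

# Hypothetical number of users for the day (assumption).
num_users = 1000

# Weighted impact = occurrences * weight, per action.
weighted_impact = {
    action: action_counts[action] * action_weights[action]
    for action in action_counts
}

# Sum the weighted impacts, then normalize by the number of users.
raw_score = sum(weighted_impact.values())
normalized_score = raw_score / num_users

print(weighted_impact)   # {'like': 70.0, 'comment': 160.0, 'retweet': 60.0, 'report': -50.0}
print(raw_score)         # 240.0
print(normalized_score)  # 0.24
```

With these placeholder weights, five reports cancel out most of the contribution of 70 likes, which shows how strongly the choice of weights shapes the metric.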
Why is normalization important?
- The score is calculated over a period of time for a given number of users. Scores calculated over different periods or for different numbers of users are not comparable unless they are normalized.
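A small illustration of why raw scores mislead, using two made-up measurement periods. The raw scores and user counts are hypothetical, as is the helper `normalized_engagement`.

```python
def normalized_engagement(raw_score, num_users):
    """Weighted engagement score per user."""
    return raw_score / num_users


# Hypothetical periods: B has a higher raw score but four times the users.
period_a = {"raw_score": 240.0, "num_users": 1_000}
period_b = {"raw_score": 480.0, "num_users": 4_000}

per_user_a = normalized_engagement(**period_a)
per_user_b = normalized_engagement(**period_b)

print(period_b["raw_score"] > period_a["raw_score"])  # True: B looks better on raw score
print(per_user_a > per_user_b)                        # True: A is better per user
```

The same idea applies to time: if the periods differ in length, the score would also need to be divided by the number of days before comparing.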
Architecture:
References:
- Chip Huyen Machine learning systems design
- How to build your own search ranking algorithm
- https://towardsdatascience.com/what-is-machine-learning-system-design-interview-and-how-to-prepare-for-it-537d1271d754
- https://towardsdatascience.com/how-to-answer-any-machine-learning-system-design-interview-question-a98656bb7ff0
- http://patrickhalina.com/posts/ml-systems-design-interview-guide/
- https://www.reddit.com/r/learnmachinelearning/comments/uu5l9b/new%5Fml%5Fspecialization%5Fby%5Fandrew%5Fng/
- https://www.reddit.com/r/learnmachinelearning/
- https://huyenchip.com/machine-learning-systems-design/toc.html
- https://www.theinsaneapp.com/2021/03/system-design-and-recommendation-algorithms.html
- https://fall2019.fullstackdeeplearning.com/
- https://mlsystemdesign.github.io/
YouTube
- https://www.youtube.com/c/joshstarmer/videos
- https://www.youtube.com/channel/UCB3l7wGZMJ5BuQzOiz6aIqA/videos
- https://www.youtube.com/c/BrandonFoltz/search
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg
- https://www.youtube.com/c/Deeplearningai/videos
- https://www.kaggle.com/code/vonneumann/benchmarking-sklearn-classifiers/notebook