Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-twitter-algo
The release of the Twitter algorithm, annotated for recsys
https://github.com/igorbrigadir/awesome-twitter-algo
-
Resources for Learning More about Recsys
-
Ukraine
- Recommender Systems Handbook
- blog
- Recommender Systems course, with a glossary of terms.
- ACM Recsys - State-of-the-art recommender systems research is usually openly available in the [proceedings](https://dl.acm.org/conference/recsys). Many of the presentations are on [YouTube](https://www.youtube.com/@acmrecsys/videos).
- RecSys Wiki
- retrieval, filtering, scoring, and ordering
- systems design interview guide
-
-
Bias and Manipulation
-
Ukraine
- documented here previously - and-policies/crisis-misinformation).
- one of them - related misinformation used for moderation, or warning labels, there is another _safety label_ for Twitter Spaces called _[UkraineCrisisTopic](https://github.com/twitter/the-algorithm/blob/7f90d0ca342b928b479b512ec51ac2c3821f5922/visibilitylib/src/main/scala/com/twitter/visibility/models/SpaceSafetyLabelType.scala#L39)_. Here are some facts about these labels and their function:
- other Twitter Spaces safety labels
- describes a particular policy violation, and usually leads to reduced visibility of the labeled entity in product surfaces
-
Deboosting Rival Sites
-
Elon Musk feature
-
-
Frameworks and Metalanguages
- building Scala and Java services
- Thrift - A cross-platform framework for RPC calls, originally developed at Facebook (Meta)
- Hadoop - Twitter still runs one of the largest installs of Hadoop out there
- Heron - A realtime streaming analytics library (similar to Flink)
- Strato - a virtual database powered by microservices
- recsys
- bazel configuration
- Finagle
-
Input Data
- input data
- Manhattan - A real-time multitenant distributed database that was initially developed as a serving layer on top of Hadoop and includes both observability and other metrics.
-
Candidate Generators
-
GraphJet
- CLICK, FAVORITE, RETWEET, REPLY, AND TWEET as input node types
- GraphJet - A realtime Java graph processing library that allows for in-memory processing on a single server and focuses on providing content recommendations. [Paper here.](http://www.vldb.org/pvldb/vol9/p1281-sharma.pdf) Recommendations are provided based on shared interests, correlated activities, and a number of other input signals. GraphJet maintains a [realtime bipartite interaction graph](https://mathworld.wolfram.com/BipartiteGraph.html) that keeps track of user–tweet interactions over the most recent n hours and reads from Kafka. Each individual GraphJet server can ingest one million graph edges per second and compute 500 recommendations/second.
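The bipartite interaction graph described above can be illustrated with a small sketch. This is not GraphJet's implementation (GraphJet is a Java library, and its recommendation algorithms are described in the linked paper); it is a toy model that assumes edges arrive in timestamp order and uses simple co-engagement counting as a stand-in for the real algorithms. The class name `InteractionGraph` and its methods are hypothetical.

```python
from collections import Counter, defaultdict, deque


class InteractionGraph:
    """Toy in-memory bipartite user-tweet graph with a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.edges = deque()  # (timestamp, user, tweet), oldest first
        self.user_to_tweets = defaultdict(Counter)
        self.tweet_to_users = defaultdict(Counter)

    def add_edge(self, timestamp, user, tweet):
        """Record one interaction; timestamps are assumed non-decreasing."""
        self.edges.append((timestamp, user, tweet))
        self.user_to_tweets[user][tweet] += 1
        self.tweet_to_users[tweet][user] += 1
        self._expire(timestamp)

    def _expire(self, now):
        """Drop edges that have fallen out of the time window."""
        while self.edges and self.edges[0][0] <= now - self.window_seconds:
            _, user, tweet = self.edges.popleft()
            self._decrement(self.user_to_tweets, user, tweet)
            self._decrement(self.tweet_to_users, tweet, user)

    @staticmethod
    def _decrement(index, key, value):
        index[key][value] -= 1
        if index[key][value] == 0:
            del index[key][value]
        if not index[key]:
            del index[key]

    def recommend(self, user, k=3):
        """Rank unseen tweets by how many co-engaging users touched them."""
        seen = set(self.user_to_tweets.get(user, ()))
        scores = Counter()
        for tweet in seen:
            for other in self.tweet_to_users[tweet]:
                if other == user:
                    continue
                for candidate in self.user_to_tweets[other]:
                    if candidate not in seen:
                        scores[candidate] += 1
        return [tweet for tweet, _ in scores.most_common(k)]


graph = InteractionGraph(window_seconds=3600)
for ts, user, tweet in [
    (0, "alice", "t1"), (10, "bob", "t1"), (20, "bob", "t2"),
    (30, "carol", "t1"), (40, "carol", "t2"), (50, "carol", "t3"),
]:
    graph.add_edge(ts, user, tweet)
recommendations = graph.recommend("alice")
```

Keeping both adjacency directions in memory is what makes the two-hop lookup (user to tweets to other users to their tweets) cheap, at the cost of having to expire old edges explicitly.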
-
SimClusters
- SimClusters - Given an embedding (a learned vector representation of an entity, in this case a tweet, event, or topic), SimClusters will return a group of candidate tweets (or events, or topics) that are similar to the input content via [approximate nearest neighbors](https://en.wikipedia.org/wiki/Nearest_neighbor_search#Approximate_nearest_neighbor) lookup using [approximate cosine similarity as a distance metric](https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md#simclusters-approximate-cosine-similarity-core-algorithm).
- built using this algorithm - algorithm/tree/main/simclusters-ann).
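To make the similarity lookup above concrete, here is a minimal sketch of nearest-neighbor retrieval by cosine similarity. It is an exact brute-force search over dense vectors, swapped in for the approximate search over sparse embeddings that the linked README describes. The function names, the `TWEET_EMBEDDINGS` data, and the vector values are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_candidates(query, candidates, k=2):
    """Return the names of the k candidates most similar to the query."""
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embeddings; real ones would be much higher-dimensional.
TWEET_EMBEDDINGS = {
    "tweet_a": [1.0, 0.0, 0.0],
    "tweet_b": [0.9, 0.1, 0.0],
    "tweet_c": [0.0, 1.0, 0.0],
    "tweet_d": [0.0, 0.0, 1.0],
}

top_two = nearest_candidates([1.0, 0.2, 0.0], TWEET_EMBEDDINGS, k=2)
```

A brute-force scan like this is linear in the number of candidates, which is why production systems trade exactness for speed with approximate methods.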
-
-
SimClusters Embeddings Algorithm
-
SimClusters
- tweet similarity and recommend similar tweets to users based on their tweet engagement history.
- Metropolis-Hastings
- 450 million MAU (monthly active users) - long-tail content fairly well. (https://arxiv.org/abs/2110.04596)
- See Paper here
-
-
Simclusters Engineering Implementation
-
SimClusters
- CR mixer.
- SimClusters ANN - source/2018/heron-donated-to-apache-software-foundation) job builds the mapping between SimClusters and Tweets. The job saves the top 400 Tweets for each SimCluster and the top 100 SimClusters for each Tweet.
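The two-way mapping described above (top Tweets per cluster, top clusters per Tweet) can be sketched as a pair of top-k indexes. This is only an illustration in Python, not the production job: the function name `build_topk_indexes`, the input shape, and the sample scores are assumptions. Only the default limits of 400 and 100 come from the text.

```python
import heapq
from collections import defaultdict


def build_topk_indexes(scores, tweets_per_cluster=400, clusters_per_tweet=100):
    """Build both directions of a cluster <-> tweet mapping.

    `scores` maps (cluster_id, tweet_id) pairs to an affinity score
    (a hypothetical input format chosen for this sketch).
    """
    by_cluster = defaultdict(list)
    by_tweet = defaultdict(list)
    for (cluster_id, tweet_id), score in scores.items():
        by_cluster[cluster_id].append((score, tweet_id))
        by_tweet[tweet_id].append((score, cluster_id))

    # Keep only the highest-scoring entries in each direction.
    cluster_to_tweets = {
        cluster_id: [t for _, t in heapq.nlargest(tweets_per_cluster, pairs)]
        for cluster_id, pairs in by_cluster.items()
    }
    tweet_to_clusters = {
        tweet_id: [c for _, c in heapq.nlargest(clusters_per_tweet, pairs)]
        for tweet_id, pairs in by_tweet.items()
    }
    return cluster_to_tweets, tweet_to_clusters


# Made-up scores, with small limits so the truncation is visible.
sample_scores = {
    (1, "t1"): 0.9, (1, "t2"): 0.5, (1, "t3"): 0.7,
    (2, "t1"): 0.2, (2, "t3"): 0.8,
}
cluster_to_tweets, tweet_to_clusters = build_topk_indexes(
    sample_scores, tweets_per_cluster=2, clusters_per_tweet=1
)
```

Capping both directions bounds the size of each lookup result, so a popular cluster or a widely matched Tweet cannot produce an unbounded candidate list.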
-
Filters
-
TwHIN
-
-
TwHIN at Twitter
-
1. TwHIN-Follow
-
2. TwHIN-Engagement
-
-
Training
-
2. TwHIN-Engagement
-
RealGraph
- RealGraph
- (code) - algorithm/blob/main/src/scala/com/twitter/interaction_graph/bqe/scoring/candidates.sql#L15).
- (BQ one-liner)
- (BQ query)
-
Earlybird
-
-
Mixers
-
Earlybird
- CR Mixer - out-of-network recommended candidate tweets.
-
-
Rankers
-
Light Ranker
- documentation
- In-network model
- Out-of-network model
- TWML - algorithm/blob/ec83d01dcaebf369444d75ed04b3625a0a645eb9/src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) project.
- feature weights
-
Heavy Ranker
- parallel masknet - algorithm-ml/blob/main/projects/home/recap/README.md). The [ranker itself](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) is run after the candidate generators.
- there are no content-based embeddings
- Input features - based features in the dataset are not used and weighted as much.
- and weighting are here.
-
Scoring Plan
-
-
Filters
-
Scoring Plan
-
-
Ordering
-
Business Terms and Logic
-
Scoring Plan
- "Unix Epoch"
- Notes - engineering-behind-twitter-s-new-search-experience).
- A/B Testing Platform
- assigning unique IDs
- Who to follow
-
-
Changes
-
Ukraine
-
-
Discussions about The Algorithm Elsewhere
-
Scala
Programming Languages
Categories
- Resources for Learning More about Recsys: 49
- Rankers: 11
- Frameworks and Metalanguages: 10
- Training: 8
- Bias and Manipulation: 7
- Ordering: 7
- Business Terms and Logic: 5
- Candidate Generators: 4
- Simclusters Engineering Implementation: 4
- SimClusters Embeddings Algorithm: 4
- Input Data: 3
- Scala: 2
- TwHIN at Twitter: 2
- Discussions about The Algorithm Elsewhere: 2
- Changes: 1
- Filters: 1
- Mixers: 1
Sub Categories