
From Likes to Lies: The Untold Story of TikTok’s Algorithm
I really enjoyed the boldness with which Arman, who quit his $330k engineering job at TikTok, replied to great questions. Some people on X replied sarcastically that the claims I’m going to show you were made purely for engagement. Maybe. Maybe not.
DISCLAIMER: I have never used TikTok (many reasons, from atrocious bias to data security). Months ago, I wrote a private article about my concerns regarding keylogging. I was advised not to publish it and I didn’t.
I’m firmly against what TikTok secretly did and still does. But at the same time, I recognize beautiful engineering when I see it.
Here’s my compilation of the best questions and answers. I’ve focused on the technical and human dimension. I ignored the political dimension.

I find this a bit excessive so let’s break things down. The TikTok algorithm is based on a lightweight recommendation system named Monolith. It was created by ByteDance engineers.
I’ll share the foundational paper because these engineers deserve the credit.

Nothing would have happened without these engineers. Of course, if you think socially or geopolitically, did many terrible things happen because of them? Come up with your own answer. This article isn’t a blame game: we must focus on other dimensions for clarity.
Monolith is a deep learning framework for large-scale recommendation modeling and its two key features are:
- collisionless embedding tables
- real-time training
Real-time training means the system learns and updates its recommendations as soon as new data comes in. If there’s a sudden trend like a new song everyone starts listening to, the system will quickly adjust its recommendations to show this new trend to users, without delay.
Monolith is built on TensorFlow, a construction kit for building AI models, and it supports batch training alongside real-time training. With batch training, the AI model learns from a large amount of data all at once (helpful to learn from historical/past data).
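To make the difference between batch and real-time training concrete, here is a minimal sketch in plain Python. It is not Monolith’s code and it does not use TensorFlow: the `OnlineClickModel` class, the two features and the learning rate are all hypothetical, chosen only to show a model that first learns from stored history and then keeps updating as each new event arrives.

```python
import math


class OnlineClickModel:
    """Tiny logistic-regression engagement predictor (illustrative only)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the estimated probability that the user engages."""
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, engaged):
        """One stochastic-gradient step on a single (features, label) event."""
        error = self.predict(features) - engaged
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]


def train_batch(model, historical_events, epochs=3):
    """Batch training: sweep a stored dataset several times."""
    for _ in range(epochs):
        for features, engaged in historical_events:
            model.update(features, engaged)


def train_online(model, event_stream):
    """Real-time training: update the model as each event arrives."""
    for features, engaged in event_stream:
        model.update(features, engaged)


# Hypothetical features: [bias, is_new_trending_song]
model = OnlineClickModel(n_features=2)

# Past data contains no trending song at all.
history = [([1.0, 0.0], 1), ([1.0, 0.0], 0)] * 10
train_batch(model, history)
before = model.predict([1.0, 1.0])

# The new song takes off: every live event is an engagement.
live_events = [([1.0, 1.0], 1)] * 20
train_online(model, live_events)
after = model.predict([1.0, 1.0])

print(f"score for the trending song: {before:.2f} -> {after:.2f}")
```

The batch pass never sees the trend, so the score for the new song only moves once the live events start flowing in. That is the whole point of training in real time.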
Collisionless embedding tables help with differentiation and understanding. Suppose you have a bunch of unique stickers, each representing different items or users. In a recommendation system, you want each sticker to have its own unique spot on a board where it won’t overlap with another sticker. You want every item or user to be represented distinctly.
Now you understand the meaning of the keyword COLLISIONLESS: no collision!
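Here is a small sketch of the sticker analogy. Both classes are hypothetical and simplified: the first folds IDs into a fixed number of slots, so two different IDs can land on the same spot, while the second gives every ID its own row. As far as I understand the paper, Monolith relies on a cuckoo hashing design for this; the plain Python `dict` below is only a stand-in to show the idea, not the real implementation.

```python
import random


class HashedEmbeddingTable:
    """Fixed-size table: IDs are folded into num_slots rows, so they can collide."""

    def __init__(self, num_slots, dim, seed=0):
        rng = random.Random(seed)
        self.num_slots = num_slots
        self.rows = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_slots)
        ]

    def lookup(self, item_id):
        # Two different IDs with the same remainder share one row.
        return self.rows[item_id % self.num_slots]


class CollisionlessEmbeddingTable:
    """One row per ID, created on first sight (dict used as a stand-in)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.rows = {}

    def lookup(self, item_id):
        if item_id not in self.rows:
            self.rows[item_id] = [
                self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)
            ]
        return self.rows[item_id]


hashed = HashedEmbeddingTable(num_slots=8, dim=4)
collisionless = CollisionlessEmbeddingTable(dim=4)

# IDs 3 and 11 collide in the hashed table because 3 % 8 == 11 % 8.
shared_in_hashed = hashed.lookup(3) is hashed.lookup(11)
shared_in_collisionless = collisionless.lookup(3) is collisionless.lookup(11)

print("hashed table shares a row:", shared_in_hashed)
print("collisionless table shares a row:", shared_in_collisionless)
```

When two stickers share a spot, every update for one item also changes the other, which blurs the recommendations. Keeping the rows separate avoids that, at the cost of a table that grows with the number of IDs.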
I suggest reading Monolith: Real Time Recommendation System With Collisionless Embedding Table (PDF) to go deeper. Amazingly, the Monolith source code is on GitHub 😀
TikTok’s Back-end programming languages

Assembly was a joke but Go, Python, C++ and Java are all common choices for large-scale applications. I’m sure there are a few other languages (backend services for Android, server-side logic for iOS, the web app, etc.).
TikTok: Scale and user engagement

Imagine the integration of microservices, caching, load balancing, sharding, monitoring… unreal! Imagine 58 minutes of user engagement locked into a single closed ecosystem… Pure addictiveness. A dream for marketers, neuroscientists and those who analyze behavioral patterns.

True love of the TikTok app

Brutal honesty. Refreshing. I personally boycott the app but I refuse to let my disdain cloud my analysis. No pun.
About his teammates

I 100% agree. And let me add that for data giants, what should matter is how people think, not where they live. I have an article in mind on this topic because pay based on location isn’t acceptable. News flash: top talent isn’t paid based on location in the real world (source: me).

Price is what you pay, value is what you get – Warren Buffett.

No answer… but a true answer nevertheless. Top engineers are irreplaceable. I’ve worked with some of the best. It makes a BIG difference. They can be incredibly smart, solve complex problems, and sometimes ruin everything after 3 weeks of all-nighters because of a stupid killall (source: don’t ask —but also yes).
Cybersecurity concerns: nothing to see

I loved the “supposedly” keyword. Although this will be my only comment on the topic of cyber threats, the reality when I investigated was worse than anything I had imagined. Zero trust as far as I’m concerned.
Data Scraping

Wrong. Not the same as America. Different approach. Jilian is right here: it is a little sus.
Bytespider: aggressive ByteDance web crawler
Please don’t get me started on the high crawl rate of Bytespider! ByteDance’s web crawler aggressively scrapes content from websites to fuel ByteDance’s AI models (and for TikTok).
I’ve noticed a complete disregard for small website owners’ content and infrastructure. Bytespider often ignores the crawling directives in the robots.txt file (see RFC 9309 – Robots Exclusion Protocol).
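For readers who have never looked at a robots.txt file, here is a short sketch of what a well-behaved crawler is supposed to do with it. It uses Python’s standard `urllib.robotparser` module. The robots.txt content and the `example.com` URL are made up for illustration, and `SomeOtherBot` is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Bytespider is refused everywhere,
# every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/portfolio"

# A compliant crawler asks this question before every request.
bytespider_allowed = parser.can_fetch("Bytespider", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("Bytespider may fetch:", bytespider_allowed)
print("SomeOtherBot may fetch:", other_allowed)
```

The catch is that robots.txt is a request, not an enforcement mechanism. Nothing in this file stops a crawler that decides not to ask the question.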
There is zero transparency: the purpose and extent of this systematic data collection by Bytespider are not clear at all. High risk of potential misuse + real privacy concerns.
I hope you realize that you could be against TikTok, you could boycott TikTok and yet your data could end up in ByteDance’s obscure dataset because they aggressively scraped content from your website. You have a portfolio? You shared pictures? You have an online CV/Resume with personal data? Your work employment history? Well…
I suggest blocking the Bytespider user agent at the web server level (Nginx/Apache) but that won’t be enough. You should add firewall rules to block all traffic from Bytespider. Easy to do in Cloudflare but ByteDance has a lot of IPs. Let me know if you want a guide to protect your data.
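To show the principle of a user agent block, here is a minimal sketch written as Python WSGI middleware. This is a swapped-in stand-in at the application level, not the Nginx/Apache or Cloudflare setup I recommend above. The names `block_user_agents` and `hello_app` are hypothetical, and matching on the lowercase token `bytespider` is an assumption about what appears in the crawler’s User-Agent header.

```python
def block_user_agents(app, blocked_tokens=("bytespider",)):
    """Wrap a WSGI app and refuse requests whose User-Agent contains a blocked token."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            # Refuse before the request reaches the real application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Placeholder application standing in for your real website."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


protected_app = block_user_agents(hello_app)
```

Keep in mind that a User-Agent header is trivial to change, which is why this kind of filter is only a first layer and the firewall rules still matter.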
Speaking of user privacy

No comment but imagine the compound effect of an average of 58 minutes spent daily on a closed app when your brain isn’t even fully formed yet. You are so young and immature that you’ll spontaneously share anything about yourself and your loved ones. A privacy treasure and a privacy nightmare. No need for eavesdropping (passive listening through smartphone mics) here. I’m not saying it isn’t implemented, I’m just saying you can count on users to do the wrong thing.
Security threats? Yes? No? Maybe?

YouTube Shorts (Google/Alphabet) & Instagram (Meta)

You may think the gap between TikTok and competitors like YouTube and Instagram is subjective. But TikTok has a significant lead in the short-form video market. And that’s primarily because of an algorithmic reality: their algorithms are simply better in some regards.

I also should point out a common misconception: the algorithm alone isn’t responsible for TikTok’s success. It is crucial but there’s an interplay between the algorithm and networked data.

In this question, “networked data” is the mind-blowing amount of data users generate on social platforms (think interactions, preferences, and content consumption patterns). This data is the fuel that powers the TikTok algorithm. The algorithm processes this networked data to personalize the user experience, recommend “relevant” content, and predict what will keep users engaged.

Yep: there will be no selling under any circumstances if the algorithm is included. Why? Because the initial algorithm has been fine-tuned and it reached such an advanced level that it is now quite different from the foundational algorithm. It is no longer a Monolith, it is a Masterpiece. That’s what makes the advanced version “priceless”.
Yesterday, I trolled under a CNBC post claiming Perplexity AI was about to acquire TikTok. I quoted the previous CNBC article contradicting the current CNBC article.

Virality on social platforms

Not great long-term advice for technical and human reasons. Social platforms constantly update their algorithms and if you rely on past successes, you’ll end up failing. What worked before might not work now and what works now won’t work tomorrow. If you think short-term, sure: repeating the same elements might work for a while, but audiences get bored quickly.
But there’s a reason very few engineers are also content creators. There are safe bets however: being honest, respectful and helpful will never go out of style, no matter your level.
For normal people, creating high-quality and engaging content is extremely time-consuming.
Unless you are beautiful, there’s no secret formula, just hard work that no one sees. Sometimes, you can have surprising results because virality means the whole world discovers you in a few hours. For good or bad reasons. Everything is easier for beautiful people, offline and online. Reinforcement!

Each social app can become a prison that follows you everywhere.

© 2025 Elie Berreby – First publication: semking.com on January 20, 2025 at 5 AM GMT+2.
If you want to download the PDF file: From Likes to Lies, The Untold Story of TikTok’s Algorithm (3 MB)