Baidu Cloud Storage

While using #Baidu-Netdisk today, I learned that the author of Pandownload was arrested in April in a cross-province operation, accused of damaging Baidu's interests. The injustice of it made me angry.

Why it is unjust:

  • The author's motive was not profit: I infer that he never intended to make money from this software. He could have joined commercial projects and earned far more than the sponsorship brought in, or simply charged for this very software.
  • Baidu's claim that the author profited is absurd:

    • The author's profit: Sponsorship and profit are fundamentally different. For open source software, especially ad-free open source software, the main source of income is sponsorship (tips): users give because they appreciate the code and its features, value the convenience, and want to support the author. That is not profit from selling software as a service.
    • The flaw in the loss Baidu claims: The figure is computed as the software's operating time × the number of users served, concluding that the author helped users save tens of millions of RMB and that this amount is Baidu's loss. In reality, not everyone would have bought a membership. For example, today I downloaded a 4.2 GB file at a crawl and still did not pay. I had two reasons: first, buying a month and using it only once feels wasteful, even at just 30 yuan/month; second, I do not want to support an evil software company.
    • Collusion between company and government: One could say that Baidu, as a slave of the CCP, enjoys its protection, so that large companies also come to wield public power. Similar cases include Hongmao Medicinal Liquor and the Huawei 251 incident.
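The objection to the loss figure above can be made concrete with a small sketch. It assumes the claimed loss multiplies operating time, users, and the membership fee, and it introduces a hypothetical `would_pay_fraction` for the share of users who would really have bought a membership. All input numbers are invented for illustration; only the 30 yuan/month fee comes from the text.

```python
def claimed_loss(months_active: int, users: int, monthly_fee: float) -> float:
    """Loss if every user had paid the fee in every month (the disputed assumption)."""
    return months_active * users * monthly_fee


def adjusted_loss(months_active: int, users: int, monthly_fee: float,
                  would_pay_fraction: float) -> float:
    """Same estimate, scaled by the share of users who would actually have paid."""
    if not 0.0 <= would_pay_fraction <= 1.0:
        raise ValueError("would_pay_fraction must be between 0 and 1")
    return claimed_loss(months_active, users, monthly_fee) * would_pay_fraction


# Hypothetical inputs: 12 months, 100,000 users, 30 yuan/month, 5% would pay.
months, users, fee = 12, 100_000, 30.0
print("claimed: ", claimed_loss(months, users, fee))
print("adjusted:", adjusted_loss(months, users, fee, 0.05))
```

Whatever the real inputs are, the adjusted figure shrinks in direct proportion to the share of users who would never have paid.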
  • Thinking: Why can't another company replace Baidu Netdisk?

    • I watched [Bilibili: paperclip video](https://www.bilibili.com/video/BV1Ex411f7r4). According to it, the operating cost of Baidu Netdisk has three main parts:
      1. Hardware investment: hard disks, servers, network cabling, machine rooms, etc., adding up to about 100 million yuan.
      2. Electricity cost: 200 million yuan per year.
      3. Network fee: 400 million yuan per year.
    • I think this calculation has problems.
      1. Hardware: The real disk capacity needed is relatively small, much smaller than the total online space users see. Across users' netdisks, most files are identical copies stored many times. If a file index is checked at upload (computed on the user side), the server can avoid storing duplicates, and a compression algorithm can run in the background if needed. Sharing a file does not even need a new index entry; it only grants permission. For a large user base, the disk capacity actually required may be less than one tenth of what users perceive.
      2. Electricity: Servers and disks do not have to run all the time. Depending on purpose and access frequency:
        • Files for individual/group storage (photos, etc.) are rarely accessed. If no user is active, the disk can be powered down; when a read is needed, a high-priority command starts the corresponding disk. Furthermore, user habits are predictable: a few large blocks of time.
        • Files meant for sharing: If a preview function is offered, e.g. for PDFs, keep the beginning of these files on always-on servers to answer requests. If only download is offered, it is even easier: serve the first data from servers that are already running, then power on the others.
        • In addition: Electricity is mainly spent on computation; reading data should cost little. TOLEARN (not sure)
      3. Network fees:
        • Bandwidth may come from P2P: When I download with Baidu Netdisk, my computer becomes very slow. I suspect a P2P function is running in the background. Baidu only needs to host the P2P matching service (pairing the downloader with users who already have the file), so it does not need to maintain such high server bandwidth.
        • High margin on high-bandwidth membership: Ordinary members do not get the download acceleration service; only super members paying ¥30 per month do.
        • Even a user who pays ¥30 may use acceleration only once that month. One accelerated download then costs ¥30, which is very profitable for Baidu.
        • The cost of bandwidth: Is bandwidth really that expensive? I think even 4G data plans are cheaper than Baidu's bandwidth. Moreover, file transfer is not sensitive to connection latency, and a wired connection should cost far less than mobile data. Bandwidth should be much cheaper than claimed.
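The deduplication idea in the hardware item can be sketched as a content-addressed store. This is a toy, in-memory illustration and not a description of how Baidu Netdisk actually works: the `DedupStore` class and its method names are hypothetical, and SHA-256 stands in for whatever file index a real service would use. In a real client the hash would be computed on the user's machine and sent first, so identical files never need to be uploaded at all.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one physical copy of each distinct file content."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}                  # content hash -> bytes
        self.user_files: dict[tuple[str, str], str] = {}   # (user, name) -> hash

    def upload(self, user: str, name: str, data: bytes) -> bool:
        """Record a file for a user; return True only if new bytes were stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.user_files[(user, name)] = digest
        return is_new

    def share(self, owner: str, name: str, recipient: str) -> None:
        """Sharing adds a reference to the same content; no bytes are copied."""
        self.user_files[(recipient, name)] = self.user_files[(owner, name)]

    def logical_bytes(self) -> int:
        """Space as users perceive it: every reference counts in full."""
        return sum(len(self.blobs[d]) for d in self.user_files.values())

    def physical_bytes(self) -> int:
        """Space actually occupied: each distinct content counts once."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
movie = b"x" * 1000                           # stand-in for a popular file
store.upload("alice", "movie.mkv", movie)     # stored
store.upload("bob", "film.mkv", movie)        # duplicate, only referenced
store.share("alice", "movie.mkv", "carol")    # permission only
print(store.logical_bytes(), store.physical_bytes())
```

The ratio between the two printed numbers is the gap between perceived and required capacity that the text estimates; the one-tenth figure itself remains the author's guess.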
    • The ideal charging model: Let users pay for data traffic, priced somewhat above cost. If bandwidth is expensive in China, the state-owned enterprises' monopoly is partly to blame.
    • Foreign cloud storage services are more pleasant to use than Baidu's, for example #Mega, which I used before.
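The difference between the flat membership and the pay-for-traffic model proposed above can be worked through in a few lines. The ¥30 fee and the 4.2 GB download come from the text; the per-gigabyte price is a purely hypothetical placeholder, since the text gives no real bandwidth cost.

```python
def effective_cost_per_download(monthly_fee: float, downloads: int) -> float:
    """What one download really costs under a flat monthly membership."""
    if downloads <= 0:
        raise ValueError("downloads must be positive")
    return monthly_fee / downloads


def traffic_bill(gigabytes: float, price_per_gb: float) -> float:
    """Bill under a pay-for-traffic model."""
    if gigabytes < 0 or price_per_gb < 0:
        raise ValueError("gigabytes and price_per_gb must be non-negative")
    return gigabytes * price_per_gb


MONTHLY_FEE = 30.0            # yuan, super membership price from the text
FILE_SIZE_GB = 4.2            # the download mentioned in the text
HYPOTHETICAL_PRICE = 0.5      # yuan per GB, invented for illustration

print("flat fee, one download:", effective_cost_per_download(MONTHLY_FEE, 1))
print("pay per traffic:       ", traffic_bill(FILE_SIZE_GB, HYPOTHETICAL_PRICE))
```

Under these assumed numbers the occasional user pays far less per download with traffic pricing, while a heavy downloader might still prefer the flat fee.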
  • Postscript: I wanted to write about this because yesterday I looked at #Roam-Research. Its high price cost it many users and its reputation suffered, because people who pay a lot for a service expect higher quality. Against that background, #Obsidian and #org-roam appeared right away. The difficulty of these services lies not in technology but in ideas and innovation. I also do not support Roam-Research because it sells SaaS, not code. I am willing to pay for code, but the profit margin between code and service should not be that large.
  • Other ideas

    • The cost of providing netdisk service is very low. In mainland China, development costs may rise because content has to be censored with new algorithms. Even so, spread over a huge user base, the average cost is tiny. (I think no one wants censorship in their own cloud storage service.)
    • Because of content censorship, Baidu has state support and can charge high fees.
    • Start an "Honest Company": for example, tell customers the cost of data traffic when they download, and what the developers are paid. While it is a one-person operation, a negative balance is acceptable (it may be earned back later). For new feature requests, put a price on the request (like an auction) or publish the cost of hiring someone to do it. Publishing tasks and accepting requests also involves language-understanding problems, and NLP can be used to standardize the requirements (generating tags is one form of standardization).

Chinese version: 今天我用百度云盘的时候发现 Pandownload 的作者已在四月被跨省抓捕,罪责是损害了百度的利益。 我为此打抱不平,有些愤怒。

  • 打抱不平的原因:

    • 作者的动机不是盈利:我推断,作者没有想依靠这个软件盈利。他完全可以开发盈利性项目,赚到的钱比赞助费多。(甚至直接对这个软件收费)
    • 百度主张的作者获利很荒谬:
      • 作者获利:赞助和利润有本质上的区别。开源软件,特别是免广告的开源软件。经济利益的主要来源是赞助,或者叫打赏,是出于对代码/功能的欣赏,和使用时感受到的便利,对作者表示支持。这不是软件作为服务出售的利润。
      • 百度声称的损失中的纰漏:根据软件的运营时长×服务人数,计算出作者帮助了多名用户省了上千万人民币,声称这是百度的损失。 实际上不是每个人都会买会员。 比如我今天用龟速下载了4.2G文件,没有支付会员。原因有二:一是,买一个月,但只用一次,心理上感觉浪费了,虽然只是30元/月。二是,不想支持一个邪恶的软件公司。
    • 公司和政府勾结:可以说,百度作为奴隶,被中共保护。大公司也拥有了公权力。类似的事还有鸿茅药酒,华为251事件等。
    • 思考:为什么不能有其他公司代替百度网盘?
    • 看了哔哩哔哩的回形针视频, 百度网盘的运营成本主要是三部分:
      1. 硬件投入:硬盘,服务器,网线机房等,加起来大概一亿
      2. 电费:每年二亿。
      3. 网费:每年四亿。
    • 我觉得他成本计算的有问题。
      1. 硬件:真实硬盘容量很少。硬盘容量的需求远小于用户的在线存储空间。一群用户的网盘中,存储的大部分是多次重复的相同文件。上传时检测索引的方式可避免在服务器硬盘中储存重复文件,后台还可以用压缩算法。分享的文件不需要生成索引,直接给用户开放权限即可。对大量用户,实际需要的硬盘容量可能不到用户感知到的十分之一。
      2. 电费:服务器和硬盘不必一直开启。根据用途和频次不同:
        • 个人/团体的储存性文件(如照片等),使用到的频次很少。如果要是没有用户查看,硬盘可以关闭。需要读取时,用一个高优先级的指令启动对应的硬盘。而且用户的习惯可以预测,少次的大块时间。
        • 以分享为目的的文件:如果提供预览功能,如PDF,可以把这些文件的开头部分放在一直开启的服务器,调用到时再打开。如果只提供下载功能,需求更少,从正在运行的服务器直接先提供数据,再开启其他硬盘即可。
        • 另外:电量主要花在计算方面,读取数据花了电费,应该不多。TOLEARN(不确定)
      3. 网费方面。
        • 带宽有可能来自P2P:我在使用百度网盘下载的时候,电脑变得很慢很慢。 我认为他后台开启了P2P功能。百度只需要配对已有文件的用户和正在下载用户。这些技术都是现成的。 所以说就算服务器的带宽不大,也没有问题,因为还有用户带宽。
        • 大带宽收费的高额利润:百度的普通会员不能享受下载加速服务,只有每月¥30的超级会员才可以。
        • 用户支付了¥30,但一个月内只下载了一次,相当于下载一次的成本就是¥30,这个成本还是比较高的,因此百度的利润很高。
        • 带宽的成本:带宽有那么贵吗?我认为甚至4G套餐都没有百度的带宽贵。而且文件传输的特性是对连接的延迟不敏感,通过有线连接的方式,成本应该比手机套餐低很多。
    • 理想中的收费模式:让用户购买流量,可以比成本价高一些。如果带宽在国内很贵,也有国企垄断造成的原因。
    • 为什么国外的网盘用着比百度的舒服。 比如,我之前用的 mega 。
  • 后记:想说这个主题,也是因为昨天看了 roam research 这个软件。 因为收费太高,导致很多用户流失,同时口碑很差,因为支付高额服务费的人一定希望更高的服务质量。在这个基础上,立刻就有人写了 Obsidian 和 org-roam。这些服务的难点不在于技术而在于想法和创新。我不支持 Roam Research 也是因为他出售的 SaaS ,而不是 Code 。我愿意为 Code 支付,但 code 和 service 之间的利润空间不应该那么大。
  • 其他想法

    • 网盘这项服务成本很低,可能在大陆因为需要用各种新算法来审查内容,开发费用会上升。即使如此,平均到庞大的用户群身上的成本也很小。(我认为没人想在自己的网盘服务中进行审查)
    • 因为内容审查,百度有了国家支持,可以收费很高。
    • 做一个"诚实的公司Honest Company",比如下载时候告诉用户流量的成本, 开发人员的工资等。一个人做的时候可以接受负数盈利(以后再挣回来)。如果要开发新的需求可以对需求标价(类似拍卖),或者写出雇佣员工的成本。其次,发布任务和接受需求时,会有语言理解方面的问题,也可以用 NLP 来标准化(生成 tag 就是标准化的一种)。