A million-yuan annual salary to recruit fresh graduates with master's and doctoral degrees
Dr. Wang Shaodi is a rare young founder in the industry who not only understands technology and the market but is also skilled at articulating and sharing his ideas. He was admitted to Peking University through the 2007 National High School Physics Competition. After graduation, he went to the University of California, Los Angeles to pursue a master's degree and a doctorate, focusing on research in memory- and computing-related fields.
In 2017, he founded Zhicun Technology and has served as its CEO since. In just five years, the company mass-produced the world's first in-memory computing chip and brought it into commercial use in millions of devices; its annual revenue now exceeds 100 million yuan. In this interview, Dr. Wang Shaodi shared with DeepTech the present and future of in-memory computing, introduced Zhicun Technology's positioning in the era of multimodal large models, and discussed the newly released "Genius Doctor Plan".
AI is shifting from a computing-centric to a memory-centric approach.
When talking about the original intention of founding Zhicun Technology, Dr. Wang Shaodi recalled, "At the end of 2016, AI was very popular. As AI shifted from machine learning to deep learning, the performance requirements for memory were getting higher and higher. We saw this trend, but at that time, the concept of 'computing-in-memory' was quite new, and some international giants were not very active in investing in this technology. We didn't want to miss this new opportunity, so we finally decided to start our own business to develop computing-in-memory technology. Now we are very fortunate to have made this decision. The changes in the past few years have been very fast, and the current large models have a demand for memory capacity and bandwidth that is 10,000 times higher than the deep learning models at that time."
Computing-in-memory is a new computing architecture that overcomes the bottleneck of the traditional von Neumann architecture by fusing computation and storage; combined with advanced packaging and new storage devices, it can further improve computing efficiency. "The computing unit of an in-memory computing chip is also a storage unit. By modifying the topology of the memory, what is read out of the memory is actually the result of a matrix multiply-accumulate. The multiplications and additions are not implemented by multipliers and adders but are completed directly by the physical properties of the memory cells, which greatly reduces memory reads and writes and yields an order-of-magnitude improvement in computing efficiency," he explained.

In Wang Shaodi's view, "When AI shifts from being computation-centric to memory-centric, the entire cost of computation and the performance bottlenecks all fall on the memory side. For example, about 70% of the cost of a GPU today is memory, and the performance bottleneck is also in memory (more than 80% of performance depends on it). Developing in-memory computing technology is therefore really about increasing the effective bandwidth on the memory side to relieve the compute bottleneck. After all, one of the current bottlenecks in AI computation is read bandwidth, and improving the efficiency of data reading can greatly enhance computational efficiency."
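To make the idea above concrete, here is a minimal numerical sketch of analog in-memory matrix-vector multiplication. It is not Zhicun Technology's implementation; the idealized crossbar model, the matrix sizes, and the function names are assumptions chosen only to illustrate how sensing a memory array can directly yield multiply-accumulate results and avoid per-weight memory reads.

```python
import numpy as np

# Minimal, idealized sketch of analog in-memory matrix-vector multiplication.
# It only illustrates the idea described above: the memory array itself produces
# multiply-accumulate results, so the weight matrix never has to be read out and
# moved to separate multipliers and adders.

rng = np.random.default_rng(0)

# Weight matrix "programmed" into the memory array (e.g. as cell conductances).
weights = rng.uniform(0.0, 1.0, size=(128, 64))   # 128 outputs x 64 inputs

# Input activations applied to the array (e.g. as word-line voltages).
inputs = rng.uniform(0.0, 1.0, size=64)

def von_neumann_mac(w, x):
    """Conventional path: every weight is fetched from memory, then multiplied
    and accumulated in separate arithmetic units (memory traffic ~ w.size reads)."""
    out = np.zeros(w.shape[0])
    for i in range(w.shape[0]):
        for j in range(w.shape[1]):
            out[i] += w[i, j] * x[j]          # one memory read per weight
    return out

def in_memory_mac(w, x):
    """Idealized in-memory path: applying the inputs to the array and sensing the
    accumulated column outputs directly yields the dot products (one 'read' per
    output line), which physically corresponds to I = G @ V in a crossbar."""
    return w @ x

assert np.allclose(von_neumann_mac(weights, inputs), in_memory_mac(weights, inputs))
print("Weight fetches avoided per matrix-vector product:", weights.size)
```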
Regarding how the current in-memory computing architecture can be further optimized, Wang Shaodi believes there is a lot of room for improvement. "From a fundamental perspective, the in-memory computing technology we are developing today can only be built within the existing memory system. However, existing memories are mostly optimized for storage rather than for computation. If you want to optimize a memory for computation, you need to sacrifice some storage density. For example, reducing the storage density by several times can yield a 1,000-fold performance improvement."
"From a design perspective, with the support of advanced packaging, the redesign may also bring about a performance improvement of about 10 times," he pointed out. "For example, performing calculations in memory will inevitably increase the circuit for control. If the circuit is arranged in the chiplet (instead of in the memory) and then connected together through advanced packaging, then the optimization of the process around the memory system can be more extreme. Design and packaging, there is a very large space to explore in these two dimensions," he said.
Taking the lead in mass production and commercial use, Zhicun enters its third stage: stimulating larger application scenarios and energizing the ecosystem.
The development of Zhicun Technology since its founding can be roughly divided into three stages. "In the first stage, storage-computing integration was widely regarded by the industry as unattainable. Our idea at the time was to first build a demo of an in-memory computing chip that could run simple algorithms, to prove to the industry that in-memory computing chips work and can deliver very strong performance even on a non-advanced process node," Wang Shaodi explained. "In 2019, we developed a prototype in-memory computing chip using analog computing that could run large algorithms with only 8M of capacity, a first in the industry," he added.
When moving from the early laboratory stage to the market, a product needs to form a commercial closed loop and be cost-effective and reliable. "After 2019, the company entered its second stage: the team brought in talent in software, SoC design, and other fields and started with a small scenario (wearable electronic devices)," he said. "After two years of R&D, we released the first in-memory computing chip in 2021, and then spent another year bringing it to mass production." He stated, "This stage was mainly about proving that in-memory computing chips have advantages in performance, reliability, and cost-effectiveness, which are very important factors for a new technology."
Starting in 2022, after the in-memory computing chip had been mass-produced and put into commercial use, Wang Shaodi led the team to think about how to further improve chip performance. The company's development entered its third stage, exploring more cutting-edge technologies around storage-computing integration and developing more efficient computing techniques. "As chip performance improves, the scenarios and range of applications it can cover become broader, so next we will continue to upgrade energy efficiency, cost, computing power, and so on," he said.
In Wang Shaodi's view, storage-computing integration also follows the trajectory of Moore's Law, and there will be tens- to hundreds-fold performance improvements every year over the next 5-10 years. "To achieve higher computing power per unit area for in-memory computing chips, the R&D team is reworking the chip, including the underlying array design, and exploring its theoretical limits," he said. "In the early stage we developed products around known problems; now we are steadily approaching the theoretical limits by solving those problems. As the research has deepened we have achieved results: the chip's computing power per unit area has improved 20-fold in two years, and the chip's integration scale can also improve by 10-20 times per year."
Taking the second-generation 3D in-memory computing architecture independently developed by Zhicun Technology as an example, "the 3D in-memory computing architecture brings two main advantages: first, it decouples the storage-computing integration part from the data I/O part, achieving a 'fine division of labor' in performance; second, storage capacity can be further increased," Wang Shaodi pointed out.

Regarding chip power consumption, he explained, "The high power consumption in AI computing mainly consists of two parts. One is the power consumption of the computation itself, which today is largely limited by heat dissipation. The other is I/O power consumption, which inevitably arises when a single chip cannot complete a complex task and multiple chips have to be integrated together, including the power spent on I/O and data communication. In-memory computing fundamentally cuts the memory reads and writes in computation, and combining the parts in a 3D manner also greatly reduces the I/O power consumption. Overall, a rough estimate suggests the potential for a 50-100 times reduction in power consumption."
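As a rough illustration of how such an estimate can be structured, the sketch below splits per-operation energy into compute, off-chip memory access, and chip-to-chip I/O. All of the constants are placeholder order-of-magnitude assumptions, not measurements of Zhicun's chips or figures from the article; the point is only that when data movement dominates, removing most of it leaves very large headroom.

```python
# Back-of-envelope sketch of where energy goes in a conventional accelerator versus
# an idealized in-memory + 3D-stacked design. Every constant below is an assumed,
# illustrative order-of-magnitude figure -- NOT a measurement of Zhicun's chips.

MAC_ENERGY_PJ      = 1.0    # assumed energy of one on-chip multiply-accumulate
OFFCHIP_READ_PJ    = 100.0  # assumed energy to fetch one operand from off-chip memory
CHIP_TO_CHIP_IO_PJ = 30.0   # assumed energy to move one operand between chips

NUM_MACS = 1_000_000        # hypothetical workload size

# Conventional path: every weight is fetched from off-chip memory, and a fraction of
# the traffic also crosses chip-to-chip links when a model spans several dies.
conventional_pj = NUM_MACS * (MAC_ENERGY_PJ + OFFCHIP_READ_PJ + 0.2 * CHIP_TO_CHIP_IO_PJ)

# In-memory + 3D path (idealized): weights stay inside the compute array, and 3D
# stacking keeps most remaining traffic on short vertical links, so the assumed
# per-MAC data-movement overhead shrinks to a small residual fraction.
in_memory_3d_pj = NUM_MACS * (MAC_ENERGY_PJ + 0.01 * OFFCHIP_READ_PJ)

print(f"conventional: {conventional_pj / 1e6:.1f} uJ, in-memory 3D: {in_memory_3d_pj / 1e6:.1f} uJ")
print(f"estimated reduction under these assumptions: ~{conventional_pj / in_memory_3d_pj:.0f}x")
```

Under this particular (arbitrary) set of assumptions the ratio lands in the same ballpark as the figure quoted above, but the result is entirely determined by the assumed constants.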
At present, large models on the edge side are limited by model size and output speed: they can only perform simple tasks such as speech-to-text and speech synthesis, and lack common-sense and summarization abilities because the model dimensions are small.
Focusing on the market application of large models on the edge side, Wang Shaodi believes, "The AI field has always been driven by computing power. At this stage, the edge side is still a blue-ocean market, because there is currently no edge-side chip that can run a truly large model. At the same time, the edge side has a very strong demand for real-time responsiveness: reducing latency makes people rely more on these applications and stimulates more application scenarios."
He stated that only excellent products can stimulate application scenarios and thereby open up a larger ecosystem and market. For example, the reason CPUs were able to form an ecosystem is largely that they became tightly bound to operating systems, and the current popularity of AI is largely attributable to Nvidia's AI chips, whose growing computing power has enabled more and more application scenarios.
"In fact, including the emergence of large models and the changes in deep learning before, they are inseparable from the improvement of AI chip computing power. The significant increase in computing power has stimulated many application scenarios in the cloud. At present, mainstream AI chips have established a good ecosystem in the cloud, but it is very difficult to establish an ecosystem on the edge side. This is an opportunity for us, relying on in-memory computing technology to stimulate a wide range of application scenarios on the edge side, which can naturally form an ecosystem and then form a barrier," he said.On the edge side, IDC predicts that by 2025, the total number of global IoT devices may exceed 40 billion, generating data volume up to 80ZB. In many scenarios such as smart cities, smart homes, and autonomous driving, more than half of the data will need to rely on local processing on the edge side.
"In terms of product development, the edge side is one of our layout directions. We focus on developing in-memory computing chips to enhance the capabilities of large models and control costs. The goal is to improve computing power to stimulate more application scenarios, and then form an ecosystem to bring a larger market," said Wang Shaodi.
The other direction is binding products through cooperation with leading customers. "In-memory computing can deliver stronger performance in the same area. For example, for edge-side devices, an in-memory computing chip developed on a mature process node can exceed the computing efficiency of industry-leading chips by more than 5 times," he said. "While reducing costs, we also improve performance, and that requires more suitable and powerful applications to make full use of the improved performance. Therefore, we bind ourselves to leading customers to innovate together on algorithms, application methods, product effects, and more."
Launch the "Genius Doctor Plan", collect fresh master and doctoral talents with a million-level salary.It is understood that in May of this year, Zhicun Technology established the "In-Memory Computing Joint Laboratory" with Peking University, launched the "industry-academia-research" integration strategy upgrade, and will continue to invest nearly 100 million yuan to strengthen in-depth cooperation with top universities across the country. In the near future, it will also establish cooperation with other universities.
"In the academic community, in-memory computing is also a very popular field. I have led my team to visit many universities for in-depth exchanges and have seen many innovative ideas in the academic community that have not yet been seen in the industry. Moreover, many ideas in the academic community are actually very mature, which is the focus of our industry-academia-research integration," said Wang Shaodi.
"For example, around some mature market application scenarios, the technical difficulty of developing in the academic community is very high, but in a new application scenario, the technology in the academic community may be directly applicable. Transplanting these technologies from the academic community to the industry, we believe this is a very good integration point between the academic community and the industry," he said. "For example, a new technology needs to achieve 10 in the mature storage market, and the academic community has currently achieved 1. However, in the new field of in-memory computing, this technology may only need to achieve 2, which can be more quickly promoted to industrialization. At the same time, we can also propose some requirements to the academic community. The technological revolution brought by AI is very fast, and we hope to tell the academic community what we see and guide the academic community to carry out research in some new directions."
The industry-academia-research strategy upgrade is an important step for Zhicun Technology in advancing in-memory computing innovation. In addition, talent cultivation and team capability building are core strategies for the company's future development. "In the seven years since its founding, the proportion of R&D personnel at Zhicun Technology has always stayed above 80%, and we have kept doing innovative research. Through this interview, I would also like to invite and warmly welcome fresh graduates with master's and doctoral degrees who share the same technical dreams to join Zhicun Technology," Wang Shaodi emphasized at the end of the interview.
It is reported that, to build a more innovative R&D team, Zhicun Technology has launched the "Genius Doctor Plan" to attract fresh master's, doctoral, and postdoctoral graduates from around the world to jointly devote themselves to the research and productization of advanced in-memory computing technology. "We especially welcome those with a solid background in physics, mathematics, and chemistry who are willing to explore and do research in semiconductor devices, processes, circuit design, algorithms, and related fields," he added. "Of course, there are many good companies and platforms that value talent. As a leading enterprise in in-memory computing chips, Zhicun Technology not only offers million-yuan compensation packages, but also provides more core positions and full freedom in R&D projects. We believe that underlying technological innovation can bring significant progress at the application level, and talent is definitely the driving force behind underlying technological innovation."