The Vector Database Wars: Pinecone, Weaviate, Milvus and the Battle for the AI Data Layer
This is not a homogeneous founder set.
Pinecone is closest to the “top infrastructure research leader turns founder” archetype, centered on Edo Liberty. Weaviate is closer to an “open-source community + product narrative + developer growth” company, with Bob van Luijt as the main public-facing founder, Etienne Dilocker as the technical co-creator, and Micha Verhagen appearing much less often in public materials. Milvus is best understood as an open-source database incubated by Zilliz, with Charles Xie as the clearest and most verifiable founder-level figure.
Their shared insight was deeper than “vector databases are useful.”
All three founders bet on the same structural gap: in the LLM era, the real bottleneck is not only the model but also the production-ready retrieval, memory, vector storage, and service layer around it. Pinecone answered with a managed infrastructure-first approach that expanded into serverless and knowledge products; Weaviate evolved from open-source semantic search into an AI-native database and cloud platform; Milvus started as an open-source database and Zilliz built the commercial cloud layer around it.
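The "production-ready retrieval layer" all three companies target reduces, at its core, to one primitive: store embeddings, then return the ones most similar to a query vector. A minimal brute-force sketch of that primitive follows; the document IDs and 3-dimensional vectors are invented for illustration, and real products such as Pinecone, Weaviate, and Milvus replace the linear scan with approximate indexes (e.g. HNSW or IVF) plus filtering, persistence, and scaling.

```python
# Toy sketch of the core operation a vector database productionizes:
# rank stored embeddings by cosine similarity to a query vector.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(index, query, k=2):
    """index: list of (doc_id, vector); returns the k most similar doc_ids."""
    ranked = sorted(index, key=lambda item: cosine(item[1], query), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical embeddings standing in for model output.
index = [
    ("doc-cat", [0.9, 0.1, 0.0]),
    ("doc-dog", [0.8, 0.2, 0.1]),
    ("doc-car", [0.0, 0.1, 0.9]),
]
print(top_k(index, [1.0, 0.0, 0.0], k=2))  # → ['doc-cat', 'doc-dog']
```

The gap the founders bet on is everything this sketch omits: a linear scan is O(n) per query, so serving billions of embeddings at production latency requires approximate indexes, sharding, and managed operations, which is precisely the layer these companies sell.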
Public information density is uneven.
Edo Liberty and Bob van Luijt are comparatively well documented through interviews and biographies. Charles Xie has rich technical and company-level coverage, but little family or early-life detail. Etienne Dilocker and Micha Verhagen have much thinner public records on family, education, and formative years. Wherever the record is thin, the most accurate phrasing is simply: public information is limited / accounts vary / cannot be fully confirmed.
Pinecone’s founder profile is that of a deeply technical infrastructure builder.
Edo Liberty publicly links his “Israeli upbringing” to his bias toward pioneering new things. Exact details about his birth date, birthplace, parents, and family class are not systematically public. What is public is his educational path: Tel Aviv University for physics and computer science, then Yale for a Ph.D. in computer science and postdoctoral work in applied mathematics. He has described starting out wanting to be a physicist and discovering that computation and algorithms would become central to his work.
Before founding Pinecone, Liberty accumulated unusually relevant experience.
He joined Yahoo’s Israel research center in 2009, moved to the U.S. in 2012 to build Yahoo’s scalable machine learning group, and later became a research director at AWS and head of Amazon AI Labs. Public bios credit his teams with work on services including SageMaker, Kinesis, QuickSight, Amazon Elasticsearch, Glue, Rekognition, Personalize, and Forecast. Pinecone’s later product posture—production-first rather than lab-first—directly reflects that background.
Pinecone’s founding logic was straightforward and powerful.
The company’s official origin story says Liberty founded Pinecone in 2019 after seeing how powerful vectors and AI models were in real applications, while also seeing how difficult it was for most teams to productionize that stack. In 2021 Pinecone publicly launched its vector database and announced a $10 million seed round led by Wing Venture Capital. The bigger leap came in 2024, when its serverless architecture reached general availability; Pinecone said more than 20,000 organizations had used it in preview and that 12 billion embeddings had already been indexed on the new architecture. By 2025 the company said it had more than 5,000 customers and had raised $138 million in total.
Pinecone’s critical decisions were strategic, not cosmetic.
The first was leaving AWS in 2019. The second was rearchitecting around serverless in 2024. The third was the 2025 leadership transition in which Ash Ashutosh became CEO and Liberty became Chief Scientist. That move looks like a classic transition from a technical founder leading 0-to-1 to a professional operator scaling 1-to-N.
Pinecone’s most concrete public negative event was its March 2023 incident.
The company’s own postmortem says a free-tier cleanup script accidentally deleted 515 Starter-plan indexes, all of which were later restored. Beyond that, the more persistent criticism has been category-level: outside observers have argued that vector databases were overhyped and increasingly risk becoming a feature inside larger retrieval stacks rather than a standalone moat. Pinecone, because of its financing and brand position, became a focal example in that debate.
Weaviate’s founding story is unusual because the founder narrative itself has layers.
TechCrunch and a 2022 company newsletter frame the founding team as Bob van Luijt, Etienne Dilocker, and Micha Verhagen. A 2023 rebrand announcement, however, emphasizes Bob and Etienne more heavily. So even at the source level, there is a difference between the narrower “core founding pair” narrative and the wider “cofounding trio” narrative.
Bob van Luijt’s early life is the clearest documented part of Weaviate’s founder story.
He says he was born in 1985, grew up in the Netherlands, moved within the country during childhood, and got into software after his father brought home an IBM computer and he taught himself from a QBasic book he found at the library. Before high school he was already building websites. Public materials do not systematically document his parents’ professions or exact household class, but they do show early access to computers, books, and music training.
Bob’s education and worldview are unusually influential on product philosophy.
He studied music at ArtEZ and Berklee, later completed executive education at Harvard Business School, and repeatedly describes software through the lens of language, structure, and artistic expression. His early work was not a classic big-tech career path: he ran software businesses, took client work, operated Kubrickology, and only later pulled these threads together into Weaviate. Two major influences stand out in his own retelling: seeing word embeddings around 2015, and hearing Sam Ramji speak about open-source business models.
Etienne Dilocker is the engineering core of the Weaviate origin story.
Public company writing says he was effectively the founding engineer and the person who hands-on built the first product. Bob also credits Etienne with the key idea of building an end-to-end database where vector embeddings were first-class citizens. His family background and fuller biographical details are publicly limited. Micha Verhagen appears as COO/cofounder in multiple sources, but his education, early career, and family details are much less publicly documented than Bob’s.
Weaviate’s project history predates its company history.
Bob traces the concept back to 2016, when ideas around “things,” graphs, and semantic structure gradually evolved into a database designed around vectors and meaning. By late 2018, after entering a Dutch accelerator, the team began formalizing commercialization. In 2019 SeMI Technologies was founded and Weaviate became its first product. The decisive architectural choice was to stop treating NLP and embeddings as just another feature and instead make vector storage and semantic retrieval central to the system.
Weaviate’s business model is one of the clearest open-source business explanations in the category.
Bob explicitly says commercial success does not depend on software licenses but on the service around the database: operations, scalability, SLAs, integrations, training, tooling, and the broader ecosystem. In an academic interview, he also says many people do not understand how open-source companies capture value. That is essentially the entire model in one sentence: open source drives adoption, education, and trust; cloud and enterprise services generate revenue.
Weaviate’s biggest achievements lie in product framing and developer credibility.
It built an unusually coherent story around vectors, hybrid retrieval, structured objects, GraphQL, and AI-native application development. It raised a $16 million Series A in 2022 and a $50 million Series B in 2023, renamed the company to match the product, and by 2026 publicly described itself through search, vectorization, RAG, agents, and cloud deployment. Its community messaging says it serves more than 50,000 AI builders, which points to influence beyond code alone.
Charles Xie’s background is the most database-systems-native of the group.
Public records say he earned a bachelor’s degree from Huazhong University of Science and Technology and a master’s degree in computer science from the University of Wisconsin–Madison. The record on his family and exact early-life circumstances is thin, but his technical training is very clear.
His early representative job experience was at Oracle.
Multiple official bios describe him as a founding engineer on Oracle’s 12c cloud database project. That matters because Milvus did not emerge as a thin ANN wrapper; it emerged from a database-systems mindset. In interviews, Xie repeatedly says that structured data was already well managed while unstructured data remained largely underused, and that advances in embeddings opened a new way to make that data operationally useful.
Zilliz and Milvus were founded on a long-horizon thesis, not a short-term AI hype reaction.
Xie says vector embeddings became the bridge between unstructured data and usable insight, which led him to found Zilliz around 2017. Zilliz educational material says Milvus development began in 2018 and the product launched in 2019. In 2020 Zilliz contributed Milvus to LF AI & Data, and the project graduated in 2021. That sequence is important because it shows Milvus was early, open-source, and then institutionally legitimated through a foundation path.
Zilliz’s capital and commercialization path reveal a classic open-source-to-cloud evolution.
The company raised $43 million in 2020 and another $60 million extension in 2022, bringing total funding to about $113 million. Official and press materials continue to list investors including Prosperity7 Ventures, Pavilion Capital, Hillhouse, 5Y Capital, Yunqi Partners, and Trustbridge. A 2024 Zilliz engineering retrospective says the company moved toward commercialization because users kept asking for a stable hosted version, first built dedicated clusters, then serverless, and cut new-user acquisition cost from $300 to $5. That is not just a pricing tweak; it is the story of an open-source project being shaped into a scalable cloud business.
Milvus and Zilliz have built a broader asset stack than many people realize.
Public product pages now show Zilliz Cloud, BYOC, migration services, GPTCache, DeepSearcher, Attu, and Milvus CLI in addition to Milvus itself. By the end of 2024, Google Cloud’s case study said Milvus and Zilliz Cloud had over 10,000 enterprise customers globally, more than 33,000 GitHub stars, and over 100 million downloads and deployments. Even allowing for company framing, that is enough to show that Xie built not just a project but a visible global AI data infrastructure brand.
The main criticism directed at Milvus/Zilliz is not personal scandal but boundary uncertainty.
Like the rest of the category, it sits inside the debate over whether specialized vector databases remain a durable standalone category or get absorbed into broader retrieval stacks, cloud platforms, and incumbent databases. The external critique is less “can it work?” and more “what exactly will remain defensible as a business over time?”
The clearest cross-company conclusion is this: these founders were not merely building databases.
They were competing to define the retrieval layer, memory layer, and AI data layer of the LLM stack. Edo Liberty pushed that layer toward managed enterprise infrastructure. Bob van Luijt pushed it toward open-source developer culture and product narrative. Charles Xie pushed it toward a dual model of open-source database plus commercial cloud platform. Whether the market keeps overvaluing the category is secondary. They have already shaped the industry’s answer to a far more important question: what must exist, beyond the model, for AI applications to work in reality?