OceanBase Releases seekdb: An Open Source AI Native Hybrid Search Database for Multi-model RAG and AI Agents
AI applications rarely deal with one clean table. They mix user profiles, chat logs, JSON metadata, embeddings, and sometimes spatial data. Most teams handle this with a patchwork of an OLTP database, a vector store, and a search engine. OceanBase has released seekdb, an open source, AI focused database under the Apache 2.0 license. seekdb is described as an AI native search database. It unifies relational data, vector data, text, JSON, and GIS in a single engine, and it exposes hybrid search and in-database AI workflows.
What is seekdb?
seekdb is positioned as the lightweight, embedded version of the OceanBase engine. It is aimed at AI applications rather than general purpose distributed deployments. It runs as a single node database, supports both embedded mode and client/server mode, and remains compatible with MySQL drivers and SQL syntax.
In the capability matrix, seekdb is marked as follows:
- Embedded database: supported
- Standalone database: supported
- Distributed database: not supported
The full OceanBase product covers the distributed case.
From a data model perspective, seekdb supports:
- Relational data with standard SQL
- Vector search
- Full text search
- JSON data
- Spatial GIS data
All of these live in one storage and indexing layer.
Hybrid search as the core feature
The main feature OceanBase is pushing is hybrid search. This is search that combines vector based semantic retrieval, full text keyword retrieval, and scalar filters in a single query and a single ranking step.
seekdb implements hybrid search through a system package named DBMS_HYBRID_SEARCH, which has two entry points:
- DBMS_HYBRID_SEARCH.SEARCH, which returns results as JSON, sorted by relevance
- DBMS_HYBRID_SEARCH.GET_SQL, which returns the concrete SQL string used for execution
The hybrid search path can run:
- pure vector search
- pure full text search
- combined hybrid search
It can also push relational filters and joins down into storage. It supports query reranking strategies such as weighted scores and reciprocal rank fusion, and it can plug in rerankers based on large language models.
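The article does not say how seekdb computes fused scores. The following is only a generic Python sketch of the two strategies named above, not seekdb's implementation.

It makes these assumptions:
- Each retriever returns either a ranked list of document ids (for reciprocal rank fusion) or a map from id to raw score (for weighted fusion).
- The RRF constant k=60 is the value commonly used in the literature.
- The min-max normalization in the weighted variant is our own choice.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: each id earns 1 / (k + rank) per list (1-based rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_fusion(score_maps, weights):
    """Min-max normalize each retriever's scores to [0, 1], then take a weighted sum."""
    combined = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, s in scores.items():
            combined[doc_id] += weight * (s - lo) / span
    return sorted(combined, key=lambda d: (-combined[d], d))


vector_hits = ["doc3", "doc1", "doc7"]   # ranked by embedding similarity
keyword_hits = ["doc1", "doc9", "doc3"]  # ranked by BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7']

vector_scores = {"doc3": 0.92, "doc1": 0.85, "doc7": 0.40}
keyword_scores = {"doc1": 12.0, "doc9": 7.5, "doc3": 3.0}
print(weighted_fusion([vector_scores, keyword_scores], weights=[0.7, 0.3]))
# ['doc1', 'doc3', 'doc9', 'doc7']
```

RRF looks only at rank positions, so it needs no score calibration between a cosine-based vector retriever and a BM25-based keyword retriever. The weighted variant keeps more score information, but its result depends on how each signal is normalized.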
For retrieval augmented generation (RAG) and agent memory, this means you can write a single SQL query that does three things:
- semantic matching on embeddings
- exact matching on product codes or proper nouns
- relational filtering on user or tenant scopes
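seekdb expresses this as one SQL query. To make the three signals concrete, here is an in-memory Python sketch of the same logic. Everything in it is hypothetical: the `Chunk` fields, the toy three-dimensional embeddings, and the fixed additive `keyword_boost` for an exact token match are assumptions for illustration, not how seekdb scores results.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    tenant_id: str          # relational scope column (hypothetical)
    text: str
    embedding: list[float]  # toy 3-d vector standing in for a real embedding


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_retrieve(chunks, tenant_id, query_vec, exact_term, top_k=3, keyword_boost=0.5):
    results = []
    for c in chunks:
        if c.tenant_id != tenant_id:  # relational filter: drop other tenants first
            continue
        score = cosine(c.embedding, query_vec)  # semantic signal
        tokens = {t.strip(".,;:!?") for t in c.text.split()}
        if exact_term in tokens:  # exact match on a product code or proper noun
            score += keyword_boost
        results.append((score, c.chunk_id))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [chunk_id for _, chunk_id in results[:top_k]]


chunks = [
    Chunk(1, "acme", "Reset steps for router XR-200.", [0.9, 0.1, 0.0]),
    Chunk(2, "acme", "General router troubleshooting guide.", [0.8, 0.2, 0.1]),
    Chunk(3, "globex", "XR-200 firmware notes.", [0.9, 0.1, 0.0]),
    Chunk(4, "acme", "Billing FAQ.", [0.0, 0.1, 0.9]),
]
print(hybrid_retrieve(chunks, "acme", [1.0, 0.0, 0.0], "XR-200"))
# [1, 2, 4]
```

Chunk 3 matches both the product code and the query vector. The tenant filter still removes it before any scoring happens, which is the behavior you want for multi-tenant agent memory.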
Vector and full text engine details
At its core, seekdb exposes a modern vector and full text stack.
For vectors, seekdb:
- supports dense and sparse vectors
- supports Manhattan, Euclidean, inner product, and cosine distance metrics
- provides in-memory index types such as HNSW, HNSW SQ, and HNSW BQ
- provides disk based index types including IVF and IVF PQ
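For reference, the four distance metrics listed above can be written in a few lines of Python. This is a plain sketch of the textbook definitions, not seekdb code. Representing sparse vectors as `dict[int, float]` maps is our assumption about one common encoding.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: square root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; a similarity, so larger means closer."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


def sparse_inner_product(a, b):
    """Dot product of sparse vectors stored as {dimension_index: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b[i] for i, w in a.items() if i in b)


a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 8.0]
print(manhattan(a, b))        # 12.0
print(euclidean(a, b))        # sqrt(50), about 7.071
print(inner_product(a, b))    # 40.0
print(cosine_distance(a, b))  # small, since a and b point in similar directions
print(sparse_inner_product({0: 1.0, 5: 2.0}, {5: 3.0, 9: 1.0}))  # 6.0
```

Inner product is a similarity (larger means closer) rather than a distance. Engines therefore typically sort it in descending order or negate it. Cosine distance is usually defined as one minus cosine similarity, as it is here.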
Hybrid vector indexes show how you can store raw text, let seekdb call an embedding model automatically, and have the system maintain the corresponding vector index without a separate preprocessing pipeline.
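The idea behind such an index can be sketched as a table whose vector column is recomputed on every write. The sketch below is hypothetical:
- `toy_embed` is a bag-of-words stand-in over a fixed five-word vocabulary, not a real embedding model.
- `AutoEmbeddingTable` is an illustrative class, not a seekdb or pyseekdb API.

```python
import math

# Hypothetical stand-in for an embedding model: bag of words over a tiny fixed
# vocabulary, L2-normalized. A real deployment would call a registered model.
VOCAB = ["router", "reset", "billing", "invoice", "firmware"]


def toy_embed(text):
    words = text.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing a zero vector
    return [v / norm for v in vec]


class AutoEmbeddingTable:
    """Hypothetical table that stores raw text and maintains its vector on every write."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows = {}  # row_id -> (text, vector)

    def upsert(self, row_id, text):
        # Embedding happens inside the write path, so text and vector stay in sync.
        self.rows[row_id] = (text, self.embed_fn(text))

    def delete(self, row_id):
        self.rows.pop(row_id, None)

    def search(self, query, top_k=1):
        q = self.embed_fn(query)
        scored = [
            (sum(x * y for x, y in zip(q, vec)), row_id)
            for row_id, (_, vec) in self.rows.items()
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [row_id for _, row_id in scored[:top_k]]


table = AutoEmbeddingTable(toy_embed)
table.upsert(1, "reset the router")
table.upsert(2, "monthly billing invoice")
print(table.search("router reset"))  # [1]
```

Because `upsert` always recomputes the vector, the text and its embedding cannot drift apart. A separate preprocessing pipeline would have to enforce that consistency by hand.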
For text, seekdb offers full text search with:
- keyword, phrase, and Boolean queries
- BM25 ranking for relevance
- multiple tokenizer modes
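BM25 scores a document by summing, over the query terms, an inverse document frequency (IDF) weight times a saturating, length-normalized term frequency. A minimal Python version follows. It assumes whitespace tokenization, the Lucene-style `log(1 + ...)` IDF, and the common defaults `k1=1.2` and `b=0.75`. seekdb's tokenizers and parameters may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()  # number of documents containing each term
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = 1 - b + b * len(toks) / avgdl  # penalize long documents
        score = 0.0
        for term in query.lower().split():
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)  # saturating tf
        scores.append(score)
    return scores


docs = [
    "vector index tuning guide",
    "full text search with bm25 ranking",
    "bm25 bm25 bm25 keyword scoring",
]
print(bm25_scores("bm25 ranking", docs))
```

In this example, the second document outranks the third even though the third repeats "bm25" three times. Two effects produce this. Term frequency saturates through `k1`, and the rarer term "ranking" carries a higher IDF weight.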
The key point is that full text and vector indexes are first class. They are integrated into the same query planner as scalar and GIS indexes, so hybrid search does not need external orchestration.
AI functions inside the database
seekdb includes built-in AI function expressions that let you call models directly from SQL, without a separate application service mediating every call. The main functions are:
- AI_EMBED to convert text into embeddings
- AI_COMPLETE for text generation using a chat or completion model
- AI_RERANK to rerank a list of candidates
- AI_PROMPT to assemble prompt templates and dynamic values into a JSON object for AI_COMPLETE
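These functions are meant to compose into a RAG flow: retrieve, rerank, assemble a prompt, then generate. Here is a Python sketch of that flow with stand-in functions. None of these are seekdb APIs:
- `fake_rerank` uses word overlap instead of a reranking model.
- The JSON field names in `assemble_prompt` are our own assumption, not the format AI_PROMPT produces.
- `render_prompt` only fills the template, where a real call would send it to a completion model.

```python
import json


def fake_rerank(query, candidates):
    """Stand-in for a reranking model: order candidates by shared lowercase words."""
    q = set(query.lower().split())
    return sorted(candidates, key=lambda c: -len(q & set(c.lower().split())))


def assemble_prompt(template, values):
    """Pack a template and its dynamic values into one JSON object (field names assumed)."""
    return json.dumps({"template": template, "args": values})


def render_prompt(prompt_json):
    """Fill the template; a real pipeline would send this text to a completion model."""
    payload = json.loads(prompt_json)
    return payload["template"].format(**payload["args"])


question = "how do I reset the router"
retrieved = [  # pretend these came back from a vector or hybrid search
    "Billing runs monthly.",
    "Reset the router by holding the button.",
    "Router firmware notes.",
]
best = fake_rerank(question, retrieved)[0]
prompt = assemble_prompt(
    "Answer the question '{question}' using this context: {context}",
    {"question": question, "context": best},
)
print(render_prompt(prompt))
```
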
Model metadata and endpoints are managed by the DBMS_AI_SERVICE package. It lets you register external providers, set URLs, and configure keys, all on the database side.
Multimodal data and workloads
seekdb is built to handle multiple data modalities in a single node. It has a multimodal data and indexing layer that covers vectors, text, JSON, and GIS. It also has a multi-model compute layer for hybrid workloads across vector, full text, and scalar conditions.
It provides JSON indexes for metadata queries and GIS indexes for spatial conditions. Together, these allow queries that:
- find semantically similar documents
- filter by JSON metadata such as tenant, region, or category
- constrain results by spatial range or polygon
All of this happens without leaving the engine.
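A rough in-memory equivalent of such a query is sketched below. It parses the JSON metadata, keeps rows whose point lies inside a polygon (using ray casting), and then ranks the survivors by vector similarity. The row layout, the planar (x, y) coordinates, and the dot-product ranking are all assumptions for illustration. seekdb would use its JSON, GIS, and vector indexes instead of scanning every row.

```python
import json


def point_in_polygon(x, y, polygon):
    """Ray casting: a point is inside if a ray toward +x crosses the boundary an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filtered_similarity_search(rows, tenant, category, polygon, query_vec, top_k=5):
    hits = []
    for row in rows:
        meta = json.loads(row["metadata"])  # JSON metadata column
        if meta.get("tenant") != tenant or meta.get("category") != category:
            continue
        if not point_in_polygon(*row["location"], polygon):  # spatial constraint
            continue
        sim = sum(a * b for a, b in zip(row["embedding"], query_vec))  # semantic score
        hits.append((sim, row["id"]))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [row_id for _, row_id in hits[:top_k]]


def make_row(row_id, tenant, category, location, embedding):
    return {"id": row_id, "location": location, "embedding": embedding,
            "metadata": json.dumps({"tenant": tenant, "category": category})}


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
rows = [
    make_row(1, "acme", "store", (2, 3), [0.9, 0.1]),
    make_row(2, "acme", "store", (12, 3), [1.0, 0.0]),     # outside the polygon
    make_row(3, "acme", "warehouse", (5, 5), [1.0, 0.0]),  # wrong category
    make_row(4, "acme", "store", (7, 8), [0.5, 0.5]),
    make_row(5, "globex", "store", (1, 1), [1.0, 0.0]),    # wrong tenant
]
print(filtered_similarity_search(rows, "acme", "store", square, [1.0, 0.0]))  # [1, 4]
```
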
Because seekdb is derived from the OceanBase engine, it inherits ACID transactions, hybrid row and column storage, and vectorized execution. High scale distributed deployments remain a job for the full OceanBase database.
Comparison Table

Key Takeaways
- AI native hybrid search: seekdb unifies vector search, full text search, and relational filtering behind SQL and the DBMS_HYBRID_SEARCH interface. RAG and agent workloads can therefore run multi-signal retrieval in a single query instead of stitching together multiple engines.
- Multimodal data in one engine: seekdb stores and indexes relational data, vectors, text, JSON, and GIS in the same engine. AI applications can keep documents, embeddings, and metadata consistent without maintaining separate databases.
- In-database AI functions for RAG: With AI_EMBED, AI_COMPLETE, AI_RERANK, and AI_PROMPT, seekdb can call embedding models, LLMs, and rerankers directly from SQL. This simplifies RAG pipelines and moves more orchestration logic into the database layer.
- Single node, embedding friendly design: seekdb is a single node, MySQL compatible engine that supports embedded and standalone modes. Distributed, large scale deployments remain the role of full OceanBase. This makes seekdb suitable for local, edge, and service-embedded AI workloads.
- Open source and tool ecosystem: seekdb is open source under Apache 2.0 and integrates with a growing ecosystem of AI tools and frameworks. It offers Python support through pyseekdb and MCP based integration for code assistants and agents, so it can act as a unified data plane for AI applications.
Check out the Repo and Project.
The post OceanBase Releases seekdb: An Open Source AI Native Hybrid Search Database for Multi-model RAG and AI Agents appeared first on MarkTechPost.