lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-04 11:30:46 +00:00

Author	SHA1	Message	Date
qzhu	1023a5754b	separate local and cloud connect	2024-01-31 11:33:02 -08:00
qzhu	7808f28ec7	exclude storage.js for tests	2024-01-25 17:08:27 -08:00
qzhu	7a5e65d437	website api doc rework	2024-01-25 16:16:50 -08:00
Will Jones	4e1ed2b139	docs: document basics of configuring object storage (#832 ) Created based on upstream PR https://github.com/lancedb/lance/pull/1849 Closes #681 --------- Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>	2024-01-24 15:27:22 -08:00
Lei Xu	9a9fc77a95	doc: improve docs for nodejs connect functions (#833 ) * improve the docstring for NodeJS connect functions and `ConnectOptions` parameters. * Simplify `npm run build` steps.	2024-01-19 16:07:53 -08:00
Prashanth Rao	8f54cfcde9	Docs updates incl. Polars (#827 ) This PR makes the following aesthetic and content updates to the docs. - [x] Fix max width issue on mobile: Content should now render more cleanly and be more readable on smaller devices - [x] Improve image quality of flowchart in data management page - [x] Fix syntax highlighting in text at the bottom of the IVF-PQ concepts page - [x] Add example of Polars LazyFrames to docs (Integrations) - [x] Add example of adding data to tables using Polars (guides)	2024-01-18 20:43:59 -08:00
Prashanth Rao	119b928a52	docs: Updates and refactor (#683 ) This PR makes incremental changes to the documentation. * Closes #697 * Closes #698 ## Chores - [x] Add dark mode - [x] Fix headers in navbar - [x] Add `extra.css` to customize navbar styles - [x] Customize fonts for prose/code blocks, navbar and admonitions - [x] Inspect all admonition boxes (remove redundant dropdowns) and improve clarity and readability - [x] Ensure that all images in the docs have white background (not transparent) to be viewable in dark mode - [x] Improve code formatting in code blocks to make them consistent with autoformatters (eslint/ruff) - [x] Add bolder weight to h1 headers - [x] Add diagram showing the difference between embedded (OSS) and serverless (Cloud) - [x] Fix [Creating an empty table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table) section: right now, the subheaders are not clickable. - [x] In critical data ingestion methods like `table.add` (among others), the type signature often does not match the actual code - [x] Proof-read each documentation section and rewrite as necessary to provide more context, use cases, and explanations so it reads less like reference documentation. This is especially important for CRUD and search sections since those are so central to the user experience. ## Restructure/new content - [x] The section for [Adding data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table) only shows examples for pandas and iterables. We should include pydantic models, arrow tables, etc. - [x] Add conceptual tutorial for IVF-PQ index - [x] Clearly separate vector search, FTS and filtering sections so that these are easier to find - [x] Add docs on refine factor to explain its importance for recall. Closes #716 - [x] Add an FAQ page showing answers to commonly asked questions about LanceDB. Closes #746 - [x] Add simple polars example to the integrations section. Closes #756 and closes #153 - [ ] Add basic docs for the Rust API (more detailed API docs can come later). Closes #781 - [x] Add a section on the various storage options on local vs. cloud (S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782 - [x] Revamp filtering docs: add pre-filtering examples and redo headers and update content for SQL filters. Closes #783 and closes #784. - [x] Add docs for data management: compaction, cleaning up old versions and incremental indexing. Closes #785 - [ ] Add a benchmark section that also discusses some best practices. Closes #787 --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-01-19 00:18:37 +05:30
Chang She	af8263af94	feat(python): allow the entire table to be converted a polars dataframe (#814 )	2024-01-15 15:49:16 -08:00
Chang She	be4ab9eef3	feat(python): add exist_ok option to create table (#813 ) This mimics CREATE TABLE IF NOT EXISTS behavior. We add `db.create_table(..., exist_ok=True)` parameter. By default it is set to False, so trying to create a table with the same name will raise an exception. If set to True, then it only opens the table if it already exists. If you pass in a schema, it will be checked against the existing table to make sure you get what you want. If you pass in data, it will NOT be added to the existing table.	2024-01-15 11:09:18 -08:00
Ayush Chaurasia	4568df422d	feat(python): Add gemini text embedding function (#806 ) Named it Gemini-text for now. Not sure how complicated it will be to support both text and multimodal embeddings under the same class "gemini"..But its not something to worry about for now I guess.	2024-01-12 22:38:55 -08:00
Chang She	121687231c	chore(python): document phrase queries in fts (#788 ) closes #769 Add unit test and documentation on using quotes to perform a phrase query	2024-01-08 21:49:31 -08:00
Chang She	b0a88a7286	feat(python): Set heap size to get faster fts indexing performance (#762 ) By default tantivy-py uses 128MB heapsize. We change the default to 1GB and we allow the user to customize this locally this makes `test_fts.py` run 10x faster	2024-01-07 15:15:13 -08:00
sudhir	bf5202f196	Make examples work with current version of Openai api's (#779 ) These examples don't work because of changes in openai api from version 1+	2024-01-07 14:27:56 -08:00
Chris	8be2861061	Minor Fixes to Ingest Embedding Functions Docs (#777 ) Addressed minor typos and grammatical issues to improve readability --------- Co-authored-by: Christopher Correa <chris.correa@gmail.com>	2024-01-07 14:27:40 -08:00
Vladimir Varankin	0560e3a0e5	Minor corrections for docs of embedding_functions (#780 ) In addition to #777, this pull request fixes more typos in the documentation for "Ingest Embedding Functions".	2024-01-07 14:26:35 -08:00
QianZhu	b83fbfc344	small bug fix for example code in SaaS JS doc (#770 )	2024-01-04 14:30:34 -08:00
Bengsoon Chuah	7d55a94efd	Add relevant imports for each step (#764 ) I found that it was quite incoherent to have to read through the documentation and having to search which submodule that each class should be imported from. For example, it is cumbersome to have to navigate to another documentation page to find out that `EmbeddingFunctionRegistry` is from `lancedb.embeddings`	2024-01-04 11:15:42 -08:00
QianZhu	4d8e401d34	SaaS JS API sdk doc (#740 ) Co-authored-by: Aidan <64613310+aidangomar@users.noreply.github.com>	2024-01-03 16:24:21 -08:00
Xin Hao	8411c36b96	docs: fix link (#752 )	2023-12-29 15:33:24 -08:00
Chang She	4b8af261a3	feat: add timezone handling for datetime in pydantic (#578 ) If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.	2023-12-28 11:02:56 -08:00
Chang She	c8728d4ca1	feat(python): add post filtering for full text search (#739 ) Closes #721 fts will return results as a pyarrow table. Pyarrow tables has a `filter` method but it does not take sql filter strings (only pyarrow compute expressions). Instead, we do one of two things to support `tbl.search("keywords").where("foo=5").limit(10).to_arrow()`: Default path: If duckdb is available then use duckdb to execute the sql filter string on the pyarrow table. Backup path: Otherwise, write the pyarrow table to a lance dataset and then do `to_table(filter=<filter>)` Neither is ideal. Default path has two issues: 1. requires installing an extra library (duckdb) 2. duckdb mangles some fields (like fixed size list => list) Backup path incurs a latency penalty (~20ms on ssd) to write the resultset to disk. In the short term, once #676 is addressed, we can write the dataset to "memory://" instead of disk, this makes the post filter evaluate much quicker (ETA next week). In the longer term, we'd like to be able to evaluate the filter string on the pyarrow Table directly, one possibility being that we use Substrait to generate pyarrow compute expressions from sql string. Or if there's enough progress on pyarrow, it could support Substrait expressions directly (no ETA) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2023-12-27 09:31:04 -08:00
elliottRobinson	eab9072bb5	Update default_embedding_functions.md (#744 ) Modify some grammar, punctuation, and spelling errors.	2023-12-26 19:24:22 +05:30
Will Jones	ee0f0611d9	docs: update node API reference (#734 ) This command hasn't been run for a while...	2023-12-22 10:14:31 -08:00
Will Jones	34966312cb	docs: enhance Update user guide (#735 ) Closes #705	2023-12-22 10:14:21 -08:00
Chang She	0965d7dd5a	doc(javascript): minor improvement on docs for working with tables (#736 ) Closes #639 Closes #638	2023-12-20 20:05:22 -08:00
Chang She	371d2f979e	feat(python): add option to flatten output in to_pandas (#722 ) Closes https://github.com/lancedb/lance/issues/1738 We add a `flatten` parameter to the signature of `to_pandas`. By default this is None and does nothing. If set to True or -1, then LanceDB will flatten structs before converting to a pandas dataframe. All nested structs are also flattened. If set to any positive integer, then LanceDB will flatten structs up to the specified level of nesting. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2023-12-20 12:23:07 -08:00
Ayush Chaurasia	dca4533dbe	docs: Update roboflow tutorial position (#666 )	2023-12-08 11:00:11 -08:00
QianZhu	aca785ff98	saas python sdk doc (#692 ) <img width="256" alt="Screenshot 2023-12-07 at 11 55 41 AM" src="https://github.com/lancedb/lancedb/assets/1305083/259bf234-9b3b-4c5d-af45-c7f3fada2cc7">	2023-12-07 14:47:56 -08:00
James	b085d9aaa1	(docs):Add CLIP image embedding example (#660 ) In this PR, I add a guide that lets you use Roboflow Inference to calculate CLIP embeddings for use in LanceDB. This post was reviewed by @AyushExel.	2023-11-27 20:39:01 +05:30
Ayush Chaurasia	ccfdf4853a	[Docs]: Add Instructor embeddings and rate limit handler docs (#651 )	2023-11-18 06:08:26 +05:30
Ayush Chaurasia	87e5d86e90	[Docs][SEO] Add sitemap and robots.txt (#645 ) Sitemap improves SEO by ranking pages and tracking updates.	2023-11-18 06:08:13 +05:30
Ayush Chaurasia	2cb91e818d	Disable posthog on docs & reduce sentry trace factor (#607 ) - posthog charges per event and docs events are registered very frequently. We can keep tracking them on GA - Reduced sentry trace factor	2023-11-02 01:13:16 +05:30
QianZhu	7eec2b8f9a	Qian/query option doc (#615 ) - API documentation improvement for queries (table.search) - a small bug fix for the remote API on create_table ![image](https://github.com/lancedb/lancedb/assets/1305083/712e9bd3-deb8-4d81-8cd0-d8e98ef68f4e) ![image](https://github.com/lancedb/lancedb/assets/1305083/ba22125a-8c36-4e34-a07f-e39f0136e62c)	2023-10-31 19:50:05 -07:00
Lei Xu	6fb539b5bf	doc: add doc to use GPU for indexing (#611 )	2023-10-30 15:25:00 -07:00
Ayush Chaurasia	64a4f025bb	[Docs]: Minor Fixes (#587 ) * Filename typo * Remove rick_morty csv as users won't really be able to use it.. We can create a an executable colab and download it from a bucket or smth.	2023-10-20 16:14:35 +02:00
Ayush Chaurasia	6dc968e7d3	[Docs] Embeddings API: Add multi-lingual semantic search example (#582 )	2023-10-20 18:40:49 +05:30
Ayush Chaurasia	06b5b69f1e	[Docs]Versioning docs (#586 ) closes #564 --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-10-20 18:40:16 +05:30
Ayush Chaurasia	a8c7f80073	[Docs] Update embedding function docs (#581 )	2023-10-18 13:04:42 +05:30
Ayush Chaurasia	7372656369	[Docs] Add posthog telemetry to docs (#577 ) Allows creation of funnels and user journeys	2023-10-17 21:11:59 -07:00
Chang She	bb01ad5290	doc: fix broken link and add README (#573 ) Fix broken link to embedding functions testing: broken link was verified after local docs build to have been repaired --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-10-16 16:13:07 -07:00
Rok Mihevc	043e388254	docs: show source of documented functions (#569 )	2023-10-15 09:05:36 -07:00
Rok Mihevc	6d66404506	docs: switch python examples to be row based (#554 )	2023-10-14 14:07:43 -07:00
Ayush Chaurasia	7dfb555fea	[DOCS][PYTHON] Update embeddings API docs & Example (#516 ) This PR adds an overview of embeddings docs: - 2 ways to vectorize your data using lancedb - explicit & implicit - explicit - manually vectorize your data using `wit_embedding` function - Implicit - automatically vectorize your data as it comes by ingesting your embedding function details as table metadata - Multi-modal example w/ disappearing embedding function	2023-10-14 07:56:07 +05:30
Ayush Chaurasia	e41894b071	[Docs] Improve visibility of table ops (#553 ) A little verbose, but better than being non-discoverable ![Screenshot from 2023-10-11 16-26-02](https://github.com/lancedb/lancedb/assets/15766192/9ba539a7-0cf8-4d9e-94e7-ce5d37c35df0)	2023-10-11 12:20:46 -07:00
Chang She	e1ae2bcbd8	feat: add to_list and to_pandas api's (#556 ) Add `to_list` to return query results as list of python dict (so we're not too pandas-centric). Closes #555 Add `to_pandas` API and add deprecation warning on `to_df`. Closes #545 Co-authored-by: Chang She <chang@lancedb.com>	2023-10-11 12:18:55 -07:00
Ayush Chaurasia	a1377afcaa	feat: telemetry, error tracking, CLI & config manager (#538 ) Co-authored-by: Lance Release <lance-dev@lancedb.com> Co-authored-by: Rob Meng <rob.xu.meng@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com> Co-authored-by: rmeng <rob@lancedb.com> Co-authored-by: Chang She <chang@lancedb.com> Co-authored-by: Rok Mihevc <rok@mihevc.org>	2023-10-08 23:11:39 +05:30
Lei Xu	a26c8f3316	feat: use GPU for index creation. (#540 ) Bump lance to 0.8.3 to include GPU training --------- Co-authored-by: Rob Meng <rob.xu.meng@gmail.com>	2023-10-05 20:49:00 -07:00
Josh Wein	88d8d7249e	Typo cleanup (#539 )	2023-10-05 23:07:28 -04:00
Tan Li	e4c3a9346c	[doc] make the tensor width differnt from height (#533 )	2023-10-03 00:55:16 -07:00
Prashanth Rao	1d1f8964d2	[DOCS][PYTHON] Update docs for clarity (#531 ) I only modified those docs pages that are untouched in existing unmerged PRs, so hopefully there are no merge conflicts! 1. The `tantivy-py` version specified in the docs doesn't work (pip install fails), but with the latest version of pip and wheel installed on my Mac, I was able to just `pip install tantivy` and FTS works great for me. I updated the docs page to include this in `7ca4b757ce` but can always modify to another specific version in case this breaks any tests. 2. The `.add()` method for Python should take in a list of dicts as the first option (to also align with the JS API), and additionally, users can pass an existing pandas DataFrame to add to a table. Hope this makes sense. 3. I've had multiple conversations with users who are unclear that the terms "exhaustive", "flat" and "KNN" are all the same kind of search, so I've updated the verbiage of this section to clarify this. 4. Fixed typos and improved clarity in the ANN indexes page.	2023-10-03 09:46:53 +05:30


166 Commits