https://github.com/search5/solrpy

Automatically exported from code.google.com/p/solrpy
Host: GitHub
URL: https://github.com/search5/solrpy
Owner: search5
Created: 2015-03-13T08:12:26.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2026-03-25T10:59:25.000Z (3 months ago)
Last Synced: 2026-03-26T13:49:13.874Z (3 months ago)
Language: Python
Size: 644 KB
Stars: 40
Watchers: 2
Forks: 15
Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

awesome-python - solrpy - Automatically exported from code.google.com/p/solrpy (Awesome Python / Search)

# solrpy

solrpy is a Python client for [Solr], an enterprise search server
built on top of [Lucene]. solrpy allows you to add documents to a
Solr instance, and then to perform queries and gather search results
from Solr using Python.

- **Supports Solr 1.2 through 10.x**

- **Automatic Solr version detection** with runtime feature gating

- **Python 3.10+** required

## Installation

```bash
pip install solrpy
```

Or with Poetry:

```bash
poetry add solrpy
```

## Overview

```python
import solr

# create a connection to a solr server
s = solr.Solr('http://localhost:8983/solr/mycore')

# the server version is auto-detected
print(s.server_version)  # e.g. (9, 4, 1)

# check if the server is reachable
print(s.ping())  # True

# add a document to the index
doc = {
    "id": 1,
    "title": "Lucene in Action",
    "author": ["Erik Hatcher", "Otis Gospodnetić"],
}
s.add(doc, commit=True)

# do a search
response = s.select('title:lucene')
for hit in response.results:
    print(hit['title'])
```

## Response format

Since v1.0.4, solrpy uses JSON (`wt=json`) by default, matching Solr 7.0+ behavior.

For legacy XML mode:

```python
s = solr.Solr('http://localhost:8983/solr/mycore', response_format='xml')
```

The `Response` object API is identical regardless of format.

## More powerful queries

Optional parameters for querying, faceting, highlighting, and MoreLikeThis
can be passed as keyword arguments to `select`. Convert the
dot notation (e.g. `facet.field`) to underscore notation (e.g. `facet_field`)
so that they are valid Python parameter names.

```python
response = s.select('title:lucene', facet='true', facet_field='subject')
```

If the parameter takes multiple values, pass them in as a list:

```python
response = s.select('title:lucene', facet='true', facet_field=['subject', 'publisher'])
```
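
As a rough illustration of the naming convention, the mapping from keyword arguments to Solr request parameters can be sketched with the standard library alone. This is not solrpy's implementation: the helper name `to_solr_params` is hypothetical, and the naive underscore replacement would mangle any parameter whose real name contains an underscore.

```python
from urllib.parse import urlencode


def to_solr_params(query, **kwargs):
    """Hypothetical helper: turn underscore keyword names into dotted Solr parameters."""
    params = [("q", query)]
    for name, value in kwargs.items():
        solr_name = name.replace("_", ".")  # facet_field -> facet.field
        # A list or tuple means the parameter is repeated once per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        params.extend((solr_name, v) for v in values)
    return urlencode(params)


print(to_solr_params("title:lucene", facet="true",
                     facet_field=["subject", "publisher"]))
```

The printed query string repeats `facet.field` once per list entry, which is how Solr expects multi-valued parameters to arrive.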

## Version detection

solrpy automatically detects the connected Solr version and gates features
accordingly. If a feature requires a newer Solr version than what is
connected, a `SolrVersionError` is raised with a clear message.

```python
import solr

s = solr.Solr('http://localhost:8983/solr/mycore')
print(s.server_version)  # (6, 6, 6)
```
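
The gating idea can be sketched as a decorator that compares version tuples before running a method. The code below is an illustration only, not solrpy's source: `SolrVersionError` and `requires_version` are redefined locally as stand-ins, and `FakeConnection` is a hypothetical class that exists just to exercise the check without a server.

```python
import functools


class SolrVersionError(Exception):
    """Local stand-in for the exception described above."""


def requires_version(*minimum):
    """Hypothetical decorator: refuse to run a method on servers older than `minimum`."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            # Tuples compare element by element, so (6, 6, 6) < (9, 0) is True.
            if self.server_version < minimum:
                need = ".".join(map(str, minimum))
                have = ".".join(map(str, self.server_version))
                raise SolrVersionError(
                    f"{method.__name__} requires Solr {need}+, server is {have}"
                )
            return method(self, *args, **kwargs)
        return wrapper
    return decorator


class FakeConnection:
    """Minimal object with a fixed version, used only to exercise the decorator."""
    def __init__(self, server_version):
        self.server_version = server_version

    @requires_version(9, 0)
    def knn(self):
        return "ok"


print(FakeConnection((9, 4, 1)).knn())
try:
    FakeConnection((6, 6, 6)).knn()
except SolrVersionError as exc:
    print(exc)
```

Keeping the version as a tuple of integers avoids string comparison pitfalls such as `"10.0" < "9.0"`.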

## Why solrpy over pysolr?

| Feature | solrpy | pysolr |
|---|:---:|:---:|
| Basic CRUD | ✅ | ✅ |
| Async/await | ✅ | — |
| Streaming Expressions | ✅ | — |
| KNN / Vector search | ✅ | — |
| Schema API (full CRUD) | ✅ | — |
| Pydantic models | ✅ | — |
| SolrCloud + ZooKeeper | ✅ | ✅ |
| JSON Facet API | ✅ | — |
| Structured Field/Sort/Facet builders | ✅ | — |
| Document extraction (Tika) | ✅ | ✅ |
| Solr version auto-detection | ✅ | — |
| Solr 1.2–10.x spanning | ✅ | — |
| Type hints + py.typed | ✅ | — |
| Connection pooling (httpx) | ✅ | ✅ |
| Performance | on par | on par |

### Migrating from pysolr

```python
# Sample documents used by both options below
doc1 = {"id": "1", "title": "hello"}
doc2 = {"id": "2", "title": "hello again"}

# Option 1: Drop-in compatibility wrapper
from solr import PysolrCompat as Solr

conn = Solr('http://localhost:8983/solr/mycore')
results = conn.search('title:hello')       # pysolr API
conn.add([doc1, doc2], commit=True)        # accepts list like pysolr
conn.delete(id='1', q='old:*')             # q= maps to delete_query

# Option 2: Native solrpy API (recommended)
from solr import Solr

conn = Solr('http://localhost:8983/solr/mycore')
results = conn.select('title:hello')       # select instead of search
conn.add_many([doc1, doc2], commit=True)   # explicit add_many
conn.delete_query('old:*', commit=True)    # explicit delete_query
```

## Tests

Tests require a running Solr instance. Using Docker:

```bash
docker run -d --name solr-dev -p 8983:8983 solr:6.6 solr-precreate core0
poetry run pytest tests/
```

## Changelog

### 2.0.8a

- **18 bug fixes**: async event-loop blocking (`time.sleep` → `asyncio.sleep`), date validation logic inverted, highlight `hl_fl` list bug, async timeout leak, PysolrCompat double commit, SolrCloud resource leak, stream pipe mutation, sort false positive, paginator `q` loss, and more

- Full details in [changelog](https://search5.github.io/solrpy/latest/changelog.html)

### 2.0.8

- **Decimal support**: `decimal.Decimal` values now serialize correctly in both JSON and XML update paths

- **Security**: all `eval()` usage removed (resolved since 2.x rewrite)

- **No stray `print()`**: debug print statements fully eliminated from all execution paths

- **UTF-8 safe**: `setup.py` replaced with Poetry `pyproject.toml`, no encoding issues

- **`q.op` support**: dotted Solr parameters work via underscore notation (`q_op='AND'` → `q.op=AND`)

### 2.0.7

- **Lazy initialization**: `Solr()` constructor is now instant (~0ms). httpx client and version detection deferred to first use

- **PysolrCompat**: drop-in compatibility wrapper for pysolr migration (`from solr import PysolrCompat`)

- **Select performance**: `json.loads(bytes)` instead of `json.loads(string)` eliminates redundant decode

- Comparison table and migration guide in README

### 2.0.6

- **Async Pydantic models**: `await conn.select('*:*', model=MyDoc)` returns typed results

- `model=` parameter on `AsyncSolr.select()` — same as sync `SearchHandler`

### 2.0.5

- **Async Streaming Expressions**: `async for doc in await conn.stream(expr):`

- **serialize_value() bug fix**: `atomic_update()`, `AsyncSolr.add/add_many` now correctly serialize `datetime`, `date`, `bool`

- **Internal JSON update path**: Solr 4.0+ uses JSON for add/add_many/atomic_update (no user-facing change)

- `solr_json_default()` encoder handles `datetime`, `date`, `set`, `tuple`
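
To illustrate what such an encoder hook involves, here is a minimal sketch of a `default=` function for `json.dumps`. It is not solrpy's `solr_json_default()`: the name `json_default_sketch`, the decision to treat naive datetimes as UTC, and the sorting of sets are assumptions made for this example. Tuples need no special handling here because `json` already writes them as arrays.

```python
import json
from datetime import date, datetime, timezone


def json_default_sketch(value):
    """Hypothetical `default=` hook for json.dumps; not solrpy's own encoder."""
    if isinstance(value, datetime):
        # Check datetime before date, because datetime is a subclass of date.
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
        # Assumption: naive datetimes are already in UTC.
        return value.isoformat(timespec="seconds") + "Z"
    if isinstance(value, date):
        return value.isoformat() + "T00:00:00Z"
    if isinstance(value, (set, frozenset)):
        # Sorted for a stable payload; assumes the members are comparable.
        return sorted(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


doc = {
    "id": "1",
    "published": datetime(2024, 5, 1, 12, 30, 0),
    "day": date(2024, 5, 1),
    "tags": {"b", "a"},
}
payload = json.dumps([doc], default=json_default_sketch)
print(payload)
```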

### 2.0.4

- **Unified sync/async API**: `SchemaAPI(conn)` works with both `Solr` and `AsyncSolr`

- Single class, dual mode — no need for separate `AsyncSchemaAPI` etc.

- `DualTransport` auto-detects sync vs async connection

- `_chain()` helper for composing sync values and async coroutines

- `AsyncSchemaAPI`, `AsyncKNN`, `AsyncMoreLikeThis`, `AsyncSuggest`, `AsyncExtract` kept as backward-compatible aliases

### 2.0.3

- **Async companion classes**: `AsyncSchemaAPI`, `AsyncKNN`, `AsyncMoreLikeThis`, `AsyncSuggest`, `AsyncExtract`

- Full async support for all companion features

### 2.0.2

- **AsyncSolr**: `async with AsyncSolr(url) as conn: await conn.select('*:*')`

- **AsyncTransport** for async companion classes

- Full async: select, add, add_many, delete, commit, get

### 2.0.1

- **Breaking**: `http.client` replaced with `httpx.Client`

- Automatic **connection pooling** and keep-alive

- `httpx` is now a required dependency

- All public API unchanged — drop-in replacement for 1.x

### 1.12.0

- **Streaming Expressions**: Python builder with pipe (`|`) operator — no other non-Java client has this

- `search`, `merge`, `rollup`, `top`, `unique`, `innerJoin`, etc.

- Aggregate: `count`, `sum`, `avg`, `min`, `max`

- `conn.stream(expr)` → iterator of result dicts

- Pydantic model support via `model=` parameter

### 1.11.0

- **Pydantic response models**: `conn.select('*:*', model=MyDoc)` converts results to Pydantic models

- `Response.as_models(MyDoc)` for post-hoc conversion

- `conn.get(id='1', model=MyDoc)` returns `MyDoc | None`

- `pip install solrpy[pydantic]`

### 1.10.1

- **Field builder**: `Field('price', alias='p')`, `Field.func('sum', 'price', 'tax')`, `Field.transformer('explain')`

- **Sort builder**: `Sort('price', 'desc')`, `Sort.func('geodist()', 'asc')`

- **Facet builder**: `Facet.field('category')`, `Facet.range('price', 0, 100, 10)`, `Facet.query()`, `Facet.pivot()`

- Fully backward compatible — raw strings still work

### 1.10.0

- **SolrCloud**: `SolrCloud(zk, collection)` with ZooKeeper or `SolrCloud.from_urls(urls, collection)` HTTP-only

- Leader-aware writes, automatic failover, collection aliases

- `SolrZooKeeper` class for ZooKeeper node discovery

- `kazoo` optional dependency (`pip install solrpy[cloud]`)

- Docker Compose for local SolrCloud testing

### 1.9.2

- **Solr 6–10 full compatibility**: `wt=xml` on Solr 7+ (`wt=standard` changed in 7.0)

- Tested against Solr 6.6, 7.7, 8.11, 9.7, and 10.0 with 0 failures

- GitHub Actions CI matrix for 5 Solr versions

- KNN live tests version-gated (skip on < 9.0, efSearchScaleFactor skip on < 10.0)

- Test isolation: Paginator no longer deletes all documents

### 1.9.1

- **KNN API overhaul**: `search()`, `similarity()`, `hybrid()`, `rerank()` methods

- Full `{!knn}` parameters: `early_termination`, `seed_query`, `pre_filter`, etc.

- `{!vectorSimilarity}` threshold search (Solr 9.6+)

- Hybrid (lexical OR vector) and re-ranking patterns

### 1.9.0

- **KNN / Dense Vector Search**: `KNN(conn)` for `{!knn}` queries (Solr 9.0+)

### 1.8.1

- **HTTP transport abstraction**: `SolrTransport` decouples companion classes from internal `_get`/`_post`

- SchemaAPI, Suggest, Extract now use `SolrTransport` — prepares for httpx in 2.0.0

### 1.8.0

- **Bearer token auth**: `Solr(url, auth_token='...')`

- **Custom auth callable**: `Solr(url, auth=my_fn)` for OAuth2 dynamic refresh

- Priority: `auth` callable > `auth_token` > `http_user/http_pass`
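
The priority order can be pictured as a simple fall-through. The sketch below is not solrpy's code: `resolve_auth_header` is a hypothetical helper, and it assumes the `auth` callable returns a token string, which may not match the real callable's contract.

```python
import base64


def resolve_auth_header(auth=None, auth_token=None, http_user=None, http_pass=None):
    """Hypothetical helper: pick an Authorization header value by priority."""
    if auth is not None:
        # Assumption: the callable returns a fresh token string on each call.
        return f"Bearer {auth()}"
    if auth_token is not None:
        return f"Bearer {auth_token}"
    if http_user is not None and http_pass is not None:
        raw = f"{http_user}:{http_pass}".encode("utf-8")
        return "Basic " + base64.b64encode(raw).decode("ascii")
    return None


# The callable wins even when the other credentials are also supplied.
print(resolve_auth_header(auth=lambda: "fresh", auth_token="static",
                          http_user="user", http_pass="pass"))
```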

### 1.7.0

- **Grouping / Field Collapsing**: `resp.grouped['field'].groups` for grouped results (Solr 3.3+)

- `GroupedResult`, `GroupField`, `Group` classes with `groupValue`, `doclist`, `matches`, `ngroups`

- Works in both JSON and XML modes

### 1.6.0

- **Extract**: `Extract(conn)` wrapper class for Solr Cell (Apache Tika) via `/update/extract` (Solr 1.4+).

  Index rich documents (PDF, Word, HTML, …) with optional literal field values.

  `extract_only()` extracts text and metadata without indexing.

  `from_path()` / `extract_from_path()` open files by filesystem path, MIME type guessed automatically.

```python
from solr import Solr, Extract

conn = Solr('http://localhost:8983/solr/mycore')
extract = Extract(conn)

# Index a PDF with metadata
with open('report.pdf', 'rb') as f:
    extract(f, content_type='application/pdf',
            literal_id='report1', literal_title='Annual Report',
            commit=True)

# Extract text only (no indexing)
text, metadata = extract.extract_from_path('report.pdf')
print(text[:200])

# Index from path (MIME type auto-detected)
extract.from_path('document.docx', literal_id='doc1', commit=True)
```

### 1.5.0

- **Suggest**: `Suggest(conn)` wrapper class for Solr's SuggestComponent (Solr 4.7+).

  Returns a flat list of suggestion dicts from the `/suggest` handler.

- **Spellcheck**: `Response.spellcheck` property returns a `SpellcheckResult` object

  with `.collation` and `.suggestions` accessors. Works in both JSON and XML modes (Solr 1.4+).

```python
from solr import Solr, Suggest

conn = Solr('http://localhost:8983/solr/mycore')

# Suggest
suggest = Suggest(conn)
results = suggest('que', dictionary='mySuggester', count=5)
for s in results:
    print(s['term'], s['weight'])

# Spellcheck
resp = conn.select('misspeled query', spellcheck='true', spellcheck_collate='true')
if resp.spellcheck and not resp.spellcheck.correctly_spelled:
    print('Did you mean:', resp.spellcheck.collation)
```

### 1.4.2

- New `MoreLikeThis(conn)` wrapper class — no need to know `/mlt` path

### 1.4.1

- **Breaking**: `conn.schema` and `conn.mlt` removed from auto-initialization

- Use `SchemaAPI(conn)` and `SearchHandler(conn, '/mlt')` explicitly

- Keeps `Solr` class lightweight; optional features created on demand

### 1.4.0

- **Schema API**: `conn.schema.fields()`, `add_field()`, `replace_field()`, `delete_field()`, copy fields, dynamic fields, field types (Solr 4.2+)

### 1.3.0

- **JSON Facet API**: `json_facet` parameter for advanced faceting (Solr 5.0+)

### 1.2.0

- **Cursor pagination**: `resp.cursor_next()` and `conn.iter_cursor()` for deep pagination (Solr 4.7+)
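
Solr's cursor protocol starts with `cursorMark=*`, returns a `nextCursorMark` with each page, and signals the end by returning the same mark that was sent. The loop below sketches that protocol without a server. It is not solrpy's `iter_cursor()`: `walk_cursor`, `fake_fetch`, and the integer-like marks are hypothetical stand-ins (real cursor marks are opaque strings).

```python
def walk_cursor(fetch_page, start="*"):
    """Yield documents page by page until the cursor stops advancing.

    `fetch_page` is a hypothetical callable that takes a cursor mark and
    returns (docs, next_cursor_mark), standing in for one Solr request.
    """
    cursor = start
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # same mark back means nothing is left
            break
        cursor = next_cursor


# In-memory stand-in for a Solr core, so the loop can run without a server.
DOCS = [{"id": str(i)} for i in range(5)]


def fake_fetch(cursor, rows=2):
    offset = 0 if cursor == "*" else int(cursor)
    page = DOCS[offset:offset + rows]
    # Like Solr, repeat the incoming mark once there is nothing left to read.
    next_cursor = str(offset + len(page)) if page else cursor
    return page, next_cursor


ids = [doc["id"] for doc in walk_cursor(fake_fetch)]
print(ids)
```

Unlike `start`/`rows` paging, the cost of each request does not grow with depth, which is why cursors suit full-index exports.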

### 1.1.0

- **Soft Commit**: `conn.commit(soft_commit=True)` (Solr 4.0+)

- **Atomic Update**: `conn.atomic_update(doc)` with `set`/`add`/`remove`/`inc` modifiers (Solr 4.0+)

- **Real-time Get**: `conn.get(id='doc1')` via `/get` handler (Solr 4.0+)

- **MoreLikeThis**: `conn.mlt` handler for similar document search (Solr 4.0+)
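
For context on the atomic update modifiers listed above, Solr's JSON update format wraps each changed field in a one-key object naming the modifier. The sketch below builds such a payload with the standard library. `build_atomic_update` is a hypothetical helper for illustration and does not show how `conn.atomic_update(doc)` expects its argument.

```python
import json

ALLOWED_MODIFIERS = {"set", "add", "remove", "inc"}


def build_atomic_update(doc_id, **field_ops):
    """Hypothetical helper: field_ops maps a field name to (modifier, value)."""
    doc = {"id": doc_id}
    for field, (modifier, value) in field_ops.items():
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        # Each field becomes {"<modifier>": <value>} instead of a plain value.
        doc[field] = {modifier: value}
    return json.dumps([doc])


print(build_atomic_update("doc1", price=("set", 99), views=("inc", 1)))
```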

### 1.0.9

- Per-request timeout override: `conn.select('*:*', timeout=5)`

### 1.0.8

- Exponential backoff on connection retries with configurable `retry_delay`

- Each retry logged at WARNING level
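
A minimal sketch of retrying with exponential backoff, under stated assumptions: the delay doubles after each failed attempt (the growth factor used by solrpy is not given here), and `call_with_retries` with its `sleep` parameter is a hypothetical helper, not part of the library.

```python
import logging
import time

log = logging.getLogger("retry_sketch")


def call_with_retries(func, retries=3, retry_delay=0.1, sleep=time.sleep):
    """Call func(), retrying on ConnectionError with a doubling delay."""
    for attempt in range(retries + 1):
        try:
            return func()
        except ConnectionError as exc:
            if attempt == retries:
                raise  # out of retries, surface the last error
            delay = retry_delay * 2 ** attempt  # 0.1, 0.2, 0.4, ...
            log.warning("attempt %d failed (%s); retrying in %.2fs",
                        attempt + 1, exc, delay)
            sleep(delay)


# Example: a call that fails twice, then succeeds.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("server unavailable")
    return "ok"


waits = []
result = call_with_retries(flaky, sleep=waits.append)  # record delays instead of sleeping
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the delay schedule easy to inspect.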

### 1.0.7

- **Breaking**: `EmptyPage` now inherits `ValueError` (was `SolrException`)

- New `PageNotAnInteger` exception (inherits `TypeError`)

- Paginator module no longer depends on `SolrException`

### 1.0.6

- URL validation: warns if URL path doesn't contain `/solr` (Solr 10.0+ preparation)

### 1.0.5

- **Breaking**: Removed `SolrConnection` class. Use `Solr` instead

- Migration: `add(**fields)` → `add(dict)`, `query()` → `select()`, `raw_query()` → `select.raw()`

### 1.0.4

- **Breaking**: Default `response_format` changed from `'xml'` to `'json'`

- Pass `response_format='xml'` explicitly for legacy XML behavior

### 1.0.3

- Added `response_format` constructor option (`'xml'` or `'json'`)

- Split `solr/core.py` into `exceptions.py`, `utils.py`, `response.py`, `parsers.py`

- All existing imports continue to work (re-exported via `__init__.py`)

### 1.0.2

- `mypy --strict` passes with zero errors on `solr/` package

- Added type hints to all internal classes (`ResponseContentHandler`, `Node`, `Results`, `UTC`)

- Fixed `endElement` variable shadowing for type safety

### 1.0.1

- Added type hints to all public methods in `solr/core.py` and `solr/paginator.py`

- Added `solr/py.typed` marker file for PEP 561 compatibility

- Added `mypy` to dev dependencies

- mypy passes with zero errors on `solr/` package

### 0.9.11

- Added JSON response parser (`parse_json_response`)

- Added `Solr.ping()` convenience method

- Added `always_commit` constructor option for auto-commit behavior

- Added gzip response support (`Accept-Encoding: gzip`)

### 0.9.10

- Added pyproject.toml metadata (authors, maintainers, classifiers, keywords)

- Added Sphinx documentation (quickstart, API reference, version detection, changelog)

- Rewrote README.md with current API examples and Docker test instructions

- Updated CLAUDE.md development guidelines

### 0.9.9

- Removed deprecated `encoder`/`decoder` attributes and `codecs` import

- Fixed `commit(_optimize=True)` to correctly issue the optimize command

- Added test coverage for XML type parsing

- Added test coverage for named tag handling

- Added Solr version auto-detection (`server_version`)

- Added `SolrVersionError` exception and `requires_version` decorator

- Removed all Python 2 compatibility code (Python 3.10+ only)

- Migrated from setuptools to Poetry

- Bumped version to 0.9.9

## License

[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)

[Solr]: https://solr.apache.org/

[Lucene]: https://lucene.apache.org/