{"id":13464699,"url":"https://github.com/coleifer/sqlite-vtfunc","last_synced_at":"2025-03-16T07:32:23.776Z","repository":{"id":62587918,"uuid":"47419259","full_name":"coleifer/sqlite-vtfunc","owner":"coleifer","description":"Implement SQLite table-valued functions with Python","archived":false,"fork":false,"pushed_at":"2023-08-23T13:36:18.000Z","size":182,"stargazers_count":59,"open_issues_count":0,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-16T00:04:08.788Z","etag":null,"topics":["cython","python","sql","sqlite","virtual-tables"],"latest_commit_sha":null,"homepage":null,"language":"Cython","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coleifer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-12-04T17:31:54.000Z","updated_at":"2025-03-14T11:34:28.000Z","dependencies_parsed_at":"2024-01-16T05:46:12.639Z","dependency_job_id":"c95dde31-c803-4391-8b00-b46c9a30e0e5","html_url":"https://github.com/coleifer/sqlite-vtfunc","commit_stats":{"total_commits":36,"total_committers":2,"mean_commits":18.0,"dds":"0.13888888888888884","last_synced_commit":"a4a7bb7b3b7e0657a52f28c53331ab275fd7486a"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coleifer%2Fsqlite-vtfunc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coleifer%2Fsqlite-vtfunc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coleifer%2Fsqlite-vtfunc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/coleifer%2Fsqlite-vtfunc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coleifer","download_url":"https://codeload.github.com/coleifer/sqlite-vtfunc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243806072,"owners_count":20350775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cython","python","sql","sqlite","virtual-tables"],"created_at":"2024-07-31T14:00:48.853Z","updated_at":"2025-03-16T07:32:23.403Z","avatar_url":"https://github.com/coleifer.png","language":"Cython","readme":"## vtfunc\n\n**NOTICE**: This project is no longer necessary if you are using Peewee 3.0 or\nnewer, as the relevant code has been included in Peewee's sqlite extension\nmodule. 
For more information, see:\n\n* [Peewee user-defined function examples](http://docs.peewee-orm.com/en/latest/peewee/database.html#user-defined-functions)\n* [TableFunction API documentation](http://docs.peewee-orm.com/en/latest/peewee/sqlite_ext.html#TableFunction)\n* [Table function registration API](http://docs.peewee-orm.com/en/latest/peewee/api.html#SqliteDatabase.table_function)\n* [General SQLite extensions documentation](http://docs.peewee-orm.com/en/latest/peewee/sqlite_ext.html)\n\nIf you intend to use this project with an older version of Peewee, or as a\nstandalone project with the standard library SQLite module, feel free to\ncontinue using this repository.\n\nRequires sqlite \u003e= 3.9.0\n\n--------------------------------------------------------------------------\n\nPython bindings for the creation of [table-valued functions](http://sqlite.org/vtab.html#tabfunc2)\nin SQLite.\n\nA table-valued function:\n\n* Accepts any number of parameters.\n* Can be used in places you would put a normal table or subquery, such as the\n  `FROM` clause or on the right-hand side of an `IN` expression.\n* May return an arbitrary number of rows consisting of one or more columns.\n\nHere are some examples of what you can do with Python and `sqlite-vtfunc`:\n\n* Write a SELECT query that, when run, will scrape a website and return a table\n  of all the outbound links on the page (rows are `(href, description)`\n  tuples. [See example below](#scraping-pages-with-sql)).\n* Accept a directory path and return a table of the files in that directory and\n  their associated metadata.\n* Use table-valued functions to handle recurring events in a calendaring\n  application (by generating the series of recurrences dynamically).\n* Apply a regular expression search to some text and return a row for each\n  matching substring.\n\n### Scraping pages with SQL\n\nTo get an idea of how `sqlite-vtfunc` works, let's build the scraper table\nfunction described in the previous section. 
The function will accept a URL as\nthe only parameter, and will return a table of the link destinations and text\ndescriptions.\n\nThe `Scraper` class contains the entire implementation for the scraper:\n\n```python\n\nimport re, urllib2\n\nfrom pysqlite2 import dbapi2 as sqlite3  # Use forked pysqlite.\nfrom vtfunc import TableFunction\n\n\nclass Scraper(TableFunction):\n    params = ['url']  # Function argument names.\n    columns = ['href', 'description']  # Result rows have these columns.\n    name = 'scraper'  # Name we use to invoke the function from SQL.\n\n    def initialize(self, url):\n        # When the function is called, download the HTML and create an\n        # iterator that successively yields `href`/`description` pairs.\n        fh = urllib2.urlopen(url)\n        self.html = fh.read()\n        self._iter = re.finditer(\n            '\u003ca[^\\\u003e]+?href=\"([^\\\"]+?)\"[^\\\u003e]*?\u003e([^\\\u003c]+?)\u003c/a\u003e',\n            self.html)\n\n    def iterate(self, idx):\n        # Since row ids would not be meaningful for this particular table-\n        # function, we can ignore \"idx\" and just advance the regex iterator.\n\n        # Ordinarily, to signal that there are no more rows, the `iterate()`\n        # method must raise a `StopIteration` exception. This is not necessary\n        # here because `next()` will raise the exception when the regex\n        # iterator is finished.\n        return next(self._iter).groups()\n```\n\nTo start using the table function, create a connection and register the table\nfunction with the connection. 
**Note**: for SQLite version \u003c= 3.13, the table\nfunction will not remain loaded across connections, so it is necessary to\nregister it each time you connect to the database.\n\n```python\n\n# Creating a connection and registering our scraper function.\nconn = sqlite3.connect(':memory:')\nScraper.register(conn)  # Register the function with the new connection.\n```\n\nTo test the scraper, start up a Python interpreter and enter the above code.\nOnce that is done, let's try a query. The following query will fetch the HTML\nfor the Hacker News front page and extract the three links with the longest\ndescriptions:\n\n```pycon\n\u003e\u003e\u003e curs = conn.execute('SELECT * FROM scraper(?) '\n...                     'ORDER BY length(description) DESC '\n...                     'LIMIT 3', ('https://news.ycombinator.com/',))\n\n\u003e\u003e\u003e for (href, description) in curs.fetchall():\n...     print description, ':', href\n\nThe Diolkos: an ancient Greek paved trackway enabling boats to be moved overland : https://...\nThe NumPy array: a structure for efficient numerical computation (2011) [pdf] : https://hal...\nRestoring Y Combinator's Xerox Alto, day 4: What's running on the system : http://www.right...\n```\n\nNow, suppose you have another table which contains a huge list of URLs that you\nneed to scrape. 
Since this is a relational database, it's easy to\njoin the URLs in that table against the table function.\n\nThe following query will scrape all the URLs in the `unvisited_urls` table:\n\n```sql\n\nSELECT uu.url, href, description\nFROM unvisited_urls AS uu, scraper(uu.url)\nORDER BY uu.url, href, description;\n```\n\n### Example two: implementing Python's range()\n\nThis function generates a series of integers between given boundaries and at\ngiven intervals.\n\n```python\n\nfrom vtfunc import TableFunction\n\n\nclass GenerateSeries(TableFunction):\n    params = ['start', 'stop', 'step']\n    columns = ['output']\n    name = 'generate_series'\n\n    def initialize(self, start=0, stop=None, step=1):\n        # Note that when a parameter is optional, the only thing\n        # you need to do is provide a default value in `initialize()`.\n        self.start = start\n        # Compare against None so that an explicit stop of 0 is respected.\n        self.stop = float('inf') if stop is None else stop\n        self.step = step\n        self.curr = self.start\n\n    def iterate(self, idx):\n        # Unlike Python's range(), the stop value is inclusive.\n        if self.curr \u003e self.stop:\n            raise StopIteration\n\n        ret = self.curr\n        self.curr += self.step\n        return (ret,)\n```\n\n### Dependencies\n\nThis project is designed to work with the standard library `sqlite3` driver, or\nalternatively, the latest version of `pysqlite2`.\n\n### Implementation Notes\n\nTo create functions that return multiple values, it is necessary to create a\n[virtual table](http://sqlite.org/vtab.html). 
SQLite has the concept of\n\"eponymous\" virtual tables, which are virtual tables that can be called like a\nfunction and do not require explicit creation using DDL statements.\n\nThe `vtfunc` module abstracts away the complexity of creating an eponymous\nvirtual table, allowing you to write your own multi-value SQLite functions in\nPython.\n\nAs an example, the following table function applies a regular expression to a\nstring and returns one row per match:\n\n```python\nimport re\n\nfrom vtfunc import TableFunction\n\n\nclass RegexSearch(TableFunction):\n    params = ['regex', 'search_string']\n    columns = ['match']\n    name = 'regex_search'\n\n    def initialize(self, regex=None, search_string=None):\n        self._iter = re.finditer(regex, search_string)\n\n    def iterate(self, idx):\n        # We do not need `idx`, so just ignore it.\n        return (next(self._iter).group(0),)\n```\n\nTo use our function, we need to register the module with a SQLite connection,\nthen call it using a `SELECT` query:\n\n```python\n\nimport sqlite3\n\nconn = sqlite3.connect(':memory:')  # Create an in-memory database.\n\nRegexSearch.register(conn)  # Register our module.\n\nquery_params = ('[0-9]+', '123 xxx 456 yyy 789 zzz 0')\ncursor = conn.execute('SELECT * FROM regex_search(?, ?);', query_params)\nprint cursor.fetchall()\n```\n\nLet's say we have a table that contains a list of arbitrary messages and we\nwant to capture all the email addresses from that table. This is also easy\nusing our table-valued function. We will query the `messages` table and pass\nthe message body into our table-valued function. 
Then, for each email address\nwe find, we'll return a row containing the message ID and the matching email\naddress:\n\n```python\n\nemail_regex = '[\\w]+@[\\w]+\\.[\\w]{2,3}'  # Stupid simple email regex.\nquery = ('SELECT messages.id, regex_search.match '\n         'FROM messages, regex_search(?, messages.body)')\ncursor = conn.execute(query, (email_regex,))\n```\n\nThe resulting rows will look something like:\n\n```\n\nmessage id |         email\n-----------+-----------------------\n     1     | charlie@example.com\n     1     | huey@kitty.cat\n     1     | zaizee@morekitties.cat\n     3     | mickey@puppies.dog\n     3     | huey@throwaway.cat\n    ...    |         ...\n```\n\n#### Important note\n\nIn the above example you will note that the parameters for our query actually\nchange (because each row in the messages table has a different search string).\nThis means that for this particular query, the `RegexSearch.initialize()`\nfunction will be called once for each row in the `messages` table.\n\n### How it works\n\nBehind-the-scenes, `vtfunc` is creating a [Virtual Table](http://sqlite.org/vtab.html)\nand filling in the various callbacks with wrappers around your user-defined\nfunction. There are two important methods that the wrapped virtual table\nimplements:\n\n* xBestIndex\n* xFilter\n\nWhen SQLite attempts to execute a query, it will call the xBestIndex method of\nthe virtual table (possibly multiple times) trying to come up with the best\nquery plan. The `vtfunc` module optimizes for those query plans which include\nvalues for all the parameters of the user-defined function. 
Since some\nuser-defined functions may have optional parameters, query plans with only a\nsubset of param values will be slightly penalized.\n\nSince we have no visibility into what parameters the user *actually* passed in,\nand we don't know ahead of time which query plan SQLite suggests will be\nbest, `vtfunc` just does its best to optimize for plans with the highest\nnumber of usable parameter values.\n\nIf you encounter a situation where you pass your function multiple parameters,\nbut it doesn't receive all of them, it is likely that a less-than-optimal\nplan was used.\n\nAfter the plan is chosen by calling xBestIndex, the query will execute by\ncalling xFilter (possibly multiple times). xFilter has access to the actual\nquery parameters, and its responsibility is to initialize the cursor and call\nthe user's initialize() callback with the parameters passed in.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoleifer%2Fsqlite-vtfunc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoleifer%2Fsqlite-vtfunc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoleifer%2Fsqlite-vtfunc/lists"}