Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wrobstory/mcflyin
A small timeseries transformation API built on Flask and Pandas
https://github.com/wrobstory/mcflyin
Last synced: 3 months ago
JSON representation
A small timeseries transformation API built on Flask and Pandas
- Host: GitHub
- URL: https://github.com/wrobstory/mcflyin
- Owner: wrobstory
- License: mit
- Created: 2013-06-11T03:45:57.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2022-09-16T08:18:17.000Z (over 2 years ago)
- Last Synced: 2024-10-14T18:36:30.347Z (3 months ago)
- Language: Python
- Homepage:
- Size: 346 KB
- Stars: 87
- Watchers: 10
- Forks: 16
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-flask - mcflyin - A small timeseries transformation API built on Flask and Pandas (Built with Flask)
- awesome-flask - mcflyin - A small timeseries transformation API built on Flask and Pandas (Built with Flask)
- awesome-flask - mcflyin - A small timeseries transformation API built on Flask and Pandas (Built with Flask)
README
#Mcflyin
###A timeseries transformation API built on Pandas and Flask
This is a small demo of an API to do timeseries transformations built on Flask and Pandas.
Concept
-------The idea is that you can make a POST request to the API with a simple list/array of timestamps, from any language, and get back some interesting transformations of that data.
Why?
----Partly to show how straightforward it is to build such a thing. Python is great because it has very powerful, intuitive, quick-to-learn tools for both building web applications and doing data analysis/statistics.
That puts Python in kind of a unique position: powerful web tools, powerful scientific/numerical/statistical data tools. This API is a very simple example of how you can take advantage of both. Go read the source code- it's short and easy to grok. Bug fixes and pull requests welcome.
Getting Started
---------------First we need to find some data. We're going to use some data that Wes McKinney provided in a recent [blog post](http://wesmckinney.com/blog/?p=687), with some statistics on Python posts on Stack Overflow. This is something of a contrived example: I'm manipulating the data in Python, sending to a Python backend, and then getting a response to manipulate in Python. Just know that all you need is an array of timestamp strings, no matter your language.
```python
import pandas as pddata = pd.read_csv('AllPandas.csv')
data = data['CreationDate'].tolist()
```A simple array of timestamps:
```python
>>>data[:10]
['2011-04-01 14:50:44',
'2012-01-18 19:41:27',
'2012-01-23 03:21:00',
'2012-01-24 17:59:53',
'2012-03-04 16:58:45',
'2012-03-09 22:36:52',
'2012-03-10 15:35:26',
'2012-03-18 12:53:06',
'2012-03-30 13:58:29',
'2012-04-04 23:17:23']
```With the McFlyin application running on localhost, lets make a request to resample the data on an daily basis, to get the number of posts per day:
```python
import requests
import jsonfreq = {'D': 'Daily'}
sends = {'freq': json.dumps(freq), 'data': json.dumps(data)}
r = requests.post('http://127.0.0.1:5000/resample', data=sends)
response = r.json
```The response is simple JSON:
```js
{'Monthly': {'data': [1.0, 2.0, 1.0, 1.0,...
'time': ['2011-03-31T00:00:00', '2011-04-30T00:00:00', '2011-05-31T00:00:00', '2011-06-30T00:00:00', '2011-07-31T00:00:00',...
```Here's the distribution of daily questions on Stack Overflow for Pandas (monthly probably would have been a little more informative):
![Daily](http://farm6.staticflickr.com/5497/9062972730_aa34df95a2_o.jpg)
Let's call Mcflyin for a rolling sum on a seven-day window. It will resample to the given ```freq```, then apply the window to the result:
```python
freq = {'D': 'Weekly Rolling'}
sends = {'freq': json.dumps(freq), 'data': json.dumps(data), 'window': 7}
r = requests.post('http://127.0.0.1:5000/rolling_sum', data=sends)
response = r.json
```![Rolling](http://farm4.staticflickr.com/3682/9060743479_2962e61881_o.jpg)
Let's look at the total questions asked by day:
```python
sends = {'data': json.dumps(data), 'how': json.dumps('sum')}
r = requests.post('http://127.0.0.1:5000/daily', data=sends)
response = r.json
```
![dailysum](http://farm3.staticflickr.com/2838/9064294004_200b81b303_o.jpg)and daily means:
```python
sends = {'data': json.dumps(data), 'how': json.dumps('mean')}
r = requests.post('http://127.0.0.1:5000/daily', data=sends)
response = r.json
```
![dailymean](http://farm4.staticflickr.com/3786/9064294028_c8bf17fa09_o.jpg)The same for hourly:
```python
sends = {'data': json.dumps(data), 'how': json.dumps('sum')}
r = requests.post('http://127.0.0.1:5000/hourly', data=sends)
response = r.json
```
![dailymean](http://farm4.staticflickr.com/3814/9062065097_75d871a7bc_o.jpg)Finally, we can look at hourly by day-of-week:
```python
sends = {'data': json.dumps(data), 'how': json.dumps('sum')}
r = requests.post('http://127.0.0.1:5000/daily_hours', data=sends)
response = r.json
```
![hourdow](http://farm3.staticflickr.com/2838/9064294126_6036e724ba_o.jpg)Live demo [here](http://bl.ocks.org/wrobstory/5794343)
Dependencies
------------
Pandas, Numpy, Requests, FlaskHow did you make those colorful graphs?
--------------------------------------
[Vincent](https://github.com/wrobstory/vincent) and [Bearcart](https://github.com/wrobstory/bearcart)Status
------
Lots of stuff that could be better- error handling on the requests, probably better handling of weird timestamps,
etc. This is just a small demo of how powerful Python can be for building a statistics backend with relatively few lines of code.If I want to write a front-end in a different language, can I put it in the examples folder?
---------
Yes! PR's welcome.