https://github.com/rmosolgo/aiddata-fs
RESTful file storage for development project records
- Host: GitHub
- URL: https://github.com/rmosolgo/aiddata-fs
- Owner: rmosolgo
- Created: 2013-03-27T18:44:09.000Z (almost 13 years ago)
- Default Branch: master
- Last Pushed: 2013-04-26T14:37:04.000Z (over 12 years ago)
- Last Synced: 2024-12-28T03:42:40.563Z (about 1 year ago)
- Language: JavaScript
- Size: 708 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: readme.md
# AidDataFS
Document storage accessible by simple, RESTful url. Documents are stored along with their projects,
and may be retrieved with three parameters: _namespace_, _project id_, and _document id_. For example, fs.aiddata.org/mbdc/1703/2843
## Setup
### Runs on...
- Sinatra framework:
```Ruby
require 'rubygems'
require 'bundler/setup'
require 'sinatra'
```
- Postgres, via DataMapper gem:
```Ruby
require 'data_mapper'
require 'dm-postgres-adapter'
require 'pg'
```
- Ruby utilities: Thin server, HAML templates, Barista for CoffeeScript:
```Ruby
require 'thin'
require 'haml'
require 'barista'
```
- Amazon S3 for storage:
```Ruby
require 'aws-sdk'
```
### Connection info
Set the authentication and connection info as environment variables. The values are not recorded here; you have to know them.
```Ruby
AUTH_PAIR = [ENV['AIDDATA_FS_USERNAME'], ENV['AIDDATA_FS_PASSWORD']]
BUCKET_NAME = 'aiddata-fs'
AWS_ACCESS_KEY_ID = ENV['AWS_ACCESS_KEY_ID']
AWS_ACCESS_SECRET_KEY = ENV['AWS_SECRET_KEY']
```
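Because the values live only in the environment, a missing variable silently becomes `nil`. The following is a minimal sketch of a boot-time check, assuming the four variable names read above; `REQUIRED_ENV` and `missing_env` are hypothetical names, not part of AidDataFS.

```ruby
# Names of the environment variables read in the block above.
REQUIRED_ENV = %w[
  AIDDATA_FS_USERNAME
  AIDDATA_FS_PASSWORD
  AWS_ACCESS_KEY_ID
  AWS_SECRET_KEY
].freeze

# Hypothetical helper: returns the names that are unset or empty in `env`.
def missing_env(env = ENV)
  REQUIRED_ENV.select { |key| env[key].nil? || env[key].empty? }
end

missing = missing_env
warn "Missing environment variables: #{missing.join(', ')}" unless missing.empty?
```

Passing a plain hash instead of `ENV` makes the check easy to exercise without touching the real environment.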
DataMapper is configured for Heroku's DATABASE_URL or the local machine.
```Ruby
DataMapper.setup(:default, ENV['DATABASE_URL'] || 'postgres://postgres:postgres@localhost/postgres')
```
### Constants
Config constants for later:
```Ruby
NOT_SAVED = "{ \"error\" : \" not saved \" }"
NOT_FOUND = "{ \"error\" : \" not found \" }"
NOT_IMPLEMENTED = "{ \"error\" : \" not implemented\" }"
NOT_RECEIVED = "{ \"error\" : \" no file received\" }"
NOT_DELETED = "{ \"error\" : \" not deleted\" }"
FILE_TOO_BIG = "{ \"error\" : \"this file is too large!\" }"
SUCCESS = "{ \"success\" : \"success\" }"
FILESYSTEM_ROOT = "files"
MAX_FILE_SIZE = 10485760 # in bytes
```
### Icons
We serve icons as links for use on the client side. They were generated from Gmail attachment icons (is that illegal?):
```Ruby
ICONS = {
"doc" => "",
"xls" => "",
"ppt" => "",
"graphic" => "",
"txt" => "",
"zip" => "",
"sound" => "",
"pdf" => "",
"html" => "",
}
```
## Models
### Namespace
Namespace denotes the collection to which a given project belongs.
For example, namespaces might be "aiddata" for aiddata.org, "malawi" for Malawi-AMP
projects, or "mbdc" for media-based data collection projects.
```Ruby
class Namespace
include DataMapper::Resource
property :name, String, key: true
has n, :projects
def to_json
json = "{
\"type\": \"namespace\",
\"key\" : \"#{name}\",
\"name\" : \"#{name}\",
\"project_count\" : #{projects.count}
}"
end
end
```
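The hand-built JSON string above breaks if a name contains a double quote. One possible alternative, sketched here with the `json` standard library, lets the library do the escaping. `NamespaceStub` is a hypothetical stand-in for the model and takes the project count as a plain value.

```ruby
require 'json'

# Stand-in for the Namespace model; the real class is a DataMapper resource.
NamespaceStub = Struct.new(:name, :project_count) do
  def to_json(*_args)
    JSON.generate({
      'type'          => 'namespace',
      'key'           => name,
      'name'          => name,
      'project_count' => project_count
    })
  end
end

ns = NamespaceStub.new('say "hi"', 2)
puts ns.to_json
```

The same approach would apply to the other models' `to_json` methods.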
#### Permissions
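Any user may `GET` a namespace, project, or document, but any request that changes data (`POST` or `DELETE`) must pass HTTP Basic authentication. The expected credentials are the `AUTH_PAIR` that AidData FS reads from its environment.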
```Ruby
def protected!
unless authorized?
p "Unauthorized request."
response['WWW-Authenticate'] = %(Basic realm="AidDataFS")
throw(:halt, [401, "Not authorized\n"])
end
end
def authorized?
@auth ||= Rack::Auth::Basic::Request.new(request.env)
@auth.provided? && @auth.basic? && @auth.credentials && @auth.credentials == AUTH_PAIR
end
```
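To make the comparison against `AUTH_PAIR` concrete, here is a rough sketch of how an HTTP Basic `Authorization` header decodes into a username and password pair. It uses only core Ruby; `basic_credentials` is a hypothetical helper and the credentials are placeholders.

```ruby
# Hypothetical helper: decode an HTTP Basic "Authorization" header value
# into a [username, password] pair, or nil if it is not a Basic header.
def basic_credentials(header)
  scheme, encoded = header.to_s.split(' ', 2)
  return nil unless scheme && scheme.casecmp('basic').zero? && encoded
  # "m" is the Base64 directive for String#unpack.
  encoded.unpack('m').first.split(':', 2)
end

# Build a header the way a client would (placeholder credentials).
header = 'Basic ' + ['alice:s3cret'].pack('m0')
expected_pair = ['alice', 's3cret'] # stands in for AUTH_PAIR

puts basic_credentials(header) == expected_pair
```

Splitting on the first colon only means a password may itself contain colons.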
### Project
Project denotes the actual activity in the given namespace. It likely has an instance in a particular database or dataset, such as AidData.org or the Malawi geocoded dataset.
```Ruby
class Project
include DataMapper::Resource
property :id, String, key: true
property :namespace_name, String, key: true
belongs_to :namespace
has n, :links
has n, :documents, through: :links
def to_json
json = "{
\"type\": \"project\",
\"key\" : \"#{id}\",
\"name\" : \"#{id}\",
\"id\" : \"#{id}\",
\"document_count\" : #{documents.count} }"
end
end
```
### Document
Link is what actually appears inside a project's folder. It's a reference to the underlying document:
```Ruby
class Link
include DataMapper::Resource
property :id, Serial
property :document_pk, Integer, index: true
belongs_to :project
belongs_to :document
def link_json
# Embed the document as a nested JSON object, not as a quoted string.
json = "{\"type\": \"link\",
\"project_id\" : \"#{project.id}\",
\"document_id\" : #{document.pk},
\"document\" : #{document.to_json} }"
end
def to_json
# vv This is what matters! vv
document.to_json
end
end
```
Document is the reference to the file. Its `Document#icon` method serves up a data URI for the matching icon.
```Ruby
class Document
include DataMapper::Resource
require 'digest/md5'
property :pk, Serial
# property :id, Integer #not really a pk, because doc can change versions.
property :md5, String
property :url, Text
property :size_in_kb, Integer
property :type, Text, default: lambda { |r, p| File.extname(r.name).gsub(/\./, '').downcase }
property :name, Text
def icon
if %w{doc docx xml odt }.include? type
ICONS["doc"]
elsif %w{png jpg jpeg gif bmp tiff svg}.include? type
ICONS["graphic"]
elsif %w{mp3 wav aac}.include? type
ICONS["sound"]
elsif %w{xls xlsx csv tsv tab ods}.include? type
ICONS["xls"]
elsif %w{ppt pptx odp}.include? type
ICONS["ppt"]
elsif %w{zip gz 7z}.include? type
ICONS["zip"]
elsif %w{pdf}.include? type
ICONS["pdf"]
elsif %w{html}.include? type
ICONS["html"]
else
ICONS["txt"]
end
end
def to_json
json = "{
\"type\" : \"document\",
\"name\" : \"#{name}\",
\"key\" : \"#{pk}\",
\"size_in_kb\" : #{size_in_kb},
\"filetype\" : \"#{type}\",
\"md5\" : \"#{ md5}\",
\"path\" : \"/documents/#{pk}\",
\"icon\" : \"#{icon}\"
}"
end
end
```
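The `if`/`elsif` chain in `Document#icon` could also be written as a lookup table. The sketch below copies the extension groups from the method above; `ICON_GROUPS`, `ICON_KEY_FOR`, and `icon_key_for` are hypothetical names, and the helper returns the key into `ICONS` rather than the icon itself.

```ruby
# Extension groups copied from Document#icon; each maps to a key in ICONS.
ICON_GROUPS = {
  'doc'     => %w[doc docx xml odt],
  'graphic' => %w[png jpg jpeg gif bmp tiff svg],
  'sound'   => %w[mp3 wav aac],
  'xls'     => %w[xls xlsx csv tsv tab ods],
  'ppt'     => %w[ppt pptx odp],
  'zip'     => %w[zip gz 7z],
  'pdf'     => %w[pdf],
  'html'    => %w[html]
}.freeze

# Invert the groups into a flat extension => icon-key table.
ICON_KEY_FOR = ICON_GROUPS.each_with_object({}) do |(key, extensions), table|
  extensions.each { |ext| table[ext] = key }
end.freeze

# Hypothetical helper: derive the type as the model's default does,
# then fall back to "txt" for anything unrecognised.
def icon_key_for(filename)
  type = File.extname(filename).delete('.').downcase
  ICON_KEY_FOR.fetch(type, 'txt')
end

puts icon_key_for('Budget.XLSX')
puts icon_key_for('README')
```

A file with no extension yields an empty type and falls through to the `txt` icon.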
```Ruby
DataMapper.finalize.auto_upgrade!
get "/" do
haml :browse
end
```
## API
The JSON API is powered by the models' `to_json` methods, which keeps navigation simple: every response names a `location` and, where relevant, lists its `contents`.
```Ruby
def returns_json
content_type :json
end
def locate(location, contents=nil)
# location: string OR obj that responds to to_json
# contents: array with objs that respond to :to_json
if location.respond_to? :to_json
location = location.to_json
else
location = "\"#{location}\""
end
vals = ["\"location\" : #{location}"]
if contents
vals.push "\"contents\" : [#{contents.map{|c| c.to_json}.join ", "}]"
end
"{ #{ vals.join ", "}}"
end
```
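To show the response shape that `locate` produces, here is an alternative sketch built on the `json` standard library. It is not the project's code: `locate_sketch` is a hypothetical name, and plain hashes stand in for the models.

```ruby
require 'json'

# Alternative sketch of `locate` built on the json standard library.
# `location` is a String or a Hash; `contents` is an optional Array of Hashes.
def locate_sketch(location, contents = nil)
  body = { 'location' => location }
  body['contents'] = contents if contents
  JSON.generate(body)
end

namespaces = [
  { 'type' => 'namespace', 'key' => 'mbdc',   'name' => 'mbdc',   'project_count' => 1 },
  { 'type' => 'namespace', 'key' => 'malawi', 'name' => 'malawi', 'project_count' => 2 }
]
puts locate_sketch('root', namespaces)
```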
### Namespace
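__URL:__ `/:namespace` (the routes below mount it under `/files`, the value of `FILESYSTEM_ROOT`)
- `POST` a new `name` to the root to create a new namespace.
- `GET`ting the root responds with a list of all namespaces.
- `GET`ting the namespace path (e.g. `/mbdc`) responds with a project manifest.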
```Ruby
get "/#{FILESYSTEM_ROOT}" do
  returns_json
  locate "root", Namespace.all
end
post "/#{FILESYSTEM_ROOT}" do
  protected!
  n = Namespace.new(name: params[:name])
  if n.save
    n.to_json
  else
    p n.errors
    NOT_SAVED
  end
end
get "/#{FILESYSTEM_ROOT}/:namespace" do
  returns_json
  if n = Namespace.get(params[:namespace])
    locate n.name, n.projects
  else
    NOT_FOUND
  end
end
```
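From the client side, creating a namespace might look like the sketch below, which uses `Net::HTTP` from the standard library. The `http://` scheme, the base URL (the host from the introduction plus the `/files` root), the helper name, and the credentials are all assumptions for illustration. The request is built but not sent.

```ruby
require 'net/http'
require 'uri'

# Assumed base URL: the host from the intro plus the "/files" root used by the routes.
BASE = 'http://fs.aiddata.org/files'

# Build (but do not send) an authenticated request that creates a namespace.
def create_namespace_request(name, username, password)
  request = Net::HTTP::Post.new(URI(BASE))
  request.basic_auth(username, password)
  request.set_form_data('name' => name)
  request
end

request = create_namespace_request('mbdc', 'alice', 's3cret') # placeholder credentials
puts request.method
puts request.path
puts request.body

# To actually send it:
#   uri = URI(BASE)
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   puts response.body
```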
### Projects
Create a project by `POST`ing its `project_id` to its namespace. `DELETE`ing the namespace path removes the namespace:
```Ruby
post "/#{FILESYSTEM_ROOT}/:namespace" do
  protected!
  if n = Namespace.get(params[:namespace])
    # Project.new always returns an object, so only the save can fail here.
    p = Project.new(id: params[:project_id])
    if (n.projects << p) && n.save
      p.to_json
    else
      NOT_SAVED
    end
  else
    NOT_FOUND
  end
end
delete "/#{FILESYSTEM_ROOT}/:namespace" do
  protected!
  # Look the namespace up first so an unknown name returns NOT_FOUND
  # instead of raising on nil.
  if n = Namespace.get(params[:namespace])
    if n.destroy
      SUCCESS
    else
      NOT_DELETED
    end
  else
    NOT_FOUND
  end
end
```
A project has a RESTful URL such as `/malawi/8071234`, which responds to requests:
- `POST`ing a file adds that file to the project.
- `DELETE` destroys the project. To remove a single file, `DELETE` its document URL instead, which destroys that link.
- `GET` returns a document manifest.
```Ruby
get "/#{FILESYSTEM_ROOT}/:namespace/:project" do
  returns_json
  if (n = Namespace.get(params[:namespace])) && (p = Project.get(params[:project], params[:namespace]))
    locate p.id, p.documents
  else
    NOT_FOUND
  end
end
delete "/#{FILESYSTEM_ROOT}/:namespace/:project" do
  protected!
  if (n = Namespace.get(params[:namespace])) && (p = Project.get(params[:project], params[:namespace]))
    if p.destroy
      SUCCESS
    else
      NOT_DELETED
    end
  else
    NOT_FOUND
  end
end
post "/#{FILESYSTEM_ROOT}/:namespace/:project" do
  protected!
  if (n = Namespace.get(params[:namespace])) && (p = Project.first_or_create(id: params[:project], namespace: n))
    if params[:file]
      # puts rather than p, because `p` is a local variable (the project) here.
      puts "Receiving file #{params[:file]}"
      tempfile = params[:file][:tempfile]
      name = params[:file][:filename]
      if tempfile.nil? || name.nil?
        # Malformed upload: stop here rather than calling size on nil.
        NOT_SAVED
      elsif tempfile.size > MAX_FILE_SIZE
        FILE_TOO_BIG
      elsif d = find_or_store(tempfile, name)
        puts "Making Link object"
        l = Link.new(document: d, project: p)
        l.save
        locate l
      else
        NOT_SAVED
      end
    else
      NOT_RECEIVED
    end
  else
    NOT_FOUND
  end
end
```
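The upload checks in the `POST` route can be read as one small decision function. The sketch below is an illustration only: `upload_problem` is a hypothetical helper that returns symbols rather than the JSON constants, and a `StringIO` stands in for the uploaded `Tempfile`.

```ruby
require 'stringio'

MAX_FILE_SIZE = 10_485_760 # bytes, same value as the constant above

# Hypothetical helper: returns nil when the upload looks acceptable,
# or a symbol naming the problem.
def upload_problem(file_param, max_size = MAX_FILE_SIZE)
  return :not_received unless file_param
  tempfile = file_param[:tempfile]
  return :not_saved unless tempfile && file_param[:filename]
  return :too_big if tempfile.size > max_size
  nil
end

small = { tempfile: StringIO.new('hello'), filename: 'hello.txt' }
p upload_problem(small)
p upload_problem(nil)
```

Returning early for each failure keeps the size check from ever running on a missing file.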
### Documents
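Individual documents have RESTful URLs, e.g. `/malawi/8071234/9983`.
- `GET` returns the file.
- `POST`/`PUT` are meant to replace the file with a new one, but replacement is not implemented yet: `POST` responds with `NOT_IMPLEMENTED` and there is no `PUT` route.
- `DELETE` removes the link between the project and the file.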
```Ruby
get "/#{FILESYSTEM_ROOT}/:namespace/:project/:document" do
  if d = Document.get(params[:document])
    require 'open-uri'
    p "Getting file from #{d.url}"
    # Kernel#open on a URL relies on open-uri; on Ruby 3.0+ use URI.open instead.
    data = open(d.url) {|io| io.read}
    p "Sending file"
    content_type 'application/octet-stream'
    attachment d.name
    data
  else
    returns_json
    NOT_FOUND
  end
end
post "/#{FILESYSTEM_ROOT}/:namespace/:project/:document" do
  protected!
  NOT_IMPLEMENTED
end
delete "/#{FILESYSTEM_ROOT}/:namespace/:project/:document" do
  protected!
  p "Delete request /#{FILESYSTEM_ROOT}/#{params[:namespace]}/#{params[:project]}/#{params[:document]}"
  if (n = Namespace.get(params[:namespace])) &&
     (p = Project.get(params[:project], params[:namespace])) &&
     (d_id = params[:document])
    # Link.first returns nil when this project has no such link.
    if (l = Link.first(project: p, document_pk: d_id))
      puts l
      l.destroy ? SUCCESS : NOT_DELETED
    else
      NOT_FOUND
    end
  else
    NOT_FOUND
  end
end
```
## Documents
### Documents aren't stored redundantly
When a file is uploaded, its MD5 digest is computed and checked against the digests of existing documents.
```Ruby
def find_or_store(tempfile, filename)
# tempfile is a Tempfile
# filename is its human-readable filename
# p "Find or Store?"
require 'digest/md5'
# just in case
tempfile.rewind
this_md5 = Digest::MD5.hexdigest(tempfile.read)
if d = Document.first(md5: this_md5)
p "File already found"
else
path = "https://s3.amazonaws.com/#{BUCKET_NAME}/#{this_md5}"
p "Creating the file #{this_md5}"
# For my purposes, MD5 is the AWS filename.
upload(tempfile, this_md5)
p "Making Document object"
d = Document.new(
url: path,
name: filename,
md5: this_md5,
size_in_kb: (tempfile.size / 1024.0).round # float division, so round has an effect
)
if !d.save
p d.errors
end
end
d
end
```
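The deduplication idea can be tried without a database or S3. The following is a minimal sketch in which a hash keyed by MD5 stands in for both the `Document` table and the bucket; `DedupStore` is a hypothetical class, not part of AidDataFS.

```ruby
require 'digest/md5'
require 'stringio'

# In-memory stand-in for the Document table and the S3 bucket (illustrative only).
class DedupStore
  attr_reader :documents

  def initialize
    @documents = {} # md5 => { name:, size_in_kb: }
  end

  # Returns [md5, created?]; stores the content only if its digest is new.
  def find_or_store(io, filename)
    io.rewind
    content = io.read
    md5 = Digest::MD5.hexdigest(content)
    created = !@documents.key?(md5)
    @documents[md5] = { name: filename, size_in_kb: (content.bytesize / 1024.0).round } if created
    [md5, created]
  end
end

store = DedupStore.new
first  = store.find_or_store(StringIO.new('same bytes'), 'report.pdf')
second = store.find_or_store(StringIO.new('same bytes'), 'copy-of-report.pdf')
puts first[0] == second[0]
puts store.documents.size
```

As in `find_or_store` above, the second upload keeps the name recorded for the first one, because identity is decided by content alone.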
#### If false
If no document with that MD5 exists yet, the file is uploaded to S3 (stored under its MD5) and a link is created.
```Ruby
def upload(tempfile, filename)
# tempfile is a Tempfile
# filename is the name it should be stored as (in my case, MD5)
s3 = AWS::S3.new(
:access_key_id => AWS_ACCESS_KEY_ID,
:secret_access_key => AWS_ACCESS_SECRET_KEY
)
p "Uploading file to S3 #{BUCKET_NAME}"
# just in case
tempfile.rewind
obj = s3.buckets[BUCKET_NAME].objects[filename].write(tempfile.read)
# Oh heck, make sure people can download this stuff.
obj.acl = :public_read
filename
end
```
#### If true
If a document with that MD5 already exists, only a link is created, registering the existing document with the requested project.
### Links
```Ruby
get "/links" do
protected!
"[ #{Link.all.map{ |l| l.link_json}.join(", ") } ]"
end
```
A given document may be present in multiple links:
```json
{ "namespace" : "malawi", "project_id" : 8071234, "document_id" : 9983 },
{ "namespace" : "mbdc", "project_id" : 1703, "document_id" : 9983 }
```
And a project will have many links:
```json
{ "namespace" : "malawi", "project_id" : 8071234, "document_id" : 9983 }
...
{ "namespace" : "malawi", "project_id" : 8071234, "document_id" : 3214 }
```
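The two views above are the same link table read in opposite directions. As a small illustration in plain Ruby, with `LinkRow` as a hypothetical stand-in for the `Link` model and the sample values taken from the JSON above:

```ruby
# Plain-Ruby stand-in for the Link table (the real model is a DataMapper resource).
LinkRow = Struct.new(:namespace, :project_id, :document_id)

links = [
  LinkRow.new('malawi', 8071234, 9983),
  LinkRow.new('mbdc',   1703,    9983),
  LinkRow.new('malawi', 8071234, 3214)
]

# Every project that references document 9983.
projects_for_9983 = links.select { |l| l.document_id == 9983 }
                         .map { |l| [l.namespace, l.project_id] }

# Every document linked to malawi/8071234.
documents_for_project = links.select { |l| l.namespace == 'malawi' && l.project_id == 8071234 }
                             .map(&:document_id)

p projects_for_9983
p documents_for_project
```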
### The documents themselves are served purely by ID
But the `/:namespace/:project/:document` API provides a logical, stable, implementation-independent interface to the files.
Documents can also be downloaded directly via `/documents/:pk`.
```Ruby
get "/documents" do
  returns_json
  protected!
  "[
  #{Document.all.map{|d| d.to_json}.join ", "}
  ]"
end
get "/documents/:pk" do
  require 'open-uri'
  # Return NOT_FOUND for an unknown pk instead of raising on nil.
  if d = Document.get(params[:pk])
    p "Getting file from #{d.url}"
    data = open(d.url) {|io| io.read}
    p "Sending file"
    content_type 'application/octet-stream'
    attachment d.name
    data
  else
    returns_json
    NOT_FOUND
  end
end
delete "/documents/:pk" do
  # This is on purpose: delete links, not documents!
  NOT_IMPLEMENTED
end
```