Febrl's data set generator mod
https://github.com/yipeng/dsgen-big


# This is a modification of Febrl's data set generator to facilitate generation
# of large (gigabyte-scale) data sets. See "generate_bigdata.py".
#
# Yipeng Huang, Feb 22 2012
# -------------------
# My changes allow dsgen to produce sizable data sets that exceed memory
# constraints. It writes records directly to disk, and generates a
# proportional number of duplicates at regular intervals of a million original
# records (approx. 100 MB). The catch is that the output is not randomly
# ordered, even for small files. You should run another script if you need
# the records in random order.
# -------------------
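
# The batching idea described above can be sketched roughly as follows. This
# is NOT the code in "generate_bigdata.py": every name here (generate_big,
# make_original, make_duplicate, the three record fields, the dup_ratio and
# interval parameters) is a hypothetical stand-in, and the sketch assumes
# Python 3, whereas Febrl 0.4.1 itself targets Python 2. It only illustrates
# writing originals straight to disk and emitting a proportional number of
# duplicates after each interval, so that at most one interval of records is
# held in memory at a time.

```python
import csv
import random


def make_original(rec_num, rng):
    """Return one made-up original record (hypothetical fields, not Febrl's)."""
    return {
        "rec_id": "rec-%d-org" % rec_num,
        "given_name": rng.choice(["anna", "james", "li", "maria"]),
        "surname": rng.choice(["smith", "nguyen", "brown", "huang"]),
    }


def make_duplicate(original, dup_num, rng):
    """Copy an original and overwrite one randomly chosen surname character."""
    dup = dict(original)
    dup["rec_id"] = original["rec_id"].replace("-org", "-dup-%d" % dup_num)
    surname = dup["surname"]
    pos = rng.randrange(len(surname))
    letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
    dup["surname"] = surname[:pos] + letter + surname[pos + 1:]
    return dup


def generate_big(path, num_originals, dup_ratio, interval=1000000, seed=42):
    """Stream records to `path`; return (originals written, duplicates written).

    After every `interval` originals, write round(batch_size * dup_ratio)
    duplicates drawn from that batch only, then discard the batch.
    """
    rng = random.Random(seed)
    fields = ["rec_id", "given_name", "surname"]
    written_org = 0
    written_dup = 0
    with open(path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        start = 0
        while start < num_originals:
            batch_size = min(interval, num_originals - start)
            batch = []
            for rec_num in range(start, start + batch_size):
                rec = make_original(rec_num, rng)
                writer.writerow(rec)   # written immediately, not accumulated
                batch.append(rec)      # kept only until this batch's duplicates
            num_dups = int(round(batch_size * dup_ratio))
            for dup_num in range(num_dups):
                source = rng.choice(batch)
                writer.writerow(make_duplicate(source, dup_num, rng))
            written_org += batch_size
            written_dup += num_dups
            start += batch_size
    return written_org, written_dup
```

# With this layout each batch of originals is followed directly by its own
# duplicates, which is why the output is not in random order and would need a
# separate shuffling pass.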

# =============================================================================
# AUSTRALIAN NATIONAL UNIVERSITY OPEN SOURCE LICENSE (ANUOS LICENSE)
# VERSION 1.3
#
# The contents of this file are subject to the ANUOS License Version 1.3
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at:
#
# https://sourceforge.net/projects/febrl/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and limitations
# under the License.
#
# The Original Software is: "generate.py"
#
# The Initial Developer of the Original Software is:
# Dr Peter Christen (Department of Computer Science, Australian National
# University)
#
# Copyright (C) 2002 - 2011 the Australian National University and
# others. All Rights Reserved.
#
# Contributors:
#
# Alternatively, the contents of this file may be used under the terms
# of the GNU General Public License Version 2 or later (the "GPL"), in
# which case the provisions of the GPL are applicable instead of those
# above. The GPL is available at the following URL: http://www.gnu.org/
# If you wish to allow use of your version of this file only under the
# terms of the GPL, and not to allow others to use your version of this
# file under the terms of the ANUOS License, indicate your decision by
# deleting the provisions above and replace them with the notice and
# other provisions required by the GPL. If you do not delete the
# provisions above, a recipient may use your version of this file under
# the terms of any one of the ANUOS License or the GPL.
# =============================================================================
#
# Freely extensible biomedical record linkage (Febrl) - Version 0.4.1
#
# See: http://datamining.anu.edu.au/linkage.html
#
# =============================================================================