{"id":17255182,"url":"https://github.com/OCamlPro/ocaml-ancient","last_synced_at":"2025-02-24T19:31:14.357Z","repository":{"id":144782516,"uuid":"161566416","full_name":"OCamlPro/ocaml-ancient","owner":"OCamlPro","description":"trial at reviving the ancient library","archived":false,"fork":false,"pushed_at":"2025-01-30T15:26:41.000Z","size":139,"stargazers_count":9,"open_issues_count":5,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-22T10:41:36.687Z","etag":null,"topics":["ancient","ancient-heap","mmap","ocaml-library","shared-memory","shm"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCamlPro.png","metadata":{"files":{"readme":"README.txt","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-13T01:24:23.000Z","updated_at":"2025-02-05T09:13:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"a1efdcf7-c8d2-46d4-8e02-3d1f613f71bf","html_url":"https://github.com/OCamlPro/ocaml-ancient","commit_stats":null,"previous_names":["ocamlpro/ocaml-ancient"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCamlPro%2Focaml-ancient","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCamlPro%2Focaml-ancient/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCamlPro%2Focaml-ancient/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCamlPro%2Focaml-ancient/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/OCamlPro","download_url":"https://codeload.github.com/OCamlPro/ocaml-ancient/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240544046,"owners_count":19818372,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ancient","ancient-heap","mmap","ocaml-library","shared-memory","shm"],"created_at":"2024-10-15T07:10:53.364Z","updated_at":"2025-02-24T19:31:14.350Z","avatar_url":"https://github.com/OCamlPro.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"'Ancient' module for OCaml\n----------------------------------------------------------------------\n\nWhat does this module do?\n----------------------------------------------------------------------\n\nThis module allows you to use in-memory data structures which are\nlarger than available memory and so are kept in swap.  If you try this\nin normal OCaml code, you'll find that the machine quickly descends\ninto thrashing as the garbage collector repeatedly iterates over\nswapped memory structures.  This module lets you break that\nlimitation.  Of course the module doesn't work by magic :-) If your\nprogram tries to access these large structures, they still need to be\nswapped back in, but it is suitable for large, sparsely accessed\nstructures.\n\nSecondly, this module allows you to share those structures between\nprocesses.  
In this mode, the structures are backed by a disk file,\nand any process that has read/write access to that disk file can map that\nfile in and see the structures.\n\nTo understand what this module really does, you need to know a little\nbit of background about the OCaml garbage collector (GC).  OCaml's GC\nhas two heaps, called the minor and major heaps.  The minor heap is\nused for short-term storage of small objects which are usually created\nand then quickly become unreachable.  Any objects which persist longer\n(or objects which are very big to start with) get moved into the major\nheap.  Objects in the major heap are assumed to be around for some\ntime, and the major heap is GC'd more slowly.\n\nThis module adds a third heap, called the \"ancient heap\", which is\nnever checked by the GC.  Objects must be moved into ancient manually,\nusing a process called \"marking\".  Once an object is in the ancient\nheap, memory allocation is handled manually.  In particular objects in\nthe ancient heap may need to be manually deallocated.  The ancient\nheap may either exist as ordinary memory, or may be backed by a file,\nwhich is how shared structures are possible.\n\nStructures which are moved into ancient must be treated as STRICTLY\nNON-MUTABLE.  If an ancient structure is changed in any way then it\nmay cause a crash.\n\nThere are some limitations which apply to ancient data structures.\nSee the section \"Shortcomings \u0026 bugs\" below.\n\nThis module is most useful on 64 bit architectures where large address\nspaces are the norm.  
We have successfully used it with a 38 GB\naddress space backed by a file and shared between processes.\n\nAPI\n----------------------------------------------------------------------\n\nPlease see file ancient.mli .\n\nCompiling\n----------------------------------------------------------------------\n\n  cd mmalloc \u0026\u0026 ./configure\n  make\n\nMake sure you run this command before running any program which\nuses the Ancient module:\n\n  ulimit -s unlimited\n\nExample\n----------------------------------------------------------------------\n\nXXX Note the example code is really stupid, and fails for large\ndictionaries.  See bug (10) below.\n\nRun:\n\n  ulimit -s unlimited\n  wordsfile=/usr/share/dict/words\n  baseaddr=0x440000000000               # System specific - see below.\n  ./test_ancient_dict_write.opt $wordsfile dictionary.data $baseaddr\n  ./test_ancient_dict_verify.opt $wordsfile dictionary.data\n  ./test_ancient_dict_read.opt dictionary.data\n\n(You can run several instances of test_ancient_dict_read.opt on the\nsame machine to demonstrate sharing).\n\nShortcomings \u0026 bugs\n----------------------------------------------------------------------\n\n(0) Stack overflows are common when marking/sharing large structures\nbecause we use a recursive algorithm to visit the structures.  If you\nget random segfaults during marking/sharing, then try this before\nrunning your program:\n\n  ulimit -s unlimited\n\n(1) Ad-hoc polymorphic primitives (structural equality, marshalling\nand hashing) do not work on ancient data structures, meaning that you\nwill need to provide your own comparison and hashing functions.  For\nmore details see Xavier Leroy's response here:\n\nhttp://caml.inria.fr/pub/ml-archives/caml-list/2006/09/977818689f4ceb2178c592453df7a343.en.html\n\n(2) Ancient.attach suggests setting a baseaddr parameter for newly\ncreated files (it has no effect when attaching existing files).  
We\nstrongly recommend this because in our tests we found that mmap would\nlocate the memory segment inappropriately -- the basic problem is that\nbecause the file starts off with zero length, mmap thinks it can place\nit anywhere in memory and often does not leave it room to grow upwards\nwithout overwriting later memory mappings.  Unfortunately this\nintroduces an unwanted architecture dependency in all programs which\nuse the Ancient module with shared files, and it also requires\nprogrammers to guess at a good base address which will be valid in the\nfuture.  There are no other good solutions we have found --\npreallocating the file is tricky with the current mmalloc code.\n\n(3) The current code requires you to first of all create the large\ndata structures on the regular OCaml heap, then mark them as ancient,\neffectively copying them out of the OCaml heap, then garbage collect\nthe (hopefully unreferenced) structures on the OCaml heap.  In other\nwords, you need to have all the memory available as physical memory.\nThe way to avoid this is to mark structures as ancient incrementally\nas they are created, or in chunks, whatever works for you.\n\nWe typically use Ancient to deal with web server logfiles, and in this\ncase loading one file of data into memory and marking it as ancient\nbefore moving on to the next file works for us.\n\n(4) Why do ancient structures need to be read-only / not mutated?  The\nreason is that you might create a new OCaml heap structure and point\nthe ancient structure at this heap structure.  The heap structure has\nno apparent incoming pointers (the GC will not by its very nature\ncheck the ancient structure for pointers), and so the heap structure\ngets garbage collected.  At this point the ancient structure has a\ndangling pointer, which will usually result in some form of crash.\nNote that the restriction here is on creating pointers from ancient\ndata to OCaml heap data.  
In theory it should be possible to modify\nancient data to point to other ancient data, but we have not tried\nthis.\n\n(5) [Limit on number of keys -- issue fixed]\n\n(6) [Advanced topic] The _mark function in ancient_c.c makes no\nattempt to arrange the data structures in memory / on disk in a way\nwhich optimises them for access.  The worst example is when you have\nan array of large structures, where only a few fields in the structure\nwill be accessed.  Typically these will end up on disk as:\n\n  array of N pointers\n  structure 1\n  field A\n  field B\n    ...\n  field Z\n  structure 2\n  field A\n  field B\n    ...\n  field Z\n  structure 3\n  field A\n  field B\n    ...\n  field Z\n   ...\n   ...\n   ...\n  structure N\n  field A\n  field B\n    ...\n  field Z\n\nIf you then iterate accessing only fields A, you end up swapping the\nwhole lot back into memory.  A better arrangement would have been:\n\n  array of N pointers\n  structure 1\n  structure 2\n  structure 3\n    ...\n  structure N\n  field A from structure 1\n  field A from structure 2\n  field A from structure 3\n    ...\n  field A from structure N\n  field B from structure 1\n  field B from structure 2\n    etc.\n\nwhich avoids loading unused fields at all.  In some circumstances we\nhave shown that this could make a huge difference to performance, but\nwe are not sure how to implement this cleanly in the current library.\n\n[Update: I have fixed issue 6 manually for my Weblogs example and\nconfirmed that it does make a huge difference to performance, although\nat considerable extra code complexity.  Interested people can see the\nweblogs library, file import_weblogs_ancient.ml.in].\n\n(7) [Advanced topic] Certain techniques such as Address Space\nRandomisation (http://lwn.net/Articles/121845/) are probably not\ncompatible with the Ancient module and shared files.  
Because the\nancient data structures contain real pointers, these pointers would be\ninvalidated if the shared file was not mapped in at precisely the same\nbase address in all processes which are sharing the file.\n\nOne solution might be to use private mappings and a list of fixups.\nIn fact, the code actually builds a list of fixups currently while\nmarking, because it needs to deal with precisely this issue (during\nmarking, memory is allocated with realloc which might move the memory\nsegment, thus real pointers cannot be stored while marking, but need\nto be fixed up afterwards).  The list of fixups would need to be\nstored alongside the memory segment (currently it is discarded after\nmarking), and the file would need to be mapped in using MAP_PRIVATE\n(see below).\n\nA possible problem with this is that because OCaml objects tend to be\nsmall and contain a lot of pointers, it is likely that fixing up the\npointers would result in every page in the memory segment becoming\ndirty, which would basically cancel out any benefit of using shared\nmappings in the first place.  However it is likely that some users of\nthis module have large amounts of opaque data and few pointers, and\nfor them this would be worthwhile.\n\n(8) Currently mmalloc is implemented so that the file is mapped in\nPROT_READ|PROT_WRITE and MAP_SHARED.  Ancient data structures are\nsupposed to be immutable so strictly speaking write access shouldn't\nbe required.  It may be worthwhile modifying mmalloc to allow\nread-only mappings, and private mappings.\n\n(9) The library assumes that every OCaml object is at least one word\nlong.  This seemed like a good assumption up until I found that\nzero-length arrays are valid zero word objects.  At the moment you\ncannot mark structures which contain zero-length arrays -- you will\nget an assert-failure in the _mark function.\n\nPossibly there are other types of OCaml structure which are zero word\nobjects and also cannot be marked.  
I'm not sure what these will be:\nfor example empty strings are stored as one word OCaml objects, so\nthey are OK.\n\nThe solution to this bug is non-trivial.\n\n(10) Example code is very stupid.  It fails with large dictionaries,\neg. the one with nearly 500,000 words found in Fedora.\n\n(11) In function 'mark', the \"// Ran out of memory.  Recover and throw\nan exception.\" codepath actually fails if you use it - segfaulting\ninside do_restore.\n\nAuthors\n----------------------------------------------------------------------\n\nPrimary code was written by Richard W.M. Jones \u003crich at annexia.org\u003e\nwith help from Markus Mottl, Martin Jambon, and invaluable advice from\nXavier Leroy and Damien Doligez.\n\nmmalloc was written by Mike Haertel and Fred Fish.\n\nPort to no-naked-pointers and OCaml 5+ by Fabrice Le Fessant at\nOCamlPro.\n\nLicense\n----------------------------------------------------------------------\n\nThe module is licensed under the LGPL + OCaml linking exception.  This\nmodule includes mmalloc which was originally distributed with gdb\n(although it has since been removed), and that code was distributed\nunder the plain LGPL.\n\nLatest version\n----------------------------------------------------------------------\n\nThe latest version can be found on the website:\nhttp://merjis.com/developers/ancient\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOCamlPro%2Focaml-ancient","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOCamlPro%2Focaml-ancient","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOCamlPro%2Focaml-ancient/lists"}