{"id":13993989,"url":"https://github.com/justinlittman/fbarc","last_synced_at":"2025-07-22T18:32:43.892Z","repository":{"id":215215756,"uuid":"82000528","full_name":"justinlittman/fbarc","owner":"justinlittman","description":"A commandline tool and Python library for archiving data from Facebook using the Graph API.","archived":true,"fork":false,"pushed_at":"2018-01-29T13:49:55.000Z","size":115,"stargazers_count":77,"open_issues_count":3,"forks_count":11,"subscribers_count":16,"default_branch":"master","last_synced_at":"2024-11-29T15:50:43.327Z","etag":null,"topics":["code4lib","facebook-graph-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/justinlittman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-14T23:45:42.000Z","updated_at":"2024-05-17T04:57:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"dcfba60a-5e52-4e6b-bd0d-d45a46807d8c","html_url":"https://github.com/justinlittman/fbarc","commit_stats":null,"previous_names":["justinlittman/fbarc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/justinlittman/fbarc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinlittman%2Ffbarc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinlittman%2Ffbarc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinlittman%2Ffbarc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinlittman%2Ffbarc/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/justinlittman","download_url":"https://codeload.github.com/justinlittman/fbarc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justinlittman%2Ffbarc/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266552498,"owners_count":23947174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code4lib","facebook-graph-api"],"created_at":"2024-08-09T14:02:39.476Z","updated_at":"2025-07-22T18:32:43.880Z","avatar_url":"https://github.com/justinlittman.png","language":"Python","funding_links":[],"categories":["Tools \u0026 Software","Python"],"sub_categories":["Acquisition"],"readme":"# F(b)arc\n\n(The \"b\" is silent.)\n\nA commandline tool and Python library for archiving data from [Facebook](https://www.facebook.com/) using the [Graph API](https://developers.facebook.com/docs/graph-api).\n\nFacebook data is represented as a graph. The graph is composed of:\n\n* nodes:  Things on Facebook, such as Pages, Albums, and Photos. Each node has an id (e.g., 1322855124437680)\nand a type (e.g., Page).\n* fields:  Attributes of things, such as name and id.\n* edges:  Connections between nodes, e.g., a Page's Photos.\n\nThe graph is represented as a JSON object. 
For example:\n\n    {\n      \"name\": \"The White House\",\n      \"id\": \"1191441824276882\",\n      \"about\": \"Welcome to the official White House Facebook page.\n    \n    Comments posted on and messages received through White House pages are subject to the Presidential Records Act and may be archived. Learn more at WhiteHouse.gov/privacy.\",\n      \"albums\": {\n        \"data\": [\n          {\n            \"created_time\": \"2017-01-20T19:33:16+0000\",\n            \"name\": \"Timeline Photos\",\n            \"id\": \"1199645353456529\"\n          }\n        ]\n      },\n      \"metadata\": {\n        \"type\": \"page\"\n      }          \n    }\n\nF(b)arc supports retrieving parts of the graph for archiving. To do so, it allows you to specify what fields\nand edges to retrieve for a particular node type. (What fields and connections to\nretrieve is referred to as a definition and is described further below).\n\n## Getting API keys\nBefore you f(b)arc you will need to register an app. To do this:\n\n1. If you don't already have one, create a Facebook account.\n2. Go to [https://developers.facebook.com/apps/](https://developers.facebook.com/apps/) and log in.\n3. Click `Add a New App` and complete the form.\n4. From the app's dashboard, note the app id and app secret.\n\nSee below for more information on tokens.\n\n## Install\n\n_Note: pip install coming once f(b)arc is more stable._\n\nThese are instructions for Python 3. Make appropriate adjustments for Python 2.\n\n1. Download f(b)arc or clone it:\n\n        git clone https://github.com/justinlittman/fbarc.git\n\n2. Change to the directory:\n\n        cd fbarc\n        \n3. Optional: Create a [virtualenv](https://virtualenv.pypa.io/en/stable/):\n\n        virtualenv -p python3 ENV\n        source ENV/bin/activate\n        \n4. Install requirements:\n\n        pip install -r requirements/requirements3.txt\n        \n5. 
Get commandline usage:\n\n        python fbarc.py -h\n\n## Usage\n\n### Configure\nOnce you've got your API keys you can tell f(b)arc what they are with the\n`configure` command.\n\n    python fbarc.py configure\n\nThis will store your credentials in a file called `.fbarc` in your home\ndirectory so you don't have to keep providing them. If you would rather supply\nthem directly you can set them in the environment (`APP_ID`, `APP_SECRET`) or pass them using \ncommandline options (`--app_id`, `--app_secret`).\n\n### Tokens\nUsing the API requires an [access token](https://developers.facebook.com/docs/facebook-login/access-tokens).\nF(b)arc supports app access tokens and user access tokens.\n\nF(b)arc can retrieve an app access token using the app id and app secret. However, there\nare some nodes that cannot be retrieved with an app access token, so a user access token\nis recommended.\n\nA user access token allows retrieving more nodes than an app access token (but as used\nin f(b)arc is still limited to public data). There are two types of user access tokens:\nshort-lived and long-lived tokens. Short-lived access tokens are valid for\naround an hour; long-lived access tokens for a few months. A long-lived user access token\nis retrieved using a short-lived user access token together with the app id and app secret.\n\nWhen given a short-lived access token (e.g., with the `configure` command), f(b)arc will\nretrieve and store a long-lived access token. You can get a short-lived access token from\n[https://developers.facebook.com/tools/accesstoken/](https://developers.facebook.com/tools/accesstoken/).\n\nF(b)arc will warn you when your long-lived user access token is going to expire.\n\n### Graph\nThe graph command will retrieve the graph for a node (or use the graphs command to retrieve the graphs for\nmultiple nodes provided in files or stdin). 
The node is identified by a node id (e.g., 1191441824276882),\nname (e.g., WhiteHouse) or a Facebook url (e.g., https://www.facebook.com/WhiteHouse/).\n\nThe node graph is retrieved according to the specified definition. If the type of a node is not\nknown, provide a definition of `discover` and f(b)arc will look up the node's type and\ntry to match it to a definition.\n\nf(b)arc finds additional nodes in the graph for a node. For example, for a Page it may find the\nAlbum nodes. The `--levels` parameter will determine the number of levels of nodes that are retrieved,\nwith the default being 1 (i.e., the graph for just the node that was requested). Each additional node\ngraph is returned separately. Setting `--levels` to 0 will continue until all nodes reachable by edges\nare exhausted. Be careful, because depending on the definitions, this could be, well, infinite. Use the\n`--exclude` parameter to exclude definitions from recursive retrieval.\n\nNote that f(b)arc may need to make multiple requests to retrieve the entire node graph so executing the\ngraph command may take some time.\n\n    python fbarc.py graph page 1191441824276882 --levels 2 --pretty\n    \nTo write the output to a file, use `--output-dir` or redirect output to a file with `\u003e \u003cfilename\u003e.jsonl`.\n\n    python fbarc.py graph page 1191441824276882 --levels 2 --pretty \u003e 1191441824276882.jsonl\n\n\n### Metadata\nThe metadata command will retrieve all of the fields and connections for a node.\n\n    python fbarc.py metadata 1191441824276882 --pretty\n    \nNote that you may not be able to actually retrieve all of those fields or connections with the\nlevel of permissions of your API keys. The API will ignore any fields or connections that you\ncannot access.\n\nThe `--template` and `--update` parameters help with creating definitions. 
These are described below.\n\n### Url\nThe url command will return the url for retrieving the graph of a node according to the specified\ndefinition.\n\n    python fbarc.py url page 1191441824276882\n    \n## Definitions\nDefinitions specify what fields and connections will be returned for a node type, as well as the\nsize of node batches and edges.\n\nDefinitions are represented as simple Python configuration files stored in the `definitions`\nor `local_definitions` directories. Definitions in `definitions` are distributed with f(b)arc. \nYou can add additional definitions in `local_definitions`. A definition in `local_definitions`\nwith the same filename as a definition in `definitions` will take precedence.\n\nHere is an example definition for a Page:\n\n    definition = {\n        'node_batch_size': 10,\n        'edge_size': 10,\n        'fields': {\n            'albums': {'edge_type': 'album'},\n            'bio': {},\n            'likes': {'edge_type': 'page', 'follow_edge': False},\n            'name': {'default': True},\n            'workflows': {'omit': True},\n            'visitor_posts': {'edge_type': 'post', 'omit_on_error': 10}\n        }\n    }\n\n`fields` is a map of names to fields or edges to be retrieved for the node.\n\nA name with an `edge_type` is an edge. The value of `edge_type` is the name of another definition.\n\nA field or edge in which `default` is `True` will always be retrieved. Otherwise, the field or\nedge will only be retrieved when the node is the primary node being retrieved. In other words,\ndefault fields or edges specify the summary for a node type; other fields or edges are part\nof the detail for a node type.\n\nA field or edge in which `omit` is `True` will be ignored. This is helpful for keeping track of fields\nor edges that have been considered, but are not to be retrieved.\n\nIf an edge has `follow_edge` set to `False` then only the default fields or edges will be retrieved\nfor that edge. 
That edge will be omitted from recursive retrieval. For example, for a Page, the\nlikes edge is set to not follow edges because this would cause retrieval of all pages that liked\nthis page, which is not desired.\n\nSometimes, for inexplicable reasons, the Graph API will report errors for particular fields. For example,\nas of late 2017, requesting the visitor_posts edge on [SenatorTedCruz](https://www.facebook.com/SenatorTedCruz)\nwith even a limit of 1 results in a \"Please reduce the amount of data you're asking for, then retry your request\"\nerror. To handle these sorts of errors, setting `omit_on_error` will cause the field to be omitted when the specified\nerror is encountered. (Errors are identified using Facebook error codes.)\n\n`node_batch_size` and `edge_size` are optional; if omitted, sensible defaults will be used. Node batch\nsize determines how many nodes of that type will be requested at a time. A larger number reduces the\nnumber of requests to the API, speeding up retrieval. Edge size determines, when retrieving an edge, \nhow many nodes to retrieve. A larger number reduces the number of paging requests, speeding up retrieval.\nIn some cases, limits for node batch size and edge size can be found in the documentation; in others,\nthey must be found by trial and error.\n\nThe `--template` and `--update` parameters of the metadata command can assist with creating definitions.\n`--template` will produce a definition for a node type that includes all possible fields or edges with \n`omit` set to `True` by default. `--update` will update an existing definition with any new fields or edges \nthat are not already included in the definition. The new fields or edges will be indicated by a comment \n(\"Added field\") and will have `omit` set to `True`.\n\nThe [Graph API Explorer](https://developers.facebook.com/tools/explorer) is helpful for understanding\nthe fields and connections that are available for a node type. 
Less helpful is the \n[Graph API Reference](https://developers.facebook.com/docs/graph-api/reference).\n\n## F(b)arc Viewer\nF(b)arc Viewer allows you to view and explore the data retrieved from the API.\n\nTo run:\n\n    python fbarc_viewer.py \u003cfilepath(s) of files containing JSON or directories containing JSON files\u003e\n\nAdding `--index` will cause indexes to be used. Indexes will reduce the amount of memory required. If indexes \ndon't already exist, they will be created.\n\nOnce F(b)arc Viewer is running, it will be available at [http://localhost:5000/](http://localhost:5000/).\n\n## Unit tests\n\nTo run unit tests:\n\n        python -m unittest discover\n\n## Limitations\n\n### Users\nFacebook limits retrieving Users. F(b)arc does not support retrieving Users with the `graph` command,\nbut it does retrieve them when they are connected to other nodes. The fields that are available are\nextremely limited.\n\n### Incremental archiving\nIt would be ideal to be able to perform incremental archiving, i.e., only retrieve new or updated nodes.\nFor example, only retrieve new Photos in an Album. 
Unfortunately, the Graph API doesn't support this.\nIn particular, [ordering](https://developers.facebook.com/docs/graph-api/using-graph-api#ordering) does\nnot appear to work as documented, and even if it did work, it is unclear what field is used for ordering.\n\nSuggestions on a strategy for incremental harvesting would be appreciated.\n\n## Not yet implemented\n* [Search](https://developers.facebook.com/docs/graph-api/using-graph-api#search)\n* Setup.py\n* Travis configuration\n\n## Acknowledgements\nF(b)arc borrows liberally from [Twarc](https://github.com/docnow/twarc) in code and spirit.\n\n## Facebook policies\nPlease be mindful of the [Facebook Platform Policy](https://developers.facebook.com/policy/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustinlittman%2Ffbarc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustinlittman%2Ffbarc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustinlittman%2Ffbarc/lists"}