{"id":19914741,"url":"https://github.com/fullstorydev/pathing-utils","last_synced_at":"2025-05-03T05:31:47.135Z","repository":{"id":35829091,"uuid":"205428898","full_name":"fullstorydev/pathing-utils","owner":"fullstorydev","description":"A collection of utilities to gain user path insights from exported FullStory data.","archived":false,"fork":false,"pushed_at":"2023-09-13T19:48:59.000Z","size":2854,"stargazers_count":11,"open_issues_count":0,"forks_count":6,"subscribers_count":11,"default_branch":"master","last_synced_at":"2023-09-14T10:40:59.731Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fullstorydev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null}},"created_at":"2019-08-30T17:34:26.000Z","updated_at":"2023-02-12T18:07:01.000Z","dependencies_parsed_at":"2022-09-09T14:00:55.718Z","dependency_job_id":null,"html_url":"https://github.com/fullstorydev/pathing-utils","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fullstorydev%2Fpathing-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fullstorydev%2Fpathing-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fullstorydev%2Fpathing-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fullstorydev%2Fpathing-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fullstorydev","download_url":"https://codeload.github.com/fullstorydev/pathing-utils/
tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224354151,"owners_count":17297401,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T21:36:55.290Z","updated_at":"2024-11-12T21:36:55.842Z","avatar_url":"https://github.com/fullstorydev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pathutils\n\n`pathutils` is a collection of utilities to gain user path insights from exported FullStory data.\n\n## Local install\n\nTo use this package locally, clone or download this repo and run `pip3 install .` from this folder. (`pathutils` has been developed for Python3). It will install the package into your usual python `site-packages`.\n\n- For example usage, see the `pathing_demo.ipynb` Jupyter notebook\n- Many scripts in `pathutils` can also be executed from the command line: run `pathutils/\u003cscript name\u003e -h` for usage info\n\n## Interactive example\n\nFor an immediate, interactive example of working with our sample data set, click on the following link: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/fullstorydev/pathing-utils/master?filepath=pathing_demo.ipynb). 
It will take a few moments to set itself up, before providing you with a live, web-hosted Jupyter notebook based on this repo (no login required).\n\n## Background\n\nThis package was developed to showcase FullStory's data export functionality at the [2019 Activate Conference](https://www.activate-conf.com/).\n\n[FullStory](http://fullstory.com) lets development teams view user experience (UX) friction through the eyes of their users. Whether or not you're already a user of FullStory, this repository offers a glimpse into what you could be doing with FullStory's data export tool to dive deeper into the user paths taken on your site. See our [blog post]() for more insight into the value of this approach.\n\n## Teaser visualizations\n\nWe are going to see how to make this [Sankey diagram](https://en.wikipedia.org/wiki/Sankey_diagram) of user paths through the [Oodatime](https://www.oodatime.com) website.\n\n![Sankey diagram img here](images/funnel_example.png)\n\nBut first, let's understand exactly what we're dealing with.\n\n## Funnels and URL Resolution\n\n### Funnels\n\nWhat is a funnel? FullStory has [written about funnels in the past](https://blog.fullstory.com/the-fullstory-on-funnels/) but, for our purposes here, a funnel is a list of URLs that a user navigates in strict succession, with no room for digressions in between steps.\n\nAs a command line argument, a funnel is a path to a JSON file containing the word `\"funnel\"` as the key and the list of URLs as the value. For an e-commerce site, a 4-step funnel JSON file might look like: `{\"funnel\":[\"https://www.example.com/aproduct\",\"https://www.example.com/cart\",\"https://www.example.com/checkout\", \"https://www.example.com/confirmation\"]}`.\n\n### URL Resolution\n\nSometimes you may want to treat several slightly different URLs as if they were the same. For instance, parts of those URLs may be autogenerated. 
As an example, for the URL `www.example.com/userID/1234/dashboard`, you may want to replace the user-specific `1234` part of the URL with a generic label, such as `\u003cID\u003e`. `pathutils` allows you to use regular expressions to create such rules. The module for creating and managing URL resolutions is called `manage_resolutions`. It has 3 functions that let you declaratively build your resolutions and store them in a local text file (defaults to `pathrules.p`):\n* `add_rule` - adds a new resolution rule. Takes 2 parameters: `regex`, which is a string representing a regular expression to be matched, and `val`, which is the replacement string value.\n* `show_rules` - shows all the currently stored resolution rules.\n* `delete_rule` - deletes the rule defined by the `regex` string.\n\nFor the command line script, the options are `add`, `show`, or `delete`.\n* Example: `./manage_resolutions.py show`\n\nAny function we describe below that accepts a `useResolvedUrls` flag can work with either standard or resolved URLs. Currently the URL resolution code runs every time a function is called with `useResolvedUrls` set to `True`, which incurs a small performance penalty.\n\n## Getting Started\n\n### Load Hauser data into a dataframe\n\n[Hauser](https://github.com/fullstorydev/hauser) is FullStory's open source tool that helps you work with data export. We've provided mock data from Oodatime that we prepared using Hauser. Most of the functions in this repo take the name of the folder containing Hauser data as input. In order to generate your own data, you would need to run Hauser and instruct it to save the bundles locally in JSON format.\n\nThen you will load the Hauser data into a [Pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html), and do some pre-processing. 
This step is relatively time consuming, so it's performed first in the notebook, and subsequent functions take the resulting dataframe as one of the arguments.\n\nYou can load the Hauser data into a dataframe by invoking the `analyze_traffic.get_hauser_as_df` function. Set `navigate_only` parameter to `False` to load all the event types, or to `True` to only load `navigate` events (most tools expect a dataframe that only contains `navigate` events -- but you can later remove non-`navigate` events from the full dataframe by invoking `analyze_clicks.remove_non_navigation`). Having a full dataset lets you filter it by click type (to only include sessions that contain clicks of certain type) by invoking `analyze_clicks.filter_dataset_by_clicktype`.\n\nFrom here, you have several options to visualize your data set. In no particular order...\n\n### Plot a diagram of top most visited URLs\n\nAccording to the way FullStory records user activity, a \"visit\" is defined as having a `navigate` event associated with the URL (so multiple visits can occur within the same session, or even consecutively). The list of URLs and corresponding visit counts can be produced by calling the `get_popular_urls.get_popular` function. The function parameters are:\n* `events` - dataframe containing the events data\n* `useResolvedUrls` - boolean flag. If `False`, standard page URLs will be used. If `True`, these page URLs can be grouped according to provided regex resolution rules (see URL Resolution section)\n* `limit_rows` - limit rows to a specific number of events (use full dataset if 0, which is default)\n\n\n* Command line example: `./get_popular_urls.py my_hauser_folder`\n\n### Show common funnels that include the specified URL\n\nFullStory users often ask: what makes a good funnel? `pathutils` can help with providing a starting point, by showing the most common paths the users take that include the given URL. It is invoked by calling `frequent_funnel.get_top_funnels_df`. 
The function parameters are:\n* `funurl` - URL that should be contained in the funnel\n* `funlen` - length of the funnels to consider\n* `useResolvedUrl`\n* `events`\n* `limit_rows`\n\nThe function will return an unsorted list of all the funnels of the specified length, and their frequency counts. The list can be sorted and trimmed by invoking `frequent_funnel.show_top_funnel`.\n\nThe command line version takes slightly different arguments:\n* `hauser_folder` - path to the Hauser folder containing data\n* `url` - URL of interest\n* `funnelLength` - length of the funnels to be returned\n* `numResults` - number of results to return\n\n\n* Command line example: `./frequent_funnel my_hauser_folder \"https://www.example.com/\" 3 4`\n\n\n### Show conversion statistics for the specified funnel\n\nThe `funnel_stats.get_funnel_stats` function returns the number of sessions in which the user has navigated to a specific step of the funnel immediately after having navigated all the previous steps. For example, the returned count for \"Step 3\" of a funnel will be the number of sessions that contain the navigation sequence \"Step 1, Step 2, Step 3\". The function parameters are:\n* `events`\n* `funnel`\n* `useResolvedUrls`\n* `limit_rows`\n\n\n* Command line example: `./funnel_stats.py my_hauser_folder my_funnel.json`\n\n### Generate session links for the specified funnel\n\nTo generate links to sessions containing the funnel, invoke the `analyze_traffic.get_sessions_for_funnel` function with the following parameters:\n* `events`\n* `funnel`\n* `useResolvedUrls`\n* `OrgId` - your FullStory OrgId\n* `is_staging` - boolean flag to indicate that you'd like to view the sessions in the staging environment. This should only be set to `True` for internal FullStory use.\n* `strict` - boolean flag. If `True`, the session has to follow the funnel steps in exact order (with no diversions between the steps). 
The `False` option is currently not supported.\n* `numSessions` - number of sessions to return\n\nSession link tool can currently be used from code only (not as a command line tool).\n\nYou can also get only the session links that contain a certain click type, by invoking `analyze_traffic.get_sessions_for_funnel_and_click`. The only differences from the above function are the following: `get_sessions_for_funnel_and_click` expects an additional `clicktype` parameter, and also a full dataset passed as `events` (as opposed to a `navigate`-only dataset).\n\n### Generate inflow and outflow counts for the specified funnel\n\nYou can also find most frequent entry and exit points for a funnel. Invoking `funnel_in_outs.get_in_outs` will return 2 dictionaries (ingress and egress). The ingress dictionary contains the URLs from which the users have entered the funnel, and the frequency count for each URL. The egress dictionary does the same for URLs to which the users exit after completing the funnel. The function parameters are:\n* `events`\n* `funnel`\n* `useResolvedUrls`\n* `limit_rows`\n\n\n* Command line example: `./funnel_in_outs.py my_hauser_folder my_funnel.json`\n\n### Plot sankey diagram for the specified funnel\n\nUse the `sankey_funnel.plot_funnel` function to plot a Sankey diagram of funnel statistics and inflows/outflows with the following parameters:\n* `title` - plot title\n* `events`\n* `funnel`\n* `useResolvedUrls`\n* `cutoff` - maximum number of distinct input or output branches from each Sankey node (all additional branches get grouped into \"Other\" category)\n\n\n* Command line example: `./sankey_funnel.py my_hauser_folder my_funnel.json 5 \"My Funnel Diagram\"`\n\n### View timing statistics for the specified funnel\n\nOnce you have a funnel in mind, `pathutils` allows you to view timing information about it -- that is, gather insights into how much time users are spending at every step of the funnel before advancing to the next. 
Invoking `analyze_timing.get_timing_for_funnel` returns a complete set of timing results. The function parameters are:\n * `eventsfull`\n * `funnel`\n * `useResolvedUrls`\n\nFrom there, invoking `analyze_timing.print_timing_averages` prints average and median value for each step of the funnel. The parameters are:\n * `funnel`\n * `funneltimes` - set of timing results produced by `get_timing_for_funnel`\n\nYou can also plot a histogram of your timing results by invoking `analyze_timing.plot_timing_data`. The parameters are:\n * `funnel`\n * `funneltimes`\n * `step` - funnel step to plot. Negative values indicate that all steps should be plotted.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffullstorydev%2Fpathing-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffullstorydev%2Fpathing-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffullstorydev%2Fpathing-utils/lists"}