{"id":13837202,"url":"https://github.com/AirSage/Petrel","last_synced_at":"2025-07-10T16:32:28.905Z","repository":{"id":53535947,"uuid":"5096308","full_name":"AirSage/Petrel","owner":"AirSage","description":"Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python","archived":false,"fork":false,"pushed_at":"2022-12-14T20:22:18.000Z","size":125,"stargazers_count":247,"open_issues_count":12,"forks_count":70,"subscribers_count":31,"default_branch":"master","last_synced_at":"2024-11-10T18:50:17.867Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AirSage.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-07-18T13:12:12.000Z","updated_at":"2024-08-14T08:38:40.000Z","dependencies_parsed_at":"2023-01-29T00:45:54.263Z","dependency_job_id":null,"html_url":"https://github.com/AirSage/Petrel","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AirSage%2FPetrel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AirSage%2FPetrel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AirSage%2FPetrel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AirSage%2FPetrel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AirSage","download_url":"https://codeload.github.com/AirSage/Petrel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","rep
ositories_count":225648369,"owners_count":17502176,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T15:01:03.251Z","updated_at":"2024-11-20T23:32:28.863Z","avatar_url":"https://github.com/AirSage.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":["General-Purpose Machine Learning"],"readme":"Petrel\n======\n\nTools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.\n\nNOTE: The base Storm package provides storm.py, which supports Python 2.6.\nPetrel, however, requires Python 2.7 or 3.5.\n\nIf you like Petrel and are interested in more extensive documentation and examples, see the\n[book from Packt](https://www.packtpub.com/big-data-and-business-intelligence/building-python-real-time-applications-storm).\nThe book is also available from\n[Amazon](https://www.amazon.com/Building-Python-Real-Time-Applications-Storm/dp/1784392855/ref=sr_1_1?).\n\nI support Petrel in my spare time, and your purchases motivate me to continue maintaining it.\n\nOverview\n========\n\nPetrel offers some important improvements over the storm.py module provided with Storm:\n\n* Topologies are implemented in 100% Python\n* Petrel's packaging support automatically sets up a Python virtual environment for your topology and makes it easy to install additional Python packages.\n* \"petrel.mock\" allows testing of single components or single chains of related components.\n* Petrel automatically sets up logging for every spout or bolt and logs a stack trace on unhandled errors.\n\nHere's a quick example. 
It implements word count, the classic big data demo application.\n\nThis code defines the topology. Without Petrel, you'd have to write this code in Clojure or Java. Petrel re-implements the Java \"TopologyBuilder\" API in Python. If you've seen that class, this code will look very familiar:\n\n\u003cpre\u003e\nimport randomsentence\nimport splitsentence\nimport wordcount\n\ndef create(builder):\n    builder.setSpout(\"spout\", randomsentence.RandomSentenceSpout(), 1)\n    builder.setBolt(\"split\", splitsentence.SplitSentenceBolt(), 1).shuffleGrouping(\"spout\")\n    builder.setBolt(\"count\", wordcount.WordCountBolt(), 1).fieldsGrouping(\"split\", [\"word\"])\n\u003c/pre\u003e\n\nThis word count example is included in the Petrel repository. Here's how to run it. From the top-level directory of the Petrel repository, run:\n\n    cd samples/wordcount\n    ./buildandrun --config topology.yaml\n\nThis will build a topology JAR file and submit it to Storm, running the topology in local mode. No Ant, Maven, Leiningen, or Clojure required.\n\n    ./buildandrun --config topology.yaml\n\nSimply add the topology name to the command line to run on a real cluster instead:\n\n    ./buildandrun --config topology.yaml wordcount\n \nNOTE: I'm working to improve the Petrel documentation and tooling to make it easier for beginners to become productive with Petrel quickly. If you have requests or suggestions, please log an issue in GitHub.\n\nInstallation\n============\n\n* Python 2.7\n* System packages\n  * libyaml\n  * thrift\n* Python packages (you install)\n    * virtualenv\n* Python packages (installed automatically by setup.py)\n    * simplejson 2.6.1\n    * thrift 0.8.0\n    * PyYAML 3.10\n\nInstalling Petrel as an egg\n---------------------------\n\nBefore installing Petrel, make sure Storm is installed and in your path. Run the following command:\n\n    storm version\n    \nThis will print the version of Storm active on your system, a number such as \"1.0.2\". 
You must use a version of Petrel whose first 3 digits match this version.\n\nInstall the egg:\n\n    easy_install petrel*.egg\n\nThis will download a few dependencies and then print a message like:\n\n    Finished processing dependencies for petrel==1.0.2.0.3\n\nInstalling Petrel from source\n-----------------------------\n\nIf you plan to use Petrel by cloning its source code repository from github.com, follow these instructions.\n\nEnsure the following tools are installed:\n\n* Storm\n    * Test with \"storm version\"\n    * Should print something like \"1.0.2\"\n* Thrift compiler\n    * Test with \"thrift -version\"\n    * Should print something like \"Thrift version 0.9.3\"\n* Maven (test with \"mvn -version\")\n\nClone Petrel from GitHub. Then run:\n\n    cd Petrel/petrel\n    python setup.py develop\n\nThis will download a few dependencies and then print a message like:\n\n    Finished processing dependencies for petrel==1.0.2.0.3\n\nTopology Configuration\n======================\n\nPetrel's \"--config\" parameter accepts a YAML file with standard Storm configuration options. The file can also include some Petrel-specific settings, as in this example:\n\n```\ntopology.message.timeout.secs: 150\ntopology.ackers: 1\ntopology.workers: 5\ntopology.max.spout.pending: 1\nworker.childopts: \"-Xmx4096m\"\ntopology.worker.childopts: \"-Xmx4096m\"\n\n# Controls how Petrel installs its own dependencies, e.g. simplejson, thrift, PyYAML.\npetrel.pip_options: \"--no-index -f http://10.255.3.20/pip/\"\n\n# If you prefer, you can configure parallelism here instead of in setSpout() or\n# setBolt().\npetrel.parallelism.splitsentence: 1\n```\n\nBuilding and submitting topologies\n==================================\n\nUse the following command to package and submit a topology to Storm:\n\n\u003cpre\u003e\npetrel submit --sourcejar ../../jvmpetrel/target/storm-petrel-*-SNAPSHOT.jar --config localhost.yaml\n\u003c/pre\u003e\n\nThe above command builds and submits a topology in local mode. 
It will run until you stop it with Control-C. This mode is useful for simple development and testing.\n\nIf you want to run the topology on a Storm cluster, run the following command instead:\n\n\u003cpre\u003e\npetrel submit --sourcejar ../../jvmpetrel/target/storm-petrel-*-SNAPSHOT.jar --config localhost.yaml wordcount\n\u003c/pre\u003e\n\nYou can find instructions on setting up a Storm cluster here:\n\nhttps://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster\n\nBuild\n-----\n\n* Get the topology definition by loading the create.py script and calling create().\n* Package a JAR containing the topology definition, code, and configuration.\n* Include any files listed in manifest.txt, e.g. additional configuration files.\n\nDeploy and Run\n--------------\n\nTo deploy and run a Petrel topology on a Storm cluster, each Storm worker must have the following installed:\n\n* Python 2.7\n* setuptools\n* virtualenv\n\nNote that the worker machines don't require Petrel itself to be installed. Only the *submitting* machine needs to have Petrel. Each time you submit a topology using Petrel, it creates a custom JAR file with the Petrel egg and your Python spout and bolt code. These files in the wordcount example show how this works:\n\n* buildandrun\n* manifest.txt\n\nBecause Petrel topologies are self-contained, it is easy to run multiple versions of a topology on the same cluster, as long as the code differences are contained within the virtualenv. Before a spout or bolt starts up, Petrel creates a new Python virtualenv and runs the optional topology-specific setup.sh script to install Python packages. This virtual environment is shared by all the spouts or bolts from that instance of the topology on that machine.\n\nMonitoring\n==========\n\nPetrel provides a \"status\" command which lists the active topologies and tasks on a cluster. You can optionally filter by task name and Storm port (i.e. 
worker slot) number.\n\n\u003cpre\u003e\npetrel status 10.255.1.58\n\u003c/pre\u003e\n\nLogging\n=======\n\nPetrel does not write to the standard Storm logs. Instead, it creates its own set of logs underneath the topology directory. If you are running a topology in local mode, you'll find the Petrel logs in a subdirectory of the \"storm.local.dir\" directory (whose location you can find in the Storm log). For example:\n\n    ./supervisor/stormdist/test+topology-1-1365766701/resources/petrel28289_randomsentence.log\n    ./supervisor/stormdist/test+topology-1-1365766701/resources/petrel28281_virtualenv.log\n    ./supervisor/stormdist/test+topology-1-1365766701/resources/petrel28281_wordcount.log\n    ./supervisor/stormdist/test+topology-1-1365766701/resources/petrel28285_splitsentence.log\n\nPetrel uses stdout to send JSON data to Storm. Any other code that writes to stdout (e.g. \"print\" statements) would cause the Storm worker to crash. To avoid this, Petrel automatically reassigns sys.stdout and sys.stderr so they write to the Petrel (i.e. Python) logger instead.\n\nWhen Storm is running on a cluster, it can be useful to send certain messages (e.g. errors) to a central machine. To help support this, Petrel sets an environment variable \"NIMBUS_HOST\". For example, the following log file configuration declares a log handler which sends any worker log messages at level INFO or higher to the Nimbus host:\n\n\u003cpre\u003e\n[handler_hand02]\nclass=handlers.SysLogHandler\nlevel=INFO\nformatter=form02\nargs=((os.getenv('NIMBUS_HOST') or 'localhost',handlers.SYSLOG_UDP_PORT),handlers.SysLogHandler.LOG_USER)\n\u003c/pre\u003e\n\nPetrel also has a \"StormHandler\" class that sends messages to the Storm logger. 
This feature has not been thoroughly tested, but can be enabled by uncommenting the following line in petrel/util.py:\n\n\u003cpre\u003e\n# logging.StormHandler = StormHandler\n\u003c/pre\u003e\n\nStorm Logging\n=============\n\nWhen running Petrel applications in Storm's local mode, the console output is a mixture of Petrel and Storm logging output. This results in a lot of messages and can be hard to follow. You can control the Storm logging output by using Petrel's \"--extrastormcp\" option. Any directories specified to this option will be prepended to Storm's Java class path.\n\nFor example, create a file log4j.properties in the samples/wordcount directory, with the following contents:\n\n```\n# Set root logger level to DEBUG and its only appender to A1.\nlog4j.rootLogger=DEBUG, A1\n#\n## A1 is set to be a ConsoleAppender\nlog4j.appender.A1=org.apache.log4j.ConsoleAppender\n#\n## A1 uses PatternLayout.\nlog4j.appender.A1.layout=org.apache.log4j.PatternLayout\nlog4j.appender.A1.layout.ConversionPattern=[%d{dd MMM yyyy HH:mm:ss}] [%t] %-5p %c %x - %m%n\nlog4j.logger.org.apache=ERROR\nlog4j.logger.backtype=ERROR\nlog4j.logger.com.netflix=ERROR\n```\n\nNow run \"petrel submit\" like this:\n\n```petrel submit --extrastormcp=`pwd` --config=topology.yaml ```\n\nWith this setting, the apache, backtype, and netflix logs will be configured at ERROR level, suppressing most of the logger messages from Storm.\n\nTesting\n=======\n\nPetrel provides a \"mock\" module which mocks some of Storm's features. 
This makes it possible to test individual components and simple topologies in pure Python, without relying on the Storm runtime.\n\n\u003cpre\u003e\ndef test():\n    bolt = WordCountBolt()\n    \n    from petrel import mock\n    from randomsentence import RandomSentenceSpout\n    mock_spout = mock.MockSpout(RandomSentenceSpout.declareOutputFields(), [\n        ['word'],\n        ['other'],\n        ['word'],\n    ])\n    \n    result = mock.run_simple_topology([mock_spout, bolt], result_type=mock.LIST)\n    assert_equal(2, bolt._count['word'])\n    assert_equal(1, bolt._count['other'])\n    assert_equal([['word', 1], ['other', 1], ['word', 2]], result[bolt])\n\u003c/pre\u003e\n\nIn Petrel terms, a \"simple\" topology is one which only outputs to the default stream and has no branches or loops. run_simple_topology() assumes the first component in the list is a spout, and it passes the output of each component to the next component in the list.\n\nLicense\n=======\n\nThe use and distribution terms for this software are covered by the BSD 3-clause license 1.0 (http://opensource.org/licenses/BSD-3-Clause) which can be found in the file LICENSE.txt at the root of this distribution. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software.\n\nsetup.sh\n--------\n\nA topology may optionally include a setup.sh script. If present, Petrel will execute it before launching the spout or bolt. Typically this script is used for installing additional Python libraries. 
Here's an example setup.sh script:\n\n\u003cpre\u003e\nset -e\n\n# $1 will be non-zero if creating a new virtualenv, zero if reusing an existing one.\nif [ $1 -ne 0 ]; then\n    for f in Shapely==1.2.15 pyproj==1.9.0 pycassa==1.7.0 \\\n             configobj==4.7.2 greenlet==0.4.0 gevent==1.0b3\n    do\n        echo \"Installing $f\"\n        pip install $f\n    done\nfi\n\u003c/pre\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAirSage%2FPetrel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAirSage%2FPetrel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAirSage%2FPetrel/lists"}