{"id":15029548,"url":"https://github.com/qihoo360/poseidon","last_synced_at":"2025-04-08T08:16:45.468Z","repository":{"id":37735646,"uuid":"71466273","full_name":"Qihoo360/poseidon","owner":"Qihoo360","description":"A search engine which can hold 100 trillion lines of log data.","archived":false,"fork":false,"pushed_at":"2017-05-22T10:36:56.000Z","size":9604,"stargazers_count":1987,"open_issues_count":9,"forks_count":432,"subscribers_count":153,"default_branch":"master","last_synced_at":"2025-04-08T08:16:15.583Z","etag":null,"topics":["big-data","golang","map-reduce","poseidon","search-engine"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Qihoo360.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-10-20T13:32:06.000Z","updated_at":"2025-03-30T15:26:00.000Z","dependencies_parsed_at":"2022-07-14T00:50:38.787Z","dependency_job_id":null,"html_url":"https://github.com/Qihoo360/poseidon","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Qihoo360%2Fposeidon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Qihoo360%2Fposeidon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Qihoo360%2Fposeidon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Qihoo360%2Fposeidon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Qihoo360","download_url":"https://codeload.github.com/Qihoo360/poseidon/tar.gz/refs/heads/master","host":{"name":"G
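itHub","url":"https://github.com","kind":"github","repositories_count":247801175,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","golang","map-reduce","poseidon","search-engine"],"created_at":"2024-09-24T20:10:59.962Z","updated_at":"2025-04-08T08:16:45.431Z","avatar_url":"https://github.com/Qihoo360.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Poseidon (波塞冬)\n\nPoseidon is the god of the sea in Greek mythology; here the name stands for the master of massive data.\n\nPoseidon is a log search platform that can quickly analyze and retrieve specific strings from hundreds of trillions of log records totaling hundreds of petabytes.\nQihoo 360 is a security company. When tracking APT (advanced persistent threat) incidents, we often need to search massive volumes of historical logs for specific information,\nsuch as the activity of a malicious sample during a given time window. Before Poseidon, this meant writing Map/Reduce jobs and running them on a Hadoop cluster;\na single job took anywhere from several hours to several days, which severely limited how efficiently APT incidents could be tracked.\nPoseidon was built to meet this need: it can find the data we want within seconds in a dataset of hundreds of trillions of records, greatly improving productivity.\nThe data also needs no additional copy; it stays in the Hadoop cluster, which saves a large amount of storage and compute resources. The system can serve query and retrieval needs over any massive structured or unstructured dataset (from trillions to quadrillions of records).\n\n# [Quick Start](docs/get_started.md)\n\n# Technologies Used\n\n- Inverted index: the core technique behind the log search engine\n- Hadoop: stores the raw data and the index data, and runs the Map/Reduce programs that build the index\n- Java: the Map/Reduce programs that build the index are written in Java\n- Golang: the search programs are written in Golang\n- Redis/Memcached: stores the *Meta* metadata\n\n\n# Directory Structure\n\n### builder\n\nContains the data generation tools.\n\n- doc: converts raw logs into Poseidon-format data.\n- docmeta: a tool that writes Doc-related metadata into the NoSQL store.\n- index: generates inverted index data from raw logs; it is a Hadoop Map/Reduce job.\n- indexmeta: a tool that writes inverted index metadata into the NoSQL store.\n\n### common\n\nCurrently holds only the `protobuf` definitions used by the project.\n\n### docs\n\nContains the related technical documentation.\n\n* Project design documents\n    * [Design approach and principles (slides from the QCon Shanghai 2016 talk)](docs/design_detail.pdf) [QCon talk video link](http://mp.weixin.qq.com/s/UAzWDt7flOVXYdSF0I7LyQ)\n    * [How to build the inverted index](docs/build_inverted_index.md)\n    * [Glossary](docs/component.md)\n    * 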
[Guide to the configuration file templates needed to build the inverted index](docs/config.md)\n    * [Quick start](docs/get_started.md)\n* Microservices\n    * [HDFS data reading microservice: hdfsreader](docs/hdfs_reader.md)\n    * [ID generation center microservice: idgenerator](docs/id_generator.md)\n    * [Metadata access microservice: meta](docs/meta.md)\n    * [Core search engine service: searcher](docs/searcher.md)\n    * [Search engine proxy service: proxy](docs/proxy.md)\n\n\n### service\n\nContains the programs for the individual HTTP microservices.\n\n* [hdfsreader](docs/hdfs_reader.md): reads a segment of data from a file path in HDFS.\n    * /service/hdfsreader\n* [idgenerator](docs/id_generator.md): the global ID generation center\n    * /service/idgenerator\n* [meta](docs/meta.md): provides a unified HTTP interface to the NoSQL store that holds Meta information\n    * /service/meta/business/doc/get: query interface for DocGzMeta information\n    * /service/meta/business/doc/set: update interface for DocGzMeta information\n    * /service/meta/business/index/get: query interface for InvertedIndexGzMeta information\n    * /service/meta/business/index/set: update interface for InvertedIndexGzMeta information\n* [searcher](docs/searcher.md): the core retrieval service of the Poseidon search engine\n* [proxy](docs/proxy.md): a proxy for searcher that also supports queries spanning multiple time periods\n* allinone: to simplify deployment, bundles the four microservices idgenerator/meta/searcher/proxy into a single process behind a unified service interface\n\n### Other\n\n* QQ discussion group: 21557451\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqihoo360%2Fposeidon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqihoo360%2Fposeidon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqihoo360%2Fposeidon/lists"}