DB Engines Types
Pinterest Switches from OpenTSDB to Their Own Time Series Database
Prophet: time-series forecasting library (wrapper around Stan); a particularly
approachable way to use Bayesian inference for forecasting use cases
- Prophet is a procedure for forecasting time series data based on an
additive model where non-linear trends are fit with yearly, weekly, and daily
seasonality, plus holiday effects. It works best with time series that have
strong seasonal effects and several seasons of historical data. Prophet is
robust to missing data and shifts in the trend, and typically handles outliers well.
dbEngine: Data Topology compared
Key-value: wide-Column: Document
o -- O o -- O O O O ... O O→O→O...
o -- O o -- O O O O ... O ↘
o -- O o -- O O O O ... O O→O
... ... ↘
- represent data in graph structures of nodes (entities) and edges (relations).
- Relations are first-class citizens (vs foreign keys in SQL ddbbs,
"Relational DDBBs lack relationships, some queries are very easy, some
others require complex joins" ).
- BºModeling the real world is easier than in SQL/RDBMSs. º
BºAlso, they are resilient to model/schema changes.º
- Sort of "hyper-relational" databases.
- Two properties should be considered in graph DDBBs:
- native optimized graph storage
- Non native (serialization to relational/Document/key-value/... ddbbs).
-ºindex-free adjacencyº: ("true graph ddbb"), connected nodes physically
"point" to each other in the database.
BºMuch faster when doing graph-like queries.º Query performance
remains (approximately) constant in "many-joins" scenarios, while RDBMS
performance degrades exponentially with JOINs.
Time is proportional to the size of the path/s traversed by the query,
(vs subset of the overall table/s).
Rº(usually) they DON'T provide indexes on all nodes:º
Rº- Direct access to nodes based on attribute values is not possibleº
Rº or slower.º
-ºnon-index-free adjacencyº: External indexes are used to link nodes.
BºLower memory use.º
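A minimal sketch of the two adjacency styles described above. The class and function names are hypothetical and do not belong to any particular graph database. In the native variant each node holds direct references to its neighbours; in the non-native variant every hop goes through an external index.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Index-free adjacency: a node physically "points" to its neighbours."""
    name: str
    neighbours: list["Node"] = field(default_factory=list)


def reachable_native(start: Node) -> set[str]:
    """Follow references only; cost depends on the part of the graph visited."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        stack.extend(node.neighbours)
    return seen


def reachable_indexed(start: str, edge_index: dict[str, list[str]]) -> set[str]:
    """Non-native variant: each hop is a lookup in an external edge index."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(edge_index.get(name, []))
    return seen


# Same small graph (a -> b -> c, d isolated) in both representations.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.neighbours.append(b)
b.neighbours.append(c)
edge_index = {"a": ["b"], "b": ["c"]}
```

Both functions return the same result; the difference is where the cost of each hop comes from (pointer dereference vs index lookup).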
Native |·Infi.Graph ·OrientDB ·Neo4j
|·Titan ·Affinity ·*dex
Processing | ·Franz Inc
vvvvv | ·Allegro Graph
non ← Graph → Native
- Graph Compute Engines execute graph algorithms against large datasets.
- Example algorithms include:
- Identify clusters in data
- Answer "how many relationships does a node have on average?"
- Find "paths" to get from one node to another.
- It can be independent of the underlying storage or not.
- They can be classified by:
- in-memory/single machine (Cassovary,...)
- Distributed: Pegasus, Giraph, ... Most of them are based on the white paper
BºPregel: A system for large-scale graph processingº (2010 ACM)
"Many practical computing problems concern large graphs. ... billions of
vertices, trillions of edges ... In this paper we present a
computational model suitable for this task. "
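Two of the example algorithms listed above (path finding and average relationships per node) can be sketched on a single machine as follows. This is an illustrative in-memory version with a graph stored as a plain adjacency dictionary, not how any of the named engines implement it.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: nodes on a shortest path, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def average_out_degree(graph):
    """Average number of outgoing relationships per node."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sum(len(targets) for targets in graph.values()) / len(nodes)


# Directed example graph: a -> b -> d and a -> c -> d
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

Distributed engines in the Pregel family split the same kind of computation across machines, with vertices exchanging messages in supersteps.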
- GQL ISO standard: work in progress integrating the
ideas from openCypher (Neo4j contribution), PGQL, GSQL,
and G-CORE languages.
- OpenCypher support:
- Neo4jDB - Cypher for Spark/Gremlin
- AgensGraph - Memgraph
- RedisGraph - inGraph
- SAP HANA Graph - Cypher.PL
- AGE: @[https://www.postgresql.org/about/news/2050/]
- multi-model graph database extension for PostgreSQL
- Support for Subset of Cypher Expressions through
- AWS Neptune
- Azure Cosmos DB
- Datastax Enterprise
- there areºtwo primary implementationsºboth consisting of
nodes(or "vertices") and directed edges (or "links").
Data models looking slightly different on each implementation.
Both allow properties (attribute/value pairs) to be
associated with nodes.
-ºProperty graphs (PG)º
They also allow attributes for edges.
- no open standards exist for schema definition,
query languages, or data interchange formats.
-ºW3C Resource Description Framework (RDF)º
Node properties treated simply as more edges.
(techniques exist to express edge properties in RDF).
- graphs can be represented as edges or "triples":
edge triple = ( starting point, label , end point) or
edge triple = (subject , predicate, object )
(Also called "triple stores")
- RDF forms part of (a suite of) standardized W3C specs.
built on other web standards, collectively known as
BºSemantic Web, or Linked Dataº. They include:
- schema languages (RDFS and OWL):
- declarative query language (SPARQL)
- serialization formats
- Supporting specifications:
- mapping RDBMs to RDF graphs
-Bºstandardized framework for inferenceº
Bº(ex, drawing conclusions from data)º
Property graphs resemble conventional data structures in an isolated
application or use case, whereas RDF graphs were originally designed
to support interoperability and interchange across
independently developed applications.
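The RDF view of a graph as (subject, predicate, object) triples can be illustrated with a tiny in-memory triple store. This is a hypothetical sketch in plain Python: it is not SPARQL and does not use any RDF library. Note how a node property ("age") is stored as just another edge.

```python
# Each edge is a (subject, predicate, object) triple.
triples = {
    ("alice", "knows", "bob"),
    ("alice", "age", "42"),      # node property expressed as an edge
    ("bob", "knows", "carol"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


who_knows_whom = match(triples, predicate="knows")
about_alice = match(triples, subject="alice")
```

A property graph would instead keep `age` as an attribute attached to the `alice` node and could also attach attributes to the `knows` edges.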
"...Because developers ultimately just want to do graphs, you
can choose to do fast Apache TinkerPop Gremlin traversals for
property graph or tuned SPARQL queries over RDF graphs..."
Apache TinkerPop: open source, vendor-agnostic, graph computing framework.
When a data system is TinkerPop-enabled, users are able
to model their domain as a graph and analyze that graph
using the Gremlin graph traversal language.
...Sometimes an application is best served by an in-memory,
transactional graph database. Sometimes a multi-machine
distributed graph database will do the job or perhaps the
application requires both a distributed graph database for
real-time queries and, in parallel, a Big(Graph)Data processor
for batch analytics.
Distributed Graph DB
The Web Ontology Language (OWL) is a family of knowledge
representation languages for authoring ontologies. Ontologies are a
formal way to describe taxonomies and classification networks,
essentially defining the structure of knowledge for various domains:
the nouns representing classes of objects and the verbs representing
relations between the objects. Ontologies resemble class hierarchies
in object-oriented programming but there are several critical
differences. Class hierarchies are meant to represent structures used
in source code that evolve fairly slowly (typically monthly
revisions) whereas ontologies are meant to represent information on
the Internet and are expected to be evolving almost constantly.
Similarly, ontologies are typically far more flexible as they are
meant to represent information on the Internet coming from all sorts
of heterogeneous data sources. Class hierarchies on the other hand
are meant to be fairly static and rely on far less diverse and more
structured sources of data such as corporate databases.
- support E-R data model.
- table schema defined by the
table name and fixed number
of attributes and data types.
- A record (=entity) corresponds
to a row in the table.
- basic operations are defined on
- CRUD: Create/Read/Update/Delete
- set ops: union|intersect|difference
- subset selection defined by filters
- Projection of subset of table columns
- JOIN: combination of:
- TX ACID control
- user management
- Ops defined in standard SQL
- beginning of 1980s
- Most widely used DBMS
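The relational operations listed above (selection by filter, projection, JOIN, set operations, transactions) can be tried with Python's built-in `sqlite3` module. The table and column names below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       year INTEGER, author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'Graphs', 2015, 1), (2, 'Tables', 2020, 1),
                            (3, 'Streams', 2021, 2);
""")

# Selection (filter) plus projection (only the title column).
recent = conn.execute(
    "SELECT title FROM book WHERE year >= ? ORDER BY title", (2020,)
).fetchall()

# JOIN: combine rows of both tables through the foreign key.
joined = conn.execute(
    "SELECT a.name, b.title FROM book b JOIN author a ON a.id = b.author_id "
    "WHERE a.name = ? ORDER BY b.title", ("Ann",)
).fetchall()

# Set operation: UNION of two subsets.
names = conn.execute(
    "SELECT name FROM author WHERE id = 1 "
    "UNION SELECT name FROM author WHERE id = 2 ORDER BY name"
).fetchall()

# Transaction control: the failed block is rolled back as a whole.
try:
    with conn:
        conn.execute("UPDATE book SET year = 1999 WHERE id = 1")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
year_after_rollback = conn.execute(
    "SELECT year FROM book WHERE id = 1"
).fetchone()[0]
```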
- @[http://myrocks.io/] ← Built on top of key-value RocksDB.
optimized for SSD/Memory.
- SQLite / DQLite
- MySQL compatible
- Rust/go written
- "infinite" horizontal scalability
- strong consistency
- SQL Server
- Hive (https://db-engines.com/en/system/Hive)
- Home: https://hive.apache.org/
- RºWARNº: No Foreign keys, NO ACID
- data warehouse software facilitates reading,
writing, and managing large datasets residing in
distributed storage using SQL. (Hadoop,...)
- 2.0+ support Spark as execution engine.
- Implemented in Java
- supports analysis of large datasets
stored in Hadoop's HDFS and
compatible file systems such as
Amazon S3 filesystem.
- SQL-like DML and DDL statements
Traditional SQL queries implemented in
MapReduce Java API to execute SQL apps:
- necessary SQL abstraction provided to
integrate SQL-like queries (HiveQL) into
the underlying Java without the need
for low-level queries.
- Hive aids portability of SQL-based apps
- JDBC, ODBC, Thrift
- SQL "summary":
Beginner:                   Intermediate:                                  Advanced:
1 What and Why of SQL?      5 LIKE, AND/OR and DISTINCT                    9 JOIN ... UNION ...
2 Types of SQL              6 NULLs and Dates                              10 OVER ... PARTITION BY ...
3 Table, Rows and Columns   7 COUNT, SUM, AVG ... GROUP BY ... HAVING ...  11 STRUCT ... UNNEST ...
4 SELECT ... FROM ...       8 Alias, Sub-queries and WITH ... AS ...       12 Regular Expressions
  WHERE ... ORDER BY ...
Dqlite (“distributed SQLite”) extends SQLite across a cluster of machines, with
automatic failover and high-availability to keep your application running. It
uses C-Raft, an optimised Raft implementation in C, to gain high-performance
transactional consensus and fault tolerance while preserving SQLite’s
outstanding efficiency and tiny footprint.
- SchemaSpy generates HTML documentation from a database, including Entity Relationship diagrams.
-ºsimplestºform of DBMS.
- store pairs of keys and values
- not adequate for complex apps.
Low data consistency in distributed clusters,
when compared to Registry Based.
- value is not usually "big" (when compared
to wide-column DDBBs like Cassandra,..).
Not designed for high-data-layer apps, but for low-level software tasks.
- extended forms allow sorting the keys, enabling range queries as well as an
ordered processing of keys.
- Can evolve to document stores and wide column stores.
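A small sketch of why keeping keys sorted enables range queries: with an ordered key list, a range scan is two binary searches plus a slice. `SortedKV` is a hypothetical class for illustration, not the API of any particular store.

```python
import bisect


class SortedKV:
    """Key-value store that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedKV()
for key, value in [("user:2", "bob"), ("order:9", "book"),
                   ("user:1", "ann"), ("user:3", "cy")]:
    store.put(key, value)

# Prefix scan: ";" is the character right after ":" in ASCII.
users = store.range("user:", "user;")
```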
-ºRedisº: Cache/RAM-Memory oriented. Not strong cluster consistency
like etcd / Consul / Zookeeper.
- Comparative: redis (cache/simple key-value ddbb)
etcd (key-value Registry DDBB):
- etcd is designed with high availability in mind,
being distributed by default - with RAFT consensus algorithm -
and providing consistency through node failures.
- etcd is persisted to disk by default. Redis is not.
(logs and snapshots are possible)
- Redis/memcached are blazing fast in-memory key-value stores
- etcd read performance is good for most purposes, but 1-2 orders
of magnitude lower than redis. Write gap is even bigger.
- Comparative: redis (cache/simple key-value ddbb)
memcached (cache/simple key-value ddbb)
- redis: able to store typed data (lists, hashes, ...)
and supports many different programming patterns out of
the box, while memcached doesn't.
- ºnon-distributedº easy-to-use Java library for data caching
- A Cache is similar to ConcurrentMap, but not quite the same. The most
fundamental difference is that a ConcurrentMap persists all elements that
are added to it until they are explicitly removed. A Cache on the other
hand is generally configured to evict entries automatically, in order to
constrain its memory footprint. In some cases a LoadingCache can be useful
even if it doesn't evict entries, due to its automatic cache loading.
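The difference between a map and a cache described above (automatic eviction plus automatic loading) can be sketched in a few lines. This is a hypothetical Python illustration of the idea using least-recently-used eviction; it is not the Java library's API.

```python
from collections import OrderedDict


class LRUCache:
    """Loading cache that evicts the least recently used entry when full."""

    def __init__(self, max_size, loader):
        self.max_size = max_size
        self.loader = loader          # called on a miss to compute the value
        self._data = OrderedDict()    # oldest entry first

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)       # mark as recently used
            return self._data[key]
        value = self.loader(key)              # automatic cache loading
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)    # evict least recently used
        return value

    def __contains__(self, key):
        return key in self._data


loads = []


def load(key):
    """Stand-in for an expensive lookup; records each call."""
    loads.append(key)
    return key.upper()


cache = LRUCache(max_size=2, loader=load)
cache.get("a")
cache.get("b")
cache.get("a")   # hit: "a" becomes the most recently used entry
cache.get("c")   # miss: cache is full, so "b" is evicted
```

A plain map would keep all three entries until they are removed explicitly.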
- Designed for caching.
- Hazelcast clients, by default, will
connect to all cache cluster nodes
and know about the cluster partition
table to route request to the correct
- Java JCache compliant
- Advanced cache eviction algorithms
based on heuristics with sanitized O(1)
- LevelDB ("Ethereum State"): Non distributed, with
focus in local-storage persistence.
- dev lang:ºRustº
- Incubation Kubernetes project
- used also for TiDB
- built on earlier work on LevelDB
- core building block for fast key-value server.
- Focus: storing data on flash drives.
- It has a Log-Structured-Merge-Database (LSM) design with
flexible tradeoffs between Write-Amplification-Factor (WAF),
Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF).
- It has multi-threaded compactions, making itºespecially suitableº
ºfor storing multiple terabytes of data in a singleº
ºlocal (vs distributed) database.º
ºIt can be used for big-data processing on single-node infrastructureº
- Used by Geth/Besu/... and other Ethereum clients ...
- "NoSQL" DBMS for search of content
- Optimized for:
- complex search expressions
- Full text search
- reducing words to stem
- Ranking and grouping of results
- Geospatial search
- Distributed search for high
- Eclipse Hawk:
heterogeneous model indexing framework:
- indexes collections of models
transparently and incrementally into
NoSQL DDBB, which can be queried
- can mirror EMF, UML or Modelio
models (among others) into a Neo4j or
OrientDB graph, that can be queried
with native languages, or Hawk ones.
Hawk will watch models and update
the graph incrementally on change.
- Managing time series data
- very high TX load:
- designed to efficiently collect,
store and query various time
- Ex use-case:
SELECT SENSOR1_CPU_FREQUENCY / SENSOR2_HEAT
joins two time series based on the
overlapping areas of time providing
- PostgreSQL optimized for Time Series.
by modifying the insert path,
execution engine, and query planner
to "intelligently" process queries
- PostgreSQL wire protocol.(TODO)
- "Query 1.6B rows in millisecs"
- Relational Joins + Time-Series Joins
- Unlimited Transaction Size
- Cross Platform.
- Java Embedded.
- Telegraf TCP/UDP
- Graphite. Simple system that will just:
- Store numeric time series data
- Render graphs of this data
- collect data (Carbon needed)
- render graphs (external apps exist)
- See also: Cortex: horizontally scalable, highly available,
multi-tenant, long term storage for Prometheus.
- CNCF project,ºkubernetes friendlyº
- GoLang based:
- binaries statically linked
- easy to deploy.
- Many client libraries.
- monitoring metrics analyzer
- Highly dimensional data model.
Time series are identified by a
metric name and a set of
- stores all data as time series:
streams of timestamped values
belonging to same metric and
same set of labeled dimensions.
- Multi-mode data visualization.
- Grafana integration
- Built-in expression browser.
- PromQL Query language allowing
to select+aggregate time-series
data in real time.
- result can either be shown as
graph, tabular data or consumed
by external HTTP API.
- Precise alerts based on PromQL.
- "Singleton" servers relying only
on local storage.
(optionally remote storage).
- Uber M3
- Large Scale Metrics Platform
""" built to replace Graphite+Carbon cluster,
and Nagios for alerting and Grafana for
dashboarding due to issues like
poor resiliency/clustering, operational
cost to expand the Carbon cluster,
and a lack of replication""".
- cluster management, aggregation,
collection, storage management,
a distributed TSDB
- M3QL query language (with features
not available in PromQL).
- tagging of metrics.
- local/remote integration similar to
the Prometheus Thanos extension providing
cross-cluster federation, unlimited
storage and global querying across
- query engine: single global view
without cross region replication.
- Redis TimeSeries:
- By default Redis is just a key-value store.
RedisTimeSeries simplifies the use of Redis for time-series use
cases like IoT, stock prices, and telemetry.
- See also:
How to Use Redis TimeSeries with Grafana for Real-time Analytics
M3DB: (Uber) open-source distributed time series database
- ability to shard its metrics into partitions,
replicate them by a factor of three,
and then evenly disperse the replicas across separate
- Resource Description Framework stores
ºdescribes information in triplets:º
- Originally used for describing
- Today often used in semantic web.
- RDF stores are a subclass of graph DBMS:
^ ^ ^
node edge node
but they offer specific methods
beyond general graph DBMS ones. Ex:
SPARQL, SQL-like query lang. for
RDF data, supported by most
- Amazon Neptune
- Apache Rya:
- format for mapping JSON data into the RDF semantic graph model, as defined by [JSON-LD].
The project started at the U.S. government's Laboratory for Telecommunication
Sciences with an initial research paper published in 2012.
Among Rya's users is the U.S. Navy, which is using the open source technology
in a number of efforts, including one for autonomous drones. The project got
started because there was a need for a scalable RDF triple store and no
existing system that met all requirements, said Adina Crainiceanu, vice
president of Apache Rya and associate professor of computer science at the U.S.
- Specialized key/value DBMS:
-ºtwo processes must existº:
-ºService registration processº
-ºService discovery processº
- let final-app-services query
- other aspects to consider:
- auto-delete of non-available services
- Support for replicated services
- Remote API is provided
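The registration and discovery processes above, including auto-delete of non-available services, can be sketched with a TTL-based registry. The class below is a hypothetical in-process illustration; real registries expose this through a remote API. Time is passed in explicitly so the behaviour is easy to follow.

```python
class ServiceRegistry:
    """Registry where an instance expires unless it re-registers within the TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._instances = {}   # (service, address) -> time of last heartbeat

    def register(self, service, address, now):
        """Registration process; calling it again acts as a heartbeat."""
        self._instances[(service, address)] = now

    def discover(self, service, now):
        """Discovery process: drop expired instances, return live addresses."""
        expired = [key for key, seen in self._instances.items()
                   if now - seen > self.ttl]
        for key in expired:
            del self._instances[key]
        return sorted(addr for (name, addr) in self._instances
                      if name == service)


registry = ServiceRegistry(ttl_seconds=10)
registry.register("api", "10.0.0.1:80", now=0)
registry.register("api", "10.0.0.2:80", now=0)   # replicated service
registry.register("api", "10.0.0.1:80", now=8)   # heartbeat from replica 1 only
live = registry.discover("api", now=12)          # replica 2 has expired
```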
- originated from Hadoop ecosystem.
now is core part of most cluster based Apache
projects (hadoop, kafka, solr,...)
- data-format similar to file system.
- cluster mode.
- Java plus big number of dependencies
- Still used by kafka for config but
plans exist to replace it.
@[https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum] - 2019-11-06 -
- distributed versioned key/value store
- HTTPv2/gRPC remote API.
- based on previous experiences with ZooKeeper.
- hierarchical config.
- Very easy to deploy, setup and use
(implemented in go with self-executing binary)
- reliable data persistence
- very good doc
- See etcd vs Consul vs ZooKeeper vs NewSQL
- needs to be combined with few
third-party tools for serv.discover:
- etcd-registrator: keep updated list
of docker containers
- etcd-registrator-confd: keep updated
- Core-component of Kubernetes for cluster
- strongly consistent datastore
- multidatacenter gossip protocol
for dynamic clusters
- hierarchical key/value store
- adds the notion of app-service
- "watches" can be used for:
- sending notifications of
- (HTTP, TTLs , custom)
health checks and
- embedded service discovery:
no need to use third-party one
(like etcd). Discovery includes:
- node health checks
- app-services health checks
- Consul provides a built in framework
for service discovery.
(vs etcd basic key/value + custom code)
- Clients just register services and
perform discovery using the DNS
or HTTP interface.
- out of the box native support for
- template-support for config files.
- Web UI: display all services and nodes,
monitor health checks, switch
from one datacenter to another.
- doozerd (TODO)
- Comparison chart:
"etcd" name refers to unix "/etc" folder and "d"istributed system.
- etcd is designed as a general substrate for large scale distributed systems.
These are systems that will never tolerate split-brain operation and are
willing to sacrifice availability to achieve this end. etcd stores metadata in
a consistent and fault-tolerant way. An etcd cluster is meant to provide
key-value storage with best of class stability, reliability, scalability and
- designed to:
reliably (99.999999%) store infrequently updated data
reliable watch queries.
- multiversion persistent key-value store of key-value pairs:
- inexpensive snapshots
- watch history events ("time travel queries").
- key-value store is effectively immutable unless
store is compacted to shed oldest versions.
- flat binary key space.
- key space:
- lexically sorted index on byte string keys
→ soºrange queries are inexpensiveº.
- Each key/key-space maintains multiple revisions starting with version 1
- revisions are indexed as well
→ºranging over revisions with watchers is efficientº
- generation: key life-span, from creation to deletion.
1 key ←→ 1+ generation.
- data stored as key-value pairs in a persistent b+tree.
-ºkey of key-value pair is a 3-tuple (revision, sub, type)º.
key-ID special key-type
in rev. (deleted key,..)
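The multiversion model above (a store-wide revision counter, reads at a past revision, compaction that sheds old versions) can be sketched as follows. `MVCCStore` is a hypothetical toy: it is not etcd's API or storage layout, and it does not model deletions or generations.

```python
class MVCCStore:
    """Toy multiversion key-value store with a store-wide revision counter."""

    def __init__(self):
        self.revision = 0     # incremented on every write
        self.compacted = 0    # revisions below this are no longer readable
        self._history = {}    # key -> list of (revision, value), ascending

    def put(self, key, value):
        self.revision += 1
        self._history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, revision=None):
        """Value of `key` as of `revision` (default: latest), or None."""
        revision = self.revision if revision is None else revision
        if revision < self.compacted:
            raise ValueError("revision has been compacted")
        value = None
        for rev, val in self._history.get(key, []):
            if rev > revision:
                break
            value = val
        return value

    def compact(self, revision):
        """Shed versions that were superseded at or before `revision`."""
        self.compacted = revision
        for key, versions in self._history.items():
            older = [v for v in versions if v[0] <= revision]
            newer = [v for v in versions if v[0] > revision]
            self._history[key] = older[-1:] + newer


store = MVCCStore()
store.put("a", "1")    # revision 1
store.put("a", "2")    # revision 2
store.put("b", "x")    # revision 3
```

Before compaction `store.get("a", revision=1)` still sees the old value ("time travel"); after `store.compact(2)` that read is rejected.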
- In our tests, etcd 3.4.3 lived up to its claims for key-value operations: we
observed nothing but strict-serializable consistency for reads, writes, and
even multi-key transactions, during process pauses, crashes, clock skew,
network partitions, and membership changes. Strict-serializable behavior was
the default for key-value operations; performing reads with the serializable
flag allowed stale reads, as documented.
- etcd v3+ is fully committed to gRPC (over HTTP/2).
For example: v3 auth has connection based authentication,
rather than v2's slower per-request authentication.
Wide Column Stores
(also called extensible record stores)
-ºsort of two-dimensionalº ºkey-value stores.º
where key set as well as value can be "huge"
(thousands of columns per value)
- Compared to Service Registry DDBBs (Zookeeper,
etcd3) that are also key-value stores:
- Ser.Registry ddbb scale much less (many
times a single node can suffice) but are
strongly consistent. Wide Column Stores
can scale to tens of thousands of nodes but
consistency is eventual / optimistic.
- column names and record keys ºare not fixedº
- not to be confused with column oriented storage
of RDBMS. The latter is an internal concept
for improving performance storing
table-data column-by-column vs
- SQL-like SELECT, DML and DDL statements (CQL)
- handle large amounts of data across farm
of commodity servers, providing high availability
with no single point of failure. (peer-to-peer)
- Allows for different replication strategies
(sync/async for slow-secure/fast-less-secure)
- logical infrastructure support for clusters
spanning multiple datacenters in different
-ºIntegrated Managed Data º
ºLayer Solutions with Spark,º
ºKafka and Elasticsearch. º
- compatible with Cassandra (same CQL and Thrift protocols,
and same SSTable file formats)
with higher throughputs and lower latencies (10x).
- C++14 (vs Cassandra Java)
-ºcustom network managementºthat minimizes resource usage by
ºbypassing the Linux kernel. No system call is requiredº
to complete a network request. Everything happens in userspace.
-ºSeastar async lib replacing threadsº
- sharded design by node:
- each CPU core handles a data-subset
- authors claim to achieve much better
performance on modern NUMA SMP archs,
and to scale very well with the number
-ºUp to 2 million req/sec per machineº
-ºHBaseº. Apache alternative to Google BigTable
Internally uses skip-lists:
REF: Google bigTable-osdi06.pdf
-ºdistributed key-value storeº
- developed at Facebook.
- data and processing spread out across many commodity servers
- highly available service without single point of failure
allowing replication even across multiple data centers.
- Possibility to choose synchronous/asynchronous replication
for each update.
-ºfully distributed DB, with no master DBº
- Reads are linearly scalable with the number of nodes.
Cassandra scales (much) better than comparable systems
like HBase, Voldemort, VoltDB, Redis, MySQL.
Writes are linearly scalable?
- based on 2 core technologies:
- Google's Big Table
- Amazon's Dynamo
- 2 versions of Cassandra:
- Community Edition : distributed under the Apache™ License
- Enterprise Edition : distributed by Datastax
- Cassandra is a BASE system (vs an ACID system):
BASE == (B)asically (A)vailable, (S)oft state, (E)ventually consistent
ACID == (A)tomicity, (C)onsistency, (I)solation, (D)urability.
Oº- BASE implies that the system is optimistic and accepts that º
Oº the database consistency will be in a state of fluxº
Oº- ACID is pessimistic and it forces consistency at the endº
Oº of every transaction.º
cluster 1 ←→ 1+ Keyspace 1 ←→ N Column Family ←→ 0+ Row
just one sharing similar - collection of
structure sorted columns.
┌─ CQL lang. used to access data.
│OºKEYSPACEº: container for app.data, similar to a RDMS schema │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Column Family 1 │ │
│ └─────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Column Family 2 │ │
│ └─────────────────────────────────────────────────┘ │
│ ... │
│ ┌─────────┐ ┌─────────────────────────────────────────────────┐ │
│ │Settings │ │ Column Family N Col Col2 ... ColN │ │
│ ├─────────┤ │ Key1 Key2 ... KeyN │ │
│ │*Replica.│ │ ┌────────┐ ┌──────────────────────────────────┐ │ │
│ │ Strategy│ │ │Settings│ │RowKey1│Val1_1│Val2_1│ │ValN_1│ │ │ │
│ └─────────┘ │ │ │ │RowKey2│Val1_2│Val2_2│ │ValN_2│ │ │ │
│ │ └────────┘ │ ... │ │ │
│ │ └──────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
RowKey := Unique row UID
Used also to shard/balance
ValX_Y := value ||
BASIC INFRASTRUCTURE ARCHITECTURE:
ºNodeº: (Cassandra peer instance) ºRackº:
- data storage (on File System - Set of nodes
- The data is balanced
across nodes based
on the RowKey of each
- Commit log to ensure
- Sequentially written automatically
┌── Node "N" chosen cluster
│ based on RowKey ↑
Input → Commit Log → Index+write →(memTable → Write to ──→ SSTable
@Node N toºmemTableº full) ºSSTableº Compaction
└──┬───┘ └──┬───┘ └───┬────┘
- in-memory structure - (S)orted (S)tring Table - Periodically consolidates
- "Sort-of" write-back in File System or HDFS storage by discarding
cache data marked for deletion
with a tombstone.
- repair mechanisms exist
to ensure consistency
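The write path in the diagram above (commit log, memTable, flush to SSTable, compaction that discards tombstoned data) can be sketched as a toy, single-node model. Names and the flush threshold are hypothetical; real Cassandra keeps tombstones for a grace period and uses far more elaborate file formats.

```python
TOMBSTONE = object()   # marker written in place of a deleted value


class StorageNode:
    """Toy model of one node's write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []    # append-only, written sequentially
        self.memtable = {}      # in-memory, latest value per key
        self.sstables = []      # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.write(key, TOMBSTONE)             # deletes are writes too

    def flush(self):
        """Write the full memtable out as a new sorted table."""
        if self.memtable:
            table = sorted(self.memtable.items(), key=lambda kv: kv[0])
            self.sstables.append(table)
            self.memtable = {}

    def read(self, key):
        """Newest data wins: memtable first, then SSTables newest to oldest."""
        sources = [self.memtable] + [dict(t) for t in reversed(self.sstables)]
        for source in sources:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge all SSTables into one, discarding tombstoned keys."""
        merged = {}
        for table in self.sstables:            # oldest first, newer overwrite
            merged.update(table)
        live = sorted(((k, v) for k, v in merged.items() if v is not TOMBSTONE),
                      key=lambda kv: kv[0])
        self.sstables = [live] if live else []


node = StorageNode(memtable_limit=2)
node.write("k1", "a")
node.write("k2", "b")    # memtable full: flushed to the first SSTable
node.write("k1", "c")
node.delete("k2")        # memtable full again: second SSTable
```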
- Logical Set of Racks. - 1+ Datacenters
- Ex: America, Europe,... - full set of nodes
- Replication set at which map to a single
Data Center bounds complete token ring
- similar syntax to SQL
- works with table data.
- Available Shells: cqlsh | DevCenter | JDBC/ODBC drivers
- Node where client connects for a read/write request
acting as a proxy between the client and Cassandra
Cassandra Key Components
A peer-to-peer communication protocol in which nodes periodically exchange
state information about themselves and about other nodes they know about. This
is similar to the heartbeat mechanism in HDFS to get the status of each node by the
A peer-to-peer communication protocol to share location and state
information about the other nodes in a Cassandra cluster. Gossip information is
also stored locally by each node to use immediately when a node restarts.
The gossip process runs every second and exchanges state messages with up
to three other nodes in the cluster. The nodes exchange information about
themselves and about the other nodes that they have gossiped about, so all
nodes quickly learn about all other nodes in the cluster.
A gossip message has a version associated with it, so that during a gossip
exchange, older information is overwritten with the most current state for a
To prevent problems in gossip communications, use the same list of seed
nodes for all nodes in a cluster. This is most critical the first time a node
starts up. By default, a node remembers other nodes it has gossiped with
between subsequent restarts. The seed node designation has no purpose other
than bootstrapping the gossip process for new nodes joining the cluster. Seed
nodes are not a single point of failure, nor do they have any other special
purpose in cluster operations beyond the bootstrapping of nodes.
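The versioned exchange described above (during gossip, older information is overwritten by the most current state) can be sketched as a merge of per-node state maps. This is an illustrative model only: peers are passed in explicitly instead of being chosen at random every second, and the state layout is an assumption.

```python
def merge(local, remote):
    """For each known node keep the entry with the highest version."""
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


def gossip_round(states, pairs):
    """Each (a, b) pair exchanges state; both end up with the merged view."""
    for a, b in pairs:
        combined = merge(states[a], states[b])
        states[a] = dict(combined)
        states[b] = dict(combined)


# What each node currently knows: node name -> (version, status)
states = {
    "n1": {"n1": (3, "UP")},
    "n2": {"n2": (1, "UP"), "n1": (2, "UP")},   # stale view of n1
    "n3": {"n3": (5, "UP")},
}
gossip_round(states, [("n1", "n2"), ("n2", "n3")])
```

After this round n2 and n3 know about all three nodes, while n1 has not yet heard about n3; a further exchange would spread that information.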
Note: In multiple data-center clusters, the seed list should include at least
one node from each data center (replication group). More than a single seed
node per data center is recommended for fault tolerance. Otherwise, gossip has
to communicate with another data center when bootstrapping a node. Making every
node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is
recommended to use a small seed list (approximately three nodes per data
Data distribution and replication
How data is distributed and factors influencing replication.
In Cassandra, Data is organized by table and identified by a primary key, which
determines which node the data is stored on. Replicas are copies of rows.
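How a primary key can determine the node holding a row, and where its replicas go, can be sketched with a hash ring. This is a simplified illustration: the MD5-based `token` function is only a deterministic stand-in for a partitioner, there are no virtual nodes, and replicas are simply the next nodes on the ring regardless of rack or data center.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (stand-in hash function)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, primary_key):
        """First node clockwise from the key's token, then the following nodes."""
        start = bisect.bisect_left(self._tokens, token(primary_key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
owners = ring.replicas("user:42")
```

Any client or coordinator that knows the ring computes the same owners for the same key, so no central lookup table is needed.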
Factors influencing replication include:
Virtual nodes (Vnodes):
Assigns data ownerommended for production. It defines a node’s data center
and rack and uses gossip for propagating this information to other nodes.
There are many types of snitches like dynamic snitching, simple snitching,
RackInferringSnitch, PropertyFileSnitch, GossipingPropertyFileSnitch,
Ec2Snitch, Ec2MultiRegionSnitch, GoogleCloudSnitch, CloudstackSnitch.
- Seed node
- Snitch purpose
- Coordinator node,
- replication factors,
- Voldemort: developed by LinkedIn.
Cassandra Orchestration Tool by Spotify
""" ... Cstar emerged from the necessity
of running shell commands on all hosts
in a Cassandra cluster ....
Spotify fleet reached 3000 nodes. Example
- Clear all snapshots
- Take a new snapshot (to allow a rollback)
- Disable automated puppet runs
- Stop the Cassandra process
- Run puppet from a custom branch of our git repo
in order to upgrade the package
- Start the Cassandra process again
- Update system.schema_columnfamilies to the JSON format
- Run `nodetool upgradesstables`, which depending on
the amount of data on the node, could take hours to
- Remove the rollback snapshot
$ pip3 install cstar
"""Why not simply use Ansible or Fabric?
Ansible does not have the primitives required to run things in a topology aware
fashion. One could split the C* cluster into groups that can be safely executed
in parallel and run one group at a time. But unless the job takes almost
exactly the same amount of time to run on every host, such a solution would run
with a significantly lower rate of parallelism, not to mention it would be
kludgy enough to be unpleasant to work with.
Unfortunately, Fabric is not thread safe, so the same type of limitations apply.
All involved machines are assumed to be some sort of UNIX-like system like
OS X or Linux with python3, and Bourne style shell."""
Dynamo vs Cassandra
Dynamo vs Cassandra : Systems Design of NoSQL Databases
State-of-the-art distributed databases represent a distillation of
years of research in distributed systems. The concepts underlying any
distributed system can thus be overwhelming to comprehend. This is
truer when you are dealing with databases without the strong
consistency guarantee. Databases without strong consistency
guarantees come in a range of flavours; but they are bunched under a
category called NoSQL databases.
- also called document-oriented DBMS
- different records may have
- values of individual columns
can have dif. types
- Columns can be multi-value
- Records can have a nested structure
- Often use internal notations,
- secondary indexes in (JSON) objects
14 Things I Wish I’d Known When Starting with
- Amazon DynamoDB
- Cosmos DB
"Improved" logstat |- TheºLº in ELK
- https://www.fluentd.org/ |-*OS data Collector
- data collector for unified logging layer |- TODO:
- increasingly used Docker, GCP, | osquery.readthedocs.io/en/stable/
and Elasticsearch communities |- low-level instrumentation framework
- https://logz.io/blog/fluentd-logstash | with system analytics and monitoring
FluentD vs Logstash compared | both performant and intuitive.
ºFeaturesº |- osquery exposes an operating system
- unify data collection and consumption | as a high-performance SQL RDMS:
for better understanding of data. | - SQL tables represent concepts such
| as running processes, loaded kernel
| modules, open network connections,
Syslog Elasticsearch | browser plugins, hardware events
Apache/Nginx logs → → → MongoDB |ºFeatures:º
Mobile/Web app logs → → → Hadoop | - File Integrity Monitoring (FIM):
Sensors/IoT AWS, GCP, ... | - DNS
| - *1
|ºPrometheus Node ExporteRº |ºOthersº
|- https://github.com/prometheus/node_exporter |- collectd
|- TODO: |- Dynatrace OneAgent
|- Datadog agent
|- New Relic agent
|- Ganglia gmond
- logging backend, optimized for Prometheus and Kubernetes
- optimized to search, visualize and explore your logs natively in Grafana.
- Elasticsearch, Logstash, and Kibana
- developed and maintained by Elastic.
- essentially a NoSQL, Lucene search engine implementation.
Extracted from: @[https://docs.sonarqube.org/latest/requirements/requirements/]
""" ...the "data" folder houses the Elasticsearch indices on which a huge amount
of I/O will be done when the server is up and running. Great read & write
hard drive performance will therefore have a great impact on the overall
- Logstash: log pipeline (ingest/transform/load it into a store
(It is common to replace Logstash with Fluentd)
- Kibana: visualization layer on top of Elasticsearch.
- In production a few other pieces might be included like:
- Kafka, Redis, NGINX, ....
-ºElasticSearch cluster solutions in practice means:º
- Deploy dozens of machines/containers.
- Server tuning
- mapping writing: i.e., - deciding how the data is:
depending of user's requirements, internal limitations, and
available hardware resources.
- log management tool based on Java, ElasticSearch and MongoDB.
Graylog can be used to collect, index and analyze
any server log from a centralized location or distributed location. We can
easily monitor any unusual activity for debugging applications and logs using
Graylog. Graylog provides a powerful query language, alerting abilities, a
processing pipeline for data transformation and much more. We also can extend
the functionality of Graylog through a REST API and Add-ons.
- gaining popularity in the Go community with
the introduction of the Graylog Collector Sidecar
written in Go.
- ... it still lags far behind the ELK stack.
- Composed under the hood of:
- Graylog Server.
- comes with alerting built into the open source version
and streaming, message rewriting, geolocation, ....
- ºIt is not a log aggregation system.º
- developed at Treasure Data,
- Adopted by the CNCF as Incubating project.
- Recommended by AWS and Google Cloud.
- Common replacement for Logstash
- It acts as a local aggregator to collect
all node logs and send them off to central
- 500+ plugins for quick and easy integrations with
different data input/outputs.
- common choice in Kubernetes environments due to:
- low memory requirements (tens of megabytes)
(each pod has a Fluentd sidecar)
- high throughput.
Cerebro: admin GUI
Elasticsearch comes as a set of blocks, and you, as a designer, are supposed
to glue them together. Yet, the way the software comes out of the box does not
cover everything. So, to me, it was not easy to see the cluster's heartbeat
all in one place. I needed something to give me an overview as well as allow me
to take action on basic things.
I wanted to introduce you to a helpful piece of software I found: Cerebro.
According to the Cerebro GitHub page:
Cerebro is an open source (MIT License) elasticsearch web admin tool built
using Scala, Play Framework, AngularJS, and Bootstrap.
Logreduce IA filter
- Quiet log noise with Python and machine learning
Distributed (Event) Stream Processors
- Patterns for streaming realtime analytics
- Streaming Reactive Systems & Data Pipes w. Squbs
- Streaming SQL Foundations: Why I Love Streams+Tables
- Next Steps in Stateful Streaming with Apache Flink
- Kafka Streams - from the Ground Up to the Cloud
- Data Decisions with Real-Time Stream Processing
- Foundations of streaming SQL
- The Power of Distributed Snapshots in Apache Flink
- Panel: SQL over Streams, Ask the Experts
- Survival of the Fittest - Streaming Architectures
- Streaming for Personalization Datasets at Netflix
- @[https://www.safaribooksonline.com/library/view/an-introduction-to/9781491934951/] An Introduction to Time Series with Team Apache
""" Apache Cassandra evangelist Patrick McFadin shows how to solve time-series data
problems with technologies from Team Apache: Kafka, Spark and Cassandra.
- Kafka: handle real-time data feeds with this "message broker"
- Spark: parallel processing framework that can quickly and efficiently
analyze massive amounts of data
- Spark Streaming: perform effective stream analysis by ingesting data
- Cassandra: distributed database where scaling and uptime are critical
- Cassandra Query Language (CQL): navigate create/update your data and data-models
- Spark+Cassandra: perform expressive analytics over large volumes of data
|ºKafkaº@[./kafka_map.html] | ºSparkº (TODO)
|-@[https://kafka.apache.org] | @[http://spark.apache.org/]
|- scalable,persistent and fault-tolerant | - Zeppelin "Spark Notebook"(video):
| real-time log/event processing cluster. | @[https://www.youtube.com/watch?v=CfhYFqNyjGc]
|- It's NOT a real stream processor but a |
| broker/message-bus to store stream data |
| for consumption. |
|- "Kafka Streams" can be used to add |
| data-stream-processor capabilities |
|- Main use cases: |
| - real-time reliable streaming data | ºFlinkº TODO
| pipelines between applications | - Includes a powerful windowing system
| - real-time streaming applications | supports many types of windows:
| that transform or react to the | - Stream windows and win.aggregations are crucial
| streams of data | building block for analyzing data streams.
|- Each node-broker in the cluster has an |
| identity which can be used to find other |
| brokers in the cluster. The brokers also |
| need some type of a database to store |
| partition logs. |
| (Popular) Use cases |
| BºMessagingº: good replacement for traditional |
| message brokers decoupling processing from |
| data producers, buffering unprocessed messages, |
| ...) with better throughput, built-in |
| partitioning, replication, and fault-tolerance |
| BºWebsite Activity Trackingº(original use case) |
| Page views, searches, ... are published |
| to central topics with one topic per activity |
| type. |
| BºMetricsº operational monitoring data. |
| BºLog Aggregationº replacement. |
| - Group logs from different servers in a central |
| place. |
| - Kafka abstracts away the details of files and |
| gives a cleaner abstraction of logs/events as |
| a stream of messages allowing for lower-latency |
| processing and easier support for multiple data |
| sources and distributed data consumption. |
| When compared to log-centric systems like |
| Scribe or Flume, Kafka offers equally good |
| performance, stronger durability guarantees due |
| to replication, and much lower end-to-end |
| latency. |
| BºStream Processingº: processing pipelines |
| consisting of multiple stages, where raw input |
| data is consumed from Kafka topics and then |
| aggregated/enriched/transformed into new |
| topics for further consumption. |
| - Kafka 0.10.+ includes Kafka-Streams to |
| simplify this pipeline process. |
| - alternative open source stream processing |
| tools include Apache Storm and Samza. |
| BºEvent Sourcing architectureº: state changes |
| are logged as a time-ordered sequence of |
| records. |
| BºExternal Commit Log for distributed systemsº: |
| - The log helps replicate data between |
| nodes and acts as a re-syncing mechanism |
| for failed nodes to restore their data. |
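The "external commit log" use case above can be sketched in plain Python. This is not Kafka's API: `CommitLog`, `Replica` and their methods are hypothetical names, and the log lives in memory. The only point is to show how a node that missed writes catches up by resuming from its last applied offset.

```python
class CommitLog:
    """Append-only log; each record gets a monotonically increasing offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset):
        """Return all records at or after the given offset."""
        return self._records[offset:]


class Replica:
    """Hypothetical node that rebuilds key/value state from the log."""

    def __init__(self):
        self.state = {}
        self.next_offset = 0  # first offset this node has not applied yet

    def sync(self, log):
        for key, value in log.read_from(self.next_offset):
            self.state[key] = value
            self.next_offset += 1


log = CommitLog()
node_a, node_b = Replica(), Replica()

log.append(("user:1", "alice"))
node_a.sync(log)
node_b.sync(log)

# node_b "fails" and misses two writes; node_a keeps up.
log.append(("user:2", "bob"))
log.append(("user:1", "alicia"))
node_a.sync(log)

# On recovery node_b resumes from its last applied offset.
node_b.sync(log)
```

Because the log keeps every record, recovery needs no coordination between replicas: each one only has to remember its own offset.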
-Flink vs Spark Storm:
Check also: Streaming Ledger. Stream ACID TXs on top of Flink
Confluent Schema Registry provides a serving layer for your metadata. It
provides a RESTful interface for storing and retrieving Apache Avro® schemas.
It stores a versioned history of all schemas based on a specified subject name
strategy, provides multiple compatibility settings and allows evolution of
schemas according to the configured compatibility settings and expanded Avro
support. It provides serializers that plug into Apache Kafka® clients that
handle schema storage and retrieval for Kafka messages that are sent in the
- LogDevice has been compared with other log storage systems
like Apache BookKeeper and Apache Kafka.
- The primary difference with Kafka
seems to be the decoupling of computation and storage
- Underlying storage based on RocksDB, a key value store
also open sourced by Facebook
Jonas Boner ... talked about event driven services (EDA) and
event stream processing (ESP)... on distributed systems.
... background on EDA evolution over time:
- Tuxedo, Terracotta and Staged Event Driven Architecture (SEDA).
ºevents represent factsº
- Events drive autonomy in the system and help to reduce risk.
- increase loose coupling, scalability, resilience, and traceability.
- ... basically inverts the control flow in the system
- ... focus on the behavior of systems as opposed to the
structure of systems.
- TIP for developers:
ºDo not focus on just the "things" in the systemº
º(Domain Objects), but rather focus on what happens (Events)º
- Promise Theory:
- proposed by Mark Burgess
- use events to define the Bounded Context through
the lens of promises.
quoting Greg Young:
"""Modeling events forces you to have a temporal focus on what’s going on in the
system. Time becomes a crucial factor of the system."""
""" Event Logging allows us to model time by treating event as a snapshot in time
and event log as our full history. It also allows for time travel in the
sense that we can replay the log for historic debugging as well as for
auditing and traceability. We can replay it on system failures and for data replication."""
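The replay and "time travel" ideas in the quote can be illustrated with a minimal sketch. The bank-account domain, the `Event` class and the `replay` function are assumptions made for this example; they do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact: something that happened at a point in time."""
    timestamp: int
    kind: str      # "deposited" or "withdrawn" (hypothetical event types)
    amount: int


def replay(events, until=None):
    """Fold a time-ordered event log into a balance, optionally stopping early."""
    balance = 0
    for event in events:
        if until is not None and event.timestamp > until:
            break  # the log is time-ordered, so nothing later can qualify
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
    return balance


event_log = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "deposited", 5),
]

current_balance = replay(event_log)          # state derived from the full history
balance_at_t2 = replay(event_log, until=2)   # "time travel" to timestamp 2
```

The current state is never stored directly; it is always derived from the log, which is what makes auditing and historic debugging possible.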
Boner discussed the following patterns for event driven architecture:
- Event Loop
- Event Stream
- Event Sourcing
- CQRS for temporal decoupling
- Event Stream Processing
Event stream processing technologies like Apache Flink, Spark Streaming,
Kafka Streams, Apache Gearpump and Apache Beam can be used to implement these
DDDesign, CQRS ⅋ Event Sourcing
Domain Driven Design, CQRS and Event Sourcing
- distributed, RESTful (all-document-type) search
and analytics engine
- Implemented on top of Lucene
- Developed alongside a data-collection and
log-parsing engine called Logstash, and the
analytics and visualisation platform Kibana.
- The three products form the "Elastic Stack"
(formerly the "ELK stack")
- At the heart of the ELK Stack, data is
- blazing-fast, search platform built on Apache Lucene
- Can be defined by the following features:
- Monitoring of services/messages passed between them
- wire Protocol bridge between HTTP, AMQP, SOAP, gRPC, CSV in Filesystem,...
- Scheduling, mapping, QoS management, error handling, ..
- Data transformation
- Data pipelines
- Mule, JBoss Fuse (Camel + "etc..."), BizTalk, Apache ServiceMix, ...
^ Special App. Services
E | Process Automation BPEL, Workflow
t | Application Adapters RFC, BAPI, IDoc, XML-RPC, ...
e m |
r e | Application Data Consolidation MDM, OSCo, ...
p s |
r s | Application Data Mapping EDI, B2B
i a | _______________________________
s g | Business Application Monitoring
e e | _______________________________
| Traffic Monitoring Cockpit
S c |
e h | Special Message Services Ex. Test Tools
r a |
v n | Web Services WSDL, REST, CGI
i n |
c e | Protocol Conversion XML, XSL, DCOM, CORBA
e l |
| Message Consolidation N.N (data locks, multi-submit,...)
u | Message Routing XI, WBI, BIZTALK, Seeburger
| Message Service MQ Series, MSMQ, ...
Business Data charts
Low-Level Data charts
|- "A Picture's Worth a Thousand Log Lines"
|- visualize (Elasticsearch) data and navigate
| the Elastic Stack, learning understanding
| the impact rain might have on your quarterly
- time series analytics
- Integrates with Prometheus queries
- 7.0+ includes new plugin architecture, visualizations,
transformations, native trace support, ...
Messaging traditionally has two models:
- a pool of consumers may read from a server and
each record goes to one of them;
- Pros: allows to divide up the processing of data
over multiple consumer instances, which
lets you scale your processing.
- Cons: queues aren't multi-subscriber—once
one process reads the data it's gone.
- the record is broadcast to all consumers.
- Pros: let broadcast data to multiple processes,
- Cons: no way of scaling processing since every message
goes to every subscriber
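The two models can be contrasted with a small in-memory sketch. `WorkQueue` and `Topic` are hypothetical classes written for this illustration, not the API of any real broker.

```python
from collections import deque


class WorkQueue:
    """Queuing model: each record is handed to exactly one consumer."""

    def __init__(self):
        self._records = deque()

    def publish(self, record):
        self._records.append(record)

    def poll(self):
        # Once a consumer reads the record it is gone for everyone else.
        return self._records.popleft() if self._records else None


class Topic:
    """Publish-subscribe model: every record is broadcast to all subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, record):
        for callback in self._subscribers:
            callback(record)


# Queue: two consumers split the work between them.
queue = WorkQueue()
for n in range(4):
    queue.publish(n)
consumer_a = [queue.poll(), queue.poll()]
consumer_b = [queue.poll(), queue.poll()]

# Pub/sub: both subscribers see every record.
topic = Topic()
seen_a, seen_b = [], []
topic.subscribe(seen_a.append)
topic.subscribe(seen_b.append)
for n in range(4):
    topic.publish(n)
```

The queue scales processing but loses the record after one read; the topic reaches every subscriber but each of them has to handle the full load.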
- message oriented architecture
- Persistence (or durability until consumption)
- Routing: point-to-point / publish-and-subscribe
- No processing/transformation of message/data
ºImplementations and standardsº
- Open standard network protocol
- Often compared to JMS:
- JMS defines API interfaces, AMQP defines network protocol
- JMS has no requirement for how messages are formed and
transmitted and thus every JMS broker can implement the
messages in a different (incompatible) format.
AMQP publishes its specifications in a downloadable XML format,
allowing library maintainers to generate APIs driven by
the specs while also automating construction of algorithms to
marshal and demarshal messages.
- brokers implementations supporting it:
RabbitMQ, ActiveMQ, Qpid, Solace, ...
- Apache Qpid (TODO)
- INETCO's AMQP protocol analyzer
- JORAM JMS + AMQP
@[http://joram.ow2.org/] 100% pure Java implementation of JMS
- Kaazing's AMQPºWeb Clientº
- Azure Service Bus + AMQP
- What's wrong with AMQP?
- JBoss A-MQ, built from @[http://qpid.apache.org/]
- IBM MQLight
a cloud hosted messaging service based on AMQP
- RabbitMQ (by VMware Inc)
also supported by SpringSource
- Java JMS
- (De-)Multiplexing of messages from/into multiple messages to different recipients
- Transformation (translation of message between formats)
- "things usually get blurry - many solutions are both (message queue and message
broker) - for example RabbitMQ or QDB. Samples for message queues are
Gearman, IronMQ, JMS, SQS or MSMQ."
Message broker examples are Qpid, OpenAMQ or ActiveMQ.
- Kafka can also be used as a message broker, but that is not its main purpose
storing to disk
- Chronicle-Queue: Microsecond messaging that stores everything to disk
- Chronicle-Accelerate: HFT meets Blockchain in Java platform
XCL is a new cryptocurrency project that, learning from the previous
Blockchain implementations, aims to solve the issues limiting adoption
by building an entirely new protocol that can scale to millions of
transactions per second, delivering consistent sub-second latency. Our
platform will leverage AI to control volatility and liquidity, require
low energy and simplify compliance with integrated KYC and AML support.
The XCL platform combines low latencies (sub millisecond), IoT transaction
rates (millions/s), open source AI volatility controls and blockchain for
transfer of value and exchange of value for virtual fiat and crypto
currencies. This system could be extended to other asset classes such as
securities and fixed income. It uses a federated services model and
regionalized payment systems making it more scalable than a blockchain
which requires global consensus.
The platform makes use of Chronicle-Salt for encryption and Chronicle-Bytes.
on Chronicle Core’s direct memory and OS system call access.
- Chronicle-Logger: A sub-microsecond Java logger, supporting standard logging
APIs such as SLF4J & Log4J
- Web-GUI data route+transform
- scalable directed graphs of data routing,
transformation, and system mediation logic.
- Seamless experience between design, control,
feedback, and monitoring
- Highly configurable:
- Loss tolerant vs guaranteed delivery
- Low latency vs high throughput
- Dynamic prioritization
- Flow can be modified at runtime
- Back pressure
- Data Provenance
- Track dataflow from beginning to end
- Designed for extension
- Build your own processors and more
- Enables rapid development and
- SSL, SSH, HTTPS, encrypted content, etc...
- Multi-tenant authorization and internal
(Extracted from whitepaper by Christoph Engelbert, Soft.Arch.at Hazelcast)
cache hit: data is already available in the cache when requested
(otherwise it's said cache miss)
- Caches are implemented as simple key-value stores for performance.
- Caching-First: term to describe the situation where you start thinking
about Caching itself as one of the main domains of your application.
- normally small and used to speed up the dereferencing
of previously known, fixed number of elements (e.g.
states of the USA, abbreviations of elements,...).
- Grow to their maximum size and evict the oldest or not
frequently used entries to keep in memory bounds.
ºCooperative (Distributed) cachingº
different cluster-nodes work together to build a huge, shared cache
Usually an "intelligent" partitioning algorithm is used to balance
load among cluster nodes.
- common approach when system requires large amounts of data to be cached
- not all data is stored in the cache.
- located in chosen locations to optimize latency
- CDN (Content Delivery Network) is the best known example of this type of cache
- Works well when content changes less often.
- mostly used in conjunction with a Geographical Cache
- Using a warm-up engine a Preemptive Cache is populated on startup
and tries to update itself based on rules or events.
- The idea behind this cache addition is to reload data from any
backend service or central cluster even before a requestor wants
to retrieve the element. This keeps access time to the cached
elements constant and prevents accesses to single elements from
becoming unexpectedly long.
- Can be difficult to implement properly and requires a lot of
knowledge of the cached domain and the update workflows
ºLatency SLA Cachingº
- It's able to maintain latency SLAs even if the cache is slow
or overloaded. This type of cache can be built in two different ways:
- Having a timeout to exceed before the system either requests
the potentially cached element from the original source
(in parallel to the already running cache request) or returns a simple
default answer, using whatever returns first.
- Always fire both requests in parallel and take whatever returns first
(discouraged since it mostly dismisses the value of caching). Can make
sense if multiple caching layers are available.
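A minimal sketch of the first (timeout-based) variant, using only the Python standard library. `get_with_sla` and the three lookup functions are hypothetical names, and the sleep merely simulates an overloaded cache.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def get_with_sla(key, cache_lookup, source_lookup, timeout_s, pool):
    """Ask the cache first; if it misses the timeout, race it against the source."""
    cache_future = pool.submit(cache_lookup, key)
    done, _ = wait([cache_future], timeout=timeout_s)
    if done:
        return cache_future.result(), "cache"

    # Cache is too slow: start the source request in parallel and
    # use whichever of the two answers first.
    source_future = pool.submit(source_lookup, key)
    done, _ = wait([cache_future, source_future], return_when=FIRST_COMPLETED)
    winner = source_future if source_future in done else cache_future
    return winner.result(), ("source" if winner is source_future else "cache")


def fast_cache(key):
    return f"cached:{key}"


def slow_cache(key):
    time.sleep(0.3)  # simulates an overloaded cache
    return f"cached:{key}"


def source(key):
    return f"fresh:{key}"


with ThreadPoolExecutor(max_workers=4) as pool:
    fast_result = get_with_sla("k", fast_cache, source, timeout_s=1.0, pool=pool)
    slow_result = get_with_sla("k", slow_cache, source, timeout_s=0.05, pool=pool)
```

The slow cache request is not cancelled here; a production version would probably also want to bound the source call and decide what to do with the late cache answer.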
- cache share application's memory space.
- most often used in non-distributed systems.
- fastest possible access speed.
- Easy to build, but complex to grow.
-ºEmbedded Node Cachesº
- the application itself will be part of the cluster.
- kind of combination between an In-Process Cache and the
- it can either use partitioning or full dataset replication.
- CONS: Application and cache cannot be scaled independently
- these systems tend to be Cooperative Caches by having a
multi-server architecture to scale out and have the
same feature set as the Embedded Node Caches
but with the client layer on top.
- This architecture keeps separate clusters of the applications
using the cached data and the data itself, offering
the possibility to scale the application cluster and the
caching cluster independently.
-ºLeast Frequently Usedº
- values that are accessed the fewest times are
removed on memory pressure.
- each cache record must keep track of its accesses using
an increment-only counter.
-ºLeast Recently Usedº
- values whose last use lies furthest back in time
are removed on memory pressure.
- each record must keep track of its last access timestamp
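Both strategies fit in a few lines of Python. The classes below are an illustrative sketch, not the implementation of any particular cache product; the LFU eviction scans all counters, which is fine for a demo but linear in the cache size.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the entry whose last access lies furthest back in time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # order of keys doubles as recency order

    def get(self, key):
        if key not in self._data:
            return None               # cache miss
        self._data.move_to_end(key)   # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop least recently used


class LFUCache:
    """Evicts the entry with the smallest increment-only access counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._hits = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._hits[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            victim = min(self._hits, key=self._hits.get)  # fewest accesses
            del self._data[victim], self._hits[victim]
        self._data[key] = value
        self._hits.setdefault(key, 0)


lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")        # "a" is now more recent than "b"
lru.put("c", 3)     # evicts "b"

lfu = LFUCache(2)
lfu.put("a", 1)
lfu.put("b", 2)
lfu.get("a")
lfu.get("a")
lfu.get("b")
lfu.put("c", 3)     # evicts "b" (1 access) rather than "a" (2 accesses)
```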
Other evict strategies can be found at
- distributed memory object caching system
- Memcached servers are unaware of each other. There is no crosstalk, no
synchronization, no broadcasting, no replication. Adding servers increases
the available memory. Cache invalidation is simplified, as clients delete
or overwrite data on the server which owns it directly
- initially intended to speed up dynamic web applications alleviating database load
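Since the servers never talk to each other, it is the client that decides which server owns a key. The sketch below fakes the servers with dictionaries; `FakeServer` and `ShardingClient` are hypothetical names, and the hash-modulo placement is an assumption chosen for brevity (real clients commonly use consistent hashing so that adding a server remaps fewer keys).

```python
import hashlib


class FakeServer:
    """Stand-in for one memcached node: an isolated in-memory dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class ShardingClient:
    """The client alone decides which server owns a key."""

    def __init__(self, servers):
        self.servers = servers

    def _owner(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        return self.servers[index]

    def set(self, key, value):
        self._owner(key).store[key] = value

    def get(self, key):
        return self._owner(key).store.get(key)

    def delete(self, key):
        # Invalidation touches only the owning server; no broadcast needed.
        self._owner(key).store.pop(key, None)


servers = [FakeServer("cache-1"), FakeServer("cache-2"), FakeServer("cache-3")]
client = ShardingClient(servers)
for i in range(100):
    client.set(f"user:{i}", i)

total_keys = sum(len(s.store) for s in servers)
client.delete("user:7")
```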
tomcat HA/scalable/fault-tolerant session manager
- supports sticky and non-sticky configurations
- Failover is supported via migration of sessions
- in-memory data structure store, used as a key-value database and cache
- Since it can also notify listeners of changes in its state it can
also be used as message broker (this is the case for example
in Kubernetes, where etcd implements an asynchronous message system
amongst its components).
- supports data structures such as strings, hashes, lists, sets, sorted sets
with range queries, bitmaps, hyperloglogs and geospatial indexes with
- Redis has built-in replication, Lua scripting, LRU eviction, transactions
and different levels of on-disk persistence, and provides high availability
via Redis Sentinel and automatic partitioning with Redis Cluster
- based on Java
- Can be used as tcp service (distributed cache) or process-embedded
TODO: Same API for local and distributed objects?
- open source, standards-based cache that boosts performance, offloads I/O
- Integrates with other popular libraries and frameworks
- It scales from in-process caching, all the way to mixed
in-process/out-of-process deployments withºterabyte-sized cachesº
ºExample Ehcache 3 APIº:
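CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
  .withCache("preConfigured", CacheConfigurationBuilder.newCacheConfigurationBuilder(
      Long.class, String.class, ResourcePoolsBuilder.heap(100))) // 100 entries: illustrative size
  .build(true); // true → initialize the manager right away
Cache˂Long, String˃ preConfigured =
  cacheManager.getCache("preConfigured", Long.class, String.class);
Cache˂Long, String˃ myCache = cacheManager.createCache("myCache",
  CacheConfigurationBuilder.newCacheConfigurationBuilder(
      Long.class, String.class, ResourcePoolsBuilder.heap(100)));
myCache.put(1L, "da one!");
String value = myCache.get(1L);
cacheManager.close();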
(a simpler/lighter, but less scalable, solution could be to use Google Guava Cache)
Distributed Storage with rich semantics!!!
Today’s storage systems expose abstractions which are either too low-level
(e.g., key-value store, raw-block store) that they require developers
to re-invent the wheels, or too high-level (e.g., relational databases,
Git) that they lack generality to support many classes of applications. In
this work, we propose and implement a general distributed data storage
system, called UStore, which has rich semantics. UStore delivers three
key properties, namely immutability, sharing and security, which unify and
add values to many classes of today’s applications, and which also open the
door for new applications. By keeping the core properties within
the storage, UStore helps reduce application development efforts while offering
high performance at hand. The storage embraces current hardware trends as key
enablers. It is built around a data-structure similar to that of Git, a popular
source code versioning system, but it also synthesizes many designs from
distributed systems and databases. Our current implementation of UStore
has better performance than general in-memory key-value storage systems,
especially for version scan operations. We port and evaluate four
applications on top of UStore: a Git-like application, a collaborative
data science application, a transaction management application, and a
blockchain application. We demonstrate that UStore enables faster
development and the UStore-backed applications can have better performance
than the existing implementations.
Ceph’s RADOS provides you with extraordinary data storage scalability—
thousands of client hosts or KVMs accessing petabytes to
exabytes of data. Each one of your applications can use the object, block or
file system interfaces to the same RADOS cluster simultaneously, which means
your Ceph storage system serves as a flexible foundation for all of your
data storage needs. You can use Ceph for free, and deploy it on economical
commodity hardware. Ceph is a better way to store data.
By decoupling the namespace from the underlying hardware, object-based
storage systems enable you to build much larger storage clusters. You
can scale out object-based storage systems using economical commodity hardware,
and you can replace hardware easily when it malfunctions or fails.
Ceph’s CRUSH algorithm liberates storage clusters from the scalability and
performance limitations imposed by centralized data table mapping. It
replicates and rebalances data within the cluster dynamically, eliminating this
tedious task for administrators, while delivering high-performance and
See more at: http://ceph.com/ceph-storage/#sthash.KNp2tGf5.dpuf
- memory-centric distributed file system enabling reliable file sharing at memory-speed
across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging
lineage information and using memory aggressively. Tachyon caches working set files in memory,
thereby avoiding going to disk to load datasets that are frequently read. This enables different
jobs/queries and frameworks to access cached files at memory speed.
Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without
any code change. The project is open source (Apache License 2.0) and is deployed at multiple companies.
It has more than 40 contributors from over 15 institutions, including Yahoo, Intel, and Redhat.
The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the
(Amazon S3, GCS,
- FUSE-based file system
- backed by several cloud storages:
- such as Amazon S3, Google Cloud Storage, Rackspace CloudFiles, or OpenStack
- S3QL is one of the most popular open-source cloud-based file systems.
- full featured file system:
- unlimited capacity
- up to 2TB file sizes
- UNIX attributes
- snapshots with copy-on-write
- immutable trees
- hardlink/symlink support, etc.
- Any bytes written to an S3QL file system are compressed/encrypted
locally before being transmitted to cloud backend.
- When you attempt to read contents stored in an S3QL file system, the
corresponding objects are downloaded from cloud (if not in the local
cache), and decrypted/uncompressed on the fly.
- Private Cloud Storage
- high performance distributed object storage server, designed for
large-scale private cloud infrastructure. Minio is widely deployed across the
world with over 146.6M+ docker pulls.
Why Object storage?
Object storage is an interesting idea and makes for a much more scalable
system. It removes portions of the file system from the host and pushes them
into the storage subsystem. There are trade-offs here, but by distributing
portions of the file system to multiple endpoints, you distribute the workload,
making the object-based method simpler to scale to much larger storage systems.
Rather than the host operating system needing to worry about block-to-file
mapping, the storage device itself provides this mapping, allowing the host to
operate at the file level.
Object storage systems also provide the ability to query the available
metadata. This provides some additional advantages, because the search
capability can be distributed to the endpoint object systems.
Object storage has made a comeback recently in the area of cloud storage. Cloud
storage providers (which sell storage as a service) represent their storage as
objects instead of the traditional block API. These providers implement APIs
for object transfer, management, and metadata management.
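The object model described above (opaque data plus queryable metadata, addressed by id rather than by block) can be sketched as follows. `ObjectStore` and its `put`/`get`/`find` methods are hypothetical and kept in memory; they do not mirror any provider's API.

```python
import uuid


class ObjectStore:
    """Flat namespace of objects: opaque bytes plus queryable metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (bytes(data), dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id][0]

    def find(self, **criteria):
        """Return ids of objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
report_id = store.put(b"q3 numbers", content_type="text/plain", owner="finance")
logo_id = store.put(b"\x89PNG...", content_type="image/png", owner="design")

finance_objects = store.find(owner="finance")
```

The caller never sees blocks or file offsets; it only transfers whole objects and searches their metadata, which is what lets the storage side distribute both data and search.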
Genesis Distributed Testing
The following are types of tests one can perform on a distributed system:
- Functional Testing is conducted to test whether a system performs as it was
specified or in accordance with formal requirements
- Performance Testing tests the reliability and responsiveness of a system
under different types of conditions and scenarios
- Penetration Testing tests the system for security vulnerabilities
- End-to-End Testing is used to determine whether a system’s process flow
functions as expected
- Fuzzing is used to test how a system responds to unexpected, random, or
invalid data or inputs
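As an illustration of the last item, here is a tiny fuzzing loop in plain Python. It is unrelated to Genesis' own tooling; `parse_port` is a made-up function under test and the input alphabet is an arbitrary choice.

```python
import random


def parse_port(text):
    """Function under test: parse a TCP port, rejecting anything invalid."""
    value = int(text)          # raises ValueError on non-numeric input
    if not 0 < value < 65536:
        raise ValueError(f"port out of range: {value}")
    return value


def fuzz(target, runs, seed):
    """Feed random strings to target; collect inputs that crash unexpectedly."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    alphabet = "0123456789-+ .eE\x00abc"
    unexpected = []
    for _ in range(runs):
        candidate = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            target(candidate)
        except ValueError:
            pass               # documented rejection, not a bug
        except Exception as exc:
            unexpected.append((candidate, exc))
    return unexpected


crashes = fuzz(parse_port, runs=2000, seed=1234)
```

Any entry in `crashes` would be an input that made the function fail in a way its contract does not allow.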
Genesis is a versatile testing platform designed to automate the tests listed
above, making it faster and simpler to conduct them on distributed systems
where it was traditionally difficult to do so. Where Performance, End-to-End,
and Functional testing comprise the meat of Genesis’ services, other types of
testing are enabled through the deployment of services and sidecars on the
End-to-End tests can be designed by applying exit code checks for process
completion, success, or failure in tasks and phases, while Performance tests
can be conducted by analyzing data from tests on Genesis that apply a variety
of network conditions and combinations thereof. Functional tests can use a
combination of tasks, phases, supplemental services and sidecars, and network
conditions, among other tools.
These processes and tools are further described in this documentation.
"""Add authentication to applications and secure services with minimum fuss"""
- No need to deal with storing users or authenticating users.
- It's all available out of the box.
- Advanced features such as User Federation, Identity Brokering and Social Login.
Single-Sign On LDAP and Active Directory
Login once to multiple applications Connect to existing user directories
Standard Protocols Social Login
OpenID Connect, OAuth 2.0 Easily enable social login
and SAML 2.0
Identity Brokering Clustering
OpenID Connect or SAML 2.0 IdPs For scalability and availability
High Performance Themes
Lightweight, fast and scalable Customize look and feel
Customize through code
Customize password policies
The Open Group Architecture Framework (TOGAF) is a framework for enterprise
architecture that provides an approach for designing, planning, implementing,
and governing an enterprise information technology architecture. TOGAF is
a high level approach to design. It is typically modeled at four levels:
Business, Application, Data, and Technology. It relies heavily on
modularization, standardization, and already existing, proven technologies and
TOGAF was developed starting 1995 by The Open Group, based on DoD's TAFIM. As
of 2016, The Open Group claims that TOGAF is employed by 80% of Global 50
companies and 60% of Fortune 500 companies.
- Cucumber, ...
End-to-End Request Tracing
Request tracing is the ultimate insight tool. Request tracing tracks operations
inside and across different systems. Practically speaking, this allows
engineers to see how long an operation took in a web server, database,
application code, or entirely different systems, all presented along a
timeline. Request tracing is especially valuable in distributed systems where a
single transaction (such as “create an account”) spans multiple systems
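To make the idea of spans on a timeline concrete, here is a minimal single-process sketch. It is not the Jaeger or OpenTracing API; `Tracer` and the span field names are assumptions for this example, and a real tracer would also propagate the trace id across process boundaries.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Collects spans: what ran, when, for how long, and under which parent."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []     # all spans, in the order they were started
        self._open = []     # stack of spans still running, innermost last

    @contextmanager
    def span(self, operation):
        record = {
            "trace_id": self.trace_id,
            "span_id": len(self.spans),
            "parent_id": self._open[-1]["span_id"] if self._open else None,
            "operation": operation,
            "start": time.perf_counter(),
            "duration": None,
        }
        self.spans.append(record)
        self._open.append(record)
        try:
            yield record
        finally:
            record["duration"] = time.perf_counter() - record["start"]
            self._open.pop()


tracer = Tracer()
with tracer.span("create_account"):          # web server handler
    with tracer.span("validate_input"):      # application code
        pass
    with tracer.span("insert_user_row"):     # database call
        time.sleep(0.01)

for s in tracer.spans:
    print(f'{s["operation"]:<16} parent={s["parent_id"]} {s["duration"] * 1000:.2f} ms')
```

Sorting or indenting the printed spans by `parent_id` gives the familiar waterfall view of a request.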
Problems that Jaeger addresses:
- distributed transaction monitoring
- performance and latency optimization
- root cause analysis
- service dependency analysis
- distributed context propagation
- Software-Defined Storage Product Family.
When Disruptor is not good fit
GraphQL: Facebook-developed language that provides a powerful API to get only
the dataset you need in a single request, seamlessly combining data sources.
The purpose of this monorepo is to give the GraphQL Community:
- a to-specification official language service (see: API Docs)
- a comprehensive LSP server and CLI service for use with IDEs
- a codemirror mode
- a monaco mode (in the works)
- an example of how to use this ecosystem with GraphiQL.
- examples of how to implement or extend GraphiQL.
Simplify app development by combining APIs, databases, and microservices into a
single data graph that you can query with GraphQL
Apache Ambari makes Hadoop cluster provisioning, managing, and monitoring dead simple.
- Run workloads 100x faster.
- Apache Spark achieves high performance for both batch
and streaming data, using a state-of-the-art DAG scheduler,
a query optimizer, and a physical execution engine.
- Ease of Use
- Write applications quickly in Java, Scala, Python, R, and SQL.
- Spark offers over 80 high-level operators that make it easy to
build parallel apps. And you can use it interactively from the
Scala, Python, R, and SQL shells.
- Example PythonBºDataFrame APIº.
|Bºdfº=ºspark.read.jsonº("logs.json") ← automatic schema inference
|Bºdfº.where("age ˃ 21")
- Combine SQL, streaming, and complex analytics.
- Spark powers a stack of libraries including SQL and DataFrames,
MLlib for machine learning, GraphX, and Spark Streaming.
You can combine these libraries seamlessly in the same application.
- Runs Everywhere
- Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone,
or in the cloud. It can access diverse data sources.
- You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN,
on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra,
Apache HBase, Apache Hive, and hundreds of other data sources.
- Common applications for Spark include real-time marketing campaigns, online
product recommendations, cybersecurity analytics and machine log monitoring.
Hadoop "vs" Spark
Hadoop is essentially a distributed data infrastructure:
-It distributes massive data collections across multiple nodes
within a cluster of commodity servers
-It also indexes and keeps track of that data, enabling
big-data processing and analytics far more effectively
than was possible previously.
Spark, on the other hand, is a data-processing tool that operates on those
distributed data collections; it doesn't do distributed storage.
You can use one without the other:
- Hadoop includes not just a storage component, known as the
Hadoop Distributed File System, but also a processing component called
MapReduce, so you don't need Spark to get your processing done.
- Conversely, you can also use Spark without Hadoop. Spark does not come with
its own file management system, though, so it needs to be integrated with one:
if not HDFS, then another cloud-based data platform. Spark was designed for
Hadoop, however, so many agree they're better together.
Spark is generally a lot faster than MapReduce because of the way it processes
data. While MapReduce operates in steps, Spark operates on the whole data set
in one fell swoop:
"The MapReduce workflow looks like this: read data from the cluster, perform
an operation, write results to the cluster, read updated data from the
cluster, perform next operation, write next results to the cluster, etc.,"
explained Kirk Borne, principal data scientist at Booz Allen Hamilton.
Spark, on the other hand, completes the full data analytics operations
in-memory and in near real-time:
"Read data from the cluster, perform all of the requisite analytic
operations, write results to the cluster, done," Borne said.
Spark can be as much as 10 times faster than MapReduce for batch processing and
up to 100 times faster for in-memory analytics, he said.
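The two workflows Borne describes can be sketched with a toy model that only counts storage round trips. This is a minimal illustration, not how either engine is implemented: `CountingStore`, `run_mapreduce_style` and `run_in_memory_style` are hypothetical names, and a plain Python list stands in for the cluster.

```python
from typing import Callable, List

Step = Callable[[List[int]], List[int]]


class CountingStore:
    """Toy stand-in for cluster storage that counts reads and writes."""

    def __init__(self, data: List[int]) -> None:
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self) -> List[int]:
        self.reads += 1
        return list(self.data)

    def write(self, data: List[int]) -> None:
        self.writes += 1
        self.data = list(data)


def run_mapreduce_style(store: CountingStore, steps: List[Step]) -> List[int]:
    # Every step reads its input from storage and writes its output back.
    for step in steps:
        store.write(step(store.read()))
    return store.data


def run_in_memory_style(store: CountingStore, steps: List[Step]) -> List[int]:
    # Read once, chain all steps in memory, write once at the end.
    data = store.read()
    for step in steps:
        data = step(data)
    store.write(data)
    return store.data


steps: List[Step] = [
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x for x in xs if x > 4],
    lambda xs: [x + 1 for x in xs],
]

mr_store = CountingStore([1, 2, 3, 4])
mem_store = CountingStore([1, 2, 3, 4])
mr_result = run_mapreduce_style(mr_store, steps)
mem_result = run_in_memory_style(mem_store, steps)
print("step-by-step:", mr_result, mr_store.reads, mr_store.writes)
print("in-memory:   ", mem_result, mem_store.reads, mem_store.writes)
```

Both runs produce the same result; the step-by-step variant touches storage once per operation, while the in-memory variant touches it once in total.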
You may not need Spark's speed. MapReduce's processing style can be just fine
if your data operations and reporting requirements are mostly static and you
can wait for batch-mode processing. But if you need to do analytics on
streaming data, like from sensors on a factory floor, or have applications that
require multiple operations, you probably want to go with Spark.
Most machine-learning algorithms, for example, require multiple operations.
Recovery: different, but still good.
Hadoop is naturally resilient to system faults or failures since data
are written to disk after every operation, but Spark has similar built-in
resiliency by virtue of the fact that its data objects are stored in something
called resilient distributed datasets distributed across the data cluster.
"These data objects can be stored in memory or on disks, and RDD provides full
recovery from faults or failures," Borne pointed out.
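One way RDDs recover is by remembering how each partition was derived (its lineage) and recomputing a lost partition from its parent. The sketch below is a toy model of that idea only; `ToyRDD` and its methods are hypothetical names and do not mirror the Spark API.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy dataset that remembers how each partition was derived."""

    def __init__(
        self,
        source: Optional[List[List[int]]] = None,
        parent: Optional["ToyRDD"] = None,
        fn: Optional[Callable[[int], int]] = None,
    ) -> None:
        self.source = source or []   # raw partitions, only set on the root
        self.parent = parent         # lineage: dataset this one derives from
        self.fn = fn                 # lineage: transformation applied to parent
        self.cache: Dict[int, List[int]] = {}  # partitions held "in memory"

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=fn)

    def partition(self, i: int) -> List[int]:
        if i in self.cache:
            return self.cache[i]
        if self.parent is None:
            data = list(self.source[i])
        else:
            # Cache miss: rebuild the partition from the parent via lineage.
            data = [self.fn(x) for x in self.parent.partition(i)]
        self.cache[i] = data
        return data

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one cached partition.
        self.cache.pop(i, None)


root = ToyRDD(source=[[1, 2], [3, 4]])
doubled = root.map(lambda x: x * 2)

before = doubled.partition(1)
doubled.lose_partition(1)
after = doubled.partition(1)   # recomputed from root, not restored from a copy
print(before, after)
```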
Data W.H. on top of Hadoop
- querying and managing utilities for large datasets
residing in distributed storage built on top of Hadoop
- easy data extract/transform/load (ETL)
* a mechanism to impose structure on a variety of data formats*
- access to files stored in Apache HDFS and other data storage
systems such as Apache HBase (TM)
HiveMQ Broker: MQTT+Kafka for IoT
- Interactive Data Visualization
- Visual Programming
- Student-friendly.
- Python Anaconda Friendly
$ conda config --add channels conda-forge
$ conda install orange3
$ conda install -c defaults pyqt=5 qt
- Python pip Friendly
$ pip install orange3
Ex:ºFile Integrity Monitoring at scale: (RSA Conf)º
Auditing log to gain insights at scale:
┌─→ Grafana ─┼─→ Email
Elastic │ └─→ Slack
Search ─┼─→ Kibana
└─→ Pre─processing ─→ TensorFlow
User │ go-audit- User space
land │ container app
───────├───── Netlink ───── Syscall iface ───────────
Kernel │ socket ^
│ ^ |
└─ Kauditd ───┘
Adidas Reference Architecture
ecosystem is our goal. We are not only using one of many available frameworks,
we test them, we analyze our necessities and we choose what has the best
balance between productivity and developer satisfaction.
adidas is moving most of the applications from the desktop to the browser, and
we are improving every day to achieve it in the least time possible:
standardization, good practices, continuous deployment, automated tasks are
part of our daily work.
Our stack has changed a lot since 2015. Back then we used code which is
better not to speak about. Now, we power our web applications with frameworks
like React and Vue and modern tooling. Our de facto tool for shipping these
applications is webpack.
use its latest features. TypeScript is also a good choice for developing
consistent, highly maintainable applications.
Apiary is our collaborative platform for designing, documenting and governing
APIs. It helps us adopt an API-first approach with templates and best
practices. The platform is strongly integrated with our API guidelines and helps us
achieve consistency across all our APIs. Apiary also speeds up and simplifies
the development and testing phase by providing API mock services and automation
of API testing.
Mashery is an API platform that delivers key API management capabilities needed
for successful digital transformation initiatives. Its full range of
capabilities includes API creation, packaging, testing, and management of your
APIs and community of users. API security is provided via an embedded or
optional on-premise API gateway.
Runscope is our continuous monitoring platform for all the APIs we
expose. It monitors uptime and performance and validates the data. It is tightly
integrated with Slack and Opsgenie in order to alert and create an incident if
things go wrong. Using Runscope we can detect any issues with our
APIs very quickly and act accordingly to achieve the highest possible SLA.
Apache Kafka is the core of our Fast Data Platform.
The more we work with it, the more versatile we find it, covering more
use cases than event sourcing strategies alone.
It is designed to implement the pub-sub pattern in a very efficient way,
enabling Data Streaming cases, where multiple consumers can easily subscribe to the same
stream of information. The fan-out in the pub-sub pattern is implemented in a
really efficient way, allowing high throughput on both the production and
the consumption side.
It also enables other capabilities like Data Extraction and Data Modeling, as
well as Stateful Event Processing. Together with Storm, Kafka Streams and the
Elastic stack, it provides the perfect toolset to integrate our digital
applications in a modern self-service way.
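The fan-out idea can be illustrated with a toy in-memory topic: messages are appended once to a log, and each subscriber keeps its own read position. This is a simplified sketch with hypothetical names (`ToyTopic`, `publish`, `poll`); it is not the Kafka client API, and it leaves out partitions, consumer groups and persistence.

```python
from collections import defaultdict
from typing import Dict, List


class ToyTopic:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.offsets: Dict[str, int] = defaultdict(int)

    def publish(self, message: str) -> None:
        # A message is stored once, however many consumers read it.
        self.log.append(message)

    def poll(self, consumer: str) -> List[str]:
        # Return everything this consumer has not seen yet, then advance
        # only this consumer's offset.
        start = self.offsets[consumer]
        batch = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return batch


topic = ToyTopic()
topic.publish("order-created")
topic.publish("order-paid")
analytics_first = topic.poll("analytics")
topic.publish("order-shipped")
analytics_second = topic.poll("analytics")
billing_all = topic.poll("billing")   # a late subscriber still sees everything
print(analytics_first, analytics_second, billing_all)
```

Because reading does not remove messages, adding another subscriber costs only one more offset, which is what makes the fan-out cheap in this model.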
Jenkins has become the de facto standard for continuous integration/continuous
ACID is a 100% docker powered by Kubernetes that allows you to create a Jenkins
as a Service platform. ACID provides an easy way to generate an isolated
Jenkins environment per team while keeping a shared basic configuration and a
central management dashboard.
Updating and maintaining team instances has never been so easy. Team freedom
provided out of the box, gitops disaster recovery capabilities and elastic
slaves allocation are the key pillars.
SerenityBDD is our default test automation solution. Built on top of
Selenium and Cucumber, it enables us to write cleaner and more maintainable
automated acceptance and regression tests in a faster way. With archetypes
available for both frontend (web) and backend (API), it is almost
straightforward to be implemented in most projects.
Supporting a wide type of protocols and application types, Neoload is our tool
of choice for performance testing. It provides an easy to use interface with
powerful recording capabilities, allowing manual implementation as well if the
user chooses. Since it supports dockerized load generators, it makes it easy to
create a shared infrastructure and integrate it in our continuous integration
From the monitoring perspective, we use different tools to provide
end-to-end monitoring internally, for better troubleshooting.
For system monitoring and alerting, we use Prometheus, an open source
solution, easy for developers and completely integrated with Kubernetes. For
infrastructure monitoring we have Icinga with hundreds of custom scripts and
Graphite as the time series database. For APM, Instana is our solution, which
makes it easy and fast to monitor the applications and identify bottlenecks. We
use Runscope to test and monitor our APIs on a continuous schedule to give us
complete visibility into API problems.
Grafana is the default visualization tool, integrated with
Prometheus, Instana, Elasticsearch, Graphite, etc.
For logging, we rely on the ELK stack running on-premises and on AWS
Kubernetes, with custom components for processing and alerting that
notify the teams of problems.
As the goal of monitoring is troubleshooting, all these tools have
integration with Opsgenie, where we have defined escalation policies for
alerting and notifications. The incidents can be communicated to the final
users via Statuspage.
In mobile development, we stay away from hybrid solutions. Native is where we live.
We work with the latest and most concise, expressive and safe languages
available at the moment for android and iOS mobile phones: Kotlin and Swift.
Our agile mobile development process requires big doses of automation to test,
build and distribute reliably and automatically every single change in the app.
We rely on tools such as Jenkins and Fastlane to automate those steps.
100% Serverless AWS arch
What a typical 100% Serverless Architecture looks like in AWS!
"""""" Facebook had a system in the past for monitoring
that used syslog-ng, but it was less than 60 percent reliable.
In contrast, Owens stated netconsole is highly scalable and can
handle enormous log volume with greater than 99.99 percent
20 billion Events/day
When Ceph is Not Enough
Josh Goldenhar, VP Product Marketing, Lightbits Labs
: There’s a new kid on the “block”
About this webinar
In this 30-minute webinar, we'll discuss the origins of Ceph and why
it's a great solution for highly scalable, capacity optimized storage
pools. You’ll learn how and where Ceph shines but also where its
architectural shortcomings make Ceph a sub-optimal choice for today's
high performance, scale-out databases and other key web-scale
software infrastructure solutions.
Participants will learn:
• The evolution of Ceph
• Ceph applicability to infrastructures such as OpenStack,
OpenShift and other Kubernetes orchestration environments
• Why Ceph can't meet the block storage challenges of modern,
scale-out, distributed databases, analytics and AI/ML workloads:
• Where Ceph falls short on consistent latency response
• Overcoming Ceph’s performance issues during rebuilds
• How you can deploy high performance, low latency block storage in
the same environments Ceph integrates with, alongside your Ceph
Plugins that can extend Logstash's functionality.
logstash-codec-protobuf parsing Protobuf messages
A curated list of awesome projects, libraries, tools, etc. related to InfluxDB
Tools whose primary or sole purpose is to feed data into InfluxDB.
- accelerometer2influx - Android application that takes the x-y-z axis metrics
from your phone accelerometer and sends the data to InfluxDB.
- agento - Client/server collecting near realtime metrics from Linux hosts
- aggregateD - A dogstatsD inspired metrics and event aggregation daemon for
- aprs2influxdb - Interfaces ham radio APRS-IS servers and saves packet data
into an influxdb database
- Charmander - Charmander is a lab environment for measuring and analyzing
- gopherwx - a service that pulls live weather data from a Davis Instruments
Vantage Pro2 station and stores it in InfluxDB
- grade - Track Go benchmark performance over time by storing results in
- Influx-Capacitor - Influx-Capacitor collects metrics from windows machines
using Performance Counters. Data is sent to influxDB to be viewable by grafana
- Influxdb-Powershell - Powershell script to send Windows Performance counters
to an InfluxDB Server
- influxdb-logger - SmartApp to log SmartThings device attributes to an
- influxdb-sqlserver - Collect Microsoft SQL Server metrics for reporting to
InfluxDB and visualize them with Grafana
- marathon-event-metrics - a tool for reporting Marathon events to InfluxDB
- mesos-influxdb-collector - Lightweight mesos stats collector for InfluxDB
- mqforward - MQTT to influxdb forwarder
- node-opcua-logger - Collect industrial data from OPC UA Servers
- ntp_checker - compares internal NTP sources and warns if the offset between
servers exceeds a definable (fraction of) seconds
- proc_to_influxdb - Console app to observe Windows process starts and stops
- pysysinfo_influxdb - Periodically send system information into influxdb (uses
python3 + psutil, so it also works under Windows)
- sysinfo_influxdb - Collect and send system (linux) info to InfluxDB
- snmpcollector - A full featured Generic SNMP data collector with Web
Administration Interface for InfluxDB
- Telegraf - (Official) plugin-driven server agent for reporting metrics into
- tesla-streamer - Streams data from Tesla Model S to InfluxDB (rake task)
- traffic_stats - Acquires and stores statistics about CDNs controlled by
Apache Traffic Control
- vsphere-influxdb-go - Collect VMware vSphere, vCenter and ESXi performance
metrics and send them to InfluxDB
(in distributed databases)
ºFive consistency modelsº natively supported by the A.Cosmos DBºSQL APIº
(SQL API is default API):
- native support for wire protocol-compatible APIs for
popular databases is also provided including
ºMongoDB, Cassandra, Gremlin, and Azure Table storageº.
RºWARN:º These databases don't offer precisely defined consistency
models or SLA-backed guarantees for consistency levels.
They typically provide only a subset of the five consistency
models offered by A.Cosmos DB.
- For SQL API|Gremlin API|Table API default consistency level
configured on theºA.Cosmos DB accountºis used.
Comparative Cassandra vs Cosmos DB:
Cassandra 4.x         Cosmos DB                   Cosmos DB
                      (multi-region)              (single region)
ONE, TWO, THREE       Consistent prefix           Consistent prefix
LOCAL_ONE             Consistent prefix           Consistent prefix
QUORUM, ALL, SERIAL   Bounded staleness (def.),   Strong
                      Strong in Private Preview
LOCAL_QUORUM          Bounded staleness           Strong
LOCAL_SERIAL          Bounded staleness           Strong
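The Cassandra table above can be captured as a small lookup, for example when checking which Cosmos DB level a migrated query would get. This is only a sketch of the table as written in these notes: the dictionary and the `cosmos_level` helper are hypothetical names, not part of any SDK, and the "Strong in Private Preview" remark for QUORUM/ALL/SERIAL is left out.

```python
from typing import Dict, Tuple

# Cassandra level -> (Cosmos DB multi-region, Cosmos DB single region),
# transcribed from the comparison table in these notes.
CASSANDRA_TO_COSMOS: Dict[str, Tuple[str, str]] = {
    "ONE": ("Consistent prefix", "Consistent prefix"),
    "TWO": ("Consistent prefix", "Consistent prefix"),
    "THREE": ("Consistent prefix", "Consistent prefix"),
    "LOCAL_ONE": ("Consistent prefix", "Consistent prefix"),
    "QUORUM": ("Bounded staleness", "Strong"),
    "ALL": ("Bounded staleness", "Strong"),
    "SERIAL": ("Bounded staleness", "Strong"),
    "LOCAL_QUORUM": ("Bounded staleness", "Strong"),
    "LOCAL_SERIAL": ("Bounded staleness", "Strong"),
}


def cosmos_level(cassandra_level: str, multi_region: bool) -> str:
    """Return the Cosmos DB consistency level listed for a Cassandra level."""
    multi, single = CASSANDRA_TO_COSMOS[cassandra_level.upper()]
    return multi if multi_region else single


print(cosmos_level("local_quorum", multi_region=True))
print(cosmos_level("QUORUM", multi_region=False))
```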
Comparative MongoDB 3.4 vs Cosmos DB:
MongoDB 3.4    Cosmos DB            Cosmos DB
               (multi-region)       (single region)
Linearizable   Strong               Strong
Majority       Bounded staleness    Strong
Local          Consistent prefix    Consistent prefix
https://opensource.com/article/20/3/open-source-multi-factor-authentication Open source alternative for multi-factor authentication: privacyIDEA
Chrony (NTP replacement)
Facebook’s Switch from ntpd to chrony for a More Accurate, Scalable NTP Service
Apache Beam provides an advanced unified programming model, allowing
you to implement batch and streaming data processing jobs that can
run on any execution engine.
It allows pipelines to be executed on multiple environments such as Apache
Apex, Apache Flink and Apache Spark, among others.
Apache Ignite is a high-performance, integrated and distributed
in-memory platform for computing and transacting on large-scale data
sets in real-time, orders of magnitude faster than possible with
traditional disk-based or flash technologies.
- distributed SQL query engine originally developed by Facebook.
- running interactive analytic queries against data sources of
all sizes ranging from gigabytes to petabytes.
- Engine can combine data from multiple sources (RDBMS, No-SQL,
Hadoop) within a single query, with little to no performance
degradation. Being used and developed by big data giants
such as Facebook, Twitter and Netflix, among others, suggests a
bright future for this tool.
- Designed and written from the ground up for interactive
analytics and approaches the speed of commercial data warehouses
while scaling to the size of organizations like Facebook.
- NOTE: Presto uses Apache Avro to represent data with schemas.
(Avro is "similar" to Google Protobuf/gRPC)
Google Colossus FS
ddbb from scratch
Writing a SQL database from scratch in Go: 3. indexes | notes.eatonphil.com !!!
CERN replaces Facebook Workplace with Mattermost and Discourse
Intel Optane DC performance
Through the courtesy of Intel I have access to a machine with 6 TB of Intel
Optane DC Persistent Memory. This is memory that can be used both as
persistent memory in App Direct Mode or simply used as a very large
DRAM in Memory Mode.
Slides for a presentation of this are available at slideshare.net.
This memory can be bigger than DRAM, but has some different characteristics
compared to DRAM. Due to these different characteristics all accesses to this
memory go through a cache and here the cache is the entire DRAM in the
In the test machine there was a 768 GB DRAM acting as a cache for the
6 TB of persistent memory. When a miss happens in the DRAM cache
one has to go towards the persistent memory instead. The persistent memory
has higher latency and lower throughput. Thus it is important as a programmer
to ensure that your product can work with this new memory.
What one can expect performance-wise is that performance will be similar to
using DRAM as long as the working set is smaller than DRAM. As the working
set grows one expects the performance to drop a bit, but not in a very significant
We tested NDB Cluster using the DBT2 benchmark which is based on the
standard TPC-C benchmark but uses zero latency between transactions in
the benchmark client.
This benchmark has two phases, the first phase loads the data from 32 threads
where each thread loads one warehouse at a time. Each warehouse contains
almost 500,000 rows in a number of tables.
SOAP vs RESTful vs ...
SOAP vs RESTful vs JSON-RPC vs gRPC vs Drift
eventql/eventql: Distributed "massively parallel" SQL query engine
SAN vs ...
Storage Area Network, and Other Storage Methods
"...Developed an indexing and search software, mostly in C.
The indexer is multi-threaded and computes a n-gram index
- stored in B-Trees - on hundreds of GB of data generated
every day in production. The associated search engine is also
a multi-threaded, efficient C program."
- In the fields of computational linguistics and probability, an n-gram
is a contiguous sequence of n items from a given sample of text or
speech. The items can be phonemes, syllables, letters, words or base
pairs according to the application. The n-grams typically are
collected from a text or speech corpus. When the items are words,
n-grams may also be called shingles.
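A minimal sketch of the definition above, plus a tiny n-gram index over character trigrams. The quoted project stored its index in B-Trees and was written in C; here a plain Python dict is used instead, and `ngrams` and `build_index` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def ngrams(items: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n items (words, letters, ...)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def build_index(docs: Dict[str, str], n: int) -> Dict[str, Set[str]]:
    """Map each character n-gram to the ids of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index["".join(gram)].add(doc_id)
    return index


# Word bigrams ("shingles" when the items are words).
word_bigrams = ngrams("to be or not to be".split(), 2)
print(word_bigrams)

# Character trigram index: look up which documents contain "ban".
index = build_index({"d1": "banana", "d2": "bandit"}, 3)
print(sorted(index["ban"]))
```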
Apollo GraphQL Middleware
Apollo Data Graph Platform: A GraphQL Middleware Layer for the Enterprise
Genesis: end-to-end testing
Genesis: An End-To-End Testing & Development Platform For Distributed Systems
Outputting data to collect for later analysis is as simple as writing
a JSON object to stdout. You can review the data we collect from each
test on the test details page in the dashboard, build your own
metrics and analytics dashboards with Kibana, and write your own
custom data analysis and reports using Jupyter Notebooks.
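Writing one JSON object per line to stdout might look like the sketch below. The `emit` helper and the field names (`event`, `timestamp`, `height`, `latency_ms`) are made up for illustration; the notes do not say which fields Genesis expects.

```python
import json
import sys
import time
from typing import Any


def emit(event: str, **fields: Any) -> str:
    """Write one JSON object per line to stdout and return the line."""
    record = {"event": event, "timestamp": time.time(), **fields}
    line = json.dumps(record, sort_keys=True)
    sys.stdout.write(line + "\n")
    sys.stdout.flush()   # do not leave data stuck in the buffer if the test is killed
    return line


line = emit("block_committed", height=42, latency_ms=17.5)
```

One object per line keeps the output easy to parse downstream, since each line can be decoded independently.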
Graph Management Software
- Visualization: Gephi, Cytoscape, Linkurious
- APIs: Apache TinkerPop,
- JanusGraph, Neo4j, DataStax, MarkLogic, ...
- Processing Systems:
- Spark, Apache Giraph, Oracle Labs PGX,
- Hadoop, Neo4j, Apache HBase, Cassandra.
fallacies of distributed computing
The fallacies are
The network is reliable;
Latency is zero;
Bandwidth is infinite;
The network is secure;
Topology doesn't change;
There is one administrator;
Transport cost is zero;
The network is homogeneous.
The effects of the fallacies
- Software applications are written with little error-handling on
networking errors. During a network outage, such applications may
stall or infinitely wait for an answer packet, permanently consuming
memory or other resources. When the failed network becomes available,
those applications may also fail to retry any stalled operations or
require a (manual) restart.
- Ignorance of network latency, and of the packet loss it can cause,
induces application- and transport-layer developers to allow
unbounded traffic, greatly increasing dropped packets and wasting
- Ignorance of bandwidth limits on the part of traffic senders can
result in bottlenecks.
- Complacency regarding network security results in being blindsided
by malicious users and programs that continually adapt to security
- Changes in network topology can have effects on both bandwidth and
latency issues, and therefore can have similar problems.
- Multiple administrators, as with subnets for rival companies, may
institute conflicting policies of which senders of network traffic
must be aware in order to complete their desired paths.
- The "hidden" costs of building and maintaining a network or subnet
are non-negligible and must consequently be noted in budgets to avoid
- If a system assumes a homogeneous network, then it can lead to the
same problems that result from the first three fallacies.
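The first effect above (applications that stall forever or never retry) is usually countered with bounded retries. The following is a minimal sketch under stated assumptions: `call_with_retries` and `flaky_fetch` are hypothetical names, the delays are arbitrary, and a real client would also set a timeout on each network call.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying network-style errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise   # give up instead of waiting forever
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky_fetch() -> str:
    # Simulated remote call: fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network outage")
    return "payload"


delays: List[float] = []
result = call_with_retries(flaky_fetch, attempts=5, sleep=delays.append)
print(result, calls["count"], delays)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.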
Jepsen is an effort to improve the safety of distributed databases, queues,
consensus systems, etc. We maintain an open source software library for systems
testing, as well as blog posts and conference talks exploring particular
systems’ failure modes. In each analysis we explore whether the system lives
up to its documentation’s claims, file new bugs, and suggest recommendations
Jepsen pushes vendors to make accurate claims and test their software
rigorously, helps users choose databases and queues that fit their needs, and
teaches engineers how to evaluate distributed systems correctness for