Wednesday, August 3, 2016

Elasticsearch

View Priya Saini's LinkedIn profile View Priya Saini's profile

Elasticsearch | Search & Analyze Data in Real Time


1.Schema-Free

Elasticsearch allows you to get started fast. Simply index a JSON document and Elasticsearch will automatically detect the data structure and types, index the data, and make it searchable.
You also have full control to customize how your data is indexed.
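
As a rough illustration of what "automatically detect the data structure and types" means, the sketch below guesses a type name for each field of a JSON document. It is a simplified stand-in written for this post, not Elasticsearch's actual detection logic; the function names `guess_type` and `guess_field_types`, the single date format, and the sample document are all assumptions.

```python
import json
from datetime import datetime


def guess_type(value):
    """Return a rough type name for one JSON value (illustrative only)."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        # Assumption: only plain yyyy-MM-dd strings are treated as dates here
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first element in this sketch
        return guess_type(value[0]) if value else "unknown"
    return "unknown"  # e.g. a JSON null


def guess_field_types(document):
    """Guess a type name for every top-level field of a document."""
    return {name: guess_type(value) for name, value in document.items()}


doc = json.loads('{"title": "Elasticsearch basics", "views": 42, "published": "2016-08-03"}')
print(guess_field_types(doc))
```

An explicit mapping is how you override guesses like these when they do not match what you want.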

2.Developer-Friendly, RESTful API

Elasticsearch is API driven. Almost any action can be performed through a simple RESTful API that exchanges JSON over HTTP. Client libraries are available for many programming languages.
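
To make "JSON over HTTP" concrete, here is a minimal sketch that builds (but does not send) the request for indexing one document, using only the Python standard library. The host `localhost:9200`, the index name `blog`, the type name `post`, and the helper `build_index_request` are assumptions chosen for illustration; the URL shape follows the index/type/id addressing described later in this post.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would index one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


request = build_index_request("blog", "post", 1, {"title": "Elasticsearch basics"})
print(request.get_method(), request.full_url)
# To send it to a running node you could call urllib.request.urlopen(request)
```

Nothing here needs a client library, although in practice one of the official clients is usually more convenient.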

3.Per-Operation Persistence

Elasticsearch puts your data safety first. Document changes are recorded in transaction logs on multiple nodes in the cluster to minimize the chance of any data loss.

4.Apache 2 Open Source License

Elasticsearch can be downloaded, used, and modified free of charge. It is available under the Apache 2 license, one of the most flexible open source licenses available.







5.Built on top of Apache Lucene™

Apache Lucene is a high-performance, full-featured information retrieval library written in Java, and one of the most popular Java-based full-text search index implementations. Elasticsearch uses Lucene internally to build its state-of-the-art distributed search and analytics capabilities.

6.Basic Concepts 



Document 

A document is a JSON document which is stored in Elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index, has a type and an id, and is identified by index/type/id. A document is a JSON object (also known in other languages as a hash, hashmap, or associative array) which contains zero or more fields, or key-value pairs. The original JSON document that is indexed is stored in the _source field, which is returned by default when getting or searching for a document.
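
The sketch below shows where the index, type, id, and _source of a document appear when it is fetched by id. The JSON is a hand-written sample shaped like a get response in the type-based Elasticsearch versions this post describes, not output captured from a real cluster; the index `blog`, type `post`, and field values are assumptions.

```python
import json

# Hand-written sample, not real cluster output
sample_response = """
{
  "_index": "blog",
  "_type": "post",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {"title": "Elasticsearch basics", "views": 42}
}
"""

hit = json.loads(sample_response)

# The three parts that together address the document
address = "/".join([hit["_index"], hit["_type"], hit["_id"]])

# The original JSON document, exactly as it was indexed
original = hit["_source"]

print(address)
print(original["title"])
```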

Field :

A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (e.g. a string, integer, or date), or a nested structure like an array or an object. A field is similar to a column in a table in a relational database. The mapping for each field has a field ‘type’ (not to be confused with the document type) which indicates the type of data that can be stored in that field, e.g. integer, string, or object. The mapping also allows you to define (amongst other things) how the value of a field should be analyzed.

Mapping :

A mapping is like a ‘schema definition’ in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.
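
For a sense of what an explicit mapping looks like, here is a sketch of a request body that could be sent when creating an index. It assumes the type-based format of the Elasticsearch 2.x releases current when this post was written (later releases changed field type names and removed types); the type name `post`, the field names, and the helper `field_types` are made up for illustration.

```python
import json

# Body for creating an index with one type ("post") and index-wide settings
index_body = {
    "settings": {"number_of_shards": 5, "number_of_replicas": 1},
    "mappings": {
        "post": {
            "properties": {
                "title": {"type": "string"},  # analyzed full text
                "tag": {"type": "string", "index": "not_analyzed"},  # exact value
                "views": {"type": "integer"},
                "published": {"type": "date"},
            }
        }
    },
}


def field_types(body, doc_type):
    """List the declared type of every field of one document type."""
    properties = body["mappings"][doc_type]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


print(json.dumps(index_body, indent=2))
print(field_types(index_body, "post"))
```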

Cluster 

A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced if the current master node fails.

Node :

A node is a running instance of Elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.

Index 

An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

Type : 

A type is like a ‘table’ in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.

Shard : 

A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by Elasticsearch. An index is a logical namespace which points to primary and replica shards. Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another when a node fails or new nodes are added.

Primary Shard 

Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle.
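
One way to picture how a document ends up in exactly one primary shard: the Elasticsearch documentation describes routing as hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. The sketch below follows that formula but swaps in CRC32 as a stand-in hash, so the shard numbers it prints will not match what a real cluster would choose; `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document id) to a primary shard number."""
    # Stand-in hash: Elasticsearch uses its own hash function internally, not CRC32
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the shard number depends on the number of primary shards, the same id would generally map to a different shard if that number changed, which is why it is worth choosing the primary shard count carefully up front.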

Replica Shard : 

Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes: 1) improve failover: a replica shard can be promoted to a primary shard if the primary fails. 2) increase performance: get and search requests can be handled by primary or replica shards.
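
A quick bit of arithmetic shows how primaries and replicas add up. The sketch assumes the two index settings `number_of_shards` and `number_of_replicas`; the helper `total_shard_copies` is made up for this example.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Count every shard copy (primaries plus replicas) that an index needs."""
    # Each primary exists once, plus one extra copy per configured replica
    return number_of_shards * (1 + number_of_replicas)


settings = {"number_of_shards": 5, "number_of_replicas": 1}
print(total_shard_copies(**settings))  # 5 primaries + 5 replicas = 10
```

With zero replicas the same index needs only its 5 primaries, but then there is no copy to promote if a primary is lost.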


Reference link :

1. https://www.elastic.co/products/elasticsearch
2. http://www.slideshare.net/MayurRathod5/elasticsearch-basic-introduction?next_slideshow=1