Extract, Transform, Load

In computing, Extract, Transform and Load (ETL) refers to a process in database usage, and especially in data warehousing, that: extracts data from homogeneous or heterogeneous data sources; transforms the data into the proper format or structure for querying and analysis purposes; and loads it into the final target, typically a database or data warehouse.
Article on Wikipedia

Pentaho Data Integration

Formerly called Kettle, Pentaho Data Integration (Community) is an open source tool written in Java that enables you to set up workflows for ETL processes. You can 'extract' data from databases, CSV files and streams, 'transform' data using lookups, joins and aggregations, and 'load' data into various databases.
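
To make the pattern concrete (independently of Pentaho's own workflow format), here is a minimal ETL sketch in Python using only the standard library. The file readings.csv, its column names and the target table schema are hypothetical, chosen just to illustrate the three stages:

```python
import csv
import sqlite3

# Extract: read rows from a CSV source (readings.csv is a hypothetical file).
with open('readings.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# Transform: cast types and filter out malformed records.
cleaned = [(r['sensor_id'], float(r['value']))
           for r in rows if r.get('value')]

# Load: write the transformed rows into a target database table.
conn = sqlite3.connect('warehouse.db')
conn.execute('CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, value REAL)')
conn.executemany('INSERT INTO readings VALUES (?, ?)', cleaned)
conn.commit()
conn.close()
```

A tool like Pentaho gives you the same extract/transform/load stages as configurable workflow steps rather than hand-written code.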

Managing Data Integration Flows

If you are building a complex system that integrates data, processes it and visualises it, you often need a piece of software called a messaging server (or message broker) to manage the flows of data. This is especially true when data from multiple sensors arrives as continuous streams. There are many enterprise message brokers as well as open source ones: RabbitMQ and ActiveMQ are two open source messaging servers, ZeroMQ is an elegant lightweight alternative, and Apache Kafka, originally developed internally at LinkedIn, is a powerful and scalable alternative. This well-written article describes the design of Kafka.
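
As a rough sketch of how a producer and a consumer interact with a broker like Kafka, here is a minimal example using the kafka-python client. The broker address, topic name and consumer group id are assumptions made for illustration:

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: sensor readings are appended to a named topic on the broker
# (localhost:9092 is an assumed broker address).
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('sensor-readings', b'temperature=21.4')
producer.flush()

# Consumer: each consumer group keeps its own position in the topic's log,
# so several downstream systems can read the same stream independently.
consumer = KafkaConsumer('sensor-readings',
                         bootstrap_servers='localhost:9092',
                         group_id='dashboard')
for message in consumer:
    print(message.value)
```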

The important concept used to manage data flows here is the publish-subscribe mechanism. In general, you have topics and queues on one side and producers and consumers on the other, with some variations such as durable subscribers to topics. A topic broadcasts every message published to it to all of its active consumers. A queue, by contrast, delivers each message to exactly one consumer: once a consumer takes a message from the queue, no other consumer will receive it. If a consumer is offline, it misses the messages published to a topic (they are lost, except in the case of durable subscribers), whereas a message sent to a queue remains in the queue until the consumer comes back online and consumes it.
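
A minimal sketch of the two delivery styles, using the pika client for RabbitMQ; the broker address, queue and exchange names are assumptions, and the fanout exchange is used here to play the role of a topic that broadcasts to all bound consumers:

```python
import pika

# Connect to a local RabbitMQ broker (address assumed for illustration).
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# Queue semantics: each message goes to exactly one consumer, and a durable
# queue holds messages until a consumer comes online and takes them.
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_publish(
    exchange='',
    routing_key='task_queue',
    body=b'process sensor batch',
    properties=pika.BasicProperties(delivery_mode=2))  # persist the message

# Topic-like semantics: a fanout exchange copies each message to every queue
# bound to it, so all active consumers receive their own copy; consumers that
# are offline (and not durable subscribers) simply miss the broadcast.
channel.exchange_declare(exchange='updates', exchange_type='fanout')
channel.basic_publish(exchange='updates', routing_key='', body=b'new reading')

connection.close()
```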