ETL tools are now fundamental to enterprise data workflows, especially as part of data integration. First, data is extracted from external sources. It is then transformed, typically passing through quality checks, to meet specific needs. Finally, it is loaded into the target database. With extensive and diverse big data needs, the role of ETL tools has become ever more important. There are plenty of commercial and open source tools on the market, though sometimes designing one's own solution is preferable to a third-party option. Below is a list of open source tools and libraries, each with its own approach and limitations. One can also use the cloud, especially AWS EMR, for the same ETL purpose.
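To make the extract, transform, and load steps concrete, here is a minimal sketch of a hand-rolled pipeline using only the Python standard library. It is an illustration of the general pattern, not how any particular ETL tool works. The sample data, the `payments` table, and the function names (`extract`, `transform`, `load`, `run_etl`) are all assumptions made up for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an external system.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,not_a_number
3,,7.25
4,Carol,3
"""


def extract(text):
    """Extract: read raw rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace and drop rows that fail basic quality checks."""
    clean = []
    for row in rows:
        name = row["name"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows with a non-numeric amount
        if not name:
            continue  # reject rows with a missing name
        clean.append((int(row["id"]), name, amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return what landed in the target."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT id, name, amount FROM payments ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    # An in-memory SQLite database plays the role of the target database.
    connection = sqlite3.connect(":memory:")
    for record in run_etl(RAW_CSV, connection):
        print(record)
```

Keeping each stage as a separate function means the quality rules in `transform` can be tested without touching a database. A home-grown solution like this may suffice for small jobs, while the tools listed below address scheduling, scale, and connectivity that this sketch ignores.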
19 July 2014
Open Source ETL
Labels: big data, data science, databases, Java, metadata, scala, software engineering, spring