  • Log processing—With minimal modification, you can process large volumes of text log data per day from several sources using your existing MapReduce code (see the Hadoop job sketch after this list).

  • Reporting—Aggregate data into reports and store the data in BigQuery, then push the aggregated data to the applications that power your dashboards and analysis (see the BigQuery sketch after this list).

  • On-demand Spark clusters—Quickly launch ad-hoc clusters to analyze data stored in blob storage (Cloud Storage) using Spark (Spark SQL, PySpark, the Spark shell); see the PySpark sketch after this list.

  • Machine learning—Use the Spark Machine Learning Libraries (MLlib), which are preinstalled on the cluster, to customize and run classification algorithms (see the MLlib sketch after this list).
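
A minimal sketch of the log-processing case, assuming an existing, unmodified MapReduce jar already uploaded to Cloud Storage and a running Dataproc cluster; the project, region, cluster name, and gs:// paths are placeholders:

```python
# Submit an existing MapReduce jar to a Dataproc cluster
# using the google-cloud-dataproc client.
from google.cloud import dataproc_v1

region = "us-central1"
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "log-processing-cluster"},
    "hadoop_job": {
        # The unmodified MapReduce jar, staged in Cloud Storage.
        "main_jar_file_uri": "gs://my-bucket/jars/log-aggregator.jar",
        "args": ["gs://my-bucket/raw-logs/", "gs://my-bucket/output/"],
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
result = operation.result()  # block until the job finishes
print(result.status.state)
```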
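
For the reporting case, a sketch of an aggregation query that writes a small summary table for dashboards to read instead of the raw data; the `logs.raw_events` and `reports.daily_events` tables are hypothetical:

```python
# Aggregate raw events into a daily report table in BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
    SELECT source, DATE(event_ts) AS day, COUNT(*) AS events
    FROM `my-project.logs.raw_events`
    GROUP BY source, day
"""

dest = bigquery.TableReference.from_string("my-project.reports.daily_events")
job_config = bigquery.QueryJobConfig(
    destination=dest,
    # Rebuild the report table on each run.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(query, job_config=job_config).result()
```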
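
For the on-demand Spark case, a PySpark sketch as it might run on a Dataproc cluster, where gs:// paths are readable through the preinstalled Cloud Storage connector; the bucket layout is invented:

```python
# Ad-hoc analysis of data in Cloud Storage with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adhoc-gcs-analysis").getOrCreate()

# Dataproc clusters read gs:// paths natively via the GCS connector.
df = spark.read.json("gs://my-bucket/raw-logs/2023/*.json")
df.createOrReplaceTempView("logs")

# Query the temp view with Spark SQL.
spark.sql("SELECT status, COUNT(*) AS n FROM logs GROUP BY status").show()
```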
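
For the machine-learning case, a sketch of a classification job with the preinstalled MLlib (the pyspark.ml API); the input path, feature columns, and label column are assumptions:

```python
# Train a classifier with Spark MLlib on a Dataproc cluster.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-classification").getOrCreate()
df = spark.read.parquet("gs://my-bucket/features/")

# Assemble raw columns into the single feature vector MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print(model.summary.accuracy)
```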

Dataproc supports Hadoop, Spark, Spark SQL, Hive, and Pig. You can create multiple clusters to resolve version and dependency conflicts between jobs; an existing cluster cannot be upgraded in place, but it can be destroyed and recreated. The most cost-effective and flexible way to migrate your Hadoop system to GCP is to shift away from thinking in terms of large, multi-purpose, persistent clusters and instead think in terms of small, short-lived clusters designed to run specific jobs, tailoring each cluster's configuration to the individual job (a sketch of this pattern follows).
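
A sketch of the ephemeral-cluster pattern with the google-cloud-dataproc client: create a cluster sized for one job, run the job, then delete the cluster. The project, region, cluster name, and machine types are placeholders:

```python
# Create a short-lived cluster, run a job, and tear the cluster down.
from google.cloud import dataproc_v1

region = "us-central1"
endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)

cluster = {
    "project_id": "my-project",
    "cluster_name": "ephemeral-job-cluster",
    "config": {
        # Sized for this one job: small master, few workers.
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}

cluster_client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
).result()

# ... submit the job here (see the Hadoop job sketch above) ...

cluster_client.delete_cluster(
    request={
        "project_id": "my-project",
        "region": region,
        "cluster_name": "ephemeral-job-cluster",
    }
).result()
```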

Related tools: Sqoop transfers data between HDFS and relational databases; Oozie is a workflow scheduler that runs jobs as a DAG; Stackdriver monitoring identifies each time series by a resource type plus a metric type (see the monitoring sketch below).
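
A sketch of reading one Dataproc metric from Stackdriver (Cloud Monitoring) with the google-cloud-monitoring client, filtering on resource type and metric type; the project ID is a placeholder:

```python
# Query a Dataproc cluster metric from Cloud Monitoring (Stackdriver).
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "start_time": {"seconds": now - 3600},  # last hour
        "end_time": {"seconds": now},
    }
)

# Every time series is identified by a resource type plus a metric type.
results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": (
            'resource.type = "cloud_dataproc_cluster" AND '
            'metric.type = "dataproc.googleapis.com/cluster/hdfs/storage_utilization"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    if series.points:
        print(series.resource.labels, series.points[0].value)
```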