Big Data Developer – Data Lake
Cambridge, MA
6 months with possible extension
Description:
• Implementation and administration of the on-prem data lake environment
• Monitoring and managing the Hadoop services on 3 clusters
• Installing new hosts (head nodes, compute nodes and worker nodes) in the existing cluster and decommissioning hosts from the cluster
• Maintenance and monitoring of jobs in the Production, UAT and Development environments
• Code changes and updated code deployments in the UAT and Production environments
• Deploying code changes on the R Shiny server and RStudio server as per user requests
• Implementation and monitoring of Oozie scheduled jobs
• Carrying out patching activities and applying fixes provided by Hortonworks to the data lake environment
• Troubleshooting job failures, mostly Hive and Spark jobs, across the data lake environment
• Onboarding new users to the Hadoop data lake environment
• Requirements gathering for creating databases in Hive and providing policy-based access management through Ranger for new Proofs of Concept (POCs) such as Veeva Insights
• Supporting developers in executing ad hoc jobs in Hive environments for existing POCs such as enrollment_forecaster
• Managing and enforcing access-based policies from Ranger for HDFS home directories and at the Hive schema, table and column level
• Implementation of security and management of Active Directory-based Kerberos authentication across the data lake clusters
• Implementation of SSL for Ambari and other HDP services in the Hortonworks environment across the data lake clusters
• Management of encryption and decryption of user data using Ranger KMS across the data lake clusters
• Installation and upgrade of JupyterHub and Python packages to support developers implementing code in on-prem environments
• Working with the HPC team on hardware issues and allocation of physical resources for the data lake environment
• Hail-Spark implementation and analysis of UK Biobank genotype and phenotype datasets
• Installation of the latest versions of Spark and Hail and optimization of resources for working with very large datasets
• Working with the Hortonworks team on the planned HDP upgrade from version 2.6 to 3.0
• Support and maintenance of MongoDB servers in the data lake
• Source code repository maintenance in Bitbucket
In addition to the above tasks, the resource will also perform the following AWS activities:
• Support of the Cloudbreak server in AWS for the Hortonworks CB deployment
• Support of software upgrades for Cloudbreak and HDP package installation in the AWS cluster
• Supporting data scientists with technical issues during the execution of Spark-Hail jobs in the Cloudbreak AWS cluster
• Setup of the latest versions of Spark and Hail in the AWS Spark cluster
--
Thanks and regards
Shankar Allamsetti
Phone: 281-823-9222 Ext: 517 | Fax: 281-823-9225
Email: shankar.allamsetti@3sbc.com | G-Talk: shankar3sbc
****Best way to reach me is through email****