Welcome to fabric8-analytics-rudra’s documentation!

Data store and retrieval from various storage backends.

Basic interface to Amazon S3.

class rudra.data_store.aws.AmazonEmr(*args, **kwargs)[source]

Bases: rudra.data_store.aws.AmazonS3

Basic interface to Amazon EMR.

connect()[source]

Connect to the EMR instance.

disconnect()[source]

Close the connection to the EMR instance.

get_status(cluster_id)[source]

Get the status of the EMR instance.

is_connected()[source]

Check if the connection to the database has been established.

run_flow(configs)[source]

Run the EMR job flow.

terminate_jobs(jobs)[source]

Terminate the given EMR jobs.
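
A minimal usage sketch, assuming AWS credentials and region are resolvable from the environment and that run_flow returns a boto3-style run_job_flow response; the credential strings and job-flow dict below are placeholders (EMRConfig, documented further below, can build a complete config):

    from rudra.data_store.aws import AmazonEmr

    emr = AmazonEmr(aws_access_key_id='<key-id>',       # placeholder credentials
                    aws_secret_access_key='<secret>')
    emr.connect()
    if emr.is_connected():
        # Minimal boto3-style job-flow dict, for illustration only.
        configs = {'Name': 'maven-training', 'ReleaseLabel': 'emr-5.10.0'}
        response = emr.run_flow(configs)
        cluster_id = response['JobFlowId']   # assumed boto3 response shape
        print(emr.get_status(cluster_id))
        emr.terminate_jobs([cluster_id])
    emr.disconnect()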

class rudra.data_store.aws.AmazonS3(aws_access_key_id=None, aws_secret_access_key=None, bucket_name=None, region_name=None, use_ssl=False, encryption=None, versioned=None, local_dev=False, endpoint_url=None)[source]

Bases: rudra.data_store.abstract_data_store.AbstractDataStore

Basic interface to Amazon S3.

connect()[source]

Connect to the S3 database.

disconnect()[source]

Close the connection to the S3 database.

get_name()[source]

Get name of this object’s bucket.

is_connected()[source]

Check if the connection to the database has been established.

list_bucket_keys()[source]

List all the keys in the bucket.

list_bucket_objects(prefix=None)[source]

List all the objects in the bucket.

load_matlab_multi_matrix(s3_path)[source]

Load a ‘.mat’ file and return a dict representation.

s3_path

The path of the object in the S3 bucket.

Returns

A dict containing numpy matrices against the keys of the multi-matrix.

object_exists(object_key)[source]

Check if there is an object with the given key in the bucket; performs only a HEAD request.

read_generic_file(filename)[source]

Retrieve remote object content.

read_json_file(filename)[source]

Read JSON file from the S3 bucket.

read_pickle_file(filename)[source]

Read Pickle file from the S3 bucket.

read_yaml_file(filename)[source]

Read YAML file from the S3 bucket.

s3_clean_bucket()[source]

Clean the bucket by deleting all of its objects.

s3_delete_object(object_key)[source]

Delete an object from the bucket.

s3_delete_objects(object_keys)[source]

Delete multiple objects from the bucket.

s3_upload_folder(folder_path, prefix='')[source]

Upload (sync) a folder to S3.

folder_path

The local path of the folder to upload to S3.

prefix

The prefix to attach to the folder path in the S3 bucket.

store_blob(blob, object_key)[source]

Store blob onto S3.

upload_file(src, target)[source]

Upload a file into the S3 bucket.

write_json_file(filename, contents)[source]

Write a JSON file into the S3 bucket.

write_pickle_file(filename, contents)[source]

Write a Pickle file into the S3 bucket.
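
Taken together, a typical round trip through this interface might look like the following sketch; the bucket name, keys, and credentials are placeholders:

    from rudra.data_store.aws import AmazonS3

    s3 = AmazonS3(aws_access_key_id='<key-id>',        # placeholder credentials
                  aws_secret_access_key='<secret>',
                  bucket_name='my-bucket')
    s3.connect()
    if s3.is_connected():
        s3.write_json_file('models/metadata.json', {'version': 1})
        if s3.object_exists('models/metadata.json'):
            metadata = s3.read_json_file('models/metadata.json')
        print(s3.list_bucket_keys())
        s3.disconnect()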

exception rudra.data_store.aws.NotFoundAccessKeySecret[source]

Bases: Exception

Exception for invalid AWS secret/key.

Local data_store interface.

class rudra.data_store.local_data_store.LocalDataStore(src_dir)[source]

Bases: rudra.data_store.abstract_data_store.AbstractDataStore

Wrapper over the local filesystem with an API similar to s3DataStore.

get_name()[source]

Return name of local filesystem root dir.

load_matlab_multi_matrix(local_filename)[source]

Load a ‘.mat’ file and return a dict representation.

local_filename

The path of the object.

Returns

A dict containing numpy matrices against the keys of the multi-matrix.

read_generic_file(filename)[source]

Read a file and return its contents.

read_json_file(filename)[source]

Read JSON file from the data_input source.

read_pickle_file(filename)[source]

Read Pickle file from the data_input source.

read_yaml_file(filename)[source]

Read YAML file from the data_input source.

upload_file()[source]

Upload file to a data store.

write_json_file()[source]

Write JSON file to the data store.
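
Because the API mirrors the S3 store, it can stand in for AmazonS3 during local development; the directory and file names below are hypothetical:

    from rudra.data_store.local_data_store import LocalDataStore

    store = LocalDataStore('/tmp/training-data')      # hypothetical source dir
    config = store.read_json_file('config.json')      # hypothetical files
    params = store.read_yaml_file('hyperparams.yaml')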

Google BigQuery data collection implementation.

Implementation of the BigQuery builder base.

class rudra.data_store.bigquery.base.BigqueryBuilder(query_job_config=None)[source]

Bases: object

BigqueryBuilder class implementation.

get_result(job_id=None, job_query_obj=None)[source]

Get the result of the job.

get_status(job_id)[source]

Get the job status of the async query.

run_query_async()[source]

Run the BigQuery query asynchronously.

run_query_sync()[source]

Run the BigQuery query synchronously.
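
BigqueryBuilder is a base class; a concrete subclass such as MavenBigQuery (documented below) supplies the query. A sketch of the synchronous path, assuming Google Cloud credentials are available in the environment and that get_result() with no arguments iterates the rows of the most recent job:

    from rudra.data_store.bigquery.maven_bigquery import MavenBigQuery

    bq = MavenBigQuery()          # the subclass defines the query to run
    bq.run_query_sync()           # blocks until the query completes
    for row in bq.get_result():   # assumption: defaults fetch the latest job
        print(row)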

class rudra.data_store.bigquery.base.DataProcessing(s3_client=None)[source]

Bases: object

Process the BigQuery data.

update_s3_bucket(data, bucket_name, filename='collated.json')[source]

Upload data to the S3 bucket.

Maven BigQuery implementation.

class rudra.data_store.bigquery.maven_bigquery.MavenBQDataProcessing(big_query_instance=None, s3_client=None, file_name='collated.json')[source]

Bases: rudra.data_store.bigquery.base.DataProcessing

Data processing implementation for Maven BigQuery.

construct_packages(content)[source]

Construct package list.

process()[source]

Process Maven BigQuery response data.

class rudra.data_store.bigquery.maven_bigquery.MavenBigQuery(*args, **kwargs)[source]

Bases: rudra.data_store.bigquery.base.BigqueryBuilder

MavenBigQuery implementation.
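
The builder and the processor are typically wired together with an S3 client: the builder runs the query and the processor collates the response into the bucket. A sketch with placeholder names:

    from rudra.data_store.aws import AmazonS3
    from rudra.data_store.bigquery.maven_bigquery import (
        MavenBigQuery, MavenBQDataProcessing)

    s3 = AmazonS3(bucket_name='maven-bigquery-data')   # placeholder bucket
    s3.connect()
    processor = MavenBQDataProcessing(big_query_instance=MavenBigQuery(),
                                      s3_client=s3,
                                      file_name='collated.json')
    processor.process()   # run the query, collate, and upload to S3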

NPM BigQuery implementation.

class rudra.data_store.bigquery.npm_bigquery.NpmBQDataProcessing(big_query_instance=None, s3_client=None, file_name='collated.json')[source]

Bases: rudra.data_store.bigquery.base.DataProcessing

Data processing implementation for NPM BigQuery.

construct_packages(content)[source]

Construct package from content.

static handle_corrupt_packagejson(content)[source]

Find dependencies from corrupted/invalid package.json.

process()[source]

Process NPM BigQuery response data.

class rudra.data_store.bigquery.npm_bigquery.NpmBigQuery(*args, **kwargs)[source]

Bases: rudra.data_store.bigquery.base.BigqueryBuilder

NpmBigQuery implementation.
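
handle_corrupt_packagejson can be used on its own to salvage dependencies from truncated manifests; the sample content is illustrative and the returned shape is an assumption:

    from rudra.data_store.bigquery.npm_bigquery import NpmBQDataProcessing

    # A truncated package.json (note the missing closing brace).
    corrupt = '{"name": "demo", "dependencies": {"lodash": "^4.17.11"}'
    deps = NpmBQDataProcessing.handle_corrupt_packagejson(corrupt)
    print(deps)   # assumption: the recovered dependency entries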

Deployment scripts.

EMR Deployments.

class rudra.deployments.emr_scripts.MavenEMR[source]

Bases: rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder

Maven EMR script implementation.

ecosystem = 'maven'

run_job(input_dict)[source]

Run the EMR job.

class rudra.deployments.emr_scripts.NpmEMR[source]

Bases: rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder

NPM EMR script implementation.

ecosystem = 'npm'

run_job(input_dict)[source]

Run the EMR job.

class rudra.deployments.emr_scripts.PyPiEMR[source]

Bases: rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder

PyPI EMR script implementation.

ecosystem = 'pypi'

run_job(input_dict)[source]

Run the EMR job.
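
All three ecosystem classes share the run_job interface inherited from EMRScriptBuilder. In the sketch below the input_dict keys are hypothetical placeholders; the real field names are checked inside construct_job:

    from rudra.deployments.emr_scripts import MavenEMR

    emr = MavenEMR()
    input_dict = {                     # hypothetical keys, illustration only
        'environment': 'dev',
        'data_version': '2019-01-01',
        'bucket_name': 'maven-training-data',
        'github_repo': 'https://github.com/org/training-repo',
    }
    emr.run_job(input_dict)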

Configuration for the EMR instance.

class rudra.deployments.emr_scripts.emr_config.EMRConfig(name, log_uri, ecosystem, s3_bootstrap_uri, training_repo_url, training_file_name='training/train.py', release_label='emr-5.10.0', instance_count=1, instance_type='m3.xlarge', applications=[{'Name': 'MXNet'}], visible_to_all_users=True, job_flow_role='EMR_EC2_DefaultRole', service_role='EMR_DefaultRole', properties={}, hyper_params='{}')[source]

Bases: object

Config class for EMR.

get_config()[source]

Get the config object.

home_dir = '/home/hadoop'
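
A sketch of building a job-flow config; the URIs are placeholders, and passing the resulting dict to AmazonEmr.run_flow is an assumption based on the shapes of the two APIs:

    from rudra.deployments.emr_scripts.emr_config import EMRConfig

    config = EMRConfig(name='maven-insights-training',
                       log_uri='s3://my-logs/emr',                      # placeholder
                       ecosystem='maven',
                       s3_bootstrap_uri='s3://my-bucket/bootstrap.sh',  # placeholder
                       training_repo_url='https://github.com/org/training-repo')
    job_flow = config.get_config()   # dict for AmazonEmr.run_flow (assumption)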

EMR script builder implementation.

class rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder[source]

Bases: rudra.deployments.emr_scripts.abstract_emr.AbstractEMR

EMR script implementation.

construct_job(input_dict)[source]

Submit the EMR job.

run_job(input_dict)[source]

Run the EMR job.

EMR script implementation for the Maven service.

class rudra.deployments.emr_scripts.maven_emr.MavenEMR[source]

Bases: rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder

Maven EMR script implementation.

ecosystem = 'maven'

run_job(input_dict)[source]

Run the EMR job.

EMR script implementation for the NPM service.

class rudra.deployments.emr_scripts.npm_emr.NpmEMR[source]

Bases: rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder

NPM EMR script implementation.

ecosystem = 'npm'

run_job(input_dict)[source]

Run the EMR job.

EMR script implementation for the PyPI service.

class rudra.deployments.emr_scripts.pypi_emr.PyPiEMR[source]

Bases: rudra.deployments.emr_scripts.emr_script_builder.EMRScriptBuilder

PyPI EMR script implementation.

ecosystem = 'pypi'

run_job(input_dict)[source]

Run the EMR job.

Package for various utility functions.

Validation utility module.

class rudra.utils.validation.BQValidation[source]

Bases: object

Add validation for ecosystems.

validate_pypi(content)[source]

Validate python packages.

Args:

content (str or [str] or {str}): a package str or a list/set of packages.

Returns:

[str]: list of valid packages.

Raises:

ValueError: if content is not of type str or list.
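
A short sketch; the invalid package name is illustrative:

    from rudra.utils.validation import BQValidation

    bqv = BQValidation()
    valid = bqv.validate_pypi(['flask', 'numpy', 'definitely-not-on-pypi'])
    print(valid)   # expected: only the names known to PyPI, e.g. ['flask', 'numpy']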

rudra.utils.validation.check_field_exists(input_data, fields)[source]

Check whether the given fields exist in the input data.

rudra.utils.validation.check_url_alive(url, accept_codes=[401])[source]

Check whether the given URL is alive (e.g. whether a GitHub repo exists); status codes in accept_codes are also treated as success.

rudra.utils.validation.nn(name)[source]

Return a normalized name.
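
Quick sketches of the three helpers; the return values noted in comments are assumptions based on the signatures above:

    from rudra.utils.validation import (check_field_exists, check_url_alive,
                                        nn)

    check_field_exists({'ecosystem': 'maven'}, ['ecosystem', 'data_version'])
    check_url_alive('https://github.com/fabric8-analytics/fabric8-analytics-rudra')
    print(nn('Flask-RESTful'))   # assumption: a lower-cased canonical name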

Utility helper functions.

class rudra.utils.helper.CacheDict(max_len=1024)[source]

Bases: object

CacheDict implementation with a maximum size limit.
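
A sketch assuming the usual dict-style interface; the eviction behavior when max_len is exceeded is an assumption:

    from rudra.utils.helper import CacheDict

    cache = CacheDict(max_len=2)
    cache['a'] = 1
    cache['b'] = 2
    cache['c'] = 3   # exceeds max_len, so an earlier entry is evicted (assumption)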

rudra.utils.helper.get_github_repo_info(repo_url)[source]

Get the GitHub repository information.

rudra.utils.helper.get_training_file_url(user, repo, branch='master', training_file_path='training/train.py')[source]

Get the URL of the training file in the GitHub repo.

rudra.utils.helper.load_hyper_params()[source]

Load the hyper-parameters from the command-line args.
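
A sketch chaining the GitHub helpers; the (user, repo) return shape of get_github_repo_info is an assumption:

    from rudra.utils.helper import get_github_repo_info, get_training_file_url

    user, repo = get_github_repo_info(
        'https://github.com/fabric8-analytics/f8a-hpf-insights')  # assumed shape
    url = get_training_file_url(user, repo, branch='master')
    print(url)   # URL of training/train.py on the master branch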

Mercator: implementation of a dependency finder.

class rudra.utils.mercator.SimpleMercator(content)[source]

Bases: object

SimpleMercator implementation.

class Dependency(dep)[source]

Bases: object

Dependency class implementation.

get_dependencies()[source]

Get the list of dependencies.

static handle_corrupt_pom(content)[source]

Try to find the dependencies in a corrupt/invalid pom.xml.
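
A sketch of parsing a pom.xml snippet; the attribute layout of the Dependency objects is internal to the class, so the example only iterates them:

    from rudra.utils.mercator import SimpleMercator

    POM = """
    <project>
      <dependencies>
        <dependency>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-api</artifactId>
        </dependency>
      </dependencies>
    </project>
    """

    mercator = SimpleMercator(POM)
    for dep in mercator.get_dependencies():
        print(dep)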