DeepTCR.DeepTCR
DeepTCR_base
__init__(self, Name, max_length=40, device=0, tf_verbosity=3)
special
Initialize Training Object.
Initializes object and sets initial parameters.
All DeepTCR algorithms begin with initializing a training object. This object will contain all methods, data, and results during the training process. One can also extract learned features, per-sequence predictions, and other outputs from DeepTCR for use in separate analyses.
This method is included in the three main DeepTCR objects:
- DeepTCR_U (unsupervised)
- DeepTCR_SS (supervised sequence classifier/regressor)
- DeepTCR_WF (supervised repertoire classifier/regressor)
Parameters: |
|
---|
Get_Data(self, directory, Load_Prev_Data=False, classes=None, type_of_data_cut='Fraction_Response', data_cut=1.0, n_jobs=40, aa_column_alpha=None, aa_column_beta=None, count_column=None, sep='\t', aggregate_by_aa=True, v_alpha_column=None, j_alpha_column=None, v_beta_column=None, j_beta_column=None, d_beta_column=None, p=None, hla=None, use_hla_supertype=False, keep_non_supertype_alleles=False)
Get Data for DeepTCR
Parse Data into appropriate inputs for neural network from directories where data is stored.
This method can be used when your data is stored in directories and you want to load it from those directories into DeepTCR. This method takes care of all pre-processing of the data, including:
- Combining all CDR3 sequences with the same nucleotide sequence (optional).
- Removing any sequences with non-IUPAC characters.
- Removing any sequences that are longer than the max_length set when initializing the training object.
- Determining how much of the data per file to use (type_of_data_cut)
- Whether to use HLA/HLA-supertypes during training.
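The filtering and combining steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not DeepTCR's implementation: the helper name `preprocess`, the `(sequence, count)` input shape, and the use of the 20 standard amino-acid letters as the "IUPAC" alphabet are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Assumption: "IUPAC characters" means the 20 standard amino-acid one-letter codes.
IUPAC_AA = set("ACDEFGHIKLMNPQRSTVWY")

def preprocess(rows, max_length=40, aggregate_by_aa=True):
    """rows: iterable of (cdr3_aa, count) pairs. Returns a list of (cdr3_aa, count)."""
    kept = []
    for seq, count in rows:
        if not seq or set(seq) - IUPAC_AA:
            continue  # drop empty sequences and sequences with non-IUPAC characters
        if len(seq) > max_length:
            continue  # drop sequences longer than max_length
        kept.append((seq, count))
    if not aggregate_by_aa:
        return kept
    totals = defaultdict(int)
    for seq, count in kept:
        totals[seq] += count  # combine identical sequences by summing their counts
    return list(totals.items())

rows = [("CASSLGQAYEQYF", 3), ("CASSLGQAYEQYF", 2), ("CASS*GQF", 5), ("C" * 41, 1)]
cleaned = preprocess(rows)
```

Here the two identical sequences are merged, while the sequence containing `*` and the 41-residue sequence are discarded.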
This method is included in the three main DeepTCR objects:
- DeepTCR_U (unsupervised)
- DeepTCR_SS (supervised sequence classifier/regressor)
- DeepTCR_WF (supervised repertoire classifier/regressor)
Parameters: |
|
---|
Returns: |
|
---|
Load_Data(self, alpha_sequences=None, beta_sequences=None, v_beta=None, d_beta=None, j_beta=None, v_alpha=None, j_alpha=None, class_labels=None, sample_labels=None, freq=None, counts=None, Y=None, p=None, hla=None, use_hla_supertype=False, keep_non_supertype_alleles=False, w=None)
Load Data programmatically into DeepTCR.
DeepTCR allows direct user input of sequence data for DeepTCR analysis. By using this method, a user can load numpy arrays with relevant TCRSeq data for analysis.
Tip: One can load data with the Get_Data command from directories and then reload it into another DeepTCR object with the Load_Data command. This can be useful, for example, if you have different labels you want to train to, and you need to change the label programmatically between training each model. In this case, one can load the data first with the Get_Data method and then assign the labels in Python before feeding them into the DeepTCR object with the Load_Data method.
Of note, this method DOES NOT combine sequences with the same amino acid sequence. Therefore, if one wants this, one should first do it programmatically before feeding the data into DeepTCR with this method.
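One way to combine duplicates before calling Load_Data is sketched below. The helper `combine_duplicates` is hypothetical, and grouping by (sample, sequence) is an assumption made for the example; the appropriate grouping key depends on the analysis. In practice the resulting lists would be converted to numpy arrays before being passed in.

```python
def combine_duplicates(beta_sequences, sample_labels, counts):
    """Sum counts of identical amino-acid sequences within each sample."""
    totals = {}
    for seq, sample, count in zip(beta_sequences, sample_labels, counts):
        key = (sample, seq)
        totals[key] = totals.get(key, 0) + count
    # dicts preserve insertion order, so output follows first appearance
    samples = [key[0] for key in totals]
    seqs = [key[1] for key in totals]
    summed = list(totals.values())
    return seqs, samples, summed

beta = ["CASSLGF", "CASSLGF", "CASSPDF", "CASSLGF"]
samples = ["s1", "s1", "s1", "s2"]
counts = [4, 1, 2, 7]
seqs, samp, cnt = combine_duplicates(beta, samples, counts)
```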
Another special use case of this method would be for any type of regression task (sequence or repertoire models). In the case that a per-sequence value is fed into DeepTCR (with Y), this value either becomes the per-sequence regression value or the average of all Y over a sample becomes the per-sample regression value. This is another case where one might want to load data with the Get_Data method and then reload it into DeepTCR with regression values.
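The per-sample averaging described above can be illustrated as follows. This is a minimal sketch of the arithmetic only; `per_sample_mean` is a hypothetical helper, not a DeepTCR function.

```python
from collections import defaultdict

def per_sample_mean(sample_labels, Y):
    """Average the per-sequence values Y within each sample."""
    sums = defaultdict(float)
    n = defaultdict(int)
    for sample, y in zip(sample_labels, Y):
        sums[sample] += y
        n[sample] += 1
    return {sample: sums[sample] / n[sample] for sample in sums}

sample_labels = ["s1", "s1", "s2", "s2", "s2"]
Y = [1.0, 3.0, 2.0, 4.0, 6.0]
sample_targets = per_sample_mean(sample_labels, Y)
```

With these inputs, sample `s1` has a regression value of 2.0 and sample `s2` a value of 4.0.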
This method is included in the three main DeepTCR objects:
- DeepTCR_U (unsupervised)
- DeepTCR_SS (supervised sequence classifier/regressor)
- DeepTCR_WF (supervised repertoire classifier/regressor)
Parameters: |
|
---|
Returns: |
|
---|
Sequence_Inference(self, alpha_sequences=None, beta_sequences=None, v_beta=None, d_beta=None, j_beta=None, v_alpha=None, j_alpha=None, p=None, hla=None, batch_size=10000, models=None, return_dist=False)
Predicting outputs of sequence models on new data
This method allows a user to take a pre-trained autoencoder/sequence classifier and generate outputs from the model on new data. For the autoencoder, this returns the features from the latent space. For the sequence classifier, it is the probability of belonging to each class.
In the case that multiple models have been trained via MC or K-fold Cross-Validation strategy for the sequence classifier, this method can use some or all trained models in an ensemble fashion to provide the average prediction per sequence as well as the distribution of predictions from all trained models.
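The ensemble idea can be sketched as below: given one prediction table per trained model, compute the average per sequence and keep the per-model values as the distribution. The function name and the nested-list layout are assumptions for illustration; the actual return format of Sequence_Inference is not shown here.

```python
def ensemble_predictions(per_model_preds):
    """per_model_preds[m][i][c]: probability from model m for sequence i, class c.

    Returns (mean, dist), where mean[i][c] is the average over models and
    dist[i][c] is the list of per-model values.
    """
    n_models = len(per_model_preds)
    n_seq = len(per_model_preds[0])
    n_cls = len(per_model_preds[0][0])
    dist = [[[per_model_preds[m][i][c] for m in range(n_models)]
             for c in range(n_cls)]
            for i in range(n_seq)]
    mean = [[sum(vals) / n_models for vals in row] for row in dist]
    return mean, dist

# Two hypothetical models, two sequences, two classes.
preds = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
mean, dist = ensemble_predictions(preds)
```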
This method is included in the two sequence DeepTCR objects:
- DeepTCR_U (unsupervised)
- DeepTCR_SS (supervised sequence classifier/regressor)
Parameters: |
|
---|
Returns: |
|
---|
DeepTCR_S_base
AUC_Curve(self, by=None, filename='AUC.tif', title=None, title_font=None, plot=True, diag_line=True, xtick_size=None, ytick_size=None, xlabel_size=None, ylabel_size=None, legend_font_size=None, frameon=True, legend_loc='lower right', figsize=None, set='test', color_dict=None)
AUC Curve for both Sequence and Repertoire/Sample Classifiers
Parameters: |
|
---|
Returns: |
|
---|
Representative_Sequences(self, top_seq=10, motif_seq=5, make_seq_logos=True, color_scheme='weblogo_protein', logo_file_format='.eps')
Identify most highly predicted sequences for each class and corresponding motifs.
This method allows the user to query which sequences were most predicted to belong to a given class along with the motifs that were learned for these representative sequences. Of note, this method only reports sequences that were in the test set so as not to return highly predicted sequences that were over-fit in the training set. To obtain the highest predicted sequences in all the data, run a K-fold cross-validation or Monte-Carlo cross-validation before running this method. In this way, the predicted probability will have been assigned to a sequence only when it was in the independent test set.
In the case of a regression task, the representative sequences for the 'high' and 'low' values for the regression model are returned in the Rep_Seq Dict.
This method will also determine motifs the network has learned that are highly associated with the label through multinomial linear regression, and creates seq logos and fasta files in the results folder. Within a folder for a given class, the motifs are sorted by their linear coefficient. The coefficient is in the file name (e.g. 0_0.125_feature_2.eps reflects the 0th highest feature with a coefficient of 0.125).
Parameters: |
|
---|
Returns: |
|
---|
Furthermore, the motifs are written in the results directory underneath the Motifs folder. To find the beta motifs for a given class, look under Motifs/beta/class_name/. These fasta/logo files are labeled by the linear coefficient of that given feature for that given class followed by the number name of the feature. These fasta files can then be visualized via weblogos at the following site: "https://weblogo.berkeley.edu/logo.cgi" or are present in the folder for direct visualization.
Residue_Sensitivity_Logo(self, alpha_sequences=None, beta_sequences=None, v_beta=None, d_beta=None, j_beta=None, v_alpha=None, j_alpha=None, hla=None, p=None, batch_size=10000, models=None, figsize=(10, 8), low_color='red', medium_color='white', high_color='blue', font_name='serif', class_sel=None, cmap=None, min_size=0.0, edgecolor='black', edgewidth=0.25, background_color='white', Load_Prev_Data=False, norm_to_seq=True)
Create Residue Sensitivity Logos
This method allows the user to create Residue Sensitivity Logos, in which a set of provided sequences is perturbed to identify positions of the CDR3 sequence that, if altered, would change the predicted specificity or affinity of the sequence (depending on whether the model was trained on a classification or regression task).
Residue Sensitivity Logos can be created from any supervised model (including sequence and repertoire classifiers). Following the training of one of these models, one can feed into this method a CDR3 sequence defined by any or all of the alpha/beta CDR3 sequences, V/D/J gene usage, and the HLA context within which the TCR was seen.
The output is a logo created by LogoMaker where the size of the character denotes how sensitive this position is to perturbation and the color denotes the consequences of changes at this site. By default, red coloration means changes at this site would generally decrease the predicted value and blue coloration means changes at this site would increase the predicted value.
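The perturbation idea can be sketched as a single-residue substitution scan. This is an illustration under assumptions, not the method's actual algorithm: `toy_predict` is a hypothetical stand-in for a trained model, and summarizing each position by the mean absolute change (size) and mean signed change (direction) is one plausible choice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def residue_sensitivity(seq, predict):
    """For each position, substitute every other amino acid and summarize
    how the prediction changes relative to the unmodified sequence."""
    base = predict(seq)
    results = []
    for pos, original in enumerate(seq):
        deltas = []
        for aa in AMINO_ACIDS:
            if aa == original:
                continue
            mutant = seq[:pos] + aa + seq[pos + 1:]
            deltas.append(predict(mutant) - base)
        magnitude = sum(abs(d) for d in deltas) / len(deltas)  # how sensitive
        direction = sum(deltas) / len(deltas)                  # up or down on average
        results.append((original, magnitude, direction))
    return results

def toy_predict(seq):
    # Hypothetical model: scores 1.0 only when the fourth residue is 'W'.
    return 1.0 if len(seq) > 3 and seq[3] == "W" else 0.0

profile = residue_sensitivity("CASWF", toy_predict)
```

In this toy case only the fourth position is sensitive, and its negative direction corresponds to the "changes here decrease the predicted value" case.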
Parameters: |
|
---|
Returns: |
|
---|
SRCC(self, s=10, kde=False, title=None)
Spearman's Rank Correlation Coefficient Plot
When training a regression-based model with the sequence classifier, one can plot the predicted vs. actual labeled values with this method. The method returns a plot for the regression and the value of the correlation coefficient.
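For reference, Spearman's rank correlation is the Pearson correlation computed on ranks. The sketch below is a plain-Python illustration of that definition (with average ranks for ties); it is not the code DeepTCR uses, and the example values are made up.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average rank over the tied run i..j
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

actual = [0.1, 0.4, 0.35, 0.8]
predicted = [1.0, 3.0, 2.0, 10.0]
rho = spearman(actual, predicted)
```

Because the predicted values preserve the ordering of the actual values, the coefficient here is 1 even though the relationship is not linear.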
Parameters: |
|
---|
Returns: |
|
---|
DeepTCR_SS
Get_Train_Valid_Test(self, test_size=0.25, LOO=None, split_by_sample=False, combine_train_valid=False)
Train/Valid/Test Splits.
Divide data for train, valid, test set. Training is used to train model parameters, validation is used to set early stopping, and test acts as blackbox independent test set.
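A three-way split of this kind can be sketched as follows. The assumption that the validation and test sets each receive a `test_size` fraction of the data is made for illustration only, and `train_valid_test_split` is a hypothetical helper, not the library's method.

```python
import random

def train_valid_test_split(n, test_size=0.25, seed=0):
    """Shuffle indices 0..n-1 and carve off test and validation sets.

    Assumption: validation and test each get a `test_size` fraction.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_hold = int(round(n * test_size))
    test = idx[:n_hold]
    valid = idx[n_hold:2 * n_hold]
    train = idx[2 * n_hold:]
    return train, valid, test

train, valid, test = train_valid_test_split(100)
```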
Parameters: |
|
---|
K_Fold_CrossVal(self, folds=None, split_by_sample=False, combine_train_valid=False, seeds=None, kernel=5, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, num_fc_layers=0, units_fc=12, weight_by_class=False, class_weights=None, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', graph_seed=None, drop_out_rate=0.0, multisample_dropout=False, multisample_dropout_rate=0.5, multisample_dropout_num_masks=64, batch_size=1000, epochs_min=10, stop_criterion=0.001, stop_criterion_window=10, accuracy_min=None, train_loss_min=None, hinge_loss_t=0.0, convergence='validation', learning_rate=0.001, suppress_output=False, batch_seed=None)
K_Fold Cross-Validation for Single-Sequence Classifier
When the number of sequences is small, one can train the single-sequence classifier with K-Fold Cross-Validation, training on all but one fold and assessing predictive performance on the held-out fold. After this method is run, the AUC_Curve method can be run to assess the overall performance.
The method also saves the per-sequence predictions at the end of training in the variable self.predicted. These per-sequence predictions are only assessed when the sequences are in the test set.
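The bookkeeping described above, where each sequence receives a prediction only while it sits in the held-out fold, can be sketched like this. The fold assignment and the "predict the mean training label" stand-in model are hypothetical simplifications, not the library's training loop.

```python
import random

def k_fold_indices(n, folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into `folds` disjoint test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]

labels = [0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
predicted = [None] * len(labels)

for test_idx in k_fold_indices(len(labels), folds=5):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Stand-in "model": the mean training label (a real run would train a network here).
    train_mean = sum(labels[i] for i in train_idx) / len(train_idx)
    for i in test_idx:
        predicted[i] = train_mean  # assigned only while i is in the test fold
```

After the loop, every sequence has exactly one out-of-fold prediction.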
The multisample parameters are used to implement Multi-Sample Dropout at the final layer of the model as described in "Multi-Sample Dropout for Accelerated Training and Better Generalization" https://arxiv.org/abs/1905.09788. This method has been shown to improve generalization of deep neural networks as well as improve convergence.
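The core idea of Multi-Sample Dropout, applying several independent dropout masks to the same final-layer features and averaging the resulting losses, can be sketched in plain Python. This is a toy illustration with a single linear output and a squared-error loss, both assumptions made for brevity; it is not the network code used by DeepTCR.

```python
import random

def multisample_dropout_loss(features, weights, target,
                             rate=0.5, num_masks=64, seed=0):
    """Average a squared-error loss over `num_masks` dropout masks.

    `rate` must be below 1.0, since kept features are rescaled by 1 / (1 - rate).
    """
    rng = random.Random(seed)
    keep = 1.0 - rate
    losses = []
    for _ in range(num_masks):
        # Inverted dropout: zero each feature with probability `rate`, rescale the rest.
        masked = [f / keep if rng.random() < keep else 0.0 for f in features]
        output = sum(m * w for m, w in zip(masked, weights))
        losses.append((output - target) ** 2)
    return sum(losses) / num_masks

features = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
loss_no_dropout = multisample_dropout_loss(features, weights, target=4.0, rate=0.0)
loss_dropout = multisample_dropout_loss(features, weights, target=4.0, rate=0.5)
```

With `rate=0.0` every mask keeps all features, so the averaged loss reduces to the ordinary loss for a single forward pass.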
Parameters: |
|
---|
Monte_Carlo_CrossVal(self, folds=5, test_size=0.25, LOO=None, split_by_sample=False, combine_train_valid=False, seeds=None, kernel=5, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, num_fc_layers=0, units_fc=12, weight_by_class=False, class_weights=None, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', graph_seed=None, drop_out_rate=0.0, multisample_dropout=False, multisample_dropout_rate=0.5, multisample_dropout_num_masks=64, batch_size=1000, epochs_min=10, stop_criterion=0.001, stop_criterion_window=10, accuracy_min=None, train_loss_min=None, hinge_loss_t=0.0, convergence='validation', learning_rate=0.001, suppress_output=False, batch_seed=None)
Monte Carlo Cross-Validation for Single-Sequence Classifier
When the number of sequences is small, one can train the single-sequence classifier with Monte Carlo Cross-Validation, training for a number of iterations before assessing predictive performance. After this method is run, the AUC_Curve method can be run to assess the overall performance.
The method also saves the per-sequence predictions at the end of training in the variable self.predicted. These per-sequence predictions are only assessed when the sequences are in the test set. Ideally, after running the classifier with multiple folds, each sequence will have multiple predictions that were collected when it was in the test set.
The multisample parameters are used to implement Multi-Sample Dropout at the final layer of the model as described in "Multi-Sample Dropout for Accelerated Training and Better Generalization" https://arxiv.org/abs/1905.09788. This method has been shown to improve generalization of deep neural networks as well as improve convergence.
Parameters: |
|
---|
Train(self, kernel=5, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, num_fc_layers=0, units_fc=12, weight_by_class=False, class_weights=None, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', graph_seed=None, drop_out_rate=0.0, multisample_dropout=False, multisample_dropout_rate=0.5, multisample_dropout_num_masks=64, batch_size=1000, epochs_min=10, stop_criterion=0.001, stop_criterion_window=10, accuracy_min=None, train_loss_min=None, hinge_loss_t=0.0, convergence='validation', learning_rate=0.001, suppress_output=False, batch_seed=None)
Train Single-Sequence Classifier
This method trains the network and saves feature values at the end of training for downstream analysis.
The method also saves the per-sequence predictions at the end of training in the variable self.predicted.
The multisample parameters are used to implement Multi-Sample Dropout at the final layer of the model as described in "Multi-Sample Dropout for Accelerated Training and Better Generalization" https://arxiv.org/abs/1905.09788. This method has been shown to improve generalization of deep neural networks as well as improve convergence.
Parameters: |
|
---|
DeepTCR_U
KNN_Repertoire_Classifier(self, folds=5, distance_metric='KL', sample=None, n_jobs=1, plot_metrics=False, plot_type='violin', by_class=False, Load_Prev_Data=False, metrics=['Recall', 'Precision', 'F1_Score', 'AUC'])
K-Nearest Neighbor Repertoire Classifier
This method uses a K-Nearest Neighbor Classifier to assess the ability to predict a repertoire label given the structural distribution of the repertoire. The method returns AUC, Precision, Recall, and F1 Scores for all classes.
Parameters: |
|
---|
Returns: |
|
---|
KNN_Sequence_Classifier(self, folds=5, k_values=[1, 26, 51, 76, 101, 126, 151, 176, 201, 226, 251, 276, 301, 326, 351, 376, 401, 426, 451, 476], rep=5, plot_metrics=False, by_class=False, plot_type='violin', metrics=['Recall', 'Precision', 'F1_Score', 'AUC'], n_jobs=1, Load_Prev_Data=False)
K-Nearest Neighbor Sequence Classifier
This method uses a K-Nearest Neighbor Classifier to assess the ability to predict a sequence label given its sequence features. The method returns AUC, Precision, Recall, and F1 Scores for all classes.
Parameters: |
|
---|
Returns: |
|
---|
Train_VAE(self, latent_dim=256, kernel=5, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', latent_alpha=0.001, sparsity_alpha=None, var_explained=None, graph_seed=None, batch_size=10000, epochs_min=0, stop_criterion=0.01, stop_criterion_window=30, accuracy_min=None, suppress_output=False, learning_rate=0.001, split_seed=None, Load_Prev_Data=False)
Train Variational Autoencoder (VAE)
This method trains the network and saves feature values for sequences for a variety of downstream analyses that can either be done within the DeepTCR framework or by the user by simply extracting the learned representations.
Parameters: |
|
---|
Returns: |
|
---|
DeepTCR_WF
Get_Train_Valid_Test(self, test_size=0.25, LOO=None, combine_train_valid=False, random_perm=False)
Train/Valid/Test Splits.
Divide data for train, valid, test set. In general, training is used to train model parameters, validation is used to set early stopping, and test acts as blackbox independent test set.
Parameters: |
|
---|
K_Fold_CrossVal(self, folds=None, combine_train_valid=False, seeds=None, kernel=5, num_concepts=12, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, num_fc_layers=0, units_fc=12, weight_by_class=False, class_weights=None, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', graph_seed=None, qualitative_agg=True, quantitative_agg=False, num_agg_layers=0, units_agg=12, drop_out_rate=0.0, multisample_dropout=False, multisample_dropout_rate=0.5, multisample_dropout_num_masks=64, batch_size=25, batch_size_update=None, epochs_min=25, stop_criterion=0.25, stop_criterion_window=10, accuracy_min=None, train_loss_min=None, hinge_loss_t=0.0, convergence='validation', learning_rate=0.001, suppress_output=False, loss_criteria='mean', l2_reg=0.0, batch_seed=None, subsample=None, subsample_by_freq=False, subsample_valid_test=False)
K_Fold Cross-Validation for Whole Sample Classifier
When the number of samples is small, one can train the whole-sample classifier with K-Fold Cross-Validation, training on all but one fold and assessing predictive performance on the held-out fold. After this method is run, the AUC_Curve method can be run to assess the overall performance.
The two MIL options alter how the predictive signatures in the neural network are aggregated to make a prediction about the repertoire. If qualitative_agg or quantitative_agg is set to True, the corresponding type of aggregation is included in the predictions. One can set either or both to True, which allows a user to incorporate features from multiple modes of aggregation. See below for further details on these methods of aggregation across the sequences of a repertoire.
The multisample parameters are used to implement Multi-Sample Dropout at the final layer of the model as described in "Multi-Sample Dropout for Accelerated Training and Better Generalization" https://arxiv.org/abs/1905.09788. This method has been shown to improve generalization of deep neural networks as well as improve convergence.
Parameters: |
|
---|
Monte_Carlo_CrossVal(self, folds=5, test_size=0.25, LOO=None, combine_train_valid=False, random_perm=False, seeds=None, kernel=5, num_concepts=12, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, num_fc_layers=0, units_fc=12, weight_by_class=False, class_weights=None, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', graph_seed=None, qualitative_agg=True, quantitative_agg=False, num_agg_layers=0, units_agg=12, drop_out_rate=0.0, multisample_dropout=False, multisample_dropout_rate=0.5, multisample_dropout_num_masks=64, batch_size=25, batch_size_update=None, epochs_min=25, stop_criterion=0.25, stop_criterion_window=10, accuracy_min=None, train_loss_min=None, hinge_loss_t=0.0, convergence='validation', learning_rate=0.001, suppress_output=False, loss_criteria='mean', l2_reg=0.0, batch_seed=None, subsample=None, subsample_by_freq=False, subsample_valid_test=False)
Monte Carlo Cross-Validation for Whole Sample Classifier
When the number of samples is small, one can train the whole-sample classifier with Monte Carlo Cross-Validation, training for a number of iterations before assessing predictive performance. After this method is run, the AUC_Curve method can be run to assess the overall performance.
The two MIL options alter how the predictive signatures in the neural network are aggregated to make a prediction about the repertoire. If qualitative_agg or quantitative_agg is set to True, the corresponding type of aggregation is included in the predictions. One can set either or both to True, which allows a user to incorporate features from multiple modes of aggregation. See below for further details on these methods of aggregation across the sequences of a repertoire.
The multisample parameters are used to implement Multi-Sample Dropout at the final layer of the model as described in "Multi-Sample Dropout for Accelerated Training and Better Generalization" https://arxiv.org/abs/1905.09788. This method has been shown to improve generalization of deep neural networks as well as improve convergence.
Parameters: |
|
---|
Returns: |
|
---|
Sample_Inference(self, sample_labels=None, alpha_sequences=None, beta_sequences=None, v_beta=None, d_beta=None, j_beta=None, v_alpha=None, j_alpha=None, p=None, hla=None, freq=None, counts=None, batch_size=10, models=None, return_dist=False)
Predicting outputs of sample/repertoire model on new data
This method allows a user to take a pre-trained sample/repertoire classifier and generate outputs from the model on new data. This will return predicted probabilities for the given classes for the new data. If the model has been trained with Monte-Carlo or K-Fold Cross-Validation, there will be a model created for each iteration of the cross-validation. If the 'models' parameter is left as None, this method will conduct inference for all models trained in cross-validation and output the average predicted value per sample along with the distribution of predictions for further downstream use. For example, by looking at the distribution of predictions for a given sample over all models trained, one can determine which samples have a high level of certainty in their predictions versus those with a lower level of certainty. In essence, training multiple models in cross-validation schemes allows the user to generate a distribution of predictions on a per-sample basis, which provides a better understanding of the prediction. Alternatively, one can fill in the models parameter with a list of the models to use for inference.
To load data from directories, one can use the Get_Data method from the base class to automatically format the data into the proper format to be then input into this method.
One can also use this method to get per-sequence predictions from the sample/repertoire classifier. To do this, provide all inputs except for sample_labels. The method will then return an array of dimensionality [N,n_classes] where N is the number of sequences provided. When using the method in this way, be sure to set batch_size to a larger value, as 10 sequences per batch will be rather slow. We recommend a value on the order of thousands (e.g. 10k - 100k).
Parameters: |
|
---|
Returns: |
|
---|
If sample_labels is not provided, the method will perform per-sequence predictions and will return an array of [N,n_classes]. If return_dist is set to True, the method will return two outputs: one containing the mean predictions and the other containing the full distribution over all models.
Train(self, kernel=5, num_concepts=12, trainable_embedding=True, embedding_dim_aa=64, embedding_dim_genes=48, embedding_dim_hla=12, num_fc_layers=0, units_fc=12, weight_by_class=False, class_weights=None, use_only_seq=False, use_only_gene=False, use_only_hla=False, size_of_net='medium', graph_seed=None, qualitative_agg=True, quantitative_agg=False, num_agg_layers=0, units_agg=12, drop_out_rate=0.0, multisample_dropout=False, multisample_dropout_rate=0.5, multisample_dropout_num_masks=64, batch_size=25, batch_size_update=None, epochs_min=25, stop_criterion=0.25, stop_criterion_window=10, accuracy_min=None, train_loss_min=None, hinge_loss_t=0.0, convergence='validation', learning_rate=0.001, suppress_output=False, loss_criteria='mean', l2_reg=0.0, batch_seed=None, subsample=None, subsample_by_freq=False, subsample_valid_test=False)
Train Whole-Sample Classifier
This method trains the network and saves feature values at the end of training for downstream analysis.
The method also saves the per-sequence predictions at the end of training in the variable self.predicted.
The two MIL options alter how the predictive signatures in the neural network are aggregated to make a prediction about the repertoire. If qualitative_agg or quantitative_agg is set to True, the corresponding type of aggregation is included in the predictions. One can set either or both to True, which allows a user to incorporate features from multiple modes of aggregation. See below for further details on these methods of aggregation across the sequences of a repertoire.
The multisample parameters are used to implement Multi-Sample Dropout at the final layer of the model as described in "Multi-Sample Dropout for Accelerated Training and Better Generalization" https://arxiv.org/abs/1905.09788. This method has been shown to improve generalization of deep neural networks as well as improve convergence.
Parameters: |
|
---|
feature_analytics_class
Cluster(self, set='all', clustering_method='phenograph', t=None, criterion='distance', linkage_method='ward', write_to_sheets=False, sample=None, n_jobs=1, order_by_linkage=False)
Clustering Sequences by Latent Features
This method clusters all sequences by learned latent features from either the variational autoencoder or the supervised methods. Several clustering algorithms are available, including Phenograph, DBSCAN, and hierarchical clustering. DBSCAN is implemented from the sklearn package. Hierarchical clustering is implemented from the scipy package.
Parameters: |
|
---|
Returns: |
|
---|
Motif_Identification(self, group, p_val_threshold=0.05, by_samples=False, top_seq=10)
Motif Identification Supervised Classifiers
This method looks for enriched features in the predetermined group and returns fasta files in directory to be used with "https://weblogo.berkeley.edu/logo.cgi" to produce seqlogos.
Parameters: |
|
---|
Returns: |
|
---|
Sample_Features(self, set='all', Weight_by_Freq=True)
Sample-Level Feature Values
This method returns a dataframe with the aggregate sample level features.
Parameters: |
|
---|
Returns: |
|
---|
Structural_Diversity(self, sample=None, n_jobs=1)
Structural Diversity Measurements
This method first clusters sequences via the phenograph algorithm before computing the number of clusters and entropy of the data over these clusters to obtain a measurement of the structural diversity within a repertoire.
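Once each sequence has a cluster assignment, the two quantities named above can be computed as in this sketch. The helper `cluster_entropy` is hypothetical, the clustering step itself is omitted, and the use of the natural logarithm and unweighted sequence counts are assumptions.

```python
import math
from collections import Counter

def cluster_entropy(cluster_ids):
    """Return (number of clusters, Shannon entropy of the cluster distribution)."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return len(counts), entropy

# Hypothetical cluster labels for the sequences of one repertoire.
n_clusters, entropy = cluster_entropy([0, 0, 1, 1, 2, 2, 3, 3])
```

A repertoire spread evenly over many clusters has high entropy, while one dominated by a single cluster has entropy near zero.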
Parameters: |
|
---|
Returns: |
|
---|
vis_class
HeatMap_Samples(self, set='all', filename='Heatmap_Samples.tif', Weight_by_Freq=True, color_dict=None, labels=True, font_scale=1.0, figsize=(12, 10), legend=False, legend_size=10)
HeatMap of Samples
This method creates a heatmap/clustermap for samples by latent features for the unsupervised deep learning methods.
Parameters: |
|
---|
Returns: |
|
---|
HeatMap_Sequences(self, set='all', filename='Heatmap_Sequences.tif', sample_num=None, sample_num_per_class=None, color_dict=None, figsize=(12, 10), legend=False, legend_size=10)
HeatMap of Sequences
This method creates a heatmap/clustermap for sequences by latent features for the unsupervised deep learning methods.
Parameters: |
|
---|
Repertoire_Dendrogram(self, set='all', distance_metric='KL', sample=None, n_jobs=1, color_dict=None, dendrogram_radius=0.32, repertoire_radius=0.4, linkage_method='ward', gridsize=24, Load_Prev_Data=False, filename=None, sample_labels=False, gaussian_sigma=0.5, vmax=0.01, n_pad=5, lw=None, log_scale=False)
Repertoire Dendrogram
This method creates a visualization that shows and compares the distribution of the sample repertoires via UMAP and the provided distance metric. The underlying algorithm first applies phenograph clustering to determine the proportions of the sample within a given cluster. Then a distance metric is used to compare how far apart two samples are based on their cluster proportions. Various metrics can be provided here such as KL-divergence, Correlation, and Euclidean.
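The comparison of two samples by their cluster proportions can be sketched as below. The pseudocount used to avoid zero proportions and the symmetrized form of the KL-divergence are assumptions made for this example; the exact variant used by the method is not specified here.

```python
import math

def cluster_proportions(cluster_ids, n_clusters, pseudocount=1e-6):
    """Fraction of a sample's sequences in each cluster (with a small pseudocount)."""
    counts = [pseudocount] * n_clusters
    for c in cluster_ids:
        counts[c] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    # KL-divergence is asymmetric; summing both directions gives a symmetric distance.
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical cluster labels for the sequences of two samples.
sample_a = cluster_proportions([0, 0, 0, 1], n_clusters=3)
sample_b = cluster_proportions([0, 1, 1, 2], n_clusters=3)
d_ab = symmetric_kl(sample_a, sample_b)
d_aa = symmetric_kl(sample_a, sample_a)
```

A pairwise matrix of such distances is the kind of input that hierarchical linkage would then use to build the dendrogram.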
Parameters: |
|
---|
Returns: |
|
---|
UMAP_Plot(self, set='all', by_class=False, by_cluster=False, by_sample=False, freq_weight=False, show_legend=True, scale=100, Load_Prev_Data=False, alpha=1.0, sample=None, sample_per_class=None, filename=None, prob_plot=None, plot_by_class=False)
UMAP visualization of TCR Sequences
This method displays the sequences in a 2-dimensional UMAP where the user can color-code points by class label, sample label, or a previously computed clustering solution. The size of points can also be made proportional to the frequency of the sequence within its sample.
Parameters: |
|
---|
UMAP_Plot_Samples(self, set='all', filename='UMAP_Samples.tif', Weight_by_Freq=True, scale=5, alpha=1.0)
UMAP visualization of TCR Samples
This method displays the samples in a 2-dimensional UMAP.
Parameters: |
|
---|