Facing the Clinical Trial Annotation Problem on Breast Cancer: Natural Language Processing & Machine Learning Models Selection

Pablo Eliseo Reynoso-Aguirre; Pedro Flores-Pérez

doi:10.12691/jcsa-12-1-3

Research Article

Open Access Peer-reviewed

Journal of Computer Sciences and Applications. 2024, 12(1), 17-24. DOI: 10.12691/jcsa-12-1-3

Received July 15, 2024; Revised August 18, 2024; Accepted August 25, 2024

Abstract

The clinical trial classification problem (CTCP) is a cutting-edge real-life application in biomedical informatics, especially in the domain considered in this paper, namely breast cancer. The task consists of developing models able to discriminate a patient’s eligibility profile for breast cancer trials based on performance status (PS) labels. The task has gained relevance in medical research and practice within the framework of decision support systems. Moreover, it is considered a meaningful instrument for accurately selecting trial participants, so that experimentation results in no health-behavioral drug side effects on them.

Keywords: ECOG, KPS, performance status, eligibility criteria, clinical trial classification, multinomial linear regression, multinomial naive Bayes, multilayer perceptron, support vector machines

1. Introduction

Nowadays, biomedical informatics has gained great importance through real-life applications and academic research (see ¹ for a clear overview of such applications). Focusing on clinical trials (CT) of breast cancer, the National Institutes of Health (NIH) lists biomedical applications related to breast cancer clinical trials ². Each study’s protocol in a CT ³ has guidelines for who can or cannot participate in the study. These guidelines, called eligibility criteria (EC), describe characteristics that must be shared by all participants. They may include age, gender, medical history, and current health status. EC for treatment studies often seek a particular type and stage of cancer in patients. Addressing the problems involved in these applications carries the following implications:

• The complex and non-unified variety of genres in which biomedical information is represented, including electronic health records (EHR), medical publications, medical blogs and social media, drug leaflets, clinical trial reports (CTR), and textual information included in ontologies ^{4, 5, 6} and knowledge bases. Author profiles include medical doctors, nurses, radiologists, pharmacologists, scientists, and lay people.

• The difficulty of extracting, normalizing, and classifying medical entities such as drugs, diseases, medical findings, anatomical elements, and other medical-related linguistic patterns (doses, formulas, quantities, units).

All these implications are present in CTs, which are documents written for human readers. These files frequently contain imprecise information that is of little use for medical decision support (MDS) ⁷. As part of CTs, EC present the same issues.

An important task related to MDS on CTs is the automatic computation of a patient’s performance score from the textual context of the EC in CTs. PS, a metric for evaluating a prospective patient’s stage of cancer, appears in most of the EC text in trials. Details of the CT Annotation Problem ⁸ are described in the following subsections:

1.1. Performance Status Scales & Ranges

According to the literature ⁹, PS is considered the standardized metric for assessing EC as a clinical trial attribute. PS eases the tracking of a patient’s treatment evolution and unifies research analysis across institutions and countries. PS can be expressed on different scales: Eastern Cooperative Oncology Group (ECOG) ¹⁰, Karnofsky performance status (KPS) ¹¹, and Lansky performance status (LPS), a particular case of KPS for pediatric oncology studies. These scales describe a patient’s stage of cancer based on their daily physical and behavioral signs. The equivalences between PS scales ⁹ and their descriptions are given in Table 1.

Table 1. ECOG-KPS scale equivalences and patients profile description
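As an illustration of how such an equivalence can be applied to eligibility ranges, the sketch below converts an ECOG range into a KPS range. The mapping is the commonly cited ECOG/Karnofsky comparison and is an assumption here; it may not match every row of Table 1. The names `ECOG_TO_KPS` and `ecog_range_to_kps` are hypothetical.

```python
# Assumed ECOG -> (KPS low, KPS high) equivalence; commonly cited, but it
# may differ from the equivalences the authors list in Table 1.
ECOG_TO_KPS = {
    0: (90, 100),
    1: (70, 80),
    2: (50, 60),
    3: (30, 40),
    4: (10, 20),
    5: (0, 0),
}


def ecog_range_to_kps(ecog_min: int, ecog_max: int) -> tuple[int, int]:
    """Convert an ECOG eligibility range into a (KPS_min, KPS_max) pair.

    A lower ECOG grade means a fitter patient, so the best ECOG grade gives
    the KPS maximum and the worst ECOG grade gives the KPS minimum.
    """
    if not 0 <= ecog_min <= ecog_max <= 5:
        raise ValueError("expected 0 <= ecog_min <= ecog_max <= 5")
    kps_min = ECOG_TO_KPS[ecog_max][0]
    kps_max = ECOG_TO_KPS[ecog_min][1]
    return kps_min, kps_max


print(ecog_range_to_kps(0, 2))  # (50, 100)
```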

1.2. Data Source

Experimental data for model training and evaluation was obtained from ClinicalTrials.gov (USA) ¹². The dataset sources and the descriptors of the CT XML files are described in Tables 2 & 3, respectively.

Table 2. CT studies distribution of cancer/breast cancer among U.S. and Non-U.S. countries

Table 3. KPS & ECOG data samples/features distribution

1.3. KPS Class Distribution

For the class distribution, the KPS scale is used as the main scoring in this problem because of its finer intrinsic granularity compared with the ECOG scale. Equivalence between scales was established according to the PS scales and ranges (see Table 1). An initial distributional analysis of the labels revealed a high class imbalance for both the KPS min and max values. Given the complexity of the data and the prediction goals, the solving approach splits the annotation into 2 learning tasks, predicting the CT KPS min and max separately.

Figure 1. Class Distribution of Min KPS from CT EC KPS Ranges

Figure 2. Class Distribution of Max KPS from CT EC KPS Ranges

2. Related Work

The work in ¹³ uses KPS as the EC classification scale for annotating the profiles of breast cancer CT patients. The study proposes an algorithm that uses a minimum of two and a maximum of three questions to enable an adequate and efficient evaluation of the CT KPS score. According to the authors, the system obtained good average performance for this type of application. However, their CT classifier suffers from synonymy, polysemy, and fuzziness because of the constrained-text nature of its framework. Moreover, their framework cannot classify a CT into a range of PS scores, as real CT PS scores require; it is constrained to single-value classification. Finally, their prototype is built on a static learning architecture that is not induced from data. The approach proposed in this work addresses most of the drawbacks described in this section.

The work in ¹⁴ is a CTR text mining study that processes CT texts with the NCBO annotator. ExaCT ¹⁵ assists users in locating and extracting key trial characteristics (e.g., EC, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials. The research in ¹⁶ addresses the problem of extracting EC. Since EC are represented as free text, their automatic interpretation and the evaluation of patient eligibility are challenging. The processing approach is based on identifying contextual patterns and semantic concepts that together define the machine-interpretable meaning. The study in ¹⁷ presents a system for cancer vaccine CTs that enables rapid extraction of information about institutions, diseases, clinical approaches, and clinical trial dates in order to obtain the predominant cancer types in the trials, clinical opportunities, and pharmaceutical market coverage.

There is earlier work on the Clinical Trial Classification Problem (CTCP) task addressed in this article, using the same CT data samples ¹⁸. That initial approach used multivariate regression modeling to forecast the min & max PS scores of a given CT. The aim was to find a useful correlation between KPS scores and the clinical terms in the CT eligibility criteria. The experiment comprised the following generic tasks ^{19, 20, 21}:

• PS Scoring Extraction (Regular Expressions)

• Data Cleansing (XML Tag Removal, Tokenization)

• Text Normalization (Stopwords Removal, Lemmatization)

• Text Vectorization (Term Frequency - Inverse Document Frequency)

• Features Projections (Singular Value Decomposition)

• Extra Considerations (Problematic Samples Removal, Prediction Refining, ECOG PS Predictions)

• Linear / Non-linear Models Tuning (Partial Least Squares, Multilayer Perceptron Regressor)
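The first task in the list above, PS score extraction with regular expressions, can be sketched as follows. The pattern and the function name `extract_ps` are hypothetical and only cover a few common phrasings; the expressions actually used in the earlier work are not given in the text.

```python
import re

# Hypothetical pattern (not from the paper): a scale name, a short run of
# filler text, then either a range ("70-100", "0 to 2") or a single bound.
PS_PATTERN = re.compile(
    r"(?P<scale>karnofsky|kps|ecog)"
    r"[^\d<>=≥≤]{0,40}"
    r"(?:(?P<low>\d{1,3})\s*(?:-|–|to)\s*(?P<high>\d{1,3})"
    r"|(?P<op>>=|≥|<=|≤)?\s*(?P<value>\d{1,3}))",
    re.IGNORECASE,
)


def extract_ps(text):
    """Return (scale, low, high) tuples found in text; None marks an open bound."""
    results = []
    for m in PS_PATTERN.finditer(text):
        scale = "ECOG" if m.group("scale").lower() == "ecog" else "KPS"
        if m.group("low") is not None:
            # Explicit range such as "70-100".
            low, high = int(m.group("low")), int(m.group("high"))
        else:
            value, op = int(m.group("value")), m.group("op")
            if op in (">=", "≥"):
                low, high = value, None
            elif op in ("<=", "≤"):
                low, high = None, value
            else:
                # A bare number is treated as a single-value range.
                low = high = value
        results.append((scale, low, high))
    return results


print(extract_ps("Karnofsky performance status 70-100%"))
print(extract_ps("ECOG performance status 0-2"))
```

Open bounds are returned as `None` so that a later step can decide how to complete them (for example, with the top or bottom of the scale).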

The final reported results, obtained on 4024 highly imbalanced labeled CTs out of 8107 breast cancer CTs (clinicaltrials.gov, NIH US National Library of Medicine), were:

Table 4. Model classification performance comparison among PLS/MLP in terms of 1 R² and MSE scores using 10-Fold Cross Validation learning framework

The findings in Table 4 suggest that both the PLS and MLP models achieved weak performance in terms of their R² values ²² for min [0.1116, 0.1049] and max [0.0312, 0.0505], since, according to ²³, R² scores in the range [0.5, 0.75] are typically expected in pure science fields. The experimental results indicate that learning performance depends strongly on the data representation (e.g., combinations of complex clinical terms as bigrams or trigrams) and that linear models tend to generalize better than non-linear approaches.
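For reference, the relation between the two scores reported in Table 4 can be written explicitly. This is the standard definition of the coefficient of determination rather than a formula taken from the paper; here y_i are the true scores, ŷ_i the predictions, ȳ the mean of the true scores, and n the number of samples:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{MSE}}{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Under this definition, an R² close to 0 means the model explains little more variance than always predicting the mean score.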

3. Solving Approach

The solving approach in this work for the problem proposed in ⁸ treats it as multi-class classification on the data, evaluated with the following confusion matrix and related statistical metrics:

Table 5. Confusion Matrix for Classification Tasks

Regarding the choice of classification algorithms, since the number of samples in the data is small (#samples < 50,000), classical machine learning models are the preferred approach in this work to avoid overfitting and to limit the computational effort of inference, e.g., Multinomial Logistic Regression (MLR), Multinomial Naive Bayes (MNB), Multilayer Perceptron (MLP) ²⁴, and Support Vector Machines (SVM). Deep learning models are not considered in the classification task.

Table 6. Statistical Metrics for Classification Tasks based on Confusion Matrix
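A minimal sketch of how such metrics can be derived from confusion-matrix counts in the multi-class setting is given below. It assumes one-vs-rest counts per class and macro averaging, which may differ from the exact definitions in Table 6; the function name `multiclass_metrics` is hypothetical.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Accuracy, macro F1 and macro Youden J from one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    f1_scores, youden_scores = [], []
    for c in labels:
        tp = pairs[(c, c)]
        fn = sum(v for (t, p), v in pairs.items() if t == c and p != c)
        fp = sum(v for (t, p), v in pairs.items() if t != c and p == c)
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        f1_scores.append(f1)
        # Youden J (informedness) = sensitivity + specificity - 1
        youden_scores.append(sensitivity + specificity - 1.0)
    return {
        "accuracy": sum(pairs[(c, c)] for c in labels) / n,
        "macro_f1": sum(f1_scores) / len(labels),
        "macro_youden_j": sum(youden_scores) / len(labels),
    }


# A classifier that always predicts the majority class: accuracy is high,
# but the Youden J statistic is zero.
y_true = [100] * 9 + [70]
y_pred = [100] * 10
print(multiclass_metrics(y_true, y_pred))
```

The toy example shows why accuracy alone can be misleading on imbalanced labels, while Youden J penalizes a model that ignores the minority class.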

4. Experiments

This section describes the experimentation carried out at the different stages of the inference analysis. To begin with, a single multi-class classification algorithm (MNB) is implemented in the initial experimental stages to evaluate the learned correlations between clinical textual features and KPS labels. All experimental stages ^{25, 26} use 5-fold cross-validation. Other multi-class classification algorithms (MLR, MLP, SVM) are implemented in the advanced stages to select the best possible approach, i.e., the one that maximizes the True Positives (TP) and True Negatives (TN) for every label of the KPS range considered in the trials.
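The cross-validation protocol can be sketched as a generator of train/test index splits. This is a from-scratch illustration that assumes shuffled, unstratified folds; the text does not say whether the folds were stratified, and `k_fold_indices` is a hypothetical name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(23, k=5):
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every CT is used for evaluation once and for training in the remaining four folds.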

4.1. Class Distribution Balancing

As seen in Figures 1 & 2, both KPS range limits of the CTs have a highly imbalanced distribution, particularly the max variable; therefore, we proceed with a sampling approach. Since the number of samples in the minority classes is very low, oversampling seems to be an appropriate framework to tackle the problem. For this task, the implementation uses the most generic technique, RandomOverSampler, which oversamples the occurrences of minority classes up to the number of occurrences of the majority class (with replacement, without adding noise to the sample copies) for each classification task, KPS_min & KPS_max. In this stage only MNB is considered, to observe the classification outcome based on statistical descriptors.
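The oversampling strategy described above can be illustrated with a from-scratch sketch. It is not the RandomOverSampler implementation itself, only the same idea under simple assumptions: samples are duplicated with replacement, unchanged, until every class reaches the majority count. The name `random_oversample` is hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples (with replacement) up to the majority count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Exact copies only: no noise is added to the duplicated samples.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


texts = ["trial a", "trial b", "trial c", "trial d"]
kps_min = [70, 70, 70, 50]
balanced_x, balanced_y = random_oversample(texts, kps_min)
print(Counter(balanced_y))
```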

Table 7. Imbalance vs. RandomOverSampler KPS_min classification comparisons on MNB model, TF-IDF (1,1)

Figure 3. MNB TF-IDF (1,1) on Imbalance KPS_min

Figure 4. MNB TF-IDF (1,1) on RandomOverSampler KPS_min

As can be seen in Table 7 & Table 8, classification on the imbalanced data without sampling may seem to achieve better results. However, the Youden J statistic reflects how well the models differentiate among the n classes predicted for either KPS_min or KPS_max. The higher the Youden J statistic, the better the model generalizes in predicting all the classes of the KPS range. Therefore, the simple RandomOverSampler strategy of sampling minority classes up to the number of samples in the majority class seems to alleviate the imbalance problem and helps model inference avoid overfitting the majority class of the variable distribution.

Table 8. Imbalance vs. RandomOverSampler KPS_max classification comparisons on MNB model, TF-IDF (1,1)

Figure 5. MNB TF-IDF (1,1) on Imbalance KPS_max

Figure 6. MNB TF-IDF (1,1) on RandomOverSampler KPS_max

4.2. Feature Extraction (Weighting)

An important part of model inference is the extraction of features for training the models. Moreover, tasks that involve natural language text require a text embedding (Word2Vec, Sentence2Vec, Doc2Vec) that turns text into numerical matrices, so that it is valid input for classical machine learning algorithms. Deep learning approaches use methods such as Keras Embedding layers and BERT, which automatically compute text embeddings from initial weights on the input layer of the network. In this experimentation phase we consider a Doc2Vec-type (document-level) embedding to find relationships among CT XML documents based on the similarity of the words in the documents.

This Doc2Vec-type representation can be implemented with different weighting schemes: CountVectorizer and TfidfVectorizer. CountVectorizer records the frequency of each word in a given document of the corpus, while TfidfVectorizer uses a weight that combines the frequency of a word in a given document with a normalizing factor reflecting how common the term is across all documents in the corpus. The columns of the representation, the Bag of Words (BOW), are the universe of words in all documents of the corpus. The BOW may contain mono-grams, bi-grams, tri-grams, n-grams, or a combination of them if needed. The rows represent the documents in the corpus. After this textual model is applied to the CT documents, the numerical representation of the text is known as the document term matrix (DTM) (see Figure 7). In this stage only MNB is considered, to observe the classification outcome based on statistical descriptors.

Figure 7. CT CORPUS representation as Document Term Matrix using TfidfVectorizer
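To make the two weighting schemes concrete, the sketch below builds a small document term matrix from scratch with either raw counts or TF-IDF weights. It is an illustration only, not the CountVectorizer or TfidfVectorizer implementation; the smoothed IDF formula and the L2 row normalization are assumptions chosen to resemble common defaults, and `document_term_matrix` is a hypothetical name.

```python
import math
from collections import Counter


def ngrams(tokens, n_min, n_max):
    """All n-grams of length n_min..n_max, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, ngram_range=(1, 1), weighting="count"):
    """Return (vocabulary, rows); rows[i][j] is the weight of term j in doc i."""
    tokenized = [ngrams(d.lower().split(), *ngram_range) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    counts = [Counter(doc) for doc in tokenized]
    if weighting == "count":
        return vocab, [[float(c[t]) for t in vocab] for c in counts]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for c in counts for t in c)
    # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for c in counts:
        row = [c[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])  # L2-normalize each document
    return vocab, rows


docs = ["karnofsky performance status", "ecog performance status"]
vocab, count_rows = document_term_matrix(docs)
print(vocab)
print(count_rows)
```

With TF-IDF weighting, terms shared by both documents ("performance", "status") receive a lower weight than the scale names that distinguish them.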

Table 9. TF-IDF vs. Count rep. (1,1) for KPS_min Oversampling classification comparisons on MNB model

Figure 8. MNB TF-IDF (1,1) on RandomOverSampler KPS_min

Figure 9. MNB Count (1,1) on RandomOverSampler KPS_max

Table 10. TF-IDF vs. Count rep. (1,1) for KPS_max Oversampling classification comparisons on MNB model

Figure 10. MNB TF-IDF (1,1) on RandomOverSampler KPS_max

Figure 11. MNB Count (1,1) on RandomOverSampler KPS_max

4.3. Trial Text Preprocessing

After experimenting with feature extraction weighting to find a useful numerical representation of the features, a text preprocessing ²⁷ stage is considered to boost feature extraction by applying different text normalization techniques. All approaches involve tokenization followed by NLP preprocessing methods such as stopword removal, stemming, and English lemmatization. In this stage only MNB is considered, to observe the classification outcome based on statistical descriptors with Count feature extraction for both KPS_min and KPS_max.
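The normalization steps can be sketched as a small pipeline. The stopword list and the suffix-stripping rule below are deliberately tiny, hypothetical stand-ins: a real experiment would use a full stopword list and an established stemmer or lemmatizer, which the text does not name.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "be", "have", "is", "must",
             "of", "or", "the", "to", "with"}
# Naive suffix stripping, only a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, stem=False):
    tokens = remove_stopwords(tokenize(text))
    return [naive_stem(t) for t in tokens] if stem else tokens


criterion = "Patients must have a Karnofsky performance status of 70-100"
print(preprocess(criterion))
print(preprocess(criterion, stem=True))
```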

Table 11. Text Pre Processing Comparisons for Count ngram (1,1) representation on MNB KPS_min model

Table 12. Text Pre Processing Comparisons for Count ngram (1,1) representation on MNB KPS_max model

The comparison of preprocessing approaches for normalizing text before extracting numerical features suggests that, for Multinomial Naive Bayes (MNB) specifically, the best classification results are obtained by performing stopword removal and stemming and then extracting features with Count weighting. The experimentation was extended to all the text normalization approaches, both Count & TF-IDF feature weighting, and different n-gram combinations for other supervised ML algorithms (with default parameters): Multinomial Logistic Regression (MLR), Support Vector Machines (SVM), and Multilayer Perceptron Neural Network (MLP). The extended experimental results are shown in Tables 13 & 14.

Table 13. Text Pre Processing Comparisons for Count ngram representations on ML KPS_min models

Table 14. Text Pre Processing Comparisons for Count ngram representations on ML KPS_max models

4.4. Feature Selection (Mapping)

In terms of data projections, the SVD algorithm ²⁸ was considered to project the existing numerical features onto a more separable space (latent variables) ^{29, 30}. SVD has proved useful in experiments like this one, with sparse data (of non-numerical origin, i.e., text vectorization) and curse-of-dimensionality issues, e.g., #features > #samples. In this experiment, feature mapping considers different numbers of meta-features for the data projections, [100, 150, 200, 250, 300], applied to the results obtained for every text preprocessing, feature weighting, n-gram setting, and algorithm in the previous stage. After analyzing the classification results in which the n-gram features are projected into a more compact dimensional space, we obtained the following results for the best SVD configuration of each algorithm for KPS_min and KPS_max:

Table 15. SVD Feature Selection Comparisons for Count ngram (1,3) representations on ML KPS_min models

Table 16. SVD Feature Selection Comparisons for Count ngram (1,3) representations on ML KPS_max models

After comparing the results of Trial Text Preprocessing in Tables 13 & 14 with the results of Feature Selection (Mapping) in Tables 15 & 16, respectively, we observed that the statistical metrics did not improve; therefore, the SVD feature projections do not seem to be an effective way to boost the classification performance metrics (Accuracy, F1-Score, Youden J statistic).
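For readers less familiar with the projection evaluated in this section, a rank-k truncated SVD can be written as follows. The notation is an assumption introduced here: X is the n × p document term matrix (n documents, p n-gram features), k is the number of meta-features (100 to 300 in the experiments), and Z is the projected feature matrix passed to the classifier.

```latex
X \approx U_k \Sigma_k V_k^{\top},
\qquad
Z = X V_k = U_k \Sigma_k \in \mathbb{R}^{n \times k}
```

Here U_k and V_k hold the k leading left and right singular vectors of X, and Σ_k is the diagonal matrix of the k largest singular values, so each document is described by k latent variables instead of p term frequencies.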

4.5. Models Tuning

In this section, on the tuning of model hyper-parameters, we use the RandomizedSearchCV approach from a model selection framework to explore different combinations of parameter values in order to find settings that improve the classification performance metrics of the former stages.

• On Multinomial Naive Bayes, the hyper-parameters and settings considered for testing are: alpha in [0.005, 5.000], class_prior = None, fit_prior = True.

• On Multinomial Logistic Regression, the hyper-parameters and settings considered for testing are: penalty = L2, C in [0.005, 6.000], tol in [0.0001, 0.2000], dual = False, solver in [lbfgs, sag, saga], multi_class in [ovr, multinomial, auto].

• On Support Vector Machines, the hyper-parameters and settings considered for testing are: penalty = L2, C in [0.005, 10.000], tol in [0.0001, 0.2000], dual in [True, False], max_iter in [1, 10], multi_class = ovr, random_state = 0, loss = squared_hinge.

• On Multilayer Perceptron, the hyper-parameters and settings considered for testing are: activation in [identity, logistic, tanh, relu], hidden_layer_sizes in [(5, 1), (10, 1), (15, 1), (20, 1), (25, 1), (50, 1), (100, 1), (200, 1)], solver = lbfgs, max_iter in [10, 25, 50, 100].
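The idea behind a randomized search over ranges like those listed above can be sketched without any library. This is not RandomizedSearchCV itself: the uniform sampling of continuous ranges is an assumption (the text does not state the sampling distributions), and `toy_score` is a hypothetical stand-in for the cross-validated score of a real model.

```python
import random

# Search space for the logistic regression bullet above; "uniform" ranges
# are an assumption about how continuous values were sampled.
SEARCH_SPACE = {
    "C": ("uniform", 0.005, 6.0),
    "tol": ("uniform", 0.0001, 0.2),
    "solver": ("choice", ["lbfgs", "sag", "saga"]),
}


def sample_params(space, rng):
    """Draw one random hyper-parameter combination from the search space."""
    params = {}
    for name, spec in space.items():
        if spec[0] == "uniform":
            params[name] = rng.uniform(spec[1], spec[2])
        else:
            params[name] = rng.choice(spec[1])
    return params


def randomized_search(space, evaluate, n_iter=20, seed=0):
    """Evaluate n_iter random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = sample_params(space, rng)
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical objective: pretends that C = 1.0 gives the best score.
    return -abs(params["C"] - 1.0)


best_params, best_score = randomized_search(SEARCH_SPACE, toy_score, n_iter=50)
print(best_params, best_score)
```

In a real run, `evaluate` would train the model with the sampled settings and return its mean cross-validated metric.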

After trying different combinations of model hyper-parameters together with generic (up-to-majority) class oversampling, different text preprocessing, feature extraction, n-gram representations, and feature selection (mapping) settings, the following results were obtained:

Table 17. Hyper-parameters Algorithms Tuning Comparisons for Count ngram (1,3) representations on ML KPS_min models

Table 18. Hyper-parameters Algorithms Tuning Comparisons for Count ngram (1,3) representations on ML KPS_max models

4.6. Additional Considerations: Sampling Tuning

After the experimentation performed in the previous stages, we made additional adjustments to maximize accuracy and learning generalization by tuning the sampling framework for both KPS_min & KPS_max. The class distribution balancing considered oversampling the minority classes to different percentages, in the range [5%-25%], of the majority class ³¹. This implementation only considered monogram (1,1) features in order to avoid the curse of dimensionality, since sampling tuning used between 4 and 6 times fewer samples than the sampling strategy in Section 4.1:
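The partial oversampling described above can be expressed as a per-class target count. The sketch below is one possible reading of the strategy and rests on an assumption: classes already above the threshold are left unchanged. The function name `partial_oversampling_targets` is hypothetical.

```python
import math
from collections import Counter


def partial_oversampling_targets(labels, percent):
    """Target sample count per class when minority classes are raised to
    `percent` percent of the majority class; larger classes are unchanged."""
    counts = Counter(labels)
    majority = max(counts.values())
    threshold = math.ceil(percent * majority / 100)
    return {cls: max(n, threshold) for cls, n in counts.items()}


labels = [100] * 1000 + [80] * 400 + [50] * 20
print(partial_oversampling_targets(labels, 15))  # {100: 1000, 80: 400, 50: 150}
```

A dictionary of this shape (class to desired count) is the kind of per-class target that oversampling utilities typically accept, so it can drive the random duplication step.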

Table 19. Sampling Tuning Comparisons for Count ngram (1,1) representations on ML KPS_min models

Table 20. Sampling Tuning Comparisons for Count ngram (1,1) representations on ML KPS_max models

In the results we can observe an improvement in the KPS_max classification performance metrics. However, KPS_min seems to generalize better on n-gram [(1,2), (1,3)] feature representations than on monogram representations.

5. Results

After the experimentation performed in the previous stages, the best generalization found for the models (MNB, MLR, SVM, MLP) on the KPS_min & KPS_max annotation tasks is:

Table 21. Final Performance Comparisons for Count ngram (1,3) representations on ML KPS_min models

Table 22. Final Performance Comparisons for Count ngram (1,1) representations on ML KPS_max models

6. Conclusions

After analyzing the final learning performance results, we can observe the following key points:

• Both KPS_min & KPS_max generalize better when Count Vectorization (frequency weighting) is used to build the Document Term Matrices.

• Both KPS_min & KPS_max generalize better with Stopwords Removal as Text Pre Processing.

• The KPS_min Annotation Task generalizes better when minority classes are oversampled up to 100% of the majority class to alleviate data imbalance, and the frequencies of combinations of n-grams (one, two, and three words) are used as features.

• The KPS_max Annotation Task generalizes better when minority classes are oversampled up to [5% - 25%] of the majority class to alleviate data imbalance, and monogram (single word) frequencies are used as features.

• On the KPS_max Annotation Task, the best learning performance found is:

1. Class Imbalance: minority classes Oversampling to 15% of majority class.

2. Text Pre Processing: Stopwords Removal.

3. Feature Extraction: Count Vectorizer (1,1) mono-grams.

4. Model: Multilayer Perceptron.

5. Settings: MLPClassifier (activation = ’relu’, hidden_layer_sizes = (100,1), alpha = 0.0001, tol = 0.0001, learning_rate = ’constant’, solver = ’adam’, max_iter = 200).

6. Accuracy: 0.9374, F1-Score: 0.9387, Youden-J(Informedness): 0.9525.

• On the KPS_min Annotation Task, the best learning performance found is:

1. Class Imbalance: minority classes Oversampling to 100% of majority class.

2. Text Pre Processing: Stopwords Removal.

3. Feature Extraction: Count Vectorizer (1,3) mono-grams, bi-grams, tri-grams.

4. Model: Multinomial Naive Bayes.

5. Settings: MultinomialNB (alpha = 0.0000000001, class_prior = None, fit_prior = True).

6. Accuracy: 0.9070, F1-Score: 0.8754, Youden-J(Informedness): 0.8141.

• The best decision support models for trial annotation found across the experimental stages seem to be MLPClassifier for KPS_max and MultinomialNB for KPS_min, achieving multi-class accuracy scores of 0.9374 & 0.9070, respectively.

References

[1]	Demner-Fushman D., Chapman WW., McDonald CJ. What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, Number 42, Vol. 5 (2009).

[2]	National Institutes of Health. Breast Cancer Clinical Trials. (2017).

[3]	Clinical Trials Governmental Organization. Protocol Registration Data Element Definitions for Interventional and Observational Studies. http://prsinfo.clinicaltrials.gov/definitions.html, (2017).

[4]	Melnikov M., Vorobkalov P. Metrics in Ontologies in the Medical Domain. (2014).

[5]	Jain J., Kumari A., Somvanshi P., Grover A., Pai S., Sunil S. In silico analysis of natural compounds targeting structural and nonstructural proteins of chikungunya virus. F1000Research, Number 1, Vol. 1, (2017).

[6]	National Institutes of Health. BioPortal Ontology. https://bioportal.bioontology.org/ontologies, (2011).

[7]	Goodwin TR., Harabagiu SM. Medical Question Answering for Clinical Decision Support. Proceedings of the ACM International Conference on Information and Knowledge Management, Number 1, Vol. 1, Pages = 297-306, (2016).

[8]	Medbravo Barcelona. MedBravo Programming Interview Task. https://stackoverflow.com/jobs, (2015).

[9]	Ecog-Acrin Organization. ECOG Performance Status Specifications. http://ecog-acrin.org/resources/ecog-performance-status, (2017).

[10]	Zubrod, Charles G. et al. Appraisal of methods for the study of chemotherapy of cancer in man: Comparative therapeutic trial of nitrogen mustard and triethylene thiophosphoramide. Journal of Clinical Epidemiology, Number 1, Vol. 11, Pages = 7-33, (1960).

[11]	Karnofsky D., Burchenal J. Evaluation of chemotherapeutic agents: The clinical evaluation of chemotherapeutic agents in cancer. Evaluation of Chemotherapeutic Agents, Number 1, Vol. 11, Pages = 191-205, (1949).

[12]	National Institutes of Health, ClinicalTrials.gov. Clinical Trials XML Data Finder. https://clinicaltrials.gov, (2018).

[13]	Peus D., Newcomb N., Hofer S. Appraisal of the Karnofsky Performance Status and proposal of a simple algorithmic system for its evaluation. BMC Medical Informatics and Decision Making, Number 1, Vol. 13, Pages = 1-7, (2013).

[14]	P. M. Rodda Text Mining: Automatic Retrieval, Annotation and Visualisation of Clinical Trials Text using Ontology. Master thesis. University of Manchester (2010).

[15]	Kiritchenko, S., de Bruijn, B., Carini, S., Martin, J., Sim, I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Medical Informatics and Decision Making, Number 10, Vol. 56, (2010).

[16]	Millian et al. Eligibility Criteria Text Extraction. (2013).

[17]	Cao X., Maloney K., Brusic V. Data mining of cancer vaccine trials, a bird’s eye view. Immunome Research 2008, Number 4, Vol. 7, (2008).

[18]	Reynoso-Aguirre P., Rodriguez-Hontoria H., Belanche Muñoz Ll. (2018). Natural Language Processing and Machine Learning Techniques to Solve a Breast Cancer Clinical Trial ECOG-Classification Problem (Master’s Thesis). Retrieved from https://upcommons.upc.edu/bitstream/handle/2117/118759/131668.pdf.

[19]	Anderson P., Thor A., Benik J., Raschid L., Vidal ME. PAnG: finding patterns in annotation graphs. SIGMOD Conference, (2012).

[20]	Cotik V., Rodriguez H., Vivaldi J. Semantic tagging of French medical entities using distant learning. (2015).

[21]	Vivaldi J., Rodríguez H. Using Wikipedia for term extraction in the biomedical domain: first experience. In Procesamiento del Lenguaje Natural 45, Number 1, Vol. 1, Pages = 251-254, (2011).

[22]	O’Connor B. R² is rescaled mean squared error. (2009).

[23]	Hair J., Ringle C., Sarstedt M. Partial Least Squares Structural Equation Modeling: Rigorous Applications, Better Results and Higher Acceptance. Long Range Planning, Number 1-2, Vol. 46 (2013).

[24]	Rumelhart D., Hinton G., Williams R. Learning Internal Representations by Error Propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Number 1, Vol. 1, Pages = 1-33, (1985).

[25]	Raschka, S. Python Machine Learning. Packt Publishing, ISBN: 9781783555130, (2015).

[26]	Pedregosa F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, Number 1, Vol. 12, Pages = 2825–2830, (2011).

[27]	Yetisgen M., Gunn M., Xia F., Payne T. A text processing pipeline to extract recommendations from radiology reports. Journal of Biomedical Informatics, Number 2, Vol. 46, Pages = 354-362, (2013).

[28]	Jia Y. Singular Value Decomposition. (2017).

[29]	Wold H. Path models with latent variables: The NIPALS approach. Quantitative sociology: International perspectives on mathematical and statistical modeling, Number 1, Vol. 1, Pages = 307-357, (1975).

[30]	Landauer T., Foltz P., Laham D. An Introduction to Latent Semantic Analysis. (1998).

[31]	Albisua I., Arbelaitz O., Gurrutxaga I., Lasargueren A., Muguerza J., M. Perez J. The quest for the optimal class distribution: an approach for enhancing the effectiveness of learning via resampling methods for imbalanced data sets 2008, Number 2, Vol. 45, (2013).

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/

Cite this article:

Pablo Eliseo Reynoso-Aguirre, Pedro Flores-Pérez. Facing the Clinical Trial Annotation Problem on Breast Cancer: Natural Language Processing & Machine Learning Models Selection. Journal of Computer Sciences and Applications. Vol. 12, No. 1, 2024, pp 17-24. https://pubs.sciepub.com/jcsa/12/1/3

Figure 1. Class Distribution of Min KPS from CT EC KPS Ranges

Figure 2. Class Distribution of Max KPS from CT EC KPS Ranges

Figure 3. MNB TF-IDF (1,1) on Imbalance KPS_min

Figure 4. MNB TF-IDF (1,1) on RandomOverSampler KPS_min

Figure 5. MNB TF-IDF (1,1) on Imbalance KPS_max

Figure 6. MNB TF-IDF (1,1) on RandomOverSampler KPS_max

Figure 7. CT CORPUS representation as Document Term Matrix using TfidfVectorizer

Figure 8. MNB TF-IDF (1,1) on RandomOverSampler KPS_min

Figure 9. MNB Count (1,1) on RandomOverSampler KPS_max

Figure 10. MNB TF-IDF (1,1) on RandomOverSampler KPS_max

Figure 11. MNB Count (1,1) on RandomOverSampler KPS_max

Table 1. ECOG-KPS scale equivalences and patients profile description

Table 2. CT studies distribution of cancer/breast cancer among U.S. and Non-U.S. countries

Table 3. KPS & ECOG data samples/features distribution

Table 4. Model classification performance comparison among PLS/MLP in terms of 1 R² and MSE scores using 10-Fold Cross Validation learning framework

Table 5. Confusion Matrix for Classification Tasks

Table 6. Statistical Metrics for Classification Tasks based on Confusion Matrix

Table 7. Imbalance vs. RandomOverSampler KPS_min classification comparisons on MNB model, TF-IDF (1,1)

Table 8. Imbalance vs. RandomOverSampler KPS_max classification comparisons on MNB model, TF-IDF (1,1)

Table 9. TF-IDF vs. Count rep. (1,1) for KPS_min Oversampling classification comparisons on MNB model

Table 10. TF-IDF vs. Count rep. (1,1) for KPS_max Oversampling classification comparisons on MNB model

Table 11. Text Pre Processing Comparisons for Count ngram (1,1) representation on MNB KPS_min model

Table 12. Text Pre Processing Comparisons for Count ngram (1,1) representation on MNB KPS_max model

Table 13. Text Pre Processing Comparisons for Count ngram representations on ML KPS_min models

Table 14. Text Pre Processing Comparisons for Count ngram representations on ML KPS_max models

Table 15. SVD Feature Selection Comparisons for Count ngram (1,3) representations on ML KPS_min models

Table 16. SVD Feature Selection Comparisons for Count ngram (1,3) representations on ML KPS_max models

Table 17. Hyper-parameters Algorithms Tuning Comparisons for Count ngram (1,3) representations on ML KPS_min models

Table 18. Hyper-parameters Algorithms Tuning Comparisons for Count ngram (1,3) representations on ML KPS_max models

Table 19. Sampling Tuning Comparisons for Count ngram (1,1) representations on ML KPS_min models

Table 20. Sampling Tuning Comparisons for Count ngram (1,1) representations on ML KPS_max models

Table 21. Final Performance Comparisons for Count ngram (1,3) representations on ML KPS_min models

Table 22. Final Performance Comparisons for Count ngram (1,1) representations on ML KPS_max models

[1]	Demner-Fushman D., Chapman WW., McDonald CJ. What can natural language processing do for clinical decision support? Journal of Biomedical Informatics, Number 42, Vol. 5 (2009).
	In article	View Article PubMed

[2]	National Institute of Health. Breast Cancer Clinical Trials. (2017)
	In article

[3]	Clinical Trials Governmental Organization. Protocol Registration Data Element Definitions for Interventional and Observational Studies. http://prsinfo.clinicaltrials.gov/definitions.html, (2017).
	In article

[4]	Melnikov M., Vorobkalov P. Metrics in Ontologies in the Medical Domain. (2014).
	In article

[5]	Jain J., Kumari A., Somvanshi P., Grover A., Pai S., Sunil S. In silico analysis of natural compounds targeting structural and nonstructural proteins of chikungunya virus. F1000Research, Number 1, Vol. 1, (2017).
	In article	View Article PubMed

[6]	National Institutes of Health. BioPortal Ontology. https://bioportal.bioontology.org/ontologies, (2011).
	In article

[7]	Goodwin TR., Harabagiu SM. Medical Question Answering for Clinical Decision Support. Processing ACM Interantional Conference Information Knowledge Management, Number 1, Vol. 1, Pages = 297- 306, (2016).
	In article	View Article PubMed

[8]	Medbravo Barcelona. MedBravo Programming Interview Task. https://stackoverflow.com/jobs, (2015).
	In article

[9]	Ecog-Acrin Organization. ECOG Performance Status Specifications. http://ecog- acrin.org/resources/ecog-performance-status, (2017).
	In article

[10]	Zubrod, Charles G. et al. Appraisal of methods for the study of chemotherapy of cancer in man: Comparative therapeutic trial of nitrogen mustard and triethylene thiophosphoramide. Journal of Clinical Epidemiology, Number 1, Vol. 11, Pages = 7-33, (1960).
	In article	View Article

[11]	Karnofsky D., Burchenal J. Evaluation of chemotherapeutic agents: The clinical evaluation of chemotherapeutic agents in cancer. Evaluation of Chemotherapeutic Agents, Number 1, Vol. 11, Pages = 191-205, (1949).
	In article

[12]	National Institute of Health, ClincalTrial.org. Clinical Trials XML Data Finder. https://clinicaltrials.gov, (2018).
	In article

[13]	Peus D., Newcomb N., Hofer S. Appraisal of the Karnofsky Performance Status and proposal of a simple algorithmic system for its evaluation. BMC Medical Informatics and Decision Making, Number 1, Vol. 13, Pages = 1-7, (2013).
	In article	View Article PubMed

[14]	P. M. Rodda Text Mining: Automatic Retrieval, Annotation and Visualisation of Clinical Trials Text using Ontology. Master thesis. University of Manchester (2010).
	In article

[15]	Kiritchenko, S., de Bruijn, B., Carini, S., Martin, J., Sim, I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Medical Informatics and Decision Making, Number 10, Vol. 56, (2010).
	In article	View Article PubMed

[16]	Millian et al. Eligibility Criteria Text Extraction. (2013).
	In article

[17]	Cao X., Maloney K., Brusic V. Data mining of cancer vaccine trials, a bird’s eye view. Immunome Research 2008, Number 4, Vol. 7, (2008).
	In article	View Article PubMed

[18]	Reynoso-Aguirre P., Rodriguez-Hontoria H., Belanche Mun˜oz Ll. (2018). Natural Language Processing and Machine Learning Techniques to Solve a Breast Can- cer Clinical Trial ECOG-Classification Problem (Master’s Thesis). Retrieved from https:// upcommons.upc.edu/bitstream/handle/2117/118759/131668.pdf.
	In article

[19]	Anderson P., Thor A., Benik J., Raschid L., Vidal. ME. PAnG: finding patterns in annotation graphs. SIGMOD Conference, (2012).
	In article	View Article

[20]	Cotik V., Rodriguez H., Vivaldi J. Semantic tagging of French medical entities using distant learning. (2015).
	In article

[21]	Vivaldi J., Rodrguez H. Using Wikipedia for term extraction in the biomedical domain: first experience. In Procesamiento del Lenguaje Natural 45, Number 1, Vol. 1, Pages = 251-254, (2011).
	In article

[22]	OConnor B. R2 is rescaled mean squared error. (2009).
	In article

[23]	Hiar J., Ringle C., Sarstedt M. Partial Least Squares Structural Equation Modeling: Rigorous Applica- tions, Better Results and Higher Acceptance. Long Range Planning, Number 1-2, Vol. 46 (2013).
	In article	View Article

[24]	Ruineihart D., Hint. G., Williams R. Learning Internal Representations by Error Propagation. Parallel Distributed Processing: Explorations in the Micro structure of Cognition, Number 1, Vol. 1, Pages = 1-33, (1985).
	In article

[25]	Raschka, S. Python Machine Learning. Packt Publishing, ISBN: 9781783555130, (2015).
	In article

[26]	Pedregosa F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, Number 1, Vol. 12, Pages = 2825–2830, (2011).
	In article

[27]	Yetisgen M., Gunn M., Xia F., Payne T. A text processing pipeline to extract recommendations from radiology reports. Journal of Biomedical Informatics, Number 2, Vol. 46, Pages = 354-362, (2013).
	In article	View Article PubMed

[28]	Jia Y. Singular Value Decomposition. (2017).
	In article

[29]	Wold H. Path models with latent variables: The NIPALS approach. Quantitative sociology: International perspectives on mathematical and statistical modeling, Number 1, Vol. 1, Pages = 307-357, (1975).
	In article	View Article

[30]	Landauer T., Foltz P., Laham D. An Introduction to Latent Semantic Analysis. (1998).
	In article	View Article

[31]	Albisua I., Arbelaitz O., Gurrutxaga I., Lasargueren A., Muguerza J., M. Perez J. The quest for the op- timal class distribution: an approach for enhancing the effectiveness of learning via resampling methods for imbalanced data sets 2008, Number 2, Vol. 45, (2013).
	In article	View Article