
ANALYTICAL SCIENCES JANUARY 2020, VOL. 36 107

Reviews

Chemometrics and Related Fields in Python

Shigeaki MORITA

Department of Engineering Science, Osaka Electro-Communication University, 18-8 Hatsu-cho, Neyagawa, Osaka 572–8530, Japan

(Received August 31, 2019; Accepted November 7, 2019; Advance Publication Released Online by J-STAGE November 15, 2019)

2020 © The Japan Society for Analytical Chemistry

The Python programming language is becoming a promising tool for data analysis in various fields. However, little attention has been paid to using Python in the field of analytical chemistry, though recent advances in instrumental analysis require robust and reliable data analysis. In order to overcome the difficulty in accurate analysis, multivariate analysis, or chemometrics, has been widely applied to various kinds of data obtained by instrumental analysis. In the present work, the potential usefulness of Python for chemometrics and related fields in chemistry is reviewed. Many practical tools for chemometrics, e.g., principal component analysis (PCA), partial least squares (PLS), support vector machine (SVM), etc., are included in the scikit-learn machine learning (ML) library for Python. Other useful libraries such as pyMCR for multivariate curve resolution (MCR), 2Dpy for two-dimensional correlation spectroscopy (2D-COS), etc. can be obtained from GitHub. For these reasons, a computational environment for chemometrics is easily constructed in Python.

Keywords Python, multivariate analysis, chemometrics, machine learning

1 Introduction
2 Dimensionality Reduction
3 Clustering
4 Classification
5 Regression
6 Others
7 Summary
8 References

Shigeaki MORITA is a professor in the Department of Engineering Science, Osaka Electro-Communication University. He received B. Eng. (1996), M. Sci. (1998) and Ph. D. (2001) degrees from Tokyo University of Agriculture and Technology. He served as a postdoctoral fellow at Hokkaido University (2001 – 2003), a postdoctoral fellow at Kwansei-Gakuin University (2003 – 2007), an assistant professor at Nagoya University (2007 – 2012), an associate professor at Osaka Electro-Communication University (2012 – 2017) and a professor at Osaka Electro-Communication University (2017 –). He received the NIR Advanced Award in 2011 and the SPSJ Asahi Kasei Award in 2011.

1 Introduction

Python1,2 is a general-purpose programming language created by van Rossum. It has gradually become a powerful tool for signal processing3,4 and data analysis,5,6 as well as for other applications such as web scraping,7 owing to its comprehensive standard library and useful external libraries. Many open-source Python projects are actively developed and distributed via GitHub,8,9 e.g., the scikit-learn (sklearn) machine learning (ML) library.10–12 This library contains many typical tools for multivariate analysis13,14 and chemometrics,15–18 e.g., principal component analysis (PCA),19–23 partial least squares (PLS),24–29 etc. Other chemometrics tools that are not included in scikit-learn, e.g., pyMCR30,31 for multivariate curve resolution (MCR),32–37 are independently available on GitHub. Many of them can be easily installed with pip,38 the package management system for Python. User-friendly Python distributions, web-based interactive computational environments and integrated development environments, such as Anaconda,39,40 Jupyter Notebook,41,42 Google Colaboratory,43 Microsoft Visual Studio,44,45 etc., are available for free or at a low price and support writing Python programs and analyzing data obtained by instrumental analysis. Convenient cheat sheets for data analysis in Python can also be found.46,47 Table 1 summarizes useful libraries for chemometrics and related fields in Python. Abbreviations used in the present work are summarized in Table 2.

Some user-friendly software packages for chemometrics are commercially available. However, their functions are sometimes limited. Although a similar computational environment for chemometrics can also be constructed with MATLAB,48,49 R,50,51 and so on, a large number of excellent textbooks for data analysis in Python are easily obtained. Therefore, it should be strongly emphasized that it is time for chemists, especially analytical chemists, to start learning chemometrics or ML in Python, since typical chemometrics tools are included in the scikit-learn ML library. In the present work, the methodology for chemometrics and related fields in Python and its applications are reviewed.

2 Dimensionality Reduction

Computational tools for matrix decomposition, which are mostly used for dimensionality reduction in multivariate statistics, e.g., PCA, factor analysis (FA),14,52 independent component analysis (ICA),53,54 non-negative matrix factorization (NMF),55–58 etc., are included in the sklearn.decomposition module of scikit-learn. Algorithms that assume non-negative signals, such as NMF, are sometimes useful for data obtained by instrumental analysis that have only positive intensities, such as emission spectra, chromatograms, etc. The underlying linear algebra59 calculations for these data analyses are provided by the numpy.linalg module of NumPy,60 a library for multi-dimensional arrays. For example, singular value decomposition (SVD)59 is calculated by using the numpy.linalg.svd function. Parallel factor analysis (PARAFAC),61,62 a dimensionality reduction method for multi-dimensional array data, is included in the scikit-tensor-py3 (sktensor)63 library. Such multi-dimensional array data are obtained in many different types of instrumental analyses through hyphenated techniques, e.g., GC-MS,64–66 LC-MS,67–69 LC-Vis,37,70 LC-IR,71,72 TG-IR,73,74 etc., as well as fluorescence excitation-emission matrix (EEM) spectroscopy,75,76 hyperspectral imaging,77,78 microscopic spectroscopy,36,79,80 etc.
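As a minimal sketch of the SVD step mentioned above, the following example applies numpy.linalg.svd to a synthetic two-band data matrix and keeps only the two leading components. The band positions, the noise level and the helper names simulate_spectra and truncated_svd are illustrative assumptions and are not taken from the original work.

```python
import numpy as np


def simulate_spectra(n_samples=20, n_channels=200, noise=0.01, seed=0):
    """Build a synthetic data matrix (samples x channels) from two Gaussian bands."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_channels)
    band_a = np.exp(-((x - 0.3) ** 2) / (2 * 0.05 ** 2))
    band_b = np.exp(-((x - 0.7) ** 2) / (2 * 0.05 ** 2))
    frac = np.linspace(0.0, 1.0, n_samples)  # hypothetical mixing fraction
    data = np.outer(1.0 - frac, band_a) + np.outer(frac, band_b)
    return data + noise * rng.standard_normal(data.shape)


def truncated_svd(data, rank):
    """Return the leading SVD factors and the rank-limited reconstruction."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return u[:, :rank], s[:rank], vt[:rank], approx


data = simulate_spectra()
u, s, vt, approx = truncated_svd(data, rank=2)
rel_err = np.linalg.norm(data - approx) / np.linalg.norm(data)
print("leading singular values:", np.round(s, 3))
print("relative reconstruction error:", rel_err)
```

Because the synthetic matrix is built from two bands, two singular vectors should reproduce it up to the added noise; for real data the number of retained components has to be chosen by the analyst.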

Table 1 Useful libraries for chemometrics and related fields in Python

Library        Ref.   Function
NumPy          60     Multi-dimensional array
SciPy          94     Scientific computing
pandas         146    Panel data structure
matplotlib     135    Plotting
scikit-learn   10     Machine learning
TensorFlow     116    Deep learning
pyMCR          30     Multivariate curve resolution
pyDOE2         126    Design of experiments
2Dpy           134    Two-dimensional correlation spectroscopy

Table 2 Abbreviations used in the present work

Abbreviation   Meaning
2D-COS         Two-dimensional correlation spectroscopy
ANOVA          Analysis of variance
DBSCAN         Density-based spatial clustering of applications with noise
DL             Deep learning
DoE            Design of experiments
EEM            Excitation-emission matrix
FA             Factor analysis
FFT            Fast Fourier transformation
GC             Gas chromatography
ICA            Independent component analysis
IR             Infrared
LC             Liquid chromatography
LDA            Linear discriminant analysis
MCR            Multivariate curve resolution
ML             Machine learning
MLR            Multiple linear regression
MS             Mass spectrometry
NMF            Non-negative matrix factorization
NN             Neural network
OLS            Ordinary least squares
PARAFAC        Parallel factor analysis
PCA            Principal component analysis
PLS            Partial least squares
RF             Random forest
RMSE           Root-mean-square error
SEIRA          Surface-enhanced infrared absorption
SVD            Singular value decomposition
SVM            Support vector machine
TG             Thermogravimetry
Vis            Visible

Dimensionality reduction by PCA has been widely applied to data analysis in analytical chemistry, such as data explanation,81,82 discriminant analysis,83 regression analysis,84 noise reduction,85 outlier detection,86 etc. Figure 1 shows an example of a Python program using the PCA class in the sklearn.decomposition module. This program was written in Jupyter Notebook running through Anaconda. In [1]: three libraries were imported. In [2]: a data set in csv format of temperature-dependent infrared spectra of poly(vinyl alcohol)87,88 was loaded. Here, the data structure used in the 2DShige89 software was applied. In [3]: PCA was performed assuming two components. In [4]: the loadings for the first (blue) and second (orange) principal components were plotted. In [5]: a score-score plot between the first and second principal components was depicted. In this case, the first and second principal components were mainly attributed to the sample in an amorphous phase and that in a crystalline phase, respectively. The temperature-dependent spectral variation was visually traced in the 2D scores plot, i.e., the melting temperature of the sample was clearly identified as a folding point in the plot.

Fig. 1 An example of Python programming. A result of PCA using temperature-dependent infrared spectra of poly(vinyl alcohol).

As shown in Fig. 1, Python programming is simple and easy to understand. Almost all important and essential items for any type of programming are included in Python. Therefore, this programming language is often described as “batteries included”.
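The listing in Fig. 1 is not reproduced in the text, so the following is only a rough sketch of a comparable workflow with the PCA class of sklearn.decomposition. It uses synthetic two-state spectra in place of the poly(vinyl alcohol) data set; the band positions, the transition temperature of 200 and the helper name make_two_state_spectra are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA


def make_two_state_spectra(n_temps=30, n_channels=300, seed=0):
    """Synthetic temperature-dependent spectra of a two-state system."""
    rng = np.random.default_rng(seed)
    wavenumber = np.linspace(1000.0, 1200.0, n_channels)  # hypothetical axis
    crystalline = np.exp(-((wavenumber - 1140.0) ** 2) / (2 * 8.0 ** 2))
    amorphous = np.exp(-((wavenumber - 1090.0) ** 2) / (2 * 15.0 ** 2))
    temperature = np.linspace(30.0, 250.0, n_temps)
    # Sigmoidal loss of the crystalline band around a made-up transition at 200
    frac_cryst = 1.0 / (1.0 + np.exp((temperature - 200.0) / 10.0))
    spectra = np.outer(frac_cryst, crystalline) + np.outer(1.0 - frac_cryst, amorphous)
    spectra += 0.005 * rng.standard_normal(spectra.shape)
    return temperature, wavenumber, spectra


temperature, wavenumber, spectra = make_two_state_spectra()

pca = PCA(n_components=2)            # two components, as in the description of Fig. 1
scores = pca.fit_transform(spectra)  # one row of scores per temperature
loadings = pca.components_           # one row of loadings per component

print("explained variance ratio:", pca.explained_variance_ratio_)
```

The loadings can be plotted against wavenumber and the two score columns against each other with matplotlib. Note that PCA mean-centers the data, so for this idealized two-state example nearly all of the variance falls on the first component.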

3 Clustering

Clustering, or cluster analysis,90 is a typical unsupervised learning method for finding previously unknown patterns in data without pre-existing labels. There are two different types of clustering, i.e., hierarchical clustering and non-hierarchical clustering. Several models for clustering, e.g., Ward’s method,91 k-means,92 density-based spatial clustering of applications with noise (DBSCAN),93 etc., are implemented in sklearn.cluster. Hierarchical clustering is generally visualized as a dendrogram, which is obtained with scipy.cluster.hierarchy.dendrogram in the SciPy94 library for scientific computing. Many practical data analyses using hierarchical clustering, e.g., of lake water,95 adhesive tapes,96 sheet glasses,97 etc., have been reported in the field of analytical chemistry for environmental, industrial and forensic applications.
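The hierarchical clustering described above can be sketched with SciPy as follows. The two groups of samples are synthetic and their size, spread and separation are arbitrary assumptions; dendrogram is called with no_plot=True so that only the layout is computed and no figure window is required.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)

# Two hypothetical groups of samples, each described by four measured variables
group_a = rng.normal(loc=0.0, scale=0.3, size=(10, 4))
group_b = rng.normal(loc=3.0, scale=0.3, size=(10, 4))
samples = np.vstack([group_a, group_b])

# Agglomerative clustering with Ward's method
tree = linkage(samples, method="ward")

# Cut the tree into two flat clusters
labels = fcluster(tree, t=2, criterion="maxclust")

# Compute the dendrogram layout without drawing it
layout = dendrogram(tree, no_plot=True)

print("cluster labels:", labels)
print("leaf order:", layout["ivl"])
```

Dropping no_plot=True draws the dendrogram on the current matplotlib axes.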

4 Classification

Classification,98 especially supervised classification, identifies category membership assuming two or more classes, i.e., binary classification or multiclass classification. Many algorithms have been applied to the problem, e.g., linear discriminant analysis (LDA),99,100 kernel approximation,101 k-nearest neighbors,102 naive Bayes,103 support vector machine (SVM),104–107 random forest (RF),108,109 neural network (NN),110–113 deep learning (DL),12,114,115 etc. These algorithms are mainly included in scikit-learn. Several libraries for DL are independently available, e.g., TensorFlow,116 Chainer,117 etc.

In the field of analytical chemistry, discriminant analysis based on not only conventional LDA but also the advanced algorithms has been reported for many kinds of samples, e.g., DNA,26 phospholipids,118 cancer cells,119 etc.
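A minimal sketch of supervised binary classification with two of the algorithms listed above, LDA and SVM, is given below. The two classes are synthetic, and the class means, the 70/30 train/test split and the RBF kernel are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two hypothetical classes of samples with five measured variables each
class_0 = rng.normal(loc=0.0, scale=1.0, size=(60, 5))
class_1 = rng.normal(loc=2.0, scale=1.0, size=(60, 5))
X = np.vstack([class_0, class_1])
y = np.array([0] * 60 + [1] * 60)

# Hold out 30% of the samples to estimate the classification accuracy
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(kernel="rbf"),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # fraction classified correctly

print(accuracy)
```

Other scikit-learn classifiers share the same fit and score interface, so they can be added to the models dictionary without changing the loop.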

5 Regression

Regression analysis based on a calibration curve, i.e., the relationship between an instrument response and known chemical information such as the concentration of an analyte, is one of the most important data analyses in the field of analytical chemistry for quantitative determination. Curve fitting using a linear or nonlinear function is performed with scipy.optimize.curve_fit.
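As a minimal sketch of a calibration curve fitted with scipy.optimize.curve_fit, the example below fits a straight line to simulated standards and then inverts it for an unknown sample. The concentrations, the true slope and intercept, the noise level and the function name linear_response are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def linear_response(conc, slope, intercept):
    """Model of the instrument response as a straight line."""
    return slope * conc + intercept


# Hypothetical standards and simulated responses with a little noise
conc = np.array([0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
rng = np.random.default_rng(0)
signal = linear_response(conc, 0.12, 0.02) + 0.002 * rng.standard_normal(conc.size)

# Least-squares fit; cov is the covariance matrix of the fitted parameters
params, cov = curve_fit(linear_response, conc, signal)
slope, intercept = params
std_err = np.sqrt(np.diag(cov))

# Invert the calibration curve for a hypothetical unknown sample
unknown_signal = 0.50
unknown_conc = (unknown_signal - intercept) / slope

print("slope, intercept:", slope, intercept)
print("standard errors:", std_err)
print("estimated concentration:", unknown_conc)
```

A nonlinear calibration can be fitted in the same way by replacing linear_response with another model function and, where needed, passing initial parameter guesses through the p0 argument.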

However, it sometimes becomes difficult to determine the chemical information using only one variable, i.e., single regression analysis, due to multicollinearity, i.e., intercorrelations among the independent variables. In that case, a statistical or empirical approach for the regression is generally applied. Multiple linear regression (MLR) or ordinary least squares (OLS) is calculated using sklearn.linear_model. Cross decomposition by PLS, i.e., projection of the predicted and observed variables to a new space, is calculated using sklearn.cross_decomposition. Linear or nonlinear regression based on SVM is also calculated using sklearn.svm. Regression validation such as cross validation is performed using sklearn.model_selection. Model evaluation metrics such as the root-mean-square error (RMSE), the coefficient of determination R2, etc. are calculated using sklearn.metrics. Many applications of multivariate regression have been reported, e.g., sugar content in sugarcane,120 milk composition,121 mechanical properties of wood,122 etc.

6 Others

As described above, many methods in chemometrics overlap with those in ML with or without labeled data, i.e., supervised learning or unsupervised learning. Other important and well-established techniques in chemometrics and related fields, excluding the family of ML, are described below.

MCR decomposes a large data matrix into several pure components and their response profiles. As described above, the pyMCR library for MCR in Python is available and can be installed via pip. Not only series of 2D data32,33 such as time-dependent spectra but also hyperspectral 3D data123,124 have recently been analyzed by MCR.
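To illustrate the idea of resolving a data matrix into concentration profiles and pure spectra, the following is a simplified alternating least squares sketch written with NumPy only. It is not the pyMCR implementation and does not use its API; the function name mcr_als, the clipping used as a non-negativity constraint, the simulated first-order reaction and the choice of the first and last spectra as the initial guess are all assumptions.

```python
import numpy as np


def mcr_als(data, initial_spectra, n_iter=50):
    """Alternate least-squares updates of C and S with non-negativity clipping.

    data is approximated as conc @ spectra, with conc of shape
    (n_samples, n_components) and spectra of shape (n_components, n_channels).
    """
    spectra = initial_spectra.copy()
    for _ in range(n_iter):
        # Solve for the concentration profiles, then clip negatives to zero
        conc = np.linalg.lstsq(spectra.T, data.T, rcond=None)[0].T
        conc = np.clip(conc, 0.0, None)
        # Solve for the pure spectra, then clip negatives to zero
        spectra = np.linalg.lstsq(conc, data, rcond=None)[0]
        spectra = np.clip(spectra, 0.0, None)
    return conc, spectra


# Synthetic time-dependent spectra of a hypothetical reaction A -> B
axis = np.linspace(0.0, 1.0, 150)
pure = np.vstack([
    np.exp(-((axis - 0.35) ** 2) / (2 * 0.06 ** 2)),
    np.exp(-((axis - 0.65) ** 2) / (2 * 0.06 ** 2)),
])
time = np.linspace(0.0, 5.0, 30)
true_conc = np.column_stack([np.exp(-time), 1.0 - np.exp(-time)])
rng = np.random.default_rng(0)
data = true_conc @ pure + 0.005 * rng.standard_normal((time.size, axis.size))

# Use the first and last measured spectra as the initial guess
guess = np.clip(data[[0, -1]], 0.0, None)
conc, spectra = mcr_als(data, guess)

residual = np.linalg.norm(data - conc @ spectra) / np.linalg.norm(data)
print("relative residual:", residual)
```

MCR solutions are generally not unique, so the resolved profiles depend on the initial guess and on the constraints applied; dedicated libraries offer further constraints and convergence checks.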

Design of experiments (DoE),125 or experimental design, is a practical technique for optimizing the performance of experiments with known input factors. The Python library for DoE, pyDOE2,126 can be installed via pip. This technique has been applied to the optimization of reaction conditions for derivatization,127 the preparation of plasmonic substances for surface-enhanced infrared absorption (SEIRA) spectroscopy,128 data processing for LC-MS metabolomics,129 etc.
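As a small illustration of what an experimental design looks like as data, the sketch below enumerates a full factorial design using only the Python standard library. It is a stand-in for, not an example of, the pyDOE2 API; the function name full_factorial and the factors with their levels are hypothetical.

```python
from itertools import product


def full_factorial(levels):
    """Return every combination of factor levels as a list of dictionaries."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical factors for a derivatization reaction
factors = {
    "temperature_C": [40, 60, 80],
    "time_min": [10, 30],
    "reagent_mM": [1.0, 5.0],
}

design = full_factorial(factors)
print("number of runs:", len(design))
for run in design[:3]:
    print(run)
```

With three, two and two levels the design contains 3 x 2 x 2 = 12 runs. Fractional and response-surface designs reduce the number of runs and are the kind of design a dedicated DoE library is intended to generate.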

Two-dimensional correlation spectroscopy (2D-COS) provides correlation maps spread between two independent variable axes.130–133 For a spectral data set obtained under a certain perturbation, such as temperature, concentration, time, etc., generalized 2D-COS was proposed by Noda.130 A Python program for generalized 2D-COS, 2Dpy,134 was developed by Morita and is available on GitHub. Figure 2 shows 2D correlation spectra constructed from the temperature-dependent spectra shown in Fig. 1, calculated using the 2Dpy software. As shown in Fig. 2, publication-quality figures are plotted with the support of the matplotlib135 plotting library for Python. This method has been applied to the analysis of the ethanol fermentation process,136,137 hydration structure in cellulose,81 phase separation behavior of polymer solution,138 etc.
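The calculation behind generalized 2D-COS can be sketched with NumPy as follows, using the mean spectrum as the reference and the Hilbert-Noda transformation matrix for the asynchronous map. This is an independent minimal sketch and not the 2Dpy code; the function name generalized_2dcos and the synthetic two-band data are assumptions.

```python
import numpy as np


def generalized_2dcos(spectra):
    """Synchronous and asynchronous maps from spectra of shape (m, n).

    Rows are spectra measured at m evenly spaced perturbation values.
    """
    m = spectra.shape[0]
    dynamic = spectra - spectra.mean(axis=0)  # mean-centered dynamic spectra

    # Hilbert-Noda matrix: 0 on the diagonal, 1 / (pi * (k - j)) elsewhere
    j, k = np.indices((m, m))
    diff = k - j
    noda = np.zeros((m, m))
    noda[diff != 0] = 1.0 / (np.pi * diff[diff != 0])

    sync = dynamic.T @ dynamic / (m - 1)
    asyn = dynamic.T @ noda @ dynamic / (m - 1)
    return sync, asyn


# Synthetic data: one band decays while another grows with a different profile
axis = np.linspace(0.0, 1.0, 120)
band_1 = np.exp(-((axis - 0.3) ** 2) / (2 * 0.04 ** 2))
band_2 = np.exp(-((axis - 0.7) ** 2) / (2 * 0.04 ** 2))
perturbation = np.linspace(0.0, 1.0, 25)
spectra = np.outer(np.exp(-3.0 * perturbation), band_1) + np.outer(perturbation ** 2, band_2)

sync, asyn = generalized_2dcos(spectra)
i1 = np.argmin(np.abs(axis - 0.3))
i2 = np.argmin(np.abs(axis - 0.7))
print("synchronous cross peak:", sync[i1, i2])
print("asynchronous cross peak:", asyn[i1, i2])
```

The synchronous map is symmetric and the asynchronous map is antisymmetric about the diagonal. Both can be drawn as contour maps, for example with matplotlib's contour function.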

Statistical hypothesis tests, e.g., Student’s t-test,139,140 analysis of variance (ANOVA),141–143 etc., are calculated using scipy.stats. Data smoothing and derivatives based on the Savitzky–Golay filter144 are calculated using scipy.signal.savgol_filter. Fast Fourier transformation (FFT)145 is calculated using numpy.fft.fft.
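The three functions named above can be tried out as in the following minimal sketch. All numerical values (replicate measurements, band width, noise level, filter window, sampling rate) are made up for illustration.

```python
import numpy as np
from scipy import stats
from scipy.signal import savgol_filter

# Student's t-test on two hypothetical sets of replicate measurements
batch_a = np.array([10.1, 9.9, 10.0, 10.2, 9.8, 10.0])
batch_b = np.array([10.5, 10.4, 10.6, 10.3, 10.7, 10.5])
t_stat, p_value = stats.ttest_ind(batch_a, batch_b)

# Savitzky-Golay smoothing and first derivative of a noisy band
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 201)
clean = np.exp(-(x ** 2) / (2 * 0.2 ** 2))
noisy = clean + 0.02 * rng.standard_normal(x.size)
smoothed = savgol_filter(noisy, window_length=15, polyorder=3)
first_deriv = savgol_filter(
    noisy, window_length=15, polyorder=3, deriv=1, delta=x[1] - x[0]
)

# FFT of a 5 Hz sine wave sampled at 100 Hz for 2 s
fs = 100.0
t = np.arange(200) / fs
wave = np.sin(2 * np.pi * 5.0 * t)
spectrum = np.fft.fft(wave)
freqs = np.fft.fftfreq(t.size, d=1.0 / fs)
half = t.size // 2  # keep only the non-negative frequencies
dominant = freqs[np.argmax(np.abs(spectrum[:half]))]

print("t statistic and p value:", t_stat, p_value)
print("dominant frequency in Hz:", dominant)
```

The window length and polynomial order of the Savitzky-Golay filter control the trade-off between noise suppression and band distortion, so they should be adapted to the band widths in the data at hand.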


7 Summary

The usefulness of Python in chemistry, especially analytical chemistry, has been reviewed. Many practical tools for ML are included in scikit-learn, the ML library for Python. Other tools for chemometrics and related computations can also be obtained from GitHub and/or via pip. Therefore, a computational environment for data analysis in instrumental analysis can be easily constructed free of charge.

8 References

1. Python.org, www.python.org.
2. G. van Rossum and Python Dev Team, “Python 3.6 Tutorial”, 2016, Samurai Media Limited, Hong Kong.
3. J. Unpingco, “Python for Signal Processing”, 2016, Springer, Heidelberg.
4. A. B. Downey, “Think DSP”, 2016, O’Reilly Media, Sebastopol.
5. J. VanderPlas, “Python Data Science Handbook”, 2016, O’Reilly Media, Sebastopol.
6. W. McKinney, “Python for Data Analysis”, 2017, O’Reilly Media, Sebastopol.
7. R. Mitchell, “Web Scraping with Python”, 2018, O’Reilly Media, Sebastopol.
8. GitHub, github.com.
9. B. Beer, “Introducing GitHub”, 2018, O’Reilly Media, Sebastopol.
10. scikit-learn, scikit-learn.org.
11. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, J. Mach. Learn. Res., 2011, 12, 2825.
12. A. Géron, “Hands-On Machine Learning with Scikit-Learn and TensorFlow”, 2017, O’Reilly Media, Sebastopol.
13. K. V. Mardia, J. T. Kent, and J. M. Bibby, “Multivariate Analysis”, 3rd ed., 1980, Academic Press, San Diego.
14. A. C. Rencher, “Methods of Multivariate Analysis”, 3rd ed., 2012, Wiley, Hoboken.
15. M. Otto, “Chemometrics: Statistics and Computer Application in Analytical Chemistry”, 3rd ed., 2016, Wiley-VCH, Weinheim.
16. R. G. Brereton, “Chemometrics: Data Driven Extraction for Science”, 2nd ed., 2018, Wiley, Hoboken.
17. H. Mark and J. Jerry Workman, “Chemometrics in Spectroscopy”, 2018, Academic Press, Burlington.
18. Y. Morisawa, Anal. Sci., 2019, 35, 833.
19. K. Pearson, Philosophical Magazine, 1901, 2, 559.
20. I. T. Jolliffe, “Principal Component Analysis”, 2002, Springer, Heidelberg.
21. N. Shioya, T. Shimoaka, and T. Hasegawa, Anal. Sci., 2017, 33, 117.
22. N. Wijit, S. Prasitwattanaseree, S. Mahatheeranont, P. Wolschann, S. Jiranusornkul, and P. Nimmanpipug, Anal. Sci., 2017, 33, 1211.
23. X.-F. Gao, Y. Xiao, and Y. Dai, Anal. Sci., 2018, 34, 1067.
24. S. Wold, M. Sjöström, and L. Eriksson, Chemom. Intell. Lab. Syst., 2001, 58, 109.
25. R. Tanaka, N. Takahashi, Y. Nakamura, Y. Hattori, K. Ashizawa, and M. Otsuka, Anal. Sci., 2017, 33, 41.
26. S. Kasemsumran, N. Suttiwijitpukdee, and V. Keeratinijakal, Anal. Sci., 2017, 33, 111.
27. M. Li, L. Zhang, X. Yao, and X. Jiang, Anal. Sci., 2017, 33, 1225.
28. M. F. Barbosa, D. S. d. Nascimento, M. Grünhut, H. V. Dantas, B. S. F. Band, M. C. U. d. Araújo, and M. Insausti, Anal. Sci., 2017, 33, 1285.
29. Y. Chen and L. Dai, Anal. Sci., 2019, 35, 511.
30. pyMCR, github.com/usnistgov/pyMCR.
31. C. H. Camp, J. Res. Natl. Inst. Stand. Technol., 2019, 124, 1.
32. A. Tanabe, S. Morita, M. Tanaka, and Y. Ozaki, Appl. Spectrosc., 2008, 62, 46.
33. A. Uda, S. Morita, and Y. Ozaki, Polymer, 2013, 54, 2130.
34. C. Ruckebusch and L. Blanchet, Anal. Chim. Acta, 2013, 765, 28.
35. A. de Juan, J. Jaumot, and R. Tauler, Anal. Methods, 2014, 6, 4964.
36. H. Noothalapati, K. Iwasaki, and T. Yamamoto, Anal. Sci., 2017, 33, 15.
37. H. Yin, L. Zou, Y. Sheng, X. Bai, Q. Liu, and B. Yan, Anal. Sci., 2018, 34, 207.
38. PyPI, pypi.org.
39. Anaconda, www.anaconda.com.
40. D. Y. Yan and J. Yan, “Hands-On Data Science with Anaconda”, 2018, Packt Publishing, Birmingham.
41. Project Jupyter, jupyter.org.
42. D. Toomey, “Learning Jupyter”, 2016, Packt Publishing, Birmingham.
43. Google Colaboratory, colab.research.google.com.
44. Microsoft Visual Studio, visualstudio.microsoft.com.

Fig. 2 Synchronous (upper) and asynchronous (lower) 2D correlation spectra constructed from the temperature-dependent infrared spectra of poly(vinyl alcohol) shown in Fig. 1.


45. M. Sabia and C. Wang, “Python Tools for Visual Studio”, 2014, Packt Publishing, Birmingham.
46. Choosing the right estimator, scikit-learn.org/stable/tutorial/machine_learning_map/.
47. Scikit-Learn Cheat Sheet: Python Machine Learning, www.datacamp.com/community/blog/scikit-learn-cheat-sheet.
48. MATLAB, www.mathworks.com.
49. G. Ciaburro, “MATLAB for Machine Learning”, 2017, Packt Publishing, Birmingham.
50. R, www.r-project.org.
51. R. Wehrens, “Chemometrics with R”, 2011, Springer, Heidelberg.
52. T. Adzuhata, J. Inotsume, T. Okamura, R. Kikuchi, T. Ozeki, M. Kajikawa, and N. Ogawa, Anal. Sci., 2001, 17, 71.
53. A. Hyvärinen and E. Oja, Neural Networks, 2000, 13, 411.
54. A. Hyvärinen, J. Karhunen, and E. Oja, “Independent Component Analysis”, 2001, Wiley-Interscience, New York.
55. D. D. Lee and H. S. Seung, Nature, 1999, 401, 788.
56. H.-T. Gao, T.-H. Li, K. Chen, W.-G. Li, and X. Bi, Talanta, 2005, 66, 65.
57. K. Neymeyr, M. Sawall, and D. Hess, J. Chemom., 2010, 24, 67.
58. B. Yousefi, S. Sojasi, C. I. Castanedo, X. P. Maldague, G. Beaudoin, and M. Chamberland, Appl. Opt., 2018, 57, 6219.
59. G. Strang, “Introduction to Linear Algebra”, 5th ed., 2016, Wellesley-Cambridge Press.
60. NumPy, numpy.org.
61. N. K. M. Faber, R. Bro, and P. K. Hopke, Chemom. Intell. Lab. Syst., 2003, 65, 119.
62. A. Quatela, A. M. Gilmore, K. E. S. Gall, M. Sandros, K. Csatorday, A. Siemiarczuk, B. B. Yang, and L. Camenen, Methods Appl. Fluoresc., 2018, 6, 1.
63. scikit-tensor-py3, github.com/evertrol/scikit-tensor-py3.
64. J. C. Hoggard and R. E. Synovec, Anal. Chem., 2007, 79, 1611.
65. K. Shigeta, H. Tao, K. Nakagawa, T. Kondo, and T. Nakazato, Anal. Sci., 2018, 34, 227.
66. Y. Horie, A. Goto, S. Tsubuku, M. Itoh, S. Ikegawa, S. Ogawa, and T. Higashi, Anal. Sci., 2019, 35, 427.
67. D. Bylund, R. Danielsson, G. Malmquist, and K. E. Markides, J. Chromatogr., 2002, 961, 237.
68. T. Toyo’oka, Anal. Sci., 2017, 33, 555.
69. K.-i. Ohno, T. Hasegawa, T. Tamura, H. Utsumi, and K. Yamashita, Anal. Sci., 2018, 34, 1017.
70. B. Schmidt, J. W. Jaroszewski, R. Bro, M. Witt, and D. Stærk, Anal. Chem., 2008, 80, 1978.
71. Y. Li, R. Guo, S. Liu, A. He, Y. Bao, S. Weng, Y. Huang, Y. Xu, Y. Ozaki, and I. Noda, Anal. Sci., 2017, 33, 105.
72. S. Liu, X. Zhang, R. Guo, Y. Wei, I. Noda, Y. Ozaki, Y. Xu, and J. Wu, Anal. Sci., 2018, 34, 1351.
73. J. Ferrasse, S. Chavez, P. Arlabosse, and N. Dupuy, Thermochim. Acta, 2003, 404, 97.
74. C. Vogel, S. Morita, H. Sato, I. Noda, Y. Ozaki, and H. W. Siesler, Appl. Spectrosc., 2007, 61, 755.
75. R. Xiao, H.-L. Wu, Y. Hu, X.-L. Yin, H.-W. Gu, Z. Liu, T. Wang, X.-D. Sun, and R.-Q. Yu, Anal. Sci., 2017, 33, 29.
76. C. Qian, L.-F. Wang, W. Chen, Y.-S. Wang, X.-Y. Liu, H. Jiang, and H.-Q. Yu, Anal. Chem., 2017, 89, 4264.
77. M. Kamruzzaman, G. ElMasry, D.-W. Sun, and P. Allen, Anal. Chim. Acta, 2012, 714, 57.
78. R. Vejarano, R. Siche, and W. Tesfaye, Int. J. Food Prop., 2017, 20, 1264.
79. H. Yabe, N. Katayama, and M. Miyazawa, Anal. Sci., 2017, 33, 121.
80. K. Hara, T.-a. Yano, K. Suzuki, M. Hirayama, T. Hayashi, R. Kanno, and M. Hara, Anal. Sci., 2017, 33, 853.
81. A. Watanabe, S. Morita, and Y. Ozaki, Appl. Spectrosc., 2006, 60, 1054.
82. A. Watanabe, S. Morita, S. Kokot, M. Matsubara, K. Fukai, and Y. Ozaki, J. Mol. Struct., 2006, 799, 102.
83. H. Shinzawa, S. Morita, Y. Ozaki, and R. Tsenkova, Appl. Spectrosc., 2006, 60, 884.
84. T. Næs and H. Martens, J. Chemom., 1988, 2, 155.
85. Y. M. Jung, Vib. Spectrosc., 2004, 36, 267.
86. T. Chen, E. Martin, and G. Montague, Comput. Stat. Data Anal., 2009, 53, 3706.
87. S. Morita, H. Shinzawa, I. Noda, and Y. Ozaki, Appl. Spectrosc., 2006, 60, 398.
88. S. Morita, K. Kitagawa, I. Noda, and Y. Ozaki, J. Mol. Struct., 2008, 883, 181.
89. 2DShige, sites.google.com/view/shigemorita/home/2dshige.
90. A. K. Jain, M. N. Murty, and P. J. Flynn, ACM Computing Surveys, 1999, 31, 264.
91. J. H. Ward Jr, J. Am. Stat. Assoc., 1963, 58, 236.
92. J. Wu, “Advances in K-means Clustering”, 2014, Springer, Heidelberg.
93. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, KDD-96 Proceedings, 1996, 96, 226.
94. SciPy, www.scipy.org.
95. H. Yamazaki, S. Gohda, K.-i. Yokota, and T. Shirasaki, Anal. Sci., 2001, 17, i1565.
96. M. Hida, H. Satoh, and T. Mitsui, Anal. Sci., 2001, 17, i1507.
97. Y. Suzuki, M. Kasamatsu, S. Suzuki, T. Nakanishi, M. Takatsu, S. Muratsu, O. Shimoda, S. Watanabe, Y. Nishiwaki, and N. Miyamoto, Anal. Sci., 2005, 21, 855.
98. G. Tripolis, Informatica, 2007, 31, 249.
99. R. A. Fisher, Ann. Eugen., 1936, 7, 179.
100. K. Ariyama, H. Horita, and A. Yasui, Anal. Sci., 2004, 20, 871.
101. T. Hofmann, B. Schölkopf, and A. J. Smola, Ann. Stat., 2008, 1171.
102. N. S. Altman, Am. Stat., 1992, 46, 175.
103. H. Sun, J. Med. Chem., 2005, 48, 4031.
104. I. Steinwart and A. Christmann, “Support Vector Machines”, 2008, Springer, Heidelberg.
105. A. Niazi, J. Zolgharnein, and S. Afiuni-Zadeh, Anal. Sci., 2007, 23, 1311.
106. Y.-P. Zhou, L. Xu, L.-J. Tang, J.-H. Jiang, G.-L. Shen, R.-Q. Yu, and Y. Ozaki, Anal. Sci., 2007, 23, 793.
107. A. A. Ensafi, M. Taei, T. Khayamian, and F. Hasanpour, Anal. Sci., 2010, 26, 803.
108. H. Chen, Z. Lin, H. Wu, L. Wang, T. Wu, and C. Tan, Spectrochim. Acta, Part A, 2015, 135, 185.
109. T. Zhang, D. Xia, H. Tang, X. Yang, and H. Li, Chemom. Intell. Lab. Syst., 2016, 157, 196.
110. S. Kito, T. Hattori, and Y. Murakami, Anal. Sci., 1991, 7, 761.
111. S. Sun, H. Huang, Y. Xu, and S. Cai, Anal. Sci., 2001, 17, a451.
112. K. Saeki, K. Funatsu, and K. Tanabe, Anal. Sci., 2003, 19, 309.
113. E. C. Ferreira, D. M. Milori, E. J. Ferreira, R. M. Da Silva, and L. Martin-Neto, Spectrochim. Acta, Part B, 2008, 63, 1216.
114. P. Mamoshina, A. Vieira, E. Putin, and A. Zhavoronkov, Mol. Pharm., 2016, 13, 1445.
115. M. Ziatdinov, O. Dyck, A. Maksov, X. Li, X. Sang, K. Xiao, R. R. Unocic, R. Vasudevan, S. Jesse, and S. V. Kalinin, ACS Nano, 2017, 11, 12742.
116. TensorFlow, www.tensorflow.org.
117. Chainer, chainer.org.
118. Z. Chen, L. Zang, Y. Wu, H. Nakayama, Y. Shimada, R. Shrestha, Y. Zhao, Y. Miura, H. Chiba, S.-P. Hui, and N. Nishimura, Anal. Sci., 2018, 34, 1201.
119. M. Mimura, S. Tomita, R. Kurita, and K. Shiraki, Anal. Sci., 2019, 35, 99.
120. E. Taira, M. Ueno, K. Saengprachatanarug, and Y. Kawamitsu, J. Near Infrared Spectrosc., 2013, 21, 281.
121. R. Tsenkova, S. Atanassova, K. Itoh, Y. Ozaki, and K. Toyoda, J. Anim. Sci., 2000, 78, 515.
122. T. Fujimoto, Y. Kurata, K. Matsumoto, and S. Tsuchikawa, J. Near Infrared Spectrosc., 2007, 16, 529.
123. C.-K. Huang, M. Ando, H.-o. Hamaguchi, and S. Shigeto, Anal. Chem., 2012, 84, 5661.
124. T. Miyasaka, T. Ikemoto, and T. Kohno, Appl. Surf. Sci., 2008, 255, 1576.
125. Z. R. Lazic, “Design of Experiments in Chemical Engineering: A Practical Guide”, 2006, Wiley-VCH, Weinheim.
126. pyDOE2, github.com/clicumu/pyDOE2.
127. T. Takayama, H. Mizuno, T. Toyo’oka, and K. Todoroki, Anal. Sci., 2019, 35, 1053.
128. V. Liberman, R. Adato, T. H. Jeys, B. G. Saar, S. Erramilli, and H. Altug, Opt. Express, 2012, 20, 11953.
129. M. Eliasson, S. Rännar, R. Madsen, M. A. Donten, E. Marsden-Edwards, T. Moritz, J. P. Shockcor, E. Johansson, and J. Trygg, Anal. Chem., 2012, 84, 6869.
130. I. Noda, Appl. Spectrosc., 1993, 47, 1329.
131. I. Noda and Y. Ozaki, “Two-Dimensional Correlation Spectroscopy: Applications in Vibrational and Optical Spectroscopy”, 2004, Wiley, Chichester.
132. I. Noda, Anal. Sci., 2007, 23, 139.
133. Y. Park, S. Jin, I. Noda, and Y. M. Jung, J. Mol. Struct., 2018, 1168, 1.
134. 2Dpy, github.com/shigemorita/2Dpy.
135. matplotlib, matplotlib.org.
136. T. Nishii, S. Morita, T. Genkawa, M. Watari, D. Ishikawa, and Y. Ozaki, Appl. Spectrosc., 2015, 69, 665.
137. T. Nishii, T. Genkawa, M. Watari, and Y. Ozaki, Anal. Sci., 2012, 28, 1165.
138. W. Gu and P. Wu, Anal. Sci., 2007, 23, 823.
139. Student, Biometrika, 1908, 1.
140. J. Nakanishi, K. Sugiyama, H. Matsuo, Y. Takahashi, S. Omura, and T. Nakashima, Anal. Sci., 2019, 35, 65.
141. L. Dolatyari, M. R. Yaftian, S. Rostamnia, and M. S. Seyeddorraji, Anal. Sci., 2017, 33, 769.
142. X. Zhang, F. Ji, Y. Li, T. He, Y. Han, D. Wang, Z. Lin, and S. Chen, Anal. Sci., 2018, 34, 407.
143. Y. Zhu, Y. Kitamaki, and M. Numata, Anal. Sci., 2017, 33, 209.
144. A. Savitzky and M. J. Golay, Anal. Chem., 1964, 36, 1627.
145. M. T. Heideman, D. H. Johnson, and C. S. Burrus, Arch. Hist. Exact Sci., 1985, 34, 265.
146. pandas, pandas.pydata.org.