INTRODUCTION TO PYTHON
Gregor von Laszewski
(c) Gregor von Laszewski, 2018, 2019
dsc.soic.indiana.edu/publications/CloudComputing-Python.pdf
1 PREFACE
1.1 Disclaimer
1.1.1 Acknowledgment
1.1.2 Extensions
2 INTRODUCTION
2.1 Introduction to Python
2.1.1 References
3 INSTALLATION
3.1 Python 3.7.4 Installation
3.1.1 Hardware
3.1.2 Prerequisites Ubuntu 19.04
3.1.3 Prerequisites macOS
3.1.3.1 Installation from Apple App Store
3.1.3.2 Installation from python.org
3.1.3.3 Installation from Homebrew
3.1.4 Prerequisites Ubuntu 18.04
3.1.5 Prerequisite Windows 10
3.1.5.1 Linux Subsystem Install
3.1.6 Prerequisite venv
3.1.7 Install Python 3.7 via Anaconda
3.1.7.1 Download conda installer
3.1.7.2 Install conda
3.1.7.3 Install Python 3.7.4 via conda
3.2 Multi-Version Python Installation
3.2.1 Disabling wrong python installs
3.2.2 Managing 2.7 and 3.7 Python Versions without Pyenv
3.2.3 Managing Multiple Python Versions with Pyenv
3.2.3.1 Installation pyenv via Homebrew
3.2.3.2 Install pyenv on Ubuntu 18.04
3.2.3.3 Using pyenv
3.2.3.3.1 Using pyenv to Install Different Python Versions
3.2.3.3.2 Switching Environments
3.2.3.4 Updating Python Version List
3.2.3.4.1 Updating to a new version of Python with pyenv
3.2.4 Anaconda and Miniconda and Conda
3.2.4.1 Miniconda
3.2.4.2 Anaconda
3.2.5 Exercises
4 FIRST STEPS
4.1 Interactive Python
4.1.1 REPL (Read Eval Print Loop)
4.1.2 Interpreter
4.1.3 Python 3 Features in Python 2
4.2 Editors
4.2.1 Pycharm
4.2.2 Python in 45 minutes
4.3 Google Colab
4.3.1 Introduction to Google Colab
4.3.2 Programming in Google Colab
4.3.3 Benchmarking in Google Colab with Cloudmesh
5 LANGUAGE
5.1 Language
5.1.1 Statements and Strings
5.1.2 Comments
5.1.3 Variables
5.1.4 Data Types
5.1.4.1 Booleans
5.1.4.2 Numbers
5.1.5 Module Management
5.1.5.1 Import Statement
5.1.5.2 The from … import Statement
5.1.6 Date Time in Python
5.1.7 Control Statements
5.1.7.1 Comparison
5.1.7.2 Iteration
5.1.8 Datatypes
5.1.8.1 Lists
5.1.8.2 Sets
5.1.8.3 Removal and Testing for Membership in Sets
5.1.8.4 Dictionaries
5.1.8.5 Dictionary Keys and Values
5.1.8.6 Counting with Dictionaries
5.1.9 Functions
5.1.10 Classes
5.1.11 Modules
5.1.12 Lambda Expressions
5.1.12.1 map
5.1.12.2 dictionary
5.1.13 Iterators
5.1.14 Generators
5.1.14.1 Generators with function
5.1.14.2 Generators using for loop
5.1.14.3 Generators with List Comprehension
5.1.14.4 Why to use Generators?
6 CLOUDMESH
6.1 Introduction
6.2 Installation
6.2.1 Prerequisite
6.2.2 Basic Install
6.3 Output
6.3.1 Console
6.3.2 Banner
6.3.3 Heading
6.3.4 VERBOSE
6.3.5 Using print and pprint
6.4 Dictionaries
6.4.1 Dotdict
6.4.2 FlatDict
6.4.3 Printing Dicts
6.5 Shell
6.6 StopWatch
6.7 Cloudmesh Command Shell
6.7.1 CMD5
6.7.1.1 Resources
6.7.1.2 Installation from source
6.7.1.3 Execution
6.7.1.4 Create your own Extension
6.7.1.5 Bug: Quotes
6.8 Exercises
6.8.1 Cloudmesh Common
6.8.2 Cloudmesh Shell
7 LIBRARIES
7.1 Python Modules
7.1.1 Updating Pip
7.1.2 Using pip to Install Packages
7.1.3 GUI
7.1.3.1 GUIZero
7.1.3.2 Kivy
7.1.4 Formatting and Checking Python Code
7.1.5 Using autopep8
7.1.6 Writing Python 3 Compatible Code
7.1.7 Using Python on FutureSystems
7.1.8 Ecosystem
7.1.8.1 pypi
7.1.8.2 Alternative Installations
7.1.9 Resources
7.1.9.1 Jupyter Notebook Tutorials
7.1.10 Exercises
7.2 Data Management
7.2.1 Formats
7.2.1.1 Pickle
7.2.1.2 Text Files
7.2.1.3 CSV Files
7.2.1.4 Excel spreadsheets
7.2.1.5 YAML
7.2.1.6 JSON
7.2.1.7 XML
7.2.1.8 RDF
7.2.1.9 PDF
7.2.1.10 HTML
7.2.1.11 ConfigParser
7.2.1.12 ConfigDict
7.2.2 Encryption
7.2.3 Database Access
7.2.4 SQLite
7.2.4.1 Exercises
7.3 Plotting with matplotlib
7.4 DocOpts
7.5 OpenCV
7.5.1 Overview
7.5.2 Installation
7.5.3 A Simple Example
7.5.3.1 Loading an image
7.5.3.2 Displaying the image
7.5.3.3 Scaling and Rotation
7.5.3.4 Gray-scaling
7.5.3.5 Image Thresholding
7.5.3.6 Edge Detection
7.5.4 Additional Features
7.6 Secchi Disk
7.6.1 Setup for OSX
7.6.2 Step 1: Record the video
7.6.3 Step 2: Analyse the images from the Video
7.6.3.1 Image Thresholding
7.6.3.2 Edge Detection
7.6.3.3 Black and white
8 DATA
8.1 Data Formats
8.1.1 YAML
8.1.2 JSON
8.1.3 XML
9 MONGO
9.1 MongoDB in Python
9.1.1 Cloudmesh MongoDB Usage Quickstart
9.1.2 MongoDB
9.1.2.1 Installation
9.1.2.1.1 Installation procedure
9.1.2.2 Collections and Documents
9.1.2.2.1 Collection example
9.1.2.2.2 Document structure
9.1.2.2.3 Collection Operations
9.1.2.3 MongoDB Querying
9.1.2.3.1 Mongo Queries examples
9.1.2.4 MongoDB Basic Functions
9.1.2.4.1 Import/Export functions examples
9.1.2.5 Security Features
9.1.2.5.1 Collection based access control example
9.1.2.6 MongoDB Cloud Service
9.1.3 PyMongo
9.1.3.1 Installation
9.1.3.2 Dependencies
9.1.3.3 Running PyMongo with Mongo Daemon
9.1.3.4 Connecting to a database using MongoClient
9.1.3.5 Accessing Databases
9.1.3.6 Creating a Database
9.1.3.7 Inserting and Retrieving Documents (Querying)
9.1.3.8 Limiting Results
9.1.3.9 Updating Collection
9.1.3.10 Counting Documents
9.1.3.11 Indexing
9.1.3.12 Sorting
9.1.3.13 Aggregation
9.1.3.14 Deleting Documents from a Collection
9.1.3.15 Copying a Database
9.1.3.16 PyMongo Strengths
9.1.4 MongoEngine
9.1.4.1 Installation
9.1.4.2 Connecting to a database using MongoEngine
9.1.4.3 Querying using MongoEngine
9.1.5 Flask-PyMongo
9.1.5.1 Installation
9.1.5.2 Configuration
9.1.5.3 Connection to multiple databases/servers
9.1.5.4 Flask-PyMongo Methods
9.1.5.5 Additional Libraries
9.1.5.6 Classes and Wrappers
9.2 Mongoengine
9.2.1 Introduction
9.2.2 Install and connect
9.2.3 Basics
10 OTHER
10.1 Word Count with Parallel Python
10.1.1 Generating a Document Collection
10.1.2 Serial Implementation
10.1.3 Serial Implementation Using map and reduce
10.1.4 Parallel Implementation
10.1.5 Benchmarking
10.1.6 Exercises
10.1.7 References
10.2 NumPy
10.2.1 Installing NumPy
10.2.2 NumPy Basics
10.2.3 Data Types: The Basic Building Blocks
10.2.4 Arrays: Stringing Things Together
10.2.5 Matrices: An Array of Arrays
10.2.6 Slicing Arrays and Matrices
10.2.7 Useful Functions
10.2.8 Linear Algebra
10.2.9 NumPy Resources
10.3 Scipy
10.3.1 Introduction
10.3.2 References
10.4 Scikit-learn
10.4.1 Introduction to Scikit-learn
10.4.2 Installation
10.4.3 Supervised Learning
10.4.4 Unsupervised Learning
10.4.5 Building an end-to-end pipeline for Supervised machine learning using Scikit-learn
10.4.6 Steps for developing a machine learning model
10.4.7 Exploratory Data Analysis
10.4.7.1 Bar plot
10.4.7.2 Correlation between attributes
10.4.7.3 Histogram Analysis of dataset attributes
10.4.7.4 Box plot Analysis
10.4.7.5 Scatter plot Analysis
10.4.8 Data Cleansing - Removing Outliers
10.4.9 Pipeline Creation
10.4.9.1 Defining DataFrameSelector to separate Numerical and Categorical attributes
10.4.9.2 Feature Creation / Additional Feature Engineering
10.4.10 Creating Training and Testing datasets
10.4.11 Creating pipeline for numerical and categorical attributes
10.4.12 Selecting the algorithm to be applied
10.4.12.1 Linear Regression
10.4.12.2 Logistic Regression
10.4.12.3 Decision trees
10.4.12.4 KMeans
10.4.12.5 Support Vector Machines
10.4.12.6 Naive Bayes
10.4.12.7 Random Forest
10.4.12.8 Neural networks
10.4.12.9 Deep Learning using Keras
10.4.12.10 XGBoost
10.4.13 Scikit Cheat Sheet
10.4.14 Parameter Optimization
10.4.14.1 Hyperparameter optimization/tuning algorithms
10.4.15 Experiments with Keras (deep learning), XGBoost, and SVM (SVC) compared to Logistic Regression (Baseline)
10.4.15.1 Creating a parameter grid
10.4.15.2 Implementing Grid search with models and also creating metrics from each of the model.
10.4.15.3 Results table from the Model evaluation with metrics.
10.4.15.4 ROC AUC Score
10.4.16 K-means in scikit learn.
10.4.16.1 Import
10.4.17 K-means Algorithm
10.4.17.1 Import
10.4.17.2 Create samples
10.4.17.3 Create samples
10.4.17.4 Visualize
10.4.17.5 Visualize
10.5 Dask - Random Forest Feature Detection
10.5.1 Setup
10.5.2 Dataset
10.5.3 Detecting Features
10.5.3.1 Data Preparation
10.5.4 Random Forest
10.5.5 Acknowledgement
10.6 Parallel Computing in Python
10.6.1 Multi-threading in Python
10.6.1.1 Thread vs Threading
10.6.1.2 Locks
10.6.2 Multi-processing in Python
10.6.2.1 Process
10.6.2.2 Pool
10.6.2.2.1 Synchronous Pool.map()
10.6.2.2.2 Asynchronous Pool.map_async()
10.6.2.3 Locks
10.6.2.4 Process Communication
10.6.2.4.1 Value
10.7 Dask
10.7.1 How Dask Works
10.7.2 Dask Bag
10.7.3 Concurrency Features
10.7.4 Dask Array
10.7.5 Dask DataFrame
10.7.6 Dask DataFrame Storage
10.7.7 Links
11 APPLICATIONS
11.1 Fingerprint Matching
11.1.1 Overview
11.1.2 Objectives
11.1.3 Prerequisites
11.1.4 Implementation
11.1.5 Utility functions
11.1.6 Dataset
11.1.7 Data Model
11.1.7.1 Utilities
11.1.7.1.1 Checksum
11.1.7.1.2 Path
11.1.7.1.3 Image
11.1.7.2 Mindtct
11.1.7.3 Bozorth3
11.1.7.3.1 Running Bozorth3
11.1.7.3.1.1 One-to-one
11.1.7.3.1.2 One-to-many
11.1.8 Plotting
11.1.9 Putting it all Together
11.2 NIST Pedestrian and Face Detection
11.2.0.1 Introduction
11.2.0.1.1 INRIA Person Dataset
11.2.0.1.2 HOG with SVM model
11.2.0.1.3 Ansible Automation Tool
11.2.0.2 Deployment by Ansible
11.2.0.3 Cloudmesh for Provisioning
11.2.0.4 Roles Explained for Installation
11.2.0.4.1 Server groups for Masters/Slaves by Ansible inventory
11.2.0.5 Instructions for Deployment
11.2.0.5.1 Cloning Pedestrian Detection Repository from Github
11.2.0.5.2 Ansible Playbook
11.2.0.6 OpenCV in Python
11.2.0.6.1 Import cv2
11.2.0.6.2 Image Detection
11.2.0.7 Human and Face Detection in OpenCV
11.2.0.7.1 INRIA Person Dataset
11.2.0.7.2 Face Detection using Haar Cascades
11.2.0.7.3 Face Detection Python Code Snippet
11.2.0.8 Pedestrian Detection using HOG Descriptor
11.2.0.8.1 Python Code Snippet
11.2.0.9 Processing by Apache Spark
11.2.0.9.1 Parallelize in SparkContext
11.2.0.9.2 Map Function (apply_batch)
11.2.0.9.3 Collect Function
11.2.0.10 Results for 100+ images by Spark Cluster
12 REFERENCES
1 PREFACE
Sat Nov 23 05:25:16 EST 2019
1.1 DISCLAIMER
This book has been generated with Cyberaide Bookmanager.
Bookmanager is a tool to create a publication from a number of sources on the internet. It is especially useful to create customized books, lecture notes, or handouts. Content is best integrated in markdown format as it is very fast to produce the output.
Bookmanager has been developed based on our experience over the last 3 years with a more sophisticated approach. Bookmanager takes the lessons from this approach and distributes a tool that can easily be used by others.
The following shields provide some information about it. Feel free to click on them.
pypi v0.2.28 | License Apache 2.0 | python 3.7 | format wheel | status stable | build unknown
1.1.1 Acknowledgment
If you use bookmanager to produce a document you must include the following acknowledgement.
“This document was produced with Cyberaide Bookmanager developed by Gregor von Laszewski available at https://pypi.python.org/pypi/cyberaide-bookmanager. It is in the responsibility of the user to make sure an author acknowledgement section is included in your document. Copyright verification of content included in a book is responsibility of the book editor.”
The bibtex entry is
    @Misc{www-cyberaide-bookmanager,
      author = {Gregor von Laszewski},
      title = {{Cyberaide Book Manager}},
      howpublished = {pypi},
      month = apr,
      year = 2019,
      url = {https://pypi.org/project/cyberaide-bookmanager/}
    }
1.1.2 Extensions
We are happy to discuss with you bugs, issues and ideas for enhancements. Please use the convenient github issues at
https://github.com/cyberaide/bookmanager/issues
Please do not file with us issues that relate to an editor's book. They will provide you with their own mechanism on how to correct their content.
2 INTRODUCTION
2.1 INTRODUCTION TO PYTHON
Learning Objectives
- Learn Python quickly under the assumption you know a programming language
- Work with modules
- Understand docopts and cmd
- Conduct some python examples to refresh your python knowledge
- Learn about the map function in Python
- Learn how to start subprocesses and redirect their output
- Learn more advanced constructs such as multiprocessing and Queues
- Understand why we do not use anaconda
- Get familiar with pyenv
Portions of this lesson have been adapted from the official Python Tutorial copyright Python Software Foundation.
Python is an easy to learn programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s simple syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms. The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, https://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation. The Python interpreter can be extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications.
Python is an interpreted, dynamic, high-level programming language suitable for a wide range of applications.
The philosophy of python is summarized in The Zen of Python as follows:
- Explicit is better than implicit
- Simple is better than complex
- Complex is better than complicated
- Readability counts
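The full text of The Zen of Python ships with the interpreter in the `this` module. The following is a minimal sketch, assuming CPython, where the module keeps the text ROT13-encoded in the attribute `this.s` (an implementation detail, not a documented API). It picks out the four aphorisms quoted in this section.

```python
import codecs
import contextlib
import io

# Importing `this` prints the whole Zen the first time it is imported;
# capture that output so it does not clutter the terminal.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# Assumption: CPython stores the text ROT13-encoded in this.s.
zen = codecs.decode(this.s, "rot13")

# Keep only the aphorisms that are quoted in this section.
quoted = [line for line in zen.splitlines()
          if line.startswith(("Explicit", "Simple", "Complex", "Readability"))]

for line in quoted:
    print(line)
```

In an interactive session, typing `import this` alone is enough to display the complete text.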
The main features of Python are:
- Use of indentation whitespace to indicate blocks
- Object-oriented paradigm
- Dynamic typing
- Interpreted runtime
- Garbage collected memory management
- a large standard library
- a large repository of third-party libraries
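Several of these features can be seen in a few lines. The sketch below is only an illustration; the function name `describe` and the sample values are made up for this example.

```python
from collections import Counter  # part of the large standard library


def describe(value):
    # Indentation, not braces, delimits the function body and the
    # if/else blocks.
    if isinstance(value, str):
        return "text of length " + str(len(value))
    else:
        return "number " + str(value)


x = 42                 # x is bound to an int
first = describe(x)

x = "forty-two"        # dynamic typing: the same name now refers to a str
second = describe(x)

# Counter is an example of a ready-made tool from the standard library.
counts = Counter("banana")

print(first)
print(second)
print(counts.most_common(1))
```

No compilation step is needed: saving this to a file and running it with the interpreter executes it directly.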
Python is used by many companies and is applied for web development, scientific computing, embedded applications, artificial intelligence, software development, and information security, to name a few.
The material collected here introduces the reader to the basic concepts and features of the Python language and system. After you have worked through the material you will be able to:
- use Python
- use the interactive Python interface
- understand the basic syntax of Python
- write and run Python programs
- have an overview of the standard library
- install Python libraries using pyenv for multi-python interpreter development.
We do not attempt to be comprehensive and cover every single feature, or even every commonly used feature. Instead, this material introduces many of Python’s most noteworthy features, and will give you a good idea of the language’s flavor and style. After reading it, you will be able to read and write Python modules and programs, and you will be ready to learn more about the various Python library modules.
In order to conduct this lesson you need
- A computer with Python 2.7.16 or 3.7.4
- Familiarity with command line usage
- A text editor such as PyCharm, emacs, vi or others. You should identify which works best for you and set it up.
2.1.1 References
Some important additional information can be found on the following Web pages.
- Python
- Pip
- Virtualenv
- NumPy
- SciPy
- Matplotlib
- Pandas
- pyenv
- PyCharm
Python module of the week is a Web site that provides a number of short examples on how to use some elementary python modules. Not all modules are equally useful and you should decide if there are better alternatives. However, for beginners this site provides a number of good examples.
- Python 2: https://pymotw.com/2/
- Python 3: https://pymotw.com/3/
3 INSTALLATION
3.1 PYTHON 3.7.4 INSTALLATION
Learning Objectives
- Learn how to install python.
- Find additional information about Python.
- Make sure your Computer supports Python.
In this section we explain how to install python 3.7.4 on a computer. Likely much of the code will work with earlier versions, but we do the development in Python on the newest version of python available at https://www.python.org/downloads.
3.1.1 Hardware
Python does not require any special hardware. We have installed Python not only on PCs and laptops, but also on Raspberry Pis and Lego Mindstorms.
However, there are some things to consider. If you use many programs on your desktop and run them all at the same time, you will find that even on up-to-date operating systems you quickly run out of memory. This is especially true if you use editors such as PyCharm, which we highly recommend. Furthermore, as you likely have lots of disk access, make sure to use a fast HDD or, better, an SSD.
A typical modern developer PC or laptop has 16 GB RAM and an SSD. You can certainly do python on a $35 Raspberry Pi, but you probably will not be able to run PyCharm. There are many alternative editors with a smaller memory footprint available.
3.1.2 Prerequisites Ubuntu 19.04
Python 3.7 is installed in ubuntu 19.04. Therefore, it already fulfills the prerequisites. However, we recommend that you update to the newest version of python and pip. Please visit: https://www.python.org/downloads
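Whether an installed interpreter already meets the version targeted in this section can also be checked from within Python. This is a small sketch, assuming 3.7.4 as the minimum; the names `REQUIRED` and `is_supported` are illustrative and not part of any library.

```python
import sys

# Assumption for this sketch: the minimum version used in this section.
REQUIRED = (3, 7, 4)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor, micro) version is new enough."""
    # Tuples compare element by element, so (3, 8, 0) >= (3, 7, 4) is True.
    return tuple(version_info[:3]) >= required


# Report the version of the interpreter that runs this script.
print(sys.version.split()[0], "supported:", is_supported())
```

Running it with `python3` and with `python3.7` shows quickly which command on your system points to a suitable interpreter.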
3.1.3 Prerequisites macOS
3.1.3.1 Installation from Apple App Store
You want a number of useful tools on your macOS. They are not installed by default, but are available via Xcode. First you need to install xcode from
https://apps.apple.com/us/app/xcode/id497799835
Next you need to install macOS xcode command line tools:
    $ xcode-select --install
3.1.3.2 Installation from python.org
The easiest installation of Python for cloudmesh is to use the installation from https://www.python.org/downloads. Please visit the page and follow the instructions. After this install you have python3 available from the command line.
3.1.3.3 Installation from Homebrew
An alternative installation is provided from Homebrew. To use this install method, you need to install Homebrew first, using the instructions on their web page:
    $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Then you should be able to install Python 3.7.4 using:
    $ brew install python
3.1.4 Prerequisites Ubuntu 18.04
We recommend you update your ubuntu version to 19.04 and follow the instructions for that version instead, as it is significantly easier. If you however are not able to do so, the following instructions may be helpful.
We first need to make sure that the correct version of Python 3 is installed. The default version of Python on Ubuntu 18.04 is 3.6. You can get the version with:
    $ python3 --version
If the version is not 3.7.4 or newer, you can update it as follows:
    $ sudo apt-get update
    $ sudo apt install software-properties-common
    $ sudo add-apt-repository ppa:deadsnakes/ppa
    $ sudo apt-get install python3.7 python3-dev python3.7-dev
You can then check the installed version using python3.7 --version which should be 3.7.4.
Now we will create a new virtual environment:
    $ python3.7 -m venv --without-pip ~/ENV3
Then edit the ~/.bashrc file and add the following lines at the end:
    alias ENV3="source ~/ENV3/bin/activate"
    ENV3
Now activate the virtual environment using:
    $ source ~/.bashrc
Now you can install pip for the virtual environment without conflicting with the native pip:
    $ curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
    $ python get-pip.py
    $ rm get-pip.py
3.1.5 Prerequisite Windows 10
Python 3.7 can be installed on Windows 10 using: https://www.python.org/downloads
For 3.7.4 you can go to the download page and download one of the different files for Windows.
Let us assume you choose the Web based installer, then you click on the file in the Edge browser (make sure the account you use has administrative privileges). Follow the instructions that the installer gives. It is important that you select at one point “[x] Add to Path”. There will be an empty checkbox for this that you will click on.
Once it is installed, choose a terminal and execute
    python --version
However, if you have installed conda for some reason, you need to read up on how to install python 3.7.4 in conda or identify how to run conda and python.org at the same time. We often see others giving the wrong installation instructions.
An alternative is to use python from within the Linux Subsystem. But that has some limitations and you will need to explore how to access the file system in the subsystem to have a smooth integration with your Windows host so you can for example use PyCharm.
3.1.5.1 Linux Subsystem Install
To activate the Linux Subsystem, please follow the instructions at
https://docs.microsoft.com/en-us/windows/wsl/install-win10
A suitable distribution would be
https://www.microsoft.com/en-us/p/ubuntu-1804-lts/9n9tngvndl3q?activetab=pivot:overviewtab
However, as it uses an older version of python you will have to update it.
3.1.6 Prerequisite venv
This step is highly recommended if you have not yet installed a venv for python, to make sure you are not interfering with your system python. Not using a venv could have catastrophic consequences and destroy your operating system tools if they rely on Python. The use of venv is simple. For our purposes we assume that you use the directory:
    ~/ENV3
Follow these steps first:
First cd to your home directory. Then execute
    $ python3 -m venv ~/ENV3
    $ source ~/ENV3/bin/activate
You can add at the end of your .bashrc (ubuntu) or .bash_profile (macOS) file the line
    source ~/ENV3/bin/activate
so the environment is always loaded. Now you are ready to install cloudmesh.
Check if you have the right version of python installed with
    $ python --version
To make sure you have an up to date version of pip issue the command
    $ pip install pip -U
3.1.7 Install Python 3.7 via Anaconda
3.1.7.1 Download conda installer
Miniconda is recommended here. Download an installer for Windows, macOS, and Linux from this page: https://docs.conda.io/en/latest/miniconda.html
3.1.7.2 Install conda
Follow instructions to install conda for your operating systems:
- Windows. https://conda.io/projects/conda/en/latest/user-guide/install/windows.html
- macOS. https://conda.io/projects/conda/en/latest/user-guide/install/macos.html
- Linux. https://conda.io/projects/conda/en/latest/user-guide/install/linux.html
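Whether an isolated environment such as the ~/ENV3 venv from Section 3.1.6, or a conda-managed interpreter, is actually the one in use can be checked from within Python. This is a minimal sketch; the function name `environment_kind` is made up for this example, and the conda test relies on the assumption that conda-managed prefixes (including the base installation) contain a `conda-meta` directory.

```python
import os
import sys


def environment_kind():
    """Return 'venv', 'conda', or 'system' for the running interpreter."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix keeps pointing at the base installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return "venv"
    # Assumption: conda-managed prefixes contain a conda-meta directory.
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    return "system"


print(environment_kind(), sys.prefix)
```

If this prints `system` after you have activated ENV3, the activation did not take effect in the current shell.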
3.1.7.3 Install Python 3.7.4 via conda
    $ cd ~
    $ conda create -n ENV3 python=3.7.4
    $ conda activate ENV3
    $ conda install -c anaconda pip
    $ conda deactivate
It is very important to make sure you have a newer version of pip installed. After you installed and created the ENV3 you need to activate it. This can be done with
    $ conda activate ENV3
If you like to activate it when you start a new terminal, please add this line to your .bashrc or .bash_profile
If you use zsh please add it to .zprofile instead.
3.2 MULTI-VERSION PYTHON INSTALLATION
Learning Objectives
- Understand why we need to worry about python 3.7 and 2.7
- Use pyenv to support both versions
- Understand the limitations of anaconda/conda for developers
We are living in an interesting junction point in the development of Python. In January 2019, it is encouraged that Python developers switch from python version 2.7 to python version 3.7.
However, there may be the requirement when you still need to develop code not only in python 3.7 but also in python 2.7. To facilitate this multi-python version development, the best tool we know about capable of doing so is pyenv. We will explain in this section how to install both versions with the help of pyenv.
Python is easy to install and very good instructions for most platforms can be found on the python.org Web page. We see two different versions:
- Python 2.7.16
- Python 3.7.4
To manage python modules, it is useful to have the pip package installation tool on your system.
We assume that you have a computer with python installed. The version of python however may not be the newest version. Please check with
    $ python --version
which version of python you run. If it is not the newest version, we use pyenv to install a newer version so you do not affect the default version of python from your system.
3.2.1 Disabling wrong python installs
While working with students we have seen at times that they take other classes either at universities or online that teach them how to program in python. Unfortunately, these classes often seem to neglect teaching how to properly install Python. I just recently had a student who had installed python 7 different times on his macOS machine, while another student had 3 different installations, all of which conflicted with each other as they were not set up properly, and the students did not even realize that they were using Python incorrectly on their computer due to setup issues and conflicting libraries.
We recommend that you inspect if you have files such as ~/.bashrc or ~/.bash_profile in your home directory and identify if they activate various versions of python on your computer. If so, you could try to deactivate them by commenting out the various versions with the # character at the beginning of the line, start a new terminal and see if the terminal shell still works. Then you can follow our instructions here while using an install on pyenv.
3.2.2 Managing 2.7 and 3.7 Python Versions without Pyenv
If you need to have more than one python version installed and do not want to or cannot use pyenv, we recommend you download and install python 2.7.16 and 3.7.4 from python.org (https://www.python.org/downloads/)
You can then use either python2 or python3 to invoke the python interpreter.
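To see which of the two commands are actually available on a machine, and which versions they report, a short script can query them. This is a sketch only; the function name `interpreter_versions` is illustrative, and the result depends on what is installed and on your PATH.

```python
import shutil
import subprocess


def interpreter_versions(names=("python2", "python3")):
    """Map each command name to its reported version, or None if missing."""
    result = {}
    for name in names:
        path = shutil.which(name)  # full path of the command, or None
        if path is None:
            result[name] = None
            continue
        try:
            # Python 2 prints its version to stderr, Python 3 to stdout,
            # so both streams are merged before reading.
            proc = subprocess.run(
                [path, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=10,
            )
            result[name] = proc.stdout.strip()
        except (OSError, subprocess.SubprocessError):
            result[name] = None
    return result


versions = interpreter_versions()
for command, version in versions.items():
    print(command, "->", version or "not found")
```

A command that is reported as not found is either not installed or not on the PATH of the current shell.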
3.2.3ManagingMultiplePythonVersionswithPyenv
Python has several versions that are used by the community. This includesPython2andPython3,butalldifferentmanagementofthepythonlibraries.AseachOSmayhavetheirownversionofpythoninstalled.Itisrecommendedthatyounotmodifythatversion.Insteadyoumaywanttocreatealocalizedpythoninstallation that you as a user can modify. To do that we recommend pyenv.Pyenv allows users to switch between multiple versions of Python(https://github.com/yyuu/pyenv).Tosummarize:
userstochangetheglobalPythonversiononaper-userbasis;userstoenablesupportforper-projectPythonversions;easyversionchangeswithoutcomplexenvironmentvariablemanagement;tosearchinstalledcommandsacrossdifferentpythonversions;integratewithtox(https://tox.readthedocs.io/).
Toinstallpyenvonyoursystemyoucanusethecommand
Nowyoucaninstalldifferentpythonversionsonyoursystemsuchaspython2.7and3.7withafewcommands:
To automatically access them fromyour shellwe integrate them into bash byediting the bash configuration files.Make sure that on Linux you add to the~/.bashrcfileandonmacOStothefile~/.bash_profileor.zprofile.
$curlhttps://pyenv.run|bash
$pyenvinstall3.7.4
$pyenvinstall2.7.16
$pyenvvirtualenv3.7.4ENV3
$pyenvvirtualenv2.7.16ENV2
exportPYENV_ROOT="$HOME/.pyenv"
exportPATH="$PYENV_ROOT/bin:$PATH"
exportPYENV_VIRTUALENV_DISABLE_PROMPT=1
eval"$(pyenvinit-)"
eval"$(pyenvvirtualenv-init-)"
__pyenv_version_ps1(){
localret=$?;
output=$(pyenvversion-name)
if[[!-z$output]];then
echo-n"($output)"
fi
return$ret;
}
![Page 26: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/26.jpg)
    PS1="\$(__pyenv_version_ps1)${PS1}"
    alias ENV2="pyenv activate ENV2"
    alias ENV3="pyenv activate ENV3"
    ENV3

We recommend that you do this towards the end of your file. Then look up our convenience methods to set an ALIAS and install Python 3.7.4 via pyenv.

Next we recommend to update pip:

    $ ENV2
    $ pip install pip -U
    $ ENV3
    $ pip install pip -U

3.2.3.1 Installation pyenv via Homebrew

On macOS you can also install pyenv via Homebrew. Before installing anything on your computer, make sure you have enough space. Use in the terminal the command:

    $ df -h

which gives you an overview of your file system. If you do not have enough space, please make sure you free up unused files from your drive.

On many occasions it is beneficial to use readline, as it provides nice editing features for the terminal, and xz for compression. First, make sure you have xcode installed:

    $ xcode-select --install

On Mojave you will get an error that zlib is not installed. This is because the header files are not properly installed. To install them you can say

    $ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /

Next install Homebrew, pyenv, pyenv-virtualenv and pyenv-virtualenvwrapper. Additionally install readline and some compression tools:

    $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    $ brew update
    $ brew install readline xz
To install pyenv with Homebrew, execute in the terminal:

    $ brew install pyenv pyenv-virtualenv pyenv-virtualenvwrapper

3.2.3.2 Install pyenv on Ubuntu 18.04

The following steps will install pyenv in a new Ubuntu 18.04 distribution.

Start up a terminal and execute the following commands. We recommend that you do it one command at a time so you can observe if the command succeeds:

    $ sudo apt-get update
    $ sudo apt-get install git python-pip make build-essential libssl-dev
    $ sudo apt-get install zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev
    $ sudo pip install virtualenvwrapper
    $ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
    $ git clone https://github.com/pyenv/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
    $ git clone https://github.com/yyuu/pyenv-virtualenvwrapper.git ~/.pyenv/plugins/pyenv-virtualenvwrapper
    $ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
    $ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc

You can also install pyenv using the curl command in the following way:

    $ curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash

Then install its dependencies:

    $ sudo apt-get update && sudo apt-get upgrade
    $ sudo apt-get install -y make build-essential libssl-dev
    $ sudo apt-get install -y zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev
    $ sudo apt-get install -y wget curl llvm libncurses5-dev git

Now that you have installed pyenv it is not yet activated in your current terminal. The easiest thing to do is to start a new terminal and type in:

    $ which pyenv

If you see a response, pyenv is installed and you can proceed with the next steps.

Please remember whenever you modify .bashrc or .bash_profile or .zprofile you need to start a new terminal.

3.2.3.3 Using pyenv

3.2.3.3.1 Using pyenv to Install Different Python Versions
Pyenv provides a large list of different Python versions. To see the entire list please use the command:

    $ pyenv install -l

However, for us we only need to worry about Python 2.7.16 and Python 3.7.4. You can now install different versions of Python into your local environment with the following commands:

    $ pyenv update
    $ pyenv install 2.7.16
    $ pyenv install 3.7.4

You can set the global Python default version with:

    $ pyenv global 3.7.4

Type the following to determine which version you activated:

    $ pyenv version

Type the following to determine which versions you have available:

    $ pyenv versions

To associate a specific environment name with a certain Python version, use the following commands:

    $ pyenv virtualenv 2.7.16 ENV2
    $ pyenv virtualenv 3.7.4 ENV3

In the example, ENV2 would represent Python 2.7.16 while ENV3 would represent Python 3.7.4. Often it is easier to type the alias rather than the explicit version.

3.2.3.3.2 Switching Environments

After setting up the different environments, switching between them is now easy. Simply use the following commands:

    (2.7.16) $ pyenv activate ENV2
    (ENV2) $ pyenv activate ENV3
    (ENV3) $ pyenv activate ENV2
    (ENV2) $ pyenv deactivate ENV2
    (2.7.16) $

To make it even easier, you can add the following lines to your .bash_profile or .zprofile file:

    alias ENV2="pyenv activate ENV2"
    alias ENV3="pyenv activate ENV3"

If you start a new terminal, you can switch between the different versions of Python simply by typing:

    $ ENV2
    $ ENV3

3.2.3.4 Updating Python Version List

Pyenv maintains locally a list of available Python versions. To update and see the list use the commands

    $ pyenv update
    $ pyenv install -l

You will see the updated list.

3.2.3.4.1 Updating to a new version of Python with pyenv

Naturally Python itself evolves and new versions will become available via pyenv. To use such a new version you need to first install it into pyenv. Let us assume you had an old version of Python installed onto the ENV3 environment. Then you need to execute the following steps:

    $ pyenv deactivate
    $ pyenv uninstall ENV3
    $ pyenv install 3.7.4
    $ pyenv virtualenv 3.7.4 ENV3
    $ ENV3
    $ pip install pip -U

With the pip install command, we make sure we have the newest version of pip. In case you get an error, you may have to update xcode as follows and try again:

    xcode-select --install

After you installed it you can activate it by typing ENV3. Naturally this requires that you added it to your bash environment as discussed in Section 1.1.1.8.

3.2.4 Anaconda and Miniconda and Conda
While others on the internet or in your classes may have taught you to use anaconda, we will avoid it as it has several disadvantages for developers. The reason for this is that it installs many packages that you are likely not to use. In fact, installing anaconda on your VM will waste space and time and you should look into other installs.

We do not recommend that you use anaconda or miniconda as it may interfere with your default Python interpreters and setup.

Please note that beginners to Python should use anaconda or miniconda only after they have installed pyenv and use it. For this class neither anaconda nor miniconda is required. In fact we do not recommend it. We keep this section as we know that other classes at IU may use anaconda. We are not aware if these classes teach you the right way to install it, with pyenv.

3.2.4.1 Miniconda

This section about miniconda is experimental and has not been tested. We are looking for contributors that help completing it. If you use anaconda or miniconda we recommend to manage it via pyenv.

To install miniconda you can use the following commands:

    $ mkdir ana
    $ cd ana
    $ pyenv install miniconda3-latest
    $ pyenv local miniconda3-latest
    $ pyenv activate miniconda3-latest
    $ conda create -n ana anaconda

To activate use:

    $ source activate ana

To deactivate use:

    $ source deactivate

3.2.4.2 Anaconda

This section about anaconda is experimental and has not been tested. We are looking for contributors that help completing it.

You can add anaconda to your pyenv with the following commands:

    pyenv install anaconda3-4.3.1

To switch more easily we recommend that you use the following in your .bash_profile or .zprofile file:

    alias ANA="pyenv activate anaconda3-4.3.1"

Once you have done this you can easily switch to anaconda with the command:

    $ ANA

Terminology in anaconda could lead to confusion. Thus we like to point out that the version number of anaconda is unrelated to the Python version. Furthermore, anaconda uses the term root not for the root user, but for the originating directory in which the anaconda program is installed.

In case you like to build your own conda packages at a later time we recommend that you install the conda-build package:

    $ conda install conda-build

When executing:

    $ pyenv versions

you will see, after the install completed, the anaconda versions installed:

      system
      2.7.16
      2.7.16/envs/ENV2
      3.7.4
      3.7.4/envs/ENV3
      ENV2
      ENV3
    * anaconda3-4.3.1 (set by PYENV_VERSION environment variable)

Let us now create a virtualenv for anaconda:

    $ pyenv virtualenv anaconda3-4.3.1 ANA

To activate it you can now use:

    $ pyenv activate ANA
However, anaconda may modify your .bashrc or .bash_profile or .zprofile files and may result in incompatibilities with other Python versions. For this reason we recommend not to use it. If you find ways to get it to work reliably with other versions, please let us know and we will update this tutorial.

3.2.5 Exercises

E.Python.Install.1:
Install Python 3.7.4

E.Python.Install.1:
Write installation instructions for an operating system of your choice and add them to this documentation.

E.Python.Install.2:
Replicate the steps to install pyenv, so you can type in ENV2 and ENV3 in your terminals to switch between Python 2 and 3.

E.Python.Install.3:
Why do you generally not want to use anaconda for cloud computing? When is it ok to use anaconda?
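
To check the outcome of the installation steps in this chapter, a short script can report which interpreter is actually running. The following is only a sketch: the function name describe_interpreter is our own choice, and the virtual environment test is a heuristic that assumes the environment was created with venv or virtualenv.

```python
import platform
import sys


def describe_interpreter():
    """Collect a few facts about the interpreter running this script."""
    # Heuristic: old virtualenv releases set sys.real_prefix,
    # while venv makes sys.base_prefix differ from sys.prefix.
    in_env = hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix)
    return {
        "version": platform.python_version(),
        "major": sys.version_info[0],
        "executable": sys.executable,
        "in_virtualenv": in_env,
    }


info = describe_interpreter()
for key in sorted(info):
    print("{}: {}".format(key, info[key]))
```

Running it once after typing ENV2 and once after typing ENV3 should show different versions and executables if the aliases are set up as described.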
4 FIRST STEPS

4.1 INTERACTIVE PYTHON

Python can be used interactively. You can enter the interactive mode by executing the command:

    $ python

You will see something like the following:

    $ python
    Python 3.7.1 (default, Nov 24 2018, 14:27:15)
    [Clang 10.0.0 (clang-1000.11.45.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

The >>> is the prompt used by the interpreter. This is similar to bash where commonly $ is used.

Sometimes it is convenient to show the prompt when illustrating an example. This is to provide some context for what we are doing. If you are following along you will not need to type in the prompt.

This interactive Python process does the following:

- read your input commands
- evaluate your command
- print the result of evaluation
- loop back to the beginning.

This is why you may see the interactive loop referred to as a REPL: Read-Evaluate-Print-Loop.

4.1.1 REPL (Read Eval Print Loop)

There are many different types beyond what we have seen so far, such as dictionaries, lists, sets. One handy way of using the interactive Python is to get the type of a value using type():
    >>> type(42)
    <type 'int'>
    >>> type('hello')
    <type 'str'>
    >>> type(3.14)
    <type 'float'>

This is the output of Python 2. In Python 3 the same calls print <class 'int'>, <class 'str'> and <class 'float'>.

You can also ask for help about something using help():

    >>> help(int)
    >>> help(list)
    >>> help(str)

Using help() opens up a help message within a pager. To navigate you can use the spacebar to go down a page, w to go up a page, the arrow keys to go up/down line-by-line, or q to exit.

4.1.2 Interpreter

Although the interactive mode provides a convenient tool to test things out, you will see quickly that for our class we want to use the Python interpreter from the command line. Let us assume the program is called prg.py. Once you have written it in that file you simply can call it with

    $ python prg.py

It is important to name the program with meaningful names.

4.1.3 Python 3 Features in Python 2

In this course we want to be able to seamlessly switch between Python 2 and Python 3. Thus it is convenient from the start to use Python 3 syntax when it is also supported in Python 2. One of the most used functions is the print function, which in Python 3 requires parentheses. To enable it in Python 2 you just need to import this function:

    from __future__ import print_function, division

The first of these imports allows us to use the print function to output text to the screen, instead of the print statement, which Python 2 uses. This is simply a design decision that better reflects Python's underlying philosophy.

Other operations such as division also behave differently. Thus we use

    from __future__ import division
This import makes sure that the division operator behaves in a way a newcomer to the language might find more intuitive. In Python 2, division / is floor division when the arguments are integers, meaning that the following

    (5 / 2 == 2) is True

In Python 3, division / is a floating point division, thus

    (5 / 2 == 2.5) is True

4.2 EDITORS

This section is meant to give an overview of the Python editing tools needed for this course. There are many other alternatives, however, we do recommend to use PyCharm.

4.2.1 Pycharm

PyCharm is an Integrated Development Environment (IDE) used for programming in Python. It provides code analysis, a graphical debugger, an integrated unit tester, and integration with git.

Video: Pycharm (8:56)

4.2.2 Python in 45 minutes

An additional community video about the Python programming language that we found on the internet. Naturally there are many alternatives to this video, but the video is probably a good start. It also uses PyCharm which we recommend.

Video: PyCharm (43:16)

How much you want to understand of Python is actually a bit up to you. While it is good to know classes and inheritance, you may be able for this class to get away without using them. However, we do recommend that you learn them.

PyCharm Installation:

Method 1: PyCharm Installation on Ubuntu using umake
    sudo add-apt-repository ppa:ubuntu-desktop/ubuntu-make
    sudo apt-get update
    sudo apt-get install ubuntu-make

Once umake is installed, use the next command to install PyCharm community edition:

    umake ide pycharm

If you want to remove PyCharm installed using the umake command, use this:

    umake -r ide pycharm

Method 2: PyCharm installation on Ubuntu using PPA

    sudo add-apt-repository ppa:mystic-mirage/pycharm
    sudo apt-get update
    sudo apt-get install pycharm-community

PyCharm also has a Professional (paid) version which can be installed using the following command:

    sudo apt-get install pycharm

Once installed, go to your VM dashboard and search for PyCharm.

4.3 GOOGLE COLAB

In this section we are going to introduce you to how to use Google Colab to run deep learning models.

4.3.1 Introduction to Google Colab

This video contains the introduction to Google Colab. In this section we will be learning how to start a Google Colab project.

4.3.2 Programming in Google Colab

In this video we will learn how to create a simple Colab Notebook.
Required Installations

    pip install numpy

4.3.3 Benchmarking in Google Colab with Cloudmesh

In this video we learn how to do a basic benchmark with Cloudmesh tools. Cloudmesh StopWatch will be used in this tutorial.

Required Installations

    pip install numpy
    pip install cloudmesh-installer
    pip install cloudmesh-common
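
The idea behind a named stopwatch benchmark can be sketched with the standard library alone. The class below is a hypothetical stand-in written for illustration; it is not the Cloudmesh StopWatch and makes no claim about its API. It only assumes Python 3, where time.perf_counter is available.

```python
import time


class SimpleStopWatch:
    """Hypothetical named timer; a stand-in, not the Cloudmesh API."""

    def __init__(self):
        self._started = {}   # timer name -> start timestamp
        self._elapsed = {}   # timer name -> measured duration in seconds

    def start(self, name):
        self._started[name] = time.perf_counter()

    def stop(self, name):
        self._elapsed[name] = time.perf_counter() - self._started[name]

    def get(self, name):
        return self._elapsed[name]


watch = SimpleStopWatch()
watch.start("sum")
total = sum(range(100000))
watch.stop("sum")
print("sum took {:.6f} s".format(watch.get("sum")))
```

Giving each timer a name makes it possible to measure several stages of a notebook separately and report them together at the end.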
5 LANGUAGE

5.1 LANGUAGE

5.1.1 Statements and Strings

Let us explore the syntax of Python while starting with a print statement

    print("Hello world from Python!")

This will print on the terminal

    Hello world from Python!

The print function was given a string to process. A string is a sequence of characters. A character can be alphabetic (A through Z, lower and upper case), numeric (any of the digits), white space (spaces, tabs, newlines, etc), syntactic directives (comma, colon, quotation, exclamation, etc), and so forth. A string is just a sequence of characters and is typically indicated by surrounding the characters in double quotes.

Standard output is discussed in the Section Linux.

So, what happened when you pressed Enter? The interactive Python program read the line print("Hello world from Python!"), split it into the print statement and the "Hello world from Python!" string, and then executed the line, showing you the output.

5.1.2 Comments

Comments in Python start with a #:

    # This is a comment

5.1.3 Variables

You can store data into a variable to access it later. For instance:

    hello = 'Hello world from Python!'
    print(hello)
This will print again

    Hello world from Python!

5.1.4 Data Types

5.1.4.1 Booleans

A boolean is a value that can have the values True or False. You can combine booleans with boolean operators such as and and or

    print(True and True)    # True
    print(True and False)   # False
    print(False and False)  # False
    print(True or True)     # True
    print(True or False)    # True
    print(False or False)   # False

5.1.4.2 Numbers

The interactive interpreter can also be used as a calculator. For instance, say we wanted to compute a multiple of 21:

    print(21 * 2)  # 42

We saw here the print statement again. We passed in the result of the operation 21 * 2. An integer (or int) in Python is a numeric value without a fractional component. Numbers with a fractional component are called floating point numbers, or float for short.

The mathematical operators apply the related mathematical operation to the provided numbers. Some operators are:

    Operator  Function
    *         multiplication
    /         division
    +         addition
    -         subtraction
    **        exponent

Exponentiation x^y is written as x**y and means x to the yth power.
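
The operators from the table can be tried on two small integers. This sketch assumes Python 3, where / is true division. The operators // (floor division) and % (remainder) are not listed in the table and are added here only for comparison.

```python
# Two integers to experiment with
a, b = 7, 2

product = a * b          # multiplication
quotient = a / b         # true division in Python 3, yields a float
floor_quotient = a // b  # floor division, yields an int for int operands
remainder = a % b        # remainder of the floor division
power = a ** b           # exponentiation, 7 to the 2nd power

print(product, quotient, floor_quotient, remainder, power)
print(type(quotient).__name__, type(floor_quotient).__name__)
```

Note that dividing two ints with / produces a float even when the result is a whole number, which is one reason the distinction between int and float matters.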
You can combine floats and ints:

    print(3.14 * 42 / 11 + 4 - 2)  # 13.9890909091
    print(2 ** 3)                  # 8

Note that operator precedence is important. Using parentheses to affect the order of operations gives different results, as expected:

    print(3.14 * (42 / 11) + 4 - 2)  # 11.42 in Python 2 (integer division); 13.9890909091 in Python 3
    print(1 + 2 * 3 - 4 / 5.0)       # 6.2
    print((1 + 2) * (3 - 4) / 5.0)   # -0.6

5.1.5 Module Management

A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand and use. A module is a Python object with arbitrarily named attributes that you can bind and reference. A module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

5.1.5.1 Import Statement

When the interpreter encounters an import statement, it imports the module if the module is present in the search path. A search path is a list of directories that the interpreter searches before importing a module. It is preferred to use for each import its own line such as:

    import numpy
    import matplotlib

5.1.5.2 The from…import Statement

Python's from statement lets you import specific attributes from a module into the current namespace. The from…import has the following syntax:

    from datetime import datetime
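
The two import forms and the search path can be compared side by side. The sketch below uses only standard library modules so that it runs without extra installs; the particular modules and the date are arbitrary choices for illustration.

```python
import math                    # whole module, names accessed as math.<name>
import sys
from datetime import datetime  # a single attribute bound directly

root = math.sqrt(16.0)              # qualified access through the module
year = datetime(2019, 8, 26).year   # direct access, no "datetime." prefix needed

print(root, year)

# The search path is an ordinary list of directory names
print(type(sys.path).__name__, len(sys.path) > 0)
```

With the plain import form the origin of each name stays visible at the call site, while from…import keeps frequently used names short.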
5.1.6 Date Time in Python

The datetime module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation. For related functionality, see also the time and calendar modules.

You can use any Python source file as a module by executing an import statement in some other Python source file.

    from datetime import datetime

The dateutil.parser module offers a generic date/time string parser which is able to parse most known formats to represent a date and/or time.

    from dateutil.parser import parse

pandas is an open source Python library for data analysis that needs to be imported.

    import pandas as pd

Create a string variable with the class start time

    fall_start = '08-21-2017'

Convert the string to datetime format

    datetime.strptime(fall_start, '%m-%d-%Y')
    # datetime.datetime(2017, 8, 21, 0, 0)

Creating a list of strings as dates

    class_dates = [
        '8/25/2017',
        '9/1/2017',
        '9/8/2017',
        '9/15/2017',
        '9/22/2017',
        '9/29/2017']

Convert the class_dates strings into datetime format and save the list into variable a

    a = [datetime.strptime(x, '%m/%d/%Y') for x in class_dates]

Use parse() to attempt to auto-convert common string formats. The argument must be a string or character stream, not a list.

    parse(fall_start)  # datetime.datetime(2017, 8, 21, 0, 0)

Use parse() on every element of the class_dates list.

    [parse(x) for x in class_dates]
    # [datetime.datetime(2017, 8, 25, 0, 0),
    #  datetime.datetime(2017, 9, 1, 0, 0),
    #  datetime.datetime(2017, 9, 8, 0, 0),
    #  datetime.datetime(2017, 9, 15, 0, 0),
    #  datetime.datetime(2017, 9, 22, 0, 0),
    #  datetime.datetime(2017, 9, 29, 0, 0)]

Use parse, but designate that the day is first.

    parse(fall_start, dayfirst=True)
    # datetime.datetime(2017, 8, 21, 0, 0)

Create a dataframe. A DataFrame is a tabular data structure comprised of rows and columns, akin to a spreadsheet or database table. It can be seen as a group of Series objects that share an index (the column names).

    import pandas as pd
    data = {
        'dates': [
            '8/25/2017 18:47:05.069722',
            '9/1/2017 18:47:05.119994',
            '9/8/2017 18:47:05.178768',
            '9/15/2017 18:47:05.230071',
            '9/22/2017 18:47:05.230071',
            '9/29/2017 18:47:05.280592'],
        'complete': [1, 0, 1, 1, 0, 1]}
    df = pd.DataFrame(
        data,
        columns=['dates', 'complete'])
    print(df)
    #                        dates  complete
    # 0  8/25/2017 18:47:05.069722         1
    # 1   9/1/2017 18:47:05.119994         0
    # 2   9/8/2017 18:47:05.178768         1
    # 3  9/15/2017 18:47:05.230071         1
    # 4  9/22/2017 18:47:05.230071         0
    # 5  9/29/2017 18:47:05.280592         1

Convert df['dates'] from string to datetime

    import pandas as pd
    pd.to_datetime(df['dates'])
    # 0   2017-08-25 18:47:05.069722
    # 1   2017-09-01 18:47:05.119994
    # 2   2017-09-08 18:47:05.178768
    # 3   2017-09-15 18:47:05.230071
    # 4   2017-09-22 18:47:05.230071
    # 5   2017-09-29 18:47:05.280592
    # Name: dates, dtype: datetime64[ns]

5.1.7 Control Statements

5.1.7.1 Comparison
Computer programs do not only execute instructions. Occasionally, a choice needs to be made. Such a choice is based on a condition. Python has several conditional operators:

    Operator  Function
    >         greater than
    <         smaller than
    ==        equals
    !=        is not

Conditions are usually combined with variables. A program can make a choice using the if keyword. For example:

    x = int(input("Guess x:"))
    if x == 4:
        print('Correct!')

In this example, Correct! will only be printed if the variable x equals four. Python can also execute multiple conditions using the elif and else keywords.

    x = int(input("Guess x:"))
    if x == 4:
        print('Correct!')
    elif abs(4 - x) == 1:
        print('Wrong, but close!')
    else:
        print('Wrong, way off!')

5.1.7.2 Iteration

To repeat code, the for keyword can be used. For example, to display the numbers from 1 to 10, we could write something like this:

    for i in range(1, 11):
        print(i)

The second argument to range, 11, is not inclusive, meaning that the loop will only get to 10 before it finishes. Python itself starts counting from 0, so this code will also work:

    for i in range(0, 10):
        print(i + 1)
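
The half-open behaviour of range can be checked directly by turning the ranges into lists. This is a small sketch; the variable names one_to_ten and shifted are our own.

```python
# range(start, stop) yields start, start + 1, ..., stop - 1
one_to_ten = list(range(1, 11))

# Counting from 0 and adding 1 visits the same numbers
shifted = [i + 1 for i in range(0, 10)]

print(one_to_ten)
print(one_to_ten == shifted)
```

Both loops therefore run exactly ten times and differ only in where the counting starts.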
In fact, the range function defaults to a starting value of 0, so range(0, 10) can be written as range(10):

    for i in range(10):
        print(i + 1)

We can also nest loops inside each other:

    for i in range(0, 10):
        for j in range(0, 10):
            print(i, ' ', j)

In this case we have two nested loops. The code will iterate over the entire coordinate range (0,0) to (9,9).

5.1.8 Data types

5.1.8.1 Lists

see: https://www.tutorialspoint.com/python/python_lists.htm

Lists in Python are ordered sequences of elements, where each element can be accessed using a 0-based index.

To define a list, you simply list its elements between square brackets '[ ]':

    names = [
        'Albert',
        'Jane',
        'Liz',
        'John',
        'Abby']
    # access the first element of the list
    names[0]
    # 'Albert'
    # access the third element of the list
    names[2]
    # 'Liz'

You can also use a negative index if you want to start counting elements from the end of the list. Thus, the last element has index -1, the second before last element has index -2 and so on:

    # access the last element of the list
    names[-1]
    # 'Abby'
    # access the second last element of the list
    names[-2]
    # 'John'

Python also allows you to take whole slices of the list by specifying a beginning and end of the slice separated by a colon
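
A few more forms of the slice notation can be sketched on a small list. The list letters is a hypothetical example introduced only for this illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e']

first_two = letters[:2]    # omitted start means "from the beginning"
from_third = letters[2:]   # omitted end means "to the end of the list"
inner = letters[1:3]       # index 1 up to, but not including, index 3
last_two = letters[-2:]    # negative indices count from the end

print(first_two, from_third, inner, last_two)
```

A slice always returns a new list, so the original list is left unchanged.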
    # the middle elements, excluding first and last
    names[1:-1]
    # ['Jane', 'Liz', 'John']

As you can see from the example, the starting index in the slice is inclusive and the ending one, exclusive.

Python provides a variety of methods for manipulating the members of a list.

You can add elements with 'append':

    names.append('Liz')
    names
    # ['Albert', 'Jane', 'Liz',
    #  'John', 'Abby', 'Liz']

As you can see, the elements in a list need not be unique.

Merge two lists with 'extend':

    names.extend(['Lindsay', 'Connor'])
    names
    # ['Albert', 'Jane', 'Liz', 'John',
    #  'Abby', 'Liz', 'Lindsay', 'Connor']

Find the index of the first occurrence of an element with 'index':

    names.index('Liz')  # 2

Remove elements by value with 'remove':

    names.remove('Abby')
    names
    # ['Albert', 'Jane', 'Liz', 'John',
    #  'Liz', 'Lindsay', 'Connor']

Remove elements by index with 'pop':

    names.pop(1)
    # 'Jane'
    names
    # ['Albert', 'Liz', 'John',
    #  'Liz', 'Lindsay', 'Connor']

Notice that pop returns the element being removed, while remove does not.

If you are familiar with stacks from other programming languages, you can use 'insert' and 'pop':

    names.insert(0, 'Lincoln')
    names
    # ['Lincoln', 'Albert', 'Liz',
    #  'John', 'Liz', 'Lindsay', 'Connor']
    names.pop()
    # 'Connor'
    names
    # ['Lincoln', 'Albert', 'Liz',
    #  'John', 'Liz', 'Lindsay']

The Python documentation contains a full list of list operations.

To go back to the range function you used earlier, it simply produces a sequence of numbers. In Python 2 this is a list; in Python 3 it is a lazy range object, so we wrap it in list() to see the numbers:

    list(range(10))
    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    list(range(2, 10, 2))
    # [2, 4, 6, 8]

5.1.8.2 Sets

Python lists can contain duplicates as you saw previously:

    names = ['Lincoln', 'Albert', 'Liz',
             'John', 'Liz', 'Lindsay']

When we do not want this to be the case, we can use a set:

    unique_names = set(names)
    unique_names
    # set(['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay'])

The output is shown in Python 2 notation; Python 3 prints a set with curly braces.

Keep in mind that the set is an unordered collection of objects, thus we can not access them by index:

    unique_names[0]
    # Traceback (most recent call last):
    #   File "<stdin>", line 1, in <module>
    # TypeError: 'set' object does not support indexing

However, we can convert a set to a list easily:

    unique_names = list(unique_names)
    unique_names
    # ['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay']
    unique_names[0]
    # 'Lincoln'

Notice that in this case, the order of elements in the new list matches the order in which the elements were displayed when we created the set. We had

    set(['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay'])

and now we have

    ['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay']
You should not assume this is the case in general. That is, do not make any assumptions about the order of elements in a set when it is converted to any type of sequential data structure.

You can change a set's contents using the add, remove and update methods which correspond to the append, remove and extend methods in a list. In addition to these, set objects support the operations you may be familiar with from mathematical sets: union, intersection, difference, as well as operations to check containment. You can read about this in the Python documentation for sets.

5.1.8.3 Removal and Testing for Membership in Sets

One important advantage of a set over a list is that testing whether an element is a member is fast. If you are familiar with different data structures from a Computer Science class, the Python list is implemented by an array, while the set is implemented by a hash table.

We will demonstrate this with an example. Let us say we have a list and a set of the same number of elements (approximately 100 thousand). We use sys.maxsize as the upper bound because it is available in both Python 2 and Python 3:

    import sys, random, timeit
    nums_set = set([random.randint(0, sys.maxsize) for _ in range(10**5)])
    nums_list = list(nums_set)
    len(nums_set)
    # 100000

We will use the timeit Python module to time 100 operations that test for the existence of a member in either the list or set. The setup string has to import both random and sys, since the timed statement uses both:

    timeit.timeit('random.randint(0, sys.maxsize) in nums',
                  setup='import random, sys; nums=%s' % str(nums_set), number=100)
    # 0.0004038810729980469
    timeit.timeit('random.randint(0, sys.maxsize) in nums',
                  setup='import random, sys; nums=%s' % str(nums_list), number=100)
    # 0.398054122924804

The exact duration of the operations on your system will be different, but the takeaway will be the same: searching for an element in a set is orders of magnitude faster than in a list. This is important to keep in mind when you work with large amounts of data.

5.1.8.4 Dictionaries
One of the very important data structures in Python is a dictionary, also referred to as dict.

A dictionary represents a key value store:

    person = {
        'Name': 'Albert',
        'Age': 100,
        'Class': 'Scientist'
    }
    print("person['Name']:", person['Name'])
    # person['Name']: Albert
    print("person['Age']:", person['Age'])
    # person['Age']: 100

A convenient form to print by named attributes is

    print("{Name} {Age}".format(**person))

This form of printing with the format statement and a reference to data increases readability of the print statements.

You can delete elements with the following commands:

    del person['Name']  # remove entry with key 'Name'
    # person
    # {'Age': 100, 'Class': 'Scientist'}
    person.clear()  # remove all entries in dict
    # person
    # {}
    del person  # delete entire dictionary
    # person
    # Traceback (most recent call last):
    #   File "<stdin>", line 1, in <module>
    # NameError: name 'person' is not defined

You can iterate over a dict:

    person = {
        'Name': 'Albert',
        'Age': 100,
        'Class': 'Scientist'
    }
    for item in person:
        print(item, person[item])
    # Age 100
    # Name Albert
    # Class Scientist

5.1.8.5 Dictionary Keys and Values

You can retrieve both the keys and values of a dictionary using the keys() and values() methods of the dictionary, respectively:

    person.keys()    # ['Age', 'Name', 'Class']
    person.values()  # [100, 'Albert', 'Scientist']
![Page 49: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/49.jpg)
Bothmethodsreturnlists.Notice,however,thattheorderinwhichtheelementsappear in the returned lists (Age, Name, Class) is different from the order inwhichwe listed theelementswhenwedeclared thedictionary initially (Name,Age,Class).Itisimportanttokeepthisinmind:
Youcannotmakeanyassumptionsabouttheorderinwhichtheelementsofadictionarywillbe returnedby thekeys()andvalues()methods.
However,youcanassumethatifyoucallkeys()andvalues()insequence,theorderof elements will at least correspond in both methods. In the example Agecorrespondsto100,Nameto Albert,andClasstoScientist,andyouwillobservethe same correspondence in general as long as keys() and values() are called onerightaftertheother.
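The correspondence between keys() and values() can be checked directly. The sketch below reuses the person dictionary from the text and compares the pairs obtained by zipping keys() and values() with the pairs returned by items(), a standard dict method that the text does not cover.

```python
person = {
    'Name': 'Albert',
    'Age': 100,
    'Class': 'Scientist'
}

# keys() and values() line up position by position,
# so zipping them reproduces the key-value pairs.
pairs_from_zip = list(zip(person.keys(), person.values()))

# items() yields the same pairs directly and is the usual idiom.
pairs_from_items = list(person.items())

for key, value in person.items():
    print(key, value)

print(pairs_from_zip == pairs_from_items)  # True
```

Iterating over items() avoids a second lookup per key and is usually preferred over indexing the dictionary inside the loop.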
5.1.8.6 Counting with Dictionaries

One application of dictionaries that frequently comes up is counting the elements in a sequence. For example, say we have a sequence of coin flips:

import random
coin_flips = [
    random.choice(['heads', 'tails']) for _ in range(10)
]
# coin_flips
# ['heads', 'tails', 'heads',
#  'tails', 'heads', 'heads',
#  'tails', 'heads', 'heads', 'heads']

The actual list coin_flips will likely be different when you execute this on your computer since the outcomes of the coin flips are random.

To compute the probabilities of heads and tails, we could count how many heads and tails we have in the list:

counts = {'heads': 0, 'tails': 0}
for outcome in coin_flips:
    assert outcome in counts
    counts[outcome] += 1
print('Probability of heads: %.2f' % (counts['heads'] / len(coin_flips)))
# Probability of heads: 0.70
print('Probability of tails: %.2f' % (counts['tails'] / sum(counts.values())))
# Probability of tails: 0.30

In addition to how we use the dictionary counts to count the elements of coin_flips, notice a couple of things about this example:

1. We used the assert outcome in counts statement. The assert statement in Python allows you to easily insert debugging statements in your code to help you discover errors more quickly. assert statements are executed whenever the internal Python __debug__ variable is set to True, which is always the case unless you start Python with the -O option, which allows you to run optimized Python.
2. When we computed the probability of tails, we used the built-in sum function, which allowed us to quickly find the total number of coin flips. sum is one of many built-in functions you can read about in the Python documentation.
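The counting pattern above is common enough that the standard library offers collections.Counter for it. The following sketch is an alternative to the hand-written loop, not a replacement the text prescribes; the fixed random seed is an assumption added only so that repeated runs produce the same flips.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed (an assumption) so repeated runs give the same flips
coin_flips = [random.choice(['heads', 'tails']) for _ in range(10)]

# Counter builds the outcome -> count mapping in one step.
counts = Counter(coin_flips)

total = sum(counts.values())
for outcome in ('heads', 'tails'):
    # A Counter returns 0 for outcomes that never occurred.
    print('Probability of %s: %.2f' % (outcome, counts[outcome] / total))
```

Unlike the plain dictionary, a Counter does not need to be initialized with every possible outcome, at the cost of losing the assert check for unexpected values.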
5.1.9 Functions

You can reuse code by putting it inside a function that you can call in other parts of your programs. Functions are also a good way of grouping code that logically belongs together in one coherent whole. A function has a unique name in the program. Once you call a function, it will execute its body, which consists of one or more lines of code:

def check_triangle(a, b, c):
    return \
        a < b + c and a > abs(b - c) and \
        b < a + c and b > abs(a - c) and \
        c < a + b and c > abs(a - b)

print(check_triangle(4, 5, 6))

The def keyword tells Python we are defining a function. As part of the definition, we have the function name, check_triangle, and the parameters of the function, that is, variables that will be populated when the function is called.

We call the function with arguments 4, 5 and 6, which are passed in order into the parameters a, b and c. A function can be called several times with varying arguments. There is no limit to the number of function calls.

It is also possible to store the output of a function in a variable, so it can be reused.

result = check_triangle(4, 5, 6)
print(result)

5.1.10 Classes

A class is an encapsulation of data and the processes that work on them. The data is represented in member variables, and the processes are defined in the methods of the class (methods are functions inside the class). For example, let’s see how to define a Triangle class:

class Triangle(object):

    def __init__(self, length, width,
                 height, angle1, angle2, angle3):
        if not self._sides_ok(length, width, height):
            print('The sides of the triangle are invalid.')
        elif not self._angles_ok(angle1, angle2, angle3):
            print('The angles of the triangle are invalid.')
        self._length = length
        self._width = width
        self._height = height
        self._angle1 = angle1
        self._angle2 = angle2
        self._angle3 = angle3

    def _sides_ok(self, a, b, c):
        return \
            a < b + c and a > abs(b - c) and \
            b < a + c and b > abs(a - c) and \
            c < a + b and c > abs(a - b)

    def _angles_ok(self, a, b, c):
        return a + b + c == 180

triangle = Triangle(4, 5, 6, 35, 65, 80)

Python has full object-oriented programming (OOP) capabilities. However, we cannot cover all of them in this section, so if you need more information please refer to the Python docs on classes and OOP.

5.1.11 Modules

Now write this simple program and save it as hello.py:

print("Hello world!")

As a check, make sure the file contains the expected contents on the command line:

$ cat hello.py
print("Hello world!")
To execute your program pass the file as a parameter to the python command:

$ python hello.py
Hello world!

Files in which Python code is stored are called modules. You can execute a Python module from the command line like you just did, or you can import it in other Python code using the import statement.

Let us write a more involved Python program that will receive as input the lengths of the three sides of a triangle, and will output whether they define a valid triangle. A triangle is valid if the length of each side is less than the sum of the lengths of the other two sides and greater than the difference of the lengths of the other two sides:

"""Usage: check_triangle.py [-h] LENGTH WIDTH HEIGHT

Check if a triangle is valid.

Arguments:
  LENGTH  The length of the triangle.
  WIDTH   The width of the triangle.
  HEIGHT  The height of the triangle.

Options:
  -h --help
"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__)
    a, b, c = (int(arguments['LENGTH']),
               int(arguments['WIDTH']),
               int(arguments['HEIGHT']))
    valid_triangle = \
        a < b + c and a > abs(b - c) and \
        b < a + c and b > abs(a - c) and \
        c < a + b and c > abs(a - b)
    print('Triangle with sides %d, %d and %d is valid: %r' % (
        a, b, c, valid_triangle
    ))

Assuming we save the program in a file called check_triangle.py, we can run it like so:

$ python check_triangle.py 4 5 6
Triangle with sides 4, 5 and 6 is valid: True

Let us break this down a bit.

1. In Python 2 programs we imported print_function and division from the __future__ module, like we did earlier in this section. In Python 3 this behavior is the default, so the program above does not need these imports.
2. We’ve defined a boolean expression that tells us if the sides that were input define a valid triangle. The result of the expression is stored in the valid_triangle variable. It is True if all the conditions inside are true, and False otherwise.
3. We’ve used the backslash symbol \ to format our code nicely. The backslash simply indicates that the current line is being continued on the next line.
4. When we run the program, we do the check if __name__ == '__main__'. __name__ is an internal Python variable that allows us to tell whether the current file is being run from the command line (value __main__), or is being imported by a module (the value will be the name of the module). Thus, with this statement we’re just making sure the program is being run from the command line.
5. We are using the docopt module to handle command line arguments. The advantage of using this module is that it generates a usage help statement for the program and enforces command line arguments automatically. All of this is done by parsing the docstring at the top of the file.
6. In the print function, we are using Python’s string formatting capabilities to insert values into the string we are displaying.
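docopt is a third-party package that has to be installed separately. As a hedged alternative, the same program can be sketched with the argparse module from the standard library. The structure below (a reusable check_triangle function plus a main function that accepts an argument list) is one possible layout and an assumption of this sketch, not the layout used in the text.

```python
import argparse


def check_triangle(a, b, c):
    """Return True if sides a, b and c satisfy the triangle inequality."""
    return (a < b + c and a > abs(b - c) and
            b < a + c and b > abs(a - c) and
            c < a + b and c > abs(a - b))


def main(argv=None):
    # argparse builds the usage and help text from these declarations.
    parser = argparse.ArgumentParser(description='Check if a triangle is valid.')
    parser.add_argument('length', type=int, help='The length of the triangle.')
    parser.add_argument('width', type=int, help='The width of the triangle.')
    parser.add_argument('height', type=int, help='The height of the triangle.')
    # With argv=None argparse reads the arguments from sys.argv.
    args = parser.parse_args(argv)

    valid = check_triangle(args.length, args.width, args.height)
    print('Triangle with sides %d, %d and %d is valid: %r' % (
        args.length, args.width, args.height, valid))
    return valid


# Demonstration call with an explicit argument list.
main(['4', '5', '6'])
```

In a script you would replace the demonstration call with main() placed under the if __name__ == '__main__' guard, so that the arguments come from the command line and the module can still be imported without side effects.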
5.1.12 Lambda Expressions

As opposed to normal functions in Python, which are defined using the def keyword, lambda functions in Python are anonymous functions which do not have a name and are defined using the lambda keyword. The generic syntax of a lambda function is of the form lambda arguments: expression, as shown in the following example:

greeter = lambda x: print('Hello %s!' % x)
greeter('Albert')

As you could probably guess, the result is:

Hello Albert!

Now consider the following examples:

power2 = lambda x: x ** 2

The power2 function defined in the expression is equivalent to the following definition:

def power2(x):
    return x ** 2
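Because a lambda expression evaluates to an ordinary function object, it can be handed to any function that expects a callable. A minimal sketch using the key parameter of the built-in sorted function follows; the sample data is hypothetical.

```python
# Hypothetical sample data: (name, age) pairs.
people = [('Albert', 100), ('Marie', 66), ('Bo', 84)]

# The lambda is passed directly as the key function and never gets a name.
by_age = sorted(people, key=lambda person: person[1])
by_name_length = sorted(people, key=lambda person: len(person[0]))

print(by_age)          # [('Marie', 66), ('Bo', 84), ('Albert', 100)]
print(by_name_length)  # [('Bo', 84), ('Marie', 66), ('Albert', 100)]
```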
Lambda functions are useful when you need a function for a short period of time. They are also very useful when passed as an argument to built-in functions that take a function as an argument, e.g. filter() and map(). In the next example we show how a lambda function can be combined with the filter function. Consider the list all_names, which contains five words that rhyme. We want to keep only the words that contain the word name. To achieve this, we pass the function lambda x: 'name' in x as the first argument. This lambda function returns True if the word name exists as a substring in the string x. The second argument of the filter function is the list of names, i.e. all_names.

all_names = ['surname', 'rename', 'nickname', 'acclaims', 'defame']
filtered_names = list(filter(lambda x: 'name' in x, all_names))
print(filtered_names)
# ['surname', 'rename', 'nickname']

As you can see, the names are successfully filtered as we expected.

In Python 3, the filter function returns a filter object, an iterator that is evaluated lazily. This means that we can neither access the elements of the filter object by index nor use len() to find its length.

list_a = [1, 2, 3, 4, 5]
filter_obj = filter(lambda x: x % 2 == 0, list_a)
# Convert the filter object to a list
even_num = list(filter_obj)
print(even_num)
# Output: [2, 4]

In Python, we can have a small, usually single-line, anonymous function called a lambda function. It can have any number of arguments, just like a normal function, but consists of only one expression and has no return statement. The value of this expression is returned when the lambda function is called.

Basic Syntax:

lambda arguments: expression

For example, consider a regular function in Python:

def multiply(a, b):
    return a * b

# call the function
multiply(3, 5)  # outputs: 15

The same function can be written as a lambda function. The function named multiply takes 2 arguments and returns their product.
The lambda equivalent of this function would be:

multiply = lambda a, b: a * b
print(multiply(3, 5))
# outputs: 15

Here a and b are the 2 arguments and a * b is the expression whose value is returned as an output.

Also, we don’t need to assign a lambda function to a variable.

(lambda a, b: a * b)(3, 5)

Lambda functions are mostly passed as a parameter to a function which expects a function object, like map or filter.

5.1.12.1 map

The basic syntax of the map function is

map(function_object, iterable1, iterable2, ...)

The map function expects a function object and any number of iterables, such as a list or a dictionary. It applies function_object to each element of the sequence and, in Python 3, returns an iterator over the results (in Python 2 it returned a list).

Example:

def multiply2(x):
    return x * 2

list(map(multiply2, [2, 4, 6, 8]))
# Output [4, 8, 12, 16]

If we want to write the same function using a lambda:

list(map(lambda x: x * 2, [2, 4, 6, 8]))
# Output [4, 8, 12, 16]

5.1.12.2 dictionary

Now, let’s see how we can iterate over a list of dictionaries using map and lambda. Let’s say we have the following list of dictionary objects:
dict_movies = [
    {'movie': 'avengers', 'comic': 'marvel'},
    {'movie': 'superman', 'comic': 'dc'}]

We can iterate over this list and read the elements of its dictionaries using map and lambda functions in the following way:

list(map(lambda x: x['movie'], dict_movies))  # Output: ['avengers', 'superman']
list(map(lambda x: x['comic'], dict_movies))  # Output: ['marvel', 'dc']
list(map(lambda x: x['movie'] == "avengers", dict_movies))
# Output: [True, False]

In Python 3, the map function returns a map object, an iterator that is evaluated lazily. This means that we can neither access the elements of the map object by index nor use len() to find its length. We can convert the map object to a list as shown next:

map_output = map(lambda x: x * 2, [1, 2, 3, 4])
print(map_output)
# Output: map object: <map object at 0x04D6BAB0>
list_map_output = list(map_output)
print(list_map_output)  # Output: [2, 4, 6, 8]

5.1.13 Iterators

In Python, the iterator protocol is defined using two methods: __iter__() and __next__(). The former returns the iterator object and the latter returns the next element of a sequence. Some advantages of iterators are as follows:

- Readability
- Supports sequences of infinite length
- Saving resources

There are several built-in objects in Python which support the iterator protocol, e.g. string, list, dictionary. In the following example, we create a new class that follows the iterator protocol. We then use the class to generate log2 of numbers:
from math import log2

class LogTwo:
    "Implements an iterator of log two"

    def __init__(self, last=0):
        self.last = last

    def __iter__(self):
        self.current_num = 1
        return self

    def __next__(self):
        if self.current_num <= self.last:
            result = log2(self.current_num)
            self.current_num += 1
            return result
        else:
            raise StopIteration

L = LogTwo(5)
i = iter(L)
print(next(i))
print(next(i))
print(next(i))
print(next(i))

As you can see, we first create an instance of the class, call iter() on it (which invokes its __iter__() method) and assign the returned iterator to a variable called i. Then by calling the next() function four times, we get the following output:

$ python iterator.py
0.0
1.0
1.584962500721156
2.0

As you probably noticed, the lines are log2() of 1, 2, 3, 4 respectively.

5.1.14 Generators

Before we go to generators, please make sure you understand iterators. Generators are also iterators, but they can only be iterated over once. That is because generators do not store the values in memory; instead they generate the values on the go. If we want to print those values, we can either call next() repeatedly or use a for loop.

5.1.14.1 Generators with function

For example, we have a function named multiplyBy10 which returns a list with all the input numbers multiplied by 10.

def multiplyBy10(numbers):
    result = []
    for i in numbers:
        result.append(i * 10)
    return result

new_numbers = multiplyBy10([1, 2, 3, 4, 5])
print(new_numbers)  # Output: [10, 20, 30, 40, 50]

Now, if we want to use generators here, we will make the following changes.

def multiplyBy10(numbers):
    for i in numbers:
        yield i * 10

new_numbers = multiplyBy10([1, 2, 3, 4, 5])
In generators, we use the yield statement in place of return. So when we try to print new_numbers now, it just prints a generator object. The reason for this is that generators do not hold any values in memory; they yield one result at a time. So essentially the generator is just waiting for us to ask for the next result. To print the next result we can call print(next(new_numbers)): the generator reads the first value, multiplies it by 10 and yields the result 10. In this case we can call print(next(new_numbers)) 5 times to print all numbers; if we do it a 6th time we will get a StopIteration error, which means the generator is exhausted and has no 6th element to yield.

print(new_numbers)  # Output: <generator object multiplyBy10 at 0x...>
print(next(new_numbers))  # Output: 10

5.1.14.2 Generators using for loop

If we now want to print the complete list of values multiplied by 10, we can just do:

def multiplyBy10(numbers):
    for i in numbers:
        yield i * 10

new_numbers = multiplyBy10([1, 2, 3, 4, 5])
for num in new_numbers:
    print(num)

The output will be:

10
20
30
40
50

5.1.14.3 Generators with List Comprehension

Python has something called list comprehension. If we use this, we can replace the complete function definition with just:

new_numbers = [x * 10 for x in [1, 2, 3, 4, 5]]
print(new_numbers)  # Output: [10, 20, 30, 40, 50]

Here the point to note is that the square brackets [] in line 1 are very important. If we change them to (), we will again get a generator object.

new_numbers = (x * 10 for x in [1, 2, 3, 4, 5])
print(new_numbers)  # Output: <generator object <genexpr> at 0x...>
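The one-pass behavior of generators mentioned earlier can be checked directly. The following is a minimal sketch; the variable names are chosen for illustration only.

```python
numbers = (x * 10 for x in [1, 2, 3, 4, 5])

first_pass = list(numbers)   # consumes every value the generator can produce
second_pass = list(numbers)  # the generator is already exhausted

print(first_pass)   # [10, 20, 30, 40, 50]
print(second_pass)  # []

# next() on an exhausted generator raises StopIteration,
# unless a default value is supplied as the second argument.
print(next(numbers, 'exhausted'))  # exhausted
```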
We can get the individual elements again from the generator if we do a for loop over new_numbers like we did previously. Alternatively, we can convert it into a list and then print it.

new_numbers = (x * 10 for x in [1, 2, 3, 4, 5])
print(list(new_numbers))  # Output: [10, 20, 30, 40, 50]

But if we convert the generator into a list, we lose the performance benefit, as we will see next.

5.1.14.4 Why to use Generators?

Generators perform better because they do not hold all values in memory. With the small examples shown here this is not a big deal, since we are dealing with a small amount of data. But consider a scenario where the data set contains millions of records. If we try to convert millions of data elements into a list, this will definitely have an impact on memory and performance, because everything will be held in memory.

Let’s see an example of how generators help with performance. First, without generators, a normal function creates 1 million records and returns the result for 1 million people.
import random
import time

# mem_profile is a helper module that is not part of the Python standard
# library; it is assumed to provide memory_usage_resource()
import mem_profile

names = ['John', 'Jack', 'Adam', 'Steve', 'Rick']
majors = ['Math',
          'CompScience',
          'Arts',
          'Business',
          'Economics']

# prints the memory before we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (Before): {memory}Mb')

def people_list(people):
    result = []
    for i in range(people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

t1 = time.perf_counter()
people = people_list(1000000)
t2 = time.perf_counter()

# prints the memory after we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (After): {memory}Mb')
print('Took {time} seconds'.format(time=t2-t1))

# Output
Memory (Before): 15Mb
Memory (After): 318Mb
Took 1.2 seconds

These values are approximate and serve only as a comparison with the next run. If you run the code yourself, you will see substantial memory consumption and a noticeable run time.

Now, after running the same code using generators, we will see a significant performance boost, with a run time of almost 0 seconds. The reason behind this is that in the case of generators we do not keep anything in memory; the system just reads one record at a time and yields it. Note that calling people_generator only creates the generator object; the records themselves are produced later, one by one, when the generator is iterated over.

import random
import time

# mem_profile is a helper module that is not part of the Python standard
# library; it is assumed to provide memory_usage_resource()
import mem_profile

names = ['John', 'Jack', 'Adam', 'Steve', 'Rick']
majors = ['Math',
          'CompScience',
          'Arts',
          'Business',
          'Economics']

# prints the memory before we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (Before): {memory}Mb')

def people_generator(people):
    for i in range(people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

t1 = time.perf_counter()
people = people_generator(1000000)
t2 = time.perf_counter()

# prints the memory after we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (After): {memory}Mb')
print('Took {time} seconds'.format(time=t2-t1))

# Output
Memory (Before): 15Mb
Memory (After): 15Mb
Took 0.01 seconds
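The memory difference can also be observed without the mem_profile helper. The sketch below uses only the standard library; the element count n is an arbitrary assumption, and sys.getsizeof measures only the container object itself, so the figures understate the total memory held by the list.

```python
import sys

n = 100000  # number of elements; an arbitrary size chosen for illustration

as_list = [x * 10 for x in range(n)]       # all values are stored at once
as_generator = (x * 10 for x in range(n))  # values are produced on demand

# getsizeof reports the size of the container object itself,
# not of the integers it refers to.
list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)
print('list:', list_bytes, 'bytes')
print('generator:', generator_bytes, 'bytes')

# Both give the same total; the generator never builds the full list.
print(sum(as_list) == sum(as_generator))  # True
```

The size of the generator object stays the same regardless of n, while the size of the list grows with the number of elements.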
6 CLOUDMESH
6.1 INTRODUCTION

Learning Objectives

- Introduction to the cloudmesh API
- Using cmd5 via cms
- Introduction to cloudmesh convenience API for output, dotdict, shell, stopwatch, benchmark management
- Creating your own cms commands
- Cloudmesh configuration file
- Cloudmesh inventory

In this chapter we would like to introduce you to cloudmesh, which provides you with a number of convenient methods to interface with the local system, but also with cloud services. We will start by focusing on some simple APIs and then gradually introduce the cloudmesh shell, which not only provides a shell but also a command line interface, so you can use cloudmesh from a terminal. This dual ability is quite useful, as we can write cloudmesh scripts but can also invoke the functionality from the terminal. This is quite an important distinction from other tools that only offer command line interfaces.

Moreover, we also show you that it is easy to create new commands and add them dynamically to the cloudmesh shell via simple pip installs.

Cloudmesh is an evolving project and you have the opportunity to improve it if you see some features missing.

The manual of cloudmesh can be found at

https://cloudmesh.github.io/cloudmesh-manual

The API documentation is located at
https://cloudmesh.github.io/cloudmesh-manual/api/index.html#cloudmesh-api
We will initially focus on a subset of this functionality.
6.2 INSTALLATION

The installation of cloudmesh is simple and can technically be done via pip by a user. However, you are not a user, you are a developer. Cloudmesh is distributed in different topical repositories, and in order for developers to easily interact with them we have written a convenient cloudmesh-installer program.

As a developer you must also use a Python virtual environment to avoid affecting your system-wide Python installation. This can be achieved by using Python 3 from python.org or via conda. We do recommend that you use python.org, as this is the vanilla Python that most developers in the world use. Conda is often used by users of Python who do not need bleeding-edge versions but rely on older prepackaged Python tools and libraries.

6.2.1 Prerequisite

We require you to create a Python virtual environment and activate it. How to do this was discussed in Section 3.1. Please create the ENV3 environment. Please activate it.

6.2.2 Basic Install

Cloudmesh can install a number of bundles for developers. A bundle is a set of git repositories that are needed for a particular install. For us, we are mostly interested in the bundles cms, cloud, storage. We will introduce you to other bundles throughout this documentation.

If you would like to find out more about the details of this, you can look at cloudmesh-installer, which will be regularly updated.
To make use of the bundle and the easy installation for developers, please install the cloudmesh-installer via pip, but make sure you do this in a Python virtualenv as discussed previously. If not, you may impact your system negatively. Please note that we are not responsible for fixing your computer. Naturally, you can also use a virtual machine, if you prefer. It is also important that we create a uniform development environment. In our case we create an empty directory called cm in which we place the bundle.

$ mkdir cm
$ cd cm
$ pip install cloudmesh-installer

To see the bundle you can use

$ cloudmesh-installer bundles

We will start with the basic cloudmesh functionality at this time and only install the shell and some common APIs.

$ cloudmesh-installer git clone cms
$ cloudmesh-installer install cms -e

These commands download and install cloudmesh shell into your environment. It is important that you use the -e flag.

To see if it works you can use the command

$ cms help

You will see an output. If this does not work for you, and you cannot figure out the issue, please contact us so we can identify what went wrong.

For more information, please visit our Installation Instructions for Developers

6.3 OUTPUT

Cloudmesh provides a number of convenient APIs to make output easier or more fanciful.

These APIs include

- Console
- Banner
- Heading
- VERBOSE
6.3.1 Console

Print is the usual function to output to the terminal. However, often we like to have colored output that helps us in the notification to the user. For this reason we have a simple Console class that has several built-in features. You can even switch and define your own color schemes.

from cloudmesh.common.console import Console

msg = "my message"
Console.ok(msg)     # prints a green message
Console.error(msg)  # prints a red message preceded by ERROR
Console.msg(msg)    # prints a regular black message

In case of the error message we also have convenient flags that allow us to include the traceback in the output.

Console.error(msg, prefix=True, traceflag=True)

The prefix can be switched on and off with the prefix flag, while the traceflag switches on and off whether the traceback is included.

The verbosity of the output is controlled via variables that are stored in the ~/.cloudmesh directory.

from cloudmesh.common.variables import Variables

variables = Variables()
variables['debug'] = True
variables['trace'] = True
variables['verbose'] = 10

For more features, see API: Console

6.3.2 Banner

In case you need a banner you can do this with

from cloudmesh.common.util import banner

banner("my text")

For more features, see API: Banner
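To make the idea behind a banner helper concrete, the following is a minimal, self-contained sketch in plain Python. It is not the cloudmesh implementation, and its output format may differ from what cloudmesh prints; the function name simple_banner and its parameters char and width are hypothetical.

```python
def simple_banner(text, char='#', width=70):
    """Return text framed by two horizontal rules (hypothetical helper)."""
    rule = char * width
    lines = [rule]
    # Prefix every line of the message so multi-line text stays inside the frame.
    lines.extend(char + ' ' + line for line in text.splitlines())
    lines.append(rule)
    return '\n'.join(lines)


print(simple_banner('my text', width=20))
```

Returning the string instead of printing it inside the function keeps the helper easy to test and lets the caller decide where the banner is written.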
6.3.3 Heading

A particularly useful function is HEADING(), which prints the method name.

from cloudmesh.common.util import HEADING

class Example(object):

    def doit(self):
        HEADING()
        print("Hello")

The invocation of the HEADING() function in doit prints a banner with the name information. The reason we did not implement it as a decorator is that you can place the HEADING() function in an arbitrary location of the method body.

For more features, see API: Heading

6.3.4 VERBOSE

VERBOSE is a very useful method allowing you to print a dictionary. Not only will it print the dict, but it will also provide you with the information in which file it is used and on which line number. It will even print the name of the dict that you use in your code.

from cloudmesh.common.debug import VERBOSE

m = {"key": "value"}
VERBOSE(m)

To use this you will have to enable the debugging methods for cloudmesh as discussed in Section 6.3.1.

For more features, please see VERBOSE

6.3.5 Using print and pprint

In many cases it may be sufficient to use print and pprint for debugging. However, as the code is big and you may forget where you placed print statements, or the print statements may have been added by others, we recommend that you use the VERBOSE function. If you use print or pprint, we recommend using a unique prefix, such as:
from pprint import pprint

d = {"sample": "value"}
print("MYDEBUG:")
pprint(d)

# or with print
print("MYDEBUG:", d)

6.4 DICTIONARIES
6.4.1 Dotdict

For simple dictionaries we sometimes like to simplify the notation with a . instead of using the []. You can achieve this with dotdict:

from cloudmesh.common.dotdict import dotdict

data = {
    "name": "Gregor"
}
data = dotdict(data)

Now you can either call

data["name"]

or

data.name

This is especially useful in if conditions, as it may be easier to read and write

if data.name == "Gregor":
    print("this is quite readable")

and is the same as

if data["name"] == "Gregor":
    print("this is quite readable")

For more features, see API: dotdict

6.4.2 FlatDict
In some cases it is useful to be able to flatten out dictionaries that contain dicts within dicts. For this we can use FlatDict.

from cloudmesh.common.Flatdict import FlatDict

data = {
    "name": "Gregor",
    "address": {
        "city": "Bloomington",
        "state": "IN"
    }
}

flat = FlatDict(data, sep=".")

This will be converted to a dict with the following structure.

flat = {
    "name": "Gregor",
    "address.city": "Bloomington",
    "address.state": "IN"
}

With sep you can change the separator between the nested dict attributes. For more features, see API: dotdict

6.4.3 Printing Dicts

In case we want to print dicts and lists of dicts in various formats, we have included a simple Printer that can print a dict in yaml, json, table, and csv format.

The function can even guess from the passed parameters what the input format is and uses the appropriate internal function.

A common example is
frompprintimportpprint
fromcloudmesh.common.PrinterimportPrinter
data=[
{
"name":"Gregor",
"address":{
"street":"FunnyLane11",
"city":"Cloudville"
}
},
{
"name":"Albert",
"address":{
"street":"MemoryLane1901",
"city":"Cloudnine"
}
}
]
![Page 68: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/68.jpg)
Formorefeatures,seeAPI:Printer
Moreexamplesareavailableinthesourcecodeastests
6.5SHELL☁�Python provides a sophisticated method for starting background processes.However inmanycases it isquitecomplex to interactwith it. Italsodoesnotprovideconvenientwrappersthatwecanusetostarttheminapythonicfashion.Forthisreasonwehavewrittenaprimitive Shellclass thatprovidesjustenoughfunctionalitytobeusefulinmanycases.
Let us review some exampleswhere result is set to theoutput of the commandbeingexecuted.
Formanycommoncommands,weprovidebuilt-infunctions.Forexample:
Thelistincludes(naturallythecommandsmustbeavailableonyourOS.IftheshellcommandisnotavailableonyourOS,pleasehelpusimprovingthecodetoeither provide functions that work on your OS or develop with us platformindependentfunctionalityofasubsetofthefunctionalityfortheshellcommand
pprint(data)
table = Printer.flatwrite(data,
                          sort_keys=["name"],
                          order=["name", "address.street", "address.city"],
                          header=["Name", "Street", "City"],
                          output='table')
print(table)

from cloudmesh.common.Shell import Shell

result = Shell.execute('pwd')
print(result)
result = Shell.execute('ls', ["-l", "-a"])
print(result)
result = Shell.execute('ls', "-l -a")
print(result)

result = Shell.ls("-aux")
print(result)
result = Shell.ls("-a", "-u", "-x")
print(result)
result = Shell.pwd()
print(result)
that we may benefit from).
VBoxManage(cls,*args)
bash(cls,*args)
blockdiag(cls,*args)
brew(cls,*args)
cat(cls,*args)
check_output(cls,*args,**kwargs)
check_python(cls)
cm(cls,*args)
cms(cls,*args)
command_exists(cls,name)
dialog(cls,*args)
edit(filename)
execute(cls,*args)
fgrep(cls,*args)
find_cygwin_executables(cls)
find_lines_with(cls,lines,what)
get_python(cls)
git(cls,*args)
grep(cls,*args)
head(cls,*args)
install(cls,name)
keystone(cls,*args)
kill(cls,*args)
live(cls,command,cwd=None)
ls(cls,*args)
mkdir(cls,directory)
mongod(cls,*args)
nosetests(cls,*args)
nova(cls,*args)
operating_system(cls)
pandoc(cls,*args)
ping(cls,host=None,count=1)
pip(cls,*args)
ps(cls,*args)
pwd(cls,*args)
rackdiag(cls,*args)
remove_line_with(cls,lines,what)
rm(cls,*args)
rsync(cls,*args)
scp(cls,*args)
sh(cls,*args)
sort(cls,*args)
ssh(cls,*args)
sudo(cls,*args)
tail(cls,*args)
terminal(cls,command='pwd')
terminal_type(cls)
unzip(cls,source_filename,dest_dir)
vagrant(cls,*args)
version(cls,name)
which(cls,command)
For more features, please see Shell
6.6 STOPWATCH
Often you find yourself in a situation where you like to measure the time between two events. We provide a simple StopWatch that allows you not only to measure a number of times, but also to print them out in a convenient format.
To print them, you can also use:
For more features, please see StopWatch
from cloudmesh.common.StopWatch import StopWatch
from time import sleep

StopWatch.start("test")
sleep(1)
StopWatch.stop("test")
print(StopWatch.get("test"))

StopWatch.benchmark()
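For readers curious how named timers of this kind can work, here is a minimal sketch using only the standard library. The class SimpleStopWatch is hypothetical and written for illustration; it is not the cloudmesh StopWatch and offers no benchmark report.

```python
import time


class SimpleStopWatch:
    """Hypothetical minimal named-timer class for illustration only."""

    timers = {}

    @classmethod
    def start(cls, name):
        # Remember the start time of the named timer
        cls.timers[name] = {"start": time.perf_counter(), "stop": None}

    @classmethod
    def stop(cls, name):
        # Record the stop time of the named timer
        cls.timers[name]["stop"] = time.perf_counter()

    @classmethod
    def get(cls, name):
        # Elapsed time in seconds between start and stop
        timer = cls.timers[name]
        return timer["stop"] - timer["start"]


SimpleStopWatch.start("test")
time.sleep(0.1)
SimpleStopWatch.stop("test")
elapsed = SimpleStopWatch.get("test")
print(f"test: {elapsed:.3f} s")
```

Storing the timers in a dictionary keyed by name is what allows several measurements to be kept and printed together later.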
6.7 CLOUDMESH COMMAND SHELL
6.7.1 CMD5
Python's CMD (https://docs.python.org/2/library/cmd.html) is a very useful package to create command line shells. However, it does not allow the dynamic integration of newly defined commands. Furthermore, additions to CMD need to be done within the same source tree. To simplify developing commands by a number of people and to have a dynamic plugin mechanism, we developed cmd5. It is a rewrite of our earlier efforts in cloudmesh client and cmd3.
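To illustrate the limitation mentioned above, here is a small sketch of a shell built with the standard cmd module. The class MiniShell and its commands are hypothetical examples. Note that every command is a do_ method defined in the class itself, which is why new commands normally have to live in the same source tree.

```python
import cmd
import io


class MiniShell(cmd.Cmd):
    """Hypothetical example shell; each command is a do_* method."""

    prompt = "mini> "

    def do_hello(self, arg):
        """hello NAME: print a greeting."""
        print(f"Hello {arg or 'world'}", file=self.stdout)

    def do_quit(self, arg):
        """quit: leave the shell."""
        return True  # a true return value stops the command loop


# Run two commands non-interactively and capture the output
output = io.StringIO()
shell = MiniShell(stdout=output)
shell.onecmd("hello Gregor")
stop = shell.onecmd("quit")
print(output.getvalue(), end="")
```

Calling shell.cmdloop() instead of onecmd would start the interactive prompt.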
6.7.1.1 Resources
The source code for cmd5 is located in github:
https://github.com/cloudmesh/cmd5
We have discussed in Section 6.2 how to install cloudmesh as developer and have access to the source code in a directory called cm. As you read this document we assume you are a developer and can skip the next section.
6.7.1.2 Installation from source
WARNING: DO NOT EXECUTE THIS IF YOU ARE A DEVELOPER OR YOUR ENVIRONMENT WILL NOT PROPERLY WORK.
However, if you are a user of cloudmesh you can install it with
6.7.1.3 Execution
To run the shell you can activate it with the cms command. cms stands for cloudmesh shell:
It will print the banner and enter the shell:
$ pip install cloudmesh-cmd5
(ENV2) $ cms
To see the list of commands you can say:
To see the manual page for a specific command, please use:
6.7.1.4 Create your own Extension
One of the most important features of CMD5 is its ability to extend it with new commands. This is done via packaged namespaces. We recommend you name it cloudmesh-mycommand, where mycommand is the name of the command that you like to create. This can easily be done while using the sys cloudmesh command (we suggest you use a different name than gregor, maybe your first name):
It will download a template from cloudmesh called cloudmesh-bar and generate a new directory cloudmesh-gregor with all the needed files to create your own command and register it dynamically with cloudmesh. All you have to do is to cd into the directory and install the code:
Adding your own command is easy. It is important that all objects are defined in the command itself and that no global variables be used, in order to allow each shell command to stand alone. Naturally you should develop API libraries outside of the cloudmesh shell command and reuse them in order to keep the command code as small as possible. We place the command in:
+-------------------------------------------------------+
|             (Cloudmesh ASCII-art banner)              |
+-------------------------------------------------------+
|                 Cloudmesh CMD5 Shell                  |
+-------------------------------------------------------+
cms>
cms> help
help COMMANDNAME
$ cms sys command generate gregor
$ cd cloudmesh-gregor
$ python setup.py install
# pip install .
cloudmesh/mycommand/command/gregor.py
Now you can go ahead and modify your command in that directory. It will look similar to (if you used the command name gregor):
An important difference to other CMD solutions is that our commands can leverage (besides the standard definition) docopt as a way to define the manual page. This allows us to use arguments as dict and use simple if conditions to interpret the command. Using docopt has the advantage that contributors are forced to think about the command and its options and document them from the start. Previously we did not use docopt but argparse and click. However, we noticed that for our contributors both systems led to commands that were either not properly documented or ambiguous, which resulted in confusion and wrong usage by subsequent users. Hence, we do recommend that you use docopt for documenting cmd5 commands. The transformation is enabled by the @command decorator that generates a manual page and creates a proper help message for the shell automatically. Thus there is no need to introduce a separate help method as would normally be needed in CMD, while reducing the effort it takes to contribute new commands in a dynamic fashion.
6.7.1.5 Bug: Quotes
We have one bug in cmd5 that relates to the use of quotes on the command line.
For example you need to say
from __future__ import print_function
from cloudmesh.shell.command import command
from cloudmesh.shell.command import PluginCommand


class GregorCommand(PluginCommand):

    @command
    def do_gregor(self, args, arguments):
        """
        ::

          Usage:
                gregor -f FILE
                gregor list

          This command does some useful things.

          Arguments:
              FILE   a file name

          Options:
              -f      specify the file

        """
        print(arguments)
        if arguments.FILE:
            print("You have used file: ", arguments.FILE)
        return ""
$ cms gregor -f \"file name with spaces\"
If you like to help us fix this, that would be great. It requires the use of shlex. Unfortunately we did not yet have time to fix this “feature”.
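As a starting point, the following sketch shows how the standard shlex module treats quotes compared with a plain split. It only demonstrates the parsing behavior on an example command line; how this would be integrated into cmd5 is not shown and is left open.

```python
import shlex

line = 'gregor -f "file name with spaces"'

# A plain split breaks the quoted file name into several pieces
naive = line.split()

# shlex.split honors shell-style quoting and keeps the name together
parsed = shlex.split(line)

print(naive)
print(parsed)
```

With shlex.split the quoted file name arrives as a single argument and the quotes themselves are removed.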
6.8 EXERCISES
When doing your assignment, make sure you label the programs appropriately with comments that clearly identify the assignment. Place all assignments in a folder on github named “cloudmesh-exercises”
For example, name the program solving E.Cloudmesh.Common.1 e-cloudmesh-1.py and so on. For more complex assignments you can name them as you like, as long as in the file you have a comment such as # fa19-516-000 E.Cloudmesh.Common.1
at the beginning of the file. Please do not store any screenshots of your working program in your git repository.
6.8.1 Cloudmesh Common
E.Cloudmesh.Common.1
Develop a program that demonstrates the use of banner, HEADING, and VERBOSE.
E.Cloudmesh.Common.2
Develop a program that demonstrates the use of dotdict.
E.Cloudmesh.Common.3
Develop a program that demonstrates the use of FlatDict.
E.Cloudmesh.Common.4
Develop a program that demonstrates the use of cloudmesh.common.Shell.
E.Cloudmesh.Common.5
Develop a program that demonstrates the use of cloudmesh.common.StopWatch.
6.8.2 Cloudmesh Shell
E.Cloudmesh.Shell.1
Install cmd5 and the command cms on your computer.
E.Cloudmesh.Shell.2
Write a new command with your first name as the command name.
E.Cloudmesh.Shell.3
Write a new command and experiment with docopt syntax and argument interpretation of the dict with if conditions.
E.Cloudmesh.Shell.4
If you have useful extensions that you like us to add by default, please work with us.
E.Cloudmesh.Shell.5
At this time one needs to escape the " character in some commands on the shell command line. Develop and test code that fixes this.
7 LIBRARIES
7.1 PYTHON MODULES
Often you may need functionality that is not present in Python's standard library. In this case you have two options:
implement the features yourself
use a third-party library that has the desired features.
Often you can find a previous implementation of what you need. Since this is a common situation, there is a service supporting it: the Python Package Index (or PyPI for short).
Our task here is to install the autopep8 tool from PyPI. This will allow us to illustrate the use of virtual environments using the pyenv or virtualenv command, and installing and uninstalling PyPI packages using pip.
7.1.1 Updating Pip
It is important that you have the newest version of pip installed for your version of python. Let us assume your python is registered with python and you use pyenv, then you can update pip with
without interfering with a potential system-wide installed version of pip that may be needed by the system default version of python. See the section about pyenv for more details.
7.1.2 Using pip to Install Packages
Let us now look at another important tool for Python development: the Python Package Index, or PyPI for short. PyPI provides a large set of third-party python packages. If you want to do something in python, first check PyPI, as odds are someone already ran into the problem and created a package solving it.
pip install -U pip
In order to install a package from PyPI, use the pip command. We can search PyPI for packages:
It appears that the top two results are what we want, so install them:
This will cause pip to download the packages from PyPI, extract them, check their dependencies and install those as needed, then install the requested packages.
You can skip the ‘--trusted-host pypi.python.org’ option if you have patched urllib3 on Python 2.7.9.
7.1.3 GUI
7.1.3.1 GUIZero
Install guizero with the following command:
For a comprehensive tutorial on guizero, click here.
7.1.3.2 Kivy
You can install Kivy on macOS as follows:
A hello world program for kivy is included in the cloudmesh.robot repository, which you can find here
https://github.com/cloudmesh/cloudmesh.robot/tree/master/projects/kivy
To run the program, please download it or execute it in cloudmesh.robot as
$ pip search --trusted-host pypi.python.org autopep8 pylint
$ pip install --trusted-host pypi.python.org autopep8 pylint
sudo pip install guizero
brew install pkg-config sdl2 sdl2_image sdl2_ttf sdl2_mixer gstreamer
pip install -U Cython
pip install kivy
pip install pygame
follows:
To create standalone packages with kivy, please see:
7.1.4 Formatting and Checking Python Code
First, get the bad code:
Examine the code:
As you can see, this is very dense and hard to read. Cleaning it up by hand would be a time-consuming and error-prone process. Luckily, this is a common problem so there exist a couple of packages to help in this situation.
7.1.5 Using autopep8
We can now run the bad code through autopep8 to fix formatting problems:
Let us look at the result. This is considerably better than before. It is easy to tell what the example1 and example2 functions are doing.
It is a good idea to develop a habit of using autopep8 in your python-development workflow. For instance: use autopep8 to check a file, and if it passes, make any changes in place using the -i flag:
If you use PyCharm you have the ability to use a similar function by pressing Inspect Code.
7.1.6 Writing Python 3 Compatible Code
cd cloudmesh.robot/projects/kivy
python swim.py
- https://kivy.org/docs/guide/packaging-osx.html
$ wget --no-check-certificate http://git.io/pXqb -O bad_code_example.py
$ emacs bad_code_example.py
$ autopep8 bad_code_example.py > code_example_autopep8.py
$ autopep8 file.py    # check output to see if it passes
$ autopep8 -i file.py # update in place
To write python 2 and 3 compatible code we recommend that you take a look at: http://python-future.org/compatible_idioms.html
7.1.7 Using Python on FutureSystems
This is only important if you use FutureSystems resources.
In order to use Python you must log in to your FutureSystems account. Then at the shell prompt execute the following command:
This will make the python and virtualenv commands available to you.
The details of what the module load command does are described in the future lesson modules.
7.1.8 Ecosystem
7.1.8.1 pypi
The Python Package Index is a large repository of software for the Python programming language containing a large number of packages, many of which can be found on pypi. The nice thing about pypi is that many packages can be installed with the program ‘pip’.
To do so you have to locate the <package_name>, for example with the search function in pypi, and say on the command line:
where package_name is the string name of the package. An example would be the package called cloudmesh_client which you can install with:
If all goes well the package will be installed.
7.1.8.2 Alternative Installations
$ module load python
$ pip install <package_name>
$ pip install cloudmesh_client
The basic installation of python is provided by python.org. However, others claim to have alternative environments that allow you to install python. This includes
Canopy
Anaconda
IronPython
Typically they include not only the python compiler but also several useful packages. It is fine to use such environments for the class, but it should be noted that in both cases not every python library may be available for install in the given environment. For example, if you need to use cloudmesh client, it may not be available as conda or Canopy package. This is also the case for many other cloud related and useful python libraries. Hence, we do recommend that if you are new to python you use the distribution from python.org, and use pip and virtualenv.
Additionally some python versions have platform specific libraries or dependencies. Cocoa libraries, .NET, or other frameworks are examples. For the assignments and the projects such platform dependent libraries are not to be used.
If however you can write platform independent code that works on Linux, macOS and Windows while using the python.org version, but develop it with any of the other tools, that is just fine. However, it is up to you to guarantee that this independence is maintained and implemented. You do have to write requirements.txt files that will install the necessary python libraries in a platform independent fashion. The homework assignment PRG1 even has a requirement to do so.
In order to provide platform independence we have given in the class a minimal python version that we have tested with hundreds of students: python.org. If you use any other version, that is your decision. Additionally some students not only use python.org but have used iPython, which is fine too. However, this class is not only about python, but also about how to have your code run on any platform. The homework is designed so that you can identify a setup that works for you.
However, we have concerns if you for example wanted to use chameleon cloud
which we require you to access with cloudmesh. cloudmesh is not available as conda, canopy, or other framework package. Cloudmesh client is available from pypi, which is standard and should be supported by the frameworks. We have not tested cloudmesh on any other python version than python.org, which is the open source community standard. None of the other versions are standard.
In fact we had students over the summer using canopy on their machines and they got confused as they now had multiple python versions and did not know how to switch between them and activate the correct version. Certainly if you know how to do that, then feel free to use canopy; all this is up to you. However, the homework and project require you to make your program portable to python.org. If you know how to do that even if you use canopy, anaconda, or any other python version, that is fine. Graders will test your programs on a python.org installation while using virtualenv, and not on canopy, anaconda, or ironpython. It is obvious why. If you do not know the answer, consider that every time they test a program they need to create a new virtualenv and run vanilla python in it. If we were to run two installs in the same system, this will not work as we do not know if one student will cause a side effect for another. Thus we as instructors do not just have to look at your code but at the code of hundreds of students with different setups. This is a non-scalable solution, as every time we test out code from a student we would have to wipe out the OS, install it new, install a new version of whatever python you have elected, become familiar with that version, and so on and on. This is the reason why the open source community is using python.org. We follow best practices. Using other versions is not a community best practice, but may work for an individual.
We have however, in regards to using other python versions, additional bonus projects such as
deploy, run and document cloudmesh on ironpython
deploy, run and document cloudmesh on anaconda, develop a script to generate a conda package from github
deploy, run and document cloudmesh on canopy, develop a script to generate a conda package from github
other documentation that would be useful
7.1.9 Resources
If you are unfamiliar with programming in Python, we also refer you to some of the numerous online resources. You may wish to start with Learn Python or the book Learn Python the Hard Way. Other options include Tutorials Point or Code Academy, and the Python wiki page contains a long list of references for learning as well. Additional resources include:
https://virtualenvwrapper.readthedocs.io
https://github.com/yyuu/pyenv
https://amaral.northwestern.edu/resources/guides/pyenv-tutorial
https://godjango.com/96-django-and-python-3-how-to-setup-pyenv-for-multiple-pythons/
https://www.accelebrate.com/blog/the-many-faces-of-python-and-how-to-manage-them/
http://ivory.idyll.org/articles/advanced-swc/
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
http://www.youtube.com/watch?v=0vJJlVBVTFg
http://www.korokithakis.net/tutorials/python/
http://www.afterhoursprogramming.com/tutorial/Python/Introduction/
http://www.greenteapress.com/thinkpython/thinkCSpy.pdf
https://docs.python.org/3.3/tutorial/modules.html
https://www.learnpython.org/en/Modules_and_Packages
https://docs.python.org/2/library/datetime.html
https://chrisalbon.com/python/strings_to_datetime.html
A very long list of useful information is also available from
https://github.com/vinta/awesome-python
https://github.com/rasbt/python_reference
This list may be useful as it also contains links to data visualization and manipulation libraries, and AI tools and libraries. Please note that for this class you can reuse such libraries if not otherwise stated.
7.1.9.1 Jupyter Notebook Tutorials
A Short Introduction to Jupyter Notebooks and NumPy. To view the notebook, open this link in a background tab https://nbviewer.jupyter.org/ and copy and paste the following link in the URL input area https://cloudmesh.github.io/classes/lesson/prg/Jupyter-NumPy-tutorial-I523-F2017.ipynb Then hit Go.
7.1.10 Exercises
E.Python.Lib.1:
Write a python program called iterate.py that accepts an integer n from the command line. Pass this integer to a function called iterate.
The iterate function should then iterate from 1 to n. If the i-th number is a multiple of three, print multiple of 3, if a multiple of 5 print multiple of 5, if a multiple of both print multiple of 3 and 5, else print the value.
E.Python.Lib.2:
1. Create a pyenv or virtualenv ~/ENV
2. Modify your ~/.bashrc shell file to activate your environment upon login.
3. Install the docopt python package using pip
4. Write a program that uses docopt to define a command line program. Hint: modify the iterate program.
5. Demonstrate the program works.
7.2 DATA MANAGEMENT
Obviously when dealing with big data we may not only be dealing with data in one format but in many different formats. It is important that you will be able to master such formats and seamlessly integrate them in your analysis. Thus we provide some simple examples on which different data formats exist and how to use
them.
7.2.1 Formats
7.2.1.1 Pickle
Python pickle allows you to save data in a python native format into a file that can later be read in by other programs. However, the data format may not be portable among different python versions, thus the format is often not suitable to store information. Instead we recommend for standard data to use either json or yaml.
To read it back in use
7.2.1.2 Text Files
To read text files into a variable called content you can use
You can also use the following code while using the convenient with statement
To split up the lines of the file into an array you can do
This can also be done with the built-in readlines function
In case the file is too big you will want to read the file line by line:
import pickle

flavor = {
    "small": 100,
    "medium": 1000,
    "large": 10000
}
pickle.dump(flavor, open("data.p", "wb"))
flavor = pickle.load(open("data.p", "rb"))
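As a portable alternative to the pickle calls above, the same dictionary can be stored as JSON. This is a minimal sketch; the temporary directory and the file name data.json are our own choices for the illustration.

```python
import json
import os
import tempfile

flavor = {
    "small": 100,
    "medium": 1000,
    "large": 10000,
}

# Write into a temporary directory so the example leaves no stray files
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w") as f:
    json.dump(flavor, f, indent=2)  # human-readable text file

with open(path) as f:
    restored = json.load(f)

print(restored)
```

Unlike a pickle file, the resulting file is plain text and can be read by programs written in other languages.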
content = open('filename.txt', 'r').read()

with open('filename.txt', 'r') as file:
    content = file.read()

with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()

lines = open('filename.txt', 'r').readlines()
7.2.1.3 CSV Files
Often data is contained in comma separated values (CSV) within a file. To read such files you can use the csv package.
Using pandas you can read them as follows.
There are many other modules and libraries that include CSV read functions. In case you need to split a single line by comma, you may also use the split function. However, remember it will split at every comma, including those contained in quotes. So this method, although it looks convenient at first, has limitations.
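The limitation of split can be seen on a single line that contains a quoted comma. The sample line is made up for this illustration.

```python
import csv
import io

# Hypothetical sample line with a comma inside a quoted field
line = 'Gregor,"Bloomington, IN",47405'

# split cuts at every comma, including the one inside the quotes
naive = line.split(",")

# csv.reader understands the quoting and keeps the field together
row = next(csv.reader(io.StringIO(line)))

print(naive)
print(row)
```

The split version yields four fragments, while the csv module returns the three intended fields.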
7.2.1.4 Excel spreadsheets
Pandas contains a method to read Excel files
7.2.1.5 YAML
YAML is a very important format as it allows you easily to structure data in hierarchical fields. It is frequently used to coordinate programs while using yaml as the specification for configuration files, but also data files. To read in a yaml file the following code can be used
The nice part is that this code can also be used to verify if a file is valid yaml. To write data out we can use
with open('filename.txt', 'r') as file:
    # iterating over the file object reads one line at a time
    for line in file:
        print(line)
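import csv

# open in text mode; newline='' is what the csv module expects
with open('data.csv', 'r', newline='') as f:
    contents = csv.reader(f)
    for row in contents:
        print(row)

import pandas as pd

df = pd.read_csv("example.csv")

import pandas as pd

filename = 'data.xlsx'
data = pd.ExcelFile(filename)
df = data.parse('Sheet1')

import yaml

with open('data.yaml', 'r') as f:
    # safe_load parses plain data without constructing arbitrary objects
    content = yaml.safe_load(f)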
The flow style set to false formats the data in a nice readable fashion with indentations.
7.2.1.6 JSON
7.2.1.7 XML
XML format is extensively used to transport data across the web. It has a hierarchical data format, and can be represented in the form of a tree.
A sample XML data looks like:
Python provides the ElementTree XML API to parse and create XML data.
Importing XML data from a file:
Reading XML data from a string directly:
Iterating over child nodes in a root:
Modifying XML data using ElementTree:
Modifying text within a tag of an element using .text method:
with open('data.yml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False)

import json

with open('strings.json') as f:
    content = json.load(f)

<data>
  <items>
    <item name="item-1"></item>
    <item name="item-2"></item>
    <item name="item-3"></item>
  </items>
</data>

import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()

root = ET.fromstring(data_as_string)

for child in root:
    print(child.tag, child.attrib)

tag.text = new_data
tree.write('output.xml')
Adding/modifying an attribute using .set() method:
Other Python modules used for parsing XML data include
minidom: https://docs.python.org/3/library/xml.dom.minidom.html
BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/
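The ElementTree steps described in this section can be combined into one self-contained sketch that works on the sample XML data as a string. The new text and the key/value attribute are made-up values for illustration.

```python
import xml.etree.ElementTree as ET

data_as_string = """
<data>
  <items>
    <item name="item-1"></item>
    <item name="item-2"></item>
    <item name="item-3"></item>
  </items>
</data>
"""

# Parse the XML directly from a string
root = ET.fromstring(data_as_string)

# Iterate over the child nodes of the <items> element
items = root.find("items")
names = [item.get("name") for item in items]

# Modify the text and add an attribute on the first item
first = items[0]
first.text = "new text"
first.set("key", "value")

# Serialize the modified tree back to a string
result = ET.tostring(root, encoding="unicode")
print(names)
print(result)
```

To write the result to a file instead, wrap the root in ET.ElementTree(root) and call its write method.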
7.2.1.8 RDF
To read RDF files you will need to install RDFlib with
This will then allow you to read RDF files
Good examples on using RDF are provided on the RDFlib Web page at https://github.com/RDFLib/rdflib
From the Web page we also showcase how to directly process RDF data from the Web
7.2.1.9 PDF
The Portable Document Format (PDF) has been made available by Adobe Inc. royalty free. This has enabled PDF to become a worldwide adopted format that also has been standardized in 2008 (ISO/IEC 32000-1:2008, https://www.iso.org/standard/51502.html). A lot of research is published in papers, making PDF one of the de-facto standards for publishing. However, PDF is difficult to parse and is focused on high quality output instead of data representation. Nevertheless, tools to manipulate PDF exist:
tag.set('key', 'value')
tree.write('output.xml')

$ pip install rdflib

from rdflib.graph import Graph

g = Graph()
g.parse("filename.rdf", format="format")
for entry in g:
    print(entry)

import rdflib

g = rdflib.Graph()
# parse also accepts a URL; Graph.load is no longer available in newer rdflib releases
g.parse('http://dbpedia.org/resource/Semantic_Web')
for s, p, o in g:
    print(s, p, o)
PDFMiner
https://pypi.python.org/pypi/pdfminer/ allows the simple translation of PDF into text that then can be further mined. The manual page helps to demonstrate some examples http://euske.github.io/pdfminer/index.html.
pdf-parser.py
https://blog.didierstevens.com/programs/pdf-tools/ parses pdf documents and identifies some structural elements that can then be further processed.
If you know about other tools, let us know.
7.2.1.10 HTML
A very powerful library to parse HTML Web pages is provided with https://www.crummy.com/software/BeautifulSoup/
More details about it are provided in the documentation page https://www.crummy.com/software/BeautifulSoup/bs4/doc/
TODO: Students can contribute a section
Beautiful Soup is a python library to parse, process and edit HTML documents.
To install Beautiful Soup, use the pip command as follows:
In order to process HTML documents, a parser is required. Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers like the lxml parser which is commonly used [1].
The following command can be used to install the lxml parser
To begin with, we import the package and instantiate an object as follows for a html document html_handle:
$ pip install beautifulsoup4
$ pip install lxml
Now, we will discuss a few functions, attributes and methods of Beautiful Soup.
prettify function
prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each HTML/XML tag and string. It is analogous to the pprint() function. The object created above can be viewed by printing the prettified version of the document as follows:
tag Object
A tag object refers to tags in the HTML document. It is possible to go down to the inner levels of the DOM tree. To access a tag div under the tag body, it can be done as follows:
The attrs attribute of the tag object returns a dictionary of all the defined attributes of the HTML tag as keys.
has_attr() method
To check if a tag object has a specific attribute, has_attr() method can be used.
tag object attributes
name - This attribute returns the name of the tag selected.
attrs - This attribute returns a dictionary of all the defined attributes of the HTML tag as keys.
contents - This attribute returns a list of contents enclosed within the HTML tag.
string - This attribute returns the text enclosed within the HTML tag. This returns None if there are multiple children.
strings - This overcomes the limitation of string and returns a generator of all
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_handle, 'lxml')

print(soup.prettify())

body_div = soup.body.div
print(body_div.prettify())

if body_div.has_attr('p'):
    print('The value of \'p\' attribute is:', body_div['p'])
![Page 90: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/90.jpg)
stringsenclosedwithinthegiventag
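The snippets in this section assume that html_handle already exists. As a minimal, self-contained sketch, the following example uses a small inline HTML string (a hypothetical sample, not part of the original text) and Python's built-in html.parser, so that lxml is not required to try out the tag attributes described above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for html_handle.
html_handle = """
<html>
  <body id="main">
    <div class="intro"><p>Hello</p><p>World</p></div>
    <a class="sample" href="https://example.org">a link</a>
  </body>
</html>
"""

# html.parser ships with Python; pass 'lxml' instead if lxml is installed.
soup = BeautifulSoup(html_handle, "html.parser")

body_div = soup.body.div
print(body_div.name)               # name of the selected tag
print(body_div.attrs)              # defined attributes as a dictionary
print(body_div.has_attr("class"))  # check for a specific attribute
print(body_div.string)             # None, because the div has two children
print(list(body_div.strings))      # all strings enclosed within the tag
```

Note that Beautiful Soup treats class as a multi-valued attribute, so its value in attrs is a list.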
The following code showcases the usage of the attributes discussed above:
body_tag = soup.body
print('Name of the tag:', body_tag.name)
attrs = body_tag.attrs
print('The attributes defined for body tag are:', attrs)
print('The contents of \'body\' tag are:\n', body_tag.contents)
print('The string value enclosed in \'body\' tag is:', body_tag.string)
for s in body_tag.strings:
    print(repr(s))
Searching the Tree
The find() function takes a filter expression as argument and returns the first match found.
search_elem = soup.find('a')
print(search_elem.prettify())
The find_all() function returns a list of all the matching elements.
from pprint import pprint
search_elems = soup.find_all("a", class_="sample")
pprint(search_elems)
The select() function can be used to search the tree using CSS selectors.
# Select `a` tag with class `sample`
a_tag_elems = soup.select('a.sample')
print(a_tag_elems)
7.2.1.11 ConfigParser
TODO: Students can contribute a section
https://pymotw.com/2/ConfigParser/
7.2.1.12 ConfigDict
https://github.com/cloudmesh/cloudmesh-common/blob/master/cloudmesh/common/ConfigDict.py
7.2.2 Encryption
Often we need to protect the information stored in a file. This is achieved with encryption. There are many methods of supporting encryption, and even if a file is encrypted it may be the target of attacks. Thus it is not only important to encrypt data that you do not want others to see, but also to make sure that the system on which the data is hosted is secure. This is especially important for big data, which can have a potentially large effect if it gets into the wrong hands.
To illustrate one type of encryption that is non-trivial, we have chosen to demonstrate how to encrypt a file with an ssh key. In case you have openssl installed on your system, this can be achieved as follows.
#!/bin/sh
# Step 1. Creating a file with data
echo "Big Data is the future." > file.txt
# Step 2. Create the pem
openssl rsa -in ~/.ssh/id_rsa -pubout > ~/.ssh/id_rsa.pub.pem
# Step 3. Look at the pem file to illustrate what it looks like (optional)
cat ~/.ssh/id_rsa.pub.pem
# Step 4. Encrypt the file into secret.txt
openssl rsautl -encrypt -pubin -inkey ~/.ssh/id_rsa.pub.pem -in file.txt -out secret.txt
# Step 5. Decrypt the file and print the contents to stdout
openssl rsautl -decrypt -inkey ~/.ssh/id_rsa -in secret.txt
Most important here are Step 4, which encrypts the file, and Step 5, which decrypts the file. Using the Python os module it is straightforward to implement this. However, we are providing in cloudmesh a convenient class that makes the use in Python very simple.
from cloudmesh.common.ssh.encrypt import EncryptFile
e = EncryptFile('file.txt', 'secret.txt')
e.encrypt()
e.decrypt()
In our class we initialize it with the locations of the file that is to be encrypted and decrypted. To initiate that action, just call the methods encrypt and decrypt.
7.2.3 Database Access
TODO: Students: define conventional database access section
see: https://www.tutorialspoint.com/python/python_database_access.htm
7.2.4 SQLite
TODO: Students can contribute to this section
https://www.sqlite.org/index.html
https://docs.python.org/3/library/sqlite3.html
7.2.4.1 Exercises
E:Encryption.1:
Test the shell script to replicate how this example works.
E:Encryption.2:
Test the cloudmesh encryption class.
E:Encryption.3:
What other encryption methods exist? Can you provide an example and contribute to the section?
E:Encryption.4:
What is the issue of encryption that makes it challenging for Big Data?
E:Encryption.5:
Given a test dataset with many text files, how long will it take to encrypt and decrypt them on various machines? Write a benchmark that you test. Develop this benchmark as a group, and test out the time it takes to execute it on a variety of platforms.
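As a starting point for the exercises, Steps 4 and 5 of the shell script in the Encryption section can also be driven from Python. The following is a minimal sketch, not the cloudmesh implementation: it uses the standard subprocess module rather than os, the function names are our own, and it assumes that openssl is on the PATH and that the key files from Step 2 exist. Newer OpenSSL releases mark rsautl as deprecated, so the subcommand may need to be adapted.

```python
import os
import subprocess


def encrypt_command(pem, source, target):
    """Return the openssl call of Step 4 as an argument list."""
    return ["openssl", "rsautl", "-encrypt", "-pubin",
            "-inkey", os.path.expanduser(pem),
            "-in", source, "-out", target]


def decrypt_command(key, source):
    """Return the openssl call of Step 5 as an argument list."""
    return ["openssl", "rsautl", "-decrypt",
            "-inkey", os.path.expanduser(key),
            "-in", source]


def roundtrip(source="file.txt", target="secret.txt"):
    """Encrypt source into target, then decrypt target and return the text."""
    subprocess.run(encrypt_command("~/.ssh/id_rsa.pub.pem", source, target),
                   check=True)
    result = subprocess.run(decrypt_command("~/.ssh/id_rsa", target),
                            check=True, capture_output=True)
    return result.stdout.decode()
```

Calling roundtrip() after Step 1 and Step 2 of the shell script should return the original file content. Wrapping the call in a timer is one way to begin the benchmark of exercise E:Encryption.5.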
7.3 PLOTTING WITH MATPLOTLIB
A brief overview of plotting with matplotlib along with examples is provided. First, matplotlib must be installed, which can be accomplished with pip install as follows:
$ pip install matplotlib
We will start by plotting a simple line graph using built-in numpy functions for sine and cosine. The first step is to import the proper libraries shown next.
import numpy as np
import matplotlib.pyplot as plt
Next we will define the values for the x axis. We do this with the linspace function in numpy. The first two parameters are the starting and ending points; these must be scalars. The third parameter is optional and defines the number of samples to be generated between the starting and ending points; this value must be an integer. Additional parameters for the linspace utility are described in the numpy documentation.
x = np.linspace(-np.pi, np.pi, 16)
Now we will use the sine and cosine functions in order to generate y values. For this we will use the values of x as the argument of both our sine and cosine functions, i.e. cos(x).
cos = np.cos(x)
sin = np.sin(x)
You can display the values of the three variables we have defined by typing them in a Python shell.
x
array([-3.14159265, -2.72271363, -2.30383461, -1.88495559, -1.46607657,
       -1.04719755, -0.62831853, -0.20943951,  0.20943951,  0.62831853,
        1.04719755,  1.46607657,  1.88495559,  2.30383461,  2.72271363,
        3.14159265])
Having defined x and y values, we can generate a line plot, and since we imported matplotlib.pyplot as plt we simply use plt.plot.
plt.plot(x, cos)
We can display the plot using plt.show(), which will pop up a figure displaying the plot defined.
plt.show()
Additionally, we can add the sine line to our line graph by entering the following.
plt.plot(x, sin)
Invoking plt.show() now will show a figure with both sine and cosine lines displayed. Now that we have a figure generated, it would be useful to label the x and y axis and provide a title. This is done by the following three commands:
plt.xlabel("X-label (units)")
plt.ylabel("Y-label (units)")
plt.title("A clever Title for your Figure")
Along with axis labels and a title, another useful figure feature may be a legend. In order to create a legend you must first designate a label for the line; this label will be what shows up in the legend. The label is defined in the initial plt.plot(x, y) call. Next is an example.
plt.plot(x, cos, label="cosine")
Then, in order to display the legend, the following command is issued:
plt.legend(loc='upper right')
The location is specified by using upper or lower and left or right. Naturally, all these commands can be combined and put in a file with the .py extension and run from the command line.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-np.pi, np.pi, 16)
cos = np.cos(x)
sin = np.sin(x)
plt.plot(x, cos, label="cosine")
plt.plot(x, sin, label="sine")
plt.xlabel("X-label (units)")
plt.ylabel("Y-label (units)")
plt.title("A clever Title for your Figure")
plt.legend(loc='upper right')
plt.show()
An example of a bar chart is presented next using data from [T:fast-cars].
import matplotlib.pyplot as plt
x = ['Toyota Prius',
     'Tesla Roadster',
     'Bugatti Veyron',
     'Honda Civic',
     'Lamborghini Aventador']
horse_power = [120, 288, 1200, 158, 695]
x_pos = [i for i, _ in enumerate(x)]
plt.bar(x_pos, horse_power, color='green')
plt.xlabel("Car Model")
plt.ylabel("Horse Power (Hp)")
plt.title("Horse Power for Selected Cars")
plt.xticks(x_pos, x)
plt.show()
You can customize plots further by using plt.style.use() in Python 3. If you enter the following command inside a Python shell you will see a list of available styles.
print(plt.style.available)
An example of using a predefined style is shown next.
# On matplotlib 3.6 or newer this style is listed as 'seaborn-v0_8'.
plt.style.use('seaborn')
Up to this point we have only showcased how to display figures through Python output; however, web browsers are a popular way to display figures. One example is Bokeh. The following lines can be entered in a Python shell and the figure is output to a browser.
from bokeh.io import show
from bokeh.plotting import figure
x_values = [1, 2, 3, 4, 5]
y_values = [6, 7, 2, 3, 6]
p = figure()
p.circle(x=x_values, y=y_values)
show(p)
7.4 DOCOPTS
When we want to design command line arguments for Python programs we have many options. However, as our approach is to create documentation first, docopt also provides a good approach for Python. The code for it is located at
https://github.com/docopt/docopt
It can be installed with
$ pip install docopt
Sample programs are located at
https://github.com/docopt/docopt/blob/master/examples/options_example.py
A sample program using docopt for our purposes looks as follows
"""Cloudmesh VM management

Usage:
  cm-go vm start NAME [--cloud=CLOUD]
  cm-go vm stop NAME [--cloud=CLOUD]
  cm-go set --cloud=CLOUD
  cm-go -h | --help
  cm-go --version

Options:
  -h --help      Show this screen.
  --version      Show version.
  --cloud=CLOUD  The name of the cloud.
  --moored       Moored (anchored) mine.
  --drifting     Drifting mine.

ARGUMENTS:
  NAME  The name of the VM
"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__, version='1.0.0rc2')
    print(arguments)
Another good feature of using docopt is that we can use the same verbal description in other programming languages as showcased in this book.
7.5 OPENCV
Learning Objectives
Provide some simple calculations so we can test cloud services.
Showcase some elementary OpenCV functions.
Show an environmental image analysis application using Secchi disks.
OpenCV (Open Source Computer Vision Library) is a library of thousands of algorithms for various applications in computer vision and machine learning. It has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS. In this section, we will explain basic features of this library, including the implementation of a simple example.
7.5.1 Overview
OpenCV has countless functions for image and video processing. The pipeline starts with reading the images, low-level operations on pixel values, preprocessing such as denoising, and then multiple steps of higher-level operations which vary depending on the application. OpenCV covers the whole pipeline, especially providing a large set of library functions for high-level operations. A simpler library for image processing in Python is Scipy's multi-dimensional image processing package (scipy.ndimage).
7.5.2 Installation
OpenCV for Python can be installed on Linux in multiple ways, namely PyPI (Python Package Index), the Linux package manager (apt-get for Ubuntu), the Conda package manager, and also building from source. We recommend using PyPI. Here is the command that you need to run:
$ pip install opencv-python
This was tested on Ubuntu 16.04 with a fresh Python 3.6 virtual environment. In order to test, import the module in the Python command line:
import cv2
If it does not raise an error, it is installed correctly. Otherwise, try to solve the error.
For installation on Windows, see:
https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_setup/py_setup_in_windows/py_setup_in_windows.html#install-opencv-python-in-windows
Note that building from source can take a long time and may not be feasible for deploying to limited platforms such as Raspberry Pi.
7.5.3 A Simple Example
In this example, an image is loaded. A simple processing is performed, and the result is written to a new image.
7.5.3.1 Loading an image
%matplotlib inline
import cv2
img = cv2.imread('images/opencv/4.2.01.tiff')
The image was downloaded from the USC standard database:
http://sipi.usc.edu/database/database.php?volume=misc&image=9
7.5.3.2 Displaying the image
The image is saved in a numpy array. Each pixel is represented with 3 values, which OpenCV stores in B, G, R order. This provides you with access to manipulate the image at the level of single pixels. You can display the image using OpenCV's imshow function as well as Matplotlib's imshow function.
You can display the image using the imshow function:
cv2.imshow('Original', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
or you can use Matplotlib. If you have not installed Matplotlib before, install it using:
$ pip install matplotlib
Now you can use:
import matplotlib.pyplot as plt
plt.imshow(img)
which results in Figure 1
Figure 1: Image display
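One detail is worth checking when an image loaded with OpenCV is passed to Matplotlib: cv2.imread returns the channels in B, G, R order, while plt.imshow expects R, G, B, so red and blue appear swapped unless the channels are reordered. The following sketch shows the reordering on a tiny hand-made array (a hypothetical stand-in for a real image), using only numpy.

```python
import numpy as np


def bgr_to_rgb(img):
    """Reverse the last (channel) axis so that B, G, R becomes R, G, B."""
    return img[:, :, ::-1]


# Hypothetical 1x2 image in B, G, R order: one pure blue and one pure red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])  # the blue pixel, now expressed in R, G, B order
print(rgb[0, 1])  # the red pixel, now expressed in R, G, B order
```

With OpenCV itself the same conversion is available as cv2.cvtColor(img, cv2.COLOR_BGR2RGB), whose result can be handed to plt.imshow.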
7.5.3.3 Scaling and Rotation
Scaling (resizing) the image relative to different axes
res = cv2.resize(img,
                 None,
                 fx=1.2,
                 fy=0.7,
                 interpolation=cv2.INTER_CUBIC)
plt.imshow(res)
which results in Figure 2
Figure 2: Scaling and rotation
Rotation of the image by an angle of t degrees
rows, cols, _ = img.shape
t = 45
M = cv2.getRotationMatrix2D((cols/2, rows/2), t, 1)
dst = cv2.warpAffine(img, M, (cols, rows))
plt.imshow(dst)
which results in Figure 3
Figure 3: image
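To make the rotation step less opaque, the following sketch rebuilds the 2x3 affine matrix that cv2.getRotationMatrix2D returns, using only numpy. The function name rotation_matrix_2d and the sample center are our own; the formula follows the layout given in the OpenCV documentation for that function.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale):
    """Build the 2x3 affine matrix for a rotation by angle_deg around center."""
    cx, cy = center
    alpha = scale * np.cos(np.radians(angle_deg))
    beta = scale * np.sin(np.radians(angle_deg))
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])


M = rotation_matrix_2d((50, 40), 45, 1)
# In homogeneous coordinates the rotation center is mapped onto itself.
center = M @ np.array([50, 40, 1])
print(M.shape, center)
```

The third column is the translation that keeps the chosen center fixed, which is why the example in the text passes the image midpoint (cols/2, rows/2).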
7.5.3.4 Gray-scaling
img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(img2, cmap='gray')
which results in Figure 4
Figure 4: Gray scaling
7.5.3.5 Image Thresholding
ret, thresh = cv2.threshold(img2, 127, 255, cv2.THRESH_BINARY)
plt.subplot(1, 2, 1), plt.imshow(img2, cmap='gray')
plt.subplot(1, 2, 2), plt.imshow(thresh, cmap='gray')
which results in Figure 5
Figure 5: Image Thresholding
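The effect of cv2.THRESH_BINARY can be reproduced in a few lines of numpy, which may help when choosing the threshold value. This is a sketch of the rule only (pixels strictly above the threshold become the maximum value, all others become 0); the helper name binary_threshold and the sample pixel row are our own.

```python
import numpy as np


def binary_threshold(gray, thresh, maxval):
    """Set pixels strictly above thresh to maxval and all others to 0."""
    return np.where(gray > thresh, maxval, 0).astype(gray.dtype)


# Hypothetical row of gray values around the threshold of 127.
gray = np.array([[0, 127, 128, 255]], dtype=np.uint8)
result = binary_threshold(gray, 127, 255)
print(result)
```

A pixel equal to the threshold stays black, so with a threshold of 127 the value 128 is the first one that turns white.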
7.5.3.6 Edge Detection
Edge detection using the Canny edge detection algorithm
edges = cv2.Canny(img2, 100, 200)
plt.subplot(121), plt.imshow(img2, cmap='gray')
plt.subplot(122), plt.imshow(edges, cmap='gray')
which results in Figure 6
Figure 6: Edge detection
7.5.4 Additional Features
OpenCV has implementations of many machine learning techniques such as KMeans and Support Vector Machines that can be put into use with only a few lines of code. It also has functions especially for video analysis, feature detection, object recognition and many more. You can find out more about them on their website.
[OpenCV](https://docs.opencv.org/3.0-beta/index.html) was initially developed for C++ and still has a focus on that language, but it is still one of the most valuable image processing libraries in Python.
7.6 SECCHI DISK
We are developing an autonomous robot boat that you can be part of developing within this class. The robot boat is actually measuring turbidity or water clarity. Traditionally this has been done with a Secchi disk. The use of the Secchi disk is as follows:
1. Lower the Secchi disk into the water.
2. Measure the point when you can no longer see it.
3. Record the depth at various levels and plot in a geographical 3D map.
One of the things we can do is take a video of the measurement instead of a human recording them. Then we can analyse the video automatically to see how deep a disk was lowered. This is a classical image analysis problem. You are encouraged to identify algorithms that can identify the depth. The simplest seems to be to do a histogram at a variety of depth steps, and measure when the histogram no longer changes significantly. The depth at that image will be the measurement we look for.
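The histogram idea can be sketched in a few lines of numpy. The example below is only an illustration under our own assumptions: the frames are synthetic gray-value arrays rather than real video frames, the change between two histograms is measured as the sum of absolute differences, and the tolerance of 0.05 is an arbitrary choice that would need tuning on real data.

```python
import numpy as np


def gray_histogram(frame, bins=256):
    """Normalized intensity histogram of a grayscale frame (values 0..255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()


def first_stable_frame(frames, tolerance=0.05):
    """Index of the first frame whose histogram differs from the previous
    frame's histogram by less than tolerance, or None if it never settles."""
    previous = gray_histogram(frames[0])
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index])
        if np.abs(current - previous).sum() < tolerance:
            return index
        previous = current
    return None


def synthetic_frame(disk_pixels, size=100, background=60, disk=220):
    """Hypothetical frame: disk_pixels bright pixels on a uniform background."""
    frame = np.full(size, background, dtype=np.uint8)
    frame[:disk_pixels] = disk
    return frame


# The visible part of the disk shrinks as it is lowered and finally vanishes.
frames = [synthetic_frame(n) for n in (40, 20, 5, 0, 0)]
print(first_stable_frame(frames))
```

In a real measurement each frame index would be paired with the depth read from the measuring tape, so that the returned index can be translated into a depth.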
Thus if we analyse the images, we need to look at the image and identify the numbers on the measuring tape, as well as the visibility of the disk.
To showcase what such a disk looks like, we refer to the image showing different Secchi disks. For our purpose the black-white contrast Secchi disk works well. See Figure 7
Figure 7: Secchi disk types. A marine style on the left and the freshwater version on the right wikipedia.
More information about the Secchi disk can be found at:
https://en.wikipedia.org/wiki/Secchi_disk
We have included next a couple of examples using some obviously useful OpenCV methods. Surprisingly, the use of edge detection, which comes to mind first to identify if we can still see the disk, seems too complicated to use for analysis. We at this time believe the histogram will be sufficient.
Please inspect our examples.
7.6.1 Setup for OSX
First let's set up the OpenCV environment for OSX. Naturally you will have to update the versions based on your versions of Python. When we tried the install of OpenCV on macOS, the setup was slightly more complex than for other packages. This may have changed by now, and if you have improved instructions, please let us know. However, we do not want to install it via Anaconda for the obvious reason that Anaconda installs too many other things.
import os, sys
from os.path import expanduser
home = expanduser("~")
sys.path.append('/usr/local/Cellar/opencv/3.3.1_1/lib/python3.6/site-packages/')
sys.path.append(home + '/.pyenv/versions/OPENCV/lib/python3.6/site-packages/')
import cv2
cv2.__version__
!pip install numpy > tmp.log
!pip install matplotlib >> tmp.log
%matplotlib inline
7.6.2 Step 1: Record the video
Record the video on the robot
We have actually done this for you and will provide you with images and videos if you are interested in analyzing them. See Figure 8
7.6.3 Step 2: Analyse the images from the Video
For now we just selected 4 images from the video
import cv2
import matplotlib.pyplot as plt
img1 = cv2.imread('secchi/secchi1.png')
img2 = cv2.imread('secchi/secchi2.png')
img3 = cv2.imread('secchi/secchi3.png')
img4 = cv2.imread('secchi/secchi4.png')
figures = []
fig = plt.figure(figsize=(18, 16))
for i in range(1, 13):
    figures.append(fig.add_subplot(4, 3, i))
count = 0
for img in [img1, img2, img3, img4]:
    figures[count].imshow(img)
    color = ('b', 'g', 'r')
    for i, col in enumerate(color):
        histr = cv2.calcHist([img], [i], None, [256], [0, 256])
        figures[count + 1].plot(histr, color=col)
    figures[count + 2].hist(img.ravel(), 256, [0, 256])
    count += 3
print("Legend")
print("First column = image of Secchi disk")
print("Second column = histogram of colors in image")
print("Third column = histogram of all values")
plt.show()
Figure 8: Histogram
7.6.3.1 Image Thresholding
See Figure 9, Figure 10, Figure 11, Figure 12
def threshold(img):
    ret, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)
    plt.subplot(1, 2, 1), plt.imshow(img, cmap='gray')
    plt.subplot(1, 2, 2), plt.imshow(thresh, cmap='gray')
threshold(img1)
Figure 9: Threshold 1
threshold(img2)
Figure 10: Threshold 2
threshold(img3)
Figure 11: Threshold 3
threshold(img4)
Figure 12: Threshold 4
7.6.3.2 Edge Detection
See Figure 13, Figure 14, Figure 15, Figure 16, Figure 17. Edge detection using the Canny edge detection algorithm
def find_edge(img):
    edges = cv2.Canny(img, 50, 200)
    plt.subplot(121), plt.imshow(img, cmap='gray')
    plt.subplot(122), plt.imshow(edges, cmap='gray')
find_edge(img1)
Figure 13: Edge Detection 1
find_edge(img2)
Figure 14: Edge Detection 2
find_edge(img3)
Figure 15: Edge Detection 3
find_edge(img4)
Figure 16: Edge Detection 4
7.6.3.3 Black and white
bw1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
plt.imshow(bw1, cmap='gray')
Figure 17: Black White conversion
8 DATA
8.1 DATA FORMATS
8.1.1 YAML
The term YAML stands for "YAML Ain't Markup Language". According to the Web Page at
http://yaml.org/
"YAML is a human friendly data serialization standard for all programming languages." There are multiple versions of YAML, and one needs to take care that the software used supports the right version. The current version is YAML 1.2.
YAML is often used for configuration and in many cases can also be used as an XML replacement. Importantly, YAML, in contrast to XML, removes the tags while replacing them with indentation. This naturally has the advantage that it is easier to read; however, the format is strict and needs to adhere to proper indentation. Thus it is important that you check your YAML files for correctness, either by writing for example a Python program that reads your YAML file, or by using an online YAML checker such as the one provided at
http://www.yamllint.com/
An example of how to use YAML in Python is provided in our next example. Please note that YAML is a superset of JSON. Originally YAML was designed as a markup language. However, as it is not document oriented but data oriented, it has been recast and no longer classifies itself as a markup language.
import sys
import yaml
try:
    yamlFilename = sys.argv[1]
    yamlFile = open(yamlFilename, "r")
except (IndexError, OSError):
    print("filename does not exist")
    sys.exit()
try:
    yaml.safe_load(yamlFile.read())
except yaml.YAMLError:
    print("YAML file is not valid.")
Resources:
http://yaml.org/
https://en.wikipedia.org/wiki/YAML
http://www.yamllint.com/
8.1.2 JSON
The term JSON stands for JavaScript Object Notation. It is targeted as an open-standard file format that emphasizes the integration of human-readable text to transmit data objects. The data objects contain attribute-value pairs. Although it originates from JavaScript, the format itself is language independent. It uses brackets to allow organization of the data. Please note that YAML is a superset of JSON and not all YAML documents can be converted to JSON. Furthermore, JSON does not support comments. For these reasons we often prefer to use YAML instead of JSON. However, JSON data can easily be translated to YAML as well as XML.
Resources:
https://en.wikipedia.org/wiki/JSON
https://www.json.org/
8.1.3 XML
XML stands for Extensible Markup Language. XML allows one to define documents with the help of a set of rules in order to make them machine readable. The emphasis here is on machine readable, as documents in XML can quickly become complex and difficult for humans to understand. XML is used for documents as well as data structures.
A tutorial about XML is available at
https://www.w3schools.com/xml/default.asp
Resources:
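As a closing illustration for the data formats in this section, the following sketch translates a small JSON object into XML using only the Python standard library. The record, the root tag record, and the item tag for list entries are our own choices; there is no single standard mapping between JSON and XML, so this shows one possible convention rather than a general converter.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
text = '{"name": "Albert", "age": 21, "groups": ["AI", "Machine Learning"]}'


def json_to_xml(json_text, root_tag="record"):
    """Translate a flat JSON object into an XML element.

    Scalar values become child elements; lists become repeated <item> children.
    """
    data = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        if isinstance(value, list):
            for entry in value:
                ET.SubElement(child, "item").text = str(entry)
        else:
            child.text = str(value)
    return root


root = json_to_xml(text)
print(ET.tostring(root, encoding="unicode"))
```

The brackets of JSON turn into opening and closing tags in XML, which illustrates why the XML form tends to be longer and harder to read for humans.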
9 MONGO
9.1 MONGODB IN PYTHON
Learning Objectives
Introduction to basic MongoDB knowledge
Use of MongoDB via PyMongo
Use of MongoEngine
MongoEngine and Object-Document mapper
Use of Flask-Mongo
In today's era, NoSQL databases have developed an enormous potential to process unstructured data efficiently. Modern information is complex, extensive, and may not have pre-existing relationships. With the advent of advanced search engines, machine learning, and Artificial Intelligence, technology expectations to process, store, and analyze such data have grown tremendously [2]. NoSQL database engines such as MongoDB, Redis, and Cassandra have successfully overcome traditional relational database challenges such as scalability, performance, unstructured data growth, agile sprint cycles, and growing needs of processing data in real-time with minimal hardware processing power [3]. NoSQL databases are a new generation of engines that do not necessarily require the SQL language and are sometimes also called Not Only SQL databases. However, most of them support various third-party open connectivity drivers that can map NoSQL queries to SQL. It would be safe to say that although NoSQL databases are still far from replacing relational databases, they add immense value when used in hybrid IT environments in conjunction with relational databases, based on application-specific needs [3]. We will be covering the MongoDB technology, its driver PyMongo, its object-document mapper MongoEngine, and the Flask-PyMongo micro-web framework that make MongoDB more attractive and user-friendly.
9.1.1 Cloudmesh MongoDB Usage Quickstart
Before you read on, we would like you to read this quickstart. The easiest way for many of the activities we do to interact with MongoDB is to use our cloudmesh functionality. This prelude section is not intended to describe all the details, but to get you started quickly while leveraging cloudmesh.
This is done via the cloudmesh cmd5 and the cloudmesh_community/cm code:
https://cloudmesh-community.github.io/cm/
To install mongo on, for example, macOS you can use
$ cms admin mongo install
To start, stop and see the status of mongo you can use
$ cms admin mongo start
$ cms admin mongo stop
$ cms admin mongo status
To add an object to Mongo, you simply have to define a dict with predefined values for kind and cloud. In future such attributes can be passed to the function to determine the MongoDB collection.
from cloudmesh.mongo.DataBaseDecorator import DatabaseUpdate
@DatabaseUpdate
def test():
    data = {
        "kind": "test",
        "cloud": "testcloud",
        "value": "hello"
    }
    return data
result = test()
When you invoke the function it will automatically store the information into MongoDB. Naturally this requires that the ~/.cloudmesh/cloudmesh.yaml file is properly configured.
9.1.2 MongoDB
Today MongoDB is one of the leading NoSQL databases, fully capable of handling dynamic changes, processing large volumes of complex and unstructured data, easily using object-oriented programming features, as well as distributed system challenges [4]. At its core, MongoDB is an open source, cross-platform, document database mainly written in C++.
9.1.2.1Installation
MongoDBcanbeinstalledonvariousUnixPlatforms,includingLinux,Ubuntu,AmazonLinux,etc[5].ThissectionfocusesoninstallingMongoDBonUbuntu18.04BionicBeaverusedasastandardOSforavirtualmachineusedasapartofBigDataApplicationClassduringthe2018Fallsemester.
9.1.2.1.1Installationprocedure
Beforeinstalling,itisrecommendedtoconfigurethenon-rootuserandprovidethe administrative privileges to it, in order to be able to perform generalMongoDBadmin tasks.Thiscanbeaccomplishedby loginas the rootuser inthefollowingmanner[6].
When logged in as a regular user, one can perform actions with superuserprivilegesbytypingsudobeforeeachcommand[6].
Oncetheusersetupiscompleted,onecanloginasaregularuser(mongoadmin)andusethefollowinginstructionstoinstallMongoDB.
To update the Ubuntu packages to the most recent versions, use the nextcommand:
ToinstalltheMongoDBpackage:
Tochecktheserviceanddatabasestatus:
VerifyingthestatusofasuccessfulMongoDBinstallationcanbeconfirmedwithanoutputsimilartothis:
$addusermongoadmin
$usermod-aGsudosammy
$sudoaptupdate
$sudoaptinstall-ymongodb
$sudosystemctlstatusmongodb
$mongodb.service-Anobject/document-orienteddatabase
Loaded:loaded(/lib/systemd/system/mongodb.service;enabled;vendorpreset:enabled)
Active:**active**(running)sinceSat2018-11-1507:48:04UTC;2min17sago
Docs:man:mongod(1)
MainPID:2312(mongod)
Tasks:23(limit:1153)
   CGroup: /system.slice/mongodb.service
           └─2312 /usr/bin/mongod --unixSocketPrefix=/run/mongodb --config /etc/mongodb.conf

To verify the configuration, more specifically the installed version, server, and port, use the following command:

$ mongo --eval 'db.runCommand({ connectionStatus: 1 })'

Similarly, to restart MongoDB, use the following:

$ sudo systemctl restart mongodb

To allow access to MongoDB from an externally hosted server, one can use the following command, which opens the firewall connection [5].

$ sudo ufw allow from your_other_server_ip/32 to any port 27017

The status can be verified by using:

$ sudo ufw status

Other MongoDB settings, such as port, host names, and file paths, can be edited in the /etc/mongodb.conf file.

$ sudo nano /etc/mongodb.conf

logappend=true
bind_ip = 127.0.0.1,your_server_ip
port = 27017

Also, to complete this step, the server's IP address must be added to the bind_ip value [5].
MongoDB is now listening for remote connections and can be accessed by anyone with appropriate credentials [5].
9.1.2.2 Collections and Documents
Each database within the Mongo environment contains collections, which in turn contain documents. Collections and documents are analogous to tables and rows, respectively, in relational databases. The document structure is in a key-value form, which allows storing complex data types composed of field and value pairs. Documents are objects that correspond to native data types in many programming languages; hence a well-defined, embedded document can help reduce expensive joins and improve query performance. The _id field helps to identify each document uniquely [3].
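To illustrate how an embedded document can take the place of a join, the following minimal sketch contrasts the two layouts using plain Python dictionaries. All sample values and the helper `join_address` are hypothetical and serve only as an illustration; no database is involved.

```python
import json

# Relational layout: two separate "tables" linked by the st_id key.
students = [{"st_id": 5654, "name": "Anna"}]
addresses = [{"st_id": 5654, "city": "Bloomington", "state": "IN"}]


def join_address(student, address_table):
    """Emulate the join a relational database performs at query time."""
    match = next(
        (row for row in address_table if row["st_id"] == student["st_id"]),
        None,
    )
    return {**student, "address": match}


# Document layout: the address is embedded, so one read returns everything.
student_doc = {
    "_id": 5654,
    "name": "Anna",
    "address": {"city": "Bloomington", "state": "IN"},
}

city_via_join = join_address(students[0], addresses)["address"]["city"]
city_embedded = student_doc["address"]["city"]

print(json.dumps(student_doc, indent=2))
```

Both lookups yield the same city, but the embedded layout needs no second table and no matching step.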
MongoDB offers the flexibility to write records that are not restricted by column types. The data storage approach is flexible, as it allows one to store data as it grows and to fulfill the varying needs of applications and/or users. It supports a JSON-like binary format known as BSON, in which data can be stored without specifying the type of the data in advance. Moreover, it can be distributed to multiple machines at high speed. It includes a sharding feature that partitions and spreads the data out across various servers. This makes MongoDB an excellent choice for cloud data processing. Its utilities can load high volumes of data at high speed, which ultimately provides greater flexibility and availability in a cloud-based environment [2].
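The partitioning idea behind sharding can be sketched in a few lines of plain Python. This is only a conceptual illustration, assuming a hash-based placement rule and three hypothetical shards; it does not reproduce how MongoDB itself assigns data to shards.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard number with a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(documents, shard_key, num_shards):
    """Distribute documents over num_shards lists according to shard_key."""
    shards = [[] for _ in range(num_shards)]
    for doc in documents:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


# Hypothetical sample data: 100 student documents spread over 3 shards.
docs = [{"st_id": i, "name": f"student{i}"} for i in range(100)]
shards = partition(docs, "st_id", 3)
print([len(s) for s in shards])
```

Because the placement depends only on the key, any client can compute which shard holds a given document without consulting the others.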
The dynamic schema structure within MongoDB allows easy testing of the small sprints in Agile project management life cycles and of research projects that require frequent changes to the data structure with minimal downtime. In contrast to this flexible process, modifying the data structure of a relational database can be a very tedious process [2].
9.1.2.2.1 Collection example
The following collection example for a person named Albert includes additional information such as age, status, and group [7].

{
    name: "Albert",
    age: "21",
    status: "Open",
    group: ["AI", "Machine Learning"]
}

9.1.2.2.2 Document structure

{
    field1: value1,
    field2: value2,
    field3: value3,
    ...
    fieldN: valueN
}

9.1.2.2.3 Collection Operations
If a collection does not exist, MongoDB will create the collection by
default.

> db.myNewCollection1.insertOne({ x: 1 })
> db.myNewCollection2.createIndex({ y: 1 })

9.1.2.3 MongoDB Querying
The data retrieval patterns and the frequency of data manipulation statements such as inserts, updates, and deletes may demand the use of indexes or of the sharding feature to improve the query performance and efficiency of a MongoDB environment [3]. One of the significant differences between relational databases and NoSQL databases is joins. In a relational database, one can combine results from two or more tables using a common column, often called a key. The native table contains the primary key column, while the referenced table contains a foreign key. This mechanism allows one to make changes in a single row instead of changing all rows in the referenced table. This action is referred to as normalization. MongoDB is a document database and mainly contains denormalized data, which means the data is repeated instead of indexed over a specific key. If the same data is required in more than one collection, it needs to be repeated. This constraint has been eliminated since MongoDB version 3.2, which introduced the $lookup feature that works much like a left outer join. Lookups are restricted to the aggregation framework, which means that data usually needs some type of filtering and grouping operations to be conducted beforehand. For this reason, joins in MongoDB require more complicated querying compared to traditional relational database joins. Although at this time lookups are still very far from replacing joins, this is a prominent feature that can resolve some of the relational data challenges for MongoDB [8]. MongoDB queries support regular expressions as well as range queries on specific fields, which eliminate the need to return entire documents [3]. MongoDB collections do not enforce a document structure as SQL databases do, which is a compelling feature. However, it is essential to keep in mind the needs of the applications [2].
9.1.2.3.1 Mongo Queries examples
The queries can be executed from the Mongo shell as well as through scripts.
To query the data from a MongoDB collection, one would use MongoDB's
find() method.

> db.COLLECTION_NAME.find()

The output can be formatted by using the pretty() command.

> db.mycol.find().pretty()

The MongoDB insert statements can be performed in the following manner:

> db.COLLECTION_NAME.insert(document)

“The $lookup command performs a left-outer-join to an unsharded collection in the same database to filter in documents from the joined collection for processing” [9].

{
    $lookup:
    {
        from: <collection to join>,
        localField: <field from the input documents>,
        foreignField: <field from the documents of the "from" collection>,
        as: <output array field>
    }
}

This operation is equivalent to the following SQL operation:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT *
                               FROM <collection to join>
                               WHERE <foreignField> = <collection.localField>);

To perform a Like Match (Regex), one would use the following command:

> db.products.find({ sku: { $regex: /789$/ } })

9.1.2.4 MongoDB Basic Functions
When it comes to the technical elements of MongoDB, it possesses a rich interface for importing and storing external data in various formats. By using the mongoimport and mongoexport tools, one can easily transfer contents from JSON, CSV, or TSV files into a database. MongoDB supports CRUD (create, read, update, delete) operations efficiently and has detailed documentation available on the product website. It can also query geospatial data, and it is capable of storing geospatial data in GeoJSON objects. The aggregation operation of MongoDB processes data records and returns computed results. The MongoDB aggregation framework is modeled on the concept of data pipelines [10].
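The data pipeline concept can be sketched in plain Python: each stage is a function that consumes a stream of documents and produces a new stream for the next stage. The stage helpers `match` and `group_sum`, the runner `run_pipeline`, and the sample orders are hypothetical and only mimic the idea; they are not MongoDB's implementation.

```python
from collections import defaultdict
from functools import reduce


def match(predicate):
    """Stage that keeps only the documents satisfying predicate."""
    return lambda docs: (d for d in docs if predicate(d))


def group_sum(key, field):
    """Stage that groups documents by key and sums field per group."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        return ({"_id": k, "total": v} for k, v in totals.items())
    return stage


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one."""
    return list(reduce(lambda stream, stage: stage(stream), stages, docs))


orders = [
    {"cust": "A", "status": "done", "amount": 50},
    {"cust": "A", "status": "done", "amount": 25},
    {"cust": "B", "status": "open", "amount": 10},
    {"cust": "B", "status": "done", "amount": 5},
]

totals = run_pipeline(
    orders,
    [match(lambda d: d["status"] == "done"), group_sum("cust", "amount")],
)
print(totals)
```

The order of the stages matters: filtering first means the grouping stage only sees the documents that passed the filter.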
9.1.2.4.1 Import/Export functions examples
To import JSON documents, one would use the following command:

$ mongoimport --db users --collection contacts --file contacts.json

The CSV import uses the input file name as the name of the collection; hence, the collection name is optional [10].

$ mongoimport --db users --type csv --headerline --file /opt/backups/contacts.csv

“Mongoexport is a utility that produces a JSON or CSV export of data stored in a MongoDB instance” [10].

$ mongoexport --db test --collection traffic --out traffic.json

9.1.2.5 Security Features
Data security is a crucial aspect of enterprise infrastructure management and is the reason why MongoDB provides various security features such as role-based access control, numerous authentication options, and encryption. It supports mechanisms such as SCRAM, LDAP, and Kerberos authentication. The administrator can create role/collection-based access control; roles can be predefined or custom. MongoDB can audit activities such as DDL and CRUD statements as well as authentication and authorization operations [11].
9.1.2.5.1 Collection based access control example
A user-defined role can contain the following privileges [11].

privileges: [
    { resource: { db: "products", collection: "inventory" }, actions: ["find", "update"] },
    { resource: { db: "products", collection: "orders" }, actions: ["find"] }
]

9.1.2.6 MongoDB Cloud Service
In regard to cloud technologies, MongoDB also offers a fully automated cloud service called Atlas with competitive pricing options. The Mongo Atlas Cloud interface offers an interactive GUI for managing cloud resources and deploying applications quickly. The service is equipped with geographically distributed instances to ensure there is no single point of failure. Also, a well-rounded performance monitoring interface allows users to promptly detect anomalies and generate index suggestions to optimize the performance and reliability of the database. Global technology leaders such as Google, Facebook, eBay, and Nokia are leveraging MongoDB and Atlas cloud services, making MongoDB one of the most popular choices among the NoSQL databases [12].
9.1.3 PyMongo
PyMongo is the official Python driver, or distribution, for working with the NoSQL database MongoDB [13]. The first version of the driver was developed in 2009 [14], only two years after the development of MongoDB was started. This driver allows developers to combine Python's versatility and MongoDB's flexible schema nature into successful applications. Currently, this driver supports MongoDB versions 2.6, 3.0, 3.2, 3.4, 3.6, and 4.0 [15]. MongoDB and Python are a compatible fit, considering that BSON (binary JSON), used in this NoSQL database, is very similar to Python dictionaries, which makes the collaboration between the two even more appealing [16]. For this reason, dictionaries are the recommended tool for representing documents in PyMongo [17].
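Because documents are represented as dictionaries, building one requires no special classes. The following sketch, with hypothetical field values, also shows one practical difference between JSON and BSON: plain JSON has no date type, whereas BSON does, so a `datetime` value has to be converted before it can be serialized with the standard `json` module.

```python
import datetime
import json

course = {
    "course": "Big Data Applications and Analytics",
    "instructor": "Gregor von Laszewski",
    "tags": ["cloud", "python"],               # lists map to BSON arrays
    "created": datetime.datetime(2019, 8, 1),  # dates map to the BSON date type
}

try:
    json.dumps(course)
    json_ok = True
except TypeError:
    # The json module cannot serialize datetime objects without help.
    json_ok = False

# Converting the date to an ISO 8601 string makes the document JSON friendly.
as_json = json.dumps({**course, "created": course["created"].isoformat()})
print(json_ok, as_json)
```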
9.1.3.1 Installation
Before the benefits of Python and MongoDB can be exploited together, the PyMongo distribution must be installed using pip. To install it on all platforms, the following command should be used [18]:

$ python -m pip install pymongo

Specific versions of PyMongo can be installed by pinning the version on the command line, as in our example where version 3.5.1 is installed [18].

$ python -m pip install pymongo==3.5.1

A single command can be used to upgrade the driver as well [18].

$ python -m pip install --upgrade pymongo
Furthermore, the installation can be completed with the help of the easy_install tool, which requires users to use the following command [18].

$ python -m easy_install pymongo

To upgrade the driver using this tool, the following command is recommended [18]:

$ python -m easy_install -U pymongo

There are many other ways of installing PyMongo directly from the source; however, they require C extension dependencies to be installed prior to the driver installation step, as they are the ones that skim through the sources on GitHub and use the most up-to-date links to install the driver [18].
To check if the installation was completed successfully, the following command is used in the Python console [19].

import pymongo

If the command raises no exceptions within the Python shell, one can consider the PyMongo installation to have been completed successfully.
9.1.3.2 Dependencies
The PyMongo driver has a few dependencies that should be taken into consideration prior to its usage. Currently, it supports CPython 2.7, 3.4+, PyPy, and PyPy 3.5+ interpreters [15]. An optional dependency that requires some additional components to be installed is GSSAPI authentication [15]. On Unix-based machines it requires pykerberos, while on Windows machines WinKerberos is needed to fulfill this requirement [15]. This dependency can be installed automatically together with the driver in the following manner:

$ python -m pip install pymongo[gssapi]

Other third-party dependencies such as ipaddress, certifi, or wincertstore are necessary for connections with the help of TLS/SSL and can also be installed together with the driver [15].
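The installation check and the optional dependencies described above can also be probed from a script. The following is a minimal sketch that relies only on the standard library; the helper names `is_installed` and `installation_report` are hypothetical, and the list of module names is an assumption that should be adapted to the packages actually needed.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def installation_report(module_names):
    """Map each module name to whether it is importable."""
    return {name: is_installed(name) for name in module_names}


# "pymongo" is the driver; "certifi" is one of the optional TLS helpers.
report = installation_report(["pymongo", "certifi"])
for name, found in report.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Unlike a bare `import pymongo`, this check does not raise an exception when the package is absent, which makes it convenient at the start of a setup script.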
9.1.3.3 Running PyMongo with Mongo Daemon
Once PyMongo is installed, the Mongo daemon can be run with a very simple command in a new terminal window [19].

$ mongod

9.1.3.4 Connecting to a database using MongoClient
In order to establish a connection with a database, the MongoClient class needs to be imported, which subsequently allows a MongoClient object to communicate with the database [19].

from pymongo import MongoClient
client = MongoClient()

This command opens a connection to the default host, localhost, through port 27017; however, depending on the programming requirements, one can also specify those by listing them in the client instance or pass the same information via the MongoDB URI format [19].
9.1.3.5 Accessing Databases
Since MongoClient represents the connection to the server, it can be used to access any desired database in an easy way. To do that, one can use two different approaches. The first approach is attribute-style access, where the name of the desired database is given as an attribute, and the second approach is dictionary-style access [19]. For example, to access a database called cloudmesh_community, one would use the following commands for the attribute and for the dictionary method, respectively.

db = client.cloudmesh_community
db = client['cloudmesh_community']

9.1.3.6 Creating a Database
Creating a database is a straightforward process. First, one must create a MongoClient object and specify the connection (IP address) as well as the name of the database they are trying to create [20]. The example of this command is presented in the following section:
import pymongo
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['cloudmesh']

9.1.3.7 Inserting and Retrieving Documents (Querying)
Creating documents and storing data using PyMongo is as easy as accessing and creating databases. In order to add new data, a collection must be specified first. In this example, a decision is made to use the cloudmesh group of documents.

cloudmesh = db.cloudmesh

Once this step is completed, data may be inserted using the insert_one() method, which means that only one document is being created. Of course, insertion of multiple documents at the same time is possible as well with the use of the insert_many() method [19]. An example of the insert_one() method is as follows:

course_info = {
    'course': 'Big Data Applications and Analytics',
    'instructor': 'Gregor von Laszewski',
    'chapter': 'technologies'
}
result = cloudmesh.insert_one(course_info)

Another example is the creation of a collection with insert_many(). If we wanted to create a collection of students in the cloudmesh_community, we would do it in the following manner:

student = [{'name': 'John', 'st_id': 52642},
           {'name': 'Mercedes', 'st_id': 5717},
           {'name': 'Anna', 'st_id': 5654},
           {'name': 'Greg', 'st_id': 5423},
           {'name': 'Amaya', 'st_id': 3540},
           {'name': 'Cameron', 'st_id': 2343},
           {'name': 'Bozer', 'st_id': 4143},
           {'name': 'Cody', 'st_id': 2165}]

client = MongoClient('mongodb://localhost:27017/')

with client:
    db = client.cloudmesh
    db.students.insert_many(student)

Retrieving documents is as simple as creating them. The find_one() method can be used to retrieve one document [19]. An implementation of this method is given in the following example.

gregors_course = cloudmesh.find_one({'instructor': 'Gregor von Laszewski'})

Similarly, to retrieve multiple documents, one would use the find() method
instead of find_one(). For example, to find all courses taught by professor von Laszewski, one would use the following command:

gregors_course = cloudmesh.find({'instructor': 'Gregor von Laszewski'})

One thing that users should be cognizant of when using the find() method is that it does not return the results as an array but as a cursor object, which is a combination of methods that work together to help with data querying [19]. In order to obtain individual documents, one must iterate over the result [19].
9.1.3.8 Limiting Results
When working with large databases, it is always useful to limit the number of query results. PyMongo supports this option with its limit() method [20]. This method takes one parameter, which specifies the number of documents to be returned [20]. For example, if we had a collection with a large number of cloud technologies as individual documents, one could modify the query to return only the first 10 technologies. To do this, the following example could be utilized:

client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['cloudmesh']
col = db['technologies']
topten = col.find().limit(10)

9.1.3.9 Updating Collection
Updating documents is very similar to inserting and retrieving them. Depending on the number of documents to be updated, one would use the update_one() or update_many() method [20]. Two parameters need to be passed to the update_one() method for it to execute successfully. The first argument is the query object that specifies the document to be changed, and the second argument is the object that specifies the new value in the document. An example of the update_one() method in action is the following:

myquery = {'course': 'Big Data Applications and Analytics'}
newvalues = {'$set': {'course': 'Cloud Computing'}}
# apply the change to the first matching document of the cloudmesh collection
cloudmesh.update_one(myquery, newvalues)

Updating all documents that fall under the same criteria can be done with the update_many() method [20]. For example, to update all documents in which
the course title starts with the letter B with different instructor information, we would do the following:

client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['cloudmesh']
col = db['courses']
query = {'course': {'$regex': '^B'}}
newvalues = {'$set': {'instructor': 'Gregor von Laszewski'}}
edited = col.update_many(query, newvalues)

9.1.3.10 Counting Documents
Counting documents can be done with one simple operation called count_documents() instead of using a full query [21]. For example, we can count the documents in the cloudmesh_community by using the following command:

count = cloudmesh.count_documents({})

To create a more specific count, one would use a command similar to this:

count = cloudmesh.count_documents({'author': 'von Laszewski'})

This technology supports some more advanced querying options as well. Those advanced queries allow one to add certain constraints and narrow down the results even more. For example, to get the courses taught by professor von Laszewski after a certain date, one would use the following command:

import datetime
import pprint

d = datetime.datetime(2017, 11, 12, 12)
for course in cloudmesh.find({'date': {'$gt': d}}).sort('author'):
    pprint.pprint(course)

9.1.3.11 Indexing
Indexing is a very important part of querying. It can greatly improve query performance but also add functionality and aid in storing documents [21].
“To create a unique index on a key that rejects documents whose value for that key already exists in the index” [21].
We first need to create the index in the following manner:

result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
                                  unique=True)
sorted(list(db.profiles.index_information()))

After this command, the collection has two different indexes. The first one is the _id index, created by MongoDB automatically, and the second one is the user_id index, created by the user.
The purpose of the unique index is to prevent future additions of duplicate user_ids into the collection.
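The effect of a unique index can be pictured with a small stand-in written in plain Python. The classes `UniqueIndex` and `DuplicateKey` below are hypothetical and are not part of PyMongo; they only mimic the rule that a second document with an already indexed `user_id` is rejected. With a real collection, PyMongo signals this situation by raising `pymongo.errors.DuplicateKeyError` on insertion.

```python
class DuplicateKey(Exception):
    """Raised when a value is already present in the index."""


class UniqueIndex:
    """Conceptual model of a unique index on one field."""

    def __init__(self, field):
        self.field = field
        self._entries = {}

    def insert(self, document):
        value = document[self.field]
        if value in self._entries:
            raise DuplicateKey(f"{self.field}={value!r} already exists")
        self._entries[value] = document


index = UniqueIndex("user_id")
index.insert({"user_id": 211, "name": "Luke"})
index.insert({"user_id": 212, "name": "Ziltoid"})

try:
    # Same user_id as an existing entry, so the insert is refused.
    index.insert({"user_id": 212, "name": "Tommy"})
    rejected = False
except DuplicateKey:
    rejected = True

print(rejected)
```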
9.1.3.12 Sorting
Sorting on the server side is also available via MongoDB. The PyMongo sort() method is equivalent to the SQL ORDER BY statement, and the sort direction is given as pymongo.ASCENDING or pymongo.DESCENDING [22]. This method is much more efficient than sorting on the client side, as it is completed on the server side. For example, to return all users with the first name Gregor sorted in descending order by birth date, we would use a command such as this:

users = cloudmesh.users.find({'firstname': 'Gregor'}).sort('dateofbirth', pymongo.DESCENDING)
for user in users:
    print(user.get('email'))

9.1.3.13 Aggregation
Aggregation operations are used to process given data and produce summarized results. Aggregation operations collect data from a number of documents and provide collective results by grouping data. PyMongo in its documentation offers a separate framework that supports data aggregation. This aggregation framework can be used to
“provide projection capabilities to reshape the returned data” [23].
In the aggregation pipeline, documents pass through multiple pipeline stages, which convert the documents into result data. The basic pipeline stages include filters. Those filters act as document transformations that change the form of the output documents. Other stages group or sort documents by specific fields. By using native operations from MongoDB, the pipeline operators are efficient in aggregating results.
The addFields stage is used to add new fields into documents. It reshapes each
document in the stream, similarly to the project stage. The output document will contain the existing fields from the input documents and the newly added fields [24]. The following example shows how to add student details into a document.

db.cloudmesh_community.aggregate([
    {
        $addFields: {
            "document.StudentDetails": {
                $concat: ['$document.student.FirstName', '$document.student.LastName']
            }
        }
    }])

The bucket stage is used to categorize incoming documents into groups based on specified expressions. Those groups are called buckets [24]. The following example builds such buckets by city and age range; it expresses them with the group stage and a $switch expression.

db.user.aggregate([
    {"$group": {
        "_id": {
            "city": "$city",
            "age": {
                "$let": {
                    "vars": {
                        "age": {"$subtract": [{"$year": new Date()}, {"$year": "$birthDay"}]}},
                    "in": {
                        "$switch": {
                            "branches": [
                                {"case": {"$lt": ["$$age", 20]}, "then": 0},
                                {"case": {"$lt": ["$$age", 30]}, "then": 20},
                                {"case": {"$lt": ["$$age", 40]}, "then": 30},
                                {"case": {"$lt": ["$$age", 50]}, "then": 40},
                                {"case": {"$lt": ["$$age", 200]}, "then": 50}
                            ]}}}}},
        "count": {"$sum": 1}}}])

In the bucketAuto stage, the boundaries are automatically determined in an attempt to evenly distribute documents into a specified number of buckets. In the following operation, input documents are grouped into four buckets according to the values in the price field [24].

db.artwork.aggregate([
    {
        $bucketAuto: {
            groupBy: "$price",
            buckets: 4
        }
    }
])

The collStats stage returns statistics regarding a collection or view [24].

db.matrices.aggregate([{$collStats: {latencyStats: {histograms: true}}
}])

The count stage passes a document to the next stage that contains the number of documents that were input to the stage [24].

db.scores.aggregate([{
    $match: {score: {$gt: 80}}},
    {$count: "passing_scores"}])

The facet stage helps process multiple aggregation pipelines in a single stage [24].

db.artwork.aggregate([{
    $facet: {"categorizedByTags": [{$unwind: "$tags"},
        {$sortByCount: "$tags"}], "categorizedByPrice": [
        // Filter out documents without a price e.g., _id: 7
        {$match: {price: {$exists: 1}}},
        {$bucket: {groupBy: "$price",
            boundaries: [0, 150, 200, 300, 400],
            default: "Other",
            output: {"count": {$sum: 1},
                "titles": {$push: "$title"}
            }}}], "categorizedByYears(Auto)": [
        {$bucketAuto: {groupBy: "$year", buckets: 4}
        }]}}])

The geoNear stage returns an ordered stream of documents based on the proximity to a geospatial point. The output documents include an additional distance field and can include a location identifier field [24].

db.places.aggregate([
    {$geoNear: {
        near: {type: "Point", coordinates: [-73.99279, 40.719296]},
        distanceField: "dist.calculated",
        maxDistance: 2,
        query: {type: "public"},
        includeLocs: "dist.location",
        num: 5,
        spherical: true
    }}])

The graphLookup stage performs a recursive search on a collection. To each output document, it adds a new array field that contains the traversal results of the recursive search for that document [24].

db.travelers.aggregate([
    {
        $graphLookup: {
            from: "airports",
            startWith: "$nearestAirport",
            connectFromField: "connects",
            connectToField: "airport",
            maxDepth: 2,
            depthField: "numConnections",
            as: "destinations"
        }
    }
])

The group stage consumes the document data per each distinct group. It has a RAM limit of 100 MB. If the stage exceeds this limit, the group stage produces an error [24].

db.sales.aggregate(
    [
        {
            $group: {
                _id: {month: {$month: "$date"}, day: {$dayOfMonth: "$date"},
                      year: {$year: "$date"}},
                totalPrice: {$sum: {$multiply: ["$price", "$quantity"]}},
                averageQuantity: {$avg: "$quantity"},
                count: {$sum: 1}
            }
        }
    ]
)

The indexStats stage returns statistics regarding the use of each index for a collection [24].

db.orders.aggregate([{$indexStats: {}}])

The limit stage is used for controlling the number of documents passed to the next stage in the pipeline [24].

db.article.aggregate(
    {$limit: 5}
)

The listLocalSessions stage gives the session information currently connected to a mongos or mongod instance [24].

db.aggregate([{$listLocalSessions: {allUsers: true}}])

The listSessions stage lists all sessions that have been active long enough to propagate to the system.sessions collection [24].

use config
db.system.sessions.aggregate([{$listSessions: {allUsers: true}}])

The lookup stage is useful for performing outer joins to other collections in the same database [24].

{
    $lookup:
    {
        from: <collection to join>,
        localField: <field from the input documents>,
        foreignField: <field from the documents of the "from" collection>,
        as: <output array field>
    }
}

The match stage is used to filter the document stream. Only matching documents pass to the next stage [24].

db.articles.aggregate(
    [{$match: {author: "dave"}}]
)

The project stage is used to reshape the documents by adding or deleting fields.

db.books.aggregate([{$project: {title: 1, author: 1}}])

The redact stage reshapes stream documents by restricting information using information stored in the documents themselves [24].

db.accounts.aggregate(
    [
        {$match: {status: "A"}},
        {
            $redact: {
                $cond: {
                    if: {$eq: ["$level", 5]},
                    then: "$$PRUNE",
                    else: "$$DESCEND"
                }}}]);

The replaceRoot stage is used to replace a document with a specified embedded document [24].

db.produce.aggregate([
    {
        $replaceRoot: {newRoot: "$in_stock"}
    }
])

The sample stage is used to sample data by randomly selecting a number of documents from the input [24].

db.users.aggregate(
    [{$sample: {size: 3}}]
)

The skip stage skips a specified initial number of documents and passes the remaining documents to the pipeline [24].

db.article.aggregate(
    {$skip: 5}
);

The sort stage is useful for reordering the document stream by a specified sort key [24].

db.users.aggregate(
    [
        {$sort: {age: -1, posts: 1}}
    ]
)

The sortByCount stage groups the incoming documents based on a specified expression value and counts the documents in each distinct group [24].

db.exhibits.aggregate(
    [{$unwind: "$tags"}, {$sortByCount: "$tags"}])

The unwind stage deconstructs an array field from the input documents to output
a document for each element [24].

db.inventory.aggregate([{$unwind: "$sizes"}])
db.inventory.aggregate([{$unwind: {path: "$sizes"}}])

The out stage is used to write aggregation pipeline results into a collection. This stage should be the last stage of a pipeline [24].

db.books.aggregate([
    {$group: {_id: "$author", books: {$push: "$title"}}},
    {$out: "authors"}
])

Another option from the aggregation operations is the Map/Reduce framework, which essentially includes two different functions, map and reduce. The first one emits a key-value pair for each tag in the array, while the latter one
“sums over all of the emitted values for a given key” [23].
The last step in the Map/Reduce process is to call the map_reduce() function and iterate over the results [23]. The Map/Reduce operation provides result data in a collection or returns results in-line. One can perform subsequent operations with the same input collection if the output of the same is written to a collection [25]. An operation that produces results in an in-line form must provide results within the BSON document size limit. The current limit for a BSON document is 16 MB. These types of operations are not supported by views [25]. PyMongo's API supports all features of MongoDB's Map/Reduce engine [26]. Moreover, Map/Reduce can return more detailed results when the full_response=True argument is passed to the map_reduce() function [26].
9.1.3.14 Deleting Documents from a Collection
The deletion of documents with PyMongo is fairly straightforward. To do so, one would use the remove() method of the PyMongo Collection object [22]. Similarly to reads and updates, the documents to be removed must be specified. For example, removing all documents with a score of 1 from a collection would require one to use the following command:

cloudmesh.users.remove({"score": 1}, safe=True)

The safe parameter set to True ensures the operation was completed [22].
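In PyMongo, `remove()` has been deprecated since version 3.0 and was removed in version 4.0 in favor of `delete_one()` and `delete_many()`, and the `safe` flag no longer exists because writes are acknowledged by default. A minimal sketch of the newer call follows. `InMemoryCollection` is a hypothetical stand-in that implements only equality filters so that the example runs without a server; a real PyMongo collection can be passed to `delete_by_score` in its place.

```python
from types import SimpleNamespace


def delete_by_score(collection, score):
    """Delete every document whose score equals score; return the count."""
    result = collection.delete_many({"score": score})
    return result.deleted_count


class InMemoryCollection:
    """Hypothetical stand-in for a collection, for running without MongoDB."""

    def __init__(self, documents):
        self.documents = list(documents)

    def delete_many(self, query):
        def matches(doc):
            return all(doc.get(k) == v for k, v in query.items())

        kept = [d for d in self.documents if not matches(d)]
        deleted = len(self.documents) - len(kept)
        self.documents = kept
        # Mimic the deleted_count attribute of PyMongo's DeleteResult.
        return SimpleNamespace(deleted_count=deleted)


users = InMemoryCollection([
    {"name": "John", "score": 1},
    {"name": "Anna", "score": 3},
    {"name": "Greg", "score": 1},
])
removed = delete_by_score(users, 1)
print(removed, users.documents)
```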
![Page 132: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/132.jpg)
9.1.3.15 Copying a Database
Copying databases within the same mongod instance or between different mongod servers is made possible with the command() method after connecting to the desired mongod instance [27]. For example, to copy the cloudmesh database and name the new database cloudmesh_copy, one would use the command() method in the following manner:
client.admin.command('copydb',
                     fromdb='cloudmesh',
                     todb='cloudmesh_copy')
There are two ways to copy a database between servers. If a server is not password-protected, one does not need to pass in credentials or authenticate to the admin database [27]. In that case, to copy a database one would use the following command:
client.admin.command('copydb',
                     fromdb='cloudmesh',
                     todb='cloudmesh_copy',
                     fromhost='source.example.com')
On the other hand, if the server to which we are copying the database is protected, one would use this command instead:
client = MongoClient('target.example.com',
                     username='administrator',
                     password='pwd')
client.admin.command('copydb',
                     fromdb='cloudmesh',
                     todb='cloudmesh_copy',
                     fromhost='source.example.com')
Note that the copydb command is only available on older MongoDB servers; it was deprecated in MongoDB 4.0 and removed in MongoDB 4.2.
9.1.3.16 PyMongo Strengths
One of PyMongo's strengths is that it allows document creation and querying natively
“through the use of existing language features such as nested dictionaries and lists” [22].
For moderately experienced Python developers, it is very easy to learn and to become comfortable with quickly.
“For these reasons, MongoDB and Python make a powerful combination for rapid, iterative development of horizontally scalable backend applications” [22].
According to [22], MongoDB is very applicable to modern applications, which makes PyMongo equally valuable.
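The point about nested dictionaries and lists can be illustrated without a running database. The following is a minimal sketch, assuming a made-up student record and a made-up students collection; only plain Python data structures are executed, and the PyMongo calls that would consume them are shown as comments.

```python
# A student record expressed with ordinary Python dictionaries and lists.
# All field names and values are hypothetical and only serve as illustration.
student = {
    "name": "Alice",
    "address": {"city": "Bloomington", "state": "IN"},
    "courses": ["e516", "e534"],
}

# A query filter is also just a dictionary. "$in" is a MongoDB query operator
# and "address.state" uses dot notation to reach into the nested document.
query = {
    "address.state": "IN",
    "courses": {"$in": ["e516"]},
}

# With a running mongod and PyMongo installed, these would be used as:
#   from pymongo import MongoClient
#   collection = MongoClient().cloudmesh.students   # hypothetical collection
#   collection.insert_one(student)
#   for doc in collection.find(query):
#       print(doc["name"])
print(student["address"]["state"], query["courses"]["$in"])
```

Because documents and filters are plain Python objects, they can be built, inspected, and tested before any database connection exists.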
9.1.4 MongoEngine
“MongoEngine is an Object-Document Mapper, written in Python for working with MongoDB” [28].
It is a library that allows more advanced communication with MongoDB than PyMongo. As MongoEngine is technically an object-document mapper (ODM), it can also be considered to be
“equivalent to a SQL-based object relational mapper (ORM)” [19].
The primary reason to use an ODM is data conversion between computer systems that are not compatible with each other [29]. To convert data to the appropriate form, a virtual object database must be created within the programming language being used [29]. This library is also used to define schemata for documents within MongoDB, which ultimately helps to minimize coding errors and makes it possible to define methods on existing fields [30]. It also benefits the overall workflow, as it tracks changes made to the documents and aids in the document saving process [31].
9.1.4.1 Installation
The installation process is fairly simple because MongoEngine is a library. To install it, one would use the following command [32]:
$ pip install mongoengine
A bleeding-edge version of MongoEngine can be installed directly from GitHub by first cloning the repository on the local machine, virtual machine, or cloud.
9.1.4.2 Connecting to a database using MongoEngine
Once installed, MongoEngine needs to be connected to an instance of mongod, similarly to PyMongo [33]. The connect() function is used for this step, and its argument is the name of the desired database [33]. Before this function can be used, it needs to be imported from the MongoEngine library.
from mongoengine import connect
connect('cloudmesh_community')
Similarly to MongoClient, MongoEngine uses localhost and port 27017 by default; however, the connect() function also accepts host and port arguments [33].
connect('cloudmesh_community', host='196.185.1.62', port=16758)
Other types of connections are also supported (e.g. a URI), and they can be made by providing the URI to the connect() function [33].
9.1.4.3 Querying using MongoEngine
To query MongoDB using MongoEngine, the objects attribute is used, which is technically a part of the document class [34]. This attribute is a QuerySetManager, which in turn
“creates a new QuerySet object on access” [34].
To access individual documents from a database, this object needs to be iterated over. For example, to print all students in the cloudmesh_community object (database), the following command would be used.
for user in cloudmesh_community.objects:
    print(user.student)
MongoEngine also supports query filtering, which means that a keyword can be used within the called QuerySet object to retrieve specific information [34]. Let us say one would like to iterate over cloudmesh_community students that are natives of Indiana. To achieve this, one would use the following command:
indy_students = cloudmesh_community.objects(state='IN')
This library also allows the use of operators other than equality in its queries and, moreover, can handle string queries, geo queries, list querying, and raw PyMongo queries [34].
The string queries are useful for performing text operations in conditional queries. A query to find documents whose state exactly matches ACTIVE can be performed in the following manner:
cloudmesh_community.objects(state__exact="ACTIVE")
The query to retrieve document data for names that start with a case-sensitive AL can be written as:
cloudmesh_community.objects(name__startswith="AL")
To perform the same query for a case-insensitive AL, one would use the following command:
cloudmesh_community.objects(name__istartswith="AL")
MongoEngine allows data extraction of geographical locations by using geo queries. The geo_within operator checks if a geometry is within a polygon.
cloudmesh_community.objects(
    point__geo_within=[[[40, 5], [40, 6], [41, 6], [40, 5]]])
cloudmesh_community.objects(
    point__geo_within={"type": "Polygon",
                       "coordinates": [[[40, 5], [40, 6], [41, 6], [40, 5]]]})
The list query looks up the documents in which the specified list field contains the given value. To match all pages that have the word coding as an item in the tags list, one would use the following query:
class Page(Document):
    tags = ListField(StringField())

Page.objects(tags='coding')
Overall, it would be safe to say that MongoEngine has good compatibility with Python. It provides different functions that make it easy to use Python with MongoDB, which makes this pair even more attractive to application developers.
9.1.5 Flask-PyMongo
“Flask is a micro-web framework written in Python” [35].
It was developed after Django, and it is very pythonic in nature, which implies that it explicitly targets the Python user community. It is lightweight, as it does not require additional tools or libraries, and hence is classified as a micro-web framework. It is often used with MongoDB through the PyMongo connector, and it treats data within MongoDB as searchable Python dictionaries. Applications such as Pinterest, LinkedIn, and the community web page for Flask use the Flask framework. Moreover, it supports various features such as RESTful request dispatching, secure cookies, Google App Engine compatibility, and integrated support for unit testing [35]. When it comes to connecting to a database, the connection details for MongoDB can be passed as a variable or configured in the PyMongo constructor with additional arguments such as username and password, if required. It is important that the versions of Flask and MongoDB are compatible with each other to avoid functionality breaks [36].
9.1.5.1 Installation
Flask-PyMongo can be installed with a simple command such as this:
$ pip install Flask-PyMongo
PyMongo can be added in the following manner:
from flask import Flask
from flask_pymongo import PyMongo
app = Flask(__name__)
app.config["MONGO_URI"] = "mongodb://localhost:27017/cloudmesh_community"
mongo = PyMongo(app)
9.1.5.2 Configuration
There are two ways to configure Flask-PyMongo. The first way is to pass a MongoDB URI to the PyMongo constructor, while the second way is to
“assign it to the MONGO_URI Flask configuration variable” [36].
9.1.5.3 Connection to multiple databases/servers
Multiple PyMongo instances can be used to connect to multiple databases or database servers. To achieve this, one would use a command similar to the following:
app = Flask(__name__)
mongo1 = PyMongo(app, uri="mongodb://localhost:27017/cloudmesh_community_one")
mongo2 = PyMongo(app, uri="mongodb://localhost:27017/cloudmesh_community_two")
mongo3 = PyMongo(app, uri=
                 "mongodb://another.host:27017/cloudmesh_community_Three")
9.1.5.4 Flask-PyMongo Methods
Flask-PyMongo provides helpers for some common tasks. One of them is the Collection.find_one_or_404 method shown in the following example:
@app.route("/user/<username>")
def user_profile(username):
    user = mongo.db.cloudmesh_community.find_one_or_404({"_id": username})
    return render_template("user.html", user=user)
This method is very similar to MongoDB's find_one() method; however, instead of returning None it causes a 404 Not Found HTTP status [36].
Similarly, the PyMongo.send_file and PyMongo.save_file methods work with file-like objects: save_file stores a file in GridFS under the given filename, and send_file serves a file from GridFS by its filename [36].
9.1.5.5 Additional Libraries
Flask-MongoAlchemy and Flask-MongoEngine are additional libraries that can be used to connect to a MongoDB database while using enhanced features with the Flask app. Flask-MongoAlchemy is used as a proxy between Python and MongoDB. It provides options such as server-based or database-based authentication to connect to MongoDB. The default is server-based; to use database-based authentication, the config value MONGOALCHEMY_SERVER_AUTH must be set to False [37].
Flask-MongoEngine is the Flask extension that provides integration with MongoEngine. It handles connection management for the apps. It can be installed through pip and set up very easily as well. The default configuration is set to localhost and port 27017. For a custom port, and in cases where MongoDB is running on another server, the host and port must be explicitly specified in the MONGODB_SETTINGS dictionary within app.config, along with the database username and password in cases where database authentication is enabled. URI-style connections are also supported; the URI is supplied as the host in the MONGODB_SETTINGS dictionary within app.config. There are various custom query sets available within Flask-MongoEngine that are attached to MongoEngine's default queryset [38].
9.1.5.6 Classes and Wrappers
Attributes such as cx and db in the PyMongo objects provide access to the MongoDB server [36]. To achieve this, one must pass the Flask app to the constructor or call init_app() [36].
“Flask-PyMongo wraps PyMongo’s MongoClient, Database, and Collection classes, and overrides their attribute and item accessors” [36].
type(mongo.cx)
type(mongo.db)
type(mongo.db.cloudmesh_community)
This type of wrapping allows Flask-PyMongo to add methods to Collection while at the same time allowing MongoDB-style dotted expressions in the code [36].
Flask-PyMongo creates connectivity between Python and Flask using a MongoDB database and supports
“extensions that can add application features as if they were implemented in Flask itself” [39],
hence, it can be used as an additional Flask functionality in Python code. The extensions support form validation, authentication technologies, object-relational mappers, and framework-related tools, which ultimately add a lot of strength to this micro-web framework [39]. One of the main reasons why it is frequently used with MongoDB is its capability of adding more control over databases and history [39].
9.2 MONGOENGINE
9.2.1 Introduction
MongoEngine is a document mapper for working with MongoDB from Python. To be able to use MongoEngine, MongoDB should already be installed and running.
9.2.2 Install and connect
MongoEngine can be installed by running:
$ pip install mongoengine
This will install mongoengine along with its dependencies (pymongo and, in older releases, six).
To connect to MongoDB, use the connect() function and specify the name of the database. This does not require the mongo shell; it can be done from a Python session started from the Unix shell or command line. In this case we are connecting to a database named student_db.
from mongoengine import *
connect('student_db')
If MongoDB is running on a port different from the default port, the port number and host need to be specified. If MongoDB requires authentication, a username and password need to be specified.
9.2.3 Basics
MongoDB does not enforce schemas. Compared to an RDBMS, a row in MongoDB is called a “document” and a table can be compared to a collection. Defining a schema is helpful as it minimizes coding errors. To define a schema, we create a class that inherits from Document.
from mongoengine import *

class Student(Document):
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)
Fields are not mandatory by default; to make a field mandatory, set the required keyword argument to True. Many field types are available, and each field can be customized with keyword arguments. If each student is sending text messages to the university's central database, these can be stored using MongoDB. Texts can have different content types; some might contain images and some might contain URLs. So we can create a class Text and link it to Student by using a ReferenceField (similar to a foreign key in an RDBMS).
class Text(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(Student)
    meta = {'allow_inheritance': True}

class OnlyText(Text):
    content = StringField()

class ImagePost(Text):
    image_path = StringField()

class LinkPost(Text):
    link_url = StringField()
MongoDB supports adding tags to individual texts rather than storing them separately and then having them referenced. Similarly, comments can also be stored directly in a Text.
# Comment is not defined in the original text; a minimal embedded
# document is assumed here so that the Text class is complete.
class Comment(EmbeddedDocument):
    content = StringField()

class Text(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(Student)
    tags = ListField(StringField(max_length=30))
    comments = ListField(EmbeddedDocumentField(Comment))
To access data, for example to get the titles:
for text in OnlyText.objects:
    print(text.title)
To search for texts by tag:
for text in Text.objects(tags='mongodb'):
    print(text.title)
10 OTHER
10.1 WORD COUNT WITH PARALLEL PYTHON
We will demonstrate Python’s multiprocessing API for parallel computation by writing a program that counts how many times each word in a collection of documents appears.
10.1.1 Generating a Document Collection
Before we begin, let us write a script that will generate document collections by specifying the number of documents and the number of words per document. This will make benchmarking straightforward.
To keep it simple, the vocabulary of the document collection will consist of random numbers rather than the words of an actual language:
'''Usage: generate_nums.py [-h] NUM_LISTS INTS_PER_LIST MIN_INT MAX_INT DEST_DIR

Generate random lists of integers and save them
as 1.txt, 2.txt, etc.

Arguments:
  NUM_LISTS      The number of lists to create.
  INTS_PER_LIST  The number of integers in each list.
  MIN_INT        Each generated integer will be >= MIN_INT.
  MAX_INT        Each generated integer will be <= MAX_INT.
  DEST_DIR       A directory where the generated numbers will be stored.

Options:
  -h --help
'''
from __future__ import print_function
import os, random, logging
from docopt import docopt


def generate_random_lists(num_lists,
                          ints_per_list, min_int, max_int):
    return [[random.randint(min_int, max_int)
             for i in range(ints_per_list)] for i in range(num_lists)]


if __name__ == '__main__':
    args = docopt(__doc__)
    num_lists, ints_per_list, min_int, max_int, dest_dir = [
        int(args['NUM_LISTS']),
        int(args['INTS_PER_LIST']),
        int(args['MIN_INT']),
        int(args['MAX_INT']),
        args['DEST_DIR']
    ]

    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)

    lists = generate_random_lists(num_lists,
                                  ints_per_list,
                                  min_int,
                                  max_int)
    curr_list = 1
    for lst in lists:
        with open(os.path.join(dest_dir, '%d.txt' % curr_list), 'w') as f:
            f.write(os.linesep.join(map(str, lst)))
        curr_list += 1
    logging.debug('Numbers written.')
Notice that we are using the docopt module, which you should be familiar with from the Section [Python DocOpts](#s-python-docopts), to make the script easy to run from the command line.
You can generate a document collection with this script as follows:
python generate_nums.py 1000 10000 0 100 docs-1000-10000
10.1.2 Serial Implementation
A first serial implementation of word count is straightforward:
'''Usage: wordcount.py [-h] DATA_DIR

Read a collection of .txt documents and count how many times each word
appears in the collection.

Arguments:
  DATA_DIR  A directory with documents (.txt files).

Options:
  -h --help
'''
from __future__ import division, print_function
import os, glob, logging
from docopt import docopt

logging.basicConfig(level=logging.DEBUG)


def wordcount(files):
    counts = {}
    for filepath in files:
        with open(filepath, 'r') as f:
            words = [word.strip() for word in f.read().split()]
        for word in words:
            if word not in counts:
                counts[word] = 0
            counts[word] += 1
    return counts


if __name__ == '__main__':
    args = docopt(__doc__)
    if not os.path.exists(args['DATA_DIR']):
        raise ValueError('Invalid data directory: %s' % args['DATA_DIR'])
    counts = wordcount(glob.glob(os.path.join(args['DATA_DIR'], '*.txt')))
    logging.debug(counts)
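The counting loop in wordcount() can also be expressed with collections.Counter from the standard library. The sketch below is an alternative formulation and not the script from this section; the function name wordcount_counter and the sample documents are made up for illustration, and the documents are passed as strings instead of file paths so that the example runs without a data directory.

```python
from collections import Counter


def wordcount_counter(documents):
    """Count word occurrences across an iterable of document strings."""
    counts = Counter()
    for text in documents:
        # str.split() without arguments splits on any whitespace,
        # so no additional strip() is needed.
        counts.update(text.split())
    return counts


docs = ["12 7 12", "7 7 3"]   # hypothetical documents made of number "words"
counts = wordcount_counter(docs)
print(counts.most_common(2))
```

A Counter returns 0 for words it has not seen, which removes the need for the explicit "if word not in counts" check.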
10.1.3 Serial Implementation Using map and reduce
We can improve the serial implementation in anticipation of parallelizing the program by making use of Python’s map and reduce functions.
In short, you can use map to apply the same function to the members of a collection. For example, to convert a list of numbers to strings, you could do the following (in Python 3, map returns an iterator, so it is wrapped in list() to print the result):
import random
nums = [random.randint(1, 2) for _ in range(10)]
print(nums)
[2, 1, 1, 1, 2, 2, 2, 2, 2, 2]
print(list(map(str, nums)))
['2', '1', '1', '1', '2', '2', '2', '2', '2', '2']
We can use reduce to apply the same function cumulatively to the items of a sequence. In Python 3, reduce must be imported from the functools module. For example, to find the total of the numbers in our list, we could use reduce as follows:
from functools import reduce

def add(x, y):
    return x + y

print(reduce(add, nums))
17
We can simplify this even more by using a lambda function:
print(reduce(lambda x, y: x + y, nums))
17
You can read more about Python’s lambda function in the docs.
With this in mind, we can reimplement the wordcount example as follows:
'''Usage: wordcount_mapreduce.py [-h] DATA_DIR

Read a collection of .txt documents and count how
many times each word
appears in the collection.

Arguments:
  DATA_DIR  A directory with documents (.txt files).

Options:
  -h --help
'''
from __future__ import division, print_function
import os, glob, logging
from functools import reduce  # reduce is not a builtin in Python 3
from docopt import docopt

logging.basicConfig(level=logging.DEBUG)


def count_words(filepath):
    counts = {}
    with open(filepath, 'r') as f:
        words = [word.strip() for word in f.read().split()]
    for word in words:
        if word not in counts:
            counts[word] = 0
        counts[word] += 1
    return counts


def merge_counts(counts1, counts2):
    for word, count in counts2.items():
        if word not in counts1:
            counts1[word] = 0
        counts1[word] += count
    return counts1


if __name__ == '__main__':
    args = docopt(__doc__)
    if not os.path.exists(args['DATA_DIR']):
        raise ValueError('Invalid data directory: %s' % args['DATA_DIR'])
    per_doc_counts = map(count_words,
                         glob.glob(os.path.join(args['DATA_DIR'],
                                                '*.txt')))
    # Start from an empty dictionary so the per-document results are merged into it
    counts = reduce(merge_counts, per_doc_counts, {})
    logging.debug(counts)
10.1.4 Parallel Implementation
Drawing on the previous implementation using map and reduce, we can parallelize the implementation using Python’s multiprocessing API:
'''Usage: wordcount_mapreduce_parallel.py [-h] DATA_DIR NUM_PROCESSES

Read a collection of .txt documents and count, in parallel, how many
times each word appears in the collection.

Arguments:
  DATA_DIR       A directory with documents (.txt files).
  NUM_PROCESSES  The number of parallel processes to use.

Options:
  -h --help
'''
from __future__ import division, print_function
import os, glob, logging
from functools import reduce  # reduce is not a builtin in Python 3
from docopt import docopt
from wordcount_mapreduce import count_words, merge_counts
from multiprocessing import Pool

logging.basicConfig(level=logging.DEBUG)

if __name__ == '__main__':
    args = docopt(__doc__)
    if not os.path.exists(args['DATA_DIR']):
        raise ValueError('Invalid data directory: %s' % args['DATA_DIR'])
    num_processes = int(args['NUM_PROCESSES'])
    pool = Pool(processes=num_processes)
    per_doc_counts = pool.map(count_words,
                              glob.glob(os.path.join(args['DATA_DIR'],
                                                     '*.txt')))
    counts = reduce(merge_counts, per_doc_counts, {})
    logging.debug(counts)
10.1.5 Benchmarking
To time each of the examples, enter it into its own Python file and use Linux’s time command:
$ time python wordcount.py docs-1000-10000
The output contains the real run time and the user run time. real is wall clock time, that is, the time from start to finish of the call. user is the amount of CPU time spent in user-mode code (outside the kernel) within the process, that is, only actual CPU time used in executing the process.
10.1.6 Exercises
E.python.wordcount.1:
Run the three different programs (serial, serial w/ map and reduce, parallel) and answer the following questions:
1. Is there any performance difference between the different versions of the program?
2. Does user time significantly differ from real time for any of the versions of the program?
3. Experiment with different numbers of processes for the parallel example, starting with 1. What is the performance gain when you go from 1 to 2 processes? From 2 to 3? When do you stop seeing improvement? (This will depend on your machine architecture.)
10.1.7 References
Map, Filter and Reduce
multiprocessing API
10.2 NUMPY
NumPy is a popular library that is used by many other Python packages such as Pandas, SciPy, and scikit-learn. It provides a fast, simple-to-use way of interacting with numerical data organized in vectors and matrices. In this section, we will provide a short introduction to NumPy.
10.2.1 Installing NumPy
The most common way of installing NumPy, if it wasn’t included with your Python installation, is to install it via pip:
$ pip install numpy
If NumPy has already been installed, you can update to the most recent version using:
$ pip install -U numpy
You can verify that NumPy is installed by trying to use it in a Python program:
import numpy as np
Note that, by convention, we import NumPy using the alias ‘np’. Whenever you see ‘np’ sprinkled in example Python code, it’s a good bet that it is using NumPy.
10.2.2 NumPy Basics
At its core, NumPy is a container for n-dimensional data. Typically, 1-dimensional data is called an array and 2-dimensional data is called a matrix. Data with more than 2 dimensions is considered a multidimensional array. Examples where you’ll encounter these dimensions include:
1 Dimensional: time series data such as audio, stock prices, or a single observation in a dataset.
2 Dimensional: connectivity data between network nodes, user-product recommendations, and database tables.
3+ Dimensional: network latency between nodes over time, video (RGB+time), and version controlled datasets.
All of these data can be placed into NumPy’s array object, just with varying dimensions.
10.2.3 Data Types: The Basic Building Blocks
Before we delve into arrays and matrices, we will start off with the most basic element of those: a single value. NumPy can represent data utilizing many different standard datatypes such as uint8 (an 8-bit unsigned integer), float64 (a 64-bit float), or str (a string). An exhaustive listing can be found at:
https://docs.scipy.org/doc/numpy-1.15.0/user/basics.types.html
Before moving on, it is important to know about the tradeoff made when using different datatypes. For example, a uint8 can only contain values between 0 and 255. This contrasts with float64, which can express values up to about +/-1.80e+308. So why wouldn’t we just always use float64s? Though they allow us to be more expressive in terms of numbers, they also consume more memory. If we were working with a 12 megapixel image, for example, storing that image with one uint8 value per pixel would require 3000 * 4000 * 8 = 96 million bits, which is 12 million bytes or about 11.44 MiB of memory. If we were to store the same image utilizing float64, our image would consume 8 times as much memory: 768 million bits, which is 96 million bytes or about 91.55 MiB. It’s important to use the right datatype for the job to avoid consuming unnecessary resources or slowing down processing.
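The memory figures can be checked with NumPy itself. This is a small sketch, assuming a single-channel 3000 x 4000 image as in the text; it computes the sizes from the dtype's itemsize instead of allocating the full arrays.

```python
import numpy as np

height, width = 3000, 4000          # a 12 megapixel, single-channel image
pixels = height * width

for dtype in (np.uint8, np.float64):
    # itemsize is the number of bytes used by one element of this dtype
    nbytes = pixels * np.dtype(dtype).itemsize
    print(np.dtype(dtype).name, nbytes, "bytes =", round(nbytes / 2**20, 2), "MiB")

# An existing array reports its total size in bytes through .nbytes
small = np.zeros((3, 4), dtype=np.float64)
print(small.nbytes)
```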
Finally, while NumPy will conveniently convert between datatypes, one must be aware of overflows when using smaller datatypes. For example:
In this example, it makes sense that 6+7=13. But how does 13+245=2? Put simply, the object type (uint8) ran out of space to store the value and wrapped back around to the beginning. An 8-bit number is only capable of storing 2^8, or 256, unique values. An operation that results in a value above that range will ‘overflow’ and cause the value to wrap back around to zero. Likewise, anything below that range will ‘underflow’ and wrap back around to the end. In our example, 13+245 became 258, which was too large to store in 8 bits, so it wrapped back around to 0 and ended up at 2.
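The wrap-around described above, including the underflow case, can be reproduced in a few lines. This is a minimal sketch; the variable names are chosen for illustration, and the last step shows one possible way to avoid the problem by converting to a wider type explicitly.

```python
import numpy as np

a = np.array([13], dtype=np.uint8)
b = np.array([245], dtype=np.uint8)

# 13 + 245 = 258 does not fit into 8 bits, so the result wraps: 258 - 256 = 2
print(a + b)

# Going below zero wraps to the top of the range: 3 - 5 = -2 becomes 256 - 2 = 254
c = np.array([3], dtype=np.uint8) - np.array([5], dtype=np.uint8)
print(c)

# Converting to a wider type first keeps the true value
wide = a.astype(np.uint16) + b.astype(np.uint16)
print(wide, wide.dtype)
```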
NumPy will, generally, try to avoid this situation by dynamically retyping to whatever datatype will support the result:
a = np.array([6], dtype=np.uint8)
print(a)
>>> [6]
a = a + np.array([7], dtype=np.uint8)
print(a)
>>> [13]
a = a + np.array([245], dtype=np.uint8)
print(a)
>>> [2]

a = a + 260
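Here, our addition caused our array, ‘a’, to be upscaled to use uint16 instead of uint8. This automatic upscaling depends on the NumPy version: it describes the 1.x releases this text was written against, while NumPy 2.0 and later raise an OverflowError when a Python integer does not fit into the array’s datatype, so an explicit conversion with astype() is the more portable choice.
Finally, NumPy offers convenience functions akin to Python’s range() function to create arrays of sequential numbers: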
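As a short illustration of these sequence helpers, the sketch below uses np.arange with integers and np.linspace for floating-point ranges. The choice of np.linspace is a suggestion made here, not something taken from the text; it takes the number of points instead of a step size.

```python
import numpy as np

# Integer ranges behave like Python's range(): start inclusive, stop exclusive
print(np.arange(0, 10, 2))

# With a floating-point step, the number of elements can be affected by
# rounding, so np.linspace, which takes the number of points, is often clearer
x = np.linspace(0.2, 0.9, 8)
print(x)
```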
We can also use this function to generate parameter spaces that can be iterated on:
10.2.4 Arrays: Stringing Things Together
With our knowledge of datatypes in hand, we can begin to explore arrays. Simply put, arrays can be thought of as a sequence of values (not necessarily numbers). Arrays are 1-dimensional and can be created and accessed simply:
Arrays (and, later, matrices) are zero-indexed. This makes it convenient when, for example, using Python’s range() function to iterate through an array:
Arrays are also mutable and can be changed easily:
print(a)  # the array after a = a + 260
>>> [262]

X = np.arange(0.2, 1, .1)
print(X)
>>> [0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]

P = 10.0 ** np.arange(-7, 1, 1)
print(P)
for x, p in zip(X, P):
    print('%f, %f' % (x, p))

a = np.array([1, 2, 3])
print(type(a))
>>> <class 'numpy.ndarray'>
print(a)
>>> [1 2 3]
print(a.shape)
>>> (3,)
a[0]
>>> 1

for i in range(3):
    print(a[i])
>>> 1
>>> 2
>>> 3

a[0] = 42
print(a)
>>> [42  2  3]

NumPy also includes incredibly powerful broadcasting features. This makes it very simple to perform mathematical operations on arrays in a way that also makes intuitive sense:
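Broadcasting also applies when the operands have different shapes. The following is a minimal sketch with made-up values, showing a scalar and a 1-dimensional array each being combined with a 2-dimensional array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)

# The scalar is applied to every element
doubled = m * 2
print(doubled)

# The 1-D row is stretched along the first axis and added to each row of m
shifted = m + row
print(shifted)
```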
Arrays can also interact with other arrays:
In this example, the result of multiplying together two arrays is to take the element-wise product, while multiplying by a constant will multiply each element in the array by that constant. NumPy supports all of the basic mathematical operations: addition, subtraction, multiplication, division, and powers. It also includes an extensive suite of mathematical functions, such as log() and max(), which are covered later.
10.2.5 Matrices: An Array of Arrays
Matrices can be thought of as an extension of arrays: rather than having one dimension, matrices have 2 (or more). Much like arrays, matrices can be created easily within NumPy:
Accessing individual elements is similar to how we did it for arrays. We simply need to pass in a number of arguments equal to the number of dimensions:
In this example, our first index selected the row and the second selected the column, giving us our result of 3. Matrices can be extended out to any number of dimensions by simply using more indices to access specific elements (though use-cases beyond 4 may be somewhat rare).
Matrices support all of the normal mathematical operators such as +, -, *, and /. A special note: the * operator will result in an element-wise multiplication. Use @ or np.matmul() for matrix multiplication:
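The difference between the two kinds of multiplication can be seen on the 2x2 matrix used in this section. This is a small sketch; the variable names elementwise and product are chosen here for illustration.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

elementwise = m * m          # each entry is multiplied by itself
product = m @ m              # matrix product (rows times columns)

print(elementwise)
print(product)
print(np.matmul(m, m))       # same result as the @ operator
```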
a = np.array([1, 2, 3])  # start again from the original values
a * 3
>>> array([3, 6, 9])
a ** 2
>>> array([1, 4, 9])

b = np.array([2, 3, 4])
print(a * b)
>>> [ 2  6 12]

m = np.array([[1, 2], [3, 4]])
print(m)
>>> [[1 2]
>>>  [3 4]]

m[1][0]  # equivalent to m[1, 0]
>>> 3
More complex mathematical functions can typically be found within the NumPy library itself:
A full listing can be found at: https://docs.scipy.org/doc/numpy/reference/routines.math.html
10.2.6 Slicing Arrays and Matrices
As one can imagine, accessing elements one-at-a-time is both slow and can potentially require many lines of code to iterate over every dimension in the matrix. Thankfully, NumPy incorporates a very powerful slicing engine that allows us to access ranges of elements easily:
The ‘:’ value tells NumPy to select all elements in the given dimension. Here, we’ve requested all elements in the second row (index 1). We can also use indexing to request elements within a given range:
Here, we asked NumPy to give us elements 4 through 7 (ranges in Python are inclusive at the start and non-inclusive at the end). We can even go backwards:
In the previous example, the negative value is asking NumPy to return the last 5 elements of the array. Had the argument been ‘:-5’, NumPy would’ve returned everything BUT the last five elements:
print(m - m)
print(m * m)
print(m / m)

print(np.sin(x))
print(np.sum(x))

m[1, :]
>>> array([3, 4])

a = np.arange(0, 10, 1)
print(a)
>>> [0 1 2 3 4 5 6 7 8 9]
a[4:8]
>>> array([4, 5, 6, 7])

a[-5:]
>>> array([5, 6, 7, 8, 9])

a[:-5]
>>> array([0, 1, 2, 3, 4])

Becoming more familiar with NumPy’s accessor conventions will allow you to write more efficient, clearer code, as it is easier to read a simple one-line accessor than a multi-line, nested loop when extracting values from an array or matrix.
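To make the comparison concrete, the sketch below extracts the same sub-matrix once with a nested loop and once with a slice. The 3x4 matrix and the chosen ranges are made up for illustration.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # rows 0..2, columns 0..3

# Nested-loop version: collect columns 1 and 2 of rows 1 and 2
looped = []
for i in range(1, 3):
    row = []
    for j in range(1, 3):
        row.append(m[i, j])
    looped.append(row)

# Equivalent one-line slice
sliced = m[1:3, 1:3]

print(sliced)
print(m[:, 0])     # every row, first column
```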
10.2.7 Useful Functions
The NumPy library provides several convenient mathematical functions that users can use. These functions provide several advantages over code written by users:
They are open source and typically have multiple contributors checking for errors.
Many of them utilize a C interface and will run much faster than native Python code.
They’re written to be very flexible.
NumPyarraysandmatricescontainmanyusefulaggregatingfunctionssuchasmax(),min(),mean(), etc These functions are usually able to run an order ofmagnitudefasterthanloopingthroughtheobject,soit’simportanttounderstandwhatfunctionsareavailabletoavoid‘reinventingthewheel.’Inaddition,manyof the functions are able to sum or average across axes, which make themextremely useful if your data has inherent grouping. To return to a previousexample:
In thisexample,wecreateda2x2matrixcontaining thenumbers1 through4.Thesumof thematrix returned theelement-wiseadditionof theentirematrix.Summing across axis 0 (rows) returned a new array with the element-wiseadditionacrosseachrow.Likewise,summingacrossaxis1(columns)returnedthecolumnarsummation.
10.2.8LinearAlgebra
Perhaps one of the most important uses for NumPy is its robust support for
m=np.array([[1,2],[3,4]])
print(m)
>>>[[12]
>>>[34]]
m.sum()
>>>10
m.sum(axis=1)
>>>[3,7]
m.sum(axis=0)
>>>[4,6]
![Page 152: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/152.jpg)
Linear Algebra functions. Like the aggregation functions described in theprevious section, these functions are optimized to be much faster than userimplementationsandcanutilizeprocessesorlevelfeaturestoprovideveryquickcomputations. These functions can be accessed very easily from the NumPypackage:
    a = np.array([[1, 2], [3, 4]])
    b = np.array([[5, 6], [7, 8]])
    print(np.matmul(a, b))
    >>> [[19 22]
    >>>  [43 50]]
Included within np.linalg are functions for calculating the eigendecomposition of square matrices and symmetric matrices. To give a quick example of how easy it is to implement algorithms in NumPy, we can use it to calculate the cost and gradient when using simple Mean-Squared-Error (MSE):
    # assumed shapes: X (n_samples, n_features), Y (n_samples, 1), weights (n_features, 1)
    cost = np.power(Y - np.matmul(X, weights), 2).mean(axis=0)
    # gradient of the cost with respect to weights, up to the constant factor 2/n_samples
    gradient = np.matmul(X.T, np.matmul(X, weights) - Y)
More advanced functions are easily available to users via the linalg library of NumPy:
    from numpy import linalg
    A = np.diag((1, 2, 3))
    w, v = linalg.eig(A)
    print('w =', w)
    print('v =', v)
10.2.9 NumPy Resources
https://docs.scipy.org/doc/numpy
http://cs231n.github.io/python-numpy-tutorial/#numpy
https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.linalg.html
https://en.wikipedia.org/wiki/Mean_squared_error
10.3 SCIPY
SciPy is a library built around NumPy and has a number of off-the-shelf algorithms and operations implemented. These include algorithms from calculus (such as integration), statistics, linear algebra, image processing, signal processing, and machine learning.
To achieve this, SciPy bundles a number of useful open-source software packages for mathematics, science, and engineering. It includes the following packages:
- NumPy, for managing N-dimensional arrays
- SciPy library, to access fundamental scientific computing capabilities
- Matplotlib, to conduct 2D plotting
- IPython, for an interactive console (see Jupyter)
- SymPy, for symbolic mathematics
- pandas, for providing data structures and analysis
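To make the areas listed above a little more concrete, here is a minimal sketch that touches three of them: integration, statistics, and linear algebra. The particular functions and inputs are chosen purely for illustration and are not taken from the text.

```python
import numpy as np
from scipy import integrate, linalg, stats

# Calculus: numerically integrate sin(x) from 0 to pi (the exact answer is 2)
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Statistics: probability that a standard normal variable is below its mean
p_below_mean = stats.norm.cdf(0.0)

# Linear algebra: determinant of a small matrix (1*4 - 2*3 = -2)
det = linalg.det(np.array([[1.0, 2.0], [3.0, 4.0]]))

print(area, p_below_mean, det)
```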
10.3.1 Introduction
First we add the usual scientific computing modules with the typical abbreviations, including sp for scipy. We could invoke scipy's statistical package as sp.stats, but for the sake of laziness we abbreviate that too.
    import numpy as np                    # import numpy
    import scipy as sp                    # import scipy
    from scipy import stats               # refer directly to stats rather than sp.stats
    import matplotlib as mpl              # for visualization
    from matplotlib import pyplot as plt  # refer directly to pyplot rather than mpl.pyplot
Now we create some random data to play with. We generate 100 samples from a
Gaussian distribution centered at zero.
    s = np.random.randn(100)
How many elements are in the set?
    print('There are', len(s), 'elements in the set')
What is the mean (average) of the set?
    print('The mean of the set is', s.mean())
What is the minimum of the set?
    print('The minimum of the set is', s.min())
What is the maximum of the set?
    print('The maximum of the set is', s.max())
The same kind of statistics are also available as functions. Older SciPy releases re-exported them as sp.median, sp.std, and sp.var, but those aliases are deprecated, so we call NumPy directly. What's the median?
    print('The median of the set is', np.median(s))
What about the standard deviation and variance?
    print('The standard deviation is', np.std(s),
          'and the variance is', np.var(s))
Isn't the variance the square of the standard deviation?
    print('The square of the standard deviation is', np.std(s)**2)
How close are the measures? The difference is negligible, as the following calculation shows:
    print('The difference is', abs(np.std(s)**2 - np.var(s)))
    print('And in decimal form, the difference is %0.16f' %
          (abs(np.std(s)**2 - np.var(s))))
How does this look as a histogram? See Figure 18, Figure 19, and Figure 20.
    plt.hist(s)  # yes, one line of code for a histogram
    plt.show()
Figure 18: Histogram 1
Let us add some titles.
    plt.clf()  # clear out the previous plot
    plt.hist(s)
    plt.title("Histogram Example")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.show()
Figure 19: Histogram 2
Typically we do not include titles when we prepare images for inclusion in LaTeX. There we use the caption to describe what the figure is about.
    plt.clf()  # clear out the previous plot
    plt.hist(s)
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.show()
Figure 20: Histogram 3
Let us try out some linear regression, or curve fitting. See Figure 21.
    import random
    def F(x):
        return 2*x - 2
    def add_noise(x):
        return x + random.uniform(-1, 1)
    X = range(0, 10, 1)
    # noisy observations of F at each x
    Y = []
    for i in range(len(X)):
        Y.append(add_noise(F(X[i])))
    plt.clf()  # clear out the old figure
    plt.plot(X, Y, '.')
    plt.show()
Figure 21: Result 1
Now let's try linear regression to fit the curve.
    m, b, r, p, est_std_err = stats.linregress(X, Y)
What is the slope and y-intercept of the fitted curve?
    print('The slope is', m, 'and the y-intercept is', b)
Now let's see how well the curve fits the data. We'll call the fitted curve F'.
    def Fprime(x):  # the fitted curve
        return m*x + b
    X = range(0, 10, 1)
    Yprime = []
    for i in range(len(X)):
        Yprime.append(Fprime(X[i]))
    plt.clf()  # clear out the old figure
    # the observed points, blue dots
    plt.plot(X, Y, '.', label='observed points')
    # the interpolated curve, connected red line
    plt.plot(X, Yprime, 'r-', label='estimated points')
    plt.title("Linear Regression Example")  # title
    plt.xlabel("x")  # horizontal axis title
    plt.ylabel("y")  # vertical axis title
    # legend built from the labels passed to plot
    plt.legend()
    # comment out so that you can save the figure
    # plt.show()
To save images into a PDF file for inclusion into LaTeX documents you can save the images as follows. Other formats such as PNG are also possible, but the quality is naturally not sufficient for inclusion in papers and documents. For that you certainly want to use PDF. The figure has to be saved before you use the show() command. See Figure 22.
    plt.savefig("regression.pdf", bbox_inches='tight')
    plt.savefig('regression.png')
    plt.show()
Figure 22: Result 2
10.3.2 References
For more information about SciPy we recommend that you visit the following link:
https://www.scipy.org/getting-started.html#learning-to-work-with-scipy
Additional material and inspiration for this section are from:
[ ] "Getting Started guide". https://www.scipy.org/getting-started.html
[ ] Prasanth. "Simple statistics with SciPy." Comfort at 1 AU. February 28, 2011. https://oneau.wordpress.com/2011/02/28/simple-statistics-with-scipy/.
[ ] SciPy Cookbook. Last updated: 2015. http://scipy-cookbook.readthedocs.io/.
10.4 SCIKIT-LEARN
Learning Objectives
- Exploratory data analysis
- Pipeline to prepare data
- Full learning pipeline
- Fine tune the model
- Significance tests
10.4.1 Introduction to Scikit-learn
Scikit-learn is a machine learning library for Python. It can be used for data mining and analysis, and it is built on top of NumPy, matplotlib, and SciPy. Scikit-learn features dimensionality reduction, clustering, regression, and classification algorithms. It also supports model selection using grid search, cross validation, and metrics.
Scikit-learn also enables users to preprocess data for machine learning using modules such as preprocessing and feature extraction.
In this section we demonstrate how simple it is to use k-means in scikit-learn.
10.4.2 Installation
If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using pip:
    $ pip install numpy
    $ pip install scipy -U
    $ pip install -U scikit-learn
10.4.3 Supervised Learning
Supervised learning is used in machine learning when we already know a set of output values for given input characteristics and, based on those, need to predict the target for a new input. Training data is used to train the model, which can then be used to predict the output from a bounded set.
Problems can be of two types:
1. Classification: Training data belongs to two or more classes/categories, and based on the labels we want to predict the class/category for the unlabeled data.
2. Regression: The target is a continuous value rather than a category, and we want to predict that value for new inputs.
10.4.4 Unsupervised Learning
Unsupervised learning is used in machine learning when we have the training set available but without any corresponding target values. The goal is to discover structure within the provided input. It can be done in many ways.
A few of them are listed here:
1. Clustering: Discover groups of similar examples.
2. Density estimation: Determine the distribution of data within the input space. A histogram is the most basic form.
3. Dimensionality reduction: Project the data from a high-dimensional space down to two or three dimensions.
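As a small illustration of clustering, the sketch below runs k-means on two synthetic groups of points. The data, the variable names, and the choice of two clusters are assumptions made for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs of 20 points each (illustrative data)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2))
points = np.vstack([blob_a, blob_b])

# No labels are given; k-means assigns each point to one of two clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)
print(kmeans.cluster_centers_)
```

The cluster numbers themselves are arbitrary; what matters is that points from the same blob receive the same label.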
10.4.5 Building an end-to-end pipeline for supervised machine learning using Scikit-learn
A data pipeline is a set of processing components that are sequenced to produce meaningful data. Pipelines are commonly used in machine learning, since a lot of data transformation and manipulation needs to be applied to make data useful for machine learning. All components are sequenced in a way that the output of one component becomes the input for the next, and each component is self-contained. Components interact with each other using data. Even if a component breaks, the downstream component can run normally using the last output. Scikit-learn provides the ability to build pipelines that chain data transformations with a final model.
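The idea of chaining components can be sketched with scikit-learn's Pipeline class. The toy data and the step names "scale" and "model" below are assumptions for illustration; they are not the fraud dataset used later in this section.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales, two classes (illustrative)
X_toy = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
                  [10.0, 900.0], [11.0, 950.0], [12.0, 870.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# The output of the scaler becomes the input of the classifier
toy_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
toy_pipeline.fit(X_toy, y_toy)

predictions = toy_pipeline.predict(np.array([[2.5, 210.0], [10.5, 910.0]]))
print(predictions)
```

Calling fit or predict on the pipeline runs every step in order, so the same preprocessing is applied to training and new data.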
10.4.6 Steps for developing a machine learning model
1. Explore the domain space
2. Extract the problem definition
3. Get the data that can be used to make the system learn to solve the problem definition.
4. Discover and visualize the data to gain insights
5. Feature engineering and prepare the data
6. Fine tune your model
7. Evaluate your solution using metrics
8. Once proven, launch and maintain the model.
10.4.7 Exploratory Data Analysis
Example project: fraud detection system
The first step is to load the data into a dataframe so that a proper analysis can be done on the attributes.
    import pandas as pd
    data = pd.read_csv('dataset/data_file.csv')
    data.head()
Perform the basic analysis on the data shape and null value information.
    print(data.shape)
    data.info()  # prints its summary directly
    data.isnull().values.any()
Here are examples of a few visual data analysis methods.
10.4.7.1 Bar plot
A bar chart or graph is a graph with rectangular bars or bins that are used to plot categorical values. Each bar in the graph represents a categorical variable and the height of the bar is proportional to the value represented by it.
Bar graphs are used:
- To make comparisons between variables
- To visualize any trend in the data, i.e., they show the dependence of one variable on another
- To estimate values of a variable
    data.type.value_counts().plot.bar()
    plt.xlabel('Type')
    plt.ylabel('Transactions')
Figure 23: Example of scikit-learn bar plots
10.4.7.2 Correlation between attributes
Attributes in a dataset can be related in different ways.
For example, one attribute can depend on another, two attributes can be loosely or tightly coupled, or two variables can both be associated with a third one.
To understand the relationship between attributes, a correlation plot is the best visual way to gain insight. Positive correlation means both attributes move in the same direction. Negative correlation means they move in opposite directions: as the values of one attribute increase, the values of the other decrease. Zero correlation means the attributes are not linearly related.
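The three cases can be checked numerically with the Pearson correlation coefficient. The sketch below uses synthetic data; the variable names and the helper pearson are assumptions introduced for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
noise = rng.normal(size=1000)

positive = 2.0 * x + 0.1 * noise    # moves in the same direction as x
negative = -2.0 * x + 0.1 * noise   # moves in the opposite direction
unrelated = rng.normal(size=1000)   # generated independently of x


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


print(pearson(x, positive))   # close to +1
print(pearson(x, negative))   # close to -1
print(pearson(x, unrelated))  # close to 0
```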
    import seaborn as sns
    # compute the correlation matrix of the numeric columns (numeric_only needs pandas >= 1.5)
    corr = data.corr(numeric_only=True)
    # generate a mask for the upper triangle so that only the lower triangle is drawn
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True
    # set up the matplotlib figure
    f, ax = plt.subplots(figsize=(18, 18))
    # generate a custom diverging colormap
    cmap = sns.diverging_palette(220, 10, as_cmap=True)
    # draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3,
                square=True,
                linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)
Figure 24: scikit-learn correlation array
10.4.7.3 Histogram Analysis of dataset attributes
A histogram consists of a set of counts that represent the number of times some event occurred.
    %matplotlib inline
    data.hist(bins=30, figsize=(20, 15))
    plt.show()
Figure 25: scikit-learn
10.4.7.4 Box plot Analysis
Box plot analysis is useful in detecting whether a distribution is skewed and in detecting outliers in the data.
    fig, axs = plt.subplots(2, 2, figsize=(10, 10))
    tmp = data.loc[(data.type == 'TRANSFER'), :]
    a = sns.boxplot(x='isFlaggedFraud', y='amount', data=tmp, ax=axs[0][0])
    axs[0][0].set_yscale('log')
    b = sns.boxplot(x='isFlaggedFraud', y='oldbalanceDest', data=tmp, ax=axs[0][1])
    axs[0][1].set(ylim=(0, 0.5e8))
    c = sns.boxplot(x='isFlaggedFraud', y='oldbalanceOrg', data=tmp, ax=axs[1][0])
    axs[1][0].set(ylim=(0, 3e7))
    d = sns.regplot(x='oldbalanceOrg', y='amount', data=tmp.loc[(tmp.isFlaggedFraud == 1), :], ax=axs[1][1])
    plt.show()
Figure 26: scikit-learn
10.4.7.5 Scatter plot Analysis
The scatter plot displays values of two numerical variables as Cartesian coordinates.
    plt.figure(figsize=(12, 8))
    sns.pairplot(data[['amount', 'oldbalanceOrg', 'oldbalanceDest', 'isFraud']], hue='isFraud')
Figure 27: scikit-learn scatter plots
10.4.8 Data Cleansing - Removing Outliers
- If the transaction amount is lower than 5 percent of all the transactions AND does not exceed USD 3000, we will exclude it from our analysis to reduce Type 1 costs.
- If the transaction amount is higher than 95 percent of all the transactions AND exceeds USD 500000, we will exclude it from our analysis, and use a blanket review process for such transactions (similar to the isFlaggedFraud column in the original dataset) to reduce Type 2 costs.
    low_exclude = np.round(np.minimum(fin_samp_data.amount.quantile(0.05), 3000), 2)
    high_exclude = np.round(np.maximum(fin_samp_data.amount.quantile(0.95), 500000), 2)
    # Updating data to exclude records prone to Type 1 and Type 2 costs
    low_data = fin_samp_data[fin_samp_data.amount > low_exclude]
    data = low_data[low_data.amount < high_exclude]
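The two exclusion rules can be tried out on synthetic data. The sketch below uses plain NumPy instead of a DataFrame; the log-normal amounts and the name amounts are assumptions made for illustration and do not come from the fraud dataset.

```python
import numpy as np

# Synthetic, heavily skewed transaction amounts (illustrative only)
rng = np.random.default_rng(7)
amounts = rng.lognormal(mean=10.0, sigma=2.0, size=10_000)

# Lower cut-off: the 5th percentile, but never above USD 3000
low_exclude = min(np.quantile(amounts, 0.05), 3000.0)
# Upper cut-off: the 95th percentile, but never below USD 500000
high_exclude = max(np.quantile(amounts, 0.95), 500_000.0)

# Keep only the transactions strictly between the two cut-offs
kept = amounts[(amounts > low_exclude) & (amounts < high_exclude)]

print(low_exclude, high_exclude)
print(amounts.size, kept.size)
```

Taking the minimum for the lower bound and the maximum for the upper bound is what implements the "AND" in both rules.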
10.4.9 Pipeline Creation
Machine learning pipelines are used to help automate machine learning workflows. They operate by enabling a sequence of data to be transformed and correlated together in a model that can be tested and evaluated to achieve an outcome, whether positive or negative.
10.4.9.1 Defining DataFrameSelector to separate Numerical and Categorical attributes
Sample class to separate out numerical and categorical attributes.
    from sklearn.base import BaseEstimator, TransformerMixin
    # Create a class to select numerical or categorical columns
    # since Scikit-Learn doesn't handle DataFrames yet
    class DataFrameSelector(BaseEstimator, TransformerMixin):
        def __init__(self, attribute_names):
            self.attribute_names = attribute_names
        def fit(self, X, y=None):
            return self
        def transform(self, X):
            return X[self.attribute_names].values
10.4.9.2 Feature Creation / Additional Feature Engineering
During EDA we identified that there are transactions where the balances do not tally after the transaction is completed. We believe these could potentially be cases where fraud is occurring. To account for this error in the transactions, we define two new features, "errorBalanceOrig" and "errorBalanceDest", calculated by adjusting the amount with the before and after balances for the Originator and Destination accounts.
Below, we create a class that allows us to create these features in a pipeline.
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    # column index
    amount_ix, oldbalanceOrg_ix, newbalanceOrig_ix, oldbalanceDest_ix, newbalanceDest_ix = 0, 1, 2, 3, 4
    class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
        def __init__(self):  # no *args or **kargs
            pass
        def fit(self, X, y=None):
            return self  # nothing else to do
        def transform(self, X, y=None):
            errorBalanceOrig = X[:, newbalanceOrig_ix] + X[:, amount_ix] - X[:, oldbalanceOrg_ix]
            errorBalanceDest = X[:, oldbalanceDest_ix] + X[:, amount_ix] - X[:, newbalanceDest_ix]
            return np.c_[X, errorBalanceOrig, errorBalanceDest]
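To see what the two error features look like, the sketch below applies the same arithmetic to two made-up transactions, one whose balances tally and one whose balances do not move at all. The numbers are invented for illustration; only the column order follows the index definitions above.

```python
import numpy as np

# Columns: amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest
transactions = np.array([
    [100.0, 500.0, 400.0, 50.0, 150.0],  # balances tally with the amount
    [100.0, 500.0, 500.0, 50.0, 50.0],   # amount recorded, but no balance changed
])

amount, old_org, new_orig, old_dest, new_dest = transactions.T

# Same formulas as in the transformer: zero means the balances are consistent
error_balance_orig = new_orig + amount - old_org
error_balance_dest = old_dest + amount - new_dest

print(error_balance_orig)
print(error_balance_dest)
```

A non-zero value flags a transaction whose recorded balances do not match the transferred amount.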
10.4.10 Creating Training and Testing datasets
The training set is the set of input examples that the model will be fit to, or trained on, by adjusting its parameters. The testing dataset is critical for testing the generalizability of the model. By using this set, we can estimate the working accuracy of our model.
The testing set should not be exposed to the model until model training has been completed. This way the results from testing will be more reliable.
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42, stratify=y)
10.4.11 Creating pipeline for numerical and categorical attributes
Identifying columns with numerical and categorical characteristics.
    X_train_num = X_train[["amount", "oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest"]]
    X_train_cat = X_train[["type"]]
    X_model_col = ["amount", "oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest", "type"]
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.preprocessing import OneHotEncoder
    num_attribs = list(X_train_num)
    cat_attribs = list(X_train_cat)
    num_pipeline = Pipeline([
        ('selector', DataFrameSelector(num_attribs)),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler())
    ])
    cat_pipeline = Pipeline([
        ('selector', DataFrameSelector(cat_attribs)),
        # CategoricalEncoder is not available in released scikit-learn versions; OneHotEncoder
        # is used instead (sparse_output=False needs scikit-learn >= 1.2, use sparse=False before that)
        ('cat_encoder', OneHotEncoder(sparse_output=False))
    ])
10.4.12 Selecting the algorithm to be applied
Algorithm selection primarily depends on the objective you are trying to solve and what kind of dataset is available. There are different types of algorithms which can be applied, and we will look into a few of them here.
10.4.12.1 Linear Regression
This algorithm can be applied when you want to compute some continuous value. To predict some future value of a process which is currently running, you can go with a regression algorithm.
Examples where linear regression can be used are:
1. Predict the time taken to go from one place to another
2. Predict the sales for a future month
3. Predict sales data and improve yearly projections.
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler
    import time
    scl = StandardScaler()
    X_train_std = scl.fit_transform(X_train)
    X_test_std = scl.transform(X_test)
    start = time.time()
    lin_reg = LinearRegression()
    lin_reg.fit(X_train_std, y_train)  # SKLearn's linear regression
    y_train_pred = lin_reg.predict(X_train_std)
    train_time = time.time() - start
10.4.12.2 Logistic Regression
This algorithm can be used to perform binary classification. It can be used if you want a probabilistic framework, or if you expect to receive more training data in the future that you want to be able to quickly incorporate into your model.
1. Customer churn prediction.
2. Credit scoring and fraud detection, which is the example problem we are trying to solve in this chapter.
3. Calculating the effectiveness of marketing campaigns.
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    # subsample_rate and the results DataFrame are assumed to be defined earlier
    X_train, _, y_train, _ = train_test_split(X_train, y_train, stratify=y_train, train_size=subsample_rate, random_state=42)
    X_test, _, y_test, _ = train_test_split(X_test, y_test, stratify=y_test, train_size=subsample_rate, random_state=42)
    model_lr_sklearn = LogisticRegression(multi_class="multinomial", C=1e6, solver="sag", max_iter=15)
    model_lr_sklearn.fit(X_train, y_train)
    y_pred_test = model_lr_sklearn.predict(X_test)
    acc = accuracy_score(y_test, y_pred_test)
    results.loc[len(results)] = ["LR Sklearn", np.round(acc, 3)]
    results
10.4.12.3 Decision trees
Decision trees handle feature interactions and are non-parametric. They do not support online learning, so the entire tree needs to be rebuilt when a new training dataset comes in. Memory consumption is very high.
They can be used for the following cases:
1. Investment decisions
2. Customer churn
3. Bank loan defaulters
4. Build vs. buy decisions
5. Sales lead qualifications
    from sklearn.tree import DecisionTreeRegressor
    dt = DecisionTreeRegressor()
    start = time.time()
    dt.fit(X_train_std, y_train)
    y_train_pred = dt.predict(X_train_std)
    train_time = time.time() - start
    start = time.time()
    y_test_pred = dt.predict(X_test_std)
    test_time = time.time() - start
10.4.12.4 KMeans
This algorithm is used when the labels are not known and need to be created based on the features of the objects. An example would be to divide a group of people into different subgroups based on a common theme or attribute.
The main disadvantage of K-means is that you need to know in advance the number of clusters or groups that is required. It can take a lot of iterations to come up with the best K.
    # Note: KNeighborsClassifier is a supervised k-nearest-neighbours classifier, not K-means clustering
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split, GridSearchCV, PredefinedSplit
    from sklearn.metrics import accuracy_score
    X_train, _, y_train, _ = train_test_split(X_train, y_train, stratify=y_train, train_size=subsample_rate, random_state=42)
    X_test, _, y_test, _ = train_test_split(X_test, y_test, stratify=y_test, train_size=subsample_rate, random_state=42)
    model_knn_sklearn = KNeighborsClassifier(n_jobs=-1)
    model_knn_sklearn.fit(X_train, y_train)
    y_pred_test = model_knn_sklearn.predict(X_test)
    acc = accuracy_score(y_test, y_pred_test)
    results.loc[len(results)] = ["KNN Arbitrary Sklearn", np.round(acc, 3)]
    results
10.4.12.5 Support Vector Machines
SVM is a supervised ML technique used for pattern recognition and classification problems when your data has exactly two classes. It is popular in text classification problems.
A few cases where SVM can be used are:
1. Detecting persons with common diseases.
2. Hand-written character recognition
3. Text categorization
4. Stock market price prediction
10.4.12.6 Naive Bayes
Naive Bayes is used for large datasets. This algorithm works well even when we have limited CPU and memory available. It works by calculating a bunch of counts. It requires less training data. The algorithm cannot learn interactions between features.
Naive Bayes can be used in real-world applications such as:
1. Sentiment analysis and text classification
2. Recommendation systems like Netflix, Amazon
3. To mark an email as spam or not spam
4. Face recognition
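The spam example can be sketched with scikit-learn's MultinomialNB, which learns from exactly the kind of counts mentioned above. The four-word vocabulary and the word counts are made up for this illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Word counts per message for the invented vocabulary ["free", "win", "meeting", "report"]
X_counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
])
y_labels = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam

nb = MultinomialNB()
nb.fit(X_counts, y_labels)

# Two unseen messages: one dominated by "free"/"win", one by "meeting"/"report"
new_messages = np.array([[2, 1, 0, 0], [0, 0, 1, 2]])
predicted = nb.predict(new_messages)
print(predicted)
```

Each word contributes independently to the decision, which is why the model is cheap to train but cannot capture interactions between features.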
10.4.12.7 Random Forest
Random forest is similar to a decision tree. It can be used for both regression and classification problems with large datasets.
A few cases where it can be applied:
1. Predict patients at high risk.
2. Predict parts failures in manufacturing.
3. Predict loan defaulters.
    from sklearn.ensemble import RandomForestRegressor
    # criterion='squared_error' is the current name of the former 'mse' option (scikit-learn >= 1.0)
    forest = RandomForestRegressor(n_estimators=400, criterion='squared_error', random_state=1, n_jobs=-1)
    start = time.time()
    forest.fit(X_train_std, y_train)
    y_train_pred = forest.predict(X_train_std)
    train_time = time.time() - start
    start = time.time()
    y_test_pred = forest.predict(X_test_std)
    test_time = time.time() - start
10.4.12.8 Neural networks
A neural network works based on the weights of connections between neurons. The weights are trained, and based on them the neural network can be used to predict a class or a quantity. Neural networks are resource and memory intensive.
A few cases where they can be applied:
1. Unsupervised learning tasks, such as feature extraction.
2. Extracting features from raw images or speech with much less human intervention
10.4.12.9 Deep Learning using Keras
Keras is one of the most powerful and easy-to-use Python libraries for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow.
10.4.12.10 XGBoost
XGBoost stands for eXtreme Gradient Boosting. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. It is engineered for efficiency of compute time and memory resources.
10.4.13 Scikit Cheat Sheet
Scikit-learn provides a very in-depth and well-explained flow chart to help you choose the right algorithm, which I find very handy.
Figure 28: scikit-learn
10.4.14 Parameter Optimization
Machine learning models are parameterized so that their behavior can be tuned for a given problem. These models can have many parameters, and finding the best combination of parameters can be treated as a search problem.
A parameter is a configuration variable that is part of the model and whose value can be derived from the given data.
1. Required by the model when making predictions.
2. Values define the skill of the model on your problem.
3. Estimated or learned from data.
4. Often not set manually by the practitioner.
5. Often saved as part of the learned model.
10.4.14.1 Hyperparameter optimization/tuning algorithms
Grid search is an approach to hyperparameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.
Random search provides a statistical distribution for each hyperparameter from which values may be randomly sampled.
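The difference between the two strategies can be shown without training any model. The sketch below only generates candidate settings; the parameter names and ranges are illustrative assumptions, and the log-uniform draw for C is one common choice rather than a prescribed one.

```python
import itertools
import random

# Illustrative search space for a regularized classifier
param_grid = {"penalty": ["l1", "l2"], "C": [0.01, 1.0, 100]}

# Grid search: enumerate every combination in the grid (2 x 3 = 6 candidates)
names = list(param_grid)
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*(param_grid[name] for name in names))
]

# Random search: sample each hyperparameter from a distribution instead;
# here C is drawn log-uniformly between 0.01 and 100
rng = random.Random(0)
random_candidates = [
    {"penalty": rng.choice(param_grid["penalty"]), "C": 10 ** rng.uniform(-2, 2)}
    for _ in range(4)
]

for candidate in grid_candidates:
    print("grid  ", candidate)
for candidate in random_candidates:
    print("random", candidate)
```

The grid grows multiplicatively with every added parameter, whereas the number of random candidates is chosen independently of the size of the search space.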
10.4.15 Experiments with Keras (deep learning), XGBoost, and SVM (SVC) compared to Logistic Regression (Baseline)
10.4.15.1 Creating a parameter grid
    grid_param = [
        [{  # LogisticRegression
            'model__penalty': ['l1', 'l2'],
            'model__C': [0.01, 1.0, 100]
        }],
        [{  # keras
            'model__optimizer': optimizer,
            'model__loss': loss
        }],
        [{  # SVM
            'model__C': [0.01, 1.0, 100],
            'model__gamma': [0.5, 1],
            'model__max_iter': [-1]
        }],
        [{  # XGBClassifier
            'model__min_child_weight': [1, 3, 5],
            'model__gamma': [0.5],
            'model__subsample': [0.6, 0.8],
            'model__colsample_bytree': [0.6],
            'model__max_depth': [3]
        }]
    ]
10.4.15.2 Implementing Grid search with models and also creating metrics from each of the models.
Pipeline(memory=None,
steps=[('preparation',FeatureUnion(n_jobs=None,
transformer_list=[('num_pipeline',Pipeline(memory=None,
steps=[('selector',DataFrameSelector(attribute_names=['amount','oldbalanceOrg','newbalanceOrig','oldbalanceDest'
tol=0.0001,verbose=0,warm_start=False))])
    from sklearn import linear_model, metrics
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.metrics import mean_squared_error
    from sklearn.metrics import classification_report
    from sklearn.metrics import f1_score
    from sklearn.model_selection import GridSearchCV
    from xgboost.sklearn import XGBClassifier
    from sklearn.svm import SVC
    test_scores = []
    # Machine Learning Algorithm (MLA) Selection and Initialization
    # keras_model is assumed to be defined earlier
    MLA = [
        linear_model.LogisticRegression(),
        keras_model,
        SVC(),
        XGBClassifier()
    ]
10.4.15.3 Results table from the Model evaluation with metrics.
#createtabletocompareMLAmetrics
MLA_columns=['Name','Score','Accuracy_Score','ROC_AUC_score','final_rmse','Classification_error','Recall_Score','Precision_Score'
MLA_compare=pd.DataFrame(columns=MLA_columns)
Model_Scores=pd.DataFrame(columns=['Name','Score'])
row_index=0
foralginMLA:
#setnameandparameters
MLA_name=alg.__class__.__name__
MLA_compare.loc[row_index,'Name']=MLA_name
#MLA_compare.loc[row_index,'Parameters']=str(alg.get_params())
full_pipeline_with_predictor=Pipeline([
("preparation",full_pipeline),#combinationofnumericalandcategoricalpipelines
("model",alg)
])
grid_search=GridSearchCV(full_pipeline_with_predictor,grid_param[row_index],cv=4,verbose=2,scoring='f1',return_train_score
grid_search.fit(X_train[X_model_col],y_train)
y_pred=grid_search.predict(X_test)
MLA_compare.loc[row_index,'Accuracy_Score']=np.round(accuracy_score(y_pred,y_test),3)
MLA_compare.loc[row_index,'ROC_AUC_score']=np.round(metrics.roc_auc_score(y_test,y_pred),3)
MLA_compare.loc[row_index,'Score']=np.round(grid_search.score(X_test,y_test),3)
negative_mse=grid_search.best_score_
scores=np.sqrt(-negative_mse)
final_mse=mean_squared_error(y_test,y_pred)
final_rmse=np.sqrt(final_mse)
MLA_compare.loc[row_index,'final_rmse']=final_rmse
confusion_matrix_var=confusion_matrix(y_test,y_pred)
TP=confusion_matrix_var[1,1]
TN=confusion_matrix_var[0,0]
FP=confusion_matrix_var[0,1]
FN=confusion_matrix_var[1,0]
MLA_compare.loc[row_index,'Classification_error']=np.round(((FP+FN)/float(TP+TN+FP+FN)),5)
MLA_compare.loc[row_index,'Recall_Score']=np.round(metrics.recall_score(y_test,y_pred),5)
MLA_compare.loc[row_index,'Precision_Score']=np.round(metrics.precision_score(y_test,y_pred),5)
MLA_compare.loc[row_index,'F1_Score']=np.round(f1_score(y_test,y_pred),5)
MLA_compare.loc[row_index,'mean_test_score']=grid_search.cv_results_['mean_test_score'].mean()
MLA_compare.loc[row_index,'mean_fit_time']=grid_search.cv_results_['mean_fit_time'].mean()
Model_Scores.loc[row_index,'MLAName']=MLA_name
Model_Scores.loc[row_index,'MLScore']=np.round(metrics.roc_auc_score(y_test,y_pred),3)
#CollectMeanTestscoresforstatisticalsignificancetest
test_scores.append(grid_search.cv_results_['mean_test_score'])
row_index+=1
![Page 176: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/176.jpg)
Figure29:scikit-learn
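The metrics collected in the loop above (classification error, recall, precision, F1) all derive from the four confusion-matrix counts. As a rough sketch of that relationship, the following computes them by hand for binary labels. The function names and the sample labels are made up for illustration and are not part of the original notebook.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Derive the usual summary metrics from the four counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "classification_error": (fp + fn) / (tp + tn + fp + fn),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
summary = summarize(y_true, y_pred)
print(summary)
```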
10.4.15.4 ROC AUC Score
The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve, and AUC represents the degree or measure of separability. It tells how well the model is capable of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
Figure 30: scikit-learn
Figure 31: scikit-learn
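One way to build intuition for the AUC is its pairwise interpretation: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below uses that definition directly. The function name `pairwise_auc` and the sample scores are illustrative assumptions, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Small made-up example: one of the four pairs is ranked the wrong way round
auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

A model that ranks every positive above every negative reaches 1.0, while random scoring gives about 0.5.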
10.4.16 K-means in scikit learn.
10.4.16.1 Import
10.4.17 K-means Algorithm
In this section we demonstrate how simple it is to use k-means in scikit learn.
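Before turning to scikit-learn, it may help to see what k-means does underneath. The following is a minimal pure-Python sketch of the classic assignment/update iteration. It is not the scikit-learn implementation: the function name `kmeans`, the naive initialization (the first k points), and the toy data are assumptions made for illustration.

```python
def kmeans(points, k, n_iter=10):
    """Plain k-means iteration on a list of equal-length tuples."""
    centers = [points[i] for i in range(k)]  # naive init: first k points (assumption)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Two well-separated toy clusters around (0, 0) and (10, 10)
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(labels)
print(centers)
```

scikit-learn's `KMeans` adds better initialization (`k-means++`), multiple restarts (`n_init`), and convergence checks on top of this basic loop.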
10.4.17.1 Import
from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
10.4.17.2 Create samples
np.random.seed(42)
digits = load_digits()
data = scale(digits.data)
10.4.17.3 Create samples
10.4.17.4 Visualize
See Figure 32
np.random.seed(42)
digits = load_digits()
data = scale(digits.data)

n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target
sample_size = 300

print("n_digits: %d, \t n_samples %d, \t n_features %d"
      % (n_digits, n_samples, n_features))
print(79 * '_')
print('%9s' % 'init'
      ' time inertia homo compl v-meas ARI AMI silhouette')

def bench_k_means(estimator, name, data):
    t0 = time()
    estimator.fit(data)
    print('%9s %.2fs %i %.3f %.3f %.3f %.3f %.3f %.3f'
          % (name, (time() - t0), estimator.inertia_,
             metrics.homogeneity_score(labels, estimator.labels_),
             metrics.completeness_score(labels, estimator.labels_),
             metrics.v_measure_score(labels, estimator.labels_),
             metrics.adjusted_rand_score(labels, estimator.labels_),
             metrics.adjusted_mutual_info_score(labels, estimator.labels_),
             metrics.silhouette_score(data, estimator.labels_,
                                      metric='euclidean',
                                      sample_size=sample_size)))

bench_k_means(KMeans(init='k-means++', n_clusters=n_digits, n_init=10),
              name="k-means++", data=data)
bench_k_means(KMeans(init='random', n_clusters=n_digits, n_init=10),
              name="random", data=data)

# in this case the seeding of the centers is deterministic, hence we run the
# kmeans algorithm only once with n_init=1
pca = PCA(n_components=n_digits).fit(data)
bench_k_means(KMeans(init=pca.components_,
                     n_clusters=n_digits, n_init=1),
              name="PCA-based",
              data=data)
print(79 * '_')
10.4.17.5 Visualize
See Figure 32
Figure 32: Result
reduced_data = PCA(n_components=2).fit_transform(data)
kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10)
kmeans.fit(reduced_data)

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .02

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max] x [y_min, y_max].
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')

plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
# Plot the centroids as a white X
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1],
            marker='x', s=169, linewidths=3,
            color='w', zorder=10)
plt.title('K-means clustering on the digits dataset (PCA-reduced data)\n'
          'Centroids are marked with white cross')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
10.5 DASK - RANDOM FOREST FEATURE DETECTION ☁
10.5.1 Setup
First we need our tools. pandas gives us the DataFrame, very similar to R’s DataFrames. The DataFrame is a structure that allows us to work with our data more easily. It has nice features for slicing and transformation of data, and easy ways to do basic statistics.
numpy has some very handy functions that work on DataFrames.
import pandas as pd
import numpy as np
10.5.2 Dataset
We are using the wine quality dataset, archived at UCI’s Machine Learning Repository (http://archive.ics.uci.edu/ml/index.php).
Now we will load our data. pandas makes it easy!
# red wine quality data, packed in a DataFrame
red_df = pd.read_csv('winequality-red.csv', sep=';', header=0, index_col=False)
# white wine quality data, packed in a DataFrame
white_df = pd.read_csv('winequality-white.csv', sep=';', header=0, index_col=False)
# rose? other fruit wines? plum wine? :(
Like in R, there is a .describe() method that gives basic statistics for every column in the dataset.
# for red wines
red_df.describe()
       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides
count    1599.000000       1599.000000  1599.000000     1599.000000  1599.000000
mean        8.319637          0.527821     0.270976        2.538806     0.087467
std         1.741096          0.179060     0.194801        1.409928     0.047065
min         4.600000          0.120000     0.000000        0.900000     0.012000
25%         7.100000          0.390000     0.090000        1.900000     0.070000
50%         7.900000          0.520000     0.260000        2.200000     0.079000
75%         9.200000          0.640000     0.420000        2.600000     0.090000
max        15.900000          1.580000     1.000000       15.500000     0.611000
# for white wines
white_df.describe()
       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides
count    4898.000000       4898.000000  4898.000000     4898.000000  4898.000000
mean        6.854788          0.278241     0.334192        6.391415     0.045772
std         0.843868          0.100795     0.121020        5.072058     0.021848
min         3.800000          0.080000     0.000000        0.600000     0.009000
25%         6.300000          0.210000     0.270000        1.700000     0.036000
50%         6.800000          0.260000     0.320000        5.200000     0.043000
75%         7.300000          0.320000     0.390000        9.900000     0.050000
max        14.200000          1.100000     1.660000       65.800000     0.346000
Sometimes it is easier to understand the data visually. A histogram of the white wine quality data citric acid samples is shown next. You can of course visualize other columns’ data or other datasets. Just replace the DataFrame and column name (see Figure 33).
import matplotlib.pyplot as plt

def extract_col(df, col_name):
    return list(df[col_name])

col = extract_col(white_df, 'citric acid')  # can replace with another dataframe or column
plt.hist(col)
# TODO: add axes and such to set a good example
plt.show()
Figure 33: Histogram
10.5.3 Detecting Features
Let us try out some elementary machine learning models. These models are not always for prediction. They are also useful to find what features are most predictive of a variable of interest. Depending on the classifier you use, you may need to transform the data pertaining to that variable.
10.5.3.1 Data Preparation
Let us assume we want to study what features are most correlated with pH. pH of course is real-valued, and continuous. The classifiers we want to use usually need labeled or integer data. Hence, we will transform the pH data, assigning wines with pH higher than average as hi (more basic or alkaline) and wines with pH lower than average as lo (more acidic).
# refresh to make Jupyter happy
red_df = pd.read_csv('winequality-red.csv', sep=';', header=0, index_col=False)
white_df = pd.read_csv('winequality-white.csv', sep=';', header=0, index_col=False)
# TODO: data cleansing functions here, e.g. replacement of NaN
# if the variable you want to predict is continuous, you can map ranges of values
# to integer/binary/string labels
# for example, map the pH data to 'hi' and 'lo' if a pH value is more than or
# less than the mean pH, respectively
M = np.mean(list(red_df['pH']))  # expect inelegant code in these mappings
Lf = lambda p: int(p < M) * 'lo' + int(p >= M) * 'hi'  # some C-style hackery
# create the new classifiable variable
# (list() is needed in Python 3, where map() returns an iterator)
red_df['pH-hi-lo'] = list(map(Lf, list(red_df['pH'])))
# and remove the predecessor
del red_df['pH']
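The mapping of a continuous value to two labels can also be written more plainly. The following is only a sketch of the same idea using the standard library. The function name `label_hi_lo` and the sample pH values are made up for illustration and do not come from the wine dataset.

```python
from statistics import mean


def label_hi_lo(values):
    """Label each value 'hi' if it is at or above the mean, otherwise 'lo'."""
    threshold = mean(values)
    return ['hi' if v >= threshold else 'lo' for v in values]


# Made-up pH readings; their mean is 3.3
ph_values = [3.5, 3.2, 3.25, 3.1, 3.45]
labels = label_hi_lo(ph_values)
print(labels)
```

The conditional expression replaces the string-multiplication trick used in the lambda, which some readers find harder to follow.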
Now we specify which dataset and variable we want to predict by assigning values to SELECTED_DF and TARGET_VAR, respectively.
We like to keep a parameter file where we specify data sources and such. This lets us create generic analytics code that is easy to reuse.
After we have specified what dataset we want to study, we split the training and test datasets. We then scale (normalize) the data, which makes most classifiers run better.
Now we pick a classifier. As you can see, there are many to try out, and even more in scikit-learn’s documentation and many examples and tutorials. Random Forests are data science workhorses. They are the go-to method for most data scientists. Be careful relying on them though: they tend to overfit. We try to avoid overfitting by separating the training and test datasets.
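The scaling step mentioned above can be sketched without scikit-learn to show the key point: the mean and standard deviation are learned from the training data only and then reused for the test data. The helper names `fit_scaler` and `apply_scaler` and the sample numbers are hypothetical. The population standard deviation is used here on the assumption that it matches what `StandardScaler` computes.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Learn the mean and population standard deviation from training values only."""
    return mean(train_column), pstdev(train_column)


def apply_scaler(column, mu, sigma):
    """Standardize values with statistics learned elsewhere (sigma must be non-zero)."""
    return [(x - mu) / sigma for x in column]


train = [2.0, 4.0, 6.0, 8.0]   # made-up training feature
test = [5.0, 10.0]             # made-up test feature

mu, sigma = fit_scaler(train)
scaled_train = apply_scaler(train, mu, sigma)
scaled_test = apply_scaler(test, mu, sigma)  # reuses the training statistics
print(mu, sigma)
print(scaled_test)
```

Fitting the scaler on the training split alone keeps information about the test split from leaking into the model.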
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

# make selections here without digging in code
SELECTED_DF = red_df  # selected dataset
TARGET_VAR = 'pH-hi-lo'  # the predicted variable

# generate nameless data structures
df = SELECTED_DF
target = np.array(df[TARGET_VAR]).ravel()
del df[TARGET_VAR]  # no cheating

# TODO: data cleansing function calls here

# split datasets for training and testing
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.2)

# set up the scaler
scaler = StandardScaler()
scaler.fit(X_train)

# apply the scaler
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# pick a classifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, ExtraTreeClassifier, ExtraTreeRegressor
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
clf = RandomForestClassifier()
10.5.4 Random Forest
Now we will test it out with the default parameters.
Note that this code is boilerplate. You can use it interchangeably for most scikit-learn models.
# test it out
model = clf.fit(X_train, y_train)
pred = clf.predict(X_test)
conf_matrix = metrics.confusion_matrix(y_test, pred)
var_score = clf.score(X_test, y_test)
Now output the results. For Random Forests, we get a feature ranking. Relative importances usually decay exponentially. The first few highly-ranked features are usually the most important.
# the results
importances = clf.feature_importances_
indices = np.argsort(importances)[::-1]

# for the sake of clarity
num_features = X_train.shape[1]
# lists instead of map(), so that they can be indexed in Python 3
features = [df.columns[x] for x in indices]
feature_importances = [importances[x] for x in indices]

print('Feature ranking:\n')
for i in range(num_features):
    feature_name = features[i]
    feature_importance = feature_importances[i]
    print('%s%f' % (feature_name.ljust(30), feature_importance))
Feature ranking:
fixed acidity           0.269778
citric acid             0.171337
density                 0.089660
volatile acidity        0.088965
chlorides               0.082945
alcohol                 0.080437
total sulfur dioxide    0.067832
sulphates               0.047786
free sulfur dioxide     0.042727
residual sugar          0.037459
quality                 0.021075
Sometimes it’s easier to visualize. We’ll use a bar chart. See Figure 34
plt.clf()
plt.bar(range(num_features), feature_importances)
plt.xticks(range(num_features), features, rotation=90)
plt.ylabel('relative importance (a.u.)')
plt.title('Relative importances of most predictive features')
plt.show()
Figure 34: Result
# read the same CSV files as Dask DataFrames
import dask.dataframe as dd
red_df = dd.read_csv('winequality-red.csv', sep=';', header=0)
white_df = dd.read_csv('winequality-white.csv', sep=';', header=0)
10.5.5 Acknowledgement
This notebook was developed by Juliette Zerick and Gregor von Laszewski
10.6 PARALLEL COMPUTING IN PYTHON ☁
In this module we will review the available Python modules that can be used for parallel computing. Parallel computing can take the form of either multi-threading or multi-processing. In the multi-threading approach, the threads run in the same shared memory heap, whereas in multi-processing the memory heaps of the processes are separate and independent; communication between the processes is therefore a little more complex.
10.6.1 Multi-threading in Python
Threading in Python is perfect for I/O operations where the process is expected to be idle regularly, e.g. web scraping. This is a very useful feature because many applications and scripts spend the majority of their runtime waiting for network or data I/O. In several cases, e.g. web scraping, the resources, i.e. downloads from different websites, are independent of each other most of the time. Therefore the program can download them in parallel and join the results at the end.
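The benefit for I/O-bound work can be sketched with simulated downloads. This is an illustration only: `fake_download` and the site names are hypothetical stand-ins, and `time.sleep` merely imitates waiting on a network request.

```python
import threading
import time


def fake_download(site, results):
    """Stand-in for a network request; time.sleep simulates waiting on I/O."""
    time.sleep(0.1)
    results[site] = "content of " + site


sites = ["site-a", "site-b", "site-c"]  # hypothetical names
results = {}

threads = [threading.Thread(target=fake_download, args=(site, results))
           for site in sites]

start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every simulated download has finished
elapsed = time.time() - start

print(sorted(results), "fetched in %.2f s" % elapsed)
```

Because the threads spend their time waiting rather than computing, the total run time should stay close to that of a single simulated download instead of the sum of all three.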
10.6.1.1 Thread vs Threading
There are two built-in modules in Python that are related to threading, namely thread and threading. The former was deprecated for some time in Python 2, and in Python 3 it was renamed to _thread for the sake of backward compatibility. The _thread module provides a low-level threading API for multi-threading in Python, whereas the threading module builds a high-level threading interface on top of it.
Thread is the main class of the threading module. Its two important arguments are target, for specifying the callable object, and args, for passing the arguments to the target callable. We illustrate these in the following example:
import threading

def hello_thread(thread_num):
    print("Hello from Thread", thread_num)

if __name__ == '__main__':
    for thread_num in range(5):
        t = threading.Thread(target=hello_thread, args=(thread_num,))
        t.start()
This is the output of the previous example:
In [1]: %run threading.py
Hello from Thread 0
Hello from Thread 1
Hello from Thread 2
Hello from Thread 3
Hello from Thread 4
In case you are not familiar with the if __name__ == '__main__': statement, it makes sure that the code nested under this condition runs only if you run your module as a program; it does not run if your module is imported in another file.
10.6.1.2 Locks
As mentioned before, the memory space is shared between the threads. This is at the same time beneficial and problematic: it is beneficial in the sense that communication between the threads becomes easy; however, you might experience strange outcomes if you let several threads change the same variable without caution, e.g. thread 2 changes variable x while thread 1 is working with it. This is when a lock comes into play. Using a lock, you can allow only one thread at a time to work with a variable. In other words, only a single thread can hold the lock. If the other threads need to work with that variable, they have to wait until the other thread is done and the variable is “unlocked”.
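As a compact preview of the idea, a `threading.Lock` can be used as a context manager, so that acquiring and releasing happen automatically around the protected statement. The sketch below is a simplified illustration: the names `add_many` and `counter_lock` and the iteration counts are arbitrary choices.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, one protected step at a time."""
    global counter
    for _ in range(n):
        with counter_lock:  # acquired on entry, released on exit (even on error)
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock in place, the final value is always the number of threads times the number of increments. The `with` form is equivalent to calling `acquire()` and `release()` by hand, but cannot forget the release.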
We illustrate this with a simple example:
import threading

counter = 0

def incrementer1():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
            print("Greeter 1 incremented the counter by 1")
        print("Counter is %d" % counter)

def incrementer2():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
            print("Greeter 2 incremented the counter by 1")
        print("Counter is now %d" % counter)

if __name__ == '__main__':
    t1 = threading.Thread(target=incrementer1)
    t2 = threading.Thread(target=incrementer2)
    t1.start()
    t2.start()
Suppose we want to print multiples of 3 between 1 and 12, i.e. 3, 6, 9 and 12. For the sake of argument, we try to do this using 2 threads and a nested for loop. We create a global variable called counter and initialize it with 0. Then, whenever either of the incrementer1 or incrementer2 functions is called, the counter is incremented by 3 twice (the counter is incremented by 6 in each function call). If you run the previous code, you would have to be really lucky to get the following as part of your output:
Counter is now 3
Counter is now 6
Counter is now 9
Counter is now 12
The reason is the conflict that happens between the threads while incrementing the counter in the nested for loop. As you probably noticed, the first-level for loop is equivalent to adding 3 to the counter, and the conflict does not arise at that level but in the nested for loop. Accordingly, the output of the previous code is different in every run. This is an example output:
$ python3 lock_example.py
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 4
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 8
Greeter 1 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 10
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 12
We can fix this issue using a lock: whenever one of the functions is going to increment the value by 3, it will acquire() the lock, and when it is done the function will release() the lock. This mechanism is illustrated in the following code:
import threading

increment_by_3_lock = threading.Lock()
counter = 0

def incrementer1():
    global counter
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter += 1
            print("Greeter 1 incremented the counter by 1")
        print("Counter is %d" % counter)
        increment_by_3_lock.release()

def incrementer2():
    global counter
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter += 1
            print("Greeter 2 incremented the counter by 1")
        print("Counter is %d" % counter)
        increment_by_3_lock.release()

if __name__ == '__main__':
    t1 = threading.Thread(target=incrementer1)
    t2 = threading.Thread(target=incrementer2)
    t1.start()
    t2.start()
No matter how many times you run this code, the output will always be in the correct order:
$ python3 lock_example.py
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 3
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 6
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 9
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 12
Using the threading module increases both the overhead associated with thread management and the complexity of the program, which is why in many situations employing the multiprocessing module might be a better approach.
10.6.2 Multi-processing in Python
We already mentioned that multi-threading might not be sufficient in many applications and that we might need to use multiprocessing sometimes, or better to say most of the time. That is why we are dedicating this subsection to this particular module. This module provides you with an API for spawning processes the way you spawn threads using the threading module. Moreover, there are some functionalities that are not even available in the threading module, e.g. the Pool class, which allows you to run a batch of jobs using a pool of worker processes.
10.6.2.1 Process
Similar to the threading module, which employs thread (aka _thread) under the hood, multiprocessing employs the Process class. Consider the following example:
from multiprocessing import Process
import os

def greeter(name):
    proc_idx = os.getpid()
    print("Process {0}: Hello {1}!".format(proc_idx, name))

if __name__ == '__main__':
    name_list = ['Harry', 'George', 'Dirk', 'David']
    process_list = []
    for name_idx, name in enumerate(name_list):
        current_process = Process(target=greeter, args=(name,))
        process_list.append(current_process)
        current_process.start()
    for process in process_list:
        process.join()
In this example, after importing the Process class we created a greeter() function that takes a name and greets that person. It also prints the pid (process identifier) of the process that is running it. Note that we used the os module to get the pid. At the bottom of the code, after checking the __name__ == '__main__' condition, we create a series of Processes and start them. Finally, in the last for loop, using the join method, we tell Python to wait for the processes to terminate. This is one of the possible outputs of the code:
$ python3 process_example.py
Process 23451: Hello Harry!
Process 23452: Hello George!
Process 23453: Hello Dirk!
Process 23454: Hello David!
10.6.2.2 Pool
Consider the Pool class as a pool of worker processes. There are several ways of assigning jobs to the Pool class and we will introduce the most important ones in this section. These methods are categorized as blocking or non-blocking. The former means that after calling the API, it blocks the thread/process until it has the result or answer ready, and control returns only when the call completes. In the non-blocking case, on the other hand, control returns immediately.
10.6.2.2.1 Synchronous Pool.map()
We illustrate the Pool.map method by re-implementing our previous greeter example using Pool.map:
from multiprocessing import Pool
import os

def greeter(name):
    pid = os.getpid()
    print("Process {0}: Hello {1}!".format(pid, name))

if __name__ == '__main__':
    names = ['Jenna', 'David', 'Marry', 'Ted', 'Jerry', 'Tom', 'Justin']
    pool = Pool(processes=3)
    sync_map = pool.map(greeter, names)
    print("Done!")
As you can see, we have seven names here but we do not want to dedicate each greeting to a separate process. Instead we do the whole job of “greeting seven people” using three processes. We create a pool of 3 processes with the Pool(processes=3) syntax and then we map an iterable called names to the greeter function using pool.map(greeter, names). As expected, the greetings in the output are printed from three different processes:
$ python poolmap_example.py
Process 30585: Hello Jenna!
Process 30586: Hello David!
Process 30587: Hello Marry!
Process 30585: Hello Ted!
Process 30585: Hello Jerry!
Process 30587: Hello Tom!
Process 30585: Hello Justin!
Done!
Note that Pool.map() is in the blocking category and does not return control to your script until it is done calculating the results. That is why Done! is printed after all of the greetings are over.
10.6.2.2.2 Asynchronous Pool.map_async()
As the name implies, you can use the map_async method when you want to assign many function calls to a pool of worker processes asynchronously. Note that control is returned immediately, and that the order in which the calls run (and therefore print) is not guaranteed; the list of return values obtained from the result object still follows the order of the input. We now implement the previous example using map_async:
from multiprocessing import Pool
import os

def greeter(name):
    pid = os.getpid()
    print("Process {0}: Hello {1}!".format(pid, name))

if __name__ == '__main__':
    names = ['Jenna', 'David', 'Marry', 'Ted', 'Jerry', 'Tom', 'Justin']
    pool = Pool(processes=3)
    async_map = pool.map_async(greeter, names)
    print("Done!")
    async_map.wait()
As you probably noticed, the only difference (apart from the map_async method name) is the call to the wait() method in the last line. The wait() method tells your script to wait for the result of map_async before terminating:
$ python poolmap_example.py
Done!
Process 30740: Hello Jenna!
Process 30741: Hello David!
Process 30740: Hello Ted!
Process 30742: Hello Marry!
Process 30740: Hello Jerry!
Process 30741: Hello Tom!
Process 30742: Hello Justin!
Note that the order of the printed greetings is not preserved. Moreover, Done! is printed before any of the results, meaning that if we do not use the wait() method, you probably will not see the results at all.
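The blocking and non-blocking variants can be compared side by side when the worker function returns a value instead of printing. The following is a minimal sketch under stated assumptions: the function names `square`, `blocking_squares`, and `non_blocking_squares`, the worker count, and the 30-second timeout are all illustrative choices.

```python
from multiprocessing import Pool


def square(x):
    """Toy job executed in a worker process."""
    return x * x


def blocking_squares(values, workers=2):
    """Pool.map returns only after every result is ready."""
    with Pool(processes=workers) as pool:
        return pool.map(square, values)


def non_blocking_squares(values, workers=2):
    """Pool.map_async returns an AsyncResult at once; get() collects the values."""
    with Pool(processes=workers) as pool:
        pending = pool.map_async(square, values)
        # other work could be done here while the workers are busy
        return pending.get(timeout=30)


if __name__ == '__main__':
    print(blocking_squares([1, 2, 3, 4]))
    print(non_blocking_squares([1, 2, 3, 4]))
```

In both cases the returned list follows the order of the input values, even though the individual calls may run in any order across the workers.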
10.6.2.3 Locks
The way the multiprocessing module implements locks is almost identical to the way the threading module does. After importing Lock from multiprocessing, all you need to do is acquire it, do some computation, and then release the lock. We will clarify the use of Lock by providing an example in the next section about process communication.
10.6.2.4 Process Communication
Process communication in multiprocessing is one of the most important, yet complicated, features for better use of this module. As opposed to threading, the Process objects will not have access to any shared variable by default, i.e. there is no shared memory space between the processes by default. This effect is illustrated in the following example:
from multiprocessing import Process, Lock, Value
import time

counter = 0

def incrementer1():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
        print("Greeter 1: Counter is %d" % counter)

def incrementer2():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
        print("Greeter 2: Counter is %d" % counter)

if __name__ == '__main__':
    t1 = Process(target=incrementer1)
    t2 = Process(target=incrementer2)
    t1.start()
    t2.start()
Probably you already noticed that this is almost identical to our example in the threading section. Now, take a look at the strange output:
$ python communication_example.py
Greeter 1: Counter is 3
Greeter 1: Counter is 6
Greeter 2: Counter is 3
Greeter 2: Counter is 6
As you can see, it is as if the processes do not see each other. Instead of having two processes, one counting to 6 and the other counting from 6 to 12, we have two processes counting to 6.
Nevertheless, there are several ways that Processes from multiprocessing cancommunicatewitheachother,includingPipe,Queue,Value,ArrayandManager.Pipeand Queueare appropriate for inter-processmessage passing. To bemore specific, Pipe isuseful for process-to-process scenarios while Queue is more appropriate forprocesses-toprocessesones.ValueandArrayarebothusedtoprovideasynchronizedaccesstoashareddata(verymuchlikesharedmemory)andManagerscanbeusedondifferentdatatypes.Inthefollowingsub-sections,wecoverbothValueandArraysincetheyarebothlightweight,yetuseful,approaches.
10.6.2.4.1Value
The following example re-implements the broken example in the previoussection.Wefixthestrangeoutput,byusingbothLockandValue:
The usage of Lock object in this example is identical to the example in threadingsection.Theusageof counter ison theotherhand thenovelpart.First,note thatcounterisnotaglobalvariableanymoreandinsteaditisaValuewhichreturnsactypes object allocated from a shared memory between the processes. The firstargument 'i' indicates a signed integer, and the second argument defines the
from multiprocessing import Process, Lock, Value
import time

increment_by_3_lock = Lock()

def incrementer1(counter):
    # range(2) matches the four output lines shown for this example
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.1)
        print("Greeter 1: Counter is %d" % counter.value)
        increment_by_3_lock.release()

def incrementer2(counter):
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.05)
        print("Greeter 2: Counter is %d" % counter.value)
        increment_by_3_lock.release()

if __name__ == '__main__':
    counter = Value('i', 0)
    t1 = Process(target=incrementer1, args=(counter,))
    t2 = Process(target=incrementer2, args=(counter,))
    t2.start()
    t1.start()
initialization value. In this case we are placing a signed integer in shared memory, initialized to 0, and assigning it to the counter variable. We then modified our two functions to accept this shared variable as an argument. Finally, we changed the way we increment the counter: since counter is no longer a Python integer but a ctypes signed integer, we access its value through the value attribute. The output of the code is now as we expected:
The last example related to parallel processing illustrates the use of both Value and Array, as well as a technique for passing multiple arguments to a function as a single tuple. Process itself can forward several positional arguments through args, but functions such as map or map_async hand exactly one argument to each call, so the same technique is useful when you want to pass multiple arguments to map or map_async:
$ python mp_lock_example.py
Greeter 2: Counter is 3
Greeter 2: Counter is 6
Greeter 1: Counter is 9
Greeter 1: Counter is 12
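The Value object used in the counter example can also be explored in a single process, which makes its interface easier to see. The following is a minimal sketch, not part of the original example: it creates a shared signed integer and increments it while holding the lock that every synchronized Value carries (available through get_lock()). The variable name final_value is chosen here only for illustration.

```python
from multiprocessing import Value

# 'i' is the ctypes type code for a signed int; 0 is the initial value.
counter = Value('i', 0)

for _ in range(3):
    # get_lock() returns the lock that guards this Value; holding it makes
    # the read-modify-write in "+=" safe when several processes share it.
    with counter.get_lock():
        counter.value += 1

final_value = counter.value
print("Counter is %d" % final_value)
```

Using the lock that belongs to the Value is an alternative to a separate module-level Lock; a separate Lock, as in the example above, is the better fit when a whole block of work (such as three increments plus a print) has to happen without interruption.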
from multiprocessing import Process, Lock, Value, Array
import time
from ctypes import c_char_p

increment_by_3_lock = Lock()

def incrementer1(counter_and_names):
    # unpack the tuple that carries both shared objects
    counter = counter_and_names[0]
    names = counter_and_names[1]
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.1)
        name_idx = counter.value // 3 - 1
        print("Greeter 1: Greeting {0}! Counter is {1}".format(names.value[name_idx], counter.value))
        increment_by_3_lock.release()

def incrementer2(counter_and_names):
    counter = counter_and_names[0]
    names = counter_and_names[1]
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.05)
        name_idx = counter.value // 3 - 1
        print("Greeter 2: Greeting {0}! Counter is {1}".format(names.value[name_idx], counter.value))
        increment_by_3_lock.release()

if __name__ == '__main__':
    counter = Value('i', 0)
    names = Array(c_char_p, 4)
    # Note: .value is an ordinary Python attribute here (a list), so the child
    # processes only see it when they are started with the fork start method.
    names.value = ['James', 'Tom', 'Sam', 'Larry']
    t1 = Process(target=incrementer1, args=((counter, names),))
    t2 = Process(target=incrementer2, args=((counter, names),))
    t2.start()
    t1.start()
In this example we created a multiprocessing.Array() object and assigned it to a variable called names. As we mentioned before, the first argument is the ctypes data type, and since we want to create an array of strings with a length of 4 (second argument), we imported c_char_p and passed it as the first argument.
Instead of passing the arguments separately, we merged both the Value and Array objects in a tuple and passed the tuple to the functions. We then modified the functions to unpack the objects in the first two lines of both functions. Finally, we changed the print statement so that each process greets a particular name. The output of the example is:
10.7 DASK
Dask is a Python-based parallel computing library for analytics. Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved concurrently.
Dask is composed of two components:
1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
2. Big Data collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
Dask emphasizes the following virtues:
Familiar: Provides parallelized NumPy array and Pandas DataFrame objects.
Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
$ python3 mp_lock_example.py
Greeter 2: Greeting James! Counter is 3
Greeter 2: Greeting Tom! Counter is 6
Greeter 1: Greeting Sam! Counter is 9
Greeter 1: Greeting Larry! Counter is 12
Native: Enables distributed computing in pure Python with access to the PyData stack.
Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms.
Scales up: Runs resiliently on clusters with 1000s of cores.
Scales down: Trivial to set up and run on a laptop in a single process.
Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans.
The section is structured in a number of subsections addressing the following topics:
Foundations:
an explanation of what Dask is, how it works, and how to use lower-level primitives to set up computations. Casual users may wish to skip this section, although we consider it useful knowledge for all users.
Distributed Features:
information on running Dask on the distributed scheduler, which enables scale-up to distributed settings and enhanced monitoring of task operations. The distributed scheduler is now generally the recommended engine for executing task work, even on single workstations or laptops.
Collections:
convenient abstractions giving a familiar feel to big data.
Bags:
Python iterators with a functional paradigm, such as found in func/iter-tools and toolz - generalize lists/generators to big data; this will seem very familiar to users of PySpark's RDD
Array:
massive multi-dimensional numerical data, with Numpy functionality
Dataframe:
massive tabular data, with Pandas functionality
10.7.1 How Dask Works
Dask is a computation tool for larger-than-memory datasets, parallel execution or delayed/background execution.
We can summarize the basics of Dask as follows:
process data that does not fit into memory by breaking it into blocks and specifying task chains
parallelize execution of tasks across cores and even nodes of a cluster
move computation to the data rather than the other way around, to minimize communication overheads
We use for-loops to build basic tasks, Python iterators, and the Numpy (array) and Pandas (dataframe) functions for multi-dimensional or tabular data, respectively.
Dask allows us to construct a prescription for the calculation we want to carry out. A module named dask.delayed lets us parallelize custom code. It is useful whenever our problem doesn't quite fit a high-level parallel object like dask.array or dask.dataframe but could still benefit from parallelism. dask.delayed works by delaying our function evaluations and putting them into a Dask graph. Here is a small example:
Here we have used the delayed annotation to show that we want these functions to operate lazily: to save the set of inputs and execute only on demand.
10.7.2 Dask Bag
from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y
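To make the idea of "save the inputs and execute only on demand" concrete, here is a toy sketch written with the standard library only. It is not how Dask is implemented; the names Lazy, lazy and calls are hypothetical and exist only for this illustration. Calling a decorated function records the call as a node, and nothing runs until compute() walks the recorded dependencies.

```python
import functools

class Lazy:
    """A recorded call: a function plus its arguments, evaluated on demand."""
    def __init__(self, func, args):
        self.func = func
        self.args = args

    def compute(self):
        # Evaluate any Lazy arguments first, then apply the function.
        values = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*values)

def lazy(func):
    """Decorator: calling the function builds a Lazy node instead of running it."""
    @functools.wraps(func)
    def wrapper(*args):
        return Lazy(func, args)
    return wrapper

calls = []  # records which functions actually ran, and in which order

@lazy
def inc(x):
    calls.append("inc")
    return x + 1

@lazy
def add(x, y):
    calls.append("add")
    return x + y

total = add(inc(1), inc(2))     # builds a small graph; nothing has run yet
ran_before_compute = len(calls)
result = total.compute()        # now inc, inc and add are executed
print(ran_before_compute, result, calls)
```

With the real library, the objects returned by the delayed functions above offer a compute() method in the same spirit, and the scheduler is free to run independent nodes (here the two inc calls) in parallel.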
Dask-bag excels in processing data that can be represented as a sequence of arbitrary inputs. We'll refer to this as "messy" data, because it can contain complex nested structures, missing fields, mixtures of data types, etc. The functional programming style fits very nicely with standard Python iteration, such as can be found in the itertools module.
Messy data is often encountered at the beginning of data processing pipelines when large volumes of raw data are first consumed. The initial set of data might be JSON, CSV, XML, or any other format that does not enforce strict structure and data types. For this reason, the initial data massaging and processing is often done with Python lists, dicts, and sets.
These core data structures are optimized for general-purpose storage and processing. Adding streaming computation with iterators/generator expressions or libraries like itertools or toolz lets us process large volumes in a small space. If we combine this with parallel processing then we can churn through a fair amount of data.
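The streaming style described above can be sketched with plain generators. The example below is an illustration under assumed data: raw_lines stands in for lines read from a file, and the helper names parse and amounts are hypothetical. Each stage pulls one record at a time, so only a single record needs to be in memory at once, and malformed or incomplete records are skipped rather than stopping the pipeline.

```python
import itertools
import json

# Made-up "messy" input: a missing field, a non-JSON line, a number stored as text.
raw_lines = [
    '{"name": "Alice", "amount": 100}',
    '{"name": "Bob"}',
    'not json',
    '{"name": "Carol", "amount": "250"}',
]

def parse(lines):
    """Yield one dict per line, skipping lines that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except ValueError:
            continue

def amounts(records):
    """Yield the amount of every record that has one, normalized to int."""
    for record in records:
        if "amount" in record:
            yield int(record["amount"])

# islice stops the pipeline early; no further lines are parsed after two hits.
first_two = list(itertools.islice(amounts(parse(raw_lines)), 2))
total = sum(amounts(parse(raw_lines)))
print(first_two, total)
```

A Dask bag applies the same map and filter vocabulary, but partitions the input so that the stages can run in parallel.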
Dask.bag is a high-level Dask collection to automate common workloads of this form. In a nutshell:
dask.bag = map, filter, toolz + parallel execution
You can create a Bag from a Python sequence, from files, from data on S3, etc.
Bag objects hold the standard functional API found in projects like the Python standard library, toolz, or pyspark, including map, filter, groupby, etc.
As with Array and DataFrame objects, operations on Bag objects create new bags. Call the .compute() method to trigger execution.
# each element is an integer
import dask.bag as db
b = db.from_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# each element is a text file of JSON lines
import os
b = db.read_text(os.path.join('data', 'accounts.*.json.gz'))

# Requires `s3fs` library
# each element is a remote CSV text file
b = db.read_text('s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv')

def is_even(n):
    return n % 2 == 0

b = db.from_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
For more details on Dask Bag, check https://dask.pydata.org/en/latest/bag.html
10.7.3 Concurrency Features
Dask supports a real-time task framework that extends Python's concurrent.futures interface. This interface is good for arbitrary task scheduling, like dask.delayed, but is immediate rather than lazy, which provides some more flexibility in situations where the computations may evolve over time. These features depend on the second-generation task scheduler found in dask.distributed (which, despite its name, runs very well on a single machine).
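For readers who have not used the interface that Dask extends, the following sketch shows the submit/result pattern with the standard library's concurrent.futures. It runs on local threads only; load, summarize and filenames are hypothetical stand-ins (here load simply returns the length of the name) and are not part of any Dask API.

```python
from concurrent.futures import ThreadPoolExecutor

def load(name):
    # Stand-in for reading a file: return something cheap to compute.
    return len(name)

def summarize(sizes):
    return sum(sizes)

filenames = ["a.csv", "bb.csv", "ccc.csv"]

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() starts work immediately and returns a future right away.
    futures = [executor.submit(load, fn) for fn in filenames]
    # result() blocks until the corresponding task has finished.
    summary = summarize([future.result() for future in futures])

print(summary)
```

The key contrast with dask.delayed is that submit() schedules the work as soon as it is called, so later tasks can be decided based on earlier results.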
Dask allows us to simply construct graphs of tasks with dependencies. Graphs can also be created automatically for us using functional, Numpy or Pandas syntax on data collections. None of this would be very useful if there weren’t also a way to execute these graphs in a parallel and memory-aware way. Dask comes with four available schedulers:
dask.threaded.get: a scheduler backed by a thread pool
dask.multiprocessing.get: a scheduler backed by a process pool
dask.async.get_sync: a synchronous scheduler, good for debugging (in more recent Dask releases the module is named dask.local)
distributed.Client.get: a distributed scheduler for executing graphs on multiple machines.
Here is a simple program for the dask.distributed library:
For more details on the concurrency features of Dask, check https://dask.pydata.org/en/latest/futures.html
10.7.4 Dask Array
c = b.filter(is_even).map(lambda x: x ** 2)
c
# blocking form: wait for completion (which is very fast in this case)
c.compute()
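from dask.distributed import Client

# 'scheduler:port' is a placeholder for the address of a running scheduler;
# load, summarize and filenames are assumed to be defined by the user.
client = Client('scheduler:port')
futures = []
for fn in filenames:
    future = client.submit(load, fn)
    futures.append(future)
summary = client.submit(summarize, futures)
summary.result()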
Dask arrays implement a subset of the NumPy interface on large arrays using blocked algorithms and task scheduling. These behave like numpy arrays, but break a massive job into tasks that are then executed by a scheduler. The default scheduler uses threading but you can also use multiprocessing or distributed or even serial processing (mainly for debugging). You can tell the Dask array how to break the data into chunks for processing.
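A blocked algorithm can be illustrated without any array library. The sketch below, which uses hypothetical helper names chunked and blocked_mean, computes a mean by reducing each chunk to a small partial result (sum and count) and then combining the partial results. Each per-chunk reduction is independent of the others, which is what allows a scheduler to run them as separate tasks.

```python
def chunked(values, size):
    """Yield consecutive blocks of at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def blocked_mean(values, size):
    # Step 1: one independent task per block, each producing (sum, count).
    partials = [(sum(block), len(block)) for block in chunked(values, size)]
    # Step 2: combine the small partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1, 11))
blocks = list(chunked(data, 4))
mean = blocked_mean(data, 4)
print(blocks, mean)
```

The chunk size plays the same role as the chunks argument of a Dask array: smaller chunks mean more, cheaper tasks, while larger chunks mean fewer tasks that each need more memory.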
For more details on Dask Array, check https://dask.pydata.org/en/latest/array.html
10.7.5 Dask DataFrame
A Dask DataFrame is a large parallel dataframe composed of many smaller Pandas dataframes, split along the index. These pandas dataframes may live on disk for larger-than-memory computing on a single machine, or on many different machines in a cluster. Dask.dataframe implements a commonly used subset of the Pandas interface including elementwise operations, reductions, grouping operations, joins, timeseries algorithms, and more. It copies the Pandas interface for these operations exactly and so should be very familiar to Pandas users. Because Dask.dataframe operations merely coordinate Pandas operations, they usually exhibit performance characteristics similar to those found in Pandas. To run the following code, save the file ‘student.csv’ on your machine.
import h5py
import dask.array as da

f = h5py.File('myfile.hdf5')
x = da.from_array(f['/big-data'], chunks=(1000, 1000))
x - x.mean(axis=1).compute()
import pandas as pd
df = pd.read_csv('student.csv')
d = df.groupby(df.HID).Serial_No.mean()
print(d)
ID
101     1
102     2
104     3
105     4
106     5
107     6
109     7
111     8
201     9
202    10
Name: Serial_No, dtype: int64
import dask.dataframe as dd
df = dd.read_csv('student.csv')
dt = df.groupby(df.HID).Serial_No.mean().compute()
print(dt)
ID
101     1.0
For more details on Dask DataFrame, check https://dask.pydata.org/en/latest/dataframe.html
10.7.6 Dask DataFrame Storage
Efficient storage can dramatically improve performance, particularly when operating repeatedly from disk.
Decompressing text and parsing CSV files is expensive. One of the most effective strategies with medium data is to use a binary storage format like HDF5.
Create data if we don’t have any
First we read our csv data as before.
CSV and other text-based file formats are the most common storage for data from many sources, because they require minimal pre-processing, can be written line-by-line and are human-readable. Since Pandas’ read_csv is well-optimized, CSVs are a reasonable input, but far from optimal, since reading requires extensive text parsing.
HDF5 and netCDF are binary array formats very commonly used in the scientific realm.
102     2.0
104     3.0
105     4.0
106     5.0
107     6.0
109     7.0
111     8.0
201     9.0
202    10.0
Name: Serial_No, dtype: float64
# be sure to shut down other kernels running distributed clients
from dask.distributed import Client
client = Client()

from prep import accounts_csvs
accounts_csvs(3, 1000000, 500)

import os
filename = os.path.join('data', 'accounts.*.csv')
filename

import dask.dataframe as dd
df_csv = dd.read_csv(filename)
df_csv.head()
Pandas contains a specialized HDF5 format, HDFStore. The dd.DataFrame.to_hdf method works exactly like the pd.DataFrame.to_hdf method.
For more information on Dask DataFrame Storage, see http://dask.pydata.org/en/latest/dataframe-create.html
10.7.7 Links
https://dask.pydata.org/en/latest/
http://matthewrocklin.com/blog/work/2017/10/16/streaming-dataframes-1
http://people.duke.edu/~ccc14/sta-663-2017/18A_Dask.html
https://www.kdnuggets.com/2016/09/introducing-dask-parallel-programming.html
https://pypi.python.org/pypi/dask/
https://www.hdfgroup.org/2015/03/hdf5-as-a-zero-configuration-ad-hoc-scientific-database-for-python/
https://github.com/dask/dask-tutorial
target = os.path.join('data', 'accounts.h5')
target
%time df_csv.to_hdf(target, '/data')
df_hdf = dd.read_hdf(target, '/data')
df_hdf.head()
11 APPLICATIONS
11.1 FINGERPRINT MATCHING
Please note that NIST has temporarily removed the Fingerprint dataset. We unfortunately do not have a copy of the dataset. If you have one, please notify us.
Python is a flexible and popular language for running data analysis pipelines. In this section we will implement a solution for fingerprint matching.
11.1.1 Overview
Fingerprint recognition refers to the automated method for verifying a match between two fingerprints, which is used to identify individuals and verify their identity. Fingerprints (Figure 35) are the most widely used form of biometric used to identify individuals.
Figure 35: Fingerprints
Automated fingerprint matching generally requires the detection of different fingerprint features (aggregate characteristics of ridges, and minutia points) and then the use of a fingerprint matching algorithm, which can do both one-to-one and one-to-many matching operations. Based on the number of matches a
proximity score (distance or similarity) can be calculated.
We use the following NIST dataset for the study:
Special Database 14 - NIST Mated Fingerprint Card Pairs 2. (http://www.nist.gov/itl/iad/ig/special_dbases.cfm)
11.1.2 Objectives
Match the fingerprint images from a probe set to a gallery set and report the match scores.
11.1.3 Prerequisites
For this work we will use the following algorithms:
MINDTCT: The NIST minutiae detector, which automatically locates and records ridge endings and bifurcations in a fingerprint image. (http://www.nist.gov/itl/iad/ig/nbis.cfm)
BOZORTH3: A NIST fingerprint matching algorithm, which is a minutiae-based fingerprint-matching algorithm. It can do both one-to-one and one-to-many matching operations. (http://www.nist.gov/itl/iad/ig/nbis.cfm)
In order to follow along, you must have the NBIS tools, which provide mindtct and bozorth3, installed. If you are on Ubuntu 16.04 Xenial, the following steps will accomplish this:
11.1.4 Implementation
1. Fetch the fingerprint images from the web
2. Call out to external programs to prepare and compute the match scores
3. Store the results in a database
4. Generate a plot to identify likely matches.
$ sudo apt-get update -qq
$ sudo apt-get install -y build-essential cmake unzip
$ wget "http://nigos.nist.gov:8080/nist/nbis/nbis_v5_0_0.zip"
$ unzip -d nbis nbis_v5_0_0.zip
$ cd nbis/Rel_5.0.0
$ ./setup.sh /usr/local --without-X11
$ sudo make
We will be interacting with the operating system and manipulating files and their pathnames.
Some general useful utilities
Using the attrs library provides some nice shortcuts to defining objects
We will be randomly dividing the entire dataset, based on user input, into the probe and gallery sets
We will need to call out to the NBIS software. We will also be using multiple processes to take advantage of all the cores on our machine
As for plotting, we will use matplotlib, though there are many alternatives.
Finally, we will write the results to a database.
11.1.5 Utility functions
Next, we will define some utility functions:
from __future__ import print_function
import urllib
import zipfile
import hashlib
import os.path
import os
import sys
import shutil
import tempfile
import itertools
import functools
import types
from pprint import pprint
import attr
import sys
import random
import subprocess
import multiprocessing
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import sqlite3
11.1.6 Dataset
We will now define some global parameters
First, the fingerprint dataset
from urllib.request import urlretrieve  # Python 3 location of urlretrieve

def take(n, iterable):
    "Returns a generator of the first **n** elements of an iterable"
    return itertools.islice(iterable, n)

def zipWith(function, *iterables):
    "Zip a set of **iterables** together and apply **function** to each tuple"
    # the built-in zip is already lazy in Python 3
    for group in zip(*iterables):
        yield function(*group)

def uncurry(function):
    "Transforms an N-ary **function** so that it accepts a single parameter of an N-tuple"
    @functools.wraps(function)
    def wrapper(args):
        return function(*args)
    return wrapper

def fetch_url(url, sha256, prefix='.', checksum_blocksize=2**20, dryRun=False):
    """Download a url.

    :param url: the url to the file on the web
    :param sha256: the SHA-256 checksum. Used to determine if the file was previously downloaded.
    :param prefix: directory to save the file
    :param checksum_blocksize: blocksize to use when computing the checksum
    :param dryRun: boolean indicating that calling this function should do nothing
    :returns: the local path to the downloaded file
    :rtype: str
    """
    if not os.path.exists(prefix):
        os.makedirs(prefix)
    local = os.path.join(prefix, os.path.basename(url))
    if dryRun: return local
    # if the file is already present and its checksum matches, skip the download
    if os.path.exists(local):
        print('Verifying checksum')
        chk = hashlib.sha256()
        with open(local, 'rb') as fd:
            while True:
                bits = fd.read(checksum_blocksize)
                if not bits: break
                chk.update(bits)
        if sha256 == chk.hexdigest():
            return local
    print('Downloading', url)
    def report(sofar, blocksize, totalsize):
        # progress callback: percentage downloaded so far
        msg = '{}%\r'.format(100 * sofar * blocksize / totalsize)
        sys.stderr.write(msg)
    urlretrieve(url, local, report)
    return local
DATASET_URL = 'https://s3.amazonaws.com/nist-srd/SD4/NISTSpecialDatabase4GrayScaleImagesofFIGS.zip'
DATASET_SHA256 = '4db6a8f3f9dc14c504180cbf67cdf35167a109280f121c901be37a80ac13c449'
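The checksum test that fetch_url performs can be tried in isolation. The sketch below is an illustration only: the helper name sha256_of_file and the temporary demo file are assumptions made for this example. It hashes a file block by block, so the whole file never has to be in memory, and confirms that the result equals the digest of the complete content.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, blocksize=2**20):
    """Return the hex SHA-256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fd:
        # iter(callable, sentinel) keeps reading until read() returns b''.
        for block in iter(lambda: fd.read(blocksize), b''):
            digest.update(block)
    return digest.hexdigest()

content = b'fingerprint' * 1000          # made-up file content
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

blockwise = sha256_of_file(demo_path, blocksize=64)
whole = hashlib.sha256(content).hexdigest()
os.remove(demo_path)
print(blockwise == whole)
```

Comparing such a digest with a published value, as done with DATASET_SHA256, detects both incomplete downloads and files that changed on the server.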
We’ll define how to download the dataset. This function is general enough that it could be used to retrieve most files, but we will default it to use the values defined previously.
11.1.7 Data Model
We will define some classes so we have a nice API for working with the dataflow. We set slots=True so that the resulting objects will be more space-efficient.
11.1.7.1 Utilities
11.1.7.1.1 Checksum
The checksum consists of the actual hash value (value) as well as a string representing the hashing algorithm. The validator enforces that the algorithm can only be one of the listed acceptable methods
def prepare_dataset(url=None, sha256=None, prefix='.', skip=False):
    url = url or DATASET_URL
    sha256 = sha256 or DATASET_SHA256
    local = fetch_url(url, sha256=sha256, prefix=prefix, dryRun=skip)
    if not skip:
        print('Extracting', local, 'to', prefix)
        # zf avoids shadowing the built-in zip
        with zipfile.ZipFile(local, 'r') as zf:
            zf.extractall(prefix)
    name, _ = os.path.splitext(local)
    return name

def locate_paths(path_md5list, prefix):
    with open(path_md5list) as fd:
        # the built-in map is lazy in Python 3 (formerly itertools.imap)
        for line in map(str.strip, fd):
            parts = line.split()
            if not len(parts) == 2: continue
            md5sum, path = parts
            chksum = Checksum(value=md5sum, kind='md5')
            filepath = os.path.join(prefix, path)
            yield Path(checksum=chksum, filepath=filepath)

def locate_images(paths):
    def predicate(path):
        _, ext = os.path.splitext(path.filepath)
        return ext in ['.png']
    # the built-in filter is lazy in Python 3 (formerly itertools.ifilter)
    for path in filter(predicate, paths):
        yield image(id=path.checksum.value, path=path)
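@attr.s(slots=True)
class Checksum(object):
    value = attr.ib()
    # attrs ignores a validator's return value, so use in_() which raises on invalid input
    kind = attr.ib(validator=attr.validators.in_('md5 sha1 sha224 sha256 sha384 sha512'.split()))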
11.1.7.1.2 Path
Paths refer to an image's filepath and associated Checksum. We get the checksum "for free" since the MD5 hash is provided for each image in the dataset.
11.1.7.1.3 Image
The start of the data pipeline is the image. An image has an id (the md5 hash) and the path to the image.
11.1.7.2 Mindtct
The next step in the pipeline is to apply the mindtct program from NBIS. A mindtct object therefore represents the results of applying mindtct on an image. The xyt output is needed for the next step, and the image attribute represents the image id.
We need a way to construct a mindtct object from an image object. A straightforward way of doing this would be to have a from_image @staticmethod or @classmethod, but that doesn't work well with multiprocessing, as top-level functions work best because they need to be serialized.
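The serialization constraint mentioned above comes from pickle, which multiprocessing uses to send work to other processes. The short sketch below is a stand-alone illustration (the variable names restored and lambda_pickled are hypothetical): functions are pickled as a reference to their importable name, so a named function such as the built-in len round-trips, while an anonymous lambda has no such name and is rejected.

```python
import pickle

# A named, importable function is stored by reference and restored as the same object.
restored = pickle.loads(pickle.dumps(len))

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: 2 * x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(restored is len, lambda_pickled)
```

Defining the worker as a plain module-level function, as done with mindtct_from_image, keeps it on the side of this line that pickle can handle.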
@attr.s(slots=True)
class Path(object):
    checksum = attr.ib()
    filepath = attr.ib()

@attr.s(slots=True)
class image(object):
    id = attr.ib()
    path = attr.ib()

@attr.s(slots=True)
class mindtct(object):
    image = attr.ib()
    xyt = attr.ib()

    def pretty(self):
        d = dict(id=self.image.id, path=self.image.path)
        return pprint(d)

def mindtct_from_image(image):
    imgpath = os.path.abspath(image.path.filepath)
    tempdir = tempfile.mkdtemp()
    oroot = os.path.join(tempdir, 'result')
    cmd = ['mindtct', imgpath, oroot]
    try:
        subprocess.check_call(cmd)
        with open(oroot + '.xyt') as fd:
            xyt = fd.read()
        result = mindtct(image=image.id, xyt=xyt)
11.1.7.3 Bozorth3
The final step in the pipeline is running bozorth3 from NBIS. The bozorth3 class represents the match being done: tracking the ids of the probe and gallery images as well as the match score.
Since we will be writing these instances out to a database, we provide some static methods for SQL statements. While there are many Object-Relational-Model (ORM) libraries available for Python, this approach keeps the current implementation simple.
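The SQL-statement approach can be sketched with the standard sqlite3 module and an in-memory database. This is an illustration with made-up match records, not output of the pipeline; the table layout mirrors the probe, gallery and score columns used by the bozorth3 class.

```python
import sqlite3

# Hypothetical match records: (probe id, gallery id, score).
matches = [('p1', 'g1', 12), ('p1', 'g2', 87), ('p2', 'g1', 5)]

conn = sqlite3.connect(':memory:')   # in-memory database, nothing touches disk
conn.execute('CREATE TABLE IF NOT EXISTS bozorth3 '
             '(probe TEXT, gallery TEXT, score NUMERIC)')
# The ? placeholders are filled in by sqlite3, so no string formatting is needed.
conn.executemany('INSERT INTO bozorth3 VALUES (?, ?, ?)', matches)
conn.commit()

best = conn.execute('SELECT probe, gallery, score FROM bozorth3 '
                    'ORDER BY score DESC LIMIT 1').fetchone()
row_count = conn.execute('SELECT COUNT(*) FROM bozorth3').fetchone()[0]
conn.close()
print(best, row_count)
```

Prepared statements with placeholders keep the values separate from the SQL text, which is why a method that returns a plain tuple of values is all the class needs to provide.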
In order to work well with multiprocessing, we define a class representing the input parameters to bozorth3 and a helper function to run bozorth3. This way the pipeline definition can be kept simple: a map to create the input and then a map to run the program.
As NBIS bozorth3 can be called to compare one-to-one or one-to-many, we will also dynamically choose between these approaches depending on whether the gallery attribute is a list or a single object.
        return result
    finally:
        shutil.rmtree(tempdir)

@attr.s(slots=True)
class bozorth3(object):
    probe = attr.ib()
    gallery = attr.ib()
    score = attr.ib()

    @staticmethod
    def sql_stmt_create_table():
        return 'CREATE TABLE IF NOT EXISTS bozorth3' \
            + ' (probe TEXT, gallery TEXT, score NUMERIC)'

    @staticmethod
    def sql_prepared_stmt_insert():
        return 'INSERT INTO bozorth3 VALUES (?, ?, ?)'

    def sql_prepared_stmt_insert_values(self):
        return self.probe, self.gallery, self.score

@attr.s(slots=True)
class bozorth3_input(object):
    probe = attr.ib()
    gallery = attr.ib()

    def run(self):
        # a single mindtct means one-to-one; a list means one-to-many
        if isinstance(self.gallery, mindtct):
            return bozorth3_from_one_to_one(self.probe, self.gallery)
        elif isinstance(self.gallery, list):
            return bozorth3_from_one_to_many(self.probe, self.gallery)
        else:
            raise ValueError('Unhandled type for gallery: {}'.format(type(self.gallery)))
Next is the top-level function for running bozorth3. It accepts an instance of bozorth3_input. This is implemented as a simple top-level wrapper so that it can be easily passed to the multiprocessing library.
11.1.7.3.1 Running Bozorth3
There are two cases to handle:
1. One-to-one probe to gallery sets
2. One-to-many probe to gallery sets
Both approaches are implemented next. The implementations follow the same pattern:
1. Create a temporary directory in which to work
2. Write the probe and gallery images to files in the temporary directory
3. Call the bozorth3 executable
4. The match score is written to stdout, which is captured and then parsed.
5. Return a bozorth3 instance for each match
6. Make sure to clean up the temporary directory
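The parsing step in the pattern above can be looked at on its own. The sketch below assumes, as the implementations in this section do, that the captured output contains one integer score per line; the helper name parse_scores and the sample bytes are hypothetical and were not produced by a real bozorth3 run.

```python
def parse_scores(stdout_bytes):
    """Convert captured output (one integer score per line) into a list of ints."""
    # subprocess.check_output returns bytes in Python 3, so decode first.
    text = stdout_bytes.decode('ascii').strip()
    return [int(line) for line in text.splitlines() if line.strip()]

sample_output = b"12\n7\n103\n"   # made-up example output
scores = parse_scores(sample_output)
print(scores)
```

Because the one-to-one case is just the one-to-many case with a single line, the same helper covers both.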
11.1.7.3.1.1 One-to-one
11.1.7.3.1.2 One-to-many
def run_bozorth3(input):
    return input.run()

def bozorth3_from_one_to_one(probe, gallery):
    tempdir = tempfile.mkdtemp()
    probeFile = os.path.join(tempdir, 'probe.xyt')
    galleryFile = os.path.join(tempdir, 'gallery.xyt')
    # xyt was read as text, so the files are written in text mode
    with open(probeFile, 'w') as fd: fd.write(probe.xyt)
    with open(galleryFile, 'w') as fd: fd.write(gallery.xyt)
    cmd = ['bozorth3', probeFile, galleryFile]
    try:
        result = subprocess.check_output(cmd)
        score = int(result.strip())
        return bozorth3(probe=probe.image, gallery=gallery.image, score=score)
    finally:
        shutil.rmtree(tempdir)

def bozorth3_from_one_to_many(probe, galleryset):
    tempdir = tempfile.mkdtemp()
    probeFile = os.path.join(tempdir, 'probe.xyt')
    galleryFiles = [os.path.join(tempdir, 'gallery%d.xyt' % i)
                    for i, _ in enumerate(galleryset)]
    with open(probeFile, 'w') as fd: fd.write(probe.xyt)
    for galleryFile, gallery in zip(galleryFiles, galleryset):
        with open(galleryFile, 'w') as fd: fd.write(gallery.xyt)
    cmd = ['bozorth3', '-p', probeFile] + galleryFiles
    try:
        result = subprocess.check_output(cmd).strip()
        # check_output returns bytes in Python 3: decode before splitting into lines
        scores = [int(s) for s in result.decode().split('\n')]
11.1.8 Plotting
For plotting we will operate only on the database. We will select a small number of probe images and plot the score between them and the rest of the gallery images.
The mk_short_labels helper function will be defined next.
The image ids are long hash strings. In order to minimize the amount of space the labels occupy on the figure, we provide a helper function to create a short label that still uniquely identifies each probe image in the selected sample
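The idea behind the short labels, finding the shortest prefix length at which all ids are still distinct, can be sketched independently of pandas. The function name shortest_unique_prefix_length and the ids below are made up for this illustration; the helper defined in this section starts its search at a length of 7 rather than 1.

```python
def shortest_unique_prefix_length(labels, start=1):
    """Return the smallest prefix length (at least `start`) that keeps all labels distinct."""
    longest = max(len(label) for label in labels)
    for size in range(start, longest + 1):
        # A set drops duplicates, so equal sizes mean every prefix is unique.
        if len({label[:size] for label in labels}) == len(labels):
            return size
    return longest

hashes = ['98b15d56', '98b1aa03', '1f0c2e77']   # made-up ids for illustration
size = shortest_unique_prefix_length(hashes)
short = [h[:size] for h in hashes]
print(size, short)
```

The first two ids share the prefix 98b1, so five characters are needed before every label is unique.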
11.1.9 Putting it all Together
First, set up a temporary directory in which to work:
Next we download and extract the fingerprint images from NIST:
        return [bozorth3(probe=probe.image, gallery=gallery.image, score=score)
                for score, gallery in zip(scores, galleryset)]
    finally:
        shutil.rmtree(tempdir)

def plot(dbfile, nprobes=10):
    conn = sqlite3.connect(dbfile)
    results = pd.read_sql(
        "SELECT DISTINCT probe FROM bozorth3 ORDER BY score LIMIT '%s'" % nprobes,
        con=conn
    )
    shortlabels = mk_short_labels(results.probe)
    plt.figure()
    # Series.items() replaces the removed iteritems()
    for i, probe in results.probe.items():
        stmt = 'SELECT gallery, score FROM bozorth3 WHERE probe = ? ORDER BY gallery DESC'
        matches = pd.read_sql(stmt, params=(probe,), con=conn)
        # the built-in int replaces the removed np.int alias
        xs = np.arange(len(matches), dtype=int)
        plt.plot(xs, matches.score, label='probe %s' % shortlabels[i])
    plt.ylabel('Score')
    plt.xlabel('Gallery')
    plt.legend(bbox_to_anchor=(0, 0, 1, -0.2))
    plt.show()

def mk_short_labels(series, start=7):
    # grow the prefix until every label is unique
    for size in range(start, len(series[0])):
        if len(series) == len(set(map(lambda s: s[:size], series))):
            break
    # return a list so that the labels can be indexed
    return list(map(lambda s: s[:size], series))

pool = multiprocessing.Pool()
prefix = '/tmp/fingerprint_example/'
if not os.path.exists(prefix):
    os.makedirs(prefix)
%%time
dataprefix = prepare_dataset(prefix=prefix)
Next we will configure the location of the MD5 checksum file that comes with the download.
Load the images from the downloaded files to start the analysis pipeline.
We can examine one of the loaded images. Note that image refers to the MD5 checksum that came with the image and the xyt attribute represents the raw image data.
For example purposes we will use only a small percentage of the database, randomly selected, for our probe and gallery datasets.
We can now compute the matching scores between the probe and gallery sets. This will use all cores available on this workstation.
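The random selection of probe and gallery subsets can be tried in isolation. The following is a minimal sketch under stated assumptions: the list of integers stands in for the loaded records, the size of 4000 and the seed are arbitrary choices, and `sample_fraction` is a hypothetical helper name.

```python
import random


def sample_fraction(items, fraction, rng):
    """Pick int(fraction * len(items)) distinct items at random."""
    count = int(fraction * len(items))
    return rng.sample(items, count)


# stand-in for the loaded records; 4000 is an arbitrary size for this sketch
records = list(range(4000))
rng = random.Random(42)  # fixed seed so that the run is repeatable

probes = sample_fraction(records, 0.001, rng)
gallery = sample_fraction(records, 0.1, rng)
print('|Probes| =', len(probes))
print('|Gallery| =', len(gallery))
```

The two samples are drawn independently, so a record may end up in both the probe and the gallery set.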
Verifying checksum
Extracting /tmp/fingerprint_example/NISTSpecialDatabase4GrayScaleImagesofFIGS.zip to /tmp/fingerprint_example/
CPU times: user 3.34 s, sys: 645 ms, total: 3.99 s
Wall time: 4.01 s
md5listpath = os.path.join(prefix, 'NISTSpecialDatabase4GrayScaleImagesofFIGS/sd04/sd04_md5.lst')
%%time
print('Loading images')
paths = locate_paths(md5listpath, dataprefix)
images = locate_images(paths)
mindtcts = pool.map(mindtct_from_image, images)
print('Done')
Loading images
Done
CPU times: user 187 ms, sys: 17 ms, total: 204 ms
Wall time: 1min 21s
print(mindtcts[0].image)
print(mindtcts[0].xyt[:50])
98b15d56330cb17f1982ae79348f711d
141462146252382237255118020
30332214
perc_probe = 0.001
perc_gallery = 0.1
%%time
print('Generating samples')
probes = random.sample(mindtcts, int(perc_probe * len(mindtcts)))
gallery = random.sample(mindtcts, int(perc_gallery * len(mindtcts)))
print('|Probes| =', len(probes))
print('|Gallery| =', len(gallery))
Generating samples
|Probes| = 4
|Gallery| = 400
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 993 µs
%%time
print('Matching')
input = [bozorth3_input(probe=probe, gallery=gallery)
         for probe in probes]
bozorth3s = pool.map(run_bozorth3, input)
bozorth3s is now a list of lists of bozorth3 instances.
Now add the results to the database.
We now plot the results. Figure 36
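The pattern of inserting one batch of scores per probe and reading them back with a parameterized query can be sketched with an in-memory SQLite database. This is an illustration under assumptions, not the chapter's code: the `Score` record type, the table name `scores`, its columns and the values are all made up.

```python
import sqlite3
from collections import namedtuple

# hypothetical record type mirroring the fields printed in this section
Score = namedtuple('Score', 'probe gallery score')

# made-up scores for two probes against two gallery images each
groups = [
    [Score('p1', 'g1', 4), Score('p1', 'g2', 12)],
    [Score('p2', 'g1', 7), Score('p2', 'g2', 3)],
]

conn = sqlite3.connect(':memory:')  # in-memory database for the sketch
cursor = conn.cursor()
cursor.execute(
    'CREATE TABLE IF NOT EXISTS scores (probe TEXT, gallery TEXT, score NUMERIC)'
)

for group in groups:
    # one batched, parameterized insert per probe
    cursor.executemany('INSERT INTO scores VALUES (?, ?, ?)', group)
    conn.commit()

rows = cursor.execute(
    'SELECT gallery, score FROM scores WHERE probe = ? ORDER BY gallery', ('p1',)
).fetchall()
print(rows)
conn.close()
```

Using `?` placeholders lets the database driver handle quoting, which is why the sketch never formats values into the SQL text.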
Matching
CPU times: user 19 ms, sys: 1 ms, total: 20 ms
Wall time: 1.07 s
print('|Probes| =', len(bozorth3s))
print('|Gallery| =', len(bozorth3s[0]))
print('Result:', bozorth3s[0][0])
|Probes| = 4
|Gallery| = 400
Result: bozorth3(probe='caf9143b268701416fbed6a9eb2eb4cf', gallery='22fa0f24998eaea39dea152e4a73f267', score=4)
dbfile = os.path.join(prefix, 'scores.db')
conn = sqlite3.connect(dbfile)
cursor = conn.cursor()
cursor.execute(bozorth3.sql_stmt_create_table())
<sqlite3.Cursor at 0x7f8a2f677490>
%%time
for group in bozorth3s:
    vals = map(bozorth3.sql_prepared_stmt_insert_values, group)
    cursor.executemany(bozorth3.sql_prepared_stmt_insert(), vals)
    conn.commit()
    print('Inserted results for probe', group[0].probe)
Inserted results for probe caf9143b268701416fbed6a9eb2eb4cf
Inserted results for probe 55ac57f711eba081b9302eab74dea88e
Inserted results for probe 4ed2d53db3b5ab7d6b216ea0314beb4f
Inserted results for probe 20f68849ee2dad02b8fb33ecd3ece507
CPU times: user 2 ms, sys: 3 ms, total: 5 ms
Wall time: 3.57 ms
plot(dbfile, nprobes=len(probes))
Figure 36: Result
11.2 NIST PEDESTRIAN AND FACE DETECTION
Pedestrian and Face Detection uses OpenCV to identify people standing in a picture or a video. The NIST use case in this document is built with Apache Spark and Mesos clusters on multiple compute nodes.
The example in this tutorial deploys software packages on OpenStack using Ansible with its roles. See Figure 37, Figure 38, Figure 39, Figure 40.
cursor.close()
Figure 37: Original
Figure 38: Pedestrian Detected
Figure 39: Original
Figure 40: Pedestrian and Face/eyes Detected
11.2.0.1 Introduction
Human (pedestrian) detection and face detection have been studied during the last several years, and models for them have improved along with Histograms of Oriented Gradients (HOG) for Human Detection [1]. OpenCV is a computer vision library that includes the SVM classifier and the HOG object detector for pedestrian detection, and the INRIA Person Dataset [2] is one of the popular samples for both training and testing purposes. In this document, we deploy Apache Spark on Mesos clusters to train and apply detection models from OpenCV using the Python API.
11.2.0.1.1 INRIA Person Dataset
This dataset contains positive and negative images for training and test purposes, with annotation files for upright persons in each image. 288 positive test images, 453 negative test images, 614 positive training images and 1218 negative training images are included, along with normalized 64x128 pixel formats. The 970 MB dataset is available to download [3].
11.2.0.1.2 HOG with SVM model
Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) are used as object detectors and classifiers, and built-in Python libraries from OpenCV provide these models for human detection.
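To give a feeling for what a HOG feature measures, the following sketch computes a magnitude-weighted histogram of gradient orientations for a single image cell with NumPy. It is a simplified illustration and not OpenCV's implementation: the function name `cell_orientation_histogram` is made up, and bin interpolation and block normalization are left out.

```python
import numpy as np


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplified illustration of one HOG cell: no bin interpolation and
    no block normalization.
    """
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    # unsigned orientation in degrees, folded into [0, 180)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0),
                           weights=magnitude)
    return hist


# 8x8 cell whose intensity grows from left to right
ramp = np.tile(np.arange(8.0), (8, 1))
hist = cell_orientation_histogram(ramp)
print(hist)
```

For the left-to-right ramp every gradient points along the horizontal axis, so all of the weight falls into the first orientation bin. A classifier such as an SVM is then trained on many such histograms collected over a detection window.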
11.2.0.1.3 Ansible Automation Tool
Ansible is a Python tool to install, configure and manage software on multiple machines with YAML files where system descriptions are defined. These are the reasons why we use Ansible:
- Expandable: Leverages Python (default) but modules can be written in any language
- Agentless: no setup required on managed node
- Security: Allows deployment from user space; uses ssh for authentication
- Flexibility: only requires ssh access to privileged user
- Transparency: YAML based script files express the steps of installing and configuring software
- Modularity: A single Ansible Role (should) contain all required commands and variables to deploy a software package independently
- Sharing and portability: roles are available from source (github, bitbucket, gitlab, etc) or the Ansible Galaxy portal
We use Ansible roles to install software packages for Human and Face Detection, which requires running OpenCV Python libraries on Apache Mesos with a cluster configuration. The dataset is also downloaded from the web using an Ansible role.
11.2.0.2 Deployment by Ansible
Ansible is used to deploy applications and build clusters for batch-processing large datasets on target machines, e.g. VM instances on OpenStack, and we use Ansible roles with the include directive to organize layers of big data software stacks (BDSS). Ansible provides abstractions through Playbook Roles and reusability through Include statements. We define application X in Ansible Role X, for example, and use include statements to combine it with other applications, e.g. Y or Z. The layers exist in subdirectories (see next) to add modularity to your Ansible deployment. For example, there are five roles used in this example: Apache Mesos in a scheduler layer, Apache Spark in a processing layer, an OpenCV library in an application layer, the INRIA Person Dataset in a dataset layer, and a Python script for human and face detection in an analytics layer. If you have an additional software package to add, you can simply add a new role in the main Ansible playbook with the include directive. With this, your Ansible playbook stays simple but flexible enough to add more roles without becoming a large single file that gets difficult to read as it deploys more applications on multiple layers. The main Ansible playbook runs the Ansible roles in order, which looks like:
```
include: sched/00-mesos.yml
include: proc/01-spark.yml
include: apps/02-opencv.yml
include: data/03-inria-dataset.yml
include: anlys/04-human-face-detection.yml
```
Directory names, e.g. sched, proc, data, or anlys, indicate BDSS layers:
- sched: scheduler layer
- proc: data processing layer
- apps: application layer
- data: dataset layer
- anlys: analytics layer
The two digits in the file name indicate the order in which the roles are run.
11.2.0.3 Cloudmesh for Provisioning
It is assumed that virtual machines are created by cloudmesh, the cloud management software. For example, on OpenStack, the command cm cluster create -N=6 starts a set of virtual machine instances. The number of machines and the groups for clusters, e.g. namenodes and datanodes, are defined in the Ansible inventory file, a list of target machines with groups, which will be generated by cloudmesh once the machines are ready to use. Ansible roles install software and datasets on virtual clusters after that stage.
11.2.0.4 Roles Explained for Installation
The Mesos role is installed first as a scheduler layer for masters and slaves, where mesos-master runs on the masters group and mesos-slave runs on the slaves group. Apache Zookeeper is included in the mesos role so that mesos slaves can find an elected mesos leader for coordination. Spark, as a data processing layer, provides two options for distributed job processing: batch job processing via cluster mode and real-time processing via client mode. The Mesos dispatcher runs on the masters group to accept batch job submissions, and the Spark interactive shell, which is the client mode, provides real-time processing on any node in the cluster. Either way, Spark is installed later so that it can detect a master (leader) host for job submission. The roles for OpenCV, the INRIA Person Dataset and the Human and Face Detection Python applications follow.
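The installation order can be read directly from the role file names, since the directory gives the layer and the two-digit prefix gives the position. The following is a small sketch of that naming convention only; it does not call Ansible, and the `LAYERS` table and the `describe` helper are hypothetical names introduced for illustration.

```python
# layer prefixes as described in this section
LAYERS = {
    'sched': 'scheduler layer',
    'proc': 'data processing layer',
    'apps': 'application layer',
    'data': 'dataset layer',
    'anlys': 'analytics layer',
}


def describe(path):
    """Split 'layer/NN-role.yml' into (order, layer description, role name)."""
    layer, filename = path.split('/')
    order, role = filename[:-len('.yml')].split('-', 1)
    return int(order), LAYERS[layer], role


# the role files of this example, deliberately listed out of order
includes = [
    'anlys/04-human-face-detection.yml',
    'sched/00-mesos.yml',
    'data/03-inria-dataset.yml',
    'proc/01-spark.yml',
    'apps/02-opencv.yml',
]

plan = sorted(describe(p) for p in includes)
for order, layer, role in plan:
    print(order, layer, role)
```

Sorting by the numeric prefix recovers the sequence in which the roles are meant to run, with the scheduler first and the analytics script last.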
The following software is expected in the stacks according to the github:
- mesos cluster (master, worker)
- spark (with dispatcher for mesos cluster mode)
- openCV
- zookeeper
- INRIA Person Dataset
- Detection Analytics in Python
[1] Dalal, Navneet, and Bill Triggs. “Histograms of oriented gradients for human detection.” 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). Vol. 1. IEEE, 2005. [pdf]
[2] http://pascal.inrialpes.fr/data/human/
[3] ftp://ftp.inrialpes.fr/pub/lear/douze/data/INRIAPerson.tar
[4] https://docs.python.org/2/library/configparser.html
11.2.0.4.1 Server groups for Masters/Slaves by Ansible inventory
We may separate compute nodes into two groups, masters and workers, so that Mesos masters and zookeeper quorums manage job requests and leaders, and workers run the actual tasks. Ansible needs group definitions in its inventory so that the software installation associated with each part can be completed.
Example of Ansible Inventory file (inventory.txt)
11.2.0.5 Instructions for Deployment
The following commands complete the NIST Pedestrian and Face Detection deployment on OpenStack.
11.2.0.5.1 Cloning Pedestrian Detection Repository from Github
Roles are included as submodules, which require the --recursive option to check them all out.
Change the following variable with actual ip addresses:
Create an inventory.txt file with the variable in your local directory.
Add an ansible.cfg file with options for ssh host key checking and login name.
Check accessibility by ansible ping like:
[masters]
10.0.5.67
10.0.5.68
10.0.5.69
[slaves]
10.0.5.70
10.0.5.71
10.0.5.72
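A simple inventory of this shape can be inspected from Python before handing it to Ansible. The following is a minimal sketch that assumes the inventory only contains group headers and bare host lines; it uses the standard configparser module and would not understand host variables, ranges or other inventory features.

```python
import configparser

inventory_text = """[masters]
10.0.5.67
10.0.5.68
10.0.5.69
[slaves]
10.0.5.70
10.0.5.71
10.0.5.72
"""

# allow_no_value lets bare host lines parse as keys without values
parser = configparser.ConfigParser(allow_no_value=True)
parser.read_string(inventory_text)

groups = {name: list(parser[name]) for name in parser.sections()}
print(groups)
```

A check like this makes it easy to confirm that every group holds the expected number of hosts before a long-running deployment is started.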
$ git clone --recursive https://github.com/futuresystems/pedestrian-and-face-detection.git
sample_inventory="""[masters]
10.0.5.67
10.0.5.68
10.0.5.69
[slaves]
10.0.5.70
10.0.5.71
10.0.5.72"""
!printf "$sample_inventory" > inventory.txt
!cat inventory.txt
ansible_config="""[defaults]
host_key_checking=false
remote_user=ubuntu"""
!printf "$ansible_config" > ansible.cfg
!cat ansible.cfg
!ansible -m ping -i inventory.txt all
Make sure that you have a correct ssh key in your account, otherwise you may encounter ‘FAILURE’ in the previous ping test.
11.2.0.5.2 Ansible Playbook
We use a main Ansible playbook to deploy software packages for NIST Pedestrian and Face detection, which includes:
- mesos
- spark
- zookeeper
- opencv
- INRIA Person dataset
- Python script for the detection
The installation may take 30 minutes or an hour to complete.
11.2.0.6 OpenCV in Python
Before we run our code for this project, let’s try OpenCV first to see how it works.
11.2.0.6.1 Import cv2
Let us import the opencv python module. We will use images from the online database image-net.org to test OpenCV image recognition. See Figure 41, Figure 42.
Let us download a mailbox image with a red color to see if opencv identifies the shape with a color. The example file in this tutorial is:
100  167k  100  167k    0     0   686k      0 --:--:-- --:--:-- --:--:--  684k
!cd pedestrian-and-face-detection/ && ansible-playbook -i ../inventory.txt site.yml
import cv2
$ curl http://farm4.static.flickr.com/3061/2739199963_ee78af76ef.jpg > mailbox.jpg
%matplotlib inline
from IPython.display import Image
mailbox_image="mailbox.jpg"
Image(filename=mailbox_image)
Figure 41: Mailbox image
You can try other images. Check out image-net.org for mailbox images: http://image-net.org/synset?wnid=n03710193
11.2.0.6.2 Image Detection
Just for a test, let’s try to detect a red mailbox using opencv python functions.
These are the key functions that we use:
* cvtColor: to convert the color space of an image
* inRange: to detect a mailbox based on the range of red color pixel values
* np.array: to define the range of red color using the Numpy library for better calculation
* findContours: to find an outline of the object
* bitwise_and: to black out everything except the detected area
import numpy as np
import matplotlib.pyplot as plt
# imread for loading an image
img = cv2.imread(mailbox_image)
# cvtColor for color conversion
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# define range of red color in hsv
lower_red1 = np.array([0, 50, 50])
upper_red1 = np.array([10, 255, 255])
lower_red2 = np.array([170, 50, 50])
upper_red2 = np.array([180, 255, 255])
# threshold the hsv image to get only red colors
mask1 = cv2.inRange(hsv, lower_red1, upper_red1)
mask2 = cv2.inRange(hsv, lower_red2, upper_red2)
Figure 42: Masked image
The red mailbox that we wanted to find in this example is the only thing left in the image after applying the opencv functions. You can try other images with different colors to detect objects of different shapes using findContours and inRange from opencv.
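Red needs two hue bands because OpenCV stores the hue of 8-bit images on a scale from 0 to 179, and red sits at both ends of that scale. The following sketch reproduces the thresholding step with plain NumPy so that it can be tried without an image file. The `in_range` helper is a stand-in written for this illustration, not an OpenCV function, and the four-pixel image is made up.

```python
import numpy as np


def in_range(hsv, lower, upper):
    """NumPy stand-in for cv2.inRange: 255 where every channel is within bounds."""
    hsv = np.asarray(hsv)
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return inside.astype(np.uint8) * 255


# a 1x4 HSV "image": red near hue 0, green, blue, red near hue 180
hsv = np.array([[[5, 200, 200],
                 [60, 200, 200],
                 [120, 200, 200],
                 [175, 200, 200]]], dtype=np.uint8)

mask1 = in_range(hsv, np.array([0, 50, 50]), np.array([10, 255, 255]))
mask2 = in_range(hsv, np.array([170, 50, 50]), np.array([180, 255, 255]))
mask = mask1 | mask2  # a pixel counts as red if it falls in either hue band
print(mask)
```

Only the first and last pixel survive the combined mask. To detect another color, a single band around its hue is usually enough.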
For more information, see the following useful links.
contours features: http://docs.opencv.org/3.1.0/dd/d49/tutorial_py_contour_features.html
contours: http://docs.opencv.org/3.1.0/d4/d73/tutorial_py_contours_begin.html
mask = mask1 + mask2
# find a red color mailbox from the image
# findContours returns three values in OpenCV 3 and two in OpenCV 4;
# taking the last two works with both
contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]
# bitwise_and to remove other areas in the image except the detected object
res = cv2.bitwise_and(img, img, mask=mask)
# turn off the x, y axis bar
plt.axis("off")
# text for the masked image
cv2.putText(res, "masked image", (20, 300), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255))
# display
plt.imshow(cv2.cvtColor(res, cv2.COLOR_BGR2RGB))
plt.show()
red color in hsv: http://stackoverflow.com/questions/30331944/finding-red-color-using-python-opencv
inrange: http://docs.opencv.org/master/da/d97/tutorial_threshold_inRange.html
inrange: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html
numpy: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.html
11.2.0.7 Human and Face Detection in OpenCV
11.2.0.7.1 INRIA Person Dataset
We use the INRIA Person dataset to detect upright people and faces in images in this example. Let us download it first.
100  969M  100  969M    0     0  8480k      0  0:01:57  0:01:57 --:--:-- 12.4M
11.2.0.7.2 Face Detection using Haar Cascades
This section is prepared based on the opencv-python tutorial: http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
There is a pre-trained classifier for face detection; download it from here:
100  908k  100  908k    0     0  2225k      0 --:--:-- --:--:-- --:--:-- 2259k
This classifier XML file will be used to detect faces in images. If you would like to create a new classifier, find out more information about training from here: http://docs.opencv.org/3.1.0/dc/d88/tutorial_traincascade.html
$ curl ftp://ftp.inrialpes.fr/pub/lear/douze/data/INRIAPerson.tar > INRIAPerson.tar
$ tar xvf INRIAPerson.tar > logfile && tail logfile
$ curl https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml > haarcascade_frontalface_default.xml
11.2.0.7.3 Face Detection Python Code Snippet
Now, we detect faces from the first five images using the classifier. See Figure 43, Figure 44, Figure 45, Figure 46, Figure 47, Figure 48, Figure 49, Figure 50, Figure 51, Figure 52, Figure 53
# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2
from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt
mypath = "INRIAPerson/Test/pos/"
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
onlyfiles = [join(mypath, f) for f in listdir(mypath) if isfile(join(mypath, f))]
cnt = 0
for filename in onlyfiles:
    image = cv2.imread(filename)
    image_grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(image_grayscale, 1.3, 5)
    if len(faces) == 0:
        continue
    cnt_faces = 1
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(image, "face" + str(cnt_faces), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        plt.figure()
        plt.axis("off")
        plt.imshow(cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2RGB))
        cnt_faces += 1
    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    cnt = cnt + 1
    if cnt == 5:
        break
Figure 43: Example
Figure 44: Example
Figure 45: Example
Figure 46: Example
Figure 47: Example
Figure 48: Example
Figure 49: Example
Figure 50: Example
Figure 51: Example
Figure 52: Example
Figure 53: Example
11.2.0.8 Pedestrian Detection using HOG Descriptor
We will use Histogram of Oriented Gradients (HOG) to detect an upright person in images. See Figure 54, Figure 55, Figure 56, Figure 57, Figure 58, Figure 59, Figure 60, Figure 61, Figure 62, Figure 63
11.2.0.8.1 Python Code Snippet
# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
cnt = 0
for filename in onlyfiles:
    img = cv2.imread(filename)
    orig = img.copy()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # detect people in the image
    (rects, weights) = hog.detectMultiScale(img, winStride=(8, 8),
                                            padding=(16, 16), scale=1.05)
    # draw the final bounding boxes
    for (x, y, w, h) in rects:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    cnt = cnt + 1
    if cnt == 5:
        break
Figure 54: Example
Figure 55: Example
Figure 56: Example
Figure 57: Example
Figure 58: Example
Figure 59: Example
Figure 60: Example
Figure 61: Example
Figure 62: Example
Figure 63: Example
11.2.0.9 Processing by Apache Spark
The INRIA Person dataset provides 100+ images and Spark can be used for image processing in parallel. We load 288 images from the “Test/pos” directory.
Spark provides a special object ‘sc’ to connect a spark cluster with functions in python code. Therefore, we can run python functions in parallel to detect objects in this example.
- The map function is used to process pedestrian and face detection per image from the parallelize() function of the ‘sc’ spark context.
- The collect function merges the results into an array.
def apply_batch(imagePath):
    import cv2
    import numpy as np
    # initialize the HOG descriptor/person detector
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    image = cv2.imread(imagePath)
    # detect people in the image
    (rects, weights) = hog.detectMultiScale(image, winStride=(8, 8), padding=(16, 16), scale=1.05)
    # draw the final bounding boxes
    for (x, y, w, h) in rects:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return image
11.2.0.9.1 Parallelize in Spark Context
The list of image files is given to parallelize.
11.2.0.9.2 Map Function (apply_batch)
The ‘apply_batch’ function that we created previously is given to the map function to be processed in a spark cluster.
11.2.0.9.3 Collect Function
The result of each map process is merged into an array.
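The same three-step pattern (hand over a list, apply a function to each element, gather the results) can be tried without a cluster. The sketch below is a local analogue built on Python's concurrent.futures and is not Spark code. The `apply_to_item` function and the file names are placeholders made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_item(path):
    """Placeholder per-item work; the chapter's function runs a detector instead."""
    return path.upper()


# made-up file names standing in for the list of image files
paths = ['a.png', 'b.png', 'c.png']

with ThreadPoolExecutor(max_workers=2) as executor:
    # map applies the function to each item; results keep the input order
    results = list(executor.map(apply_to_item, paths))

print(results)
```

As in the Spark version, the function passed to map has to be self-contained, because each call may run on a different worker.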
11.2.0.10 Results for 100+ images by Spark Cluster
pd = sc.parallelize(onlyfiles)
pdc = pd.map(apply_batch)
result = pdc.collect()
for image in result:
    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
![Page 239: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/239.jpg)
12REFERENCES
☁�
[1]L.Richardson,“Beautifulsouppythonpackageoverview.”WebPage,2019[Online].Available:https://www.crummy.com/software/BeautifulSoup/bs4/doc/
[2]C.WODEHOUSE,“Shouldyouusemongodb?Alookattheleadingnosqldatabase.” Web Page, 2018 [Online]. Available:https://www.upwork.com/hiring/data/should-you-use-mongodb-a-look-at-the-leading-nosql-database/
[3]Guru99, “Introduction tomongodb.”WebPage, 2018 [Online].Available:https://www.guru99.com/mongodb-tutorials.html#1
[4] MongoDB, “Https://www.mongodb.com/.” Web Page, 2018 [Online].Available:https://docs.mongodb.com/manual/introduction/
[5]M.Papiernik,“Howto installmongodbonubuntu18.04.”WebPage,Jun-2018 [Online]. Available:https://www.digitalocean.com/community/tutorials/how-to-install-mongodb-on-ubuntu-18-04
[6]J.Ellingwood,“Initialserversetupwithubuntu18.04.”WebPage,Apr-2018[Online]. Available: https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-18-04
[7]MongoDB,Databasesandcollections,4.0ed.NewYork,NewYork,USA:MongoDB Inc, 2008 [Online]. Available:https://docs.mongodb.com/manual/core/databases-and-collections/
[8]J.M.CraigBuckler,“Usingjoinsinmongodbnosqldatabases.”WebPage,Sep-2016 [Online]. Available: https://www.sitepoint.com/using-joins-in-mongodb-nosql-databases/
[9] MongoDB, Lookup (aggregation), 3.2 ed. New York City, New York,United States: MongoDB Inc, 2008 [Online]. Available:
![Page 240: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/240.jpg)
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
[10]MongoDB,MongoDB package components - mongoexport, 4.0 ed. NewYorkCity,NewYork,UnitedStates:MongoDBInc,2008[Online].Available:https://docs.mongodb.com/manual/reference/program/mongoexport/
[11]MongoDB, Security, 4.0 ed. New York City, New York, United States:MongoDB Inc, 2008 [Online]. Available:https://docs.mongodb.com/manual/security/
[12] MongoDB, “MongoDB atlas.” Web Page, 2018 [Online]. Available:https://www.mongodb.com/cloud/atlas
[13]I.MongoDB,“PyMongo3.7.1documentation.”WebPage,2008[Online].Available:https://api.mongodb.com/python/current/api
[14]A. J. J. Davis, “Announcing pymongo3.”Web Page,Apr-2015 [Online].Available:https://emptysqua.re/blog/announcing-pymongo-3/
[15] M. Dirolf, “PyMongo.” Web Page, Jul-2018 [Online]. Available:https://github.com/mongodb/mongo-python-driver
[16] N. Leite, “MongoDB and python.” Web Page, Mar-2015 [Online].Available:https://www.slideshare.net/NorbertoLeite/mongodb-and-python
[17]V.Oleynik, “How do you usemongodbwith python?”Web Page,Mar-2017 [Online]. Available: https://gearheart.io/blog/how-do-you-use-mongodb-with-python/
[18] I. MongoDB, “Installing / upgrading.” Web pages, 2008 [Online].Available:http://api.mongodb.com/python/current/installation.html
[19] R. Python, “Introduction to mongodb and python.” Web Page, 2016[Online]. Available: https://realpython.com/introduction-to-mongodb-and-python/
[20]W3Schools,“Pythonmongodbcreatedatabase.”WebPage,1999[Online].Available:https://www.w3schools.com/python/python_mongodb_create_db.asp
![Page 241: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/241.jpg)
[21]I.MongoDB,“PyMongo3.7.1documentation.”WebPage,2008[Online].Available:https://api.mongodb.com/python/current/tutorial.html
[22] N. O’Higgins, PyMongo & python. O’Reilly, 2011 [Online]. Available:http://img105.job1001.com/upload/adminnew/2015-04-07/1428393873-MHKX3LN.pdf
[23]I.MongoDB,“PyMongo3.7.1documentation.”WebPage,2008[Online].Available:https://api.mongodb.com/python/current/examples/aggregation.html
[24] MongoDB, “PyMongo 3.7.2 documenation.” Web Page, 2008 [Online].Available: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
[25] MongoDB, “PyMongo 3.7.2 documenation.” Web Page, 2008 [Online].Available:https://docs.mongodb.com/manual/core/map-reduce/
[26] MongoDB, “PyMongo v2.0 documentation.” Web Page, 2008 [Online].Available:https://api.mongodb.com/python/2.0/examples/map_reduce.html
[27] MongoDB, “PyMongo 3.7.2 documenation.” Web Page, 2008 [Online].Available:https://api.mongodb.com/python/current/examples/copydb.html
[28] MongoEngine, “MongoEngine user documentation.” Web Page, 2009[Online].Available:http://docs.mongoengine.org/
[29] Wikipedia, “Object-relational mapping.” Web Page, May-2009 [Online].Available:https://en.wikipedia.org/wiki/Object-relational_mapping
[30] MongoDB, “Flask-mongoengine.” Web Page, 2008 [Online]. Available:http://docs.mongoengine.org/guide/defining-documents.html
[31] MongoEngine, “User guide: Document instances.” Web Page, 2009[Online]. Available: http://docs.mongoengine.org/guide/document-instances.html
[32] MongoEngine, “2.1 installing mongoengine.” Web Page, 2009 [Online].Available:http://docs.mongoengine.org/guide/installing.html
![Page 242: Introduction to Pythondsc.soic.indiana.edu/publications/CloudComputing-Python.pdfINTRODUCTION TO PYTHON 1 PREFACE 1.1 Disclaimer ☁ÿÿ 1.1.1 Acknowledgment 1.1.2 Extensions 2 INTRODUCTION](https://reader034.vdocuments.site/reader034/viewer/2022042223/5ec9de1bbb8ca67fb446593a/html5/thumbnails/242.jpg)
[33]MongoEngine, “2.2 connection to mongodb.”Web Page, 2009 [Online].Available:http://docs.mongoengine.org/guide/connecting.html
[34]MongoEngine,“Userguide2.5.Querying thedatabase.”WebPage,2009[Online].Available:http://docs.mongoengine.org/guide/querying.html
[35]Wikipedia,“Flask(webframework).”WebPage,2010[Online].Available:https://en.wikipedia.org/wiki/Flask_(web_framework)
[36] MongoDB, “Flask-pymongo.” Web Page, 2008 [Online]. Available:https://flask-pymongo.readthedocs.io/en/latest/
[37]MongoDB,“Flaskmongoalchemy.”WebPage,2008 [Online].Available:https://pythonhosted.org/Flask-MongoAlchemy/
[38] MongoDB, “Flask-mongoengine.” Web Page, 2008 [Online]. Available:http://docs.mongoengine.org/projects/flask-mongoengine/en/latest/
[39] Wikipedia, “Flask (web framework).” Web Page, Oct-2018 [Online].Available:https://en.wikipedia.org/wiki/Flask_(web_framework)