Cleaning then using Wolfram Language Data · 25/07/2017


Cleaning then using Wolfram Language Data

The scope of Wolfram Language data is difficult to comprehend. “Massive” does not do it justice.

However, I have run into multiple obstacles over the years with respect to curation, completeness, and the error-handling facilities of the Mathematica run-time.

This lesson shows how to overcome some of the obstacles I encountered and provides additional information that I could not find in the documentation.

We will mainly be working with Missing[], the expression the Wolfram Language uses to mark data that is not available. The following web page has some very good information on the general subject of handling missing data in the Wolfram Language:

How to | Replace or Remove Invalid or Missing Data

http://reference.wolfram.com/language/howto/ReplaceOrRemoveInvalidOrMissingData.html.en

I started out thinking that this information was already out there. I immediately ran into some issues, and when I applied the “fix” to my original problem, I decided that it would be difficult for a typical engineer or scientist who is not a programmer to resolve.

First let’s look at the examples described on the web page above.

In[57]:= gdps = Map[CountryData[#, "GDP"] &, CountryData[All]];

In[58]:= Max[gdps]

Max: Comparison of $1.7419 × 10^13 per year and -∞ is invalid.

Out[58]= Max[Missing[NotAvailable], $1.7419 × 10^13 per year]

When the how-to article was written, the output looked like this. When run against V11.1, there is a slightly different result. Maybe CountryData[] has been updated.
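A minimal sketch of the underlying fix, using a small hand-made list in place of the CountryData[] results (the name sampleGdps and its numbers are illustrative placeholders, not real GDP figures). Removing the Missing[] elements before calling Max avoids the invalid comparison:

```wolfram
(* Illustrative stand-in for gdps; the values are placeholders, not real data *)
sampleGdps = {4.6*^12, Missing["NotAvailable"], 1.5*^6};

(* Remove every top-level element whose head is Missing *)
cleanGdps = DeleteCases[sampleGdps, _Missing];

(* Max now only sees numbers *)
Max[cleanGdps]
```

In Version 10 and later, DeleteMissing[sampleGdps] should do the same job as the DeleteCases[] call.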

The article has several examples that use DeleteCases[] to remove missing items. Here is a simple example which runs faster because it chooses 20 countries at random, and which lists each country name with its GDP.

Of course, the selection is random, so when you run this command it will list different countries.


In[59]:= RandomEntity["Country", 20]

Out[59]= {Tanzania, China, Lebanon, Montenegro, Solomon Islands,
Albania, Cambodia, Cocos Keeling Islands, Antigua and Barbuda,
Oman, Saint Vincent and the Grenadines, Vatican City, Bulgaria,
Sudan, Norfolk Island, Tonga, Mali, Haiti, Guernsey, Uzbekistan}

Next, list each country name together with its GDP:

In[62]:= {#, CountryData[#, "GDP"]} & /@ RandomEntity["Country", 20]

Out[62]= {{Japan, $4.60146 × 10^12 per year}, {Cameroon, $3.20508 × 10^10 per year},
{Ethiopia, $5.56122 × 10^10 per year}, {Macau, $5.55017 × 10^10 per year},
{Malawi, $4.25803 × 10^9 per year}, {Kenya, $6.09365 × 10^10 per year},
{Tokelau, $1.5 × 10^6 per year}, {Australia, $1.45468 × 10^12 per year},
{Cyprus, $2.32262 × 10^10 per year}, {Hong Kong, $2.90896 × 10^11 per year},
{Grenada, $9.11804 × 10^8 per year}, {Fiji, $4.53182 × 10^9 per year},
{Latvia, $3.12868 × 10^10 per year}, {Vatican City, Missing[NotAvailable]},
{Norway, $4.99817 × 10^11 per year}, {India, $2.04852 × 10^12 per year},
{Niue, $1.001 × 10^7 per year}, {Chad, $1.39222 × 10^10 per year},
{Monaco, $6.07488 × 10^9 per year}, {Iceland, $1.70361 × 10^10 per year}}

In[63]:= countriesList = %

In this list, only one country has missing data. After making several mistakes with DeleteCases[], I came up with the following command.

2 disectingAnatomy_25-July-2017a.nb


In[64]:= DeleteCases[countriesList, {_Entity, _Missing}]

Out[64]= {{Japan, $4.60146 × 10^12 per year},
{Cameroon, $3.20508 × 10^10 per year}, {Ethiopia, $5.56122 × 10^10 per year},
{Macau, $5.55017 × 10^10 per year}, {Malawi, $4.25803 × 10^9 per year},
{Kenya, $6.09365 × 10^10 per year}, {Tokelau, $1.5 × 10^6 per year},
{Australia, $1.45468 × 10^12 per year}, {Cyprus, $2.32262 × 10^10 per year},
{Hong Kong, $2.90896 × 10^11 per year}, {Grenada, $9.11804 × 10^8 per year},
{Fiji, $4.53182 × 10^9 per year}, {Latvia, $3.12868 × 10^10 per year},
{Norway, $4.99817 × 10^11 per year}, {India, $2.04852 × 10^12 per year},
{Niue, $1.001 × 10^7 per year}, {Chad, $1.39222 × 10^10 per year},
{Monaco, $6.07488 × 10^9 per year}, {Iceland, $1.70361 × 10^10 per year}}

With DeleteCases[], I frequently have problems getting the patterns to reflect my intent the first time. I can imagine that this is challenging for a non-programmer.
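The pitfall is easier to see on a small hand-made list. The sketch below is only an illustration: it uses strings in place of Entity objects and placeholder numbers (the names pairs, unchanged, cleaned and cleanedAlt are not from the notebook), so it does not need the curated data. By default DeleteCases[] tests the pattern against the top-level elements only. Here those are pairs, so a bare _Missing removes nothing; the pattern has to describe the whole pair.

```wolfram
(* Hypothetical stand-in for countriesList: strings replace Entity objects *)
pairs = {{"Japan", 4.6*^12},
         {"Vatican City", Missing["NotAvailable"]},
         {"Tokelau", 1.5*^6}};

(* Each top-level element is a pair, not a Missing, so nothing is removed *)
unchanged = DeleteCases[pairs, _Missing];

(* Match a pair whose second part has head Missing *)
cleaned = DeleteCases[pairs, {_, _Missing}];

(* Alternative: keep only the pairs that contain no Missing at any level *)
cleanedAlt = Select[pairs, FreeQ[#, _Missing] &];
```

Because the stand-in uses strings, the first slot of the pattern is a plain blank (_) rather than _Entity. The Select[] form may be easier to read for someone who is not comfortable with patterns, since it states the condition for keeping a row rather than the shape of the rows to delete.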

Let’s move on to the motivating example. I was reading and testing this blog entry:

Dissecting the New Anatomy Content in the Wolfram Language

http://blog.wolfram.com/2015/11/11/dissecting-the-new-anatomy-content-in-the-wolfram-language/

Near the beginning of the article are these 2 simple commands:

legMuscles = leg (anatomical structure) muscles

Out[65]= {popliteus, tibialis posterior, extensor digitorum longus, extensor hallucis longus,
fibularis longus, fibularis brevis, fibularis tertius, flexor hallucis longus,
tibialis anterior, gastrocnemius, plantaris, soleus, flexor digitorum longus}


legMuscles3D = Show@EntityValue[legMuscles, "Graphics3D"]

Out[66]= (3D graphic of the leg muscles)

It lists the muscles of the leg and displays them in a 3D graphic. Very dramatic when rotated. So I tried to accomplish an analogous task for the arm, to no avail:

armMuscles = arm (anatomical structure) muscles

Out[67]= {brachialis, coracobrachialis, anconeus, articularis cubiti, biceps brachii, triceps brachii}


armMuscles3D = Show@EntityValue[armMuscles, "Graphics3D"]

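Show: Could not combine the graphics objects in Show[{(3D graphic), (3D graphic), (3D graphic), Missing[NotAvailable], (3D graphic), (3D graphic)}].

Out[68]= Show[{(3D graphic), (3D graphic), (3D graphic), Missing[NotAvailable], (3D graphic), (3D graphic)}]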

It is (and was) not obvious to me what the problem is. Mousing over the red boxes yields information, indecipherable to me, about Specularity, GraphicsComplex and Skeletons. In contrast, there are some pictures inside the Show[] output. These might be muscles.

So the approach I decided to take is to ignore the messages and work on removing the missing entry. Remember, this is data that is missing inside the Wolfram Language itself. Maybe such issues can be reported to the curator; with WL, I am not sure how to do this.

In an analogous manner to the country list above, we make a short display showing each muscle next to its 3D graphic. That should tell us which entry has a missing graphic. Note that this is the same approach used above to find the countries with missing data.


In[69]:= {#, EntityValue[#, "Graphics3D"]} & /@ armMuscles

Out[69]= {{brachialis, (3D graphic)}, {coracobrachialis, (3D graphic)},
{anconeus, (3D graphic)}, {articularis cubiti, Missing[NotAvailable]},
{biceps brachii, (3D graphic)}, {triceps brachii, (3D graphic)}}

Indeed, the fourth entry (i.e. articularis cubiti) has a missing graphic. There are multiple ways to remove this entry. Since it is a small list, using Drop is quick and dirty:

In[70]:= armMuscles = Drop[armMuscles, {4, 4}]

Out[70]= {brachialis, coracobrachialis, anconeus, biceps brachii, triceps brachii}

Now that the 4th entry is removed, the command runs successfully:


armMuscles3D = Show@EntityValue[armMuscles, "Graphics3D"]

Out[71]= (3D graphic of the arm muscles)

Subsequently this was also run for:

back

foot

abdomen

chest

Some of these work fine and some fail because of a missing graphic. Therefore the solution using Drop[] will not handle the general case.
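One way to avoid hard-coding a position such as 4 is to locate the missing values programmatically. The following is a sketch only: strings stand in for the muscle entities and for their graphics, and the names names, values, missingPositions, missingNames and availableNames are made up for the illustration.

```wolfram
(* Hypothetical stand-ins: strings replace the muscle entities and their graphics *)
names = {"brachialis", "coracobrachialis", "anconeus",
         "articularis cubiti", "biceps brachii", "triceps brachii"};
values = {"g1", "g2", "g3", Missing["NotAvailable"], "g5", "g6"};

(* Positions in values that hold a Missing expression *)
missingPositions = Position[values, _Missing];

(* Names whose value is missing *)
missingNames = Pick[names, MissingQ /@ values];

(* Names whose value is present: pick where the test gives False *)
availableNames = Pick[names, MissingQ /@ values, False];
```

With real data, values would come from EntityValue[armMuscles, "Graphics3D"], and availableNames would play the role of the shortened armMuscles list without anyone having to count entries by eye.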

We ended up using DeleteCases[] wrapped around the EntityValue[] invocation, with Show[] still on the outside. So the general command looks like:
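As a hedged sketch, the pattern described here can be wrapped in two small helpers. The names dropMissing and showAvailable are hypothetical and do not appear in the original notebook; muscles is assumed to be a list of anatomical-structure entities such as armMuscles.

```wolfram
(* dropMissing and showAvailable are hypothetical helper names *)

(* Remove every top-level element whose head is Missing *)
dropMissing[values_List] := DeleteCases[values, _Missing]

(* Fetch the 3D graphic of each entity, drop the missing ones, combine the rest *)
showAvailable[muscles_List] :=
  Show[dropMissing[EntityValue[muscles, "Graphics3D"]]]
```

Under these assumptions, showAvailable[armMuscles] would be equivalent to Show@DeleteCases[EntityValue[armMuscles, "Graphics3D"], _Missing], and it needs access to the curated anatomy data to return a graphic.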

What follows is the correct invocation for these anatomical areas.


In[72]:= Show@EntityValue[foot (anatomical structure) muscles, "Graphics3D"]

Out[72]= (3D graphic of the foot muscles)

In[73]:= Show@DeleteCases[EntityValue[chest (anatomical structure) muscles, "Graphics3D"], _Missing]

Out[73]= (3D graphic of the chest muscles)


In[74]:= Show@DeleteCases[EntityValue[back (anatomical structure) muscles, "Graphics3D"], _Missing]

Out[74]= (3D graphic of the back muscles)

In[75]:= Show@DeleteCases[EntityValue[abdomen (anatomical structure) muscles, "Graphics3D"], _Missing]

Out[75]= (3D graphic of the abdomen muscles)
