GCDkit - cas.cz


Menu
GCDkit
GCDkit I.
Loads ASCII file or imports clipboard
(e.g. data copied from Excel)
•
Data are separated by tabulators,
commas or semicolons
•
The 1st line contains unique labels
for the data columns (e.g. ‘SiO2’,
‘Fe2O3’, ‘Rb’, ‘Nd’),
the 1st column unique sample IDs
•
Decimal commas are converted to
decimal points if appropriate
•
Missing values are allowed
anywhere in the data file; values ≤ 0
and any of ‘NA’, ‘N.A.’, ‘-’,
‘b.d.’, ‘bd’ are also interpreted
as missing
GCDkit I.
•
Total iron as ferrous oxide:
‘FeOt’ or ‘FeO*’
•
Structurally bound water:
‘H2O.PLUS’, ‘H2O+’,
‘H2OPLUS’ or ‘H2O_PLUS’
•
Column ‘Symbol’ (if any): plotting
symbols (as codes or single
characters)
•
A column whose name starts with
‘Col’ (if any): code for colour of
the symbols
•
Avoid special symbols in the
column names, and accented
characters throughout the file!
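The file conventions above can be imitated in plain R. This is only a sketch under stated assumptions, not GCDkit's own loader: the inline data and the names `raw` and `dat` are made up for illustration.

```r
# Hypothetical tab-delimited input: the first line holds the column labels,
# the first column holds the unique sample IDs.
raw <- "Sample\tSiO2\tFeOt\tRb\nBl-1\t65.2\t3.1\tb.d.\nBl-2\t71.0\tNA\t150\nKoz-1\t-1\t5.6\t0"

dat <- read.table(text = raw, header = TRUE, sep = "\t", row.names = 1,
                  na.strings = c("NA", "N.A.", "-", "b.d.", "bd"),
                  check.names = FALSE)

# Values <= 0 are treated as missing as well.
dat[] <- lapply(dat, function(x) replace(x, !is.na(x) & x <= 0, NA))
print(dat)
```

Here `na.strings` only matches whole fields, so a negative number such as `-1` is read as a number first and turned into a missing value in the second step.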
GCDkit I.
Appends new samples (= new rows) to
the data in memory.
•
The structures of both data files are,
as much as possible, matched.
•
If necessary, empty columns are
introduced to either of the data sets.
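The matching of two data sets with different columns might be sketched in R as follows. The helper `bind_samples` and the example data are hypothetical and only illustrate the idea of padding with empty columns before the rows are appended.

```r
# Append the rows of b to a; columns missing in either set are added as NA.
bind_samples <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (nm in setdiff(all_cols, names(a))) a[[nm]] <- NA
  for (nm in setdiff(all_cols, names(b))) b[[nm]] <- NA
  rbind(a[all_cols], b[all_cols])
}

file1 <- data.frame(SiO2 = c(50, 60), Rb = c(10, 20), row.names = c("A1", "A2"))
file2 <- data.frame(SiO2 = 70, Sr = 300, row.names = "B1")
merged <- bind_samples(file1, file2)
print(merged)
```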
GCDkit I.
Adds new data (columns) to the
samples stored in the memory.
•
No new samples are introduced that
would occur solely in one of the
files.
GCDkit I.
Saves the modified data set stored in
memory under a specified filename.
•
The data can be retrieved again into
GCDkit using the ‘Load data
file’ command.
GCDkit I.
Information about the current dataset:
•
levels and frequencies for each of
the labels,
•
list + no. of numeric columns,
•
for each of the numeric variables no.
of available values,
•
total no. of samples,
•
list of samples in the selected subset
(or all samples if none is defined),
•
current grouping information.
GCDkit I.
Prints a cross table (contingency table)
for 1-3 labels and plots
corresponding barplots.
Contingency tables
An example of a contingency
table involving two labels
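A contingency table for two labels can be produced in base R with `table()`. The sketch below uses invented labels (`Intrusion`, `Rock`) and invented samples; it shows the general technique, not the GCDkit menu function itself.

```r
# Hypothetical textual labels for five samples
labels <- data.frame(
  Intrusion = c("Rhum", "Rhum", "Skye", "Skye", "Skye"),
  Rock      = c("granite", "gabbro", "granite", "granite", "gabbro")
)

# Cross table: rows = levels of Intrusion, columns = levels of Rock
tab <- table(labels$Intrusion, labels$Rock)
print(tab)
```

Passing `tab` to `barplot()` would draw the corresponding barplot.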
GCDkit I.
Restricts the textual output to an
absolute minimum
(which is useful for large data files)
GCDkit II.
Data
handling
Intermezzo 1:
Specifying a variable in GCDkit
1.
Enter complete name of
a variable (e.g., ‘SiO2’)
2.
Type only part of the variable
name. If the result is ambiguous,
the desired variable has to be
selected from the list of the
multiple matches by mouse
(applies also for empty patterns)
3.
Specify the variable sequence
number (2 for the second one).
4.
Often a formula can be entered
instead; it is then interpreted and
evaluated by the calculation
core.
Intermezzo 2:
Formulae & calculation core
A formula can involve any combination of names of existing numerical columns,
constants, brackets, arithmetic operators +-*/^ and R functions:
sqrt             square root
log, log10       natural/common logarithm
exp              exponential function
sin, cos, tan    trigonometric functions
min              minimum
max              maximum
length           number of elements/cases
sum              sum of the elements
mean             mean of the elements
prod             product of the elements
Examples of valid formulae:
•
(Na2O+K2O)/CaO
•
Rb^2
•
log10(Sr)
•
mean(SiO2)/10
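How such formulae could be evaluated against the data columns can be sketched in R. The function `calc` and the data frame `WR` are assumptions made for this illustration; the actual calculation core of GCDkit may work differently.

```r
# Evaluate a formula given as text, using the columns of 'data' as variables.
calc <- function(formula, data) eval(parse(text = formula), envir = data)

# Hypothetical whole-rock data for two samples
WR <- data.frame(SiO2 = c(50, 70), Na2O = c(3, 4), K2O = c(1, 5),
                 CaO = c(8, 2), Rb = c(10, 200), Sr = c(100, 1000))

calc("(Na2O+K2O)/CaO", WR)   # 0.5 4.5
calc("Rb^2", WR)             # 100 40000
calc("log10(Sr)", WR)        # 2 3
calc("mean(SiO2)/10", WR)    # 6
```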
Data handling I.
Displays a single numeric variable or
a result of a calculation
# Works as a simple R shell too!
•
summary(Rb,na.rm=T)
•
cbind(SiO2/2,TiO2,Na2O+K2O)
•
cbind(major)
•
hist(SiO2,col="red")
•
boxplot(Rb~factor(groups))
Intermezzo 3:
Specifying multiple variables
1.
List of column name(s), in full,
separated by commas
2.
Sequence numbers of variables
or their ranges (1,10:15)
3.
Name of a built-in list, such as
‘LILE’, ‘REE’, ‘major’ and
‘HFSE’ or their combinations
with the column names
4.
User-defined list = simple character vector. Currently only a single, standalone user-defined list can be employed as a search criterion
5.
For empty patterns, the correct name(s) has to be selected by mouse click(s)
(± Shift ± Ctrl) from the list of the available variables
Intermezzo 3:
Specifying multiple variables - examples
1.
Search pattern = major
SiO2, TiO2, Al2O3, Fe2O3, FeO, MnO, MgO, CaO, Na2O, K2O, P2O5
2.
Search pattern = LILE
Rb, Sr, Ba, K, Cs, Li
3.
Search pattern = HFSE
Nb, Zr, Hf, Ti, Ta, La, Ce, Y, Ga, Sc, Th, U
4.
Search pattern = REE
La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu
5.
Search pattern = 1:5,7
Numeric data columns number 1, 2, ...5, 7
6.
# User-defined list
my.elems<-c("Rb","Sr","Ba")
Search pattern = my.elems
Rb, Sr, Ba
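The way a search pattern may be resolved into column names can be imitated with a few lines of R. The helper `select_vars` is hypothetical and covers only full names, sequence numbers or ranges, and named lists; the two lists are copied from the examples above.

```r
builtin <- list(
  LILE = c("Rb", "Sr", "Ba", "K", "Cs", "Li"),
  REE  = c("La", "Ce", "Pr", "Nd", "Sm", "Eu", "Gd", "Tb",
           "Dy", "Ho", "Er", "Tm", "Yb", "Lu")
)

# Resolve a comma-separated pattern against the available column names.
select_vars <- function(pattern, cols, lists = builtin) {
  items <- trimws(strsplit(pattern, ",")[[1]])
  out <- unlist(lapply(items, function(it) {
    if (grepl("^[0-9]+(:[0-9]+)?$", it)) {
      cols[eval(parse(text = it))]       # sequence number or range
    } else if (it %in% names(lists)) {
      lists[[it]]                        # named list of elements
    } else {
      it                                 # full column name
    }
  }))
  intersect(out, cols)                   # keep only columns that exist
}

cols <- c("SiO2", "TiO2", "Al2O3", "Rb", "Sr", "Ba", "La", "Ce")
select_vars("1:3,5", cols)       # "SiO2" "TiO2" "Al2O3" "Sr"
select_vars("LILE,SiO2", cols)   # "Rb" "Sr" "Ba" "SiO2"
```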
Intermezzo 4:
Searching and subsetting
1.
The search pattern is first tested
to see whether it can be interpreted
as a query of the sample name(s).
A comma-separated list of exact
sample names is allowed.
2.
Next, the pattern is assumed to
correspond to a selection of
sample sequence numbers.
3.
Lastly, the search pattern is
interpreted as a Boolean
condition.
4.
Entering an empty pattern usually
returns all the samples in the data
set.
Intermezzo 4:
Searching and subsetting - examples
1. By sample name
1.
Search pattern = oz
Samples with names Koz, KozD-5, Roz-5 …
2.
Search pattern = Bl-1,Bl-2,Koz-3
Samples with names Bl-1,Bl-2,Koz-3
3.
Regular expressions (advanced technique, see later)
Intermezzo 4:
Searching and subsetting - examples
2. By sample range
In this case the search pattern is treated as a selection of sample sequence numbers
(effectively a list separated by commas that may also contain ranges
expressed by colons).
1.
Search pattern = 1:5
# First to fifth samples in the data set
2.
Search pattern = 1,10
# First and tenth samples
3.
Search pattern = 1:5, 10:11, 25
# Samples number 1, 2, ...5, 10, 11, 25
Intermezzo 4:
Searching and subsetting - examples
3. By Boolean conditions
Patterns may employ variable names and the comparison operators
common in R:
<        lower than
>        greater than
<=       lower or equal to
>=       greater or equal to
=, ==    equal to
!=       not equal to
•
Character strings should be quoted.
•
The conditions can be combined
by logical and, or and brackets.
•
Logical and can be expressed as
‘.and.’ ‘.AND.’ ‘&’
•
Logical or can be expressed as
‘.or.’ ‘.OR.’ ‘|’
•
Regular expressions can be employed to
search in the textual labels
(advanced technique, see later).
Intermezzo 4:
Searching and subsetting - examples
3. By Boolean conditions
1.
Search pattern: Intrusion="Rhum"
# Finds all analyses from Rhum
2.
Search pattern: Intrusion="Rhum".and.SiO2>65
Search pattern: Intrusion="Rhum".AND.SiO2>65
Search pattern: Intrusion="Rhum"&SiO2>65
# All analyses from Rhum with silica greater than 65
# (all three expressions are equivalent)
3.
Search pattern: MgO>10&(Locality="Skye"|Locality="Islay")
# All analyses from Skye or Islay with MgO greater than 10
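One way to see how such patterns relate to ordinary R conditions is to translate them and evaluate the result. The helpers `to_r_condition` and `select_samples` and the data are invented for this sketch and do not reproduce GCDkit's parser.

```r
# Translate '.and.', '.or.' and a lone '=' into R operators.
to_r_condition <- function(pattern) {
  x <- gsub("\\.and\\.", "&", pattern, ignore.case = TRUE)
  x <- gsub("\\.or\\.", "|", x, ignore.case = TRUE)
  # A lone '=' becomes '=='; '<=', '>=', '!=' and '==' are left untouched.
  gsub("(?<![<>=!])=(?!=)", "==", x, perl = TRUE)
}

# Names of the samples (rows) that satisfy the condition
select_samples <- function(pattern, data) {
  keep <- eval(parse(text = to_r_condition(pattern)), envir = data)
  rownames(data)[which(keep)]
}

d <- data.frame(Intrusion = c("Rhum", "Rhum", "Skye"),
                SiO2 = c(70, 60, 72),
                row.names = c("S1", "S2", "S3"))

to_r_condition('Intrusion="Rhum".and.SiO2>65')   # Intrusion=="Rhum"&SiO2>65
select_samples('Intrusion="Rhum".and.SiO2>65', d) # "S1"
```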
Data handling I.
Displays specified combination of
numeric variable(s) and/or labels for
selected range of samples.
•
So far only names of existing
numeric data columns and not
formulae can be handled.
Data handling I.
Deletes a single numeric variable or
a label.
•
Some fields are mandatory and
cannot be removed.
Data handling I.
Appends an empty numeric data column
or new label to the current data set.
Data handling I.
Simultaneous editing of all labels for
individual samples using a
spreadsheet-like interface.
•
When the desired changes have
been made, click the close
button.
Data handling I.
Global replacement of selected discrete
values (levels) for a given label.
Data handling I.
Simultaneous editing of all numeric
data using a spreadsheet-like
interface.
Intermezzo 5:
Regular expressions
Many queries in GCDkit employ regular expressions, a powerful
searching mechanism that is more familiar to people working in Unix.
•
Most characters, including letters and digits, are regular expressions that
match themselves.
•
Dot ‘.’ matches any character.
•
Metacharacters with a special meaning
(‘?’ ‘+’ ‘{’ ‘}’ ‘|’ ‘(’ ‘)’)
must be preceded by a backslash.
•
Brackets can be used to group subexpressions.
Intermezzo 5:
Regular expressions
Metacharacter    Matches
.                Any character
x                Any instance of x
^                Beginning of the expression
$                End of the expression
[xy]             Any of the characters given in square brackets
[x-y]            Range of the characters given in square brackets
x|y              A logical OR operator (will match an instance of x or y)
Intermezzo 5:
Regular expressions
Repetition operator    The preceding item will be matched
?                      At most once (i.e. is optional)
*                      Zero or more times
+                      One or more times
{n}                    Exactly n times
{n,}                   At least n times
{n,m}                  At least n times, but not more than m times
Intermezzo 5:
Regular expressions - examples
# The searched list of localities is: Mull, Rhum, Skye, Coll, Colonsay, Hoy,
Westray, Sanday, Stronsay, Tiree, Islay
•
Search pattern = ol
Coll, Colonsay
•
Search pattern = n.a
Colonsay, Sanday, Stronsay
•
Search pattern = ^S
Skye, Sanday, Stronsay
•
Search pattern = e$
Skye, Tiree
•
Search pattern = [ds]ay
Colonsay, Sanday, Stronsay
•
Search pattern = [p-s]ay
Colonsay, Westray, Stronsay
Intermezzo 5:
Regular expressions - examples
# The searched list of localities is: Mull, Rhum, Skye, Coll, Colonsay, Hoy,
Westray, Sanday, Stronsay, Tiree, Islay
•
Search pattern = ol|oy
Coll, Colonsay, Hoy
•
Search pattern = l{2}
Mull, Coll
# Sample names are: Bl-1, Bl-3, Koz-1, Koz-2, Koz-5, Koz-11, KozD-1, Ri-1
•
Search pattern = oz-|Bl
Bl-1, Bl-3, Koz-1, Koz-2, Koz-5, Koz-11
•
Search pattern = oz-[1-3]
Koz-1, Koz-2, Koz-11
•
Search pattern = oz-1{1,}
Koz-1, Koz-11
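The patterns above can be tried out directly in R with `grep()`, which returns the matching elements when called with `value = TRUE`. The vectors repeat the localities and sample names listed in the examples.

```r
localities <- c("Mull", "Rhum", "Skye", "Coll", "Colonsay", "Hoy",
                "Westray", "Sanday", "Stronsay", "Tiree", "Islay")
samples <- c("Bl-1", "Bl-3", "Koz-1", "Koz-2", "Koz-5", "Koz-11",
             "KozD-1", "Ri-1")

grep("n.a", localities, value = TRUE)      # "Colonsay" "Sanday" "Stronsay"
grep("^S", localities, value = TRUE)       # "Skye" "Sanday" "Stronsay"
grep("[p-s]ay", localities, value = TRUE)  # "Colonsay" "Westray" "Stronsay"
grep("l{2}", localities, value = TRUE)     # "Mull" "Coll"
grep("oz-[1-3]", samples, value = TRUE)    # "Koz-1" "Koz-2" "Koz-11"
grep("oz-1{1,}", samples, value = TRUE)    # "Koz-1" "Koz-11"
```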
Data handling I.
Selecting subsets of the data stored in
memory by searching sample names
or levels of a single label.
•
Regular expressions are supported
Lokalita
Data handling I.
Selecting subsets of the data stored in
memory by their range.
1:5
Data handling I.
Selecting subsets of the current dataset
using Boolean conditions.
•
Both numeric fields and labels
(or combinations thereof) can be queried
•
Regular expressions can be
employed to search the labels
Suita="Ricany"
Data handling I.
Restores data for all samples in the
same form as they were loaded from
a data file.
Data handling II.
Grouping the data according to the
levels of a single label.
Suita
Data handling II.
Grouping the data according to the
interval a single numerical variable
falls into.
•
Enter a comma-delimited list of one
or more breakpoints defining the
intervals
•
The default breakpoint is the mean,
which is supplemented by 0 and the
maximum (i.e. two intervals)
•
The names of individual groups can
be specified
•
The vector containing the
information on the groups can be
appended to the labels.
Data handling II.
Example: variable SiO2, breakpoints 52,63,
group names Basic,Intermediate,Acid
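Grouping by intervals of a numeric variable corresponds to what `cut()` does in base R. The sketch below uses the breakpoints and group names from the example, with invented SiO2 values; treating the intervals as closed on the right is an assumption of `cut()`'s defaults, not a documented GCDkit behaviour.

```r
# Hypothetical silica contents (wt.%)
SiO2 <- c(48.5, 58.0, 66.3, 71.2)

# Breakpoints 52 and 63, supplemented by 0 and the maximum
groups <- cut(SiO2, breaks = c(0, 52, 63, max(SiO2)),
              labels = c("Basic", "Intermediate", "Acid"))
print(data.frame(SiO2, groups))
```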
Data handling II.
Grouping the data using selected
classification diagram.
•
The vector containing the
information on the current groups
can be appended to the labels.
Data handling II.
Grouping the data using
cluster analysis.
•
After the dendrogram is drawn, the
user is asked how many clusters the
dataset should be broken into.
•
The vector containing the
information on the current groups
can be appended to the labels.
•
The groups are initially numbered
but the names can be changed
readily using the function
Edit labels as factor.
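The same idea, a dendrogram cut into a requested number of numbered groups, can be sketched with base R functions `dist()`, `hclust()` and `cutree()`. The data matrix is invented, and the choice of standardised variables and default linkage is an assumption of this example.

```r
# Hypothetical data: two felsic-like and two mafic-like samples
x <- matrix(c(50, 51, 70, 71,
              8, 7.5, 1, 1.2),
            ncol = 2,
            dimnames = list(c("S1", "S2", "S3", "S4"), c("SiO2", "MgO")))

hc <- hclust(dist(scale(x)))   # hierarchical clustering on standardised data
groups <- cutree(hc, k = 2)    # break the dendrogram into two clusters
print(groups)
```

Calling `plot(hc)` would draw the dendrogram; the integer codes in `groups` can then be renamed as needed.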
Data handling II.
Enables merging several groups into
a single one.
•
The vector containing the
information on the current groups
can be appended to the labels.
Intermezzo 6:
Plotting symbols
Use codes from the table or
single characters such as
‘*’, ‘B’, ‘s’
Intermezzo 7:
Plotting colours
NB: so far, only numeric
codes can be used
to specify plotting
colours.
Data handling III.
Assigns plotting symbols and colours
simultaneously according to the
levels of the defined groups.
Data handling III.
Assign plotting symbols or colours
according to the levels of a single
label.
Data handling III.
Assign uniform plotting symbols or
colours to all the analyses in the
current data set.
Data handling III.
Displays a graphical legend(s) with
current assignment of plotting
symbols and colours used by most
of the diagrams.
•
If necessary, two legends are
created, for symbols and colours
separately.
Calculations
Calculations I.
Computes a single numeric variable and
appends it, under specified name, to
the numeric data in memory.
Example: formula SiO2/5, stored under the name My.param
Calculations I.
Adds a formula for a single numeric
variable to the specified R script
(‘*.r’ ).
•
The user is prompted for the
variable name and any comments
that should appear in the file.
•
The script can be executed later
using the R command
‘File|Source’. Alternatively, it
can be placed among the plugins
into the subdirectory ‘\Plugin’.
Calculations I.
Recasts the selected data to a fixed sum.
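Recasting to a fixed sum amounts to dividing each analysis by its row total and multiplying by the desired sum. A minimal R sketch, with a hypothetical helper `recast` and invented data:

```r
# Rescale every row of a numeric matrix so that it sums to 'total'.
recast <- function(x, total = 100) x / rowSums(x, na.rm = TRUE) * total

m <- matrix(c(50, 30, 10, 5), nrow = 2,
            dimnames = list(c("S1", "S2"), c("SiO2", "Al2O3")))
r <- recast(m)
print(r)
```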
Calculations I.
Calculates millications as used for many
plots of the French school, e.g. by
De la Roche et al. (1980) or Debon
& Le Fort (1983, 1988).
•
The calculated values are Si, Ti, Al,
Fe3, Fe2, Fe, Mn, Mg, Ca, Na, K, P.
$$\mathrm{Element}_i = \frac{\mathrm{Oxide}_i\ (\mathrm{wt.\%})}{MW(\mathrm{Oxide}_i)} \times n \times 1000$$
where $MW$ = molecular weight of $\mathrm{Oxide}_i$ and $n$ = number of atoms of
$\mathrm{Element}_i$ in the oxide formula.
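The millication calculation can be written out as a short R sketch. The function name `millications`, the rounded molecular weights and the example analysis are assumptions made for illustration only.

```r
# millications = wt.% / molecular weight * atoms per formula unit * 1000
millications <- function(wt, mw, n) wt / mw * n * 1000

mw <- c(SiO2 = 60.08, Al2O3 = 101.96, K2O = 94.20)  # rounded molecular weights
n  <- c(SiO2 = 1,     Al2O3 = 2,      K2O = 2)      # cations per oxide formula
wt <- c(SiO2 = 60.08, Al2O3 = 15,     K2O = 4.71)   # hypothetical analysis (wt.%)

res <- millications(wt, mw, n)
names(res) <- c("Si", "Al", "K")
print(round(res, 1))
```

With these inputs Si comes out as 1000 and K as 100, because the chosen weight percentages are simple multiples of the molecular weights.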
Calculations II. (stats)
Prints statistical summary for a single
variable and the current dataset
(or its part).
•
Formulae can be used as well.
•
The statistical summary involves no.
of observations, missing values,
mean, std. deviation, minimum,
25% quartile, median (= 50%
quartile), 75% quartile and
maximum.
•
The function also plots a summary
boxplot and a histogram.
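The quantities listed in such a summary can be computed with base R as sketched below. The function `stat_summary` is a hypothetical helper; the quartiles follow R's default `quantile()` definition, which is an assumption here.

```r
# Number of values, missing values, mean, std. deviation and five-number summary
stat_summary <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(n = sum(!is.na(x)), missing = sum(is.na(x)),
    mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE), q25 = q[1], median = q[2], q75 = q[3],
    max = max(x, na.rm = TRUE))
}

Rb <- c(1, 2, 3, 4, 5, NA)   # invented values with one missing entry
s <- stat_summary(Rb)
print(s)
```

A per-group version could be obtained by applying the same helper to the pieces returned by `split()`.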
Calculations II. (stats)
Like previous, but respecting grouping.
Calculations II. (stats)
Prints statistical summary for selected
list of elements (majors or traces,
respectively) and the current dataset
(or its part).
•
The statistical summary involves no.
of observations, missing values,
mean, std. deviation, minimum,
25% quartile, median (= 50%
quartile), 75% quartile and
maximum.
•
Optionally the function also plots
summary boxplots and/or
histograms.
Calculations II. (stats)
Like previous, but respecting grouping.
Calculations II. (stats)
Displays a binary diagram of two
elements/oxides, in which are
plotted averages for the individual
groups with whiskers corresponding
to the standard deviations.
Calculations II. (stats)
Plots a matrix of scatterplots in the
lower panel and one of the other predefined panel functions in the upper:
panel.corr      Prints correlations, with size proportional to their magnitude
panel.cov       Prints covariances
panel.smooth    Fits smooth trend lines
panel.hist      Plots histograms of frequencies
Calculations II. (stats)
Produces, for each group separately, a set
of plots of correlation coefficient
patterns (Rollinson 1993 and
references therein).
•
Similarity in patterns between two
or more elements indicates their
analogous geochemical behaviour,
potentially controlled by the same
geochemical process (fractional
crystallization, partial melting,
weathering, hydrothermal
alteration...)
Calculations II. (stats)
Performs principal components analysis
(scaled variables, covariance or
correlation matrix) and plots a biplot
(Gabriel, 1971).
•
The length of the individual arrows
is proportional to the relative
variation of each of the variables.
•
Comparable direction of two arrows
implies that both variables are
positively correlated; the opposite
one indicates a strong negative
correlation.
(Buccianti & Peccerillo, 1999)
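The relation between arrow directions and correlations can be illustrated with `prcomp()` from base R. The three variables below are invented: `Rb` and `K2O` increase together while `MgO` decreases, so the first two should load on the first component with the same sign and the third with the opposite sign.

```r
# Hypothetical data for five samples
x <- data.frame(Rb  = c(1, 2, 3, 4, 5),
                K2O = c(2.1, 3.9, 6.2, 8.1, 9.8),
                MgO = c(5, 4.2, 2.9, 2.1, 1))

pca <- prcomp(x, scale. = TRUE)     # PCA on the correlation matrix
load1 <- pca$rotation[, "PC1"]      # loadings on the first component
print(load1)
```

Calling `biplot(pca)` would draw the samples together with one arrow per variable.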
Calculations II. (stats)
Hierarchical cluster analysis on a set of
dissimilarities.
•
The user is asked to specify a label
for the individual samples, default
are their names.
•
After the dendrogram is drawn, the
individual clusters can be identified.
•
For each sample falling into the
given group, specified information
(e.g. Locality, Rock type and/or
Author) can be printed.
Calculations III.
•
Cationic parameters of Niggli (1948).
•
Various modifications of the CIPW
norm (Hutchison 1974, 1975).
•
Improved Mesonorm for granitoids
(Mielke & Winkler 1979). If desired,
the Q'-ANOR diagram of Streckeisen
& Le Maitre (1979) is plotted.
•
Niggli's Molecular Norm (Catanorm);
the algorithm is by Hutchison (1974).
Calculations III.
Least-squares approximations of the
mode given major-element
compositions of the rock and its
main mineral constituents.
•
Both unconstrained and constrained
solutions are produced (see
Albarede 1995 and the help file).
•
Mineral compositions are to be
provided in a separate tab-delimited
text file.
•
The output includes computed
modal proportions of the individual
minerals, the calculated composition
of the rock and residuals.
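The unconstrained part of such a calculation is an ordinary least-squares problem: find the mineral proportions whose mixture best reproduces the rock composition. A minimal R sketch follows, using idealised quartz and K-feldspar compositions and a synthetic rock; all names are hypothetical, and the constrained solution is not covered.

```r
# Mineral compositions (wt.%), one column per mineral
minerals <- cbind(Qtz = c(SiO2 = 100,   Al2O3 = 0,     K2O = 0),
                  Kfs = c(SiO2 = 64.76, Al2O3 = 18.32, K2O = 16.92))

# Synthetic rock made of 30 % quartz and 70 % K-feldspar
rock <- drop(minerals %*% c(0.3, 0.7))

props  <- qr.solve(minerals, rock)      # least-squares modal proportions
fitted <- drop(minerals %*% props)      # calculated rock composition
resid  <- rock - fitted                 # residuals
print(round(props, 3))
```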
Calculations III.
Recalculates whole-rock analyses to
Debon & Le Fort's (1983, 1988)
multicationic values.
Recasts whole-rock data into
R1-R2 values of De La Roche et al.
(1980).
Calculations III.
Various petrochemical indexes, such as:
•
Total iron as Fe2O3
•
Fe2O3/FeO, Na2O/K2O and
K2O/Na2O ratios
•
Differentiation index (Larsen 1938)
•
Solidification index (Kuno 1959)
Calculations III.
Saves the variable ‘results’,
returned by most calculation
algorithms, to a tab-delimited text
(ASCII) file.
Calculations III.
Appends the most recently calculated
values (variable ‘results’) to
the numeric data stored in memory,
e.g. for plotting or statistical
evaluation.
Calculations III.
Copies the most recently calculated
results to the clipboard.
Plugin Saturation.r
(Menu Calculations)
Zircon saturation: yields temperatures for the observed major-element and Zr
contents. Also returns Zr saturation levels for the given major-element
compositions and assumed magma temperature (Watson & Harrison, 1983).
Monazite saturation: computes saturation temperatures for given major-element
compositions and LREE contents of the magma (Montel, 1993).
Apatite saturation: calculates saturation temperatures for observed whole-rock
major-element compositions (including P2O5 contents). Also returns
phosphorus saturation levels for the given major-element compositions and
assumed magma temperature (Harrison & Watson, 1984; Bea et al., 1992;
Pichavant et al., 1992).
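For orientation, the zircon saturation thermometer of Watson & Harrison (1983) can be sketched in R. This is not the plugin's code: the function name is hypothetical, the parameter M = (Na + K + 2Ca)/(Al·Si) is assumed to be already computed from cation fractions, and 497644 ppm is taken as the Zr content of stoichiometric zircon.

```r
# ln(D_Zr) = -3.80 - 0.85 * (M - 1) + 12900 / T, with T in kelvin,
# D_Zr = Zr in zircon / Zr in melt (ppm)
zircon_sat_temp <- function(M, Zr) {
  DZr <- 497644 / Zr
  TK <- 12900 / (log(DZr) + 3.80 + 0.85 * (M - 1))
  TK - 273.15   # return degrees Celsius
}

# Hypothetical melt: M = 1.3, 150 ppm Zr
t1 <- zircon_sat_temp(M = 1.3, Zr = 150)
print(round(t1))
```

Higher Zr contents at the same M give higher saturation temperatures, as expected from the form of the equation.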
Plugin html_tables.r
(Menu Data handling)
(Menu Calculations)
Both functions output the specified data with (optional) labels into HTML. This
format is useful for importing into spreadsheets, word processors or publishing
on the WWW.
•
The plugin attempts to format sub- and superscripts in the variable names.
•
The created file ‘htmltable.html’ is placed in the subdirectory
‘\R2HTM’ of the main GCDkit directory; when finished, it is previewed in a
browser. The style of the table is determined by the cascading style sheet
R2HTML.css in the subdirectory ‘\Plugin’.
•
The plugin exploits R2HTML library by Eric Lecoutre, which must be
downloaded from CRAN and properly installed.