CISC3130, Spring 2013, X. Zhang
Working with files
2
Outline
Finish up with awk: pipeline, external commands
Commands working with files
  tree, ls (-d option, -1 option, -R, -a)
  od (octal dump), stat (show metadata of file)
  touch command, temporary file, file with random bytes
  File checksum, verification
  locate, type, which, find command: finding files
3
Some useful tips
Bash stores the command history
  Use the UP/DOWN arrow keys to browse it
  Use "history" to show past commands
Repeat a previous command
  !<command_no>, e.g., !239
  !<any prefix of a previous command>, e.g., !g++
Search for a command
  Type Ctrl-r and then a string; bash will search previous commands for a match
File name autocompletion: "tab" key
Output redirection: to pipeline
#!/bin/awk -f
BEGIN {
    FS = ":"
    ## generate a temporary file
    mkcmd = "mktemp /tmp/prog.XXXXXXXX"
    mkcmd | getline tmpfile
    print "temp file is: ", tmpfile
    close(mkcmd)    ## close() takes the exact command string that was opened
}
{   ## select username for users using bash
    if ($7 ~ "/bin/bash")
        print $1 >> tmpfile
}
END {
    close(tmpfile)  ## flush buffered output before reading the file back
    while ((getline < tmpfile) > 0)
    {
        ## send an email to every bash user
        cmd = "mail -s Fellow_BASH_USER " $0
        print "Hello, " $0 | cmd
        close(cmd)
    }
    close(tmpfile)
}
pipe_mail.awk
Todo: 1. 2.
Execute external command
Using the system function (similar to C/C++)
  E.g., system("rm -f tmp") to remove a file
  if (system("rm -f tmp") != 0) print "failed to rm tmp"
A shell is started to run the command line passed as argument
  It inherits the awk program's standard input/output/error
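A minimal sketch of the two mechanisms above (output piped to a command, and system() with its return value), assuming a POSIX-style awk. The sample records and the variable names sorted and status are made up for illustration.

```shell
# Pipe selected fields to an external command from inside awk.
sorted=$(printf 'carol:/bin/bash\nalice:/bin/bash\nbob:/bin/sh\n' |
awk -F: '
    $2 == "/bin/bash" { print $1 | "sort" }   # send matching names to sort
    END { close("sort") }                     # wait for sort to finish
')
echo "$sorted"

# system() runs a command line in a shell and returns its exit status.
status=$(awk 'BEGIN { rc = system("test -d /"); print rc }')
echo "system() returned $status"
```

Because the command started by awk inherits awk's standard output, the output of sort ends up in the same place as awk's own output.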
5
6
Outline
Finish up with awk: pipeline, external commands
Commands working with files
  tree, ls (-d option, -1 option, -R, -a)
  od (octal dump), stat (show metadata of file), cmp, diff
  touch command
  temporary file, file with random bytes
  locate, type, which, find command: finding files
7
What's in a file?
Files are organized in a hierarchical directory structure
  Each file has a name, resides under a directory, and is associated with some meta info (permission, owner, timestamps)
Disk files, virtual file system, device files
  Contents of a disk file: text (ASCII) file (such as your C/C++ source code), executable file (commands), a link to other files, …
    ln -s /path/to/file1.txt /path/to/file2.txt
  The /proc filesystem stores system configuration parameters and resides in the kernel's memory
    A numerical subdirectory exists for every process.
  A device file or special file is an interface for a device driver that appears in a file system as if it were an ordinary file
    For example, /dev/stdin, /dev/tty*
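A small sketch of the file kinds listed above, built in a scratch directory. The first character of each ls -l line is the file type (d directory, - plain file, l symbolic link). The file names reuse the ln -s example; the variable names are illustrative only.

```shell
dir=$(mktemp -d) || exit 1
echo "hello" > "$dir/file1.txt"
ln -s "$dir/file1.txt" "$dir/file2.txt"          # file2.txt is a link to file1.txt

file_type=$(ls -ld "$dir/file1.txt" | cut -c1)   # plain file
link_type=$(ls -ld "$dir/file2.txt" | cut -c1)   # symbolic link
dir_type=$(ls -ld "$dir" | cut -c1)              # directory
link_content=$(cat "$dir/file2.txt")             # reading follows the link

echo "file=$file_type link=$link_type dir=$dir_type content=$link_content"
rm -rf "$dir"
```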
8
What's in a file?
Recall: in ls -l output, the first character indicates the file type:
  d directory, - plain file, b block-type special file, c character-type special file, l symbolic link, s socket
To check the type of a file: "file filename"
To view an "octal dump" of a file:
  od [OPTION]... [FILE]...
  od --traditional [FILE] [[+]OFFSET [[+]LABEL]]
Important options:
  -A: what base to use when displaying addresses (default: base 8)
  -t: specify how to interpret file content
    a: named character; c: ASCII character or backslash representation
    d[size]: signed decimal, size bytes per integer
    o[size]: octal; x[size]: hexadecimal
9
What's in a file?
Example of od
$ echo abc def ghi jkl | od -c
0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
0000020
[zhang@storm ~]$ echo abc def ghi jkl | od -Ad -c ## same as -t c
0000000   a   b   c       d   e   f       g   h   i       j   k   l  \n
0000016
$ echo abc def ghi jkl | od -Ad -t d1 ## interpret each byte as decimal integer
0000000 97 98 99 32 100 101 102 32 103 104 105 32 106 107 108 10
0000016
$ echo abc def ghi jkl | od -Ad -t x1
0000000 61 62 63 20 64 65 66 20 67 68 69 20 6a 6b 6c 0a
0000016
Disk space usage
df - report file system disk space usage
  df [OPTION]... [FILE]...
  Show information about the file system on which each FILE resides, or all file systems by default.
du - estimate file space usage
  du [OPTION]... [FILE]...
  Summarize disk usage of each FILE, recursively for directories.
quota - display disk usage and limits
10
11
Compare file contents
Compare files
  cmp file1 file2: finds the first place where two files differ (reported as a byte offset and a line number)
  diff file1 file2: reports all lines that are different
    diff's output is carefully designed so that it can be used by other programs. For example, revision control systems use diff to manage the differences between successive versions of files under their management.
patch command: apply a diff file to an original
  patch [options] [originalfile [patchfile]]
  patch -pnum <patchfile
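A short sketch of cmp and diff on two throwaway files. The file names and contents are invented for the example. cmp -s only sets the exit status; diff prints the differing lines in its default format.

```shell
dir=$(mktemp -d) || exit 1
printf 'one\ntwo\nthree\n' > "$dir/old.txt"
printf 'one\n2\nthree\n'   > "$dir/new.txt"

# cmp exits 0 when the files are identical and 1 when they differ
if cmp -s "$dir/old.txt" "$dir/new.txt"; then same=yes; else same=no; fi
echo "identical: $same"

# diff exits 1 when it finds differences, hence the "|| true"
changes=$(diff "$dir/old.txt" "$dir/new.txt" || true)
echo "$changes"
rm -rf "$dir"
```

The diff output here reads "2c2": line 2 of the first file was changed into line 2 of the second. This is the kind of machine-readable description that patch consumes.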
File checksum
Provides a single number, a signature, that is characteristic of the file (computed from all of the bytes of the file)
Files with different contents are unlikely to have the same checksum
Usage: software announcements include checksums of distribution files so that users can tell whether a copy matches the original.
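A sketch of the verification workflow described above, assuming the GNU coreutils sha256sum command is available (md5sum accepts the same -c option). The file name dist.tar and its contents are placeholders, not a real distribution.

```shell
dir=$(mktemp -d) || exit 1
printf 'release contents\n' > "$dir/dist.tar"
sha256sum "$dir/dist.tar" > "$dir/dist.tar.sha256"   # what a publisher would post

# -c recomputes the checksum and compares it with the recorded one
if sha256sum -c "$dir/dist.tar.sha256" >/dev/null 2>&1; then intact=yes; else intact=no; fi

printf 'corrupted\n' >> "$dir/dist.tar"              # simulate a damaged copy
if sha256sum -c "$dir/dist.tar.sha256" >/dev/null 2>&1; then changed=no; else changed=yes; fi

echo "matches original: $intact, change detected: $changed"
rm -rf "$dir"
```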
12
openssl
A cryptography toolkit implementing the Secure Sockets Layer and Transport Layer Security network protocols and related cryptography standards
openssl program: a command line tool for using various cryptography functions from the shell.
  Creation and management of private keys, public keys and parameters
  Public key cryptographic operations
  Creation of X.509 certificates, CSRs and CRLs
  Calculation of Message Digests
  Encryption and Decryption with Ciphers
  SSL/TLS Client and Server Tests
  Handling of S/MIME signed or encrypted mail
  Time Stamp requests, generation and verification
13
Message digest
openssl dgst [-md5|-md4|-md2|-sha1|-sha|-mdc2|-ripemd160|-dss1] [-c] [-d] [-hex] [-binary] [-out filename] [-sign filename] [-keyform arg] [-passin arg] [-verify filename] [-prverify filename] [-signature filename] [-hmac key] [file...]
Or
[md5|md4|md2|sha1|sha|mdc2|ripemd160] [-c] [-d] [file...]
Output the message digest of a supplied file or files in hexadecimal form
14
Example
$ md5sum /bin/l?
696a4fa5a98b81b066422a39204ffea4 /bin/ln
cd6761364e3350d010c834ce11464779 /bin/lp
351f5eab0baa6eddae391f84d0a6c192 /bin/ls
Output: 32 hexadecimal digits, i.e., 128 bits.
The chance of two different files having identical signatures is 1/2^128 (the book: 1/2^64)
In 2005, researchers were able to create pairs of PostScript documents and X.509 certificates with the same hash. Later that year, MD5's designer Ron Rivest wrote, "md5 and sha1 are both clearly broken (in terms of collision-resistance)."
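To check the "32 hexadecimal digits, i.e., 128 bits" claim, here is a small sketch that hashes the three-byte string abc (the standard MD5 test input) and counts the digits. The variable names are illustrative.

```shell
digest=$(printf 'abc' | md5sum | cut -d' ' -f1)   # keep only the hash field
ndigits=${#digest}                                # number of hex digits
nbits=$((ndigits * 4))                            # each hex digit encodes 4 bits
echo "$digest"
echo "$ndigits hex digits = $nbits bits"
```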
15
Public-key cryptography
Data security by two related keys: a private key, known only to its owner, and a public key, potentially known to anyone
  Examples: RSA, DSA algorithms
Digital signature: Alice => Bob communication
  If Alice wants to sign an open letter, she uses her private key to encrypt it. Bob uses Alice's public key to decrypt the signed letter, and can then be confident that only Alice could have signed it, provided that she is trusted not to divulge her private key.
Secrecy:
  If Alice wants to send a letter to Bob that only he can read, she encrypts it with Bob's public key, and he then uses his private key to decrypt it. As long as Bob keeps his private key secret, Alice can be confident that only Bob can read her letter.
16
Secure software distribution
Many software archives include digital signatures that incorporate information from a file checksum as well as from the signer's private key.
How to verify such signatures?
$ ls -l coreutils-5.0.tar* ## Show the distribution files
-rw-rw-r-- 1 jones devel 6020616 Apr 2 2003 coreutils-5.0.tar.gz
-rw-rw-r-- 1 jones devel 65 Apr 2 2003 coreutils-5.0.tar.gz.sig
$ gpg coreutils-5.0.tar.gz.sig ## Try to verify the signature
gpg: Signature made Wed Apr 2 14:26:58 2003 MST using DSA key ID D333CBA1
gpg: Can't check signature: public key not found
17
Verify using public key
Obtain the public key from public servers
Add the public key to your key ring
$ gpg --import temp.key
gpg: key D333CBA1: public key "Jim Meyering <[email protected]>" imported
gpg: Total number processed: 1
gpg: imported: 1
Verify the signature successfully:
$ gpg coreutils-5.0.tar.gz.sig ## Verify the digital signature
Online resource: The GNU Privacy Handbook
18
19
Outline
Finish up with awk: pipeline, external commands
Commands working with files
  tree, ls and echo (-d option, -1 option, -R, -a)
  od (octal dump), stat (show metadata of file), cmp, diff
  touch command, mktemp, file with random bytes
  File checksum, verification
  locate, type, which, find command: finding files
Process-related commands
touch: update modification time
touch is sometimes used to create empty files: their existence and possibly their timestamps, but not their contents, are significant.
  A lock file to indicate that a program is already running, and that a second instance should not be started.
  To record a file timestamp for later comparison with other files.
Example:
$ touch -t 197607040000.00 US-bicentennial
$ ls -l US-bicentennial ## List the file
-rw-rw-r-- 1 jones devel 0 Jul 4 1976 US-bicentennial
$ touch -r US-bicentennial birthday ## Copy timestamp to the new birthday file
$ ls -l birthday ## List the new file
-rw-rw-r-- 1 jones devel 0 Jul 4 1976 birthday
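One way the "timestamp for later comparison" idea can be used is sketched below: a stamp file records a moment in time, and find -newer lists what changed after it. The file names (stamp, older, newer, copy) are hypothetical.

```shell
dir=$(mktemp -d) || exit 1
touch -t 200001010000.00 "$dir/stamp"    # reference timestamp: 1 Jan 2000
touch -t 199912310000.00 "$dir/older"    # modified before the stamp
touch "$dir/newer"                       # modified now
touch -r "$dir/stamp" "$dir/copy"        # same timestamp as the stamp

# Only files modified strictly after the stamp are selected
newer_list=$(cd "$dir" && find . -type f -newer stamp | sort)
echo "$newer_list"
rm -rf "$dir"
```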
20
Temporary files
So far, we have created them in the current directory
  And removed them after use
What if multiple scripts use the same file name? Or malicious users modify the files?
Special directories: /tmp (cleared when the system reboots) and /var/tmp
To avoid filename collisions, append the process id as a suffix
  ## create a temporary file in shell scripts
  tmpfile=temp.$$ ## $$ (process id)
  echo $tmpfile
21
mktemp command
mktemp: takes an optional filename template containing a string of trailing X characters, preferably at least a dozen of them.
mktemp replaces them with an alphanumeric string derived from random numbers and the process ID, creates the file with no access for group and other, and prints the filename on standard output.
$ TMPFILE=`mktemp /tmp/myprog.XXXXXXXXXXXX` || exit 1 ## Make unique temporary file
$ ls -l $TMPFILE ## List the temporary file
-rw------- 1 jones devel 0 Mar 17 07:30 /tmp/myprog.hJmNZbq25727
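In a script, mktemp is commonly paired with a trap so the file is removed when the script exits. The following is a sketch of that pattern, not part of the original slides; the template name is reused from the example above.

```shell
tmpfile=$(mktemp /tmp/myprog.XXXXXXXXXXXX) || exit 1
trap 'rm -f "$tmpfile"' EXIT                 # clean up when the script exits

perms=$(ls -l "$tmpfile" | cut -c1-10)       # type and permission bits only
echo "created $tmpfile with mode $perms"

echo "scratch data" > "$tmpfile"             # use the file like any other
```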
22
Random bytes
Two random pseudodevices: /dev/random and /dev/urandom.
These devices serve as never-empty streams of random bytes: such a data source is needed in many cryptographic and security applications.
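As a sketch of how such a stream might be consumed, the pipeline below reads 16 bytes from /dev/urandom and prints them as hexadecimal using od. The variable name token is illustrative.

```shell
# 16 random bytes -> hex dump without addresses -> strip blanks and newlines
token=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "$token"
```

Since the stream never ends, the reader has to decide how many bytes to take; here head -c does that.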
23
24
Outline
Finish up with awk: pipeline, external commands
Commands working with files
  tree, ls and echo (-d option, -1 option, -R, -a)
  od (octal dump), stat (show metadata of file), cmp, diff
  File checksum, verification
  touch command
  temporary file, file with random bytes
  locate, type, which, find command: finding files
Search for files
locate: find files by name, using a regularly updated database constructed by complete scans of the filesystem
  locate [OPTION]... PATTERN...
  $ locate cksum
which: display the full pathname for a command, using the PATH variable
  $ which rm
  alias rm='rm'
  /bin/rm
type: shell built-in command that reports how each name would be interpreted if used as a command name
  -t option: report whether a name is an alias, shell reserved word, function, builtin, or disk file
25
find command
find [ files-or-directories ] [ options ]: find files matching specified name patterns, or having given attributes.
  -atime n: select files with access times of n days (also -ctime, -mtime)
  -ls: produce a listing similar to the ls long form, rather than just filenames.
  -name 'pattern': select files matching the shell wildcard pattern (quoted to protect it from shell interpretation).
  -perm mask: select files matching the specified octal permission mask.
  -prune: do not descend recursively into directory trees.
  -size n: select files of size n.
  -type t: select files of type t, a single letter: d (directory), f (file), or l (symbolic link).
26
find: basic operations
find [ files-or-directories ] [ options ]:
  When it finds a file, it first carries out the selection restrictions implied by the options, and if those tests succeed, it hands the name off to an internal action routine.
  Default action: print the name on standard output
  -exec option: provides a command template into which the name is substituted; the command is then executed.
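A minimal sketch of selection followed by an -exec action, run in a scratch directory. The file names are invented. In the command template, {} is replaced by each selected name and \; marks the end of the template.

```shell
dir=$(mktemp -d) || exit 1
mkdir "$dir/sub"
touch "$dir/a.txt" "$dir/sub/b.txt" "$dir/c.log"

# Select regular files named *.txt, then run basename on each one
txt_names=$(find "$dir" -type f -name '*.txt' -exec basename {} \; | sort)
echo "$txt_names"
rm -rf "$dir"
```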
27
files and directories to search (directories are (almost) always descended into recursively)
Options: select names for ultimate display or action
find usage examples
find: display all files/directories under the current directory
find -ls: display files/directories in "ls" style
find * -prune
find $HOME/. ! -user $USER
find -ls -type f -fprint /tmp/mytemp
$ find -ls -type f -fprint /tmp/mytemp
23724924 4 drwxr-xr-x 2 zhang staff 4096 Mar 25 22:40 .
23724925 0 --wx------ 1 zhang staff 0 Mar 25 22:35 ./a
23724927 0 -rw-r--r-- 1 zhang staff 0 Mar 25 22:35 ./b
23724928 4 -rw-r--r-- 1 zhang staff 10 Mar 25 22:40 ./tmp
[zhang@storm testfind]$ more /tmp/mytemp
./a
./b
./tmp
28
find: examples
Files that haven't been modified in the last year
  find . -mtime +365
    Unsigned integer: exactly that many days old
    Negative: less than that absolute value
    Positive: more than that value
Files for which the user has write permission
  find . -perm -200 ## all bits set in the mask need to match
  The permission mask is given as an octal string
    Unsigned: an exact match on the permissions is required.
    Negative: all of the bits set are required to match.
    Positive: at least one of the bits set must match, e.g., +700 // user can read, or write, or execute (current GNU find spells this form /700) …
Files for which the user does not have read permission
  find . ! -perm -400
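The two -perm examples above can be tried on files with known modes. This is a sketch with made-up file names; -type f is added only to keep the scratch directory itself out of the results.

```shell
dir=$(mktemp -d) || exit 1
touch "$dir/rw" "$dir/ro" "$dir/none"
chmod 600 "$dir/rw"      # owner may read and write
chmod 400 "$dir/ro"      # owner may only read
chmod 000 "$dir/none"    # no permission bits set

writable=$(cd "$dir" && find . -type f -perm -200 | sort)      # write bit set
unreadable=$(cd "$dir" && find . -type f ! -perm -400 | sort)  # read bit not set
echo "owner-writable: $writable"
echo "not owner-readable: $unreadable"
rm -rf "$dir"
```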
29
find: selectors
Selector options can be combined: all must match for the action to be taken.
  They can be interspersed with the -a (AND) option
  -o (OR) option: at least one selector of the surrounding pair must match.
Find nonempty files smaller than 10 blocks (5120 bytes)
  $ find . -size +0 -a -size -10
Find files that are empty or unread in the past year
  $ find . -size 0 -o -atime +365
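A sketch of combined selectors on files of known size, assuming find's default 512-byte blocks. The file names are invented. The parentheses in the second command are needed because AND binds more tightly than -o; -size +10 stands in for the -atime test so the result is predictable.

```shell
dir=$(mktemp -d) || exit 1
: > "$dir/empty"                          # 0 bytes
head -c 100  /dev/zero > "$dir/small"     # 1 block
head -c 6000 /dev/zero > "$dir/big"       # 12 blocks

# Nonempty AND smaller than 10 blocks
small_nonempty=$(cd "$dir" && find . -type f -size +0 -a -size -10 | sort)
# Regular file AND (empty OR larger than 10 blocks)
empty_or_big=$(cd "$dir" && find . -type f \( -size 0 -o -size +10 \) | sort)

echo "$small_nonempty"
echo "$empty_or_big"
rm -rf "$dir"
```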
30
Usage of find in shell script
#!/bin/bash
… ## go to top level web site directory
find . -name '*.html' -type f | ## Find all HTML files
while read -r file ## Read filename into variable
do
    echo "$file" ## Print progress
    mv "$file" "$file.save" ## Save a backup copy
    ## Make the change (quotes keep names containing spaces intact)
    sed -f "$HOME/html2xhtml.sed" < "$file.save" > "$file"
done
31
html2xhtml.sed
Converts HTML to XHTML: converts tags to lowercase, and changes the <br> tag into self-closing form, <br/>:
s/<H1>/<h1>/g ## Slash delimiter
s/<H2>/<h2>/g
s/<H3>/<h3>/g
s/<H4>/<h4>/g
s/<H5>/<h5>/g
s/<H6>/<h6>/g
s:</H1>:</h1>:g ## Colon delimiter, slash in data
s:</H2>:</h2>:g
..
s:<[Hh][Tt][Mm][Ll]>:<html>:g
s:</[Hh][Tt][Mm][Ll]>:</html>:g
s:<[Bb][Rr]>:<br/>:g
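To see the substitutions at work, here is a sketch that applies three of them to one made-up line of HTML, passing them with -e instead of a script file.

```shell
xhtml=$(printf '<H1>Title</H1><BR>\n' |
    sed -e 's/<H1>/<h1>/g' \
        -e 's:</H1>:</h1>:g' \
        -e 's:<[Bb][Rr]>:<br/>:g')
echo "$xhtml"
```

The colon delimiter avoids having to escape the slash in the closing tag and in <br/>.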
32
HTML to XHTML: XHTML is a standardized XML-based version of HTML
Total file size
$ find -ls | awk '{Sum += $7} END {printf("Total: %.0f bytes\n", Sum)}'
Total: 23079017 bytes
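The same pipeline can be checked on files of known size. This sketch assumes the GNU find -ls layout, in which the seventh column is the size in bytes. It adds -type f so that the directory's own size, which varies between filesystems, is not counted. File names are invented.

```shell
dir=$(mktemp -d) || exit 1
head -c 10 /dev/zero > "$dir/ten"        # 10-byte file
head -c 20 /dev/zero > "$dir/twenty"     # 20-byte file

total=$(find "$dir" -type f -ls |
    awk '{Sum += $7} END {printf("Total: %.0f bytes\n", Sum)}')
echo "$total"
rm -rf "$dir"
```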
33
xargs command
Supply the list returned by find as arguments to another command
  Via the shell's command substitution feature.
  E.g., searching for the symbol POSIX_OPEN_MAX in system header files:
  $ grep POSIX_OPEN_MAX /dev/null $(find /usr/include -type f | sort)
  /usr/include/limits.h: #define _POSIX_OPEN_MAX 16
  Note: why /dev/null here?
Potential problem: the command line might exceed the system limit => "argument list too long" error
  $ getconf ARG_MAX ## get system configuration values
  2097152
34
xargs command
xargs: takes a list of arguments from standard input (separated by blanks or newlines, typically one per line), and feeds them in suitably sized groups (determined by ARG_MAX) to another command given as arguments to xargs.
$ find /usr/include -type f | xargs grep POSIX_OPEN_MAX /dev/null
/usr/include/bits/posix1_lim.h:#define _POSIX_OPEN_MAX 16
/usr/include/bits/posix1_lim.h:#define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX
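A self-contained sketch of the find | xargs grep pattern, using two made-up header files in place of /usr/include. DEMO_OPEN_MAX is a hypothetical symbol. The comment gives one common answer to the earlier "why /dev/null" question.

```shell
dir=$(mktemp -d) || exit 1
printf '#define DEMO_OPEN_MAX 16\n' > "$dir/limits.h"
printf 'int unrelated;\n'           > "$dir/other.h"

# With /dev/null, grep always gets at least two file arguments, so it labels
# each match with its file name, and it never waits on standard input when
# find selects nothing.
matches=$(cd "$dir" && find . -type f -name '*.h' | sort |
    xargs grep DEMO_OPEN_MAX /dev/null)
echo "$matches"
rm -rf "$dir"
```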
35
Code Studies: filesdirectories
36
37
Summary
Finish up with awk: pipeline, external commands
Commands working with files
  tree, ls (-d option, -1 option, -R, -a)
  od (octal dump), stat (show metadata of file)
  touch command, temporary file, file with random bytes
  File checksum, verification
  locate, type, which, find command: finding files