GOOGLE N-GRAMS ON AMAZON WEB SERVICES PART 2 Thomas Tiahrt, MA, PhD Computer Science 482 – Introduction to Text Analytics Google Books N-Grams n-gram viewer http://books.google.com/ngrams/info n-gram.
GOOGLE N-GRAMS
ON AMAZON WEB SERVICES
PART 2
Thomas Tiahrt, MA, PhD
Computer Science 482 – Introduction to Text Analytics
Google Books N-Grams
n-gram viewer
http://books.google.com/ngrams/info
n-gram datasets
http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
File Format for Google’s N-Grams
Data is compressed
Fields are separated by tabs ('\t')
One record per line; a newline character ('\n') ends each record
The N-gram field holds a 1-gram, 2-gram, 3-gram, 4-gram, or 5-gram
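A minimal Python sketch of reading records laid out this way. It assumes the compression is gzip, which the slide does not state, and the sample data and the name read_records are made up for illustration.

```python
import gzip
import io


def read_records(stream):
    """Yield each record as a list of its tab-separated fields.

    Assumes `stream` is a binary file object holding gzip-compressed,
    UTF-8 encoded text (the compression type is an assumption).
    """
    with gzip.open(stream, mode="rt", encoding="utf-8", newline="\n") as text:
        for line in text:
            # A newline ends each record; tabs separate the fields.
            yield line.rstrip("\n").split("\t")


# Made-up sample data, compressed in memory so the sketch is self-contained.
raw = "analytics\t2008\t12\t7\ntext analytics\t2008\t3\t2\n"
sample = gzip.compress(raw.encode("utf-8"))

records = list(read_records(io.BytesIO(sample)))
for fields in records:
    print(fields)
```

Reading line by line keeps memory use flat, which matters because the real files are large.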
Version 2
Data created July 2012
Version 2 file format
N-gram \t year \t match_count \t volume_count \n
N-gram: the 1-gram, 2-gram, 3-gram, 4-gram, or 5-gram itself
year: publication year
match_count: number of occurrences of the n-gram in that year
volume_count: number of books in which the n-gram occurred in that year
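The Version 2 record layout above can be sketched in Python as a small parser. The names NgramRecord and parse_record are hypothetical, the sample line is made up, and the sketch assumes the words inside an n-gram are separated by single spaces.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One Version 2 record (hypothetical helper type for illustration)."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_record(line: str) -> NgramRecord:
    """Split one line on tabs and convert the three numeric fields."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


# Made-up sample line in the Version 2 layout.
record = parse_record("text analytics\t2008\t3\t2\n")

# Assumption: words within an n-gram are separated by single spaces,
# so the word count gives n.
n = len(record.ngram.split(" "))
print(record, n)
```

A line with the wrong number of fields raises ValueError at the unpacking step, which is a simple way to catch malformed records early.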
End of Part Two
This is the end of part two. Please proceed to part three.