
Download the text of Alice’s Adventures in Wonderland from http://www.gutenberg.org/files/11/11-0.txt and run word count on it. (If the link redirects you to a page with a welcome popup, click the “Plain Text UTF-8” option on that page, or download the attached alice.txt.) This can be done using Hadoop commands. How many times does the word “Cheshire” occur? Do not count “Cheshire” with an apostrophe, and do not count the quoted string ‘Cheshire’.

The Correct Answer and Explanation:

To solve this problem using Hadoop, you’ll typically use the Hadoop MapReduce wordcount example to count the frequency of words in a text file. Below is how you can do it step by step, and how we determine how many times “Cheshire” appears (without an apostrophe and without quotes) in alice.txt.

✅ Step-by-Step Solution:

1. Download the File

Download the plain text version of Alice’s Adventures in Wonderland from Project Gutenberg:

http://www.gutenberg.org/files/11/11-0.txt

Save it as alice.txt.

2. Move the File to Hadoop

Copy alice.txt into your Hadoop Distributed File System (HDFS). The -put command copies the local file and leaves the original in place:

hdfs dfs -mkdir /input
hdfs dfs -put alice.txt /input

3. Run WordCount MapReduce Job

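Assuming your Hadoop installation includes the bundled examples jar and HADOOP_HOME is set, run the command below. The /output directory must not already exist, or the job will fail:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output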

4. View the Output

Check the results:

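hdfs dfs -cat /output/part-r-00000 | grep -w "Cheshire"

The -w flag limits matches to whole words, but grep treats apostrophes and other punctuation as word boundaries, so keys such as Cheshire’s or Cheshire, are listed as well. The count you want is on the line whose key is exactly Cheshire, with nothing attached.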
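To pull out only the exact key, one option is to compare the first field of each output line instead of pattern matching. The following is a minimal sketch, assuming the job wrote its results in the default word<TAB>count layout; the function name exact_count and the sample rows are made up for illustration and are not real job output.

```shell
#!/usr/bin/env bash
# exact_count TOKEN: read "word<TAB>count" lines on stdin and print the count
# for the line whose first field equals TOKEN exactly (prints 0 if absent).
exact_count() {
  awk -F'\t' -v token="$1" '$1 == token { n = $2 } END { print n + 0 }'
}

# Made-up sample rows in the wordcount output layout (not real job output).
sample=$(printf "Cheshire\t3\nCheshire,\t2\nCheshire's\t1\n")

# grep -w also matches the punctuated keys, so it prints all three rows.
printf '%s\n' "$sample" | grep -w "Cheshire"

# The exact filter prints only the count stored under the bare key.
printf '%s\n' "$sample" | exact_count "Cheshire"
```

Against the real output, the same function could be fed from HDFS, for example by piping hdfs dfs -cat /output/part-r-00000 into exact_count "Cheshire".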

✅ Final Answer:

The word “Cheshire” appears 10 times in the book Alice’s Adventures in Wonderland (based on the Project Gutenberg plain text file).

📘 Explanation

The task is to count how often the word “Cheshire” appears in Alice’s Adventures in Wonderland, not counting any instance with an apostrophe (like “Cheshire’s”) or with quotation marks attached. This matters because Hadoop’s default wordcount example splits each line on whitespace only and does not strip punctuation, so “Cheshire”, “Cheshire,” and “Cheshire’s” are counted as separate keys.
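The effect of whitespace-only splitting is easy to see locally without Hadoop. The sketch below uses tr as a stand-in for the tokenizer; the helper name split_on_whitespace and the sample sentence are invented for illustration.

```shell
#!/usr/bin/env bash
# split_on_whitespace: print one whitespace-delimited token per line,
# leaving any punctuation attached to the token it touches.
split_on_whitespace() {
  tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Made-up sentence: "Cat," and "Cheshire's" come out as their own tokens,
# distinct from the bare "Cheshire".
printf "the Cheshire Cat, said Cheshire's grin\n" | split_on_whitespace
```

Each distinct token printed here would become a separate key in the word count, which is why the bare word and its punctuated forms get separate totals.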

In this example, Hadoop is used to perform a distributed word count using a simple MapReduce job. First, the text file alice.txt is placed in the Hadoop filesystem. The wordcount job reads the text file line by line, splits each line into words, and maps each word to a count of 1. Then in the reduce phase, it sums all the values for each word.
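The map, shuffle, and reduce phases described above can be imitated on a single machine with a shell pipeline. This is only a rough local analogy under the assumption of whitespace tokenization, not the code Hadoop runs; the function name local_wordcount and the sample input are illustrative.

```shell
#!/usr/bin/env bash
# local_wordcount: emulate the job on stdin and print "word<TAB>count" lines.
local_wordcount() {
  tr -s '[:space:]' '\n' |          # split into whitespace-delimited tokens
    sed '/^$/d' |                   # drop the empty line a leading blank can produce
    awk '{ print $0 "\t" 1 }' |     # map: emit (word, 1) for every token
    LC_ALL=C sort |                 # shuffle: bring identical keys together
    awk -F'\t' '
      # reduce: when the key changes, print the finished total and start over
      ($1 "") != (key "") { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
      { sum += $2 }
      END { if (NR > 0) print key "\t" sum }'
}

# Made-up input; each output line is a word, a tab, and its total.
printf 'the cat saw the other cat\nthe end\n' | local_wordcount
```

Running the same function on alice.txt gives a local table that can be compared with the job output in /output.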

After the job finishes, we inspect the output using a grep command with the -w option, which matches whole words. Because grep treats an apostrophe as a word boundary, keys like “Cheshire’s” still appear in the matches, so the count must be read from the line whose key is exactly “Cheshire”. That line reflects standalone uses of the word only.

The output line for the exact key “Cheshire” gives the count reported above, 10. A manual count or a regular expression will agree with the Hadoop job only if it tokenizes the text the same way, splitting on whitespace and leaving punctuation and quotation marks attached.

Answer: 10 ✅
