Question

How do I remove all JavaScript text from a block of text in Python?

For example:

Removal of any HTML tags: any text between a < character and a > character can be assumed to be an HTML tag and needs to be removed. Removal of JavaScript code: before you remove the HTML tags above, you will also need to remove any text that is between the <script> and </script> tags. Note that a <script> or </script> tag can have any amount of whitespace or other text between the "<" character and the valid script tag, and that must be removed too.

I'm also not allowed to import any module from the Python library.

Solutions

Expert Solution

Question: How do I remove all JavaScript text from a block of text in Python?

Sol:  

Removing JavaScript from a Python string is a common operation if you have crawled a web page. We can remove all the JavaScript text from a block of text in Python by using regular expressions.

For example:

Create a text that contains JavaScript code

text = '''
this is a script test.
<Script type = "text/javascript">
alert( 'test' )
</script>
test is end.

'''

Build a regular expression to remove the JavaScript code

import re

re_script = re.compile(r'<\s*script[^>]*>.*?<\s*/\s*script\s*>', re.S | re.I)

A regular expression is a special sequence of characters that matches or finds other strings, or sets of strings, using a specialized syntax held in a pattern. The Python module re provides full support for Perl-like regular expressions.

A pattern such as <\s*script[^>]*>[^<]*<\s*/\s*script\s*> should not use [^<]* for the script body, because the match then fails as soon as the JavaScript itself contains a < character (for example, in a comparison). Reserve that kind of negated class for matching the tags themselves. For the body, use the non-greedy quantifier, written *?, so that .*? stops at the first closing script tag.

Some characters, like '|' or '(', are special: they either stand for classes of ordinary characters or affect how the regular expressions around them are interpreted. The re module raises the exception re.error if an error occurs while compiling or using a regular expression. The flag re.S makes a period (dot) match any character, including a newline, and re.I performs case-insensitive matching.
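To make the difference between the three quantifier choices concrete, here is a small sketch. The sample string and the variable names are made up for illustration; only the patterns come from the discussion above.

```python
import re

# Hypothetical sample: two script blocks, the first one contains a "<" comparison.
sample = "a<script>if (x < 1) {}</script>b<script>y()</script>c"

flags = re.S | re.I

# [^<]* cannot cross a "<", so a script body containing "<" is left in place.
no_lt = re.compile(r'<\s*script[^>]*>[^<]*<\s*/\s*script\s*>', flags)
# Greedy .* runs to the LAST closing tag and swallows the text in between.
greedy = re.compile(r'<\s*script[^>]*>.*<\s*/\s*script\s*>', flags)
# Non-greedy .*? stops at the FIRST closing tag after each opening tag.
lazy = re.compile(r'<\s*script[^>]*>.*?<\s*/\s*script\s*>', flags)

result_no_lt = no_lt.sub('', sample)    # 'a<script>if (x < 1) {}</script>bc'
result_greedy = greedy.sub('', sample)  # 'ac'
result_lazy = lazy.sub('', sample)      # 'abc'

print(result_no_lt)
print(result_greedy)
print(result_lazy)
```

Only the non-greedy version removes both script blocks while keeping the "b" that sits between them.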

Remove javascript code

text = re_script.sub(' ', text)
print(text)

re_script.sub(' ', text) replaces every matched script block with a single space. Run this Python script and you will find that the script block is removed; the result is:

Output:

this is a script test.

test is end.
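Since the question says that no module may be imported, the same idea can also be sketched with plain string methods instead of re. This is a minimal sketch, not a definitive implementation: the function names script_tag_end, remove_scripts and remove_tags are hypothetical, and it assumes every "<" that opens a tag is eventually closed by a ">".

```python
def script_tag_end(text, i, closing):
    """Return the index just past '>' if a script tag starts at text[i], else -1."""
    n = len(text)
    if i >= n or text[i] != '<':
        return -1
    j = i + 1
    while j < n and text[j].isspace():      # whitespace after "<"
        j += 1
    if closing:
        if j >= n or text[j] != '/':        # a closing tag needs "/"
            return -1
        j += 1
        while j < n and text[j].isspace():  # whitespace after "/"
            j += 1
    if text[j:j + 6].lower() != 'script':   # case-insensitive tag name
        return -1
    end = text.find('>', j)                 # any other text up to ">" is part of the tag
    return end + 1 if end != -1 else -1


def remove_scripts(text):
    """Remove everything from each opening script tag to its first closing tag."""
    out = []
    i = 0
    while i < len(text):
        body = script_tag_end(text, i, False)
        if body == -1:                      # not an opening script tag: keep the character
            out.append(text[i])
            i += 1
            continue
        end = -1
        k = text.find('<', body)
        while k != -1:                      # try each "<" until one is a closing script tag
            end = script_tag_end(text, k, True)
            if end != -1:
                break
            k = text.find('<', k + 1)
        if end == -1:                       # no closing tag found: keep the rest unchanged
            out.append(text[i:])
            break
        i = end                             # skip the whole script block
    return ''.join(out)


def remove_tags(text):
    """Remove any text between a '<' and the next '>'."""
    out = []
    inside = False
    for ch in text:
        if ch == '<':
            inside = True
        elif ch == '>' and inside:
            inside = False
        elif not inside:
            out.append(ch)
    return ''.join(out)


text = ('this is a script test.\n'
        '<Script type = "text/javascript">\n'
        "alert( 'test' )\n"
        '</script>\n'
        'test is end.\n')

# Scripts must go first, otherwise the JavaScript between the tags would survive.
cleaned = remove_tags(remove_scripts(text))
print(cleaned)
```

Unlike the regular expression above, this sketch removes the script block without leaving a space behind, and it accepts extra text before the ">" of a closing tag, as the question allows.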

