Hi experts,
I need a little Perl help. Here is the requirement (at the Unix shell).
I have a text file, entries.txt, with the following lines (just giving a
few lines here, but there are around 100 entries in the actual file).
Each line has one email ID followed by a user ID (the two separated by a
tab). I am just giving the first three lines here.
========================
abc@google.com abc1
cdef@yahoo.com cde
xyz@gmail.com xyz2
=========================
Now the Perl script should parse through a big dump of data (a file
called text.xml) and replace each email address with the user ID next to
it (example: all abc@google.com entries in the dump should be replaced by
abc1, and so on). Can someone help me with the code?
The Perl script should do roughly this:
read the entries.txt file;
split each line into two entries;
loop through the dump below (whatever is below __DATA__);
replace each email address with the corresponding user ID;
write all the updated data to a new file, updated.xml.
__DATA__ (the dump below is actually the file text.xml)
Hello world abc@google.com this is line 1
This is the second line with a lot text cdef@yahoo.com and much more
Here is the third line xyz@gmail.com and lot of stuff here
One more line with abc@google.com
Now the output file, updated.xml, should contain the following dump:
============
Hello world abc1 this is line 1
This is the second line with a lot text cde and much more
Here is the third line xyz2 and lot of stuff here
One more line with abc1
=============
Thanks in advance..
Ryder
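A minimal, ungolfed sketch of the five steps above, assuming entries.txt is strictly tab-separated and that a plain substring match (no word boundaries) is what is wanted. The sub names read_pairs, rewrite_line and run, and the script name replace.pl, are hypothetical; the three file names come from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub names; file names are passed in on the command line.

# Steps 1-2: read "email<TAB>userid" lines into a hash.
sub read_pairs {
    my ($fh) = @_;
    my %map;
    while (my $line = <$fh>) {
        chomp $line;
        my ($email, $user) = split /\t/, $line, 2;
        $map{$email} = $user if defined $user;   # skip blank/malformed lines
    }
    return \%map;
}

# Steps 3-4: replace every listed address in one line of the dump.
# \Q...\E makes the dots in each address match literally.
sub rewrite_line {
    my ($map, $line) = @_;
    for my $email (keys %$map) {
        $line =~ s/\Q$email\E/$map->{$email}/g;
    }
    return $line;
}

# Step 5: copy the dump to the output file, one line at a time.
sub run {
    my ($entries_file, $in_file, $out_file) = @_;
    open my $efh, '<', $entries_file or die "open $entries_file: $!";
    my $map = read_pairs($efh);
    close $efh;
    open my $in,  '<', $in_file  or die "open $in_file: $!";
    open my $out, '>', $out_file or die "open $out_file: $!";
    print {$out} rewrite_line($map, $_) while <$in>;
    close $out or die "close $out_file: $!";
}

# Only touch files when called as: perl replace.pl entries.txt text.xml updated.xml
run(@ARGV) if @ARGV == 3;
```

With any other number of arguments the script just defines the subs and does nothing. It tries the keys in hash order, so it shares the substring-ordering caveat raised later in the thread.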
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
What have you tried? What bits are causing you trouble? If you just
want someone to write it for you, you may get lucky, but most people
prefer to help with learning rather than coding for free.
Indeed. The quick-and-dirty approach I've used for this kind of thing is
to collect the strings and replacements into a hash, then build a regexp
$r='\\b('.join('|',map {...} sort {...} keys %myhash).')\\b';
(with appropriate regexp quoting for the individual keys with map {}, selecting the sort {} to put long strings first), then do something like
s/$r/$myhash{$1}/goe
on our whole target string.
I'm not terribly fond of using clunky string operations to build
regexps, and then there's the question of getting the regexp quoting
right. Is there some more elegant method people can think of?
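The hash-plus-alternation approach above can be filled in roughly like this, as a sketch: quotemeta supplies the per-key quoting and the sort puts longer keys first. The names build_regex and replace_all are hypothetical, and the \b anchors assume every key starts and ends with a word character (true for the addresses in this thread).

```perl
use strict;
use warnings;

# Hypothetical helper: one alternation from the hash keys. quotemeta
# escapes the "." in each address, and longer keys are tried first so a
# key that is a prefix of another cannot win by accident.
sub build_regex {
    my ($map) = @_;
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b }
        keys %$map;
    return qr/\b($alt)\b/;
}

# Hypothetical helper: one pass over the text, each match looked up in the hash.
sub replace_all {
    my ($map, $text) = @_;
    return $text unless %$map;    # an empty alternation would match everywhere
    my $re = build_regex($map);
    $text =~ s/$re/$map->{$1}/g;
    return $text;
}
```

Because the text is scanned once, a replacement is never rescanned. The /o and /e flags from the one-liner are dropped here: the pattern is a qr// object built per call, and $map->{$1} interpolates into the replacement without /e.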
Hi experts,
I need a little perl help. Here is the requirement (at unix shell).
I have a text file, entries.txt with the following lines (just giving a
few lines here, but there are around 100 entries in the actual file).
Each line has one email id' followed by a user id (both separated by a
tab). I am just giving first three lines here.
========================
abc@google.com abc1
cdef@yahoo.com cde
xyz@gmail.com xyz2
=========================
Now the perl script should parse through a big dump of data (a file
called text.xml) and replace the first email with the second entry
(example: all abc@google.com entries in the dump should be replaced by
abc1 and so on and so forth). Can someone help me with the code?
Can I assume this is homework? Maybe we can have some fun with
it...
I'm not much of a Perl golfer, but here's an attempt:
#!/usr/bin/perl
{local$/;$s=<DATA>};@ARGV='entries.txt';for(map{[split' ']}<>){$s
=~s/\Q$_->[0]/$_->[1]/g};open$g,'>updated.xml';print$g $s;
__DATA__
Hello world abc@google.com this is line 1
This is the second line with a lot text cdef@yahoo.com and much more
Here is the third line xyz@gmail.com and lot of stuff here
One more line with abc@google.com
It seems to follow the specification.
Wasell <usenet2020@wasell.eu> writes:
[...]
Can I assume this is homework? Maybe we can have some fun with
it...
I'm not much of a Perl golfer, but here's an attempt:
#!/usr/bin/perl
{local$/;$s=<DATA>};@ARGV='entries.txt';for(map{[split' ']}<>){$s
=~s/\Q$_->[0]/$_->[1]/g};open$g,'>updated.xml';print$g $s;
__DATA__
Hello world abc@google.com this is line 1
This is the second line with a lot text cdef@yahoo.com and much more
Here is the third line xyz@gmail.com and lot of stuff here
One more line with abc@google.com
It seems to follow the specification.
%m=map{split}`cat entries.txt`;
for(<DATA>){/\G\S+/gc&&(print($m{$&}//$&),redo);/\G\s+/gc&&(print($&),redo)}
__DATA__
Hello world abc@google.com this is line 1
This is the second line with a lot text cdef@yahoo.com and much more
Here is the third line xyz@gmail.com and lot of stuff here
One more line with abc@google.com
:-)
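Ungolfed, the \G loop above does roughly the following (a sketch; rewrite_tokens is a hypothetical name, and // needs Perl 5.10 or later): it walks each line with \G and /gc, alternately consuming a run of non-whitespace, which is looked up in the map, and a run of whitespace, which is copied through.

```perl
use strict;
use warnings;

# Hypothetical helper. A run of non-whitespace is looked up in %$map
# (kept as-is if absent); a run of whitespace is copied unchanged, so
# spacing and the trailing newline survive. /gc keeps pos() on failure.
sub rewrite_tokens {
    my ($map, $line) = @_;
    my $out = '';
    pos($line) = 0;
    while (pos($line) < length($line)) {
        if ($line =~ /\G(\S+)/gc) {
            $out .= $map->{$1} // $1;
        }
        elsif ($line =~ /\G(\s+)/gc) {
            $out .= $1;
        }
    }
    return $out;
}
```

Every character is either \s or \S, so each iteration advances pos() and the loop terminates.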
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
Not elegant, no, but I think I'd slurp the input and then loop over the substitutions:
while (<$subs>) {
chomp;
my ($k, $s) = split /\t/;
$content =~ s/\b\Q$k\E\b/$s/g;
}
Rainer Weikusat <rweikusat@talktalk.net> writes:
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
[...]
Not elegant, no, but I think I'd slurp the input and then loop over the
substitutions:
while (<$subs>) {
chomp;
my ($k, $s) = split /\t/;
$content =~ s/\b\Q$k\E\b/$s/g;
}
A pretty awful algorithm: The runtime will be proportional to the number
of substitutions times the length of the text, ie, quadratic.
I don't think that's technically quadratic, but I know what you mean.
It's pretty awful. This looked like a throw-away task, so I didn't care about the O(mn) complexity.
More defensively written alternate suggestion:
--------
my %subs;
{
my $fh;
open($fh, '<', 'entries.txt') or die("open: $!");
%subs = map { split } <$fh>;
(The OP had tab-separated pairs.)
}
for (<DATA>) {
s|\S+|$subs{$&} // $&|ge;
This is likely to miss some expected cases in XML data since, say, in <addr mail="abc@google.com"> the \S+ token is mail="abc@google.com">, which won't match abc@google.com.
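A small demonstration of that point, assuming the address appears inside an attribute value: the whole-token lookup sees mail="abc@google.com"> as one token, while an alternation of quoted keys finds the address inside it. %subs, by_token and by_key are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical one-entry mapping for the demonstration.
my %subs = ('abc@google.com' => 'abc1');

# Whole-token lookup, as in s|\S+|$subs{$&} // $&|ge:
# a run of non-whitespace is replaced only if it equals a key exactly.
sub by_token {
    my ($text) = @_;
    $text =~ s|(\S+)|$subs{$1} // $1|ge;
    return $text;
}

# Alternation of quoted keys: the address is found wherever it occurs.
sub by_key {
    my ($text) = @_;
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } keys %subs;
    $text =~ s/($alt)/$subs{$1}/g;
    return $text;
}

my $xml = '<addr mail="abc@google.com">';
print by_token($xml), "\n";   # unchanged
print by_key($xml), "\n";     # <addr mail="abc1">
```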
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
[...]
Not elegant, no, but I think I'd slurp the input and then loop over the
substitutions:
while (<$subs>) {
chomp;
my ($k, $s) = split /\t/;
$content =~ s/\b\Q$k\E\b/$s/g;
}
A pretty awful algorithm: The runtime will be proportional to the number
of substitutions times the length of the text, ie, quadratic.
More defensively written alternate suggestion:
--------
my %subs;
{
my $fh;
open($fh, '<', 'entries.txt') or die("open: $!");
%subs = map { split } <$fh>;
}
for (<DATA>) {
s|\S+|$subs{$&} // $&|ge;
print;
}
__DATA__
Hello world abc@google.com this is line 1
This is the second line with a lot text cdef@yahoo.com and much more
Here is the third line xyz@gmail.com and lot of stuff here
One more line with abc@google.com
-------
That's a linear algorithm as it makes just one pass through the input
data.
NB: I didn't benchmark this and the O-difference doesn't necessarily
mean it'll be faster in practice for realistic amounts of input data. It
also won't replace results of prior replacements, which may or may not be desired.
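To illustrate that last point with a hypothetical chained mapping (old@x.com maps to new@x.com, which itself maps to nu): looping over the substitutions feeds each result into the later patterns, while the single token pass replaces each address at most once. The pairs and sub names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical chained mapping: old@x.com -> new@x.com, new@x.com -> nu.
my @pairs = (['old@x.com', 'new@x.com'], ['new@x.com', 'nu']);

# One s///g per pair, in order: later pairs also rewrite the output
# of earlier ones.
sub sequential {
    my ($text) = @_;
    for my $p (@pairs) {
        my ($from, $to) = @$p;
        $text =~ s/\Q$from\E/$to/g;
    }
    return $text;
}

# One pass over the text: each token is looked up once, so a
# replacement is never itself replaced.
sub single_pass {
    my ($text) = @_;
    my %map = map { ($_->[0] => $_->[1]) } @pairs;
    $text =~ s|(\S+)|$map{$1} // $1|ge;
    return $text;
}

print sequential('mail old@x.com'), "\n";    # mail nu
print single_pass('mail old@x.com'), "\n";   # mail new@x.com
```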
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
Rainer Weikusat <rweikusat@talktalk.net> writes:
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
[...]
Not elegant, no, but I think I'd slurp the input and then loop over the
substitutions:
while (<$subs>) {
chomp;
my ($k, $s) = split /\t/;
$content =~ s/\b\Q$k\E\b/$s/g;
}
A pretty awful algorithm: The runtime will be proportional to the number
of substitutions times the length of the text, ie, quadratic.
I don't think that's technically quadratic, but I know what you mean.
It's pretty awful. This looked like a throw-away task, so I didn't care
about the O(mn) complexity.
The first time in my life I can do an actual mathematical proof: There
are two sets involved here with lengths n and m. The total running time
is proportional to n * m. There are two cases here:
1. n == m. In this case n * m = n * n, which is obviously quadratic.
2. n < m or m < n; without loss of generality, n < m is assumed. In this case, n * m = n * n * (m / n), where m / n > 1 because n * m > n * n. Hence, it's quadratic as well.
Not elegant, no, but I think I'd slurp the input and then loop over
the substitutions:
while (<$subs>) {
chomp;
my ($k, $s) = split /\t/;
$content =~ s/\b\Q$k\E\b/$s/g;
}
\Q and \E ensure the quoting is correct. And I stole your \b...\b
because I'd forgotten about that! I expect the OP wants it.
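A self-contained version of the loop above, as a sketch: the substitutions come from a filehandle of key<TAB>value lines, and the already-slurped text is passed in as a string. apply_subs is a hypothetical name, and the example call uses an in-memory filehandle instead of entries.txt.

```perl
use strict;
use warnings;

# Hypothetical helper. For each "key<TAB>value" line read from $subs,
# make one s///g pass over the whole (already slurped) text. \Q...\E
# quotes the key so its dots match literally; \b...\b keeps a key from
# matching in the middle of a longer word.
sub apply_subs {
    my ($subs, $content) = @_;
    while (my $line = <$subs>) {
        chomp $line;
        my ($k, $s) = split /\t/, $line;
        next unless defined $s;    # skip blank/malformed lines
        $content =~ s/\b\Q$k\E\b/$s/g;
    }
    return $content;
}

# Example with in-memory data instead of entries.txt and text.xml.
my $entries = "abc\@google.com\tabc1\nxyz\@gmail.com\txyz2\n";
open my $fh, '<', \$entries or die "open: $!";
print apply_subs($fh, "Hello world abc\@google.com and xabc\@google.com\n");
```

Here the example prints "Hello world abc1 and xabc@google.com": the second address is left alone because \b finds no boundary between x and a.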
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
Not elegant, no, but I think I'd slurp the input and then loop over
the substitutions:
while (<$subs>) {
chomp;
my ($k, $s) = split /\t/;
$content =~ s/\b\Q$k\E\b/$s/g;
}
\Q and \E ensure the quoting is correct. And I stole your \b...\b
because I'd forgotten about that! I expect the OP wants it.
I believe your algorithm might fail if the replaced strings can be
substrings of each other, depending on the order they are presented?
OP's question of course didn't have any such cases, but since we're
talking algorithms here, it'd be nice if the edge cases worked too.
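The order dependence can be shown with two hypothetical keys, foo.bar and bar: since "." is not a word character, \bbar\b also matches inside foo.bar. Looping over the keys gives different results for different orders, while a single pass over a longest-first alternation does not. The keys, values and sub names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical keys: "." is not a word character, so \bbar\b also
# matches the "bar" inside "foo.bar".
my %map = ('foo.bar' => 'LONG', 'bar' => 'SHORT');

# Loop over the keys in the given order, one s///g per key.
sub in_order {
    my ($text, @keys) = @_;
    for my $k (@keys) {
        $text =~ s/\b\Q$k\E\b/$map{$k}/g;
    }
    return $text;
}

# Single pass over a longest-first alternation: the result no longer
# depends on the order in which the pairs were read.
sub longest_first {
    my ($text) = @_;
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } keys %map;
    $text =~ s/\b($alt)\b/$map{$1}/g;
    return $text;
}

print in_order('x foo.bar', 'bar', 'foo.bar'), "\n";   # x foo.SHORT
print in_order('x foo.bar', 'foo.bar', 'bar'), "\n";   # x LONG
print longest_first('x foo.bar'), "\n";                # x LONG
```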
#!/usr/bin/perl
{local$/;$s=<DATA>};@ARGV='entries.txt';for(map{[split' ']}<>){$s
=~s/\Q$_->[0]/$_->[1]/g};open$g,'>updated.xml';print$g $s;
__DATA__
"gamo" == gamo <gamo@telecable.es> writes:
gamo> It's a mystery to me why you use split' '
gamo> instead of the more golfer-friendly split"\t". Could you explain?
One char in the string instead of two?
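Besides the character count, there is a behavioural difference, sketched below with a line from entries.txt: the golfed script never chomps the lines it reads, and split ' ' is Perl's special awk-style mode, which splits on runs of whitespace, so the trailing newline is discarded. split "\t" would leave the newline attached to the user ID, and every replacement would then insert a line break. The flip side is that split ' ' also splits on spaces, so it assumes neither field contains whitespace.

```perl
use strict;
use warnings;

# One line of entries.txt as <> returns it: tab-separated, newline not chomped.
my $line = "abc\@google.com\tabc1\n";

# split ' ' is the special awk-style mode: split on runs of whitespace,
# ignore leading whitespace; the trailing empty field is dropped.
my @ws = split ' ', $line;

# split "\t" splits on tabs only, so the second field keeps its "\n".
my @tab = split "\t", $line;

# Show any newline as a literal \n so the difference is visible.
printf "%s|%s\n", map { s/\n/\\n/r } @ws;    # prints: abc@google.com|abc1
printf "%s|%s\n", map { s/\n/\\n/r } @tab;   # prints: abc@google.com|abc1\n
```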
"gamo" == gamo <gamo@telecable.es> writes: