I started with Bruce when it was Family Origins, went to RootsMagic
but changed to Legacy. I now have both on the computers.
I exported a Legacy GED to RM. I have hundreds of people who go by
their middle name - an old time Southern tradition.
My name shows up on the import from Legacy James "Hugh" "Hugh"
Sullivan. Of course the second "Hugh" is a "nickname" in RM.
How do I delete the duplicate en masse?
On Tue, 12 Jun 2018 02:10:04 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
I started with Bruce when it was Family Origins, went to RootsMagic
but changed to Legacy. I now have both on the computers.
I exported a Legacy GED to RM. I have hundreds of people who go by
their middle name - an old time Southern tradition.
My name shows up on the import from Legacy James "Hugh" "Hugh"
Sullivan. Of course the second "Hugh" is a "nickname" in RM.
How do I delete the duplicate en masse?
I'm not sure how you do it en masse. Please let us know if you
discover how.
I too have both Legacy and RM, but find I'm using RM more these days.
--
Steve Hayes
http://www.khanya.org.za/stevesig.htm
http://khanya.wordpress.com
I wonder if a GED could be ported to Excel and use Find All to
identify NICK in red, sort red fonts to the top and delete. Even if
NICK "Hugh" appeared I could selectively erase the part I did not
want.
Hugh
On Tue, 12 Jun 2018 08:40:55 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
I wonder if a GED could be ported to Excel and use Find All to
identify NICK in red, sort red fonts to the top and delete. Even if
NICK "Hugh" appeared I could selectively erase the part I did not
want.
Hugh
The method works - so far.
I always go by First Initial, Middle Name and Surname - I will fight
that battle to the end with doctors, governors, licenses, etc.
Us WWII vets are sorta hard-nosed - and I've been married 70 years to
the same auburn-haired gal.
Hugh
The GED is composed in .txt format. Select All, Copy and Paste in
Excel.
Find NICK(s), Replace with NICK(s) (in color font).
Delete NICK(s) that are duplicates. Don't Delete real nicknames.
The problem: I have about 300,000 lines of which 520 are NICK.
Scrolling and Deleting will take awhile.
Hugh
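One way to avoid the scrolling is to let a short script pick out which NICK lines are the duplicates. This is only a sketch under an assumption that may not hold for the real file: that the Legacy export puts the go-by name in double quotes on the 1 NAME line and repeats it on a following 2 NICK line. The function name drop_duplicate_nicks is made up for this example.

```python
import re


def drop_duplicate_nicks(lines):
    """Yield GEDCOM lines, skipping a '2 NICK' line whose value already
    appears in double quotes on the current '1 NAME' line (assumed layout)."""
    quoted = set()
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith("1 NAME "):
            # Remember every "quoted" name on this NAME line.
            quoted = set(re.findall(r'"([^"]+)"', text))
        elif text.startswith(("0 ", "1 ")):
            # Any other level-0 or level-1 line ends the NAME record.
            quoted = set()
        elif text.startswith("2 NICK ") and text[len("2 NICK "):].strip() in quoted:
            continue  # duplicate of the quoted name: drop this line
        yield line
```

A NICK line whose value is not quoted on the NAME line is treated as a real nickname and passes through unchanged.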
On Tue, 12 Jun 2018 10:06:42 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
The method works - so far.
Next problem: The Excel file will not convert to a GED file to be
imported into RM.
OOPS!
Hugh
On Tue, 12 Jun 2018 12:23:08 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
Next problem: The Excel file will not convert to a GED file to be
imported into RM.
OOPS!
Hugh
Actually, all you need to do is to copy the text from Excel to a file
and rename it from TXT to GED.
I'm thinking that it could be done on a GEDCOM using a text editor
like EditPlus or Notepad++ and Regular Expressions. You would need a
small GEDCOM showing how the names appear coming from Legacy, and
another GEDCOM from RM showing how they should appear "after" you
corrected RM to show the names the way you want them to be. RE gives
you lots of latitude in doing a mass edit... the goal to edit the
Legacy GEDCOM to match that of the corrected RM GEDCOM.
On Tue, 12 Jun 2018 09:33:53 -0500, Charlie Hoffpauir
<invalid@invalid.com> wrote:
I'm thinking that it could be done on a GEDCOM using a text editor
like EditPlus or Notepad++ and Regular Expressions. You would need a
small GEDCOM showing how the names appear coming from Legacy, and
another GEDCOM from RM showing how they should appear "after" you
corrected RM to show the names the way you want them to be. RE gives
you lots of latitude in doing a mass edit... the goal to edit the
Legacy GEDCOM to match that of the corrected RM GEDCOM.
I'm not familiar with RE - I'll give it a look.
Hugh
For text editing I prefer EditPlus. However it's not the most popular,
probably because it's not free. Notepad++ seems very capable, and it's
free. I'd recommend downloading Notepad++ and reading a bit about
Regular Expressions. If the cost doesn't bother you, I do think
EditPlus is easier to use (better learning curve). If the changes
needed are simply a search and replace, maybe you could do that in
Word.
On Tue, 12 Jun 2018 10:06:42 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
On Tue, 12 Jun 2018 08:40:55 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
I wonder if a GED could be ported to Excel and use Find All to
identify NICK in red, sort red fonts to the top and delete. Even if
NICK "Hugh" appeared I could selectively erase the part I did not
want.
Hugh
The method works - so far.
Next problem: The Excel file will not convert to a GED file to be
imported into RM.
Better to use a text editor on the GED file itself.
Do a search and replace.
Steve Hayes
http://www.khanya.org.za/stevesig.htm
http://khanya.wordpress.com
On Wed, 13 Jun 2018 04:21:54 +0200, Steve Hayes
<hayesstw@telkomsa.net> wrote:
Next problem: The Excel file will not convert to a GED file to be
imported into RM.
Better to use a text editor on the GED file itself.
On Thu, 14 Jun 2018 03:53:47 +0200, Steve Hayes
<hayesstw@telkomsa.net> wrote:
But using any text editor that does search and replace should be much
easier than using something like Excel, which needs file conversion
both ways.
Steve Hayes
I have grown older getting there. The file conversions left scars. :)
The problem, so far, is...
2 NICK (name)
I can delete NICK en masse but it leaves 2 and (name). I have used
Notepad, Wordpad and Word. Word does best but I have to eliminate more
than 500 instances individually in more than 3,000 pages. I'm on Page
300 now.
I'll look at XyWrite if it is still available.
Thankee,
Hugh
The thing is, a .GED file is a text file, and Excel doesn't really
work with text files, at least not any of the versions I've used.
On Thu, 14 Jun 2018 11:33:52 GMT, Eagle@bellsouth.net (J. Hugh Sullivan)
declaimed the following:
The problem, so far, is...
2 NICK (name)
I can delete NICK en masse but it leaves 2 and (name). I have used
Notepad, Wordpad and Word. Word does best but I have to eliminate more
than 500 instances individually in more than 3,000 pages. I'm on Page
300 now.
Presuming ALL occurrences are on lines that start with "2 NICK", and do
not break across lines, your search string needs to be a wild card/regex
that specifies
start of line
2 NICK
* (match anything)
end of line
Best I can find in Word is
[*] Use wildcards
Find
^l2 NICK ?@^l
Replace
^l
where ^l represents manual line break (If the file loads with paragraph
marks, other processing will be needed -- since paragraph marks are NOT
available when "use wildcards" is active)
? represents "any character" and @ represents "1 or more of previous
[ie: any character]"
So the whole find string is
manual line break
2 NICK
any string of characters until
manual line break
and the replace string is
manual line break
(needed since we matched two line breaks, we need to put one back in)
I'll look at XyWrite if it is still available.
Might consider SciTE https://www.scintilla.org/SciTEDownload.html
You'd enable regex and use as the find string
^2 NICK .+$
^ beginning of line
2 NICK
.+ any character, 1 or more of it
$ end of line
and an empty replace string
SciTE regex work on lines, so the above should result in empty lines in the
file. Depending on the line ending in the file (turn on View/End-of-Line),
you can get rid of the empty lines by turning off regex, turning on
backslash expressions and doing (assuming Windows standard <cr><lf>
endings)
find
\r\n\r\n
replace
\r\n
to compress two line endings into one line ending
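For anyone who wants to try the same two steps outside an editor, here is a rough Python equivalent. It is a sketch, not SciTE's own behaviour: the function name blank_then_squeeze is invented, and the pattern uses [^\r\n]+ instead of .+ because Python's "." would also swallow the <cr> of a Windows line ending.

```python
import re


def blank_then_squeeze(text):
    # Step 1: empty out every line that starts with "2 NICK ".
    # re.MULTILINE lets ^ match at the start of each line; [^\r\n]+ stops
    # before the line ending so the <cr><lf> pair is left in place.
    blanked = re.sub(r"^2 NICK [^\r\n]+", "", text, flags=re.MULTILINE)
    # Step 2: compress the doubled <cr><lf> left behind into a single one.
    return blanked.replace("\r\n\r\n", "\r\n")
```

Like the editor version, a single pass of step 2 leaves one empty line behind wherever several NICK lines sat directly under each other, so it may need repeating.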
I suspect you don't have a Python interpreter installed -- the task is
fairly simple as a Python script.
-=-=-=-=- denick.py
import sys
for ln in sys.stdin:
    # keep every line except those that begin with the NICK tag
    if not ln.startswith("2 NICK"):
        sys.stdout.write(ln)  # ln still ends with its own newline
-=-=-=-=-
python denick.py <original.ged >edited.ged
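If redirecting stdin and stdout feels too opaque, a variant that takes two file names and reports how many lines it dropped may be easier to trust. This is a sketch only: the name strip_nick_lines is hypothetical, and the UTF-8 encoding is an assumption (undecodable bytes are passed through untouched rather than rejected).

```python
def strip_nick_lines(src_path, dst_path):
    """Copy src_path to dst_path without lines starting with '2 NICK'.
    Returns the number of lines that were dropped."""
    dropped = 0
    # newline="" keeps the file's own line endings exactly as they are;
    # errors="surrogateescape" round-trips bytes that are not valid UTF-8.
    options = dict(encoding="utf-8", errors="surrogateescape", newline="")
    with open(src_path, "r", **options) as src, open(dst_path, "w", **options) as dst:
        for line in src:
            if line.startswith("2 NICK"):
                dropped += 1
            else:
                dst.write(line)
    return dropped
```

Calling strip_nick_lines("original.ged", "edited.ged") leaves the original file alone, and the returned count can be checked against the number of NICK lines a plain search reports.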
--
bieber.genealogy@earthlink.net Dennis Lee Bieber HTTP://home.earthlink.net/~bieber.genealogy/
On 14/06/2018 13:33, J. Hugh Sullivan wrote:
I'll look at XyWrite if it is still available.
Thankee,
Hugh
I think you will find XyWrite has not been available since 2003!
According to the wiki article "Despite these advantages in speed,
XyWrite does not have as many features as Word or OpenOffice.org. For
example, XyWrite is unaware of Windows ANSI or Unicode character sets
and Nota Bene does not support languages (such as Chinese) that require
double-byte characters."
Use the free Notepad++ from
https://notepad-plus-plus.org/download/v7.5.6.html It will read and
write a text GEDCOM - you can save it with the appropriate file extension.
A useful Help sheet
https://drive.google.com/file/d/0B86nuTd5nMTKaENHcmliUC1kdnc/edit
Make a copy of your GEDCOM
You can record a simple macro, save it and run it.
I'm not sure of your exact problem but I think you need a macro to
- search for 2 NICK (use Ctrl-F)
- Delete Line (use Ctrl-L)
you can then stop recording, run it a number of times or to the end of
the file and can save the macro.
By recording a macro you can see what your actions are on a copy of your
main GEDCOM
On Thu, 14 Jun 2018 09:28:05 -0400, Dennis Lee Bieber <bieber.genealogy@earthlink.net> wrote:
Might consider SciTE https://www.scintilla.org/SciTEDownload.html
You'd enable regex and use as the find string
^2 NICK .+$
I really appreciate your efforts. Not growing up with computers is a
real problem for us has-beens.
My line appears
2 NICK Hugh
Others are
2 NICK Tom
2 NICK Dick
2 NICK Harry
2 NICK John
2 NICK Bill
I can delete the 2 NICK but not the name except one at the time
2 NICK Tom
2 NICK Dick
2 NICK Harry
2 NICK John
2 NICK Bill
At my age I'm supposed to have dementia - but I keep forgetting. :)
On Thu, 14 Jun 2018 16:42:05 +0200, john
<john1@s145802280.onlinehome.fr> wrote:
Which is much easier to do using Notepad++ and macros as I indicated
earlier rather than struggling with regex expressions
Macros scare me. I'm a control freak and I would never be sure of what
I had done with a large volume of data.
But I can't quit trying...
Hugh
I strongly second John's advice. The only thing I'd add, is to look
closely into using Regular Expressions. That's a very powerful feature
that's in Notepad++ but NOT in any Word Processor that I've seen, and
not in Notepad. RE allows you to do very complicated search and
replace; like searching for a string of text that contains certain
characters, and copy certain portions of the string.
In the example above, you could easily do a search for occurrences of
the string 2 NICK and then change whatever follows 2 NICK to something
else. (Not that you'd want to do exactly that in this case.) My point
being that you can't do that kind of a search and replace in a word
processor.
That was why I suggested using a copy of your file to play with and get
it working correctly to your satisfaction!
You can see what the macro is doing as you are creating it step by step.
Once created, NotePad++ allows you to run it however many times you want
so you can see the effect or just let it run to the end of the file.
Only when you are satisfied do you need to run it on a full GEDCOM.
On Thu, 14 Jun 2018 17:32:05 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
At my age I'm supposed to have dementia - but I keep forgetting. :)
LOL. You must be over 78 because I can't remember whether I've forgot
something or not.
Regular Expressions are kind of like learning a second language. You
can use as many of the features as you want, ignoring everything that
seems complicated.
In the example I posted elsewhere in the thread..
2 NICK [A-Z][a-z]*\n
The search string simply means
Look for the string that starts with the phrase "2 NICK "
followed by any one capitalized letter
followed by any number of lower case letters strung together ([a-z]*)
followed by an end of line character (\n)
and replace all of that with nothing.
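To see that search string in action without touching a real GEDCOM, the same pattern can be tried in Python on a few made-up lines. This is just an illustration; the sample text is invented, and a file with Windows <cr><lf> endings would need the pattern adjusted, since a bare \n would then not follow the last letter directly.

```python
import re

# The pattern from the post: "2 NICK ", one capital, any lower-case run, newline.
PATTERN = re.compile(r"2 NICK [A-Z][a-z]*\n")

sample = (
    "1 NAME James /Sullivan/\n"
    "2 GIVN James\n"
    "2 NICK Hugh\n"
    "1 SEX M\n"
    "2 NICK Tom\n"
)

# Replace every match with nothing, exactly as in the editor.
cleaned = PATTERN.sub("", sample)
print(cleaned, end="")
```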
On Thu, 14 Jun 2018 12:52:46 -0500, Charlie Hoffpauir
<invalid@invalid.com> wrote:
LOL. You must be over 78 because I can't remember whether I've forgot
something or not.
I'm 90 1/2 - WWII Vet. My AcDu was enlisted but I retired from the
Reserve as an 0-5.
I could do that. Is RE a program or an add-on to a program?
I d'led ++ and didn't even like the first screen - every other word
was black highlighted.
Hugh
For me, EditPlus presents a cleaner view. Load a GEDCOM and it just
looks like it would if you opened it in Word or Notepad.
You mentioned working on your stamp collection earlier. Is it already
in some form of text file, and you just want to edit it for
consistency? If you're working on it in Excel maybe you're treating it
as a database. Excel is nice for working on what is basically a "flat
file" database. The disadvantage is getting good reports out of it.
Since Access is included with Office 365, it's really convenient to do
the editing in Excel, load the Excel file into Access, and generate
neat report(s) from there.
On Thu, 14 Jun 2018 15:10:00 -0500, Charlie Hoffpauir
<invalid@invalid.com> wrote:
For me, EditPlus presents a cleaner view. Load a GEDCOM and it just
looks like it would if you opened it in Word or Notepad.
EditPad Lite appears to do the same thing and can activate some
functions that Excel does not without some maneuvering. Immediate all
caps or lower case for example and it has RE.
I d'led one of the better listings of entities that produced stamps
from 1840. There are multiple variations of form of government, war,
colony, independent, republic, etc. for most entities. Of course
almost every entity has produced 100s-1000s of different stamps. I'm
trying to shrink to country and colony by printing years without
losing valuable data to make eBay purchases manageable.
I copy pasted the better list to Excel in text format. I'm doing
combos, deletes, punctuation, i. e., personalization. All I want is a
list of the type stamp I have and the missing types. I want to fill in
the blanks up to a certain level of expense.
Excel is perfect for that and I can port to EditPad for some action. I
still enjoy busy work up to 10 hours per day.
The first few lines look like below except better formatted in columns.
Major entity Minor entity Entity type Start End Scott
Abu Dhabi British Protectorate 1964 1972 ?
Aden British Colony 1937 1965 ?
Aden Kathiri State of Seiyun Aden States 1942 1964 ?
Aden Kathiri State in Hadhramaut Aden States 1967 1968 ?
Aden Qu'aiti State of Shihr and Mukalla Aden States 1942 1951 ?
Aden Mahra State Qishn and Socotra Aden States 1967 1967 ?
Aden State of Upper Yafa Aden States 1967 1967 ?
Italy Aegean Islands EGEO Federation 1912 1912 ?
Hugh
On Thu, 14 Jun 2018 14:11:28 +0200, john
<john1@s145802280.onlinehome.fr> wrote:
By recording a macro you can see what your actions are on a copy of your
main GEDCOM
On Thu, 14 Jun 2018 03:53:47 +0200, Steve Hayes
<hayesstw@telkomsa.net> wrote:
The thing is, a .GED file is a text file, and Excel doesn't really
work with text files, at least not any of the versions I've used.
Well, that's not really right. Excel will work quite well with text
files, and I use it often to parse text into columnar data which I
then import into Access. As an example, I took text from Father
Hebert's books Southwest Louisiana Records, and using a combination of
Excel and a text editor, parse that information into fields which I
then imported into Access to create a searchable database.
Excel has import from TEXT built right into the options:
Data, Get Data, From File, From Text/CSV.
Excel "prefers" text separated by commas or tabs. (It then
automatically parses it into fields). But plain text will also
import.... then it's up to the user to figure out how to parse it.
For a GED file, there's no need to parse the text. Simply use the
spreadsheet's search and replace functions; importing a GED into
Excel puts everything into column A.
I'm not sure exactly what he does want to do, but in XyWrite (and I
think most other word processors have the equivalent) it is dead easy
to type
cha |2 NICK Sim|2 NICK Something Else|
Which changes this:
0 @I3910@ INDI
1 NAME Simeon /Growdon/
2 GIVN Simeon
2 SURN Growdon
2 NICK Sim
1 NAME Sim /Growdon/
2 GIVN Sim
2 SURN Growdon
2 TYPE nickname
1 SEX M
1 _UID 5B42F25016F04D1FB4E46962C25817957A5A
to
0 @I3910@ INDI
1 NAME Simeon /Growdon/
2 GIVN Simeon
2 SURN Growdon
2 NICK Something Else
1 NAME Sim /Growdon/
2 GIVN Sim
2 SURN Growdon
2 TYPE nickname
1 SEX M
1 _UID 5B42F25016F04D1FB4E46962C25817957A5A
and would do so throughout the file in less than a second.
Seems perfect application for cleaning in Excel.
The nice things about considering importing to Access:
Access makes it really easy to get the data into Access.
Once in Access it easy to extract whatever information you're
interested in looking at closely.
I don't keep up with the latest software so much now... but thanks for
mentioning EditPad Lite. I'll give it a look.
On Thu, 14 Jun 2018 13:52:45 GMT, Eagle@bellsouth.net (J. Hugh
Sullivan) wrote:
My line appears
2 NICK Hugh
Others are
2 NICK Tom
2 NICK Dick
2 NICK Harry
2 NICK John
2 NICK Bill
I can delete the 2 NICK but not the name except one at the time
Again, I'm not sure what the commands are in other word processors,
but in XyWrite it is a trivial task.
Using my previous example, typing
cha |->@ NICK Sim||
produced this:
0 @I3910@ INDI?
1 NAME Simeon /Growdon/?
2 GIVN Simeon?
2 SURN Growdon?
1 NAME Sim /Growdon/?
2 GIVN Sim?
2 SURN Growdon?
2 TYPE nickname?
1 SEX M?
1 _UID 5B42F25016F04D1FB4E46962C25817957A5A?
the line 2 NICK Sim has gone (the -> is an approximation for the CR/LF
in a XyWrite screen display).
In the example that Hugh gave, where scattered throughout the GED file
are nicknames that appear like the list
2 NICK Tom
2 NICK Dick
2 NICK Harry
2 NICK John
2 NICK Bill
and he wants to eliminate ALL of them. In his word processor (or in
Excel) he can either do a search for <2 NICK> and replace with
nothing, or he can simply search through the file, looking for the
occurrence of <2 NICK> and when found, delete the entire line. He was
actually doing the latter, until he found that RM could remove them on
export of a GED, which could then be re-imported thus removing them
all.
A final comment.... Hugh's final solution is actually much better than
my suggestion, because there are a number of situations where my
search and replace will fail. Say the nickname is <John Boy> or
<Little-John> or even <littlejohn>. My search criteria fails to delete
those from the GEDCOM. However, that doesn't mean that the principle
fails. A more elaborate RE expression would correct those errors.
2 NICK [A-z, ,-]*\n
There, that seems to work.
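A quick way to check a pattern like that against the awkward cases is to run each candidate over a handful of names. The sketch below is only a test bench: the helper matches and the third pattern are my own additions, not something from the posts above. One thing it shows is that [A-z] is wider than it looks, because the range also covers the punctuation that sits between Z and a in ASCII, such as the underscore.

```python
import re

narrow = re.compile(r"2 NICK [A-Z][a-z]*\n")       # first pattern in the thread
wide = re.compile(r"2 NICK [A-z, ,-]*\n")          # widened pattern, as posted
explicit = re.compile(r"2 NICK [A-Za-z ,-]*\n")    # hypothetical tighter variant


def matches(pattern, name):
    """True if the pattern matches a complete NICK line for this name."""
    return pattern.fullmatch(f"2 NICK {name}\n") is not None


for name in ["John Boy", "Little-John", "littlejohn", "J_R"]:
    print(name, matches(narrow, name), matches(wide, name), matches(explicit, name))
```

The first three names are missed by the narrow pattern and caught by the other two; the underscore name is caught only by the [A-z] version.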
On Fri, 15 Jun 2018 08:11:46 +0200, Steve Hayes
<hayesstw@telkomsa.net> wrote:
Again, I'm not sure what the commands are in other word processors,
but in XyWrite it is a trivial task.
Maybe it's just semantics. I think I used XY write years ago, but if
so, I've forgot what the features were.
My point is that I haven't seen (at least recently) a word processor
that will handle Regular Expressions... and Regular Expressions are
really powerful tools in editing text.
In the example that Hugh gave, where scattered throughout the GED file
are nicknames that appear like the list
2 NICK Tom
2 NICK Dick
2 NICK Harry
2 NICK John
2 NICK Bill
and he wants to eliminate ALL of them. In his word processor (or in
Excel) he can either do a search for <2 NICK> and replace with
nothing, or he can simply search through the file, looking for the
occurrence of <2 NICK> and when found, delete the entire line. He was
actually doing the latter, until he found that RM could remove them on
export of a GED, which could then be re-imported thus removing them
all.
With RE capability, a single search/replace command can and will
remove all the lines containing <2 NICK> AND will also delete anything
following <2 NICK> as long as what follows "fits" the criteria of a
single capitalized letter followed by any number of consecutive lower
case letters (same as each other or different) followed by an end of
line character.
2 NICK [A-Z][a-z]*\n
Think of what this means.....
If somehow Hugh had used the phrase "John appears in the program
GEDCOM as '2 NICK John' but I don't want that to show" within a note
in his program, that phrase would not be deleted from the note,
because it doesn't fit the strict structure of the RE expression. (no
end of line character following everything else that "matched" the
search string).
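The note example holds because other text follows the name on that line. If a note line happened to end with the words 2 NICK John, the pattern as written would still bite, since nothing ties it to the start of a line. A small Python sketch of the difference, using an invented note line and an anchored variant that is my own assumption rather than anything posted here:

```python
import re

unanchored = re.compile(r"2 NICK [A-Z][a-z]*\n")
# ^ with re.MULTILINE only allows a match at the start of a line.
anchored = re.compile(r"^2 NICK [A-Z][a-z]*\n", re.MULTILINE)

sample = (
    "1 NOTE In the GEDCOM he shows up as 2 NICK John\n"
    "2 NICK John\n"
    "1 SEX M\n"
)

damaged = unanchored.sub("", sample)   # eats the tail of the note line too
intact = anchored.sub("", sample)      # removes only the real NICK line
```

Most editors with regex support accept ^ for start of line in the same way, so the anchored form costs one extra character.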
But this example merely scratches the surface of the capabilities of
Regular Expressions.
I'd really like to find a word processor program that will do this
kind of editing. If XY Write will, I'd use it no matter how old it is.
But my problem is that I'm not sure exactly what Hugh wants to do.
I gather he wants to leave the 2 NICK line in the Gedcom file, but
with nothing following it. What is the point of that?
XyWrite can do that kind of thing, though I'd have to do some
reading in the manual to find out how. In addition to macros, it also
has XPL, the XyWrite programming language, so it's more than a word
processor. It's like a Lego kit word processor, you can build it to do
various specialised tasks, but not, as someone pointed out, in
Unicode.
I believe AWK can do that kind of thing too, and at various times I've
tried to learn AWK because I've thought, from its description, that it
could do amazing things with Gedcom files, but I've never managed to
learn it well enough to do anything useful.
One potential problem with Regular Expressions (regex) in
word-processing programs is how the embedded text formatting is handled.
There is limited regex available in later versions of MS Word; see the
bottom of this page
https://support.office.com/en-us/article/find-and-replace-text-and-other-data-in-a-word-document-c6728c16-469e-43cd-afe4-7708c6c779b7?ocmsassetID=HA102350661&CorrelationId=946bf317-fe64-40a3-9201-0d9661c4fdbc&ui=en-US&rs=en-US&ad=US
But you are probably better off using a text editor unless you really
want to keep formatting. I earlier suggested Notepad++. Another good
free text editor is PSPad. Both are updated regularly, support different
character encoding, support regex, etc. Like many better text editors,
they both also have file compare. So you can see two files side-by-side
with similar, differences, and additions highlighted (which, in this
current GEDCOM case, can be useful comparing your editing actions with a
copy of the original file).
regex is very powerful if you use it often enough to remember all the
functions. Hugh had a simple problem. If he hadn't solved it with RM I'm
sure writing a simple macro would have been much easier, rather than
having to work out the appropriate regex expression to handle all the
cases which might have occurred in the file.
In SciTE
^2 NICK .*$
converts (using your text...) to
-=-=-=-=-
In the example that Hugh gave, where scattered throughout the GED file
are nicknames that appear like the list
The *nix tool 'sed' can make the problem fairly simple:
sed '/^2 NICK/d' < old.ged > new.ged
This says to go through the file 'old.ged' and look for any line that
begins with '2 NICK' and delete it, putting the result into the new file
'new.ged'
The problem that was also given in the first message, as I understand it,
was dealing with a line that had
1 NAME firstname "nickname""nickname" /lastname/
This can also be handled by 'sed'
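The post does not give the sed command for that second case, so here is the idea sketched in Python instead. It assumes the doubled nickname appears as two identical quoted strings side by side on the 1 NAME line, with or without a space between them; the function name collapse_duplicate_nickname is invented for the example.

```python
import re

# A quoted string, optional whitespace, then the very same quoted string again.
DUP_QUOTED = re.compile(r'("[^"]+")\s*\1')


def collapse_duplicate_nickname(line):
    """Collapse "nick""nick" (or "nick" "nick") to a single "nick" on NAME lines."""
    if line.startswith("1 NAME "):
        return DUP_QUOTED.sub(r"\1", line)
    return line  # every other line is passed through untouched
```

The \1 backreference is what restricts the change to exact repeats, so a NAME line carrying two different quoted names is left as it is.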
At one time I kept REAL genealogy lines in one program and
research/unconnected lines in the other. I'll export the latter to
ged allowing me to only use 1 program.